Vijay V. (@vijaytarian)'s Twitter Profile
Vijay V.

@vijaytarian

Grad student at CMU. I do research on applied NLP. he/him

ID: 31239481

Link: http://www.cs.cmu.edu/~vijayv/
Joined: 14-04-2009 21:56:20

1.1K Tweets

580 Followers

468 Following

Seungone Kim @ NAACL2025 (@seungonekim)

#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing the AgoraBench, a benchmark for evaluating data generation capabilities of LMs.

Seungone Kim @ NAACL2025 (@seungonekim)

🌟Our results show that LMs have distinct strengths! For example, while GPT-4o excels at generating new instances, Claude-3.5-Sonnet is better at refining existing instances. 🤯We also observe unexpected results that in some cases, LMs with stronger problem-solving abilities do

Huan Sun (OSU) (@hhsun1)

I was extremely fortunate to recruit @Xiangyue96 as my Ph.D. student in 2018 and witness his remarkable growth into a rising star in NLP and AI. You might know him for his recent contributions like MMMU and MAmmoTH. But to me, long before these influential projects, Xiang

Danish Pruthi (@danish037)

At #ICML2025, introducing STAMP. A simple approach to verify whether your content (e.g., a dataset) is a part of the data used for training language models. ⤵️

Graham Neubig (@gneubig)

Yuchen Jin They didn't evaluate on 23 of the 500 instances though, so the actual score is: 74.9 * (500 - 23) / 500 = 71.4%, which is a few points below Claude Sonnet 4.
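
The adjustment in the tweet above can be reproduced with a short sketch. It assumes, as the tweet's formula implies, that the 23 unevaluated instances count as failures; the function and variable names are illustrative and do not come from any benchmark tooling.

```python
# Figures taken from the tweet above; names are hypothetical.
reported_score = 74.9     # percent, reported on the evaluated subset only
total_instances = 500     # full benchmark size
skipped_instances = 23    # instances that were not evaluated


def adjusted_score(reported, total, skipped):
    """Rescale a percentage so that skipped instances count as failures."""
    return reported * (total - skipped) / total


adjusted = adjusted_score(reported_score, total_instances, skipped_instances)
print(f"{adjusted:.2f}%")
```

This prints about 71.45%, which the tweet reports as 71.4%.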

jack morris (@jxmnop)

OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only... or is it? turns out that underneath the surface, there is still a strong base model. so we extracted it. introducing gpt-oss-20b-base 🧵
