Yao Dou (@yaooo01)'s Twitter Profile
Yao Dou

@yaooo01

PhD student @GeorgiaTech, previously @uwnlp, @allen_ai.

ID: 912061597413122048

Website: https://yao-dou.github.io/ · Joined: 24-09-2017 21:10:04

50 Tweets

211 Followers

287 Following

Yang Chen (@ychennlp)'s Twitter Profile Photo

If you are interested in multimodal RAG, check out our new paper!

- We propose a unified multimodal retriever trained with instruction-tuning on 8 tasks (🖼️image/📝text query -> target)

arxiv.org/abs/2311.17136 #RAG #multimodal

Yang Chen (@ychennlp)'s Twitter Profile Photo

Do you know GPT-4o is super good at image geolocation 🌎? I've been taking random photos and testing it. The model just keeps surprising me.

What does this mean to user privacy when they post photos on social media?

Check out our new paper and benchmark GPTGeoChat!👇
Alan Ritter (@alan_ritter)'s Twitter Profile Photo

Check out our recent work that aims to protect privacy by helping people make more informed decisions about what they share online.

Drop by our poster tomorrow morning at 11am if you are at ACL in Bangkok.

<a href="/aclmeeting/">ACL 2025</a>
arxiv.org/abs/2311.09538
Wei Xu (@cocoweixu)'s Twitter Profile Photo

Excited to be in Thailand ☀️😎 🌺🥥 to attend #ACL2024!

If you run into my PhD students from Georgia Tech NLP group - <a href="/Yaooo01/">Yao Dou</a>, <a href="/tareknaous/">Tarek Naous</a>, and <a href="/JonathanQZheng/">Jonathan Zheng</a>, please say hi! 👋

Please also say hi to my other collaborators!
Mu Cai (@mucai7)'s Twitter Profile Photo

1/N) Are current large multimodal models like #GPT4o really good at video understanding?

🚀 We are thrilled to introduce TemporalBench to examine temporal dynamics understanding for LMMs! 

Our TemporalBench reveals even the SOTA LMM #GPT4o achieves only 38.5, far from…
Mu Cai (@mucai7)'s Twitter Profile Photo

Now TemporalBench is fully public! See how your video understanding model performs on TemporalBench before CVPR!

🤗 Dataset: huggingface.co/datasets/micro…
📎 Integrated to lmms-eval (systematic eval): github.com/EvolvingLMMs-L… (great work by <a href="/ChunyuanLi/">Chunyuan Li</a> <a href="/zhang_yuanhan/">Yuanhan (John) Zhang</a> )
📗 Our…
Tarek Naous (@tareknaous)'s Twitter Profile Photo

What causes entity-related cultural biases in LMs? Is it just pre-training data?

Our latest paper shows how varying linguistic phenomena exhibited by entities (such as word sense in Arabic) impact the cross-cultural performance of LMs.

arxiv.org/abs/2501.04662
Yang Chen (@ychennlp)'s Twitter Profile Photo

Excited to introduce AceMath, a series of math LLMs and reward models!

🤗 Checkpoints and training data are on Hugging Face: huggingface.co/collections/nv…
📑 Paper: arxiv.org/abs/2412.15084

Ethan Mendes (@ethanmendes3)'s Twitter Profile Photo

🚨New Paper: Better search for reasoning (e.g., web tasks) usually requires costly💰demos/rewards

What if we only self-improve LLMs on state transitions—capturing a classic RL method in natural language?

Spoiler: It works (⬆️39% over base model) & enables efficient search!🚀
🧵
Jonathan Zheng (@jonathanqzheng)'s Twitter Profile Photo

🚨o3-mini vastly outperforms DeepSeek-R1 on an unseen probabilistic reasoning task!

Introducing k-anonymity estimation: a novel task to assess privacy risks in sensitive texts.

Unlike conventional math and logical reasoning, this is difficult for both humans and AI models.

1/7
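For context, the classic k-anonymity notion behind the task can be illustrated with a small sketch: a dataset is k-anonymous if every record shares its quasi-identifier values with at least k-1 other records. The records and helper name below are hypothetical, not taken from the paper, which instead estimates k from free-form text:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the k-anonymity of a dataset: the size of the smallest
    group of records that share the same quasi-identifier values."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())

records = [
    {"zip": "30332", "age": 28, "name": "A"},
    {"zip": "30332", "age": 28, "name": "B"},
    {"zip": "30318", "age": 40, "name": "C"},
]
# k = 1 here: the (30318, 40) group contains only a single record,
# so that individual is uniquely re-identifiable from zip + age alone.
print(k_anonymity(records, ["zip", "age"]))
```

The estimation task in the paper is harder than this lookup, since the model must reason probabilistically about how many people plausibly match a textual description.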
Kai Zhang (@drogokhal4)'s Twitter Profile Photo

🚀Big WebDreamer update!
We train 💭Dreamer-7B, a small but strong world model for real-world web planning.
💥Beats Qwen2-72B
⚖️Matches #GPT-4o
Trained on 3M synthetic examples — and yes, all data + models are open-sourced.
Agam A. Shah (@shahagam4)'s Twitter Profile Photo

Thrilled to share our new preprint:
"Beyond the Reported Cutoff: Where LLMs Fall Short on Financial Knowledge"

We evaluated 197,011 revenue questions across 17,621 U.S. companies (1980–2022) using 6 top LLMs.

Key insights 🧵
Yang Chen (@ychennlp)'s Twitter Profile Photo

Had a lot of fun scaling up RL to improve math reasoning! Excited to introduce AceMath-RL-Nemotron-7B with a scalable training recipe.

📑 Full blog: research.nvidia.com/labs/adlr/acem…
🔗 Model: huggingface.co/nvidia/AceMath…

Wei Xu (@cocoweixu)'s Twitter Profile Photo

I am giving a keynote at the PrivateNLP Workshop (sites.google.com/view/privatenl…) at #NAACL2025 (Sunday 9am CT).

* GPT-4V is a performant geolocator, predicting exact GPS coordinates of an image > any SOTA
* LLMs can estimate privacy risk based on probabilistic reasoning > chain-of-thought
Yang Chen (@ychennlp)'s Twitter Profile Photo

With just math-RL, AceReason-Nemotron-14B surpasses DeepCoder-14B on LiveCodeBench v5.
We then did code-RL and found training became so much easier.
Yang Chen (@ychennlp)'s Twitter Profile Photo

Does RL incentivize reasoning capability beyond the starting SFT model? We show an interesting result with our recently published AceReason-Nemotron-7B model, which was trained with RL: its pass@K from K=1 to 1024 is consistently +10% on LiveCodeBench v6. Perhaps scaling RL is the key.
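For reference, the pass@K metric mentioned above is commonly computed with the unbiased estimator from the Codex paper (Chen et al., 2021). A minimal sketch of that standard estimator, not the authors' exact evaluation code:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k: the probability that at least one of
    k samples is correct, given c correct answers among n generations."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any size-k draw
        # must include at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 3 correct out of 10 generations, a single draw succeeds ~30% of the time.
print(pass_at_k(10, 3, 1))  # ≈ 0.3
```

The complement form `comb(n - c, k) / comb(n, k)` is the probability that all k drawn samples are incorrect, which avoids the numerical instability of multiplying many per-sample probabilities.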