Ethan Mendes (@ethanmendes3)'s Twitter Profile
Ethan Mendes

@ethanmendes3

ML PhD at @GeorgiaTech.

ID: 1334014171642523648

Website: https://ethanm88.github.io/
Joined: 02-12-2020 05:59:06

48 Tweets

107 Followers

425 Following

Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

LLMs use fixed strategies for all questions.

This is inefficient for complex reasoning. This paper introduces self-taught lookahead (STL). It improves value estimation in language models by learning from state transitions without ground truth rewards.

📌 Self-taught lookahead
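
The tweet only gestures at the mechanism, so here is a minimal, hypothetical sketch (my own framing, not the paper's actual algorithm) of lookahead-based value estimation: expand a state one step with a frozen LLM, score the successor, and treat that score as a self-generated training target, so no ground-truth reward is needed. The generate() and judge() functions below are placeholders for LLM calls.

# Illustrative sketch only -- not the STL paper's code.
def generate(state: str) -> str:
    """Placeholder for asking a frozen LLM to take one reasoning step from `state`."""
    return state + " -> next step"

def judge(state: str) -> float:
    """Placeholder for asking the LLM how promising `state` looks, in [0, 1]."""
    return min(1.0, 0.1 * len(state.split("->")))

def lookahead_value(state: str) -> float:
    """Value of a state = judged quality of its one-step-lookahead successor,
    so no ground-truth reward for the current state is required."""
    return judge(generate(state))

# The (state, value) pairs can then serve as self-generated targets for
# training a smaller value model.
training_pairs = [(s, lookahead_value(s)) for s in ["Q: 2+2=?", "Q: 2+2=? -> 4"]]
print(training_pairs)
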
Jonathan Zheng (@jonathanqzheng)'s Twitter Profile Photo

🚨o3-mini vastly outperforms DeepSeek-R1 on an unseen probabilistic reasoning task!

Introducing k-anonymity estimation: a novel task to assess privacy risks in sensitive texts

Unlike conventional math and logical reasoning, this is difficult for both humans and AI models.

1/7
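
For readers unfamiliar with the underlying notion, here is a toy, hypothetical sketch of k-anonymity counting (the general concept, not the paper's task setup): k is the number of people in a population who match every attribute disclosed in a text, and smaller k means higher re-identification risk. The population table below is invented for illustration.

# Illustrative sketch only -- toy k-anonymity count, not the paper's estimation task.
population = [
    {"age": 34, "zip": "30332", "job": "nurse"},
    {"age": 34, "zip": "30332", "job": "teacher"},
    {"age": 34, "zip": "30332", "job": "nurse"},
    {"age": 51, "zip": "30309", "job": "nurse"},
]

def k_anonymity(disclosed: dict) -> int:
    """k = number of people matching every attribute revealed in the text."""
    return sum(all(person.get(a) == v for a, v in disclosed.items()) for person in population)

# "I'm a 34-year-old nurse in 30332" matches 2 people in this toy table, so k = 2.
print(k_anonymity({"age": 34, "zip": "30332", "job": "nurse"}))
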
Alan Ritter (@alan_ritter)'s Twitter Profile Photo

Want to learn about Llama's pre-training?  Mike Lewis will be giving a Keynote at NAACL 2025 in Albuquerque, NM on May 1.
2025.naacl.org
NAACL HLT 2025
Alan Ritter (@alan_ritter)'s Twitter Profile Photo

Wondering what review scores you need to get accepted at ACL? Maybe this data from NAACL 2025 can help: gist.github.com/aritter/8b65a9…

Yang Chen (@ychennlp)'s Twitter Profile Photo

📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models.

The result? We trained a 7B model - AceReason-Nemotron-1.1, significantly improved from version 1.0 on math and coding benchmarks.

✅AIME2025 (math): 53.6% -> 64.8%
✅LiveCodeBench
Geyang Guo (@cherylolguo)'s Twitter Profile Photo

❤️🌎 Introducing CARE: Multilingual Multicultural Human Preference Learning
3490 culturally relevant prompts + 31.7k Human/AI-written responses rated by multilingual speakers
💡 Key insights:
- Even a small amount of cultural data improves popular LLMs consistently.
- Deepseek-v3
Alan Ritter (@alan_ritter)'s Twitter Profile Photo

🎉 Excited to see that our paper on cost-efficient data annotation for LLMs won an SAC Highlight Award! 🔗 Check out @mohit_rag18's work here: aclanthology.org/2025.acl-long.…

Mohit (@mohit_r9a)'s Twitter Profile Photo

Unfortunately, I had to miss out on attending in person in Vienna, but I'm glad to see the recognition. We need more research on understanding data and the post-training of LLMs. Always a pleasure working with Alan and Junmo Kang.

Jungsoo Park (@jungsoo___park)'s Twitter Profile Photo

What if LLMs can forecast their own scores on unseen benchmarks from just a task description?

We are the first to study text description→performance prediction, giving practitioners an early read on outcomes so they can plan what to build—before paying full price 💸
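
As a rough illustration of what description-to-performance prediction could look like (my own toy framing with made-up numbers, not the paper's method), one could fit a simple regressor from text features of past task descriptions to a model's observed benchmark scores, then query it with a new description:

# Illustrative sketch only -- hypothetical descriptions and scores.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

descriptions = [
    "Answer grade-school math word problems step by step.",
    "Translate news sentences from German to English.",
    "Classify the sentiment of movie reviews.",
]
scores = [0.78, 0.41, 0.93]  # hypothetical accuracies on the corresponding benchmarks

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(descriptions)
regressor = Ridge().fit(X, scores)

# Predict performance on an unseen benchmark from its description alone.
print(regressor.predict(vectorizer.transform(["Solve competition-level geometry problems."])))
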
Ethan Mendes (@ethanmendes3)'s Twitter Profile Photo

New work led by Jungsoo Park with a really surprising result: LLMs can predict their performance on a benchmark using only a natural-language description of the task.