Ethan Mendes (@ethanmendes3)'s Twitter Profile
Ethan Mendes

@ethanmendes3

ML PhD at @GeorgiaTech.

ID: 1334014171642523648

Website: https://ethanm88.github.io/
Joined: 02-12-2020 05:59:06

48 Tweets

107 Followers

425 Following

Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

LLMs use fixed strategies for all questions.

This is inefficient for complex reasoning. This paper introduces self-taught lookahead (STL). It improves value estimation in language models by learning from state transitions without ground truth rewards.

📌 Self-taught lookahead
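
The tweet only gestures at the mechanism, so here is a minimal, hypothetical sketch (my own framing, not the paper's actual algorithm) of lookahead-based value estimation: expand a state one step with a frozen LLM, score the successor, and treat that score as a self-generated training target, so no ground-truth reward is needed. The generate() and judge() functions below are placeholders for LLM calls.

# Illustrative sketch only -- not the STL paper's code.
def generate(state: str) -> str:
    """Placeholder for asking a frozen LLM to take one reasoning step from `state`."""
    return state + " -> next step"

def judge(state: str) -> float:
    """Placeholder for asking the LLM how promising `state` looks, in [0, 1]."""
    return min(1.0, 0.1 * len(state.split("->")))

def lookahead_value(state: str) -> float:
    """Value of a state = judged quality of its one-step-lookahead successor,
    so no ground-truth reward for the current state is required."""
    return judge(generate(state))

# The (state, value) pairs can then serve as self-generated targets for
# training a smaller value model.
training_pairs = [(s, lookahead_value(s)) for s in ["Q: 2+2=?", "Q: 2+2=? -> 4"]]
print(training_pairs)
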
Jonathan Zheng (@jonathanqzheng)'s Twitter Profile Photo

🚨o3-mini vastly outperforms DeepSeek-R1 on an unseen probabilistic reasoning task!

Introducing k-anonymity estimation: a novel task to assess privacy risks in sensitive texts

Unlike conventional math and logical reasoning, this is difficult for both humans and AI models.

1/7
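
For readers unfamiliar with the underlying notion, here is a toy, hypothetical sketch of k-anonymity counting (the general concept, not the paper's task setup): k is the number of people in a population who match every attribute disclosed in a text, and smaller k means higher re-identification risk. The population table below is invented for illustration.

# Illustrative sketch only -- toy k-anonymity count, not the paper's estimation task.
population = [
    {"age": 34, "zip": "30332", "job": "nurse"},
    {"age": 34, "zip": "30332", "job": "teacher"},
    {"age": 34, "zip": "30332", "job": "nurse"},
    {"age": 51, "zip": "30309", "job": "nurse"},
]

def k_anonymity(disclosed: dict) -> int:
    """k = number of people matching every attribute revealed in the text."""
    return sum(all(person.get(a) == v for a, v in disclosed.items()) for person in population)

# "I'm a 34-year-old nurse in 30332" matches 2 people in this toy table, so k = 2.
print(k_anonymity({"age": 34, "zip": "30332", "job": "nurse"}))
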
Alan Ritter (@alan_ritter)'s Twitter Profile Photo

Want to learn about Llama's pre-training?  Mike Lewis will be giving a Keynote at NAACL 2025 in Albuquerque, NM on May 1.
2025.naacl.org
NAACL HLT 2025
Alan Ritter (@alan_ritter)'s Twitter Profile Photo

Wondering what review scores you need to get accepted at ACL? Maybe this data from NAACL 2025 can help: gist.github.com/aritter/8b65a9…

Yang Chen (@ychennlp)'s Twitter Profile Photo

📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models.

The result? We trained a 7B model - AceReason-Nemotron-1.1, significantly improved from version 1.0 on math and coding benchmarks.

✅AIME2025 (math): 53.6% -> 64.8%
✅LiveCodeBench
Geyang Guo (@cherylolguo)'s Twitter Profile Photo

❤️🌎 Introducing CARE: Multilingual Multicultural Human Preference Learning
3490 culturally relevant prompts + 31.7k Human/AI-written responses rated by multilingual speakers
💡 Key insights:
- Even a small amount of cultural data improves popular LLMs consistently.
- Deepseek-v3
Alan Ritter (@alan_ritter)'s Twitter Profile Photo

🎉 Excited to see that our paper on cost-efficient data annotation for LLMs won an SAC Highlight Award! 🔗 Check out @mohit_rag18's work here: aclanthology.org/2025.acl-long.…

Mohit (@mohit_r9a)'s Twitter Profile Photo

Unfortunately, I had to miss out on attending in person in Vienna, but I'm glad to see the recognition. We need more research on understanding data and the post-training of LLMs. Always a pleasure working with Alan and Junmo Kang.

Jungsoo Park (@jungsoo___park)'s Twitter Profile Photo

What if LLMs can forecast their own scores on unseen benchmarks from just a task description?

We are the first to study text description→performance prediction, giving practitioners an early read on outcomes so they can plan what to build—before paying full price 💸
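
As a rough illustration of what description-to-performance prediction could look like (my own toy framing with made-up numbers, not the paper's method), one could fit a simple regressor from text features of past task descriptions to a model's observed benchmark scores, then query it with a new description:

# Illustrative sketch only -- hypothetical descriptions and scores.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

descriptions = [
    "Answer grade-school math word problems step by step.",
    "Translate news sentences from German to English.",
    "Classify the sentiment of movie reviews.",
]
scores = [0.78, 0.41, 0.93]  # hypothetical accuracies on the corresponding benchmarks

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(descriptions)
regressor = Ridge().fit(X, scores)

# Predict performance on an unseen benchmark from its description alone.
print(regressor.predict(vectorizer.transform(["Solve competition-level geometry problems."])))
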
Ethan Mendes (@ethanmendes3)'s Twitter Profile Photo

New work led by Jungsoo Park with a really surprising result: LLMs can predict their performance on a benchmark using only a natural-language description of the task.