Junlin Wang (@junlinwang3)'s Twitter Profile
Junlin Wang

@junlinwang3

PhD Student @duke_nlp. Interning at @togethercompute. Inference-time scaling, multi-agent systems

ID: 2224217683

Website: http://junlinwang.com · Joined: 01-12-2013 04:34:47

35 Tweets

185 Followers

225 Following

Linda He (@lindahe49140661)'s Twitter Profile Photo

Excited to share our work on scaling LLMs to handle million-token contexts! Training models for ultra-long sequences is challenging due to data scarcity. We introduce a novel hierarchical synthetic data generation pipeline to overcome this. Thrilled this will be presented at ICLR
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods

"This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models on challenging reasoning tasks."

"Non-reasoning models
Together AI (@togethercompute)'s Twitter Profile Photo

🚀 Introducing Mixture-of-Agents Alignment (MoAA), a new method to "distill" the collective intelligence of open-source LLMs into a single, efficient model.

MoAA outperforms GPT-4o as a teacher, boosting smaller models like Llama3.1-8B to rival models 10x their size!
James Zou (@james_y_zou)'s Twitter Profile Photo

Our new #icml2025 paper w/ Together AI shows how to use synthetic data from Mixture-of-Agents to boost LM fine-tuning + RL.

Turns out a mixture of small agents is much more effective/cheaper than using a large LM as teacher
🌐together.ai/blog/moaa
📜arxiv.org/abs/2505.03059
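The data flow the tweet describes — several small agents answer, an aggregator synthesizes, and the result becomes fine-tuning data — can be sketched in a few lines. The proposer and aggregator functions below are placeholder stand-ins for LLM calls, not the paper's actual models or prompts:

```python
# Toy sketch of Mixture-of-Agents-style synthetic data generation:
# several small "proposer" models answer a prompt, an "aggregator"
# synthesizes their drafts, and the result becomes a training pair.
# All agents here are placeholder functions, not real LLM calls.

def proposer_a(prompt: str) -> str:
    return f"Draft A for: {prompt}"

def proposer_b(prompt: str) -> str:
    return f"Draft B for: {prompt}"

def aggregator(prompt: str, drafts: list[str]) -> str:
    # A real aggregator is itself an LLM prompted with all the drafts;
    # here we just summarize the inputs to show the data flow.
    return f"Synthesis of {len(drafts)} drafts for: {prompt}"

def make_sft_pair(prompt: str) -> dict:
    """Produce one (prompt, response) pair for supervised fine-tuning."""
    drafts = [proposer_a(prompt), proposer_b(prompt)]
    return {"prompt": prompt, "response": aggregator(prompt, drafts)}

dataset = [make_sft_pair(p) for p in ["Explain MoE.", "What is RLHF?"]]
```

The point of the pattern is that the teacher is the *mixture*, not any single large model: the student is fine-tuned on the aggregated responses.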
Zach Xu (@nehzux)'s Twitter Profile Photo

LLMs are getting more powerful, but they still struggle with super long documents. A common trick is "Divide and Conquer" - chop it up, process chunks, and combine. But... when does this actually work? And when does it fail catastrophically? We investigated. 🧵
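The "Divide and Conquer" trick the thread investigates can be sketched as below; `process_chunk` is a hypothetical stand-in for a per-chunk LLM call, not any real API:

```python
# Minimal sketch of divide-and-conquer for long documents: split into
# chunks, process each chunk independently, then combine the partial
# results. `process_chunk` is a placeholder for a per-chunk LLM call.

def chunk(text: str, size: int) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def process_chunk(c: str) -> str:
    # Stand-in for an LLM summarizing or extracting from one chunk.
    return c.strip()[:10]

def divide_and_conquer(document: str, size: int = 100) -> str:
    partials = [process_chunk(c) for c in chunk(document, size)]
    # A real pipeline would hand the partials to a final "combine" call.
    # The failure mode the thread asks about lives here: dependencies
    # that span chunk boundaries are invisible to every per-chunk call.
    return " | ".join(partials)
```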

Junlin Wang (@junlinwang3)'s Twitter Profile Photo

Work done during my internship at Together AI is being presented at #icml25. Come and check it out! 

We propose a new model alignment pipeline that harnesses the collective intelligence of open-source LLMs!
Together AI (@togethercompute)'s Twitter Profile Photo

Most AI benchmarks test the past.

But real intelligence is about predicting the future.

Introducing FutureBench — a new benchmark for evaluating agents on real forecasting tasks that we developed with Hugging Face

🔍 Reasoning > memorization
📊 Real-world events
🧠 Dynamic,
Sanxing Chen (@sanxing_chen)'s Twitter Profile Photo

Most RL for LLMs today is single-step optimization on a given state (e.g., an instruction), which is essentially a bandit setup. But to learn a meta-policy that can solve various bandit problems via in-context trial and error, you need true multi-turn RL over a long horizon. So,
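The bandit setting the tweet contrasts with multi-turn RL can be made concrete with a toy two-armed bandit and epsilon-greedy trial and error over a horizon: the policy improves within a single episode from its own observed rewards, which is the "in-context trial and error" being described. The arm probabilities and hyperparameters are made up for illustration:

```python
import random

# Toy multi-armed bandit solved by epsilon-greedy over a long horizon.
# A single-step setup would pick one arm once and stop; the multi-turn
# loop below instead refines its arm-value estimates turn by turn.

def run_episode(arm_probs, horizon=200, eps=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)  # running mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(len(arm_probs))            # explore
        else:
            arm = max(range(len(arm_probs)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, counts

total, counts = run_episode([0.2, 0.8])
```

A meta-policy in the tweet's sense would have to run this kind of loop *in context*, across many different `arm_probs`, rather than being optimized for one fixed state.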
Junlin Wang (@junlinwang3)'s Twitter Profile Photo

Takeaway: pretraining and fine-tuning still have merit. Not everything is learned by interaction and feedback. Although I do think LLMs can be more bitter-lesson-pilled. Can we perhaps: 1) generate infinite meaningful synthetic data, 2) continue training even at test time?

Fan Nie (@fannie1208)'s Twitter Profile Photo

Excited to share our #COLM2025 paper:
 “Weak-for-Strong (W4S): Training a Weak Meta-Agent to Harness Strong Executors.”

How can we unleash the potential of powerful LLMs without directly fine-tuning them?

We train a weak 7B meta-agent 🤖 to design and optimize workflows that
Kaitlyn Zhou ✈️ CSCW, EMNLP! (@kaitlynzhou)'s Twitter Profile Photo

As of June 2025, 66% of Americans have never used ChatGPT.  

Our new position paper, Attention to Non-Adopters, explores why this matters: LLM research is being shaped around adopters, leaving non-adopters’ needs and key research opportunities behind. 

arxiv.org/abs/2510.15951
Roy Xie (@royxie_)'s Twitter Profile Photo

🤔Deepseek-OCR shows the potential of optical context compression for LLMs. But maybe LLMs do not need that much context to begin with!

Check out our recent NeurIPS paper, "Language Models (Mostly) Know When to Stop Reading," which reduces the context needed to answer a query👇
Kaitlyn Zhou ✈️ CSCW, EMNLP! (@kaitlynzhou)'s Twitter Profile Photo

No better time to learn about that #AI thing everyone's talking about...

📢 I'm recruiting PhD students in Computer Science or Information Science at Cornell Bowers Computing and Information Science!

If you're interested, apply to either department (yes, either program!) and list me as a potential advisor!
Junlin Wang (@junlinwang3)'s Twitter Profile Photo

Just had this realization that Codex and Claude Code are super strong baseline agents that are easy to evaluate on any task. Should we start including them as baselines for our papers?