Junlin Wang (@junlinwang3)'s Twitter Profile
Junlin Wang

@junlinwang3

PhD Student @duke_nlp. Interning at @togethercompute. Inference-time scaling, multi-agent systems

ID: 2224217683

Website: http://junlinwang.com · Joined: 01-12-2013 04:34:47

35 Tweets

185 Followers

225 Following

Linda He (@lindahe49140661)'s Twitter Profile Photo

Excited to share our work on scaling LLMs to handle million-token contexts! Training models for ultra-long sequences is challenging due to data scarcity. We introduce a novel hierarchical synthetic data generation pipeline to overcome this. Thrilled this will be presented at ICLR
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods

"This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models on challenging reasoning tasks."

"Non-reasoning models
Together AI (@togethercompute)'s Twitter Profile Photo

🚀 Introducing Mixture-of-Agents Alignment (MoAA), a new method to "distill" the collective intelligence of open-source LLMs into a single, efficient model.

MoAA outperforms GPT-4o as a teacher, boosting smaller models like Llama3.1-8B to rival models 10x their size!
James Zou (@james_y_zou)'s Twitter Profile Photo

Our new #icml2025 paper w/ Together AI shows how to use synthetic data from Mixture-of-Agents to boost LM fine-tuning + RL.

Turns out a mixture of small agents is much more effective/cheaper than using a large LM as teacher
🌐together.ai/blog/moaa
📜arxiv.org/abs/2505.03059
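The data flow the tweet describes — several small agents answer, an aggregator synthesizes, and the result becomes fine-tuning data — can be sketched in a few lines. The proposer and aggregator functions below are placeholder stand-ins for LLM calls, not the paper's actual models or prompts:

```python
# Toy sketch of Mixture-of-Agents-style synthetic data generation:
# several small "proposer" models answer a prompt, an "aggregator"
# synthesizes their drafts, and the result becomes a training pair.
# All agents here are placeholder functions, not real LLM calls.

def proposer_a(prompt: str) -> str:
    return f"Draft A for: {prompt}"

def proposer_b(prompt: str) -> str:
    return f"Draft B for: {prompt}"

def aggregator(prompt: str, drafts: list[str]) -> str:
    # A real aggregator is itself an LLM prompted with all the drafts;
    # here we just summarize the inputs to show the data flow.
    return f"Synthesis of {len(drafts)} drafts for: {prompt}"

def make_sft_pair(prompt: str) -> dict:
    """Produce one (prompt, response) pair for supervised fine-tuning."""
    drafts = [proposer_a(prompt), proposer_b(prompt)]
    return {"prompt": prompt, "response": aggregator(prompt, drafts)}

dataset = [make_sft_pair(p) for p in ["Explain MoE.", "What is RLHF?"]]
```

The point of the pattern is that the teacher is the *mixture*, not any single large model: the student is fine-tuned on the aggregated responses.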
Zach Xu (@nehzux)'s Twitter Profile Photo

LLMs are getting more powerful, but they still struggle with super long documents. A common trick is "Divide and Conquer" - chop it up, process chunks, and combine. But... when does this actually work? And when does it fail catastrophically? We investigated. 🧵
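The "Divide and Conquer" trick the thread investigates can be sketched as below; `process_chunk` is a hypothetical stand-in for a per-chunk LLM call, not any real API:

```python
# Minimal sketch of divide-and-conquer for long documents: split into
# chunks, process each chunk independently, then combine the partial
# results. `process_chunk` is a placeholder for a per-chunk LLM call.

def chunk(text: str, size: int) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def process_chunk(c: str) -> str:
    # Stand-in for an LLM summarizing or extracting from one chunk.
    return c.strip()[:10]

def divide_and_conquer(document: str, size: int = 100) -> str:
    partials = [process_chunk(c) for c in chunk(document, size)]
    # A real pipeline would hand the partials to a final "combine" call.
    # The failure mode the thread asks about lives here: dependencies
    # that span chunk boundaries are invisible to every per-chunk call.
    return " | ".join(partials)
```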

Junlin Wang (@junlinwang3)'s Twitter Profile Photo

Work done during my internship at Together AI is being presented at #icml25. Come and check it out! 

We propose a new model alignment pipeline that harnesses the collective intelligence of open-source LLMs!
Together AI (@togethercompute)'s Twitter Profile Photo

Most AI benchmarks test the past.

But real intelligence is about predicting the future.

Introducing FutureBench — a new benchmark for evaluating agents on real forecasting tasks that we developed with Hugging Face

🔍 Reasoning > memorization
📊 Real-world events
🧠 Dynamic,
Sanxing Chen (@sanxing_chen)'s Twitter Profile Photo

Most RL for LLMs today is single-step optimization on a given state (e.g., an instruction), which is essentially a bandit setup. But to learn a meta-policy that can solve various bandit problems via in-context trial and error, you need true multi-turn RL over a long horizon. So,
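The bandit setting the tweet contrasts with multi-turn RL can be made concrete with a toy two-armed bandit and epsilon-greedy trial and error over a horizon: the policy improves within a single episode from its own observed rewards, which is the "in-context trial and error" being described. The arm probabilities and hyperparameters are made up for illustration:

```python
import random

# Toy multi-armed bandit solved by epsilon-greedy over a long horizon.
# A single-step setup would pick one arm once and stop; the multi-turn
# loop below instead refines its arm-value estimates turn by turn.

def run_episode(arm_probs, horizon=200, eps=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)  # running mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(len(arm_probs))            # explore
        else:
            arm = max(range(len(arm_probs)), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, counts

total, counts = run_episode([0.2, 0.8])
```

A meta-policy in the tweet's sense would have to run this kind of loop *in context*, across many different `arm_probs`, rather than being optimized for one fixed state.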
Junlin Wang (@junlinwang3)'s Twitter Profile Photo

Takeaway: pretraining and fine-tuning still have merit. Not everything is learned by interaction and feedback. Although I do think LLMs can be more bitter-lesson-pilled. Can we perhaps: 1) generate infinite meaningful synthetic data, 2) continue training even at test time?

Fan Nie (@fannie1208)'s Twitter Profile Photo

Excited to share our #COLM2025 paper:
 “Weak-for-Strong (W4S): Training a Weak Meta-Agent to Harness Strong Executors.”

How can we unleash the potential of powerful LLMs without directly fine-tuning them?

We train a weak 7B meta-agent 🤖 to design and optimize workflows that
Kaitlyn Zhou ✈️ CSCW, EMNLP! (@kaitlynzhou)'s Twitter Profile Photo

As of June 2025, 66% of Americans have never used ChatGPT.  

Our new position paper, Attention to Non-Adopters, explores why this matters: LLM research is being shaped around adopters, leaving non-adopters’ needs and key research opportunities behind. 

arxiv.org/abs/2510.15951
Roy Xie (@royxie_)'s Twitter Profile Photo

🤔Deepseek-OCR shows the potential of optical context compression for LLMs. But maybe LLMs do not need that much context to begin with!

Check out our recent NeurIPS paper, "Language Models (Mostly) Know When to Stop Reading," which reduces the context needed to answer a query👇
Kaitlyn Zhou ✈️ CSCW, EMNLP! (@kaitlynzhou)'s Twitter Profile Photo

No better time to learn about that #AI thing everyone's talking about...

📢 I'm recruiting PhD students in Computer Science or Information Science at Cornell Bowers Computing and Information Science!

If you're interested, apply to either department (yes, either program!) and list me as a potential advisor!
Junlin Wang (@junlinwang3)'s Twitter Profile Photo

Just had this realization that Codex and Claude Code are super strong baseline agents that are easy to evaluate on any task. Should we start including them as baselines for our papers?