Arian Hosseini (@ariantbd)'s Twitter Profile
Arian Hosseini

@ariantbd

Research Scientist @GoogleDeepMind - LLM reasoning and alignment - prev: @Google @MSFTResearch

ID: 274357354

Link: https://arianhosseini.github.io/ · Joined: 30-03-2011 05:33:50

360 Tweets

1.1K Followers

313 Following

Arian Hosseini (@ariantbd)'s Twitter Profile Photo

Thrilled to share that I recently joined Google DeepMind! I’ll be working on LLM RL reasoning. Exciting times ahead, and I'm eager to collaborate with and learn from the brilliant minds here.

Mehran Kazemi (@kazemi_sm)'s Twitter Profile Photo

Is BIG-Bench Hard too easy for your LLM? We just unleashed BIG-Bench EXTRA Hard (BBEH)! 😈 Every task, harder! Every model, humbled! (Poem Credit: Gemini 2.0 Flash) Massive headroom for progress across various areas in general reasoning 🤯

Arian Hosseini (@ariantbd)'s Twitter Profile Photo

Happy to see a flavor 🧂 of Compositional GSM in this work.
Comp GSM: arxiv.org/abs/2410.01748
BBEH: arxiv.org/abs/2502.19187

Reyhane Askari (@reyhaneaskari)'s Twitter Profile Photo

🚀 New Paper Alert! Can we generate informative synthetic data that truly helps a downstream learner? Introducing Deliberate Practice for Synthetic Data (DP)—a dynamic framework that focuses on where the model struggles most to generate useful synthetic training examples. 🔥

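The selection idea described in the announcement — steer generation toward examples the downstream learner struggles with — can be sketched in a few lines. Everything below is illustrative, not the paper's actual method: `toy_learner`, the `difficulty` feature, and the entropy-based scoring are stand-ins chosen so the example is self-contained and runnable.

```python
import math
import random

random.seed(0)

def entropy(probs):
    """Shannon entropy (nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def toy_learner(example):
    """Stand-in for a downstream model: returns class probabilities.
    Confidence degrades with a synthetic 'difficulty' feature."""
    d = example["difficulty"]           # in [0, 1]
    p_correct = 1.0 - 0.5 * d           # harder -> closer to uniform
    return [p_correct, 1.0 - p_correct]

def deliberate_practice_round(generator, learner, n_candidates=1000, keep=100):
    """One selection round: over-generate candidate examples, then keep
    the ones the learner is most uncertain about (highest entropy)."""
    candidates = [generator() for _ in range(n_candidates)]
    scored = sorted(candidates, key=lambda ex: entropy(learner(ex)), reverse=True)
    return scored[:keep]

# Hypothetical generator producing examples with a difficulty knob.
gen = lambda: {"difficulty": random.random()}
batch = deliberate_practice_round(gen, toy_learner)
# The kept batch skews hard: its mean difficulty sits well above the
# prior mean of 0.5 for uniformly drawn candidates.
print(sum(ex["difficulty"] for ex in batch) / len(batch))
```

Since the toy learner's uncertainty is monotone in `difficulty`, keeping the top-100 of 1,000 uniform draws selects roughly the hardest decile — the mechanism the tweet describes, in miniature.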
Edward Grefenstette (@egrefen)'s Twitter Profile Photo

Our team in London is hiring! If you want to come work with a wonderful group of researchers on investigating the frontiers of autonomous open-ended agents that help humans be better at doing things we love, come have a look. Link in post below 👇

Arian Hosseini (@ariantbd)'s Twitter Profile Photo

Test-time scaling with GenRMs being “more efficient” than self-consistency (SC) is misleading 😶‍🌫️ In our new work, in a compute-matched setup, we show that SC outperforms GenRM across most budgets! 💰 Check out the details, code, and paper here: arxiv.org/abs/2504.01005
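For readers unfamiliar with the baseline: self-consistency just samples many solutions and majority-votes on the final answers, so its compute cost is simply the number of generations. A minimal sketch, with a toy sampler standing in for an LLM (the 60% accuracy and the answer strings are made up for illustration):

```python
import random
from collections import Counter

def self_consistency(sample_solution, n):
    """Self-consistency (SC): sample n solutions independently and
    majority-vote on their final answers. Compute cost: n generations."""
    answers = [sample_solution() for _ in range(n)]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

# Toy sampler: the correct answer "42" comes up 60% of the time and the
# errors are scattered over several wrong answers, so the vote
# concentrates on the correct one.
random.seed(1)
sampler = lambda: "42" if random.random() < 0.6 else random.choice(["7", "13", "99"])
print(self_consistency(sampler, n=64))  # the majority answer
```

A compute-matched comparison then gives GenRM the same total budget — e.g. fewer sampled solutions, with the remaining generations spent on verifications — rather than comparing at equal numbers of solutions.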

Arian Hosseini (@ariantbd)'s Twitter Profile Photo

At higher budgets GenRM beats SC, but a question remains: how should the budget be allocated between generating more solutions and more verifications? We derive scaling laws for the optimal # of solutions and verifications, showing that the # of solutions should grow 1.5-2x faster than the # of verifications.

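To make the allocation question concrete, here is a toy version of such a rule. It assumes each verification costs about one solution generation (so S solutions, each verified V times, cost S·(1+V)) and uses an illustrative 2:1 growth-rate ratio between solutions and verifications — not the paper's fitted scaling-law values.

```python
def split_budget(budget, s_exp=2.0, v_exp=1.0):
    """Toy budget split for GenRM-style test-time scaling.
    One budget unit per generated solution or verification, so the total
    cost of S solutions each verified V times is S * (1 + V).
    The exponent ratio s_exp/v_exp = 2 encodes 'solutions grow ~2x
    faster than verifications in log-budget' (illustrative only)."""
    best = (1, 1)
    s = 1
    while True:
        # Keep log V / log S = v_exp / s_exp, i.e. V = S ** (v_exp / s_exp).
        v = max(1, round(s ** (v_exp / s_exp)))
        if s * (1 + v) > budget:
            break
        best = (s, v)
        s += 1
    return best

for b in (128, 1024, 8192):
    s, v = split_budget(b)
    print(f"budget={b}: solutions={s}, verifications={v}")
```

Running this, each 8x increase in budget roughly doubles the verification count while the solution count grows about 4x — the qualitative shape of the claim that solutions should scale faster than verifications.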
Nishad Singhi (@nishadsinghi)'s Twitter Profile Photo

We've released the models and data for this paper on huggingface 🤗: huggingface.co/sc-genrm-scali…
👉 Fine-tuned generative verifiers
👉 GPT-4o generated synthetic data for fine-tuning your own verifiers
👉 Verifications from lots of models (incl. QwQ-32B) on lots of datasets!

Arian Hosseini (@ariantbd)'s Twitter Profile Photo

Is there a study on what sorts of concepts we should let LLMs learn by themselves rather than injecting our preferences or inductive biases? For instance, is CoT reasoning the kind of thing that needs (language) preference/bias?

Arian Hosseini (@ariantbd)'s Twitter Profile Photo

After a lovely week away from LLMs in Vietnam, I’ll be at ICLR 2025 🇸🇬 Will be at the posters for GenRM and Training LLM Reasoners via Compute-Optimal Sampling. Also at the Google DeepMind booth on the 25th (and more). Come chat about LLM reasoning and RL, or for Vietnam recommendations!

Mehran Kazemi (@kazemi_sm)'s Twitter Profile Photo

Following several requests, we now have BBEH Mini with 460 examples (20 per task) for faster and cheaper experimentation. The set can be downloaded from github.com/google-deepmin… and the results are reported in Table 3 of arxiv.org/pdf/2502.19187

Rishabh Agarwal (@agarwl_)'s Twitter Profile Photo

Idea: Merging generative verification and solution generation during RL training of LLM reasoners. Why? This allows you to scale inference compute both sequentially (long CoT) and in parallel (best-of-N, weighted majority voting). Next? Generation and verification to be trained end

Jason Weston (@jaseweston)'s Twitter Profile Photo

🚨Announcing RAM 2 workshop @ COLM25 - call for papers🚨 - 10 years on, we present the sequel to the classic RAM🐏 (Reasoning, Attention, Memory) workshop that took place in 2015 at the cusp of major change in the area. Now in 2025 we reflect on what's happened and discuss the
