Pradeep Dasigi (@pdasigi)'s Twitter Profile
Pradeep Dasigi

@pdasigi

Senior Research Scientist @allen_ai; #NLProc, Post-training for OLMo

ID: 20038834

Website: https://pdasigi.github.io/ | Joined: 04-02-2009 09:00:52

442 Tweets

1.1K Followers

505 Following

Costa Huang (@vwxyzjn):

😆 So happy OLMo 2 is out! We applied the same Tülu 3 RLVR recipe and it worked very nicely for our final 13B instruct model. Here are the gains/losses of allenai/OLMo-2-1124-13B-Instruct (RLVR's checkpoint) over hf-allenai_OLMo-2-1124-13B-DPO. More to share soon!

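For context on what that RLVR recipe optimizes: the policy gets its reward from a programmatic checker rather than a learned reward model. Below is a minimal sketch of such a verifiable reward, assuming a simple match-the-final-answer check; the helper names are illustrative and not the actual Tülu 3 implementation.

```python
# Minimal sketch of a verifiable reward in the spirit of RLVR
# (Reinforcement Learning with Verifiable Rewards). Names are
# illustrative assumptions, not the actual Tulu 3 code.
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the last number out of a model completion (toy heuristic)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference."""
    answer = extract_final_answer(completion)
    return 1.0 if answer == ground_truth.strip() else 0.0

if __name__ == "__main__":
    print(verifiable_reward("... so the total is 42.", "42"))  # 1.0
    print(verifiable_reward("I think it's 41.", "42"))         # 0.0
```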
Ai2 (@allen_ai):

Calling all predoctoral candidates: our OLMo team is hiring! Apply to be a Predoctoral Young Investigator today at the link in-thread 🧵

Interconnects (@interconnectsai):

OpenAI's o1 using "search" was a PSYOP. How to understand OpenAI's o1 models as really just one wacky, wonderful, long chain of thought. interconnects.ai/p/openais-o1-u…

Pradeep Dasigi (@pdasigi):

Our team at Ai2 (OLMo) is looking for a predoctoral researcher. You get to work on exciting research in building open LMs while preparing for a PhD. Apply here: job-boards.greenhouse.io/thealleninstit…

Ai2 (@allen_ai):

Remember Molmo? The full recipe is finally out! Training code, data, and everything you need to reproduce our models. Oh, and we have updated our tech report too! Links in thread 👇

Faeze Brahman (@faeze_brh):

Just arrived in 🇨🇦 to attend NeurIPS 2024! Excited to connect and chat about AI reliability and safety, resource-efficient approaches to AI alignment, inference-time scaling and anything in between! You can drop me a message/email ([email protected]) or find me at the

Pradeep Dasigi (@pdasigi):

Here's a significant update to Tülu 3: we scaled up the post-training recipe to Llama 3.1 405B. Tülu 3 405B beats Llama's 405B instruct model and also Deepseek V3. You can now access the model and the entire post-training pipeline. Huge shoutout to Hamish Ivison and Costa Huang who

Hamish Ivison (@hamishivi):

One additional thing in the updated Tulu 3 paper that I'd like to highlight is that Pradeep Dasigi went back and re-evaluated our mid-stage checkpoints on our held-out evals (Section 7.4). This lets us see what decisions generalized beyond the exact test sets we used! I think this is

Hanna Hajishirzi (@hannahajishirzi):

Excited to drive innovation and push the boundaries of open, scientific AI research & development! 🚀 Join us at Ai2 to shape the future of OLMo, Molmo, Tulu, and more. We’re hiring at all levels—apply now! 👇 #AI #Hiring Research Engineer job-boards.greenhouse.io/thealleninstit… Research

Ai2 (@allen_ai):

Introducing olmOCR, our open-source tool to extract clean plain text from PDFs! Built for scale, olmOCR handles many document types with high throughput. Run it on your own GPU for free—at over 3000 token/s, equivalent to $190 per million pages, or 1/32 the cost of GPT-4o!
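
As a rough sanity check on that pricing claim, the arithmetic below shows how a cost-per-million-pages figure falls out of throughput, tokens per page, and GPU price. The tokens-per-page and GPU-cost values are hypothetical assumptions chosen for illustration, not numbers from the olmOCR release.

```python
# Back-of-the-envelope cost of self-hosted OCR throughput. The only figure
# taken from the announcement is ~3000 tokens/s; tokens_per_page and the
# GPU price are hypothetical assumptions for illustration.
def cost_per_million_pages(tokens_per_sec: float,
                           tokens_per_page: float,
                           gpu_dollars_per_hour: float) -> float:
    pages_per_hour = tokens_per_sec * 3600 / tokens_per_page
    return 1_000_000 / pages_per_hour * gpu_dollars_per_hour

# With ~1000 output tokens per page and a ~$2/hr GPU, the estimate lands
# near the quoted figure.
print(round(cost_per_million_pages(3000, 1000, 2.0)))  # ~185
```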

Pradeep Dasigi (@pdasigi):

How to curate instruction tuning datasets while targeting specific skills? This is a common question developers face while post-training LMs. In this work led by Hamish Ivison, we found that simple embedding based methods scale much better than fancier computationally intensive
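
For a rough picture of what an embedding-based selection method looks like in practice, here is a minimal sketch: embed a handful of examples of the target skill, embed the candidate pool, and keep the candidates closest in cosine similarity. The embedding model and selection size are assumptions for illustration, not the exact setup from the paper.

```python
# Minimal sketch of embedding-based selection of instruction-tuning data
# targeting a specific skill. Model choice and k are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

def select_for_skill(skill_examples: list[str],
                     candidate_pool: list[str],
                     k: int = 1000) -> list[str]:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    skill_emb = model.encode(skill_examples, normalize_embeddings=True)
    pool_emb = model.encode(candidate_pool, normalize_embeddings=True)
    sims = pool_emb @ skill_emb.T      # cosine similarity, shape (pool, skill)
    scores = sims.max(axis=1)          # closeness to the nearest skill example
    top = np.argsort(-scores)[:k]
    return [candidate_pool[i] for i in top]
```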

Ai2 (@allen_ai):

Announcing OLMo 2 32B: the first fully open model to beat GPT 3.5 & GPT-4o mini on a suite of popular, multi-skill benchmarks. Comparable to best open-weight models, but a fraction of training compute. When you have a good recipe, ✨ magical things happen when you scale it up!

Nathan Lambert (@natolambert):

A very exciting day for open-source AI! We're releasing our biggest open source model yet -- OLMo 2 32B -- and it beats the latest GPT 3.5, GPT 4o mini, and leading open weight models like Qwen and Mistral. As usual, all data, weights, code, etc. are available. For a long time,

Ai2 (@allen_ai):

We're excited to round out the OLMo 2 family with its smallest member, OLMo 2 1B, surpassing peer models like Gemma 3 1B or Llama 3.2 1B. The 1B model should enable rapid iteration for researchers, more local development, and a more complete picture of how our recipe scales.

Jesse Dodge (@jessedodge):

Percy Liang EleutherAI nice! we also recently trained a set of models on 25 different pretraining corpora, each corpus having 14 model sizes trained (4M to 1B), to 5x Chinchilla. We released 30,000+ checkpoints! x.com/allen_ai/statu… arxiv.org/pdf/2504.11393
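
As a quick note on scale, "5x Chinchilla" means training on roughly five times the Chinchilla-optimal token budget, commonly approximated as ~20 tokens per parameter. The sketch below applies that rule of thumb to the model-size endpoints mentioned above; the 20x constant is an approximation, and the exact budgets used in that work may differ.

```python
# "5x Chinchilla" under the common ~20-tokens-per-parameter heuristic.
# The 20x constant is an approximation, not the exact budget used here.
def training_tokens(params: float, chinchilla_multiple: float = 5.0) -> float:
    return 20 * params * chinchilla_multiple

for params in (4e6, 1e9):
    print(f"{params:.0e} params -> {training_tokens(params):.2e} tokens")
# 4e+06 params -> 4.00e+08 tokens  (0.4B)
# 1e+09 params -> 1.00e+11 tokens  (100B)
```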

Valentina Pyatkin (@valentina__py):

💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR. But the set of constraints and verifier functions is limited and most models overfit on IFEval. We introduce IFBench to measure model generalization to unseen constraints.

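To make "verifiable constraints" concrete, here is a minimal sketch of the kind of programmatic checker that can double as a binary reward; the specific constraints and function names are illustrative assumptions, not IFBench's actual verifier set.

```python
# Toy constraint verifiers in the spirit of verifiable instruction
# following. Each returns True/False, so it can serve as a binary reward.
# Constraint choices are illustrative, not IFBench's actual set.
import re

def verify_word_count(response: str, min_words: int, max_words: int) -> bool:
    """Constraint: response length must fall within [min_words, max_words]."""
    return min_words <= len(response.split()) <= max_words

def verify_num_bullets(response: str, expected: int) -> bool:
    """Constraint: response must contain exactly `expected` bullet points."""
    return len(re.findall(r"^\s*[-*] ", response, flags=re.MULTILINE)) == expected

if __name__ == "__main__":
    text = "- first point\n- second point\n- third point"
    print(verify_num_bullets(text, 3))     # True
    print(verify_word_count(text, 2, 10))  # True
```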