Jesse Dodge (@jessedodge) 's Twitter Profile
Jesse Dodge

@jessedodge

Senior Research Scientist at AI2 @ai2_allennlp. Responsibly open work on the science of AI and AI for science. Environmental impact of AI. he/him 🏳️‍🌈

ID: 24480294

Link: https://jessedodge.github.io/ · Joined: 15-03-2009 03:30:44

790 Tweets

3.3K Followers

1.1K Following

Ian Magnusson (@ianmagnusson) 's Twitter Profile Photo

Come chat with me at #NeurIPS2024 and learn about how to use Paloma to evaluate perplexity over hundreds of domains! ✨We have stickers too✨
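
Not the official Paloma pipeline, but a minimal sketch of the underlying idea, assuming a generic Hugging Face causal LM and placeholder domain texts: compute perplexity separately for each domain so weaknesses on particular data sources stand out.

```python
# Minimal sketch (not Paloma itself): per-domain perplexity with a generic
# Hugging Face causal LM. The model name and domain texts are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; swap in the model you want to evaluate
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

domains = {
    "news": ["Example news article text about a recent event ..."],
    "forums": ["Example forum post text asking for advice ..."],
}

@torch.no_grad()
def perplexity(texts):
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        ids = tok(text, return_tensors="pt").input_ids
        n_predicted = ids.size(1) - 1
        # loss is the mean NLL over predicted tokens; rescale to a sum
        total_nll += model(ids, labels=ids).loss.item() * n_predicted
        total_tokens += n_predicted
    return math.exp(total_nll / total_tokens)

for name, texts in domains.items():
    print(f"{name}: perplexity = {perplexity(texts):.2f}")
```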

Koustuv Sinha (@koustuvsinha) 's Twitter Profile Photo

🚨 We are pleased to announce the first in-person event for the Machine Learning Reproducibility Challenge, MLRC 2025! Save the date: August 21st, 2025, at Princeton! Co-organized by Arvind Narayanan, Peter Henderson, Naila Murray, Jessica Forde, Adina Williams, Mike Rabbat, and Joelle Pineau.

Ai2 (@allen_ai) 's Twitter Profile Photo

Announcing OLMo 2 32B: the first fully open model to beat GPT-3.5 & GPT-4o mini on a suite of popular, multi-skill benchmarks.

Comparable to the best open-weight models at a fraction of the training compute. When you have a good recipe, ✨ magical things happen when you scale it up!

Ai2 (@allen_ai) 's Twitter Profile Photo

Ai2 is coming to #GoogleCloudNext! 🚀 Follow along as we bring our fully open AI to the main stage in Las Vegas, and don't miss CEO Ali Farhadi in a conversation on the new era of AI-powered innovation, from groundbreaking research to real-world impact.

Jesse Dodge (@jessedodge) 's Twitter Profile Photo

We've released #OLMoTrace! This tool matches spans in language model output to exact matches in the training data. It searches over trillions of pretraining tokens in seconds, showing where in its training data a model saw particular facts or word sequences. Only possible with open data! 🎉
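
The real OLMoTrace indexes trillions of tokens; purely as a toy illustration of the core idea (exact n-gram matching between model output and training text), here is a sketch over a tiny made-up in-memory corpus.

```python
# Toy illustration of span-to-training-data matching (not the real
# OLMoTrace implementation, which indexes trillions of tokens).
from collections import defaultdict

corpus_docs = {
    "doc1": "the eiffel tower is located in paris france",
    "doc2": "language models are trained on large text corpora",
}

def build_ngram_index(docs, n=3):
    """Map every n-gram of whitespace tokens to the docs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.split()
        for i in range(len(tokens) - n + 1):
            index[tuple(tokens[i:i + n])].add(doc_id)
    return index

def trace_spans(output_text, index, n=3):
    """Return output n-grams that appear verbatim somewhere in the corpus."""
    tokens = output_text.lower().split()
    hits = []
    for i in range(len(tokens) - n + 1):
        span = tuple(tokens[i:i + n])
        if span in index:
            hits.append((" ".join(span), sorted(index[span])))
    return hits

index = build_ngram_index(corpus_docs)
print(trace_spans("The Eiffel Tower is located in Paris", index))
```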

Ai2 (@allen_ai) 's Twitter Profile Photo

Here we go #googlecloudnext! Excited to connect with developers and builders here about our fully open models, now available in the Vertex AI Model Garden.

Jesse Dodge (@jessedodge) 's Twitter Profile Photo

We just released more than 30k model checkpoints, trained on 25 different pretraining corpora and all evaluated on 10+ benchmarks! We applied a rigorous, scientific approach to deciding what data to train and evaluate on. Check it out! 🎉

Gabriele Berton (@gabriberton) 's Twitter Profile Photo

How to select pre-training data for LLMs? Two papers came out last week from AllenAI and Nvidia that do it in a similar way, building on the intuition that good data is good regardless of the size of the LLM. This intuition can be used to select good data cheaply...
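
Neither paper's actual method, just a toy sketch of the intuition with made-up scores: if a cheap small model and an expensive large model rank candidate corpora the same way, the small model's ranking is enough to pick the data.

```python
# Toy sketch of "good data is good regardless of model size" (not either
# paper's actual method). All scores below are hypothetical: if the small
# model's ranking of candidate corpora matches the large model's, the
# cheap small-model ranking can guide data selection for the big run.
small_model_scores = {"web_crawl": 0.41, "code_heavy": 0.38, "curated_mix": 0.47, "forums": 0.35}
large_model_scores = {"web_crawl": 0.55, "code_heavy": 0.52, "curated_mix": 0.61, "forums": 0.50}

def ranking(scores):
    """Corpus names sorted best-first by benchmark score."""
    return sorted(scores, key=scores.get, reverse=True)

small_rank = ranking(small_model_scores)
large_rank = ranking(large_model_scores)
print("small-model ranking:", small_rank)
print("large-model ranking:", large_rank)
print("rankings agree:", small_rank == large_rank)  # True for these toy numbers
```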

Yuling Gu (@gu_yuling) 's Twitter Profile Photo

Excited to be at #NAACL2025 in Albuquerque this week! I'll be presenting "OLMES: A Standard for Language Model Evaluations" (arxiv.org/abs/2406.08446)! Work done with my wonderful collaborators at Ai2 ❤️

Ian Magnusson (@ianmagnusson) 's Twitter Profile Photo

Excited to share that DataDecide, our suite of language models pretrained over differences in data and scale, has been accepted at #ICML2025 💫 See you in Vancouver!

Jesse Dodge (@jessedodge) 's Twitter Profile Photo

Today we released SciArena! It's totally free; try asking questions about scientific topics, papers, citations, and more! Every query gets responses from two different models -- be sure to vote on which you prefer 😁 sciarena.allen.ai

Jesse Dodge (@jessedodge) 's Twitter Profile Photo

I unfortunately can't make this event this year, but it's an excellent list of people who will be there! Def go if you can!

Jesse Dodge (@jessedodge) 's Twitter Profile Photo

Personal update: I'm excited to be joining Meta! I'm deeply grateful for the opportunities I've had at Ai2 over the past 6 years (including three paper awards in the last two years). Onward to the next chapter! 🥳

Ai2 (@allen_ai) 's Twitter Profile Photo

📢 New paper from Ai2: Signal & Noise asks a simple question—can language model benchmarks detect a true difference in model performance? 🧵

Jesse Dodge (@jessedodge) 's Twitter Profile Photo

Really proud of this work with David Heineman et al.! We found an automatic measure of whether a benchmark can meaningfully distinguish models or whether its eval results are just noise. It has proven practically useful when building eval suites!
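
As a rough sketch only (the paper's exact definitions differ in detail), one way to operationalize this: treat the spread of scores across models as signal, the checkpoint-to-checkpoint wobble of a single model as noise, and compare the two. All numbers below are made up.

```python
# Rough sketch of a signal-to-noise style check for a benchmark (see the
# Signal & Noise paper for the exact definitions). Here "signal" is the
# spread of scores across different models and "noise" is how much one
# model's score wobbles across its final training checkpoints.
import statistics

# Hypothetical benchmark accuracies.
scores_across_models = [0.62, 0.55, 0.71, 0.48, 0.66]        # different models
scores_across_checkpoints = [0.61, 0.63, 0.62, 0.60, 0.64]   # one model, late checkpoints

signal = max(scores_across_models) - min(scores_across_models)
noise = statistics.stdev(scores_across_checkpoints)
snr = signal / noise

print(f"signal = {signal:.3f}, noise = {noise:.3f}, SNR = {snr:.1f}")
# A higher SNR suggests the benchmark can reliably separate these models;
# an SNR near 1 suggests observed gaps may just be run-to-run noise.
```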

Yanai Elazar (@yanaiela) 's Twitter Profile Photo

🎉 CountCeption 🎉 Ever wondered how often a specific string actually appears in the massive datasets used to train LLMs? Now you can test your intuition, make a prediction, and see how your guess stacks up against everyone else's. Spoiler: it’s way trickier than you think.
