Mahesh Sathiamoorthy (@madiator)'s Twitter Profile
Mahesh Sathiamoorthy

@madiator

Building @bespokelabsai. Ex-GoogleDeepMind.
LLMs and Tokens.
Discuss data for LLMs: discord.gg/KqpXvpzVBS

ID: 13614262

Link: http://smahesh.com · Joined: 18-02-2008 08:15:50

3.3K Tweets

9.9K Followers

1.1K Following

Luca Soldaini 🎀 (@soldni)'s Twitter Profile Photo

Blows my mind that model souping Just Works™️. Same model, same data, train 3-5 times with different seeds, and you get 1-2 extra points on MMLU, HellaSwag, ARC, GSM8K, etc.
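
A minimal sketch of the "model souping" recipe the tweet describes: train the same architecture several times with different seeds, then average the weights element-wise. The tiny Linear model and the specific seeds below are illustrative placeholders, not the actual setup behind the reported numbers.

```python
# Sketch of uniform model souping: average weights across runs that differ
# only in their random seed. The toy model here stands in for real training.
import torch
import torch.nn as nn

def make_model(seed: int) -> nn.Module:
    # Stand-in for "train the same model with a different seed".
    torch.manual_seed(seed)
    return nn.Linear(16, 4)

models = [make_model(seed) for seed in (0, 1, 2)]
state_dicts = [m.state_dict() for m in models]

# Uniform soup: element-wise mean of every parameter across the runs.
soup = {
    key: torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
    for key in state_dicts[0]
}

# Load the averaged weights back into one model and evaluate as usual
# (e.g., on MMLU, HellaSwag, ARC, GSM8K).
souped_model = make_model(0)
souped_model.load_state_dict(soup)
```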

Ahmad Beirami (@abeirami)'s Twitter Profile Photo

There is a subtle distinction between RL in RLHF and RL in domains with a clear reward signal that captures what we want, like winning in games or correctness in math. With a clear reward, RL is quite effective and can lead to novel sequences of actions (e.g., move 37). But, ... 👇
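
A minimal sketch of the "clear reward signal" case the tweet contrasts with RLHF: a verifiable reward that checks a math answer exactly rather than scoring it with a learned preference model. The "Answer: <value>" output convention is a hypothetical assumption for illustration.

```python
# Verifiable reward for math correctness: exact-match check on the final answer.

def math_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the final answer matches the ground truth exactly, else 0.0."""
    # Assume the model ends its response with "Answer: <value>".
    answer = model_output.rsplit("Answer:", 1)[-1].strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

# A correct completion earns full reward; anything else earns none.
print(math_reward("2 + 2 * 20 = 42, so Answer: 42", "42"))  # 1.0
print(math_reward("I think it's Answer: 41", "42"))          # 0.0
```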

Alex Dimakis (@alexgdimakis)'s Twitter Profile Photo

AI monoliths vs Unix Philosophy: The case for small specialized models. The current thinking in AI is that AGI is coming, and that one gigantic model will be able to reason and solve business problems ranging from customer support to product development. Currently, agents are

Mahesh Sathiamoorthy (@madiator)'s Twitter Profile Photo

Frontier models get single-digit performance on FrontierMath, a new benchmark! What's interesting is that o1-preview lags behind Gemini 1.5 Pro, even after having spent all those reasoning tokens.

Mahesh Sathiamoorthy (@madiator)'s Twitter Profile Photo

This paper, "Scaling Laws for Precision" [1], is pretty cool and has some important insights. For example, they say "the more data seen during pretraining, the more sensitive the model becomes to quantization at inference-time" [2], which has real implications for data curation,

Mahesh Sathiamoorthy (@madiator)'s Twitter Profile Photo

"Data curation and synthetic data are very fruitful enterprises. We’ve only begun to address data about past model interactions and comprehensive user signals. The things that people refer to as data-flywheels, data engines, or closing the loop are very promising." Thank you🫡

Mahesh Sathiamoorthy (@madiator)'s Twitter Profile Photo

> llm scaling hits a wall*
> people take smaller models and specialize them (🙋‍♂️)
> apple wins
> apple swallows nvidia

*not entirely true, but more on this later.