Ameya P. (@amyprb)'s Twitter Profile
Ameya P.

@amyprb

Exploring Science of Benchmarking and Economics of Transformative AI.
Postdoc @bethgelab @uni_tue;
Previously: @OxfordTVG, @intelailabs

RT != endorsement

ID: 1439850368607850496

Link: http://drimpossible.github.io · Joined: 20-09-2021 07:15:24

645 Tweets

237 Followers

287 Following

Shashwat Goel (@shashwatgoel7)'s Twitter Profile Photo

I realized much of the disagreement is about "what is the right baseline". If the claim is "<our RL> improves reasoning", the null hypothesis is that RL just tuned the model to your specific inference hparams, so it's important to show that changing hparams doesn't give the same gains.

Xinyu Zhu (@tianhongzxy)'s Twitter Profile Photo

🔥The debate’s been wild: How does the reward in RLVR actually improve LLM reasoning?🤔
🚀Introducing our new paper👇
💡TL;DR: Just penalizing incorrect rollouts❌ — no positive reward needed — can boost LLM reasoning, and sometimes better than PPO/GRPO!

🧵[1/n]
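
For intuition, here is a minimal sketch of the reward scheme the TL;DR describes: incorrect rollouts are penalized (-1) and correct ones get no positive reward (0). The GRPO-style group normalization and all names below are illustrative assumptions, not the paper's exact recipe.

```python
# Negative-only verifiable reward: penalize incorrect rollouts, give no
# positive reward for correct ones. Illustrative sketch, not the paper's code.

def negative_only_rewards(is_correct: list[bool]) -> list[float]:
    return [0.0 if ok else -1.0 for ok in is_correct]

def group_normalized_advantages(rewards: list[float]) -> list[float]:
    # GRPO-style baseline: subtract the mean reward of the rollout group.
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Example: four rollouts for one prompt, three correct and one incorrect.
rewards = negative_only_rewards([True, True, False, True])
print(rewards)                               # [0.0, 0.0, -1.0, 0.0]
print(group_normalized_advantages(rewards))  # [0.25, 0.25, -0.75, 0.25]
```

Note that after group normalization the correct rollouts still end up with a positive advantage, which is one intuition for why penalizing errors alone can steer the policy.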
alz (@alz_zyd_)'s Twitter Profile Photo

IMO, a major driver of PhD attrition is: do you have the willpower to finish a paper? A full published econ/finance paper is well over 100 pages of work after revisions and such; it's longer than the longest thing many students have ever done at that point in their lives.

Alexander Doria (@dorialexander)'s Twitter Profile Photo

Announcing the release of the official Common Corpus paper: a 20-page report detailing how we collected, processed and published 2 trillion tokens of reusable data for LLM pretraining.

Timothy B. Lee (@binarybits)'s Twitter Profile Photo

Thinking that mechanistic interpretability is the key to understanding AI safety is like thinking that neuroscience is the key to understanding political science.

Ameya P. (@amyprb)'s Twitter Profile Photo

Yep, strong agree - 'having some way to consolidate experience is good. In the past, continual learning research never had a good response to "just retrain bro", but maybe this will change in the age of experience'

Ameya P. (@amyprb)'s Twitter Profile Photo

Forecasting has a lot of potential, but there are lots of open problems to tackle to get there. A really good reference on the critical problems it faces 👇

Chris (@chatgpt21)'s Twitter Profile Photo

Elon Musk’s recent Twitter meltdown over the $2.4 trillion infrastructure bill seems contradictory given his own AGI timeline. If Musk genuinely believes AI will surpass human intelligence by late 2025, then he must also acknowledge the unprecedented productivity boom AGI would

Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo

Qwen3 Embedding Paper

Embedding and reranking pipelines still struggle with task diversity and multilingual consistency.

This report introduces a multi-stage training framework that uses Qwen3 LLMs to synthesize rich, multilingual relevance data, followed by contrastive
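
The contrastive stage it refers to is, in embedding training generally, an InfoNCE-style objective over (query, positive) pairs with in-batch negatives. A minimal sketch of that generic objective follows; it is not necessarily the report's exact formulation, and the shapes are illustrative.

```python
# Generic InfoNCE contrastive loss with in-batch negatives, as commonly used
# to train text-embedding models. Illustrative sketch only.
import torch
import torch.nn.functional as F

def info_nce_loss(q_emb: torch.Tensor, p_emb: torch.Tensor, tau: float = 0.05):
    """q_emb, p_emb: (batch, dim) query / positive-passage embeddings."""
    q = F.normalize(q_emb, dim=-1)
    p = F.normalize(p_emb, dim=-1)
    logits = q @ p.T / tau             # scaled cosine similarities
    labels = torch.arange(q.size(0))   # i-th positive matches i-th query
    return F.cross_entropy(logits, labels)

loss = info_nce_loss(torch.randn(8, 1024), torch.randn(8, 1024))
```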
Eric W. Tramel (@fujikanaeda)'s Twitter Profile Photo

If you need some synthetic personas/people for your data projects, my team just put out a CC-BY-4.0 open dataset on HF of 100k folks matching US-census distributions for your free use. 

nvidia/Nemotron-Personas
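
For anyone wanting a quick look, the dataset can be pulled with the Hugging Face datasets library. The repo id comes from the tweet; the split name and whatever fields each record carries are assumptions to verify against the dataset card.

```python
# Load the synthetic-persona dataset mentioned above. The repo id is from the
# tweet; the "train" split name is an assumption to check on the dataset card.
from datasets import load_dataset

ds = load_dataset("nvidia/Nemotron-Personas", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one synthetic persona record
```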
Dwarkesh Patel (@dwarkesh_sp)'s Twitter Profile Photo

New episode with Kenneth S Rogoff, former chief economist of the IMF. Ken predicts that, within the next decade, the US will have a debt-induced inflation crisis, but not a Japan-type financial crisis (the latter is much worse, and can make a country poorer for generations). Ken

Kording Lab 🦖 (@kordinglab)'s Twitter Profile Photo

I missed this when it came out, and I think it is important. It seems that AI systems are generally not stress-tested enough before publication/press releases!

Sebastian Dziadzio (@sbdzdz)'s Twitter Profile Photo

I'm in Nashville for CVPR and wow, the Music City name is not exaggerated. If you're around, we'll be presenting our work on temporal model merging with Vishaal Udandarao✈️CVPR'25, Karsten Roth, and Ameya P. (at CVPR) on Saturday 5-7 pm in ExHall D (poster #445). Come say hi!

Benjamin Todd (@ben_j_todd)'s Twitter Profile Photo

Why can AIs code for 1h but not 10h?

A simple explanation: if there's a 10% chance of error per 10min step (say), the success rate is:

1h: 53%
4h: 8%
10h: 0.2%

Toby Ord has tested this 'constant error rate' theory and shown it's a good fit for the data

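The arithmetic here is a constant-hazard model: with a 10% failure chance per 10-minute step, success over t hours is 0.9^(6t). A quick check (note 10h comes out to roughly 0.2%):

```python
# Constant error rate model: 10% failure chance per 10-minute step,
# so the chance of completing t hours of work is 0.9 ** (6 * t).
for hours in (1, 4, 10):
    print(f"{hours}h: {0.9 ** (6 * hours):.2%}")
# -> 1h: 53.14%, 4h: 7.98%, 10h: 0.18%
```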
anton (@atroyn)'s Twitter Profile Photo

my maybe most heretical startup opinion is that more founders should quit sooner and do something else. yeah, yc loves to talk about how long it took airbnb to take off, but we don't have the counterfactual of the better company chesky could have built instead if he had quit.