harsha (@sree_harsha_n) 's Twitter Profile
harsha

@sree_harsha_n

Applied Scientist intern @ Amazon | efficient DL | MSc | prev @cvml_mpiinf, @cispa, @medialab. Community lead @CohereForAI (views my own).

ID: 408563023

Joined: 09-11-2011 15:54:16

2.2K Tweets

470 Followers

533 Following

Irem Ergün (@irombie) 's Twitter Profile Photo

I'm excited to share our new pre-print ShiQ: Bringing back Bellman to LLMs! arxiv.org/abs/2505.11081 In this work, we propose a new, Q-learning inspired RL algorithm for finetuning LLMs 🎉 (1/n)
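
For background (my own gloss of the classic update the title alludes to, not ShiQ's actual objective), the Bellman-style Q-learning target is

    Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ · max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t) ]

ShiQ adapts a Q-learning-inspired objective of this flavor to finetuning LLMs; see the arXiv link above for the actual formulation.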

Daniel D'souza  (@mrdanieldsouza) 's Twitter Profile Photo

🚨 Wait, adding simple markers 📌during training unlocks outsized gains at inference time?! 🤔 🚨 Thrilled to share our latest work at Cohere Labs: “Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers“ that explores this phenomenon! Details in 🧵 ⤵️
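
As a toy sketch of what "training-time markers" might look like in practice (my own illustration of the general recipe, not the paper's exact setup): metadata tags are prepended to training examples, and the same tags can be supplied at inference time to target an under-represented slice of the data. The tag format and helper below are hypothetical.

def add_markers(example: str, lang: str, domain: str) -> str:
    # Prepend metadata markers so the model learns to condition on them.
    return f"<lang:{lang}> <domain:{domain}> {example}"

# Training time: every example carries its metadata tags.
train_example = add_markers("example sentence about clinics ...", lang="sw", domain="health")

# Inference time: set the markers to steer generation toward that long-tail slice.
prompt = add_markers("Write a short sentence about vaccines.", lang="sw", domain="health")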

Matt Beton (@mattbeton) 's Twitter Profile Photo

Linear scaling achieved with multiple DeepSeek v3.1 instances. 4x macs = 4x throughput. 2x M3 Ultra Mac Studios = 1x DeepSeek @ 14 tok/sec; 4x M3 Ultra Mac Studios = 2x DeepSeek @ 28 tok/sec. DeepSeek V3.1 is a 671B parameter model - so at its native 8-bit quantization, it
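
Rough arithmetic for why this pencils out (my own back-of-the-envelope; the 512 GB figure is the maximum unified-memory configuration of an M3 Ultra Mac Studio, an assumption not stated in the tweet):

# Memory footprint of DeepSeek V3.1 weights at its native 8-bit quantization.
params = 671e9                      # 671B parameters
weights_gb = params * 1 / 1e9       # ~1 byte per weight at 8-bit -> ~671 GB
mac_memory_gb = 512                 # max unified memory of one M3 Ultra Mac Studio
macs_per_instance = -(-weights_gb // mac_memory_gb)   # ceiling division -> 2
print(weights_gb, macs_per_instance)

Since one instance spans two machines and instances run independently, doubling the machines doubles the number of instances, which is where the linear throughput scaling comes from.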

Nando Fioretto (@nandofioretto) 's Twitter Profile Photo

🔊 Open access version of the book 📖 "Differential Privacy in AI: From Theory to Practice" is now available! 👉 shorturl.at/biTgO This was a tremendous effort by so many leaders in the DP community who contributed to it. Hope it will be a useful resource for many!

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

What if we rethought distributed AI training from the ground up for Apple Silicon? Tycho van der Ouderaa and Matt Beton present KPOP at the Cohere Labs ML efficiency group. KPOP is an optimizer that leverages the high memory:FLOPS ratio on Apple Silicon. youtu.be/1DTSdYy2RcU?fe…

EXO Labs (@exolabs) 's Twitter Profile Photo

A deep dive on KPOP at Cohere Labs ML efficiency group. KPOP is an optimizer designed specifically for the hardware constraints of Apple Silicon. We're doubling the number of Apple Silicon macs that can train together coherently every 2 months. In 12 months we'll have rebuilt

harsha (@sree_harsha_n) 's Twitter Profile Photo

PSA: Franz Srambical (not at neurips cuz no capacity) (who has capacity now) will be presenting at the ml-efficiency group at Cohere Labs :). Amazing work and excited to hear all about it, you should be too! September 10, 1600 GMT/1800 CEST.

kalomaze (@kalomaze) 's Twitter Profile Photo

there are few pop science neuroscience theories i hate more than "the brain is actually quantum", it's surface level, reddit-tier neil degrasse tyson fan kinda bullshit. there's no selection pressure for that kind of complexity; if anything, there was selection against it

Vedant Nanda (@_nvedant_) 's Twitter Profile Photo

Curious how to accelerate inference of some of the recent byte-level models like HAT/HNet/BLT? Check out this vLLM fork developed by my friends and colleagues, Pablo and Lukas! To my knowledge, this is the first demonstration of inference speedups from dynamic chunking in byte models!
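
A toy sketch of the general idea behind dynamic chunking (my own illustration, not the fork's actual implementation): bytes are grouped into variable-length patches, so the expensive backbone runs over len(patches) positions instead of len(raw_bytes) positions, which is where the inference speedup comes from. The boundary scorer below is a trivial stand-in; in real systems it is something like a small byte-level model's next-byte entropy (BLT) or a learned router (HNet).

def chunk_bytes(data: bytes, boundary_score, threshold: float = 0.5):
    # Greedily start a new patch whenever the boundary score exceeds the threshold.
    patches, current = [], bytearray()
    for i in range(len(data)):
        if current and boundary_score(data, i) > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(data[i])
    if current:
        patches.append(bytes(current))
    return patches

def toy_score(data: bytes, i: int) -> float:
    # Hypothetical stand-in for a learned scorer: split at spaces.
    return 1.0 if data[i:i + 1] == b" " else 0.0

text = b"byte level models can be chunked dynamically"
patches = chunk_bytes(text, toy_score)
print(len(text), "bytes ->", len(patches), "patches")  # the backbone sees far fewer positions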

mike64_t (@mike64_t) 's Twitter Profile Photo

Systems complexity doesn’t just mean deliberately induced complexity. It’s often a subtle cost that comes from deciding to use off-the-shelf tools, and losing inspectability for what seemed like a fine trade-off at the time. Every line of code checked into your project is a

Franz Srambical (not at neurips cuz no capacity) (@lemergenz) 's Twitter Profile Photo

My Cohere Labs talk is online. We outline research directions that embrace the bitter lesson, and state roadblocks on the path to AGI that need to be addressed even in a regime of absolute energy- and compute-abundance. youtube.com/watch?v=6wraMn…

Bronson (@bronsn4) 's Twitter Profile Photo

The Afri-Aya dataset is now live. 🎉 14 cultures, 14 languages, and we're only getting started. 📌 Afri-Aya dataset: huggingface.co/datasets/Coher… This is the largest human-reviewed dataset for African vision-language models (VLMs) to date, representing the greatest number of

Sara Hooker (@sarahookr) 's Twitter Profile Photo

I'm starting a new project. Working on what I consider to be the most important problem: building thinking machines that adapt and continuously learn. We have an incredibly talent-dense founding team + are hiring for engineering, ops, design. Join us: adaptionlabs.ai