harsha (@sree_harsha_n) 's Twitter Profile
harsha

@sree_harsha_n

Applied Scientist intern @ Amazon | efficient DL | MSc | prev @cvml_mpiinf, @cispa, @medialab. Community lead @CohereForAI (views my own).

ID: 408563023

Joined: 09-11-2011 15:54:16

2.2K Tweets

470 Followers

533 Following

Irem Ergün (@irombie) 's Twitter Profile Photo

I'm excited to share our new pre-print ShiQ: Bringing back Bellman to LLMs! arxiv.org/abs/2505.11081 In this work, we propose a new, Q-learning inspired RL algorithm for finetuning LLMs 🎉 (1/n)
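
For background (my own gloss of the classic update the title alludes to, not ShiQ's actual objective), the Bellman-style Q-learning target is

    Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_t + γ · max_{a'} Q(s_{t+1}, a') − Q(s_t, a_t) ]

ShiQ adapts a Q-learning-inspired objective of this flavor to finetuning LLMs; see the arXiv link above for the actual formulation.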

Daniel D'souza  (@mrdanieldsouza) 's Twitter Profile Photo

🚨 Wait, adding simple markers 📌during training unlocks outsized gains at inference time?! 🤔 🚨 Thrilled to share our latest work at Cohere Labs: “Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers“ that explores this phenomenon! Details in 🧵 ⤵️
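
As a toy sketch of what "training-time markers" might look like in practice (my own illustration of the general recipe, not the paper's exact setup): metadata tags are prepended to training examples, and the same tags can be supplied at inference time to target an under-represented slice of the data. The tag format and helper below are hypothetical.

def add_markers(example: str, lang: str, domain: str) -> str:
    # Prepend metadata markers so the model learns to condition on them.
    return f"<lang:{lang}> <domain:{domain}> {example}"

# Training time: every example carries its metadata tags.
train_example = add_markers("example sentence about clinics ...", lang="sw", domain="health")

# Inference time: set the markers to steer generation toward that long-tail slice.
prompt = add_markers("Write a short sentence about vaccines.", lang="sw", domain="health")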

Matt Beton (@mattbeton) 's Twitter Profile Photo

Linear scaling achieved with multiple DeepSeek v3.1 instances. 4x macs = 4x throughput. 2x M3 Ultra Mac Studios = 1x DeepSeek @ 14 tok/sec; 4x M3 Ultra Mac Studios = 2x DeepSeek @ 28 tok/sec. DeepSeek V3.1 is a 671B parameter model - so at its native 8-bit quantization, it
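
Rough arithmetic for why this pencils out (my own back-of-the-envelope; the 512 GB figure is the maximum unified-memory configuration of an M3 Ultra Mac Studio, an assumption not stated in the tweet):

# Memory footprint of DeepSeek V3.1 weights at its native 8-bit quantization.
params = 671e9                      # 671B parameters
weights_gb = params * 1 / 1e9       # ~1 byte per weight at 8-bit -> ~671 GB
mac_memory_gb = 512                 # max unified memory of one M3 Ultra Mac Studio
macs_per_instance = -(-weights_gb // mac_memory_gb)   # ceiling division -> 2
print(weights_gb, macs_per_instance)

Since one instance spans two machines and instances run independently, doubling the machines doubles the number of instances, which is where the linear throughput scaling comes from.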

Nando Fioretto (@nandofioretto) 's Twitter Profile Photo

🔊 Open access version of the book 📖 "Differential Privacy in AI: From Theory to Practice" is now available! 👉 shorturl.at/biTgO This was a tremendous effort by so many leaders in the DP community who contributed to it. Hope it will be a useful resource for many!

Alex Cheema - e/acc (@alexocheema) 's Twitter Profile Photo

What if we rethought distributed AI training from the ground up for Apple Silicon? Tycho van der Ouderaa and Matt Beton present KPOP at the Cohere Labs ML efficiency group. KPOP is an optimizer that leverages the high memory:FLOPS ratio on Apple Silicon. youtu.be/1DTSdYy2RcU?fe…

EXO Labs (@exolabs) 's Twitter Profile Photo

A deep dive on KPOP at Cohere Labs ML efficiency group. KPOP is an optimizer designed specifically for the hardware constraints of Apple Silicon. We're doubling the number of Apple Silicon macs that can train together coherently every 2 months. In 12 months we'll have rebuilt

harsha (@sree_harsha_n) 's Twitter Profile Photo

PSA: Franz Srambical (not at neurips cuz no capacity) (who has capacity now) will be presenting at the ml-efficiency group at Cohere Labs :). Amazing work and excited to hear all about it, you should be too! September 10, 1600 GMT/1800 CEST.

kalomaze (@kalomaze) 's Twitter Profile Photo

there are few pop science neuroscience theories i hate more than "the brain is actually quantum", it's surface level, reddit-tier neil degrasse tyson fan kinda bullshit. there's no selection pressure for that kind of complexity; if anything, there was selection against it

Vedant Nanda (@_nvedant_) 's Twitter Profile Photo

Curious how to accelerate inference of some of the recent byte-level models like HAT/HNet/BLT? Check out this vLLM fork developed by my friends and colleagues, Pablo and Lukas! To my knowledge, this is the first demonstration of inference speedups from dynamic chunking in byte models!
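
A toy sketch of the general idea behind dynamic chunking (my own illustration, not the fork's actual implementation): bytes are grouped into variable-length patches, so the expensive backbone runs over len(patches) positions instead of len(raw_bytes) positions, which is where the inference speedup comes from. The boundary scorer below is a trivial stand-in; in real systems it is something like a small byte-level model's next-byte entropy (BLT) or a learned router (HNet).

def chunk_bytes(data: bytes, boundary_score, threshold: float = 0.5):
    # Greedily start a new patch whenever the boundary score exceeds the threshold.
    patches, current = [], bytearray()
    for i in range(len(data)):
        if current and boundary_score(data, i) > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(data[i])
    if current:
        patches.append(bytes(current))
    return patches

def toy_score(data: bytes, i: int) -> float:
    # Hypothetical stand-in for a learned scorer: split at spaces.
    return 1.0 if data[i:i + 1] == b" " else 0.0

text = b"byte level models can be chunked dynamically"
patches = chunk_bytes(text, toy_score)
print(len(text), "bytes ->", len(patches), "patches")  # the backbone sees far fewer positions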

mike64_t (@mike64_t) 's Twitter Profile Photo

Systems complexity doesn’t just mean deliberately induced complexity. It’s often a subtle cost that comes from deciding to use off-the-shelf tools, and losing inspectability for what seemed like a fine trade-off at the time. Every line of code checked into your project is a

Franz Srambical (not at neurips cuz no capacity) (@lemergenz) 's Twitter Profile Photo

My Cohere Labs talk is online. We outline research directions that embrace the bitter lesson, and state roadblocks on the path to AGI that need to be addressed even in a regime of absolute energy- and compute-abundance. youtube.com/watch?v=6wraMn…

Bronson (@bronsn4) 's Twitter Profile Photo

The Afri-Aya dataset is now live. 🎉 14 cultures, 14 languages, and we're only getting started. 📌 Afri-Aya dataset: huggingface.co/datasets/Coher… This is the largest human-reviewed dataset for African vision-language models (VLMs) to date, representing the greatest number of

Sara Hooker (@sarahookr) 's Twitter Profile Photo

I'm starting a new project. Working on what I consider to be the most important problem: building thinking machines that adapt and continuously learn. We have an incredibly talent-dense founding team + are hiring for engineering, ops, design. Join us: adaptionlabs.ai