Jeremy Pinto (@jerpint)'s Twitter Profile
Jeremy Pinto

@jerpint

Senior Applied Research Scientist at Mila

ID: 379019139

Link: http://www.jerpint.io
Joined: 24-09-2011 06:48:25

624 Tweets

458 Followers

250 Following

Jeremy Pinto (@jerpint)

This happened to me once. Candidate gave me a perfect pandas one liner (including knowing obscure param names) but couldn’t tell me what df.head() would do

Benno Krojer (@benno_krojer)

Tomás Vergara Browne and I were quiet over the summer with our podcast "Behind the Research of AI"... But now we're back! And with an awesome guest! We interviewed Jack Morris during Conference on Language Modeling and had a blast chatting, eating snacks together and reflecting on phd life/research ideas

Greg Kamradt (@gregkamradt)

We verified the TRM results on the semi private holdout set and they're legit.

Awesome work and contribution to the open source community by Alexia Jolicoeur-Martineau.

My notes:
* This model is tiny! 7M params, but the rub is that it is relatively expensive to run because pre-training and

Andrej Karpathy (@karpathy)

My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my

Andrej Karpathy (@karpathy)

Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right bottom) is the dominant paradigm in text. For audio I've

Thariq (@trq212)

We launched a sandbox within Claude Code that allows you to define exactly which directories and network hosts your agent can access. Type /sandbox to enable it.

NVIDIA (@nvidia)

Space isn’t just for stars anymore. 🌠 Starcloud’s H100-powered satellite brings sustainable, high-performance computing beyond Earth. Learn more: nvda.ws/47eYZvC

Alexia Jolicoeur-Martineau (@jm_alexia)

Insane finding! You train on at most 16 improvement steps at training, but at inference you do as many steps as possible (448 steps) and you reach crazy accuracy. This is how you build intelligence!!