Katrina Drozdov (Evtimova) (@stochasticdoggo)'s Twitter Profile
Katrina Drozdov (Evtimova)

@stochasticdoggo

AI researcher | PhD from @NYUDataScience | Bulgarian yogurt, prime numbers, and dogs bring me joy | she/her

ID: 904789658399322112

Link: https://kevtimova.github.io/ · Joined: 04-09-2017 19:33:59

337 Tweets

391 Followers

347 Following

Gabriele Berton (@gabriberton):

This simple PyTorch trick will cut your GPU memory use in half / double your batch size (for real). Instead of adding losses and then computing backward, it's better to compute the backward on each loss (which frees the computational graph). Results will be exactly identical.

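A minimal sketch of the pattern described above, assuming the total loss is a sum of terms from independent forward passes (here, chunks of a batch; the model, shapes, and chunk count are all illustrative, not from the tweet):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy setup, purely for illustration.
model = nn.Linear(1024, 1024)
data, target = torch.randn(256, 1024), torch.randn(256, 1024)

model.zero_grad()

# Memory-hungry version: summing keeps every chunk's graph alive
# until the single backward() call.
#   total = sum(F.mse_loss(model(c), t)
#               for c, t in zip(data.chunk(4), target.chunk(4)))
#   total.backward()

# Leaner version: backward() per loss. Gradients accumulate in .grad,
# and each chunk's computational graph is freed immediately, so peak
# memory holds one graph instead of four. By linearity of the gradient,
# the accumulated .grad matches the summed version exactly.
for c, t in zip(data.chunk(4), target.chunk(4)):
    loss = F.mse_loss(model(c), t)
    loss.backward()
```
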
Katrina Drozdov (Evtimova) (@stochasticdoggo):

The principle of least effort, from psychology, describes how we favor efficiency over effort. It aligns with System 1 (fast, intuitive) vs. System 2 (slow, deliberate) reasoning. AI faces a similar challenge: knowing when to rely on heuristics vs. deeper reasoning.

Andrew Ng (@andrewyng):

The buzz over DeepSeek this week crystallized, for many people, a few important trends that have been happening in plain sight: (i) China is catching up to the U.S. in generative AI, with implications for the AI supply chain. (ii) Open weight models are commoditizing the…

Hila Chefer (@hila_chefer):

VideoJAM is our new framework for improved motion generation from AI at Meta. We show that video generators struggle with motion because the training objective favors appearance over dynamics. VideoJAM directly addresses this **without any extra data or scaling** 👇🧵

Katrina Drozdov (Evtimova) (@stochasticdoggo):

I asked ChatGPT, Gemini, and Claude for a clever joke. They all gave me the same one. Either AI is merging into a hive mind… or humor has officially been solved mathematically!

Thomas Wolf (@thom_wolf):

I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century". The "compressed 21st century" comes from Dario's "Machines of Loving Grace" and if you haven’t read it, you probably…

Misha Laskin (@mishalaskin):

Today I’m launching Reflection AI with my friend and co-founder Ioannis Antonoglou. Our team pioneered major advances in RL and LLMs, including AlphaGo and Gemini. At Reflection, we're building superintelligent autonomous systems. Starting with autonomous coding.

NYU Center for Data Science (@nyudatascience):

CDS PhD Vlad Sobal and Courant PhD Wancong (Kevin) Zhang show that when good data is scarce, planning beats traditional reinforcement learning. With Kyunghyun Cho, Tim G. J. Rudner, and Yann LeCun. nyudatascience.medium.com/when-good-data…

Kyunghyun Cho (@kchonyc):

it's been more than a decade since KD (knowledge distillation) was proposed, and i've been using it all along .. but why does it work? too many speculations but no simple explanation. Sungmin Cha and i decided to see if we can come up with the simplest working description of KD in this work. we ended…

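For context, the classic KD objective the thread is asking about, in minimal form. This is the textbook formulation (Hinton et al., 2015), not the simplified description the new work proposes:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float = 2.0):
    # Classic knowledge distillation: make the student's
    # temperature-softened distribution match the teacher's.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    # The T**2 factor keeps gradient magnitudes comparable
    # across different temperatures.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * T**2
```
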
jack morris (@jxmnop):

excited to finally share on arxiv what we've known for a while now: All Embedding Models Learn The Same Thing. Embeddings from different models are SO similar that we can map between them based on structure alone, without *any* paired data. Feels like magic, but it's real: 🧵
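
To make "similar based on structure alone" concrete, here is one generic structural-similarity diagnostic, linear CKA (Kornblith et al., 2019). This is not the unpaired translation method from the paper, just a quick check, and it assumes the same n texts embedded by both models:

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    # X: [n, d1], Y: [n, d2] -- the same n items embedded by two models.
    # Returns ~1.0 when the two spaces share the same internal geometry
    # (up to rotation and isotropic scaling), ~0.0 when unrelated.
    X = X - X.mean(dim=0)
    Y = Y - Y.mean(dim=0)
    return (X.T @ Y).norm() ** 2 / ((X.T @ X).norm() * (Y.T @ Y).norm())
```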

Katrina Drozdov (Evtimova) (@stochasticdoggo):

Finally dipped my toes into RL post-training. I trained a code generation LLM with GRPO using open-r1. Here are my 9 takeaways: kevtimova.github.io/posts/grpo/
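
For readers new to GRPO, its core idea, the group-relative advantage, fits in a few lines. A minimal sketch, not the open-r1 training loop; shapes and reward values are made up:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    # rewards: [n_prompts, n_samples_per_prompt], e.g. pass/fail scores
    # from unit tests on generated code. Each completion's advantage is
    # its reward standardized within its own group of samples for the
    # same prompt -- no learned value function needed.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# 2 prompts, 4 sampled completions each:
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
advantages = grpo_advantages(rewards)  # positive for above-average samples
```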

Diana Cai (@dianarycai):

The application for a research fellowship at the Flatiron Institute in the Center for Computational Math is now live! This includes positions for ML and stats. The deadline is Dec 1. Links below with more details.

John Schulman (@johnschulman2):

Tinker provides an abstraction layer that is the right one for post-training R&D -- it's the infrastructure I've always wanted. I'm excited to see what people build with it. "Civilization advances by extending the number of important operations which we can perform without…

Bob McGrew (@bobmcgrewai):

After spending billions of dollars of compute, GPT-5 learned that the most effective use of its token budget is to give itself a little pep talk every time it figures something out. Maybe you should do the same.

Thinking Machines (@thinkymachines):

Today we’re announcing research and teaching grants for Tinker: credits for scholars and students to fine-tune and experiment with open-weight LLMs. Read more and apply at: thinkingmachines.ai/blog/tinker-re…

Katrina Drozdov (Evtimova) (@stochasticdoggo):

Really glad to see initiatives like Thinking Machines' Tinker grants that support hands-on RL and open-weight LLM work in both research and teaching. What an exciting opportunity for the community!