Yaroslav Bulatov (@yaroslavvb) Twitter Tweets • TwiCopy

Yaroslav Bulatov

@yaroslavvb

Together.AI (ex-Google Brain, OpenAI, Meta)
New Blog: medium.com/@yaroslavvb
Old Blog: yaroslavvb.blogspot.com

ID: 258031029

Link: http://medium.com/@yaroslavvb
Joined: 26-02-2011 20:22:57

1.1K Tweets

7.7K Followers

873 Following

Yaroslav Bulatov

@yaroslavvb

a year ago

I did a spot check of my most recent math question I asked on math forums against LLMs:
Gemini Flash Thinking: 2/2
DeepSeek: 2/2
Claude: 1/2
ChatGPT o1: 0/2
Gemini was the fastest, followed by DeepSeek, followed by human. docs.google.com/document/d/16S…

Likes: 145

Replies: 8

Reposts: 14

Yaroslav Bulatov

@yaroslavvb

a year ago

Sitting in airport and realizing I don't get how singular values work. Does anyone understand why the two graphs match? mathoverflow.net/questions/4848…

Likes: 221

Replies: 8

Reposts: 7

Yaroslav Bulatov

@yaroslavvb

a year ago

Fun visualization: take 10x10 random matrix and visualize eigenvalue trajectories you get by rotating this matrix by theta in 0..2 Pi in some fixed direction. Now vary this direction smoothly

Likes: 1.1K

Replies: 30

Reposts: 92
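
The visualization described in this tweet can be sketched roughly as follows. This is only one possible reading, not the author's code: it assumes "rotating the matrix in a fixed direction" means left-multiplying by a rotation through theta in a fixed 2-D plane, and the helper names (`plane_rotation`, `random_plane`, `eigenvalue_trajectories`) are hypothetical.

```python
import numpy as np

def plane_rotation(u, v, theta):
    """Rotation by theta in the plane spanned by orthonormal u, v (identity elsewhere)."""
    n = u.shape[0]
    return (np.eye(n)
            + (np.cos(theta) - 1.0) * (np.outer(u, u) + np.outer(v, v))
            + np.sin(theta) * (np.outer(v, u) - np.outer(u, v)))

def random_plane(n, rng):
    """Two orthonormal vectors from the QR factorization of an n x 2 Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, 2)))
    return q[:, 0], q[:, 1]

def eigenvalue_trajectories(A, u, v, num_steps=200):
    """Eigenvalues of R(theta) @ A for theta on a grid over [0, 2*pi]."""
    thetas = np.linspace(0.0, 2.0 * np.pi, num_steps)
    eigs = np.array([np.linalg.eigvals(plane_rotation(u, v, t) @ A) for t in thetas])
    return thetas, eigs

rng = np.random.default_rng(0)          # assumed seed, for reproducibility only
A = rng.standard_normal((10, 10))       # assumed: i.i.d. Gaussian entries
u, v = random_plane(10, rng)            # the fixed "direction" (assumption)
thetas, eigs = eigenvalue_trajectories(A, u, v)
# eigs has one row per theta and one (unordered) complex eigenvalue per column.
```

Scatter-plotting `eigs.real` against `eigs.imag` traces the trajectories in the complex plane. Varying the direction smoothly, as the tweet suggests, would amount to repeating this for a slowly changing pair `u, v`, which is left out here.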

Yaroslav Bulatov

@yaroslavvb

a year ago

From a talk by Chris Manning

Likes: 16

Replies: 0

Reposts: 3

Yaroslav Bulatov

@yaroslavvb

a year ago

Are there recorded talks I can watch relevant to DeepSeek?

Likes: 5

Replies: 2

Reposts: 0

Yaroslav Bulatov

@yaroslavvb

a year ago

Keeping up with headline news, which are often negative, makes it easy to lose track of the big picture

Likes: 21

Replies: 1

Reposts: 0

Max Ryabinin

@m_ryabinin

8 months ago

I'm giving a talk at the MCDC🤝 workshop (#ICLR2025) tomorrow! Planning to cover:
* An overview of decentralized DL & its links to other fields
* Lessons learned from research on Learning@home, DeDLOC, SWARM, Petals
* Sneak peek on some of our upcoming work!
See you at 14:30!

Likes: 53

Replies: 3

Reposts: 9

Yaroslav Bulatov

@yaroslavvb

8 months ago

Watching Zhuang Liu's - "Transformers without Normalization", this slide is a reminder how our optimizer and architecture choices are coupled

Likes: 158

Replies: 5

Reposts: 19

Yaroslav Bulatov

@yaroslavvb

8 months ago

Unexpected RMT observation, squared singular values of a product of random projections are essentially distributed as exponentiated chi-squared, can anyone see a direct explanation of this? math.stackexchange.com/questions/5060…

Likes: 11

Replies: 1

Reposts: 2

Yaroslav Bulatov

@yaroslavvb

8 months ago

Once everyone online is indistinguishable from an AI agent, it would make it cool again to hang out in person. Until the robot impersonators.

Likes: 21

Replies: 1

Reposts: 0

Yaroslav Bulatov

@yaroslavvb

8 months ago

Enjoyed Jeremy Bernstein's thought-provoking talk on optimizers at ML Collective today. Are theories that motivate optimizers very useful? Adversarial for AdaGrad, natural gradient for KFAC. Non-linear solvers in scientific computing seem to advance without spending a lot of effort thinking

Likes: 14

Replies: 1

Reposts: 1