An Yan (@anyan_ai)'s Twitter Profile
An Yan

@anyan_ai

@SFResearch. Prev: @UCSanDiego, @Microsoft. Working on Vision-Language.

ID: 1624182047655989249

Link: https://zzxslp.github.io/ · Joined: 10-02-2023 23:04:01

269 Tweets

80 Followers

296 Following

Omar Sanseviero (@osanseviero)'s Twitter Profile Photo

Want to learn about the research behind Gemma 3n?

AltUp - arxiv.org/abs/2301.13310
LAuReL - arxiv.org/abs/2411.07501
MatFormer - arxiv.org/abs/2310.07707
Activation sparsity - arxiv.org/abs/2506.06644
Universal Speech Model - arxiv.org/abs/2303.01037
Blog - developers.googleblog.com/en/introducing…

surya (@suryasure05)'s Twitter Profile Photo

I spent my summer building TinyTPU: an open-source ML inference and training chip. It can do end-to-end inference + training ENTIRELY on chip. Here's how I did it👇:

Igor Kotenkov (@stalkermustang)'s Twitter Profile Photo

I'm not sure why this new ByteDance Seed paper is not all over my feed. Am I missing something?

- trained Qwen2VL-7B to play genshin
- SFT only, no RL
- 2424 hours of human gameplay + 15k short reasoning traces to decompose the tasks
- sub 20k H100 hours (3 epochs)
- heaps of
POM (@peteromallet)'s Twitter Profile Photo

This is such a cool QwenEdit LoRA by Mohamed Oumoumad - among the best retexturing I've seen from a diffusion model w/ precise control. On top of models like Z-Image, LoRAs like this will be able to do tasks c. 20 times faster and cheaper than Nano Banana Pro + w/ more consistent

Robert Youssef (@rryssf_)'s Twitter Profile Photo

Holy shit… this might be the most unreal academic-writing upgrade I’ve ever seen 🤯

A team from NUS just dropped PaperDebugger, an in-editor, multi-agent system that lives inside Overleaf and rewrites your paper with you in real time.

Not copy-paste. Not a sidebar chatbot.
Beff – e/acc (@basedbeffjezos)'s Twitter Profile Photo

One thing to appreciate about Demis is that he consistently provides the most unbiased estimator of AI progress because he doesn't have to keep raising capital to get to train his next model (has direct access to the Google money printer and infinite TPUs)

rohan anil (@_arohan_)'s Twitter Profile Photo

Everyone should stop what they are doing rn, hold your horses, and read Andy's post. I personally feel like a horse in AI research and coding. Computers will get better than me at both; even with more than two decades of experience writing code, I can only best them on

Sander Dieleman (@sedielem)'s Twitter Profile Photo

Really nice work combining a bunch of recent ideas that speed up training of diffusion models, including representation alignment, improved latent diffusability, token dropping and many more. Don't miss the list of things that didn't work in the appendix. Code is on GitHub!

Tailin Wu (@tailin_wu)'s Twitter Profile Photo

🔍 Beyond MeanFlow: A Unified Perspective for One-Step Diffusion

We introduce ESC, an explicit shortcut model, which explicitly explores, analyzes, and improves the design of one-step diffusion models. One-step diffusion is just getting started 👀

🚀 Why do recent one-step
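For intuition on what a shortcut model buys you: in a 1-D Gaussian toy case, the marginal velocity field of the linear interpolation path has a closed form, so we can check numerically that a single jump with the average velocity over [0, 1] (the quantity a shortcut/MeanFlow-style model is trained to predict) lands on the same point as many Euler steps. This is my own sketch under those toy assumptions, not code or notation from the ESC paper:

```python
import numpy as np

mu, sigma = 2.0, 0.5   # toy: transport N(0, 1) noise to N(mu, sigma^2) data

def v(x, t):
    # Closed-form marginal velocity of the linear interpolation path
    # between N(0, 1) and N(mu, sigma^2) under an independent coupling.
    var = (1 - t) ** 2 + (t * sigma) ** 2
    a = (t * sigma ** 2 - (1 - t)) / var
    return a * x + mu * (1 - a * t)

def flow(x0, n_steps=1000):
    # Many-step sampling: Euler-integrate dx/dt = v(x, t) from t=0 to t=1,
    # also recording the average velocity along the trajectory.
    x, dt, vels = x0, 1.0 / n_steps, []
    for k in range(n_steps):
        vel = v(x, k * dt)
        vels.append(vel)
        x = x + vel * dt
    return x, float(np.mean(vels))

x0 = -1.3
x1_multi, u_avg = flow(x0)   # u_avg: average velocity over [0, 1]
x1_one = x0 + u_avg          # one-step "shortcut" jump with that average
print(x1_multi, x1_one, mu + sigma * x0)  # all three agree (about 1.35)
```

In the real setting the average velocity is predicted by a network u(x, t, r) rather than computed by integrating the instantaneous field; the toy only shows why one jump with that average suffices.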
François Fleuret (@francoisfleuret)'s Twitter Profile Photo

Because it's a domain of "extreme reliance" on data, ML is one of the least grateful branches of CS when it comes to rewarding "good ideas". You take your gorgeous idea, a jewel of abstraction and mathematical harmony, and you slap it on an irregular barbed pile of training data.

Nathan Lambert (@natolambert)'s Twitter Profile Photo

Reasoning model reports I recommend reading:

2025-01-22 - DeepSeek R1 - arxiv.org/abs/2501.12948
2025-01-22 - Kimi 1.5 - arxiv.org/abs/2501.12599
2025-03-31 - Open-Reasoner-Zero - arxiv.org/abs/2503.24290
2025-04-10 - Seed-Thinking 1.5 - arxiv.org/abs/2504.13914
2025-04-30 - Phi-4

Boris Cherny (@bcherny)'s Twitter Profile Photo

I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit. My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to

Lucas Beyer (bl16) (@giffmana)'s Twitter Profile Photo

Quote-reply to Rohan because I think it can be interesting to many more. So there are two things you're missing here:

1) You're only looking at one specific instantiation of the general JEPA idea. There are many different instantiations.
2) The core JEPA idea (Joint Embedding
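For readers who haven't seen the joint-embedding predictive idea spelled out: encode two views, then predict the target view's embedding from the context view's embedding, with the loss living in embedding space rather than pixel space. Below is a minimal sketch with made-up shapes and linear toy encoders (one common instantiation uses an EMA copy as the target encoder, held fixed here); this is my illustration, not anything from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_emb = 8, 4   # made-up input and embedding dimensions

W_ctx = rng.standard_normal((d_in, d_emb)) * 0.1  # context encoder (would be trained)
W_tgt = W_ctx.copy()     # target encoder: an EMA copy, held fixed in this sketch
W_pred = np.eye(d_emb)   # predictor head, initialized to identity

def jepa_loss(x_context, x_target):
    z_ctx = x_context @ W_ctx        # embed the visible/context view
    z_tgt = x_target @ W_tgt         # embed the target view; in training no
                                     # gradient flows through this branch
    pred = z_ctx @ W_pred            # predict the target embedding from context
    return float(np.mean((pred - z_tgt) ** 2))  # loss in embedding space

x = rng.standard_normal(d_in)
print(jepa_loss(x, x))   # 0.0 here, since both encoders start identical
```

The point of the design is that nothing forces the model to reconstruct pixels; it only has to predict abstract features of the target view.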

Abhinav (@_abhinavj)'s Twitter Profile Photo

I spent the last 4 days diving deep into flow matching and visualizing it inside vision-language-action models. Turning pure noise into coherent actions for robots to follow is beautiful. Here's the blog I wrote about it, with visuals that made it click better for me:
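A minimal numeric sketch of the flow-matching recipe the post visualizes, using my own toy setup rather than the blog's code: regress the interpolation velocity x1 - x0 on x_t with a tiny per-timestep linear model, then Euler-integrate the learned field from pure noise to samples of a 1-D Gaussian standing in for an action distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.1        # toy 1-D "action" distribution N(mu, sigma^2)
n_steps, n_train = 50, 4096
dt = 1.0 / n_steps

# For each Euler time step, regress the flow-matching target (x1 - x0)
# on x_t with a linear model v_t(x) = a*x + b. For a Gaussian source and
# target the true marginal velocity is linear in x, so this is enough.
coeffs = []
for k in range(n_steps):
    t = k * dt
    x0 = rng.standard_normal(n_train)               # source: N(0, 1) noise
    x1 = mu + sigma * rng.standard_normal(n_train)  # target samples
    xt = (1 - t) * x0 + t * x1                      # linear interpolation path
    A = np.stack([xt, np.ones_like(xt)], axis=1)
    coeffs.append(np.linalg.lstsq(A, x1 - x0, rcond=None)[0])

def sample():
    x = rng.standard_normal()        # start from pure noise
    for k in range(n_steps):
        a, b = coeffs[k]
        x = x + (a * x + b) * dt     # Euler step along the learned field
    return x

draws = np.array([sample() for _ in range(500)])
print(draws.mean())   # close to mu = 2.0
```

A VLA policy replaces the linear fit with a network conditioned on images and language, and the scalar with an action chunk, but the train/sample loop has the same shape.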

Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

1/ We found that deep sequence models memorize atomic facts "geometrically" -- not as an associative lookup table as often imagined. This opens up practical questions on reasoning/memory/discovery, and also poses a theoretical "memorization puzzle."