An Yan (@anyan_ai)'s Twitter Profile
An Yan

@anyan_ai

@SFResearch. Prev: @UCSanDiego, @Microsoft. Working on Vision-Language.

ID: 1624182047655989249

Link: https://zzxslp.github.io/ · Joined: 10-02-2023 23:04:01

269 Tweets

80 Followers

296 Following

Omar Sanseviero (@osanseviero)'s Twitter Profile Photo

Want to learn about the research behind Gemma 3n?

AltUp - arxiv.org/abs/2301.13310
LAuReL - arxiv.org/abs/2411.07501
MatFormer - arxiv.org/abs/2310.07707
Activation sparsity - arxiv.org/abs/2506.06644
Universal Speech Model - arxiv.org/abs/2303.01037
Blog - developers.googleblog.com/en/introducing…

surya (@suryasure05)'s Twitter Profile Photo

I spent my summer building TinyTPU: an open-source ML inference and training chip. It can do end-to-end inference + training ENTIRELY on chip. Here's how I did it👇:

Igor Kotenkov (@stalkermustang)'s Twitter Profile Photo

I'm not sure why this new ByteDance Seed paper is not all over my feed. Am I missing something?

- trained Qwen2VL-7B to play genshin
- SFT only, no RL
- 2424 hours of human gameplay + 15k short reasoning traces to decompose the tasks
- sub 20k H100 hours (3 epochs)
- heaps of
POM (@peteromallet)'s Twitter Profile Photo

This is such a cool QwenEdit LoRA by Mohamed Oumoumad - among the best retexturing I've seen from a diffusion model w/ precise control. On top of models like Z-Image, LoRAs like this will be able to do tasks c. 20 times faster and cheaper than Nano Banana Pro + w/ more consistent

Robert Youssef (@rryssf_)'s Twitter Profile Photo

Holy shit… this might be the most unreal academic-writing upgrade I’ve ever seen 🤯

A team from NUS just dropped PaperDebugger, an in-editor, multi-agent system that lives inside Overleaf and rewrites your paper with you in real time.

Not copy-paste. Not a sidebar chatbot.
Beff – e/acc (@basedbeffjezos)'s Twitter Profile Photo

One thing to appreciate about Demis is that he consistently provides the most unbiased estimator of AI progress because he doesn't have to keep raising capital to get to train his next model (has direct access to the Google money printer and infinite TPUs)

rohan anil (@_arohan_)'s Twitter Profile Photo

Everyone should stop what they are doing rn, hold your horses, and read Andy's post. I personally feel like a horse in AI research and coding. Computers will get better than me at both; even with more than two decades of experience writing code, I can only best them on

Sander Dieleman (@sedielem)'s Twitter Profile Photo

Really nice work combining a bunch of recent ideas that speed up training of diffusion models, including representation alignment, improved latent diffusability, token dropping and many more. Don't miss the list of things that didn't work in the appendix. Code is on GitHub!

Tailin Wu (@tailin_wu)'s Twitter Profile Photo

🔍 Beyond MeanFlow: A Unified Perspective for One-Step Diffusion

We introduce ESC, an explicit shortcut model, which explicitly explores, analyzes, and improves the design of one-step diffusion models. One-step diffusion is just getting started 👀

🚀 Why do recent one-step
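For intuition on what a shortcut model buys you: in a 1-D Gaussian toy case, the marginal velocity field of the linear interpolation path has a closed form, so we can check numerically that a single jump with the average velocity over [0, 1] (the quantity a shortcut/MeanFlow-style model is trained to predict) lands on the same point as many Euler steps. This is my own sketch under those toy assumptions, not code or notation from the ESC paper:

```python
import numpy as np

mu, sigma = 2.0, 0.5   # toy: transport N(0, 1) noise to N(mu, sigma^2) data

def v(x, t):
    # Closed-form marginal velocity of the linear interpolation path
    # between N(0, 1) and N(mu, sigma^2) under an independent coupling.
    var = (1 - t) ** 2 + (t * sigma) ** 2
    a = (t * sigma ** 2 - (1 - t)) / var
    return a * x + mu * (1 - a * t)

def flow(x0, n_steps=1000):
    # Many-step sampling: Euler-integrate dx/dt = v(x, t) from t=0 to t=1,
    # also recording the average velocity along the trajectory.
    x, dt, vels = x0, 1.0 / n_steps, []
    for k in range(n_steps):
        vel = v(x, k * dt)
        vels.append(vel)
        x = x + vel * dt
    return x, float(np.mean(vels))

x0 = -1.3
x1_multi, u_avg = flow(x0)   # u_avg: average velocity over [0, 1]
x1_one = x0 + u_avg          # one-step "shortcut" jump with that average
print(x1_multi, x1_one, mu + sigma * x0)  # all three agree (about 1.35)
```

In the real setting the average velocity is predicted by a network u(x, t, r) rather than computed by integrating the instantaneous field; the toy only shows why one jump with that average suffices.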
François Fleuret (@francoisfleuret)'s Twitter Profile Photo

Because it's a domain of "extreme reliance" on data, ML is one of the least grateful branches of CS when it comes to rewarding "good ideas". You take your gorgeous idea, a jewel of abstraction and mathematical harmony, and you slap it on an irregular barbed pile of training data.

Nathan Lambert (@natolambert)'s Twitter Profile Photo

Reasoning model reports I recommend reading:

2025-01-22 - DeepSeek R1 - arxiv.org/abs/2501.12948
2025-01-22 - Kimi 1.5 - arxiv.org/abs/2501.12599
2025-03-31 - Open-Reasoner-Zero - arxiv.org/abs/2503.24290
2025-04-10 - Seed-Thinking 1.5 - arxiv.org/abs/2504.13914
2025-04-30 - Phi-4

Boris Cherny (@bcherny)'s Twitter Profile Photo

I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit. My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to

Lucas Beyer (bl16) (@giffmana)'s Twitter Profile Photo

Quote-reply to Rohan because I think it can be interesting to many more. So there are two things you're missing here:

1) You're only looking at one specific instantiation of the general JEPA idea. There are many different instantiations.
2) The core JEPA idea (Joint Embedding
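For readers who haven't seen the joint-embedding predictive idea spelled out: encode two views, then predict the target view's embedding from the context view's embedding, with the loss living in embedding space rather than pixel space. Below is a minimal sketch with made-up shapes and linear toy encoders (one common instantiation uses an EMA copy as the target encoder, held fixed here); this is my illustration, not anything from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_emb = 8, 4   # made-up input and embedding dimensions

W_ctx = rng.standard_normal((d_in, d_emb)) * 0.1  # context encoder (would be trained)
W_tgt = W_ctx.copy()     # target encoder: an EMA copy, held fixed in this sketch
W_pred = np.eye(d_emb)   # predictor head, initialized to identity

def jepa_loss(x_context, x_target):
    z_ctx = x_context @ W_ctx        # embed the visible/context view
    z_tgt = x_target @ W_tgt         # embed the target view; in training no
                                     # gradient flows through this branch
    pred = z_ctx @ W_pred            # predict the target embedding from context
    return float(np.mean((pred - z_tgt) ** 2))  # loss in embedding space

x = rng.standard_normal(d_in)
print(jepa_loss(x, x))   # 0.0 here, since both encoders start identical
```

The point of the design is that nothing forces the model to reconstruct pixels; it only has to predict abstract features of the target view.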

Abhinav (@_abhinavj)'s Twitter Profile Photo

I spent the last 4 days diving deep into flow matching and visualizing it inside vision-language-action models. Turning pure noise into coherent actions for robots to follow is beautiful. Here's the blog I wrote about it, with visuals that made it click better for me:
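A minimal numeric sketch of the flow-matching recipe the post visualizes, using my own toy setup rather than the blog's code: regress the interpolation velocity x1 - x0 on x_t with a tiny per-timestep linear model, then Euler-integrate the learned field from pure noise to samples of a 1-D Gaussian standing in for an action distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.1        # toy 1-D "action" distribution N(mu, sigma^2)
n_steps, n_train = 50, 4096
dt = 1.0 / n_steps

# For each Euler time step, regress the flow-matching target (x1 - x0)
# on x_t with a linear model v_t(x) = a*x + b. For a Gaussian source and
# target the true marginal velocity is linear in x, so this is enough.
coeffs = []
for k in range(n_steps):
    t = k * dt
    x0 = rng.standard_normal(n_train)               # source: N(0, 1) noise
    x1 = mu + sigma * rng.standard_normal(n_train)  # target samples
    xt = (1 - t) * x0 + t * x1                      # linear interpolation path
    A = np.stack([xt, np.ones_like(xt)], axis=1)
    coeffs.append(np.linalg.lstsq(A, x1 - x0, rcond=None)[0])

def sample():
    x = rng.standard_normal()        # start from pure noise
    for k in range(n_steps):
        a, b = coeffs[k]
        x = x + (a * x + b) * dt     # Euler step along the learned field
    return x

draws = np.array([sample() for _ in range(500)])
print(draws.mean())   # close to mu = 2.0
```

A VLA policy replaces the linear fit with a network conditioned on images and language, and the scalar with an action chunk, but the train/sample loop has the same shape.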

Vaishnavh Nagarajan (@_vaishnavh)'s Twitter Profile Photo

1/ We found that deep sequence models memorize atomic facts "geometrically" -- not as an associative lookup table as often imagined. This opens up practical questions on reasoning/memory/discovery, and also poses a theoretical "memorization puzzle."