Yufan Zhuang (@yufan_zhuang) Twitter Tweets • TwiCopy

Lisan al Gaib

@scaling01

3 months ago

DeepSeek V3.1 beats Claude 4 Opus on Aider Polyglot. This makes it the best non-TTC coding model, and all of that for ~$1.

Sebastien Bubeck

Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof it's correct. Details below.

Samuel Schmidgall

@srschmidgall

3 months ago

Our paper on autonomous scientific research is accepted to Findings of #EMNLP2025! 🎉 We introduce Agent Laboratory, a framework that accelerates scientific discovery by teaming human researchers with LLM agents.

Yanda Chen

@yanda_chen_

3 months ago

Our results overall suggest that we can effectively separate harmful from harmless data and use pretraining data filtering to improve model safety without compromising usefulness. Big thanks to the team! 🙏 Mycal Tucker, Nina, Tony Wang 🐨, Francesco Mosconi,

tensorqt

@tensorqt

3 months ago

attention sinks may be a bias in causal transformers. as some of you know, i've been writing a long blogpost on attention and its properties as a message-passing operation on graphs. while doing so, i figured i might have found an explanation for which attention sinks may be an

SemiAnalysis

@semianalysis_

3 months ago

TogetherAI's Chief Scientist Tri Dao announced Flash Attention v4 at the HotChips conference, which is up to 22% faster than the attention kernel implementation from NVIDIA's cuDNN library. Tri Dao was able to achieve this with 2 key algorithmic changes. Firstly, it uses a new online

Rocky Duan

@rocky_duan

3 months ago

We're hiring interns (and full-times) all year long! Please email me if interested.

Tim Cook

@tim_cook

3 months ago

Get ready for an awe dropping #AppleEvent on Tuesday, September 9!

Pan Lu

@lupantech

3 months ago

🔔 Two months ago, we released #IneqMath, which revealed the Soundness Gap: LLMs can guess answers to Olympiad-level inequalities problems, but still struggle to make rigorous proof steps. Since then, it's been downloaded 4K+ times on HuggingFace! ➡️ ineqmath.github.io

Liyuan Liu (Lucas)

@liyuanlucas

2 months ago

appreciate Thinking Machines taking an open research approach! excited to see the first blog mentioned our work! truly on-policy RL is like RTX3090 for gamers in 2020 - you really want it, but the blockers make your head itch… kernel mismatches, parallelism mismatches, etc. etc.

Zilong (Ryan) Wang

@zlwang_cs

2 months ago

🤖 RLVR is great for aligning LLMs — but what about optimizing multiple objectives at once? Different rewards have different learning difficulty & saturation rates ⚖️ Introducing my intern Yining Lu's work 🎓 Dynamic Reward Weighting 🔀 – Adapts weights online as training

alphaXiv

@askalphaxiv

2 months ago

This new paper suggests that LLM ‘aha moments’ arise from an emergent planning-vs-execution hierarchy, similar to HRM’s slow-planner/fast-executor idea. So they proposed HICRA, which amplifies per-token credit on scarce planning tokens, focusing strategy & often beating GRPO!

Yufan Zhuang

@yufan_zhuang

2 months ago

😺Glad to share that our mixture-of-inputs paper is accepted at NeurIPS, see you in San Diego!

Yufan Zhuang

@yufan_zhuang

2 months ago

R1 started this fast-evolving era 🐋

Dheeraj Mekala

@mekaladheeraj

2 months ago

Super excited to share GAIA2 & ARE! ARE - research platform for scalable creation of RL environments. GAIA2 - successor of GAIA for evaluating agents in a smartphone-like environment.

Qwen

@alibaba_qwen

2 months ago

🚀 Introducing Qwen3-Omni — the first natively end-to-end omni-modal AI unifying text, image, audio & video in one model — no modality trade-offs! 🏆 SOTA on 22/36 audio & AV benchmarks 🌍 119L text / 19L speech in / 10L speech out ⚡ 211ms latency | 🎧 30-min audio

Riley Walz

@rtwlz

2 months ago

I reverse engineered the San Francisco parking ticket system. I can see every ticket seconds after it's written. So I made a website. Find My Friends? AVOID THE PARKING COPS.

Da Yu

@dayu85201802

2 months ago

✨ Internship Opportunity @ Google Research ✨ We are seeking a self-motivated student researcher to join our team at Google Research starting around January 2026. 🚀 In this role, you will contribute to research projects advancing agentic LLMs through tool use and RL, with the

Pan Lu

@lupantech

2 months ago

🔥Introducing #AgentFlow, a new trainable agentic system where a team of agents learns to plan and use tools in the flow of a task. 🌐agentflow.stanford.edu 📄huggingface.co/papers/2510.05… AgentFlow unlocks full potential of LLMs w/ tool-use. (And yes, our 3/7B model beats GPT-4o)👇

Yufan Zhuang

@yufan_zhuang

2 months ago

🤩 LongdLLM extends diffusion LM’s capability to an impressive 131k with simple edits! Great work from Albert!
