Andrew Zhao (@andrewz45732491) 's Twitter Profile
Andrew Zhao

@andrewz45732491

PhD Candidate @Tsinghua_Uni.
ExpeL, DiveR-CT
Ex-Research Intern @MSFTResearch, @BIGAI.
Interested in RL, LLM Reasoning/Safety, LLM-based Agents.

ID: 1301324575125286912

Link: https://andrewzh112.github.io/ · Joined: 03-09-2020 01:02:20

420 Tweets

219 Followers

2.2K Following

jack morris (@jxmnop) 's Twitter Profile Photo

NEW RESEARCH: Approximating Language Model Training Data from Weights

ever wonder how much information is available in an open-weights model?

DeepSeek R1 weights are 1.2 TB...

what can we learn from all those bits?

our method reverses LLM finetuning to recover data: 🧵
Kimi.ai (@kimi_moonshot) 's Twitter Profile Photo

Benchmarks aside, it thinks:
→ 23 reasoning steps per task (avg.)
→ 200+ URLs explored
→ Multi-turn tool use of search, browser, and code
→ Inline citations

Beta access is rolling out at kimi.com — get on the waitlist 👉 [docs.google.com/forms/d/e/1FAI…]

León (@leonguertler) 's Twitter Profile Photo

For the past ~2 months we have been working on training reasoning models on TextArena games. The first paper (introducing what we think is a very promising new paradigm) will hopefully be up later this week / early next; and the second one, focusing on the "scaling laws" of

Nouha Dziri (@nouhadziri) 's Twitter Profile Photo

📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies?

Remember how DeepSeek R1 and o1 impressed us on Olympiad-level math, yet failed at simple arithmetic 😬

We built a benchmark to find out → OMEGA Ω 📐

💥 We found
vittorio (@iterintellectus) 's Twitter Profile Photo

holy shit, it’s here!

deepmind just released AlphaGenome.
an AI model that reads 1 million bases of DNA and predicts how any mutation changes molecular function

not just in single genes but across the entire regulatory genome.

DNA is code, and you are software
1/
Shenzhi Wang🌟 (@shenzhiwang_thu) 's Twitter Profile Photo

🔥Tomorrow (2025.06.26), 10:30–11:30 Beijing time, I will give a live online talk on the recent, much-discussed paper from our Qwen team and the Tsinghua LeapLab team, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning".
🌟Everyone interested is welcome to join!
Livestream registration link: event.baai.ac.cn/activities/935
Rohan Pandey (@khoomeik) 's Twitter Profile Photo

the CIA is not ready for the RL era

israeli intelligence guy just hacked into a live surveillance camera in front of me with an exploit generated by qwen

vulnerable software is simulatable.
penetration success is verifiable.
hacking is RLable.
gensyn (@gensynai) 's Twitter Profile Photo

1/ Introducing RL Swarm’s new backend: GenRL. A modular reinforcement learning library built for distributed, fault-tolerant training - now powering RL Swarm from the ground up. 🧵

Prime Intellect (@primeintellect) 's Twitter Profile Photo

We did it — SYNTHETIC‑2 is complete.

A planetary-scale decentralized inference run generating 4M verified reasoning samples.

1,250+ GPUs joined in 3 days — from 4090s to H200s — creating data for complex RL tasks.

Full open-source release + technical report coming next week!