Andrew Zhao (@andrewz45732491) 's Twitter Profile
Andrew Zhao

@andrewz45732491

PhD Candidate @Tsinghua_Uni.
ExpeL, DiveR-CT
Ex-Research Intern @MSFTResearch, @BIGAI.
Interested in RL, LLM Reasoning/Safety, LLM-based Agents.

ID: 1301324575125286912

Link: https://andrewzh112.github.io/ · Joined: 03-09-2020 01:02:20

420 Tweets

219 Followers

2.2K Following

jack morris (@jxmnop) 's Twitter Profile Photo

NEW RESEARCH: Approximating Language Model Training Data from Weights

ever wonder how much information is available in an open-weights model?

DeepSeek R1 weights are 1.2 TB...

what can we learn from all those bits?

our method reverses LLM finetuning to recover data: 🧵
Kimi.ai (@kimi_moonshot) 's Twitter Profile Photo

Benchmarks aside, it thinks:
→ 23 reasoning steps per task (avg.)
→ 200+ URLs explored
→ Multi-turn tool use of search, browser, and code
→ Inline citations

Beta access is rolling out at kimi.com — get on the waitlist 👉 [docs.google.com/forms/d/e/1FAI…]

León (@leonguertler) 's Twitter Profile Photo

For the past ~2 months we have been working on training reasoning models on TextArena games. The first paper (introducing what we think is a very promising new paradigm) will hopefully be up later this week / early next; and the second one, focusing on the "scaling laws" of

Nouha Dziri (@nouhadziri) 's Twitter Profile Photo

📢 Can LLMs really reason outside the box in math? Or are they just remixing familiar strategies?

Remember how DeepSeek R1 and o1 impressed us on Olympiad-level math, yet failed at simple arithmetic 😬

We built a benchmark to find out → OMEGA Ω 📐

💥 We found
vittorio (@iterintellectus) 's Twitter Profile Photo

holy shit, it’s here!

deepmind just released AlphaGenome.
an AI model that reads 1 million bases of DNA and predicts how any mutation changes molecular function

not just in single genes but across the entire regulatory genome.

DNA is code, and you are software
1/
Shenzhi Wang🌟 (@shenzhiwang_thu) 's Twitter Profile Photo

🔥Tomorrow (2025.06.26), 10:30–11:30 Beijing time, I will give a live online talk on the recent, much-discussed paper from our Qwen team and the Tsinghua LeapLab team, "Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning".
🌟Everyone interested is welcome to join!
Livestream registration link: event.baai.ac.cn/activities/935
Rohan Pandey (@khoomeik) 's Twitter Profile Photo

the CIA is not ready for the RL era

israeli intelligence guy just hacked into a live surveillance camera in front of me with an exploit generated by qwen

vulnerable software is simulatable.
penetration success is verifiable.
hacking is RLable.
gensyn (@gensynai) 's Twitter Profile Photo

1/ Introducing RL Swarm’s new backend: GenRL. A modular reinforcement learning library built for distributed, fault-tolerant training - now powering RL Swarm from the ground up. 🧵

Prime Intellect (@primeintellect) 's Twitter Profile Photo

We did it — SYNTHETIC‑2 is complete.

A planetary-scale decentralized inference run generating 4M verified reasoning samples.

1,250+ GPUs joined in 3 days — from 4090s to H200s — creating data for complex RL tasks.

Full open-source release + technical report coming next week!