Yihe Deng (@yihe__deng)'s Twitter Profile
Yihe Deng

@yihe__deng

CS PhD candidate @UCLA, Student Researcher @GoogleAI | Prev. Research Intern @MSFTResearch @AWS | LLM post-training, synthetic data

ID: 1462223072203722756

Link: https://yihe-deng.notion.site/Yihe-Deng-167ab2d2c1fb80b3a76dfb120f716c84 | Joined: 21-11-2021 00:55:36

175 Tweets

2.2K Followers

1.1K Following

Yihe Deng (@yihe__deng)'s Twitter Profile Photo

😄I did a brief intro of RLHF algorithms for my lab's reading group presentation. It was a good learning experience for me, and I want to share the GitHub repo here, which holds the slides as well as the list of interesting papers: github.com/yihedeng9/rlhf…

Would love to hear about
Kaiyu Yang (@kaiyuyang4)'s Twitter Profile Photo

🚀 Excited to share our position paper: "Formal Mathematical Reasoning: A New Frontier in AI"! 🔗 arxiv.org/abs/2412.16075 LLMs like o1 & o3 have tackled hard math problems by scaling test-time compute. What's next for AI4Math? We advocate for formal mathematical reasoning,

Daniel Han (@danielhanchen)'s Twitter Profile Photo

Cool things from DeepSeek v3's paper:

1. Float8 uses E4M3 for forward & backward - no E5M2
2. Every 4th FP8 accumulate adds to master FP32 accum
3. Latent Attention stores C cache not KV cache
4. No MoE loss balancing - dynamic biases instead

More details:
1. FP8: First large
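The accumulation trick in point 2 is easy to simulate. A minimal sketch of why promoting partial sums to a higher-precision master accumulator helps (this is an illustration, not DeepSeek's kernel: NumPy has no FP8 dtype, so float16 stands in for FP8 here):

```python
import numpy as np

def accumulate_promoted(values, group=4):
    """Accumulate in low precision (float16 as an FP8 stand-in),
    flushing the partial sum into an FP32 master accumulator
    every `group` additions."""
    master = np.float32(0.0)
    partial = np.float16(0.0)
    for i, v in enumerate(values, 1):
        partial = np.float16(partial + np.float16(v))
        if i % group == 0:
            master = np.float32(master + np.float32(partial))
            partial = np.float16(0.0)
    return float(np.float32(master + np.float32(partial)))

def accumulate_naive(values):
    """Accumulate entirely in float16 (the failure mode being avoided):
    once the running sum is large, small addends round away to nothing."""
    partial = np.float16(0.0)
    for v in values:
        partial = np.float16(partial + np.float16(v))
    return float(partial)

vals = [0.001] * 100_000  # exact sum is 100.0
# the promoted version lands near 100.0; the pure-float16 version
# stalls once the float16 spacing exceeds the addend
```

The same logic is why accumulating many tiny FP8 products directly is lossy, and why periodic promotion to FP32 recovers most of the accuracy at little cost.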
Zongyu Lin (@zy27962986)'s Twitter Profile Photo

Interested in the combination of Inference time scaling + LLM Agent?🤖💭 Announcing QLASS (Q-guided Language Agent Stepwise Search, arxiv.org/abs/2502.02584), a framework that supercharges language agents at inference time. ⚡In this work, we build a process reward model to guide

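The Q-guided stepwise idea can be sketched independently of the paper's training details. A toy greedy loop, where `propose` and `q_value` are hypothetical stand-ins for the agent's step generator and the learned process reward (Q) model:

```python
from typing import Callable, List

def q_guided_search(
    propose: Callable[[str], List[str]],   # candidate next steps for a partial trajectory
    q_value: Callable[[str, str], float],  # learned stepwise value estimate (stand-in)
    max_steps: int = 5,
) -> str:
    """Greedy stepwise decoding guided by a process reward model:
    at every step, score each candidate continuation with the
    Q model and keep the highest-valued one."""
    trajectory = ""
    for _ in range(max_steps):
        candidates = propose(trajectory)
        if not candidates:
            break
        best = max(candidates, key=lambda a: q_value(trajectory, a))
        trajectory += best
    return trajectory
```

In practice a beam over candidates rather than pure greedy selection is a natural extension of the same loop.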
Yihe Deng (@yihe__deng)'s Twitter Profile Photo

New paper & model release!

Excited to introduce DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails, showcasing our new DuoGuard-0.5B model.

- Model: huggingface.co/DuoGuard/DuoGu…
- Paper: arxiv.org/abs/2502.05163
- GitHub: github.com/yihedeng9/DuoG…

Grounded in a
Wanjia Zhao (@wanjiazhao1203)'s Twitter Profile Photo

Introducing #SIRIUS🌟: A self-improving multi-agent LLM framework that learns from successful interactions and refines failed trajectories, enhancing college-level reasoning and competitive negotiations. 
📜Preprint: arxiv.org/pdf/2502.04780
💻code: github.com/zou-group/siri…
1/N
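The learn-from-successes / refine-failures loop described above can be sketched abstractly. All callables here are hypothetical stand-ins, not the paper's API:

```python
def self_improve_round(agent, tasks, solve, judge, refine, train):
    """One self-improvement round in the spirit of SIRIUS:
    keep successful trajectories as training data, try to repair
    failed ones, then fine-tune the agent on the resulting pool."""
    pool = []
    for task in tasks:
        traj = solve(agent, task)
        if judge(task, traj):
            pool.append(traj)          # successful interaction: keep as-is
        else:
            fixed = refine(agent, task, traj)
            if judge(task, fixed):
                pool.append(fixed)     # refined failed trajectory: also usable
    return train(agent, pool)
```

Iterating this round lets the agent bootstrap from its own best behavior without external labels.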
Yong Lin (@yong18850571)'s Twitter Profile Photo

🚀 Exciting news! Our Goedel-Prover paper is now live on arXiv: arxiv.org/pdf/2502.07640 🎉 

We're currently developing the RL version and have a stronger checkpoint than before (currently not included in the report)!🚀🚀🚀

Plus, we’ll be open-sourcing 1.64M formalized
DeepSeek (@deepseek_ai)'s Twitter Profile Photo

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!

Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection

💡 With
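The two core components, coarse-grained compression and fine-grained selection, can be illustrated with a toy single-query sketch (this is an illustration of the idea, not the paper's hardware-aligned kernel):

```python
import numpy as np

def nsa_style_sparse_attention(q, K, V, block=4, top_blocks=2):
    """Toy sketch of the NSA ideas:
    1) coarse: compress keys by mean-pooling fixed-size blocks,
    2) fine: score the query against the compressed blocks, keep
       the top `top_blocks`, and attend densely only over their tokens."""
    n, d = K.shape
    nb = n // block
    Kc = K[: nb * block].reshape(nb, block, d).mean(axis=1)  # coarse keys
    block_scores = Kc @ q                                    # query-block relevance
    keep = np.argsort(block_scores)[-top_blocks:]            # fine-grained selection
    idx = np.concatenate(
        [np.arange(b * block, (b + 1) * block) for b in keep]
    )
    scores = K[idx] @ q / np.sqrt(d)                         # dense attention on kept tokens
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V[idx]
```

When `top_blocks` covers all blocks this reduces to ordinary dense attention; shrinking it trades accuracy for the sparsity that makes long-context attention fast.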
Ziniu Li @ ICLR2025 (@ziniuli)'s Twitter Profile Photo

🌟 Can better cold start strategies improve RL training for LLMs? 🤖

I’ve written a blog that delves into the challenges of fine-tuning LLMs during the cold-start phase and how the strategies applied there can significantly impact RL performance in complex reasoning tasks that
Ge Zhang (@gezhang86038849)'s Twitter Profile Photo

[1/n]

SuperExcited to announce SuperGPQA!!!
We spent more than half a year to finally get it done!
SuperGPQA is a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines.
It also provides the largest human-LLM
Siyan Zhao (@siyan_zhao)'s Twitter Profile Photo

Excited to release PrefEval (ICLR '25 Oral), a benchmark for evaluating LLMs’ ability to infer, memorize, and adhere to user preferences in long-context conversations!

⚠️We find that cutting-edge LLMs struggle to follow user preferences—even in short contexts. This isn't just
Zhiqing Sun (@edwardsun0909)'s Twitter Profile Photo

We’re rolling out Deep Research to Plus users today! Deep Research was the biggest “Feel The AGI” moment I’ve had since ChatGPT, and I’m glad more people will experience their first AGI moment! The team also worked super hard to make more tools including image citations /

Yihe Deng (@yihe__deng)'s Twitter Profile Photo

🤖 I just updated my repository of RL(HF) summary notes to include a growing exploration of new topics, specifically adding notes to projects related to DeepSeek R1 reasoning. 

Take a look: github.com/yihedeng9/rlhf… 🚀

I’m hoping these summaries are helpful, and I’d love to hear
Siyan Zhao (@siyan_zhao)'s Twitter Profile Photo

Introducing d1🚀 — the first framework that applies reinforcement learning to improve reasoning in masked diffusion LLMs (dLLMs).

Combining masked SFT with a novel form of policy gradient algorithm, d1 significantly boosts the performance of pretrained dLLMs like LLaDA.