Zhengyang Qi (@qi_zhengyang) Twitter Tweets • TwiCopy

Zhengyang Qi

2 years ago

Just revalidated the saying that the more overwhelmed you feel while designing a system, the smoother your development experience will be.

Zhengyang Qi

@qi_zhengyang

2 years ago

Glad to be one of the contributors. Really interesting project, and Hao Zhu and Xuhui Zhou are super responsive and helpful project leads.

Kolby Nottingham

Excited to share our work, "Skill Set Optimization", a continual learning method for LLM actors that:
- Automatically extracts modular subgoals to use as skills
- Reinforces skills using environment reward
- Facilitates skill retrieval based on state
allenai.github.io/sso 🧵

Ruiyi Wang 王睿仪 🦋 @ruiyiwang

@ruiyiwang153

2 years ago

Happy PI🥧 day! Can language agents🤖 learn social skills through imitation and interaction? We are excited to introduce SOTOPIA-π🥧 pi.sotopia.world, an interactive learning method for training language agents to navigate real-world social scenarios while role-playing!

Ruiyi Wang 王睿仪 🦋 @ruiyiwang

@ruiyiwang153

2 years ago

Want to learn more about SOTOPIA-π🥧? Check our website pi.sotopia.world and the full paper on arxiv: arxiv.org/abs/2403.08715

Hao Zhu 朱昊

@_hao_zhu

2 years ago

Another update for Sotopia: I coded a simple Colab tutorial for you to try out Sotopia simulation and evaluation without setting it up on your own machine: colab.research.google.com/drive/14hJOfzp… Create your own characters + social scenarios, and use your favorite LLMs!

Ruiyi Wang 王睿仪 🦋 @ruiyiwang

@ruiyiwang153

2 years ago

Excited to share that SOTOPIA-π was accepted to #ACL2024 main conference 🎉! Check our work on training socially intelligent language agents: arxiv.org/abs/2403.08715

Haofei Yu 🦋 @haofeiyu.bsky.social

@haofeiyu44

9 months ago

🚀 RL works wonders for math & coding... But what about social intelligence? We introduce Sotopia-RL: Reward Design for Social Intelligence 🤝🧠 📘 rl.sotopia.world 📑 arxiv.org/abs/2508.03905 🔗 github.com/sotopia-lab/so…

Bing Liu

@vbingliu

7 months ago

New @Scale_AI paper! The culprit behind reward hacking? We trace it to misspecification in high-reward tail. Our fix: rubric-based rewards to tell “excellent” responses apart from “great.” The result: Less hacking, stronger post-training! arxiv.org/pdf/2509.21500

Fred Sala

@fredsala

6 months ago

The coolest trend for AI is shifting from conversation to action—less talking and more doing. This is also a great opportunity for evals: we need benchmarks that measure utility, including in an economic sense. terminalbench is my favorite effort of this type!

Zhengyang Qi

@qi_zhengyang

6 months ago

Excited to be speaking at AGI House’s Self-Evolving Agents Build Day this weekend! Looking forward to connecting with other builders and researchers working at the edge of adaptive and self-evolving agents.

Amanda Dsouza

@amanda_dsouza

6 months ago

🚨 New research from Snorkel AI tackles a critical problem: LLMs are evolving faster than our ability to evaluate them 📊 We develop BeTaL— Benchmark Tuning with an LLM-in-the-loop— a framework that automates benchmark design using reasoning models as optimizers. BeTaL produces

Armin

@arminpcm

6 months ago

New blog post: “Snorkeling in RL Environments” What makes a great RL environment for LLMs? We break down what’s needed to give agentic apps the reward signal they need to develop production-ready accuracy.

Snorkel AI

@snorkelai

5 months ago

SnorkelAI is headed to #NeurIPS2025 this December with Fred Sala and the team. Come talk benchmarks, rubrics, RL envs, & more. 🔬✨

Snorkel AI

@snorkelai

5 months ago

NeurIPS lunch crew → Snorkel researchers + the always-great Tom Walshe. If you’re at #NeurIPS2025, come say hi, and see everything else we’re doing this week (papers, workshops, events): snorkel.ai/neurips-event/

Justin Bauer

@realjustinbauer

3 months ago

Our paper “Learning from Less: Measuring the Effectiveness of RLVR in Low Data and Compute Regimes” was accepted to #MLSys 2026! We introduce three procedurally generated, verifiable datasets—Counting, Graph, and Spatial Reasoning—to study RLVR under low-data / low-compute

Alex Ratner

@ajratner

3 months ago

Exciting mention of TBench 2.0 in today's model releases - congrats to Mike A. Merrill, Alex Shaw & team + proud of Snorkel AI's contributions! Benchmarks are just one (limited) measurement tool - but critical guideposts of frontier progress. Much more to build here ahead!

Zhengyang Qi

@qi_zhengyang

3 months ago

Snorkel is launching a cool program for funding open benchmark and evaluation research. If you have cool benchmark ideas, don’t hesitate to contact us and we can make it happen!

Zhengyang Qi

@qi_zhengyang

3 months ago

Super excited to be a part of a company that supports open source! 🔥
