Simon Yu (@simon_ycl)'s Twitter Profile
Simon Yu

@simon_ycl

1st Year PhD Student, supervised by @shi_weiyan | Incoming intern at @OrbyAI | MRes and BSc Student @EdinburghNLP | Member of @CohereForAI

ID: 3582852312

Link: https://simonucl.github.io/ · Joined: 16-09-2015 13:07:23

165 Tweets

290 Followers

623 Following

Simon Yu (@simon_ycl)'s Twitter Profile Photo

Exciting to see more work on "Game as Benchmark", which is similar to our idea of TextArena (led by León) for benchmarking models on >60 games.

though you can see GM Magnus Carlsen's comments on LLMs' chess play 🔥
will brown (@willccbb)'s Twitter Profile Photo

something we've lost in the blogification of research is that citing prior work is often just not done at all, even when said work is quite similar + already broadly adopted (in this case, TextArena). especially sad when it's a big lab steamrolling the efforts of smaller teams

will brown (@willccbb)'s Twitter Profile Photo

TextArena is one of my favorite projects of the year. i use it near-daily for RL experiments. they've got an awesome interactive site, multiple RL frameworks, and a really great paper. check it out if you haven't: ui: textarena.ai gh: github.com/LeonGuertler/T… paper:
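The "games as RL environments" idea behind TextArena can be sketched minimally. The code below is an invented toy (a game of Nim with a reset/observe/step shape), not TextArena's actual API; it only illustrates the turn-based text-game loop an LLM agent would be dropped into.

```python
class NimEnv:
    """Toy two-player Nim: players alternate taking 1-3 stones;
    whoever takes the last stone wins. Invented for illustration,
    not TextArena's real interface."""

    def __init__(self, stones=10):
        self.stones = stones
        self.current_player = 0
        self.winner = None

    def observe(self):
        # The text prompt a language-model agent would receive on its turn.
        return f"Player {self.current_player}: {self.stones} stones left. Take 1-3."

    def step(self, take):
        assert 1 <= take <= min(3, self.stones)
        self.stones -= take
        if self.stones == 0:
            self.winner = self.current_player  # taking the last stone wins
        self.current_player = 1 - self.current_player
        return self.stones == 0  # done flag

env = NimEnv(stones=10)
done = False
while not done:
    prompt = env.observe()   # in a real setup, this prompt goes to the model
    done = env.step(1)       # trivial scripted "agent": always take 1 stone
```

With both scripted players always taking one stone from ten, player 1 takes the last stone and wins; swapping the scripted policy for a model's parsed move is the whole benchmark idea.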

León (@leonguertler)'s Twitter Profile Photo

Very exciting to see others interested in using games to eval relative performance of frontier models as well :)

And it finally solves the mystery of who has been downloading TextArena so much (80k downloads are via uv from the same kernel, so I just presume it's the google mono
Orby AI (@orbyai)'s Twitter Profile Photo

From knowing to doing. The next evolution in AI isn't just about understanding language—it's about taking action.

Large Language Models (LLMs) are expert advisors. Large Action Models (LAMs) are reliable teammates. 
Read the breakdown here: orby.ai/blogs/why-larg… 

(1/5) #AI
Coalition on Digital Impact (CODI) (@codi_global)'s Twitter Profile Photo

🌍 Language shapes how we think and connect—but most AI models still struggle beyond English. Microsoft's July seminar discussed how we can bridge the gap and build #AIforEveryone with Marzieh Fadaee of Cohere Labs. 📽️ microsoft.com/en-us/research…

Sara Hooker (@sarahookr)'s Twitter Profile Photo

A little thank you from the Cohere Labs team. ✨

Thank you to everyone who has supported our work -- we just hit a special milestone.

We have released 100 papers involving more than 150 institutions. 🔥
will brown (@willccbb)'s Twitter Profile Photo

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) kalomaze Teknium (e/λ)

it’s a nice idea, totally seems plausible that you can approximate some aspects of offline RL with a tweaked SFT objective, though for these experiments the most likely story is it’s triggering the same mode-collapse behavior that boosts scores in many malformed Qwen GRPO setups

jack morris (@jxmnop)'s Twitter Profile Photo

OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only...

or is it?

turns out that underneath the surface, there is still a strong base model. so we extracted it.

introducing gpt-oss-20b-base 🧵
Multi-Turn Interaction LLM Workshop @ NeurIPS 2025 (@mti_neurips)'s Twitter Profile Photo

🚀 Still have a chance to submit to NeurIPS Conference for our Multi-Turn Workshop!

🏆 Best Paper Awards
🎓 10-15 Registration Waivers for student authors
🎤 New panelist: will brown from @primeintellect!
⏳ Deadline is August 22—only 10 days left!

🎉 Thanks to our sponsor
Multi-Turn Interaction LLM Workshop @ NeurIPS 2025 (@mti_neurips)'s Twitter Profile Photo

🚀 More exciting news! We're thrilled to announce our second sponsor: Meta! Thank you for the generous support of our Multi-Turn Interaction Workshop at NeurIPS Conference!

🎓 With Meta's support, we're offering 15 registration fee waivers for early-stage researchers.
🎉 We're
Weiyan Shi (@shi_weiyan)'s Twitter Profile Photo

Thanks Meta for sponsoring our workshop! 🩷 15 free tickets for students! 🩷 Deadline extended to 9/1/2025, a few more days to work on multi-turn interaction in LLMs!

Delip Rao e/σ (@deliprao)'s Twitter Profile Photo

The good folks behind LiteLLM (YC W23) seem to maintain this for anyone else trying to solve this problem. So up-to-date that even nano-🍌 is in there: See: docs.litellm.ai/docs/proxy/cos… github.com/BerriAI/litell… Thanks to Simon Yu for pointing this out.
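What a maintained per-model price map (like the one LiteLLM keeps up to date) enables can be sketched in a few lines. This is an illustrative stdlib-only sketch, not LiteLLM's API; the model name and prices below are invented for the example.

```python
# Hypothetical per-token price map, in the spirit of LiteLLM's
# model-cost metadata. All values here are made up for illustration.
PRICES = {
    "example-model": {
        "input_cost_per_token": 1e-6,   # $ per prompt token (invented)
        "output_cost_per_token": 2e-6,  # $ per completion token (invented)
    },
}

def estimate_cost(model, prompt_tokens, completion_tokens, prices=PRICES):
    """Estimate the dollar cost of one request from token counts."""
    p = prices[model]
    return (prompt_tokens * p["input_cost_per_token"]
            + completion_tokens * p["output_cost_per_token"])

cost = estimate_cost("example-model", prompt_tokens=1000, completion_tokens=500)
```

With these invented prices, 1000 prompt tokens plus 500 completion tokens costs 0.001 + 0.001 = 0.002 dollars; the hard part in practice is keeping the price map current across providers, which is what the linked project maintains.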

Prime Intellect (@primeintellect)'s Twitter Profile Photo

Introducing the Environments Hub

RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down

We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

Multi-Turn Interaction LLM Workshop @ NeurIPS 2025 (@mti_neurips)'s Twitter Profile Photo

📢 4 days left to submit to the Workshop on Multi-Turn Interaction for LLMs at #NeurIPS2025!

Exciting updates:
🥂 We're partnering with Prime Intellect to co-host a post-event reception! A great chance to connect with researchers from industry & academia.

🤖 Thrilled to have
Simon Yu (@simon_ycl)'s Twitter Profile Photo

thanks to will brown and Prime Intellect for their support at our workshop! they also just released the Environments Hub, a diverse collection of envs for RL training and evals