Simon Yu (@simon_ycl)'s Twitter Profile
Simon Yu

@simon_ycl

1st Year PhD Student, supervised by @shi_weiyan | Incoming intern at @OrbyAI | MRes and BSc Student @EdinburghNLP | Member of @CohereForAI

ID: 3582852312

Link: https://simonucl.github.io/ · Joined: 16-09-2015 13:07:23

165 Tweets

290 Followers

623 Following

Simon Yu (@simon_ycl)'s Twitter Profile Photo

Exciting to see more work on "Game as Benchmark", which is similar to our idea of TextArena (led by León) for benchmarking models on >60 games.

though you can see GM Magnus Carlsen's comments on LLMs' chess play 🔥
will brown (@willccbb)'s Twitter Profile Photo

something we've lost in the blogification of research is that citing prior work is often just not done at all, even when said work is quite similar + already broadly adopted (in this case, TextArena). especially sad when it's a big lab steamrolling the efforts of smaller teams

will brown (@willccbb)'s Twitter Profile Photo

TextArena is one of my favorite projects of the year. i use it near-daily for RL experiments. they've got an awesome interactive site, multiple RL frameworks, and a really great paper. check it out if you haven't: ui: textarena.ai gh: github.com/LeonGuertler/T… paper:
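The "games as RL environments" idea behind TextArena can be sketched minimally. The code below is an invented toy (a game of Nim with a reset/observe/step shape), not TextArena's actual API; it only illustrates the turn-based text-game loop an LLM agent would be dropped into.

```python
class NimEnv:
    """Toy two-player Nim: players alternate taking 1-3 stones;
    whoever takes the last stone wins. Invented for illustration,
    not TextArena's real interface."""

    def __init__(self, stones=10):
        self.stones = stones
        self.current_player = 0
        self.winner = None

    def observe(self):
        # The text prompt a language-model agent would receive on its turn.
        return f"Player {self.current_player}: {self.stones} stones left. Take 1-3."

    def step(self, take):
        assert 1 <= take <= min(3, self.stones)
        self.stones -= take
        if self.stones == 0:
            self.winner = self.current_player  # taking the last stone wins
        self.current_player = 1 - self.current_player
        return self.stones == 0  # done flag

env = NimEnv(stones=10)
done = False
while not done:
    prompt = env.observe()   # in a real setup, this prompt goes to the model
    done = env.step(1)       # trivial scripted "agent": always take 1 stone
```

With both scripted players always taking one stone from ten, player 1 takes the last stone and wins; swapping the scripted policy for a model's parsed move is the whole benchmark idea.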

León (@leonguertler)'s Twitter Profile Photo

Very exciting to see others interested in using games to eval relative performance of frontier models as well :)

And it finally solves the mystery of who has been downloading TextArena so much (80k downloads are via uv from the same kernel, so I just presume it's the google mono
Orby AI (@orbyai)'s Twitter Profile Photo

From knowing to doing. The next evolution in AI isn't just about understanding language—it's about taking action.

Large Language Models (LLMs) are expert advisors. Large Action Models (LAMs) are reliable teammates. 
Read the breakdown here: orby.ai/blogs/why-larg… 

(1/5) #AI
Coalition on Digital Impact (CODI) (@codi_global)'s Twitter Profile Photo

🌍 Language shapes how we think and connect—but most AI models still struggle beyond English. Microsoft's July seminar discussed how we can bridge the gap and build #AIforEveryone with Marzieh Fadaee of Cohere Labs. 📽️ microsoft.com/en-us/research…

Sara Hooker (@sarahookr)'s Twitter Profile Photo

A little thank you from the Cohere Labs team. ✨

Thank you to everyone who has supported our work -- we just hit a special milestone.

We have released 100 papers involving more than 150 institutions. 🔥
will brown (@willccbb)'s Twitter Profile Photo

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) kalomaze Teknium (e/λ)

it’s a nice idea, totally seems plausible that you can approximate some aspects of offline RL with a tweaked SFT objective, though for these experiments the most likely story is it’s triggering the same mode-collapse behavior that boosts scores in many malformed Qwen GRPO setups

jack morris (@jxmnop)'s Twitter Profile Photo

OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. they recently released GPT-OSS, which is reasoning-only...

or is it?

turns out that underneath the surface, there is still a strong base model. so we extracted it.

introducing gpt-oss-20b-base 🧵
Multi-Turn Interaction LLM Workshop @ NeurIPS 2025 (@mti_neurips)'s Twitter Profile Photo

🚀 Still have a chance to submit to NeurIPS Conference for our Multi-Turn Workshop!

🏆 Best Paper Awards
🎓 10-15 Registration Waivers for student authors
🎤 New panelist: will brown from @primeintellect!
⏳ Deadline is August 22—only 10 days left!

🎉 Thanks to our sponsor
Multi-Turn Interaction LLM Workshop @ NeurIPS 2025 (@mti_neurips)'s Twitter Profile Photo

🚀 More exciting news! We're thrilled to announce our second sponsor: Meta! Thank you for the generous support of our Multi-Turn Interaction Workshop at NeurIPS Conference!

🎓 With Meta's support, we're offering 15 registration fee waivers for early-stage researchers.
🎉 We're
Weiyan Shi (@shi_weiyan)'s Twitter Profile Photo

Thanks Meta for sponsoring our workshop! 🩷 15 free tickets for students! 🩷 Deadline extended to 9/1/2025, a few more days to work on multi-turn interaction in LLMs!

Delip Rao e/σ (@deliprao)'s Twitter Profile Photo

The good folks behind LiteLLM (YC W23) seem to maintain this for anyone else trying to solve this problem. So up-to-date that even nano-🍌 is in there: See: docs.litellm.ai/docs/proxy/cos… github.com/BerriAI/litell… Thanks to Simon Yu for pointing this out.
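What a maintained per-model price map (like the one LiteLLM keeps up to date) enables can be sketched in a few lines. This is an illustrative stdlib-only sketch, not LiteLLM's API; the model name and prices below are invented for the example.

```python
# Hypothetical per-token price map, in the spirit of LiteLLM's
# model-cost metadata. All values here are made up for illustration.
PRICES = {
    "example-model": {
        "input_cost_per_token": 1e-6,   # $ per prompt token (invented)
        "output_cost_per_token": 2e-6,  # $ per completion token (invented)
    },
}

def estimate_cost(model, prompt_tokens, completion_tokens, prices=PRICES):
    """Estimate the dollar cost of one request from token counts."""
    p = prices[model]
    return (prompt_tokens * p["input_cost_per_token"]
            + completion_tokens * p["output_cost_per_token"])

cost = estimate_cost("example-model", prompt_tokens=1000, completion_tokens=500)
```

With these invented prices, 1000 prompt tokens plus 500 completion tokens costs 0.001 + 0.001 = 0.002 dollars; the hard part in practice is keeping the price map current across providers, which is what the linked project maintains.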

Prime Intellect (@primeintellect)'s Twitter Profile Photo

Introducing the Environments Hub

RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down

We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

Multi-Turn Interaction LLM Workshop @ NeurIPS 2025 (@mti_neurips)'s Twitter Profile Photo

📢 4 days left to submit to the Workshop on Multi-Turn Interaction for LLMs at #NeurIPS2025!

Exciting updates:
🥂 We're partnering with Prime Intellect to co-host a post-event reception! A great chance to connect with researchers from industry & academia.

🤖 Thrilled to have
Simon Yu (@simon_ycl)'s Twitter Profile Photo

thanks to will brown and Prime Intellect for their support at our workshop! they also just released the Environments Hub, a diverse collection of envs for RL training and evals