Shunyu Yao (@shunyuyao12)'s Twitter Profile
Shunyu Yao

@shunyuyao12

@OpenAI Language agents (ReAct, Reflexion, Tree of Thoughts, SWE-agent, CoALA) for digital automation (WebShop, SWE-bench, tau-bench)

ID: 1271552707464032256

http://ysymyth.github.io | Joined 12-06-2020 21:19:32

787 Tweets

15.15K Followers

1.1K Following

Greg Brockman (@gdb)

Operator is now powered by o3, improving overall task success rate. Also results in clearer, more thorough, and better-structured responses.

Shunyu Yao (@shunyuyao12)

Tech is overestimated in the short term (because infra is so much harder than people realize) and underestimated in the long run (because new tech becomes infra for new applications). This applies to computers, chips, the internet, LLMs, RL, etc.

Ben Shi (@benshi34)

As we optimize model reasoning over verifiable objectives, how does this affect human understanding of said reasoning to achieve superior collaborative outcomes? In our new preprint, we investigate human-centric model reasoning for knowledge transfer 🧵:

Andy Konwinski (@andykonwinski)

Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity. Built for and by researchers, including Jeff Dean & Joelle Pineau on the board, Laude Institute catalyzes research with real-world impact.

Shunyu Yao (@shunyuyao12)

An awesome piece by Kevin Lu. I find that many of its points connect to my own post ysymyth.github.io/The-Second-Hal… Pre-training is a genius idea that essentially leveraged billions of people, not just dozens in the lab. How can we leverage more people for RL?

Sam Altman (@sama)

watching chatgpt agent use a computer to do complex tasks has been a real "feel the agi" moment for me; something about seeing the computer think, plan, and execute hits different.

Josh Tobin (@josh_tobin_)

Introducing our latest agent: ChatGPT agent combines the best of deep research and operator into something that can do so much more for you. Try it out and let us know what you think!

Alexander Wei (@alexwei_)

1/N I’m excited to share that our latest OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

Reiichiro Nakano (@reiinakano)

Capabilities-wise, GPT-5 seems within expectations, with slightly better evals across the board. Expect other frontier labs to catch up or jump ahead within the following weeks/months. The real paradigm-shifting 4->5 leap is free users getting access to a frontier model by default.