Shunyu Yao (@shunyuyao12) Twitter Tweets • TwiCopy

Shunyu Yao

@shunyuyao12

Language agents (ReAct, Reflexion, Tree of Thoughts) for digital automation (WebShop, SWE-bench, SWE-agent)

ID: 1271552707464032256

Link: http://ysymyth.github.io

Joined: 12-06-2020 21:19:32

640 Tweets

10,10K Followers

945 Following

Ofir Press

3 months ago

The progress on SWE-bench is nuts. I think my prediction of 2 systems surpassing 35% pass@1 on the full test set by Aug 1 will come true. When we launched in October, nobody wanted to work on the dataset because it was considered "too hard" or "impossible". Acc was 1.96% then.

134 Likes

10 Replies

Sam Rodriques

2 months ago

We're expanding our wet lab at FutureHouse, and looking for exceptional junior bio researchers. Our AI systems are designing protocols & experiments to make basic science discoveries in biology. If you want to see the future of wet lab research, apply: job-boards.greenhouse.io/futurehouse/jo…

76 Likes

0 Replies

Ben Shi

2 months ago

Excited to share that our work on competitive programming has been accepted to Conference on Language Modeling, landing in the top 1% of review scores! Very grateful to my collaborators. Working on some additional, exciting releases relating to this, stay tuned…

38 Likes

5 Replies

Yangsibo Huang

2 months ago

What shall we expect for unlearning for LMs (more in 🧵)? Data owners may want the LM to unlearn the wording / knowledge of their data, w/o privacy leakage. But model deployers may want the unlearned LM to remain useful, even after sequential unlearning requests that may vary

100 Likes

2 Replies

Karthik Narasimhan

2 months ago

A little over a year ago, when Shunyu Yao tweeted about our tree-of-thoughts (ToT) paper (x.com/ShunyuYao12/st…), there were concerns around it being too expensive given the use of tens to hundreds of LLM calls for solving each problem. The recent releases of gpt-4o mini,

89 Likes

3 Replies

Shunyu Yao

2 months ago

almost forget to post: i've joined OpenAI! time to convert the research vision to reality, and expect something exciting to drop out :)

1,1K Likes

83 Replies

Shunyu Yao

2 months ago

Will we use llm as a text tool or llm use us as multimodal tool? After all we might be good at and enjoy different things😀

48 Likes

2 Replies

Bret Taylor

a month ago

Excited to welcome Professor Zico Kolter to the OpenAI board

113 Likes

7 Replies

Ben Shi

a month ago

The human-in-the-loop visualizer + analysis is up! Feel free to take a look at detailed analysis on various failure modes of models on USACO programming questions. benshi34.github.io/blog/2024/huma… Very happy to chat about anything here! DMs open.

14 Likes

0 Replies

Rob Perez

a month ago

this will be one of my favorite moments ever not just because it secured gold. nor because of the difficulty. because he is the only person on earth who not passing to one of these two, wide open, is acceptable.

165,165K Likes

458 Replies

Nouha Dziri

a month ago

📢 Super excited that our workshop "System 2 Reasoning At Scale" was accepted to #NeurIPS24, Vancouver! 🎉 🎯 How can we equip LMs with reasoning, moving beyond just scaling parameters and data? Organized w. Stanford NLP Group, Massachusetts Institute of Technology (MIT), Princeton University, Ai2, and UW NLP. 🗓️ When? Dec 15 2024

130 Likes

2 Replies

Shunyu Yao

a month ago

Very excited for this convergence of interest :)

94 Likes

0 Replies

Sanjeev Arora

@prfsanjeevarora

a month ago

Workshop on AI Agents on Aug 29, 11-3 EST, at Princeton PLI. Tools, evals, movers-n-shakers: sites.google.com/princeton.edu/…

97 Likes

2 Replies

Andrew Lampinen

@andrewlampinen

a month ago

Really excited to share that I'm hiring for a Research Scientist position in our team! If you're interested in the kind of cognitively-oriented work we've been doing on learning & generalization, data properties, representations, LMs, or agents, please check it out!

342 Likes

10 Replies

Ravi Gupta

a month ago

There has been so much discussion about AI agents. There is no one who knows more about what they can do now and what they will be able to do in the future than my friend Clay Bavor, co-founder of Sierra. Pat Grady and I sat down with Clay on Training Data.

54 Likes

3 Replies