Shunyu Yao (@shunyuyao12)'s Twitter Profile
Shunyu Yao

@shunyuyao12

Language agents (ReAct, Reflexion, Tree of Thoughts) for digital automation (WebShop, SWE-bench, SWE-agent)

ID: 1271552707464032256

Link: http://ysymyth.github.io
Joined: 12-06-2020 21:19:32

640 Tweets

10,10K Followers

945 Following

Ofir Press (@ofirpress)

The progress on SWE-bench is nuts. I think my prediction of 2 systems surpassing 35% pass@1 on the full test set by Aug 1 will come true. When we launched in October, nobody wanted to work on the dataset because it was considered "too hard" or "impossible". Acc was 1.96% then.

Sam Rodriques (@sgrodriques)

We're expanding our wet lab at FutureHouse, and looking for exceptional junior bio researchers. Our AI systems are designing protocols & experiments to make basic science discoveries in biology. If you want to see the future of wet lab research, apply: job-boards.greenhouse.io/futurehouse/jo…

Ben Shi (@benshi34)

Excited to share that our work on competitive programming has been accepted to Conference on Language Modeling, landing in the top 1% of review scores! Very grateful to my collaborators. Working on some additional, exciting releases relating to this, stay tuned…

Yangsibo Huang (@yangsibohuang)

What shall we expect for unlearning for LMs (more in 🧵)? Data owners may want the LM to unlearn the wording / knowledge of their data, w/o privacy leakage. But model deployers may want the unlearned LM to remain useful, even after sequential unlearning requests that may vary

Karthik Narasimhan (@karthik_r_n)

A little over a year ago, when Shunyu Yao tweeted about our tree-of-thoughts (ToT) paper (x.com/ShunyuYao12/st…), there were concerns around it being too expensive given the use of tens to hundreds of LLM calls for solving each problem. The recent releases of gpt-4o mini,

Shunyu Yao (@shunyuyao12)

almost forgot to post: i've joined OpenAI! time to convert the research vision to reality, and expect something exciting to drop out :)

Shunyu Yao (@shunyuyao12)

Will we use llm as a text tool or llm use us as multimodal tool? After all we might be good at and enjoy different things😀

Ben Shi (@benshi34)

The human-in-the-loop visualizer + analysis is up! Feel free to take a look at detailed analysis on various failure modes of models on USACO programming questions. benshi34.github.io/blog/2024/huma… Very happy to chat about anything here! DMs open.

Rob Perez (@worldwidewob)

this will be one of my favorite moments ever not just because it secured gold. nor because of the difficulty. because he is the only person on earth who not passing to one of these two, wide open, is acceptable.

Nouha Dziri (@nouhadziri)

📢Super excited that our workshop "System 2 Reasoning At Scale" was accepted to #NeurIPS24, Vancouver! 🎉 🎯 how can we equip LMs with reasoning, moving beyond just scaling parameters and data? Organized w. Stanford NLP Group Massachusetts Institute of Technology (MIT) Princeton University Ai2 UW NLP 🗓️ when? Dec 15 2024

Andrew Lampinen (@andrewlampinen)

Really excited to share that I'm hiring for a Research Scientist position in our team! If you're interested in the kind of cognitively-oriented work we've been doing on learning & generalization, data properties, representations, LMs, or agents, please check it out!

Ravi Gupta (@guptark22)

There has been so much discussion about AI agents. There is no one who knows more about what they can do now and what they will be able to do in the future than my friend Clay Bavor, co-founder of Sierra. Pat Grady and I sat down with Clay on Training Data.