Jessy Lin (@realjessylin) 's Twitter Profile
Jessy Lin

@realjessylin

PhD @Berkeley_AI | interactive language agents 🤖 💬

ID: 1250628810

Link: http://jessylin.com | Joined: 08-03-2013 03:40:38

287 Tweets

1.1K Followers

813 Following

Jiahai Feng (@feng_jiahai) 's Twitter Profile Photo

New preprint! We build on the hypothesis that language models construct latent world models of their inputs, and seek to extract latent world states as logical propositions using “propositional probes”.

Jessy Lin (@realjessylin) 's Twitter Profile Photo

I'll be at #ICML2024 next week presenting Dynalang as an 𝗼𝗿𝗮𝗹 in the agents and world modeling session on Thurs! Email or DM if you want to chat about anything language <> agents! I'm excited lately about vid-lang models, optimizing for human assistance and other fuzzy

Sarah Wooders 👾 (@sarahwooders) 's Twitter Profile Photo

Excited to announce Letta, the company Charles Packer and I started for building stateful LLM agents. We're building out an incredible (in-person) team in SF, and are actively hiring founding engineer/researchers jobs.ashbyhq.com/letta techcrunch.com/2024/09/23/let…

Jessy Lin (@realjessylin) 's Twitter Profile Photo

Really cool of ICLR to experiment with making AI part of the reviewing process. Instead of rejecting AI assistance and pretending that people aren't already using LMs to read/write/understand things, we can learn a lot from trying to make it part of our process (even if

Jessy Lin (@realjessylin) 's Twitter Profile Photo

Using AI agents to help humans understand and audit complex AI systems — I'm really excited by the long-term vision Jacob and Sarah are working on here!

Jessy Lin (@realjessylin) 's Twitter Profile Photo

+1 to the key idea here - it's def important to iterate on algorithms with clean benchmarks like math+code with known reward functions, but almost every task we care about in the real world has a fuzzy / human-defined reward func. I'm interested to see how we'll end up applying

Charlie Snell (@sea_snell) 's Twitter Profile Photo

Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵

Boaz Barak (@boazbaraktcs) 's Twitter Profile Photo

Fascinating interviews. I'm not sure humans will ever be "out of the loop" in math. Even if humans have no advantages in proving theorems, they are still going to matter in asking questions. Mathematics is not just about what is true, but also what is interesting - to humans!

Sanidhya Vijayvargiya (@sanidhya903) 's Twitter Profile Photo

1/ LLM agents can code—but can they ask clarifying questions? 🤖💬 Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀

Cassidy Laidlaw (@cassidy_laidlaw) 's Twitter Profile Photo

We built an AI assistant that plays Minecraft with you. Start building a house—it figures out what you’re doing and jumps in to help. This assistant *wasn't* trained with RLHF. Instead, it's powered by *assistance games*, a better path forward for building AI assistants. 🧵

Helen Toner (@hlntnr) 's Twitter Profile Photo

New on Rising Tide, I break down 2 factors that will play a huge role in how much AI progress we see over the next couple years: verification & generalization. How well these go will determine if AI just gets super good at math & coding vs. mastering many domains. Post excerpts:

Sam Rodriques (@sgrodriques) 's Twitter Profile Photo

Today, we are launching the first publicly available AI Scientist, via the FutureHouse Platform. Our AI Scientist agents can perform a wide variety of scientific tasks better than humans. By chaining them together, we've already started to discover new biology really fast. With

John Yang (@jyangballin) 's Twitter Profile Photo

40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.

Jessy Lin (@realjessylin) 's Twitter Profile Photo

underrated idea to learn passively about people from everyday computer use - I think the natural extension is learning from *trajectories* of how people prefer to do things, which is hard to get from prompting / static user data otherwise