Jessy Lin (@realjessylin) Twitter Tweets • TwiCopy

Jiahai Feng

a year ago

New preprint! We build on the hypothesis that language models construct latent world models of their inputs, and seek to extract latent world states as logical propositions using “propositional probes”.

97 likes · 4 replies · 20 retweets

Jessy Lin

@realjessylin

a year ago

I'll be at #ICML2024 next week presenting Dynalang as an 𝗼𝗿𝗮𝗹 in the agents and world modeling session on Thurs! Email or DM if you want to chat about anything language <> agents! I'm excited lately about vid-lang models, optimizing for human assistance and other fuzzy

59 likes · 5 replies · 9 retweets

Sarah Wooders 👾

@sarahwooders

a year ago

Excited to announce Letta, the company Charles Packer and I started for building stateful LLM agents. We're building out an incredible (in-person) team in SF, and are actively hiring founding engineers/researchers: jobs.ashbyhq.com/letta techcrunch.com/2024/09/23/let…

488 likes · 28 replies · 40 retweets

Jessy Lin

@realjessylin

a year ago

Really cool of ICLR to experiment with making AI part of the reviewing process. Instead of rejecting AI assistance and pretending that people aren't already using LMs to read/write/understand things, we can learn a lot from trying to make it part of our process (even if

64 likes · 1 reply · 8 retweets

Jessy Lin

@realjessylin

a year ago

Using AI agents to help humans understand and audit complex AI systems — I'm really excited by the long-term vision Jacob and Sarah are working on here!

23 likes · 0 replies · 1 retweet

Jessy Lin

@realjessylin

10 months ago

+1 to the key idea here - it's def important to iterate on algorithms with clean benchmarks like math+code with known reward functions, but almost every task we care about in the real world has a fuzzy / human-defined reward func. I'm interested to see how we'll end up applying

30 likes · 1 reply · 2 retweets

Charlie Snell

@sea_snell

10 months ago

Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵

570 likes · 12 replies · 70 retweets

Boaz Barak

@boazbaraktcs

10 months ago

Fascinating interviews. I'm not sure humans will ever be "out of the loop" in math. Even if humans have no advantages in proving theorems, they are still going to matter in asking questions. Mathematics is not just about what is true, but also what is interesting - to humans!

28 likes · 5 replies · 1 retweet

Sanidhya Vijayvargiya

@sanidhya903

7 months ago

1/ LLM agents can code—but can they ask clarifying questions? 🤖💬 Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀

67 likes · 4 replies · 16 retweets

Cassidy Laidlaw

@cassidy_laidlaw

6 months ago

We built an AI assistant that plays Minecraft with you. Start building a house—it figures out what you’re doing and jumps in to help. This assistant *wasn't* trained with RLHF. Instead, it's powered by *assistance games*, a better path forward for building AI assistants. 🧵

2.2K likes · 90 replies · 217 retweets

Jessy Lin

@realjessylin

6 months ago

chatgpt memory is like the buzzfeed quiz of 2025

5 likes · 0 replies · 0 retweets

Helen Toner

@hlntnr

5 months ago

New on Rising Tide, I break down 2 factors that will play a huge role in how much AI progress we see over the next couple years: verification & generalization. How well these go will determine if AI just gets super good at math & coding vs. mastering many domains. Post excerpts:

126 likes · 6 replies · 18 retweets

Sam Rodriques

@sgrodriques

5 months ago

Today, we are launching the first publicly available AI Scientist, via the FutureHouse Platform. Our AI Scientist agents can perform a wide variety of scientific tasks better than humans. By chaining them together, we've already started to discover new biology really fast. With

3.3K likes · 144 replies · 708 retweets

John Yang

@jyangballin

5 months ago

40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.

638 likes · 25 replies · 132 retweets

Jessy Lin

@realjessylin

4 months ago

underrated idea to learn passively about people from everyday computer use - I think the natural extension is learning from *trajectories* of how people prefer to do things, which is hard to get from prompting / static user data otherwise

12 likes · 1 reply · 3 retweets