Jay (@jayendra_ram)'s Twitter Profile
Jay

@jayendra_ram

using computers @hud_evals | prev physics+cs @columbia, @ycombinator

ID: 1568772893156282369

Link: https://bento.me/jayendra
Joined: 11-09-2022 01:28:29

859 Tweets

1.1K Followers

741 Following

Jay (@jayendra_ram)

Getting a lot of people telling me LLMs can accurately simulate human behavior. This isn't true. If LLMs could truly model human preferences, markets would be obsolete. Prices are just a clumsy way to guess what people want. Perfect simulation would make price signals

Jay (@jayendra_ram)

There's been a lot of cloudflare hate recently but tbh they're justified in wanting to restrict ai agents. Until very recently, the de facto business model of the internet was ads. Ads require a human being to view your website and interact with it. If most of your site traffic

Jay (@jayendra_ram)

IMO the only reason CUA agents aren't more prevalent is because of 1) speed, 2) cost and 3) inability to do long horizon tasks reliably (in that order). The models are already quite good for many important tasks. Scaling compute will fix 1) and 2) in ~1 year.

hud (@hud_evals)

Everyone claims SOTA for Computer Use Agents (CUAs), but there's no way to ensure reproducible results. We're publicly releasing our OSWorld Verified leaderboard, starting with CUA models from OpenAI and Anthropic. We will include more evals and models soon.

Varunram Ganesh (@varunramg)

I’m excited to announce Lapis, the fastest and most accurate analytics platform for AI search engines like ChatGPT, Claude, Gemini, Perplexity, and Grok. Here’s why we wanted to tackle this specific problem. The way we access information is fundamentally changing, ChatGPT and

Jay (@jayendra_ram)

We're going to teach people the best practices when making RL environments at Y Combinator this Saturday. If you're interested in the space and want to see the potential use cases of RL environments, come by! x.com/_WEEXIAO/statu…