Jay (@jayendra_ram) Twitter Tweets • TwiCopy

Jay

@jayendra_ram

using computers @hud_evals | prev physics+cs @columbia, @ycombinator

ID: 1568772893156282369

Link: https://bento.me/jayendra | Date: 11-09-2022 01:28:29

859 Tweets

1.1K Followers

741 Following

Jay

@jayendra_ram

4 months ago

This is my new favorite hub.

Likes: 122 | Replies: 4 | Retweets: 5

Getting a lot of people telling me LLMs can accurately simulate human behavior. This isn't true. If LLMs could truly model human preferences, markets would be obsolete. Prices are just a clumsy way to guess what people want. Perfect simulation would make price signals

Likes: 32 | Replies: 4 | Retweets: 2

Jay

@jayendra_ram

4 months ago

There's been a lot of cloudflare hate recently but tbh they're justified in wanting to restrict ai agents. Until very recently, the de facto business model of the internet was ads. Ads require a human being to view your website and interact with it. If most of your site traffic

Likes: 13 | Replies: 1 | Retweets: 0

Jay

@jayendra_ram

4 months ago

IMO the only reason CUA agents aren't more prevalent is because of 1) speed, 2) cost and 3) inability to do long horizon tasks reliably (in that order). The models are already quite good for many important tasks. Scaling compute will fix 1) and 2) in ~1 year.

Likes: 32 | Replies: 3 | Retweets: 0

Jay

@jayendra_ram

4 months ago

This is awesome. Congrats jan! x.com/adamcohenhille…

Likes: 9 | Replies: 1 | Retweets: 0

hud

@hud_evals

4 months ago

Everyone claims SOTA for Computer Use Agents (CUAs), but there's no way to ensure reproducible results. We're publicly releasing our OSWorld Verified leaderboard, starting with CUA models from OpenAI and Anthropic. We will include more evals and models soon.

Likes: 171 | Replies: 7 | Retweets: 18

Hassan Hayat 🔥

@theseamouse

4 months ago

Alexander Doria the greatest trick the devil played is convince everyone they don't need evals

Likes: 11 | Replies: 0 | Retweets: 1

Jay

@jayendra_ram

4 months ago

This really was the year of agents huh

Likes: 9 | Replies: 1 | Retweets: 0

Jay

@jayendra_ram

3 months ago

Juicebox is probably the best tool for recruiting out there. Congrats David and Ishan! x.com/juicebox_work/…

Likes: 7 | Replies: 0 | Retweets: 0

Kasey Zhang

@_weexiao

3 months ago

We’re hosting RL IRL in 10 days with Y Combinator, Shane Gu, Reducto, Greptile, Encord, and hud - it’ll be a great opportunity to meet others working on applied reinforcement learning!

Likes: 41 | Replies: 3 | Retweets: 2

Varunram Ganesh

@varunramg

3 months ago

I’m excited to announce Lapis, the fastest and most accurate analytics platform for AI search engines like ChatGPT, Claude, Gemini, Perplexity, and Grok. Here’s why we wanted to tackle this specific problem. The way we access information is fundamentally changing, ChatGPT and

Likes: 460 | Replies: 90 | Retweets: 44

Kasey Zhang

@_weexiao

3 months ago

hud (Jay, lorenss) and The LLM Data Company will break down how to build + scale RL environments:

Likes: 11 | Replies: 1 | Retweets: 1

Jay

@jayendra_ram

3 months ago

We're going to teach people the best practices when making RL environments at Y Combinator this Saturday. If you're interested in the space and want to see the potential use cases of RL environments, come by! x.com/_WEEXIAO/statu…

Likes: 45 | Replies: 4 | Retweets: 5

Jay

@jayendra_ram

2 months ago

Emergent behavior in LLMs may be the greatest psyop of our times.

Likes: 20 | Replies: 1 | Retweets: 1