kelly (@kellyhongsn)'s Twitter Profile
kelly

@kellyhongsn

research @trychroma & gapping @ucberkeley

ID: 1597899266915115008

Joined: 30-11-2022 10:24:16

54 Tweets

551 Followers

158 Following

Chroma (@trychroma)

Over the weekend we crossed 20,000 stars and usage in 80,000 repos on GitHub!

Our team is humbled and we’re excited to continue to build.

anton (@atroyn)

rumors that we have secretly achieved sota on large-scale video generation through faustian pacts with the basilisk are unfounded

ARC Prize (@arcprize)

Today, we're announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI.

We’re releasing:
* 3 games (environments)
* $10K agent contest
* AI agents API

Starting scores - Frontier AI: 0%, Humans: 100%

anton (@atroyn)

tomorrow! kelly and i are going live to talk about her research on how models actually use long context in realistic settings. we’ll discuss the motivation behind the work, the ideas that didn’t quite make it, what we think the results mean, and answer audience q’s.

Chroma (@trychroma)

Go behind the research with kelly and anton 🇺🇸. Our latest technical report, "Context Rot", investigates how model performance becomes increasingly unreliable as input length grows.

wh (@nrehiew_)

Some notes on the Context Rot research. 

I think there are pretty big/validated implications on several use cases and how we should think about these models.