Ryan Shar (@ryanshar01)'s Twitter Profile
Ryan Shar

@ryanshar01

MS student @ CMU ML department

ID: 1829589192684257283

Joined: 30-08-2024 18:37:23

5 Tweets

5 Followers

18 Following

Misha Khodak (@khodakmoments)

🧵 on surprising revelations from our study of specialized foundation models (FMs beyond vision/text): after evaluating dozens of scientific & time series FMs we found that most weren’t even competitive with simple supervised models, some with as little as 513 parameters. 1/n

lmarena.ai (formerly lmsys.org) (@lmarena_ai)

Which model is best for coding? Copilot Arena leaderboard is out! Our code completions leaderboard contains data collected over the last month, with >100K completions served and >10K votes! Let’s discuss our findings so far🧵

Jane Pan (@janepan_)

When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ Ryan Shar, Jacob Pfau, Ameet Talwalkar, He He, and Valerie Chen! [1/6]

Wayne Chi (@iamwaynechi)

What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants? In October, we launched Copilot Arena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint. Here's what we have learned /🧵

Ameet Talwalkar (@atalwalkar)

I’m excited to share new work from Datadog AI Research! We just released Toto, a new SOTA (by a wide margin!) time series foundation model, and BOOM, the largest benchmark of observability metrics. Both are available under the Apache 2.0 license. 🧵
