Ryan Shar (@ryanshar01) Twitter Tweets • TwiCopy

Ryan Shar

@ryanshar01

MS student @ CMU ML department

ID: 1829589192684257283

Joined: 30-08-2024 18:37:23

5 Tweets

5 Followers

18 Following

Misha Khodak

@khodakmoments

a year ago

🧵 on surprising revelations from our study of specialized foundation models (FMs beyond vision/text): after evaluating dozens of scientific & time series FMs we found that most weren’t even competitive with simple supervised models, some with as little as 513 parameters. 1/n

245 Likes

3 Replies

62 Retweets

lmarena.ai (formerly lmsys.org)

@lmarena_ai

a year ago

Which model is best for coding? Copilot Arena leaderboard is out! Our code completions leaderboard contains data collected over the last month, with >100K completions served and >10K votes! Let’s discuss our findings so far🧵

542 Likes

17 Replies

78 Retweets

Jane Pan

@janepan_

9 months ago

When benchmarks talk, do LLMs listen? Our new paper shows that evaluating code LLMs with interactive feedback significantly affects model performance compared to standard static benchmarks! Work w/ Ryan Shar, Jacob Pfau, Ameet Talwalkar, He He, and Valerie Chen! [1/6]

51 Likes

2 Replies

13 Retweets

Wayne Chi

@iamwaynechi

9 months ago

What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants? In October, we launched Copilot Arena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint. Here's what we have learned /🧵

161 Likes

2 Replies

34 Retweets

Valerie Chen

@valeriechen_

8 months ago

Blog post on Copilot Arena out now!

15 Likes

0 Replies

2 Retweets

Ameet Talwalkar

@atalwalkar

6 months ago

I’m excited to share new work from Datadog AI Research! We just released Toto, a new SOTA (by a wide margin!) time series foundation model, and BOOM, the largest benchmark of observability metrics. Both are available under the Apache 2.0 license. 🧵

241 Likes

4 Replies

53 Retweets