Sergey Karayev (@sergeykarayev)'s Twitter Profile
Sergey Karayev

@sergeykarayev

ID: 740930520

Link: https://sergeykarayev.com
Joined: 06-08-2012 16:14:10

1.1K Tweets

12.12K Followers

3.3K Following

Several agents plus three simple baselines were tested on HumanEval.

Agents were mostly worse and always more expensive than the baselines.

The good:
· Evaluating the Pareto frontier
· Strong simple baselines (just repeated calls!)

The bad:
· Clearly saturating the benchmark
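
For readers unfamiliar with the term, here is a minimal sketch of what "evaluating the Pareto frontier" can look like for cost versus accuracy. The `Result` type, the `pareto_frontier` helper, the system names, and every number below are made up for illustration. None of them come from the evaluation discussed in the tweet.

```python
from typing import NamedTuple


class Result(NamedTuple):
    name: str
    cost: float      # e.g. dollars per run (lower is better)
    accuracy: float  # fraction of problems solved (higher is better)


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Return the results that no other result dominates."""
    # Sort by cost ascending; on equal cost, put the more accurate result first.
    ordered = sorted(results, key=lambda r: (r.cost, -r.accuracy))
    frontier: list[Result] = []
    best_accuracy = float("-inf")
    for r in ordered:
        # Keep a point only if it is more accurate than everything
        # that costs the same or less.
        if r.accuracy > best_accuracy:
            frontier.append(r)
            best_accuracy = r.accuracy
    return frontier


# Hypothetical numbers, for illustration only.
example = [
    Result("baseline_single_call", 0.10, 0.70),
    Result("baseline_repeated_calls", 0.50, 0.90),
    Result("hypothetical_agent_a", 2.00, 0.85),
    Result("hypothetical_agent_b", 3.00, 0.92),
]
frontier_names = [r.name for r in pareto_frontier(example)]
print(frontier_names)
```

In this made-up data, `hypothetical_agent_a` drops off the frontier because the repeated-calls baseline is both cheaper and more accurate, which is the kind of comparison a cost-aware evaluation makes visible.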