atharva (@atharvaraykar) Twitter Tweets • TwiCopy

atharva

@atharvaraykar

writing: atharvaraykar.com
work: @nilenso

elsewheres:
🦋 @atharvaraykar.com
🐘 @[email protected]

ID: 1085056759310544896

Account created: 15-01-2019 06:11:05

411 Tweets

221 Followers

586 Following

Charlie Snell

@sea_snell

24 days ago

What happened to adding error bars to evals?

913 likes · 17 replies · 31 retweets

Anthropic is the first lab that (very quietly) released scores for the more useful SWE Bench variants on release. Significant improvement on SWE Bench Pro! Unfortunately no one knows how well Gemini or Codex-Max does on it.

2 likes · 0 replies · 0 retweets

atharva

@atharvaraykar

24 days ago

Claude is the most intelligent model because of the typography

1 like · 0 replies · 0 retweets

atharva

@atharvaraykar

23 days ago

django unchained

1 like · 0 replies · 0 retweets

atharva

@atharvaraykar

16 days ago

I looked at the code for this. Is this just... asking an LLM to improve the prompt in a loop by giving it annotated data? It beats/matches DSPy's fancy optimizers. Is this a big L for DSPy? Am I missing something? Why didn't they try this obvious thing first?

1 like · 0 replies · 0 retweets