Eric Steinberger (@ericsteinb)'s Twitter Profile
Eric Steinberger

@ericsteinb

CEO @magicailabs

ID: 3782527941

Link: https://magic.dev · Joined: 26-09-2015 14:03:47

153 Tweets

8.8K Followers

571 Following

Eric Steinberger (@ericsteinb):

Nat is a great sparring partner, coach and supporter. He has consistently pushed us to be even more ambitious while remaining practical. We are incredibly fortunate to have him as our major backer and now also as a board member at Magic.

Noam Brown (@polynoamial):

This blog post by Magic does a great job highlighting the weaknesses of popular long-context evals and introduces HashHop as an alternative. Very impressive work from the Magic team and congrats on the new funding!

Taelin (@victortaelin):

You know something is good when it aces existing tests and has to invent its own benchmark to flex. HashHop is a step forward, and I hope it becomes the norm, replacing the stupid "Needle In A Haystack" test for benchmarking long context windows.

Eric Steinberger (@ericsteinb):

We're growing our Applied Team to work on post-training LTM2-medium (and once done pretraining, LTM2-large) into a useful assistant and capable agent.

METR (@metr_evals):

How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks.

Eric Steinberger (@ericsteinb):

We're hiring for a new team aiming to train AI SWEs to robustly complete long-horizon work on a no-restrictions computer via the GUI. Today's models excel at small, Olympiad-type coding tasks but struggle in complex codebases and aren't easy to integrate into existing enterprise