Ori Press (@ori_press)'s Twitter Profile
Ori Press

@ori_press

Graduate student @BethgeLab.
I yearn to deep learn

ID: 1076861996283367425

Link: http://oripress.com | Joined: 23-12-2018 15:28:01

104 Tweets

364 Followers

393 Following

Kilian Lieret @ICLR (@klieret)'s Twitter Profile Photo

SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! Tons of new features: massively parallel runs; cloud-based deployment; extensive configurability with tool bundles; new command line interface & utilities.

Ofir Press (@ofirpress)'s Twitter Profile Photo

Completing games requires long context and complex visual processing, so we put a bunch of 90s games into an emulator and made a benchmark. Our agent can't even beat the first level of these games. You can download it right now and try it out.

Alex Zhang (@a1zhang)'s Twitter Profile Photo

Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? VideoGameBench evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇

Ofir Press (@ofirpress)'s Twitter Profile Photo

AlgoTune is extremely tough, with agents not finding substantial speedups on most tasks. But sometimes these agents do really cool things: here, the agent realized that it could solve this convex optimization problem with a scipy function, leading to an 81x speedup.

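As a toy illustration of the pattern in the tweet above (a hand-rolled solver for a convex problem replaced by a single exact library call), here is a minimal stdlib-only sketch. This is not an actual AlgoTune task, and the function names are made up for the example: the convex objective sum(|x - p|) that the grid search approximates is minimized exactly by the median.

```python
import statistics

def naive_minimizer(points, step=0.01):
    """Hand-rolled grid search for argmin_x sum(|x - p|), a 1-D convex problem."""
    lo, hi = min(points), max(points)
    best_x, best_val = lo, float("inf")
    x = lo
    while x <= hi:
        val = sum(abs(x - p) for p in points)
        if val < best_val:
            best_x, best_val = x, val
        x += step
    return best_x

def library_minimizer(points):
    # The same convex objective is minimized exactly by the median,
    # so one library call replaces the whole search loop.
    return statistics.median(points)
```

Here `statistics.median` plays the role the scipy routine played in the tweet: an exact, much faster library answer to a problem a naive loop was solving approximately.
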
Brandon Amos (@brandondamos)'s Twitter Profile Photo

Excited to release AlgoTune!! It's a benchmark and coding agent for optimizing the runtime of numerical code 🚀 algotune.io 📚 algotune.io/paper.pdf 🤖 github.com/oripress/AlgoT… with Ofir Press Ori Press Patrick Kidger Bartolomeo Stellato Arman Zharmagambetov & many others 🧵

Ori Press (@ori_press)'s Twitter Profile Photo

We just benchmarked Qwen 3 Coder and GLM 4.5 on AlgoTune, and they manage to beat Claude Opus 4! We're excited to see if the models that will be released this week manage to make progress.

Also: I just defended my PhD and I'm on the industry job market, my DMs are open :)

Ofir Press (@ofirpress)'s Twitter Profile Photo

We know that a bunch of teams are working on applying AlphaEvolve to AlgoTune, super excited to see some initial results! This is going to get super interesting.

Ori Press (@ori_press)'s Twitter Profile Photo

Just added Claude Opus 4.1 and gpt-oss-120b to the AlgoTune leaderboard. Excited to see if GPT-5 can break the 2 barrier!

Richard Suwandi @ICLR2025 (@richardcsuwandi)'s Twitter Profile Photo

Introducing OpenEvolve x AlgoTune! Now you can run and benchmark evolutionary coding agents on 100+ algorithm optimization tasks from algotune.io

Kilian Lieret @ICLR (@klieret)'s Twitter Profile Photo

What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it scored higher on SWE-bench than with either model separately. Read more in the SWE-bench blog 🧵

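The per-turn switching idea above can be sketched in a few lines. This is an illustrative stand-in, not mini-SWE-agent's actual API; `query_model` and the model names are placeholders for real LM calls:

```python
import random

def run_agent(task, models, query_model, max_turns=10, seed=0):
    """Sketch of per-turn model switching: each turn, a model is drawn
    uniformly at random instead of fixing one model per trajectory."""
    rng = random.Random(seed)
    history = []
    for _ in range(max_turns):
        model = rng.choice(models)          # e.g. "gpt-5" or "sonnet-4"
        action = query_model(model, task, history)
        history.append((model, action))
        if action == "submit":              # agent decides it is done
            break
    return history

# Toy stand-in for an LM call: always submits on the third turn.
def toy_lm(model, task, history):
    return "submit" if len(history) == 2 else f"{model}: edit"

trace = run_agent("fix the failing test", ["gpt-5", "sonnet-4"], toy_lm)
```

The interesting design point is that nothing else in the agent loop changes: the trajectory simply interleaves models, which is why the mixture can outscore either model alone.
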