Albert Örwall (@aorwall)'s Twitter Profile
Albert Örwall

@aorwall

Building Moatless Tools (github.com/aorwall/moatle…) and eval.moatless.ai

ID: 6107262

Joined: 17-05-2007 13:29:54

234 Tweets

154 Followers

406 Following

Jordan Juravsky (@jordanjuravsky)

Do you like LLMs? Do you also like for loops? Then you’ll love our new paper! We scale inference compute through repeated sampling: we let models make hundreds or thousands of attempts when solving a problem, rather than just one. By simply sampling more, we can boost LLM

OpenAI (@openai)

We're releasing a new iteration of SWE-bench, in collaboration with the original authors, to more reliably evaluate AI models on their ability to solve real-world software issues. openai.com/index/introduc…

Albert Örwall (@aorwall)

I could run almost all instances of the new SWE-Bench Verified split in my evaluation harness eval.moatless.ai/evaluations/ab… But rebuilding SWE-Bench containers with unpinned dependencies is a moving target... DM me if you want to try it out.

Albert Örwall (@aorwall)

It's been a true pleasure to be part of developing SWE-Search! Looking forward to further tweaking and optimizing the solution to see how far we can go with AI agents based solely on open-source models. 🚀 Huge thanks to Antonis Antoniades for the opportunity to work together on this!

Albert Örwall (@aorwall)

Moatless EvalTools now caches predictions, so you get instant feedback on already evaluated patches. Try it at eval.moatless.ai and let me know if you have suggestions for improvements!

Albert Örwall (@aorwall)

I built moatless-tools as a challenge to see how high I could rank on the SWE-bench leaderboard on a tight budget. I try to keep costs below $0.50 per run. But compared to other entries, I realize that's like showing up to a Formula 1 race in a go-kart 🥲

Albert Örwall (@aorwall)

Should I build a hosted service to run moatless-tools/tree-search or a fork where users can use their own API keys to crowdsource SOTA on swebench?

Antonis Antoniades (@anton_iades)

happy to see an approach that took inspiration from SWE-Search achieving SOTA on SWE-bench. while there is still plenty of room for improvement, search is a vital tool for navigating complex SWE environments, and I expect more approaches to follow suit.

Albert Örwall (@aorwall)

Finally, a leaderboard that Moatless Tools can top. It’s too easy to just throw money at the agent to get a high score on SWE-Bench.

Albert Örwall (@aorwall)

Deepseek-R1 gets 50% on SWE-Bench Verified Mini with Moatless Tools, surpassing Claude 3.5 Sonnet. experiments.moatless.ai/evaluations/20…

Albert Örwall (@aorwall)

The new @spotify Running App reminds me of an app I developed at #wowhack last year. But without the Bieber Alert... github.com/wowhack/Spotif…

Martin Fowler (@martinfowler)

posted: Microservice Resource Guide: my selection of material on the what, when, how, and who of microservices martinfowler.com/microservices/