Maria Trofimova (@__vimary) Twitter Tweets • TwiCopy

Nebius

a year ago

Discover how Nebius is advancing AI-driven software engineering with search and learning. Our autonomous agent achieves a 40.6% resolution rate on SWE-bench Verified, using only open-weight models. Learn more from our research article: eu1.hubs.ly/H0dM3Rr0

hr0nix @ ICLR

@hr0nix

a year ago

Can open-weight models match frontier LLM performance on SWE-bench? They can if you equip them with search! We've been studying how guided search can improve SWE agents, and built an SWE-agent-based system that scores 40.6% on SWE-bench Verified using only open-weight models. 🧵

hr0nix @ ICLR

@hr0nix

a year ago

As a follow-up to our work on applying search to software engineering agents, today we are releasing datasets of problem instances and agent trajectories. This is the training data we previously used to achieve 40.6% on SWE-bench Verified using open-weight models only! 🧵⬇️

Maria Trofimova

@__vimary

a year ago

🐸🚀 kvax rocks!

mshnmshn

@mshn_mshn_nl

a year ago

This stack of laminated π cards represents 4 years of my favorite #PiDay tradition: wandering through Moscow technical universities, giving them away freely while saying 'Happy Pi Day!' and reciting 3.14159265358979323846264 by heart. Watching faces light up when fellow math

hr0nix @ ICLR

@hr0nix

a year ago

One of our research interests at Nebius is agentic software engineering. Because of that, we have reviewed LOTS of agent evals on software engineering tasks, and there were issues with these evals that made us unhappy. Today we are taking a step towards fixing some of them ⬇️
