Maria Trofimova (@__vimary)'s Twitter Profile
Maria Trofimova

@__vimary

ID: 1494172133282492421

Joined: 17-02-2022 04:56:16

7 Tweets

6 Followers

25 Following

Nebius (@nebiusai)

Discover how Nebius is advancing AI-driven software engineering with search and learning. Our autonomous agent achieves a 40.6% resolution rate on SWE-bench Verified, using only open-weight models. Learn more from our research article: eu1.hubs.ly/H0dM3Rr0

hr0nix @ ICLR (@hr0nix)

Can open-weight models match frontier LLM performance on SWE-bench? They can if you equip them with search! We've been studying how guided search can improve SWE agents, and built an SWE-agent-based system that scores 40.6% on SWE-Bench Verified using only open-weight models. 🧵

hr0nix @ ICLR (@hr0nix)

As a follow-up to our work on applying search to software engineering agents, today we are releasing datasets of problem instances and agent trajectories. This is the training data we previously used to achieve 40.6% on SWE-bench Verified using open-weight models only! 🧵⬇️

mshnmshn (@mshn_mshn_nl)

This stack of laminated π cards represents 4 years of my favorite #PiDay tradition: wandering through Moscow technical universities, giving them away freely while saying 'Happy Pi Day!' and reciting 3.14159265358979323846264 from heart. Watching faces light up when fellow math

hr0nix @ ICLR (@hr0nix)

One of our research interests at Nebius is agentic software engineering. Because of that, we have reviewed LOTS of agent evals on software engineering tasks, and there were issues with these evals that made us unhappy. Today we are taking a step towards fixing some of them ⬇️

hr0nix @ ICLR (@hr0nix)

An extended writeup of our earlier research blogpost on training critics for SWE agents has been accepted to ICML! Some details below ⬇️

Maria Trofimova (@__vimary)

Should we abandon gradient accumulation? Configuring Adam's beta2 "correctly" leads to stable (pre)training with small batch sizes.