Mert Unsal (@mertunsal2020)'s Twitter Profile
Mert Unsal

@mertunsal2020

Researcher at Project Numina

ID: 953183202654478337

Link: https://mertunsall.github.io/

Joined: 16-01-2018 08:32:39

695 Tweets

434 Followers

608 Following

Mert Unsal (@mertunsal2020)

the reason we forked is that it was just really not possible to do all the things we wanted to do without forking. we merged as much as possible back to verl but of course it slows you down too much if you have to wait for every merge and also some things are just inappropriate

Mert Unsal (@mertunsal2020)

Claude Sonnet 4.5 is extremely powerful and extremely expensive. Just running our simplest eval suite once cost ~120 USD, with an average of ~60 cents per task. With the latest Gemini-2.5-Flash, the same run costs 10 USD.
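
The figures in the post above imply a task count and a per-task cost for the cheaper model. The snippet below is a minimal sketch of that arithmetic, assuming both runs covered the same set of tasks and that the quoted totals are exact; the variable names are illustrative and not from the original post.

```python
# Figures quoted in the post (approximate in the original).
sonnet_total_usd = 120.0      # one run of the eval suite with Claude Sonnet 4.5
sonnet_per_task_usd = 0.60    # average cost per task with Claude Sonnet 4.5
gemini_total_usd = 10.0       # same suite with Gemini-2.5-Flash

# Implied number of tasks in the suite (assumption: totals are exact).
n_tasks = round(sonnet_total_usd / sonnet_per_task_usd)

# Implied per-task cost for Gemini-2.5-Flash on the same tasks.
gemini_per_task_usd = gemini_total_usd / n_tasks

# How many times more expensive the Sonnet run was.
cost_ratio = sonnet_total_usd / gemini_total_usd

print(f"tasks: {n_tasks}")
print(f"Gemini per task: {gemini_per_task_usd:.2f} USD")
print(f"cost ratio: {cost_ratio:.0f}x")
```

Under these assumptions the suite has about 200 tasks, the cheaper run averages about 5 cents per task, and the price gap is roughly 12x.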

Mert Unsal (@mertunsal2020)

I don't read posts/papers anymore. I clone the code and talk to Sonnet in Cursor. Directly diving into the implementation with a good explainer by my side is much better learning than reading vague high-level explanations that skip over the true complexity of things.

Mert Unsal (@mertunsal2020)

I still don’t understand how this is so different from renting a GPU cluster and writing a Megatron script. You can also do full fine-tuning or use another algorithm of your choice, and choose from a much larger variety of models.

Mert Unsal (@mertunsal2020)

when I was little, I played League of Legends with friends ALL waking hours for an entire summer straight. I loved it so much I would go to the bathroom only while the new game was loading. put yourself in a position where you get to “play”. this is the biggest privilege of all.

Gregor Zunic (@gregpr07)

Where could we human label a few thousand (browser use) agent traces? We use llm as a judge for evals and want to know how aligned it is with human labels🌝
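
One common way to quantify how well an LLM judge agrees with human labels is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is an illustration only, not the method used by the poster; the function name `cohen_kappa` and the pass/fail labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(judge_labels, human_labels):
    """Cohen's kappa between two equal-length sequences of categorical labels."""
    if len(judge_labels) != len(human_labels) or not judge_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(judge_labels)
    # Fraction of traces where judge and human gave the same label.
    observed = sum(j == h for j, h in zip(judge_labels, human_labels)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    labels = set(judge_counts) | set(human_counts)
    expected = sum(judge_counts[l] * human_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four agent traces.
judge = ["pass", "pass", "fail", "fail"]
human = ["pass", "fail", "fail", "fail"]
kappa = cohen_kappa(judge, human)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 indicates the judge tracks human labels closely, while a value near 0 means agreement is no better than chance.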

Mert Unsal (@mertunsal2020)

People who claim agents will end up doing everything often underestimate how often humans want to stay in control. There’s a variety of reasons why this is the case. Sometimes, we just want to think and choose for ourselves. For example, it is very easy for food apps to have a

Mert Unsal (@mertunsal2020)

all such discussions around “being conscious” lack a good understanding of consciousness. what is this “consciousness” that humans have? how can you tell externally whether something has consciousness or not? without answering these questions, all these statements are void.