Mert Unsal (@mertunsal2020) Twitter Tweets • TwiCopy

Mert Unsal

@mertunsal2020

2 months ago

we got really cool interns

the reason we forked is that it just wasn't possible to do everything we wanted without forking. we merged as much as possible back into verl, but waiting on every merge slows you down too much, and some things just aren't appropriate for upstream.

Mert Unsal

@mertunsal2020

2 months ago

Claude Sonnet 4.5 is extremely powerful and extremely expensive. Just running our simplest eval suite once cost ~120 USD, an average of ~60 cents per task. With the latest Gemini-2.5-Flash, the same run costs 10 USD.
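The two figures above imply a rough suite size and per-task cost for the cheaper model. A back-of-envelope sketch, assuming the ~60-cent average holds across all tasks and both runs cover the same task set (the post gives only approximate totals): ~120 USD / ~0.60 USD per task is about 200 tasks, so 10 USD works out to about 5 cents per task, roughly 12x cheaper.

```python
# Approximate figures quoted in the post; the variable names are illustrative.
sonnet_total_usd = 120.0     # one run of the simplest eval suite with Claude Sonnet 4.5
sonnet_per_task_usd = 0.60   # reported average cost per task
gemini_total_usd = 10.0      # same suite with Gemini-2.5-Flash

# Implied number of tasks, assuming the per-task average applies uniformly
num_tasks = round(sonnet_total_usd / sonnet_per_task_usd)

# Implied Gemini cost per task, assuming the same task count
gemini_per_task_usd = gemini_total_usd / num_tasks

# How many times more expensive the Sonnet run was
cost_ratio = sonnet_total_usd / gemini_total_usd

print(f"tasks={num_tasks}, gemini_per_task=${gemini_per_task_usd:.3f}, ratio={cost_ratio:.1f}x")
```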

Mert Unsal

@mertunsal2020

2 months ago

I don't read posts/papers anymore. I clone the code and talk to Sonnet in Cursor. Directly diving into the implementation with a good explainer by my side is much better learning than reading vague high-level explanations that skip over the true complexity of things.

Mert Unsal

@mertunsal2020

2 months ago

Join Carina Hong to build mathematical superintelligence - they just raised a $64M seed round and have a fantastic team!

Mert Unsal

@mertunsal2020

2 months ago

I still don’t understand how this is so different from renting a GPU cluster and writing a Megatron script. You can also do full fine-tuning or use another algorithm of your choice, and choose from a much larger variety of models.

Mert Unsal

@mertunsal2020

2 months ago

when I was little, I played League of Legends with friends ALL my waking hours for an entire summer straight. I loved it so much I would go to the bathroom only while the next game was loading.

put yourself in a position where you get to “play”. this is the biggest privilege of all.

Mert Unsal

@mertunsal2020

2 months ago

Should I just make a blog post about Browser Use perf on these benchmarks? P.S. We stopped using these a while ago.

Mert Unsal

@mertunsal2020

2 months ago

Browser Use on top, Gemini Computer Use on the bottom. In real time. Pick your side ⚡️

Mert Unsal

@mertunsal2020

2 months ago

for some reason my posts are being blocked by X with no notice...

Daniel Prevoznik

@danielprevoznik

2 months ago

For those who have tried the new models from Browser Use and Gemini (via Stagehand 🤘), which one has the better vibes for speed/accuracy?

Gregor Zunic

@gregpr07

2 months ago

Where could we get a few thousand (Browser Use) agent traces human-labeled? We use LLM-as-a-judge for evals and want to know how well it aligns with human labels 🌝
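Once traces have both a judge verdict and a human label, alignment is commonly summarized with raw agreement plus Cohen's kappa. Kappa corrects for chance, which matters when most traces pass and raw agreement looks high regardless. A minimal sketch, assuming one binary pass/fail verdict per trace from each side. The function name and toy labels are hypothetical and are not Browser Use's actual eval pipeline.

```python
def agreement_stats(judge: list[bool], human: list[bool]) -> tuple[float, float]:
    """Return (raw agreement, Cohen's kappa) for paired binary pass/fail labels."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(judge)

    # Observed agreement: fraction of traces where judge and human give the same verdict
    p_o = sum(j == h for j, h in zip(judge, human)) / n

    # Chance agreement from each rater's marginal pass rate
    p_judge = sum(judge) / n
    p_human = sum(human) / n
    p_e = p_judge * p_human + (1 - p_judge) * (1 - p_human)

    # Cohen's kappa; undefined (nan) when chance agreement is 1
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    return p_o, kappa


# Toy, made-up verdicts for 8 traces (True = task judged successful)
judge_labels = [True, True, False, True, False, False, True, True]
human_labels = [True, False, False, True, False, True, True, True]

raw, kappa = agreement_stats(judge_labels, human_labels)
print(f"raw agreement={raw:.3f}, kappa={kappa:.3f}")
```

On these toy labels the judge matches the human on 6 of 8 traces (0.75), but kappa is only about 0.47 once chance agreement is removed.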

Mert Unsal

@mertunsal2020

a month ago

People who claim agents will end up doing everything often underestimate how often humans want to stay in control. There’s a variety of reasons why this is the case. Sometimes, we just want to think and choose for ourselves. For example, it is very easy for food apps to have a

Mert Unsal

@mertunsal2020

a month ago

all such discussions around “being conscious” lack a good understanding of consciousness. what is this “consciousness” that humans have? how can you tell externally whether something is conscious or not? without answering these questions, all these statements are void.
