Sameer Segal (@sameersegal) 's Twitter Profile
Sameer Segal

@sameersegal

Principal Research Engineer at @MSFTResearch India. Working at the intersection of #GenAI and Code. Previously founder Artoo (artoo.com)

ID: 25781745

Website: http://www.sameersegal.com
Joined: 22-03-2009 04:38:24

1.1K Tweets

773 Followers

145 Following

Cognition (@cognition_labs) 's Twitter Profile Photo

How DeepWiki works under the hood, in 2 minutes 📹 For more details on how we build Devin, check out Russell Kaplan's full talk at LangChain Interrupt 🔗👇

Nathan Lambert (@natolambert) 's Twitter Profile Photo

The reason recent RLVR papers show mostly formatting and not learning new skills is just because no one has scaled up enough. If RL compute is <0.1% of overall compute, ofc not much changes. I bet o3 is closer to 5% of total compute. At 10-25% I bet the models feel different again.
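
A rough sense of scale for that claim, as a back-of-the-envelope sketch: the pretraining FLOP budget below is an assumed illustrative number; only the RL-share percentages come from the tweet.

```python
# Back-of-the-envelope only: the pretraining FLOP figure is assumed for
# illustration; only the RL-share percentages (<0.1%, ~5%, 25%) are from the tweet.
pretrain_flops = 1e25                    # assumed pretraining budget
rl_shares = [0.001, 0.05, 0.25]          # RL as a fraction of total compute

for share in rl_shares:
    # if RL is `share` of the total, pretraining is (1 - share) of the total
    rl_flops = pretrain_flops * share / (1 - share)
    total = pretrain_flops + rl_flops
    print(f"RL share {share:.1%}: ~{rl_flops:.1e} RL FLOPs out of {total:.1e} total")
```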

Hamel Husain (@hamelhusain) 's Twitter Profile Photo

Really refreshing to read a post like this from Kyle Corbitt. Incredibly well written throughout, and high value info. (Embarrassed that it took me so long to find this gem) openpipe.ai/blog/art-e-mai…

Sameer Segal (@sameersegal) 's Twitter Profile Photo

Monday motivation: Action leads to Motivation. A short note on handling procrastination as a dev: spectrum.ieee.org/getting-past-p…

Forbes India (@forbesindia) 's Twitter Profile Photo

#30IndianMindsInAI: At MSR India, Kalika Bali, senior principal researcher at the facility, has been building inclusive, multilingual and culturally contextual AI systems that empower the most vulnerable in India. By Naini Thaker Accel in India forbesindia.com/article/ai-spe…

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

The race for LLM "cognitive core" - a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing. Its features are slowly crystalizing: - Natively multimodal

Sameer Segal (@sameersegal) 's Twitter Profile Photo

I got o3 to digitise an old scan of equity transactions. It spent 5mins on a single page and was able to do it perfectly. It cropped and rotated and even did an integrity check to ensure that all 32 rows were captured. Absolutely amazing!
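
A minimal sketch of reproducing that kind of workflow, assuming the OpenAI Python SDK and API access to an o3-class vision model; the file name, prompt wording, and CSV schema are illustrative, and only the expected row count (32) comes from the tweet.

```python
# Sketch: ask a vision-capable reasoning model to transcribe a scanned ledger
# page to CSV, then run a simple row-count integrity check.
# Assumes the OpenAI Python SDK; file name, prompt, and columns are illustrative.
import base64, csv, io
from openai import OpenAI

client = OpenAI()

with open("equity_transactions_page1.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="o3",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Transcribe every transaction row in this scan as CSV with "
                     "columns date,name,shares,price. Output only data rows, no header."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

rows = list(csv.reader(io.StringIO(resp.choices[0].message.content.strip())))
assert len(rows) == 32, f"expected 32 transaction rows, got {len(rows)}"  # integrity check
```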

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly
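
The mechanism being described (the tweet is cut off mid-sentence) is essentially the vanilla policy-gradient update; here is a toy sketch assuming PyTorch, with all names illustrative.

```python
# Minimal REINFORCE-style update: slightly upweight actions from rollouts that
# went well, downweight those that went poorly. Toy policy, PyTorch assumed.
import torch

policy = torch.nn.Linear(4, 2)                  # toy policy over 2 actions
opt = torch.optim.SGD(policy.parameters(), lr=1e-2)

def update(states, actions, reward, baseline=0.0):
    """One policy-gradient step: scale log-prob of taken actions by (reward - baseline)."""
    logits = policy(states)
    logp = torch.log_softmax(logits, dim=-1)
    taken = logp[torch.arange(len(actions)), actions]
    loss = -(reward - baseline) * taken.mean()  # went well => raise prob; poorly => lower it
    opt.zero_grad()
    loss.backward()
    opt.step()

# e.g. a 5-step rollout that scored above the baseline gets slightly reinforced
update(torch.randn(5, 4), torch.randint(0, 2, (5,)), reward=1.0, baseline=0.3)
```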

Sameer Segal (@sameersegal) 's Twitter Profile Photo

If you ask the model to "draw the world map", it does it perfectly which shows how much it has memorized, but when you ask it "Is (x,y) coordinate land or sea?" you get to see how much it has inferred from raw data!
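
A sketch of what that probe could look like, assuming the OpenAI Python SDK; the model name, prompt wording, and hand-picked reference coordinates are illustrative assumptions, not from the tweet.

```python
# Probe sketch: ask the model "land or sea?" for a few coordinates and compare
# against known answers. Model name, prompt, and reference points are illustrative.
from openai import OpenAI

client = OpenAI()

# (lat, lon, ground truth) -- a few unambiguous reference points
probes = [
    (28.6, 77.2, "land"),    # Delhi
    (0.0, -30.0, "sea"),     # mid-Atlantic
    (-25.0, 135.0, "land"),  # central Australia
    (35.0, -150.0, "sea"),   # North Pacific
]

correct = 0
for lat, lon, truth in probes:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Is the point at latitude {lat}, longitude {lon} "
                              f"land or sea? Answer with one word."}],
    )
    answer = resp.choices[0].message.content.strip().lower()
    correct += answer.startswith(truth)

print(f"{correct}/{len(probes)} coordinate probes answered correctly")
```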

Dimitris Papailiopoulos (@dimitrispapail) 's Twitter Profile Photo

Thinking Less at test-time requires Sampling More at training-time! GFPO is a new, cool, and simple Policy Opt algorithm coming to your RL Gym tonite, led by Vaish Shrivastava and our MSR group: Group Filtered PO (GFPO) trades off training-time with test-time compute, in order
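
A rough sketch of the idea as described (the tweet is truncated): sample a larger group per prompt at training time, filter it, and compute group-relative advantages only on the retained responses. The filtering metric (reward per output token) and all names here are assumptions for illustration, not the authors' implementation.

```python
# Sketch of a group-filtered policy-optimization step for one prompt:
# extra sampling at training time, but only the short/efficient responses
# are reinforced, pushing the policy to "think less" at test time.
import numpy as np

def gfpo_advantages(rewards, lengths, keep_k):
    """Keep the k most token-efficient responses (reward per token, an assumed
    metric) and compute normalized advantages within that subset; others get 0."""
    rewards = np.asarray(rewards, float)
    lengths = np.asarray(lengths, float)
    efficiency = rewards / lengths                 # assumed filtering metric
    kept = np.argsort(-efficiency)[:keep_k]        # retain the top-k efficient responses
    adv = np.zeros_like(rewards)
    mu, sigma = rewards[kept].mean(), rewards[kept].std() + 1e-6
    adv[kept] = (rewards[kept] - mu) / sigma       # group-relative normalization on the kept subset
    return adv

# e.g. sample 8 responses per prompt at training time (extra sampling cost),
# but only reinforce the 4 that were both high-reward and short
adv = gfpo_advantages(rewards=[0.9, 1.0, 0.2, 0.8, 0.1, 0.7, 0.3, 0.0],
                      lengths=[900, 300, 1200, 500, 700, 400, 800, 650],
                      keep_k=4)
print(np.round(adv, 2))
```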

Sameer Segal (@sameersegal) 's Twitter Profile Photo

A security researcher shows how you can fine-tune an open-weight model to make malicious tool calls (e.g. push sensitive information to a remote server) and upload it to HuggingFace. More than 500 people downloaded the poisoned model! pub.aimind.so/doubleagents-f…
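
On the defensive side, the kind of guard this attack argues for can be sketched as an allowlist check on model-emitted tool calls before they are executed; the function and host names below are hypothetical.

```python
# Hypothetical defensive sketch: before executing a model-emitted tool call,
# verify that any network destination it references is on an explicit allowlist,
# so a poisoned model cannot silently exfiltrate data to a remote server.
from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.internal.example.com"}   # illustrative allowlist

def is_safe_tool_call(tool_name: str, arguments: dict) -> bool:
    """Reject any tool call whose string arguments point at a non-allowlisted host."""
    for value in arguments.values():
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            host = urlparse(value).hostname or ""
            if host not in ALLOWED_HOSTS:
                return False
    return True

# e.g. a poisoned model trying to push data off-box gets blocked
print(is_safe_tool_call("http_post", {"url": "https://attacker.example.net/upload",
                                      "body": "..."}))  # False
```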