Ryan Smith (@rnsmith49)'s Twitter Profile
Ryan Smith

@rnsmith49

Mechanistic Interpretability research, LLM Routing, and general GenAI optimization at withmartian.com

ID: 1948796683170799616

25-07-2025 17:26:12

50 Tweets

24 Followers

40 Following

Abir HARRASSE (@aharrasse1906)

Paper alert 🚨: LLMs build a shared multilingual latent space for meaning, decoding into languages only later. 🌍 Performance gaps come from tokenizer bias & weaker late-layer circuits, not missing concepts. We show this mechanistically with Cross-Layer Transcoders. 🧵👇

Martian (@withmartian)

We’re excited for the first round of the Martian Interpretability Prize! The proposal deadline was this week, and we’ve had a ton of applications. Can’t wait to look through these and get back to everyone who applied. More rounds coming up 🙂

Josh Greaves (@joshua_gre63805)

We'll be presenting the ARES roadmap at office hours tomorrow at 2pm PT. If you're interested in agents | RL | interp and want to contribute to open source, send me a DM for more info. github.com/withmartian/ar…

Shriyash Upadhyay (@shriyashku)

Thinking of training an RL model specialized for OpenClaw🦞 using ARES. So it gets better performance and much lower token costs. Is this something folks would be interested in using? 100 likes and I put up an endpoint

Ryan Smith (@rnsmith49)

ARES 🤝 Mech Interp. We’re excited that you can now easily dig into model internals across long-horizon tasks with ARES! We’ve built some nice integrations with TransformerLens, with even better support for hooks coming soon!

Ryan Smith (@rnsmith49)

This is one of the biggest takeaways for me - model internals change over steps after interacting with the environment! I love this other figure Narmeen Oozeer made that shows this too - a matrix of cosine similarity between optimal steering vectors at each step:

Martian (@withmartian)

A new ARES tutorial from Narmeen Oozeer: Getting started in long-horizon interp. When do agents fail to accurately model their environment? How do we fix them? And how can you run these experiments on your own machine?

Ryan Smith (@rnsmith49)

Verification is easier than Generation - the same is true for Code, and I am really excited to see the pipeline of: Code Review getting more robust evals → better code review tools → better coding tools

Ryan Smith (@rnsmith49)

Coding tools are getting way better, but also way more convincing to humans (or me at least). We definitely need to continue building robust evals for code review so we know we are keeping the "LGTM effect" in check

Shriyash Upadhyay (@shriyashku)

Our code review tracker caught the release of Claude Code Review before Anthropic announced it. Greptile v4 hit #1 on CRB. The tracker caught it before their announcement. The data is predicting something new from @Devin Review in the near future. Here's how. 🧵

Ashley Zhang (@ashleyzhang110)

I've been playing around with eval-ing AI code review tools at work. We track 22 different ones. Greptile V4 had the single biggest improvement I've ever measured. Recall increased 47% from 38.7 → 56.9%

Fazl Barez (@fazlbarez)

If this policy is not revoked, I won’t be reviewing/ACing for #NeurIPS. Science requires open exchange of ideas! When participation gets shaped by geopolitics, it ends up reflecting power structures, not merit--narrows what science can be and powerful nations get full control!

Ryan Smith (@rnsmith49)

Personally, I love the shift in the industry of optimizing for two separate workflows - fast and iterative, vs slow and offline. And I'm always a sucker for seeing data back up intuition 😅

Ryan Smith (@rnsmith49)

I unironically love the Lotka-Volterra equations, so yes I may spend my Wednesday morning diving into the code of an April Fools’ post

Ryan Smith (@rnsmith49)

I've been so excited to see Mech Interp techniques actually generalize to improving real-world applications, and it feels like we're finally at that point!

Ryan Smith (@rnsmith49)

It's pretty wild to think that a year ago CLI-first code tools didn't really exist, and now here we are measuring the entire E2E effectiveness of LLMs in software development

CodeRabbit (@coderabbitai)

Think OpenAI and Anthropic own the AI game? Open-source models already handle 50%+ of real production requests! New pod episode with our own Erfan & Shriyash Upadhyay from Martian on the DeepSeek moment, vendor lock-in risk, and the fine-tuning cheat code.