Michael Zhang (@michaelrzhang) 's Twitter Profile
Michael Zhang

@michaelrzhang

PhD student doing machine learning / neural networks research @UofT @VectorInst. Prev: @UCBerkeley.

Journey before destination.

ID: 899756744561246208

Link: https://www.cs.toronto.edu/~michael/ | Joined: 21-08-2017 22:14:59

422 Tweets

2.2K Followers

493 Following

Nisarg Shah (@nsrg_shah) 's Twitter Profile Photo

I'm looking for a postdoc to work on algorithmic fairness, AI alignment, cooperative AI, AI safety, and related topics from the lens of social choice theory. Applications sent in by the end of January will receive full consideration. Details: cs.toronto.edu/theory/positio…

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

TinyZero reproduction of R1-Zero: "experience the Ahah moment yourself for < $30". Given a base model, the RL finetuning can be relatively cheap and quite accessible.

Ilya Abyzov (@ilyaabyzov) 's Twitter Profile Photo

Inspired by Andrej Karpathy and the idea of using games to compare LLMs, I've built a version of the game Codenames where different models are paired in teams to play the game with each other. Fun to see o3-mini team with R1 against Grok and Gemini! Link and repo below.

Ethan Mollick (@emollick) 's Twitter Profile Photo

Been waiting for someone to test this and see if it really works - can multiple AI agents fact-checking each other reduce hallucinations? The answer appears to be yes - using 3 agents with a structured review process reduced hallucination scores by ~2,800% across 310 test cases.
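
The tweet doesn't spell out the pipeline, but the basic shape of agents fact-checking each other can be sketched roughly as below; call_llm, the prompts, and the three-reviewer structure are placeholder assumptions, not the actual experimental setup.

def call_llm(prompt: str) -> str:
    # Placeholder: wire this to whatever chat-completion API you use.
    raise NotImplementedError

def multi_agent_review(question: str, n_reviewers: int = 3) -> str:
    # One agent drafts an answer.
    draft = call_llm(f"Answer the question:\n{question}")
    # Several reviewer agents independently flag unsupported claims.
    critiques = [
        call_llm(
            "You are a fact-checker. List any claims in the answer below that are "
            f"unsupported or likely hallucinated.\n\nQuestion: {question}\nAnswer: {draft}"
        )
        for _ in range(n_reviewers)
    ]
    # A final pass revises the draft to drop anything the reviewers disputed.
    return call_llm(
        "Revise the answer so it keeps only claims the reviewers did not dispute.\n"
        f"Question: {question}\nDraft: {draft}\nReviews:\n" + "\n---\n".join(critiques)
    )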

Michael Zhang (@michaelrzhang) 's Twitter Profile Photo

progress on math has been so fast. I remember how impressive Minerva was when it came out... now you need to be good at competitive math to evaluate these latest models.

Dan Busbridge (@danbusbridge) 's Twitter Profile Photo

Reading "Distilling Knowledge in a Neural Network" left me fascinated and wondering: "If I want a small, capable model, should I distill from a more powerful model, or train from scratch?" Our distillation scaling law shows, well, it's complicated... 🧵 arxiv.org/abs/2502.08606

JJ Watt (@jjwatt) 's Twitter Profile Photo

It’s just incredible how much of a home run 4 Nations has been for the NHL and hockey in general. Friends who never watched a hockey game in their lives reaching out asking what the plan is for tonight’s game, what food we’re ordering, etc. Definition of growing the game.

Alex Albert (@alexalbert__) 's Twitter Profile Photo

One of the things we've been most impressed by internally at Anthropic is Claude 3.7 Sonnet's one-shot code generation ability. Here are a few of my favorite examples I've seen on here over the past day:

Shalev Lifshitz (@shalev_lif) 's Twitter Profile Photo

Hot off the Servers 🔥💻 --- we’ve found a new approach for scaling test-time compute! Multi-Agent Verification (MAV) scales the number of verifier models at test-time, which boosts LLM performance without any additional training. Now we can scale along two dimensions: by
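
The thread is cut off above, but the stated idea, scaling the number of verifiers rather than only the number of samples at test time, can be sketched as best-of-n sampling with a vote over verifier approvals. The generate/verify callables and the approval-count scoring below are assumptions about the general shape, not the paper's exact method.

from typing import Callable, List

def best_of_n_with_verifiers(
    question: str,
    generate: Callable[[str], str],
    verifiers: List[Callable[[str, str], bool]],
    n_candidates: int = 8,
) -> str:
    # Sample several candidate answers from the generator.
    candidates = [generate(question) for _ in range(n_candidates)]
    # Each verifier approves or rejects each candidate; count approvals.
    def approvals(answer: str) -> int:
        return sum(verify(question, answer) for verify in verifiers)
    # Return the candidate with the most verifier approvals.
    return max(candidates, key=approvals)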

Ethan Mollick (@emollick) 's Twitter Profile Photo

The past 18 months have seen the most rapid change in human written communication ever. By September 2024, 18% of financial consumer complaints, 24% of press releases, 15% of job postings & 14% of UN press releases showed signs of LLM writing. And the method undercounts true use.

Andrew Ng (@andrewyng) 's Twitter Profile Photo

Some people today are discouraging others from learning programming on the grounds AI will automate it. This advice will be seen as some of the worst career advice ever given. I disagree with the Turing Award and Nobel prize winner who wrote, “It is far more likely that the

Alex Albert (@alexalbert__) 's Twitter Profile Photo

We wrote up what we've learned about using Claude Code internally at Anthropic. Here are the most effective patterns we've found (many apply to coding with LLMs generally):

Lilian Weng (@lilianweng) 's Twitter Profile Photo

Giving your models more time to think before prediction, like via smart decoding, chain-of-thoughts reasoning, latent thoughts, etc, turns out to be quite effective for unblocking the next level of intelligence. New post is here :) “Why we think”: lilianweng.github.io/posts/2025-05-…

Phil Fradkin (@phil_fradkin) 's Twitter Profile Photo

The news is out! We're starting Blank Bio to build a computational toolkit assisted with RNA foundation models. If you want to see my flip between being eerily still and overly animated check out the video below! The core hypothesis is that RNA is the most customizable molecule

Michael Zhang (@michaelrzhang) 's Twitter Profile Photo

Life update: I've recently moved to Boston and started a job at Amazon Science! I'm excited to explore - please share local recs and let me know if you want to grab coffee! (picture: White Mountains, NH)

Taco Cohen (@tacocohen) 's Twitter Profile Photo

Exactly. I learned a ton of math during my PhD, and it was fun and easy *because I had a goal* to use it in my research. Coding it up is also a great way to detect gaps in your understanding. Totally different from learning in class. Another common fallacy is that you need to