Zitong Yang (@zitongyang0)'s Twitter Profile
Zitong Yang

@zitongyang0

Statistician

ID: 1063135984399740928

Link: https://zitongyang.github.io/ · Joined: 15-11-2018 18:25:45

259 Tweets

701 Followers

377 Following

Thinking Machines (@thinkymachines)'s Twitter Profile Photo

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.

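The LoRA idea the tweet compares against full fine-tuning can be sketched in a few lines of PyTorch. This is a generic illustration, not code from the Thinking Machines post; the layer sizes, rank `r=8`, and `alpha/r` scaling are assumptions following the common LoRA convention:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with A of shape (r, d_in) and B of shape (d_out, r)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # full weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(512, 512), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
```

Only `A` and `B` receive gradients (here ~8K parameters against ~263K frozen ones), which is the accessibility win the tweet refers to; because `B` starts at zero, training begins exactly at the base model's behavior.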
Berkeley Physics (@berkeleyphysics)'s Twitter Profile Photo

Nobel laureate George Smoot, UC Berkeley physicist whose work with satellite experiments confirmed the Big Bang theory, has died at 80. news.berkeley.edu/2025/09/29/nob…

Thinking Machines (@thinkymachines)'s Twitter Profile Photo

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!

Ruiqi Zhong (@zhongruiqi)'s Twitter Profile Photo

Very excited about this release!! As a former grad student, I struggled to fine-tune LLMs. Even when the GPUs were enough, it was painful to set up the infra correctly. Tinker allows more researchers to understand and improve language models, beyond a few well-funded labs.

Sam Buchanan (@_sdbuchanan)'s Twitter Profile Photo

We wrote a book about representation learning! It’s fully open source, available and readable online, and covers everything from theoretical foundations to practical algorithms. 👷‍♂️ We’re hard at work updating the content for v2.0, and would love your feedback and contributions

Druv Pai (@druv_pai)'s Twitter Profile Photo

🚨 We wrote a new AI textbook "Learning Deep Representations of Data Distributions"! TL;DR: We develop principles for representation learning in large scale deep neural networks, show that they underpin existing methods, and build new principled methods.

Druv Pai (@druv_pai)'s Twitter Profile Photo

Why and how do diffusion models memorize vs generalize? Can we have scaling laws for memorization? This is increasingly relevant scientifically and pragmatically (e.g. Sora 2). 🚨 Our new preprint "On the Edge of Memorization in Diffusion Models" addresses this timely question!

CLS (@chengleisi)'s Twitter Profile Photo

I’ll be at #COLM2025 this week! I’ll give a lightning talk at the Visions Workshop at 11am on Friday and hang around our LM4SCI @ COLM2025 workshop! DM me if you wanna chat. We have some exciting ongoing projects on automating post-/pre-training research.

Zitong Yang (@zitongyang0)'s Twitter Profile Photo

The passing of the physicist Chen-Ning Yang (nytimes.com/2025/10/18/sci…) saddens me. He has been a long-time hero and role model for me. Below is a short essay I wrote yesterday about Yang that I shared with many of my friends. I translated it into English using Gemini: ``` The

John Schulman (@johnschulman2)'s Twitter Profile Photo

Fine-tuning APIs are becoming more powerful and widespread, but they're harder to safeguard against misuse than fixed-weight sampling APIs. Excited to share a new paper: Detecting Adversarial Fine-tuning with Auditing Agents (arxiv.org/abs/2510.16255). Auditing agents search

Simon Guo 🦝 (@simonguozirui)'s Twitter Profile Photo

Wrote a 1-year retrospective with Alex L Zhang on KernelBench and the journey toward automated GPU/CUDA kernel generations! Since my labmates (Anne Ouyang, Simran Arora, William Hu) and I first started working towards this vision around last year’s @GPU_mode hackathon, we have

Thinking Machines (@thinkymachines)'s Twitter Profile Photo

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other

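As a rough intuition for the method the tweet describes (not the post's actual recipe), here is a toy numpy sketch of on-policy distillation on a single-token "vocabulary": tokens are sampled from the student (on-policy, as in RL), and each sampled token receives a dense per-token reward from the teacher's log-probability (SFT-like supervision density). The vocabulary size, logits, and REINFORCE-style estimator are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy setup: a fixed "teacher" distribution and a trainable "student".
teacher_logits = np.array([2.0, 1.0, 0.0, -1.0, -2.0])
log_p_t = teacher_logits - np.log(np.exp(teacher_logits).sum())
student_logits = np.zeros(5)

def reverse_kl(logits):
    """KL(student || teacher), the quantity on-policy distillation drives down."""
    p = softmax(logits)
    return float((p * (np.log(p) - log_p_t)).sum())

kl_start = reverse_kl(student_logits)

lr, batch = 0.5, 16
for _ in range(300):
    p_s = softmax(student_logits)
    # On-policy: sample tokens from the *student*, then score each with the teacher.
    acts = rng.choice(5, size=batch, p=p_s)
    grad = np.zeros(5)
    for a in acts:
        # Dense per-token reward: teacher log-prob minus student log-prob.
        reward = log_p_t[a] - np.log(p_s[a])
        onehot = np.eye(5)[a]
        grad += reward * (onehot - p_s)   # REINFORCE estimate of -d(KL)/d(logits)
    student_logits += lr * grad / batch

kl_end = reverse_kl(student_logits)
```

The two ingredients the tweet names are visible here: the sampling step is on-policy (the student only gets feedback on tokens it would actually produce, the error-correcting relevance of RL), while the reward is available at every sampled token rather than only at episode end (the density of SFT).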
Kevin Lu (@_kevinlu)'s Twitter Profile Photo

In our new post, we walk through great prior work from Rishabh Agarwal & the Qwen team exploring on-policy distillation using an open-source recipe: you can run our experiments on Tinker today! github.com/thinking-machi… I'm especially excited by the use of on-policy

Judy Shen (@judyhshen)'s Twitter Profile Photo

I DEFENDED MY PHD THIS WEEK! 🎉 So grateful for the guidance of my advisor and committee! Special thanks to my friends and family who supported me through every up and down 🥺🥰

Sarah Cen (@cen_sarah)'s Twitter Profile Photo

In the AI ecosystem, who supplies the data? the compute? the models? We just released a new tool on the AI Supply Chain. Our dataset reveals how AI models, data, compute, capital, and even talent change hands. Here’s why you should care 👇
