Jiacheng Zhu (@jiachengzhu_ml) 's Twitter Profile
Jiacheng Zhu

@jiachengzhu_ml

Research Scientist at @AIatMeta GenAI, Postdoc at @MIT_CSAIL and PhD from @CarnegieMellon | Prev. @Apple AI/ML @MITIBMLab | Working on Llama post-training

ID: 1180622638269571072

Link: https://jiachengzhuml.github.io/ | Joined: 05-10-2019 23:15:58

120 Tweets

1.1K Followers

706 Following

Jiacheng Zhu (@jiachengzhu_ml) 's Twitter Profile Photo

🚀 New preprint: “MoDoMoDo — Multi-Domain Data Mixtures for Multimodal LLM RL” is live!
🔗 Paper + project: lnkd.in/e48CMyw8 (project page lnkd.in/e-Gj2qu8)
💻 Code: coming soon
Highlights
* Multimodal RLVR framework that post-trains an MLLM on 5 vision-language
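
The tweet is truncated, but the framing in the title (mixing several vision-language domains for RL with verifiable rewards) can be illustrated as weighted sampling over per-domain prompt pools. A minimal sketch; the domain names and mixture weights below are placeholders, not the paper's datasets or learned mixture:

```python
import random

# Hypothetical per-domain prompt pools; the real paper mixes several
# vision-language datasets with verifiable answers.
domains = {
    "chart_qa": ["prompt_a1", "prompt_a2"],
    "doc_vqa":  ["prompt_b1", "prompt_b2"],
    "math_vqa": ["prompt_c1", "prompt_c2"],
}

# Mixture weights over domains; placeholder values, to be tuned or optimized.
weights = {"chart_qa": 0.5, "doc_vqa": 0.3, "math_vqa": 0.2}

def sample_batch(batch_size: int) -> list[tuple[str, str]]:
    """Draw an RL training batch: pick a domain by mixture weight, then a prompt."""
    names = list(domains)
    probs = [weights[n] for n in names]
    batch = []
    for _ in range(batch_size):
        d = random.choices(names, weights=probs, k=1)[0]
        batch.append((d, random.choice(domains[d])))
    return batch

print(sample_batch(4))
```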

Yiqing Liang (@yiqingliang2) 's Twitter Profile Photo

Heading to Nashville for <a href="/CVPR/">#CVPR2026</a> ! 🎸
I’ll be presenting the <a href="/nvidia/">NVIDIA</a> internship project —
 “Zero-Shot Monocular Scene Flow Estimation in the Wild” (Best Paper Candidate)
🗓 Sunday, June 15
🕘 Poster + Oral Presentation: Morning session
🔗 research.nvidia.com/labs/lpr/zero_…

#ComputerVision
hardmaru (@hardmaru) 's Twitter Profile Photo

Reinforcement Learning Teachers of Test Time Scaling

In this new paper, we introduce a new way to teach LLMs how to reason by learning to teach, not solve!

The core idea: A teacher model is trained via RL to generate explanations from question-answer pairs, optimized to improve
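
A rough sketch of the reward structure the tweet describes: the teacher is judged by how much its explanation helps a student, not by whether it can solve the problem itself. Everything below is a toy stand-in (random stubs for the teacher and student), not the paper's actual training setup:

```python
"""Toy sketch of the 'learning to teach' reward described in the tweet.
All functions here are hypothetical stand-ins, not the paper's actual API."""
import random

def teacher_generate(question: str, answer: str) -> str:
    # Stand-in for an RL-trained teacher model: it sees the ground-truth
    # answer and only has to produce an explanation, not solve the problem.
    return f"To solve '{question}', note that the result is {answer} because ..."

def student_solves(question: str, answer: str, hint: str | None) -> bool:
    # Stand-in for evaluating a student model; a good hint raises success odds.
    p_correct = 0.8 if hint else 0.3
    return random.random() < p_correct

def teaching_reward(question: str, answer: str, n_trials: int = 100) -> float:
    """Reward = student accuracy with the explanation minus accuracy without."""
    hint = teacher_generate(question, answer)
    with_hint = sum(student_solves(question, answer, hint) for _ in range(n_trials))
    without   = sum(student_solves(question, answer, None) for _ in range(n_trials))
    return (with_hint - without) / n_trials

print(teaching_reward("What is 17 * 24?", "408"))
```
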
Aurko Roy (@happylemon56775) 's Twitter Profile Photo

Excited to share what I worked on during my time at Meta.

- We introduce a Triton-accelerated Transformer with *2-simplicial attention*—a tri-linear generalization of dot-product attention

- We show how to adapt RoPE to tri-linear forms

- We show 2-simplicial attention scales
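
For intuition, here is a naive dense reference of what a tri-linear attention score looks like: each query attends to a pair of keys, so the logit tensor is cubic in sequence length. The 1/sqrt(d) scaling and the elementwise-product value combination are assumptions for illustration, not details confirmed by the tweet, and the actual Triton implementation is presumably far more efficient than this O(n^3) reference:

```python
# Naive dense sketch of tri-linear ("2-simplicial") attention.
import numpy as np

def two_simplicial_attention(q, k1, k2, v1, v2):
    n, d = q.shape
    # Trilinear logits: l[i, j, k] = sum_d q[i,d] * k1[j,d] * k2[k,d]
    logits = np.einsum("id,jd,kd->ijk", q, k1, k2) / d ** 0.5
    # Softmax jointly over the key pair (j, k).
    flat = logits.reshape(n, -1)
    flat = np.exp(flat - flat.max(axis=-1, keepdims=True))
    attn = (flat / flat.sum(axis=-1, keepdims=True)).reshape(n, n, n)
    # Combine the two value streams per key pair (elementwise product assumed).
    pair_values = np.einsum("jd,kd->jkd", v1, v2)
    return np.einsum("ijk,jkd->id", attn, pair_values)

rng = np.random.default_rng(0)
q, k1, k2, v1, v2 = (rng.standard_normal((4, 8)) for _ in range(5))
print(two_simplicial_attention(q, k1, k2, v1, v2).shape)  # (4, 8)
```
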
Zhengzhong Tu (@_vztu) 's Twitter Profile Photo

🤨Ever dream of a tool that can magically restore and upscale any (low-res) photo to crystal-clear 4K? 

🔥Introducing "4KAgent: Agentic Any Image to 4K Super-Resolution",  the most capable upscaling generalist designed to handle broad image types.
🔗4kagent.github.io
1/🧵
Hongyu Li (@hongyu_lii) 's Twitter Profile Photo

We interact with dogs through touch -- a simple pat can communicate trust or instruction. Shouldn't interacting with robot dogs be as intuitive? Most commercial robots lack tactile skins. We present UniTac: a method to sense touch using only existing joint sensors! [1/5]
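
The tweet does not say how UniTac works beyond using existing joint sensors, so the sketch below shows a classical proprioceptive baseline for the same goal: flag contact when measured joint torques deviate from a dynamics model's prediction. This is explicitly not UniTac's method, just an illustration of touch sensing without tactile skins:

```python
# Classical torque-residual contact detection (illustrative baseline only).
import numpy as np

def detect_contact(tau_measured: np.ndarray,
                   tau_model: np.ndarray,
                   threshold: float = 0.5) -> bool:
    """Return True if the torque residual suggests an external contact."""
    residual = tau_measured - tau_model          # torque the model can't explain
    return bool(np.linalg.norm(residual) > threshold)

# Toy example: 12 joints, a small pat adds torque the model didn't predict.
tau_model = np.zeros(12)
tau_measured = tau_model + np.array([0.0] * 11 + [0.9])
print(detect_contact(tau_measured, tau_model))   # True
```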

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly
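
The intuition in the (truncated) tweet, nudge up whatever happened to go well and nudge down what went poorly, is essentially the policy-gradient / REINFORCE update. A toy two-armed-bandit sketch of that nudge:

```python
# REINFORCE on a 2-armed bandit: increase the log-prob of sampled actions in
# proportion to the reward they happened to receive.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)            # policy parameters over two actions
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)
    reward = float(rng.random() < (0.8 if a == 1 else 0.2))  # action 1 is better
    # "This went well": nudge up the log-prob of the action we just took.
    grad_logp = -probs
    grad_logp[a] += 1.0
    logits += lr * reward * grad_logp

print(softmax(logits))  # probability mass should concentrate on action 1
```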

Mikita Balesni 🇺🇦 (@balesni) 's Twitter Profile Photo

A simple AGI safety technique: AI’s thoughts are in plain English, just read them

We know it works, with OK (not perfect) transparency!

The risk is fragility: RL training, new architectures, etc threaten transparency

Experts from many orgs agree we should try to preserve it:
Rickard Brüel Gabrielsson (@rickardgabriels) 's Twitter Profile Photo

💡 Our new work at <a href="/ICML2025/">International Conference on Minority Languages</a>:
Serve thousands of LoRA adapters with minimal overhead—a step towards enabling LLM personalization by finetuning.

We also expanded the largest open LoRA hub to 1,200+ adapters!
📜 Paper: arxiv.org/pdf/2407.00066
🤗 LoRAs: huggingface.co/Lots-of-LoRAs

w/
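
The serving claim rests on the structure of LoRA itself: every adapter is only a pair of small matrices applied on top of one shared frozen weight, so thousands of them are cheap to hold and switch between. A minimal sketch with illustrative shapes, not the paper's serving system:

```python
# Many LoRA adapters sharing a single frozen base weight at serving time.
import numpy as np

d_out, d_in, rank = 64, 64, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))                   # shared frozen weight

# A "hub" of adapters: each one is just two small matrices (B, A).
adapters = {
    f"user_{i}": (rng.standard_normal((d_out, rank)) * 0.01,   # B
                  rng.standard_normal((rank, d_in)) * 0.01)    # A
    for i in range(1000)
}

def lora_forward(x: np.ndarray, adapter_id: str, scale: float = 1.0) -> np.ndarray:
    B, A = adapters[adapter_id]
    return W @ x + scale * (B @ (A @ x))                 # base + low-rank delta

x = rng.standard_normal(d_in)
print(lora_forward(x, "user_42").shape)                  # (64,)
```
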
Thinking Machines (@thinkymachines) 's Twitter Profile Photo

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”

We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
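
The post's title points at a familiar culprit: floating-point addition is not associative, so the same logical reduction computed with a different grouping (for example, a different batch size or kernel tiling) can return slightly different numbers. A toy demonstration of the effect, not taken from the post:

```python
# Same values, different reduction grouping -> (usually) different float result.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

total_once = x.sum(dtype=np.float32)            # one big reduction
total_chunked = np.float32(0)
for chunk in np.split(x, 100):                  # same numbers, regrouped
    total_chunked += chunk.sum(dtype=np.float32)

print(total_once == total_chunked)              # frequently False
print(float(total_once), float(total_chunked))
```
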
Thinking Machines (@thinkymachines) 's Twitter Profile Photo

Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.
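
As a rough picture of "optimizers co-designed with manifold constraints on weight matrices", the general pattern is to take an update and then retract the weights back onto a constraint set. The sketch below uses unit-norm rows as a stand-in constraint; it is not the post's specific construction:

```python
# Gradient step in ambient space, then retraction onto a constraint set.
import numpy as np

def project_unit_rows(W: np.ndarray) -> np.ndarray:
    """Retract each row of W back onto the unit sphere."""
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def constrained_sgd_step(W: np.ndarray, grad: np.ndarray, lr: float = 0.1) -> np.ndarray:
    W = W - lr * grad              # ordinary gradient step
    return project_unit_rows(W)    # then enforce the manifold constraint

rng = np.random.default_rng(0)
W = project_unit_rows(rng.standard_normal((4, 16)))
W = constrained_sgd_step(W, rng.standard_normal((4, 16)))
print(np.linalg.norm(W, axis=1))   # rows stay at norm 1
```
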
Wes Roth (@wesrothmoney) 's Twitter Profile Photo

Meta AI researchers propose a new learning paradigm for language agents called “early experience”, a reward-free method where agents learn by interacting with environments using their own suboptimal actions. 

Instead of relying solely on human demonstrations or reinforcement
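
The part of the idea visible in the (truncated) tweet, learning reward-free from the consequences of the agent's own suboptimal actions, can be sketched as collecting rollouts without rewards and fitting a predictor of what the environment did in response. The toy world and objective below are illustrative assumptions, not the paper's setup:

```python
# Reward-free "early experience": act with your own (suboptimal) policy,
# then learn from the consequences, here by predicting the next state.
import random

def env_step(state: int, action: int) -> int:
    return max(0, min(10, state + action))      # toy deterministic 1-D world

# 1) Interact using the agent's own random actions; no rewards recorded.
experience = []
state = 5
for _ in range(200):
    action = random.choice([-1, +1])
    next_state = env_step(state, action)
    experience.append((state, action, next_state))
    state = next_state

# 2) Reward-free learning target: predict the outcome of one's own action.
model = {}                                      # (state, action) -> next state
for s, a, s_next in experience:
    model[(s, a)] = s_next

print(model.get((5, 1)))                        # most likely 6
```
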
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Excited to release new repo: nanochat!
(it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,
Susan Zhang (@suchenzang) 's Twitter Profile Photo

out: guy who fixes post-training & PhD new grads misled by promises of research work
in: VPs and D2s
out: guy responsible for (lack of) evals
in: guy who thrashed everyone with a 3T dense model
out: guy who insisted on chunked attention for long-context (without evals, or

Jason Weston (@jaseweston) 's Twitter Profile Photo

Scaling Agent Learning via Experience Synthesis
📝: arxiv.org/abs/2511.03773

Scaling training environments for RL by simulating them with reasoning LLMs!

Environment models + Replay-buffer + New tasks = cheap RL for any environments!

- Strong improvements over non-RL-ready
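
A sketch of the recipe in the tweet, where the environment transition itself is simulated by a reasoning LLM and the results feed a replay buffer; `llm_simulate` below is a hypothetical stand-in for that LLM call, not the paper's API:

```python
# Environment model (LLM-simulated) + replay buffer, the ingredients named
# in the tweet; the real system's prompting and reward design will differ.
import collections

replay_buffer = collections.deque(maxlen=10_000)

def llm_simulate(state: str, action: str) -> tuple[str, float]:
    # Hypothetical stand-in for prompting a reasoning LLM:
    # "Given this state and agent action, what happens next, and did the
    #  task make progress?" -> (next_state, reward)
    return f"{state} | did {action}", float("submit" in action)

def rollout(agent_actions: list[str], task: str) -> None:
    state = f"task: {task}"
    for action in agent_actions:
        next_state, reward = llm_simulate(state, action)
        replay_buffer.append((state, action, reward, next_state))  # feeds RL later
        state = next_state

rollout(["open browser", "search docs", "submit form"], task="file a ticket")
print(len(replay_buffer), replay_buffer[-1][2])   # 3 transitions, final reward 1.0
```
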
Alex Shaw (@alexgshaw) 's Twitter Profile Photo

Today, we’re announcing the next chapter of Terminal-Bench with two releases:

1. Harbor, a new package for running sandboxed agent rollouts at scale
2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification