Ge Zhang (@gezhang86038849) 's Twitter Profile
Ge Zhang

@gezhang86038849

M-A-P @mm_art_project, ByteDance Seed, TigerLab
Prev. 01.AI, BAAI

MERT, MAP-Neo, MMMU, OpenCI, MAmmoTH, MMLU-pro, Yi, COIG, YuE, SuperGPQA

ID: 1387004918255419395

Joined: 27-04-2021 11:25:25

765 Tweets

2.2K Followers

898 Following

Thinking Machines (@thinkymachines) 's Twitter Profile Photo

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
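
A minimal sketch of the recipe as the tweet describes it, assuming Hugging-Face-style `student` and `teacher` causal LMs (the variable names and exact loss details are my assumptions, not Thinking Machines' implementation): the student samples its own rollout, and the teacher grades every token of it with a per-token reverse KL.

import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, prompt_ids, optimizer,
                           max_new_tokens=128):
    # 1. On-policy: sample a rollout from the student's current policy,
    #    so feedback lands on states the student actually visits (like RL).
    with torch.no_grad():
        rollout = student.generate(prompt_ids, do_sample=True,
                                   max_new_tokens=max_new_tokens)

    # 2. Re-score the full sequence with both models.
    student_logits = student(rollout).logits[:, :-1]
    with torch.no_grad():
        teacher_logits = teacher(rollout).logits[:, :-1]

    # 3. Dense supervision: reverse KL(student || teacher) at every
    #    completion position (a graded signal per token, like SFT).
    start = prompt_ids.shape[1] - 1
    s = F.log_softmax(student_logits[:, start:], dim=-1)
    t = F.log_softmax(teacher_logits[:, start:], dim=-1)
    loss = (s.exp() * (s - t)).sum(-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The contrast with plain SFT is step 1 (the training distribution is the student's own samples), and with RL is step 3 (a per-token signal instead of one scalar reward at the end).
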
Zihao Wang (@realzihaowang) 's Twitter Profile Photo

🚀 Thrilled to introduce Game-TARS: our next-gen generalist multimodal game agent! Tired of AI that needs custom code for every new game? Game-TARS is a single VLM that learns to master any game just like a human: by watching the screen and using a keyboard & mouse. Read more.
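
The interface described here reduces to a very small loop. A hedged sketch under my own abstractions (`capture`, `vlm_policy`, and `execute` are placeholders, not Game-TARS APIs): pixels in, keyboard-and-mouse actions out, with nothing game-specific in between.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Action:
    kind: str        # "key" or "click"
    key: str = ""    # e.g. "w", "space"
    x: int = 0       # cursor position for clicks
    y: int = 0

def agent_loop(capture: Callable[[], bytes],
               vlm_policy: Callable[[bytes, List[Action]], Action],
               execute: Callable[[Action], None],
               max_steps: int = 1000) -> None:
    # Human-universal I/O: the only observation is a screenshot and the
    # only actuators are keyboard and mouse, so a new game needs no
    # custom integration code.
    history: List[Action] = []
    for _ in range(max_steps):
        frame = capture()                    # observe raw pixels
        action = vlm_policy(frame, history)  # VLM picks the next input
        execute(action)                      # press the key / click
        history.append(action)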

Dan Hendrycks (@danhendrycks) 's Twitter Profile Photo

Can AI automate jobs?

We created the Remote Labor Index to test AI's ability to automate hundreds of long, real-world, economically valuable projects from remote work platforms.

While AIs are smart, they are not yet that useful: the current automation rate is less than 3%.
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Scaling Latent Reasoning via Looped Language Models

1.4B and 2.6B param LoopLMs pretrained on 7.7T tokens match the performance of 4B and 8B standard transformers respectively across nearly all benchmarks

time to be bullish on adaptive computation again?

great work by
1a3orn (@1a3orn) 's Twitter Profile Photo

Bytedance has open-weighted the largest looped Transformer (7 trillion tokens @ 2.6b parameters) that I've heard about.

They also use some of the "Physics of Language Models"-style experiments to locate how loops improve things specifically (multi-hop, fact composition).
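
For intuition, here is a toy looped transformer in PyTorch (the dimensions and module layout are illustrative, not the released model's architecture): a small shared stack is applied R times, so effective depth, and with it latent "reasoning", grows without adding parameters. Adaptive computation falls out naturally, since the loop count can be varied at inference.

import torch
import torch.nn as nn

class LoopedLM(nn.Module):
    def __init__(self, vocab=32000, d=512, n_heads=8,
                 n_layers=4, n_loops=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        block = nn.TransformerEncoderLayer(d, n_heads, 4 * d,
                                           batch_first=True, norm_first=True)
        self.shared = nn.TransformerEncoder(block, n_layers)  # one shared stack
        self.head = nn.Linear(d, vocab, bias=False)
        self.n_loops = n_loops

    def forward(self, ids, n_loops=None):
        h = self.embed(ids)
        mask = nn.Transformer.generate_square_subsequent_mask(ids.shape[1])
        # The same parameters are applied repeatedly: an "R4" model loops
        # its stack 4 times, buying depth without buying parameters.
        for _ in range(n_loops or self.n_loops):
            h = self.shared(h, mask=mask, is_causal=True)
        return self.head(h)

logits = LoopedLM()(torch.randint(0, 32000, (1, 16)))  # (1, 16, 32000)
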
Tiezhen WANG (@xianbao_qian) 's Twitter Profile Photo

This work from ByteDance Seed team will be transformative and open the era of iterative latent reasoning:

why do models have to think only in human languages?
At least I don't.

The result is also significant:
- 2.6B R4 (4 steps) model achieved comparable performance with
DailyPapers (@huggingpapers) 's Twitter Profile Photo

ByteDance unveils MIRA, a new benchmark for visual chain-of-thought

It reveals that even the strongest multimodal LLMs struggle with complex visual reasoning unless they can "draw to think" with intermediate images, leading to significant performance gains.
๐š๐”ช๐Ÿพ๐šก๐šก๐Ÿพ (@gm8xx8) 's Twitter Profile Photo

RLoop: A Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization

RLoop mitigates RLVR overfitting by looping RL exploration with RFT consolidation.
Each cycle re-initializes from filtered expert trajectories, turning policy drift into lasting
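
A sketch of the cycle as the summary reads (the callables and the `Trajectory` type are mine, not the paper's API): RL explores, successful trajectories are filtered, rejection fine-tuning consolidates them, and the next round of RL re-initializes from the consolidated policy.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trajectory:
    tokens: List[int]
    reward: float   # verifiable reward, e.g. 1.0 if the answer checks out

def rloop(policy,
          run_rl: Callable[[object], List[Trajectory]],
          rejection_finetune: Callable[[object, List[Trajectory]], object],
          n_rounds: int = 3):
    for _ in range(n_rounds):
        trajectories = run_rl(policy)                        # explore (RLVR)
        experts = [t for t in trajectories if t.reward > 0]  # keep successes
        policy = rejection_finetune(policy, experts)         # consolidate (RFT)
        # The next iteration's RL restarts from the consolidated policy,
        # so exploration gains are baked into the weights before an RL run
        # can drift away from them.
    return policy
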
Ge Zhang (@gezhang86038849) 's Twitter Profile Photo

Congrats to Zeng Zhiyuan's amazing work! We keep pursuing methods that are simple yet efficient, and therefore scalable, in both model architecture and RL. Just loop your RL to get more stability!

Ge Zhang (@gezhang86038849) 's Twitter Profile Photo

Crazy achievement! I will call it the first industrial-level TTT generalist model. What's better? It's even designed for games, like a dream for these amazing R&Ds with anime avatars 😄

PapersAnon (@papers_anon) 's Twitter Profile Photo

Virtual Width Networks

From ByteDance. Decouples representational width from backbone width, expanding the embedding space while keeping backbone compute near constant. An 8× expansion accelerates optimization by over 2× for next-token and 3× for next-2-token prediction.

Links below
DailyPapers (@huggingpapers) 's Twitter Profile Photo

ByteDance introduces Virtual Width Networks (VWN) for efficient AI scaling

This new framework expands model embedding space for wider representations while keeping compute constant. It accelerates optimization by over 2x for next-token and 3x for next-2-token prediction!
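
Reading only these two summaries, the mechanism sounds like the sketch below (this is my inference with my own names; the paper's actual wiring may differ): the embedding/residual stream is kept k times wider than the backbone, with cheap linear maps narrowing into each block and widening back out, so the expensive attention/MLP compute is unchanged.

import torch
import torch.nn as nn

class VirtualWidthBlock(nn.Module):
    def __init__(self, d_backbone=512, expand=8, n_heads=8):
        super().__init__()
        d_wide = d_backbone * expand              # e.g. 8x virtual width
        self.down = nn.Linear(d_wide, d_backbone)  # wide -> backbone
        self.block = nn.TransformerEncoderLayer(
            d_backbone, n_heads, 4 * d_backbone,
            batch_first=True, norm_first=True)     # compute stays narrow
        self.up = nn.Linear(d_backbone, d_wide)    # backbone -> wide

    def forward(self, h_wide):
        # Only the linear projections see the wide dimension; the
        # attention/MLP block runs entirely at backbone width.
        return h_wide + self.up(self.block(self.down(h_wide)))

h = torch.randn(2, 16, 512 * 8)
out = VirtualWidthBlock()(h)    # same wide shape: (2, 16, 4096)
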
Michael Saxon (@m2saxon) 's Twitter Profile Photo

Trying to decide what to do on the first day of #NeurIPS2025? 

Check out my, Martin Ziqiao Ma, and Xiang Yue's tutorial, "The Science of Benchmarking: What's Measured, What's Missing, What's Next" on December 2 from 1:30 to 4:00pm.

What will we cover?  

1/3
Ge Zhang (@gezhang86038849) 's Twitter Profile Photo

Glad to see OmniBench & OmniVideoBench here. For all omni-modality models: since you claim omni-modality, you should definitely test your models on omni-modality interaction.