Wenqi Zhang (@spicysweet1859)'s Twitter Profile
Wenqi Zhang

@spicysweet1859

Engineer, PhD in LLM Research

ID: 1668513129553354754

13-06-2023 06:58:31

70 Tweets

151 Followers

296 Following

Rohan Paul (@rohanpaul_ai)

"Scale-up" is NOT dead. High-quality data is the true key to effective scaling, particularly textbook-level, high-quality knowledge corpora. The project in this paper collected massive online instructional videos and extracted keyframes and their corresponding audio

merve (@mervenoyann)

Alibaba released Multimodal Textbook: a new multimodal pre-training set from online instructional videos (22k hours) 🧑🏻‍🏫📕

6.5M images interleaved with 800k text on math, physics, chemistry

Xin (Ted) Li (@lixin4ever)

🚀🚀🚀 Announcing VideoLLaMA3, our latest MLLMs for image and video understanding:
- Highly capable 7B models: DocVQA: 94.9, MathVision: 26.2, VideoMME: 66.2/70.3, MLVU: 73.0
- Competitive 2B models for edge devices: MMMU: 45.3, VideoMME: 59.6/63.4
- Frontier-class video model

The AI Timeline (@theaitimeline)

🚨 This week's top AI/ML research papers:

- DeepSeek-R1
- Kimi k1.5
- UI-TARS
- Can We Generate Images with CoT?
- Physics of Skill Learning
- Test-time regression
- SRMT
- Scaling Laws for Optimal Sparsity for MoE LMs
- Distillation Quantification for LLMs
- Autonomy-of-Experts

Mingyang Chen (@chen_mingyang)

🌟 Introducing ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning. An open-source project that combines RL and RAG for LLMs! 💡 Like Deepseek-R1-Zero and Deep Research, we start with pretrained models and use RL to empower them with the

AIGCLINK (@aigclink)

Embodied-Reasoner is an embodied-intelligence model released jointly by several universities and Alibaba. It completes interactive tasks by combining visual search, reasoning, and action execution. It can perceive and understand its environment, and it can complete complex tasks through thinking and planning. It is strong on composite tasks, exceeding GPT-4o by 39.9%; its success rate is 9.6% higher than OpenAI o1, and its search efficiency is 12% higher than OpenAI o1.

Rohan Paul (@rohanpaul_ai)

Reflections fix errors. Multi-step reflection keeps exploration consistent, ensuring very minimal wasted moves.

Embodied tasks need vision-driven reasoning, but models often fail.

This paper unifies observation, reflection, and action, offering consistent planning and

Xin (Ted) Li (@lixin4ever)

🚨 NEW PAPER ALERT! Even TOP VLMs FAIL at ELEMENTARY SCHOOL MATH! 🧠❌
We present VCBench (huggingface.co/papers/2504.18…), revealing the ALARMING TRUTH: all of the latest vision-language models score BELOW 50% on BASIC math problems that 10-year-olds solve easily! 😱🤯
WHY? These

Xin Eric Wang @ ICLR 2025 (@xwang_lk)

Humans think fluidly, navigating abstract concepts effortlessly, free from rigid linguistic boundaries. But current reasoning models remain constrained by discrete tokens, limiting their full

Jingyuan Qi (@jingyuan_qi)

🚀 Introducing AR-RAG: Autoregressive Retrieval Augmentation for Image Generation: arxiv.org/pdf/2506.06962
🔍 Dynamic patch-level retrieval during generation
🧠 Context-aware visual references that evolve with your image
📈 Significant gains on Midjourney, GenEval and DPG-Bench

Yongliang Shen (@itricktreat)

Introducing EasySteer: High-performance LLM steering framework built on vLLM. Achieves 5.5-11.4× speedup over existing tools while maintaining 71-84% throughput.
Paper: arxiv.org/abs/2509.25175
Code: github.com/ZJU-REAL/EasyS…
HF Paper: huggingface.co/papers/2509.25…