Zhengyang Tang (@zhengyang_42)'s Twitter Profile
Zhengyang Tang

@zhengyang_42

PhD candidate @cuhksz, Intern @Alibaba_Qwen. Prev: @MSFTResearch, @TencentGlobal, @AlibabaGroup.

ID: 759762884310159360

Link: https://tangzhy.github.io/ · Joined: 31-07-2016 14:49:22

17 Tweets

76 Followers

202 Following

arXiv Daily (@arxiv_daily):

DPTDR: Deep Prompt Tuning for Dense Passage Retrieval deepai.org/publication/dp… by Zhengyang Tang et al., including Benyou Wang #NaturalLanguageProcessing #Computation
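
For context: DPTDR applies deep prompt tuning to dual-encoder passage retrieval, so the pretrained backbone stays frozen and only continuous prompt vectors are trained. A minimal PyTorch sketch of the input-level variant follows, with a toy backbone and sizes assumed (the deep variant injects prompts at every transformer layer, which is omitted here):

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Toy prompt-tuned encoder: frozen backbone, trainable prompt vectors."""
    def __init__(self, hidden=256, prompt_len=16, layers=2):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, num_layers=layers)
        for p in self.backbone.parameters():   # backbone is never updated
            p.requires_grad = False
        # the only trainable parameters: continuous prompt embeddings
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)

    def forward(self, token_embeds):           # (batch, seq, hidden)
        prompts = self.prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
        out = self.backbone(torch.cat([prompts, token_embeds], dim=1))
        return out[:, 0]                       # first position as the dense vector

query_vec = PromptedEncoder()(torch.randn(2, 32, 256))  # toy query batch
```

Queries and passages each go through such an encoder (shared or separate), and retrieval scores are dot products between the resulting vectors.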

Aran Komatsuzaki (@arankomatsuzaki):

Microsoft presents GLAN (Generalized Instruction Tuning)

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

GLAN excels without using task-specific training data

arxiv.org/abs/2402.13064
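
GLAN's pitch is generating instruction data from a general taxonomy of human knowledge (disciplines, subjects, syllabi, exercises) rather than from task-specific seeds. A toy sketch of that kind of pipeline, where llm() is a stand-in for any chat-model client and the prompts are invented for illustration:

```python
def llm(prompt: str) -> str:
    """Stand-in for a real chat-model call; replace with your own client."""
    return "..."

def generate_instructions(discipline: str, n_subjects=3, n_questions=2):
    # taxonomy first: discipline -> subjects -> syllabus -> exercises
    subjects = llm(f"List {n_subjects} subjects within {discipline}.").splitlines()
    pairs = []
    for subject in subjects:
        syllabus = llm(f"Write a syllabus of key concepts for {subject}.")
        for _ in range(n_questions):
            q = llm(f"Write one exercise testing a concept from:\n{syllabus}")
            a = llm(f"Answer step by step:\n{q}")
            pairs.append({"instruction": q, "response": a})
    return pairs  # generic instruction pairs, no task-specific training data
```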
AK (@_akhaliq):

MathScale

Scaling Instruction Tuning for Mathematical Reasoning

Large language models (LLMs) have demonstrated remarkable capabilities in problem-solving. However, their proficiency in solving mathematical problems remains inadequate.
Zhengyang Tang (@zhengyang_42):

🚀 Launching ORLM: the first open-source Operations Research LLM, powered by our OR-Instruct process! 🛠️

🏆 ORLMs achieve SOTA on NL4OPT, MAMO, & the new IndustryOR benchmarks with different 7B backbones!

📄 Paper: arxiv.org/pdf/2405.17743
💻 Code: github.com/Cardinal-Opera…
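
The task ORLMs are trained for is turning a natural-language operations-research problem into a formal model plus solver code. A hand-written example of that target artifact, with a toy word problem assumed and scipy's linprog used as the solver:

```python
# Word problem (invented): a factory makes products A and B with unit profits
# 3 and 5; each unit needs 4 and 6 machine hours, and 240 hours are available.
from scipy.optimize import linprog

# maximize 3*xA + 5*xB  ->  linprog minimizes, so negate the objective
res = linprog(
    c=[-3, -5],
    A_ub=[[4, 6]],                   # machine-hour usage per unit
    b_ub=[240],                      # total machine hours available
    bounds=[(0, None), (0, None)],   # non-negative production
)
print(res.x, -res.fun)               # optimal plan and profit
```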
Qingxiu Dong (@qx_dong):

OpenAI o1 scores 94.8% on MATH dataset😲
Then...how should we proceed to track and evaluate the next-gen LLMs' math skills? 

👉Omni-Math: a new, challenging benchmark with 4k competition-level problems, where OpenAI o1-mini achieves only 60.54% accuracy
Paper: huggingface.co/papers/2410.07…
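
Scoring such a benchmark reduces to checking each model answer against a reference. A deliberately minimal accuracy sketch with an assumed record schema (real competition-math grading typically needs symbolic or LLM-based answer equivalence, not plain string matching):

```python
def accuracy(records):
    """records: dicts with 'model_answer' and 'reference' strings (assumed schema)."""
    correct = sum(r["model_answer"].strip() == r["reference"].strip() for r in records)
    return 100.0 * correct / len(records)

print(accuracy([{"model_answer": "42", "reference": "42"},
                {"model_answer": "7",  "reference": "9"}]))  # 50.0
```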
Zhengyang Tang (@zhengyang_42):

📢 Introducing SCRIT: A framework enabling LLMs to self-evolve their critique abilities without human annotations or stronger models.

💡 Key features:
• Contrastive self-critic
• Mathematical validity check
• Zero external supervision

🔗 Paper: huggingface.co/papers/2501.05…
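
A rough sketch of how such a self-critique loop can be wired up, with a stand-in llm() helper and invented prompts (not the paper's exact pipeline): the model critiques an attempt while contrasting it against a known-correct reference, and the critique is kept as self-training data only if its verdict agrees with the ground truth.

```python
def llm(prompt: str) -> str:
    """Stand-in for the model being improved; replace with a real client."""
    return "... Conclusion: correct"

def self_critique(problem, reference_solution, attempt, attempt_is_correct):
    # contrastive self-critic: digest a correct reference before judging
    critique = llm(
        f"Problem: {problem}\nReference solution:\n{reference_solution}\n"
        f"Contrast the reference with this attempt and critique it step by step, "
        f"ending with 'Conclusion: correct' or 'Conclusion: incorrect'.\n{attempt}"
    )
    verdict = critique.strip().lower()
    says_correct = verdict.endswith("correct") and not verdict.endswith("incorrect")
    # validity check: keep only critiques whose verdict matches ground truth
    return critique if says_correct == attempt_is_correct else None
```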
Ziniu Li @ ICLR2025 (@ziniuli):

🚀 Critique abilities are key for scaling LLMs, but current open-source models fall short.

We introduce SCRIT: a framework with scalable oversight that enables LLMs to self-improve their critique skills✨

We’ve built a pipeline to generate high-quality synthetic critique data
Zhengyang Tang (@zhengyang_42):

Thrilled to share our paper "ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling" has been accepted by Operations Research! 🎉

This is the FIRST LLM paper in the 70+ year history of this prestigious journal. Our framework improves modeling
Zhengyang Tang (@zhengyang_42):

Super excited to have been part of the Qwen3 team! We just dropped our technical report - check it out if you're interested in what's under the hood. Hope it helps with your projects and research. Let us know what you think! #Qwen3 #AI

AK (@_akhaliq):

Learning from Peers in Reasoning Models

Large Reasoning Models often get stuck when they start reasoning incorrectly (the "Prefix Dominance Trap"). The paper proposes LeaP (Learning from Peers), a method where parallel reasoning paths share intermediate summaries to learn from each other.
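
A toy sketch of the peer-sharing idea, with a stand-in llm() helper and invented prompts: several reasoning paths run in parallel, periodically summarize themselves, and see each other's summaries, which can pull a path out of a bad prefix.

```python
def llm(prompt: str) -> str:
    """Stand-in for a reasoning model; replace with a real client."""
    return "..."

def reason_with_peers(question, n_paths=4, n_rounds=3):
    paths = [f"Question: {question}\n" for _ in range(n_paths)]
    for _ in range(n_rounds):
        # each path extends its own chain of thought a little further
        paths = [p + llm(p + "\nContinue reasoning:") for p in paths]
        # each path condenses its progress into a short summary
        summaries = [llm(p + "\nSummarize your reasoning in one sentence:")
                     for p in paths]
        # peer insertion: every path sees the others' summaries
        for i in range(n_paths):
            peers = "\n".join(s for j, s in enumerate(summaries) if j != i)
            paths[i] += f"\n[Peer summaries]\n{peers}\n"
    return [p + llm(p + "\nFinal answer:") for p in paths]
```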
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

CoRT: Code-integrated Reasoning within Thinking

"This paper introduces CoRT, a post-training framework for teaching LRMs to leverage Code Interpreter effectively and efficiently."

"We manually create 30 high-quality samples, upon which we post-train models ranging from 1.5B to
Zhengyang Tang (@zhengyang_42):

We’re excited to share our new paper “CoRT: Code-integrated Reasoning within Thinking”!

🤖 A post-training framework that teaches Large Reasoning Models (LRMs) to better leverage Code Interpreters for enhanced mathematical reasoning.

🔍 Key Highlights:

Strategic hint
Zhengyang Tang (@zhengyang_42):

Happy to share that our paper "Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion" has been accepted to #ACL2025 as oral & panel presentation (25 out of 3000 accepted papers = top 0.8%)! 🎉 🚀 We introduce AceGPT with Progressive Vocabulary
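
The tweet is truncated, but the core mechanism, growing the tokenizer's vocabulary in stages while continuing pretraining, maps onto standard HuggingFace APIs. One expansion step might look like the sketch below (the backbone and tokens are placeholders, and the progressive schedule is the paper's contribution, not shown):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in backbone
tokenizer = AutoTokenizer.from_pretrained("gpt2")

new_tokens = ["السلام", "عليكم"]                       # example Arabic tokens
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))          # grow embedding matrix
# ...continue pretraining on Arabic text so the new embeddings are learned,
# then repeat with the next batch of tokens...
```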

Zhengyang Tang (@zhengyang_42):

🚀 Thrilled to announce that our paper "SCRIT: Self-Evolving LLM Critique without Human or Stronger Models" was accepted to #COLM2025! We enable LLMs to self-improve critique abilities — zero human annotations, zero stronger models needed! 🔄✨ Looking forward to meeting

Binyuan Hui (@huybery):

We’ve updated Qwen3 and made excellent progress. The non-reasoning model now delivers significant improvements across a wide range of tasks, and many of its capabilities already rival those of reasoning models. It’s truly remarkable, and we hope you enjoy it!