Tianqing Fang @ ACL24 (@tfang229)'s Twitter Profile
Tianqing Fang @ ACL24

@tfang229

Incoming Research Scientist @TencentGlobal AI Lab. PhD @hkust @HKUSTKnowComp #NLProc. Ex-visiting PhD @epfl_en. Ex-visiting scholar at LUKA Lab @CSatUSC.

ID: 1641361658269540352

Link: http://fangtq.com · Joined: 30-03-2023 08:48:23

34 Tweets

226 Followers

272 Following

Jianshu Zhang ✈️ICLR2025🇸🇬 (@sterzhang)

My earlier work #VLM2Bench called for clearer principles on *when* language aids vision. Our new work **MindCube**: *First Map then Reasoning* puts this into practice, tackling limited-view spatial reasoning with appropriate language-side mapping. Welcome to check it out! 🚀
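In spirit, the "first map, then reason" recipe is easy to sketch: serialize the partial views into an explicit textual map, then reason over the map instead of the raw views. Everything below (the `views` structure, `build_map`, the object names) is a hypothetical illustration of mine, not MindCube's actual interface:

```python
# Hypothetical sketch of "first map, then reason" for limited-view spatial
# reasoning: aggregate partial observations into one explicit textual map.

# Each view sees only some objects and their egocentric placements.
views = [
    {"view": "A", "facing": "north", "objects": {"sofa": "left", "lamp": "right"}},
    {"view": "B", "facing": "east",  "objects": {"lamp": "left", "door": "right"}},
]

def build_map(views):
    """Serialize per-view observations into one allocentric summary."""
    lines = []
    for v in views:
        for obj, side in v["objects"].items():
            lines.append(f"From view {v['view']} (facing {v['facing']}), {obj} is on the {side}.")
    return "\n".join(lines)

cognitive_map = build_map(views)
# A VLM/LLM would now be prompted with `cognitive_map` plus the question,
# reasoning over the shared map rather than over each limited view alone.
print(cognitive_map)
```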

Wenhao Yu (@wyu_nd)

📢 New paper alert 📢

We introduce MobileGUI-RL, an RL framework advancing mobile GUI agents through trajectory-based rollouts and rewards in 𝗼𝗻𝗹𝗶𝗻𝗲 environments.

With RL, Qwen2.5-VL achieves a 44.8% success rate on AndroidWorld! ✨

Check out the paper at: arxiv.org/abs/2507.05720
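For intuition, here is a toy sketch of trajectory-based rollouts with trajectory-level rewards in an online environment; `MobileEnv`, `rollout`, and the random policy are stand-ins of my own, not MobileGUI-RL's actual code:

```python
import random

class MobileEnv:
    """Toy stand-in for an online mobile GUI environment."""
    def reset(self):
        self.hits, self.steps = 0, 0
        return "screenshot_0"
    def step(self, action):
        self.steps += 1
        self.hits += (action == 0)          # pretend action 0 is the right tap
        done = self.steps >= 5 or self.hits >= 2
        return f"screenshot_{self.steps}", done, self.hits >= 2

def rollout(env, policy):
    """Collect one full trajectory; reward is assigned per trajectory."""
    obs, traj, done, success = env.reset(), [], False, False
    while not done:
        action = policy(obs)
        obs, done, success = env.step(action)
        traj.append(action)
    return traj, (1.0 if success else 0.0)  # trajectory-level outcome reward

policy = lambda obs: random.choice([0, 1, 2])
batch = [rollout(MobileEnv(), policy) for _ in range(8)]
# An RL update (e.g., PPO/GRPO) would then weight each trajectory's actions
# by its trajectory-level reward rather than by per-step signals.
print([reward for _, reward in batch])
```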
Zhenwen Liang (@liangzhenwen)

🚀 Thrilled to share our new paper at Tencent AI Lab!

We introduce DRP-IMO, a framework that proves 5 post-2000 IMO problems, a set where no prior open-source prover has solved a single one. 🤯

Link: arxiv.org/pdf/2507.06804

Details in thread👇
Dong Yu (@dong_yu_ai)

We have some interesting findings in our recent work "One Token to Fool LLM-as-a-Judge" (arxiv.org/abs/2507.08794) that will affect RLVR with generative reward models.

Wenhao Yu (@wyu_nd)

🗒️Have been exploring Agent-RL training over the past few months, particularly in GUI scenarios.

Here’s a summary of practical insights and lessons learned 🤔 from the perspective of an industry researcher, along with some reference papers.
DailyPapers (@huggingpapers)

Tencent AI Lab just released Cognitive Kernel-Pro on Hugging Face!

A fully open-source & free multi-module agent framework designed for deep research & agent foundation model training.

It achieves state-of-the-art performance among open-source agents on GAIA.
Tianqing Fang @ ACL24 (@tfang229)

Thank you for sharing our work! The code, data, and models have been open-sourced for the research community’s benefit: github.com/Tencent/Cognit… huggingface.co/CognitiveKerne… huggingface.co/datasets/Cogni…

DailyPapers (@huggingpapers)

Tencent AI Lab introduces R-Zero!

A groundbreaking framework enabling LLMs to self-evolve their reasoning capabilities from zero human-curated data, through an autonomous Challenger-Solver loop.
DailyPapers (@huggingpapers)

Microsoft just released Agent Lightning on Hugging Face.

Train ANY AI agent with reinforcement learning, with almost ZERO code changes!

A flexible and extensible framework that fully decouples agents from RL training.
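As a generic illustration of what decoupling an agent from RL training can mean (this is emphatically not Agent Lightning's actual API; every name below is hypothetical): the agent only logs its LLM calls as traces, and a separate trainer consumes those traces without ever touching agent code.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    prompt: str
    response: str
    reward: float = 0.0

@dataclass
class TraceStore:
    traces: list = field(default_factory=list)
    def log(self, prompt, response):
        self.traces.append(Trace(prompt, response))

def run_agent(task, store):
    """Agent logic stays unmodified except that each LLM call is logged."""
    response = f"answer to {task}"          # placeholder for a real LLM call
    store.log(task, response)
    return response

def train_step(store, reward_fn):
    """The trainer sees only (prompt, response, reward) tuples."""
    for t in store.traces:
        t.reward = reward_fn(t.response)
    # ...hand the scored traces to an RL optimizer (PPO/GRPO, etc.) here.
    return sum(t.reward for t in store.traces) / max(len(store.traces), 1)

store = TraceStore()
run_agent("book a flight", store)
print(train_step(store, reward_fn=lambda r: float("answer" in r)))
```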
Wenhao Yu (@wyu_nd)

𝑳𝑳𝑴𝒔 can really 𝑺𝒆𝒍𝒇-𝑬𝒗𝒐𝒍𝒗𝒆, 𝒘𝒊𝒕𝒉𝒐𝒖𝒕 𝑯𝒖𝒎𝒂𝒏 𝑫𝒂𝒕𝒂!

-- One LLM, two roles: Challenger creates tasks, Solver answers them.
-- No data, no labels, just a base model that learns and improves itself!

We name it 𝑹-𝒛𝒆𝒓𝒐: arxiv.org/abs/2508.05004
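One plausible minimal reading of that loop, sketched below; the prompts, the majority-vote pseudo-labeling, and the `llm` stub are my assumptions for illustration, not the paper's exact reward design:

```python
from collections import Counter

def llm(prompt, temperature=1.0):
    """Stub for a single base model that serves both roles."""
    return "42"                              # a real call would sample the model

def challenger():
    """Role 1: the same model, prompted to invent a new task."""
    return llm("Propose one challenging math question.", temperature=1.0)

def solver(question, n_samples=8):
    """Role 2: the same model answers; majority vote gives a pseudo-label."""
    answers = [llm(f"Solve: {question}", temperature=0.7) for _ in range(n_samples)]
    label, votes = Counter(answers).most_common(1)[0]
    return label, votes / n_samples          # pseudo-label + self-consistency

for step in range(3):
    q = challenger()
    label, consistency = solver(q)
    # Training signal without human data: e.g., reward the Challenger for
    # questions of intermediate consistency (neither trivial nor hopeless)
    # and fine-tune the Solver on (q, label) pairs it is confident about.
    print(step, q[:40], label, round(consistency, 2))
```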
Shizhe Diao (@shizhediao)

This week, we open-sourced NVIDIA-Nemotron-Nano-v2-9B: our next-generation efficient hybrid model.

- 6× faster than Qwen3-8B on reasoning tasks.
- Retained long-context capability (8k → 262k trained, usable at 128k)
First true demonstration that reasoning models can be
Tianqing Fang @ ACL24 (@tfang229)

WebEvolver has been accepted to the #EMNLP2025 main conference. See you in Suzhou, China! Code: github.com/Tencent/SelfEv… Paper: arxiv.org/abs/2504.21024

Wenhao Yu (@wyu_nd)

New paper: VLMs can self-reward during RL training — no visual annotations needed!

-- Decompose VLM reasoning into visual vs. language parts
-- Prompt the same VLM without visual input for visual reward

We call it 𝐕𝐢𝐬𝐢𝐨𝐧-𝐒(𝐞𝐥𝐟)𝐑𝟏: arxiv.org/abs/2508.19652
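Roughly, that self-reward loop could look like the sketch below; `vlm`, the prompts, and the agreement-based reward are hypothetical stand-ins, not the paper's exact implementation:

```python
def vlm(prompt, image=None):
    """Stub for one VLM used in both passes (with and without the image)."""
    if image is not None:
        return {"caption": "a red cube left of a blue ball", "answer": "red"}
    return {"answer": "red"}                 # language-only pass over the caption

def self_reward(image, question):
    # Pass 1 (visual): answer and verbalize the perceived visual content.
    first = vlm(f"Describe the relevant visual facts, then answer: {question}",
                image=image)
    # Pass 2 (language-only): same model, no image, only the verbalized facts.
    second = vlm(f"Facts: {first['caption']}\nAnswer: {question}")
    # Reward: does the language-only pass recover the same answer?
    return 1.0 if second["answer"] == first["answer"] else 0.0

print(self_reward(image="img.png", question="What color is the cube?"))
```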