Tian Liang (@skytliang) 's Twitter Profile
Tian Liang

@skytliang

NLP Researcher, Tencent AI Lab
Homepage: skytliang.github.io

ID: 1470054121286041600

Joined: 12-12-2021 15:33:20

37 Tweets

69 Followers

145 Following

Tian Liang (@skytliang) 's Twitter Profile Photo

🤯Mind blown by the brilliant debate between large language models! #AI is truly the new intellectual battleground! 🤖💬 But here's a thought💡: Can these models upgrade themselves through self-debate? Imagine the possibilities! 😲🚀

Zhuosheng Zhang (@zhangzhuosheng) 's Twitter Profile Photo

🔭 Thinking of interesting problems in the era of large language models? 💡Check out our latest work for improving language model pre-training efficiency, commonsense fact verification, mitigating hallucination, aligning with human preference. 🧵

Zhuosheng Zhang (@zhangzhuosheng) 's Twitter Profile Photo

🚀Exploring Human-Like Translation Strategy with Large Language Models

Prompting LLMs to mimic the human translation process by extracting knowledge step by step, resolving 59% of hallucination mistakes.

📑Paper: arxiv.org/pdf/2305.04118… (Preprint)
⭐️Code: github.com/zwhe99/MAPS-mt
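
The strategy described in the tweet can be sketched in a few lines. This is a hedged toy illustration, not the MAPS-mt code: the function names, prompts, and the `llm`/`score` callables are all made up for the sketch. The idea is to first elicit several kinds of knowledge (keywords, topic, a related demonstration), condition one translation candidate on each, and then select the best candidate with a scoring function.

```python
# Toy sketch of a human-like, knowledge-first translation loop
# (illustrative only; not the MAPS-mt implementation).
def translate(source, llm, score):
    # Step 1: extract knowledge the way a human translator might.
    knowledge = {
        "keywords": llm(f"Extract keyword pairs for: {source}"),
        "topic": llm(f"Describe the topic of: {source}"),
        "demo": llm(f"Write a related example translation for: {source}"),
    }
    # Step 2: produce one candidate per knowledge type, plus a plain baseline.
    candidates = [llm(f"Using the {kind} '{k}', translate: {source}")
                  for kind, k in knowledge.items()]
    candidates.append(llm(f"Translate: {source}"))  # knowledge-free baseline
    # Step 3: pick the best candidate, e.g. via a quality-estimation score.
    return max(candidates, key=score)
```

In the paper's framing, the selection step would use a quality-estimation model rather than an arbitrary `score` function; here it is left abstract.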

Jiao Wenxiang (@wenxiangjiao) 's Twitter Profile Photo

Introducing GAMA-Bench to measure LLMs’ decision-making abilities through the lens of Game Theory.

Changing settings ("Guess 2/3 of the Average": 2/3 -> 5/6) can reflect if LLMs only memorize common settings in training data or indeed understand the game.
arxiv.org/pdf/2403.11807…
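
For readers unfamiliar with the game behind this probe, here is a toy level-k reasoning sketch (an illustration of the game, not GAMA-Bench code; the function is made up). In "Guess k of the Average", a level-0 player guesses 50 and each deeper reasoning level best-responds by multiplying by k; for any k < 1 the guesses converge toward the Nash equilibrium of 0, but the trajectory depends on k, which is exactly what swapping 2/3 for 5/6 tests.

```python
# Level-k reasoning in "Guess k of the Average" (toy illustration).
def level_k_guess(k: float, depth: int, start: float = 50.0) -> float:
    guess = start
    for _ in range(depth):
        guess *= k  # each level best-responds to the previous level
    return guess

for k in (2/3, 5/6):
    print([round(level_k_guess(k, d), 1) for d in range(4)])
# → [50.0, 33.3, 22.2, 14.8]
# → [50.0, 41.7, 34.7, 28.9]
```

A model that only memorized the common 2/3 variant would produce the first trajectory even when asked about 5/6.
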
Longyue Wang (@wangly0229) 's Twitter Profile Photo

🚀 Exciting News! 🚀 ✨TransAgents, the cutting-edge virtual multi-agent translation company powered by advanced LLMs, is now live! 🌍 Experience our demo system at transagents.ai and kickstart your own AI venture with LLMs today. For a limited time, we're offering

Nathan Lambert (@natolambert) 's Twitter Profile Photo

Things of note (not that much) in this longer o1 video:
1. “Model with RL is better at finding new CoT steps than humans”
2. “Emergence of self critique was a powerful moment”
3. Mentioned a literal timeout for the model, and the model was like “aha I got it” but maybe a

Tian Liang (@skytliang) 's Twitter Profile Photo

‼️Critical Tokens Matter‼️

We find that in mathematical reasoning, certain tokens, which we refer to as critical tokens, have a significant impact on the outcome. In some cases, a SINGLE critical token can undermine the entire rollout solution (even if sampled thousands of times).
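
One way to make the claim concrete: a token's criticality can be estimated contrastively, by comparing the success rate of rollouts forced through the token against rollouts without it. The sketch below is a hedged toy (not the paper's method or code); `sample_rollout` stands in for an LLM, with one deliberately "poisoned" token whose presence always sinks the rollout.

```python
# Toy contrastive estimate of how "critical" a token is (illustrative only).
import random

def sample_rollout(prefix, force_token=None):
    # Stand-in for an LLM rollout: any rollout containing the buggy
    # token "9.11>9.9" fails; all others succeed.
    token = force_token or random.choice(["ok", "9.11>9.9"])
    return prefix + [token], token != "9.11>9.9"

def success_rate(prefix, token, n=1000):
    # Fraction of rollouts that succeed when forced through `token`.
    wins = sum(sample_rollout(prefix, force_token=token)[1] for _ in range(n))
    return wins / n

print(success_rate([], "ok"), success_rate([], "9.11>9.9"))  # → 1.0 0.0
```

The gap between the two rates is the sense in which a single token can decide the whole rollout, no matter how many times you resample after it.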

Tian Liang (@skytliang) 's Twitter Profile Photo

What happens when step-level search meets critique? Enter PANEL! Our novel framework uses natural-language self-critique to guide reasoning. It's more informative, versatile, and efficient. Dive into our paper to learn more!
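
The core loop of step-level search guided by a critique can be sketched as follows. This is a hypothetical skeleton, not the PANEL implementation: `propose` and `critique` stand in for LLM calls, and the toy instance (reaching a target sum with +1/+2 steps) is invented purely so the loop runs end to end.

```python
# Skeleton of step-level search guided by a critique signal (toy sketch).
def search(problem, propose, critique, max_steps=8):
    steps = []
    for _ in range(max_steps):
        candidates = propose(problem, steps)  # e.g. sampled next steps from an LLM
        # Score each extended trajectory with the critique and keep the best.
        score, best = max((critique(problem, steps + [c]), c) for c in candidates)
        steps.append(best)
        if critique(problem, steps) >= 1.0:   # critique judges the solution complete
            break
    return steps

# Toy instance: reach a target sum of 5 using +1 / +2 steps.
target = 5
propose = lambda t, steps: [1, 2]
critique = lambda t, steps: 1.0 if sum(steps) == t else -abs(t - sum(steps))
print(search(target, propose, critique))  # → [2, 2, 1]
```

In the framework described above, the critique would be a natural-language judgment rendered by the model itself rather than a numeric toy score.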

Zhaopeng Tu (@tuzhaopeng) 's Twitter Profile Photo

How can we push AI models to master genuinely challenging math reasoning?

Introducing DeepMath-103K, a large-scale, rigorously decontaminated mathematical dataset designed specifically for RL and high-level reasoning. Constructing DeepMath-103K required 138,000 US dollars in
Zhaopeng Tu (@tuzhaopeng) 's Twitter Profile Photo

Trust your AI, but can it trust itself? 🤔

Introducing an online reinforcement learning framework, RISE (Reinforcing Reasoning with Self-Verification), enabling LLMs to simultaneously level up BOTH their problem-solving AND self-checking skills!

🧐 Problems tackled:
✅
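
The dual objective can be illustrated with a simple reward-shaping sketch. This is an assumed toy formulation, not the paper's exact objective: a single policy earns one reward component for solving the problem and another for rendering a correct verdict on its own answer, so training pressure lands on both skills at once.

```python
# Toy reward combining problem-solving and self-verification (illustrative).
def combined_reward(answer_correct: bool, self_verdict: bool) -> float:
    solve_r = 1.0 if answer_correct else 0.0
    # The verifier is rewarded for being RIGHT about the answer,
    # including correctly flagging its own failures.
    verify_r = 1.0 if (self_verdict == answer_correct) else 0.0
    return solve_r + verify_r  # in [0, 2]
```

Note the asymmetry this creates: a model that fails but honestly says so still earns the verification reward, which is what discourages blind self-trust.
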
Jiahao Xu (@jiahaox82739261) 's Twitter Profile Photo

🚨 Announcing DeepTheorem: Revolutionizing LLM Mathematical Reasoning! 🚀

𝕋𝕃𝔻ℝ:
- 🌟 Learning by exploration is the most important rationale that recent RL-zero training teaches us, since self-exploration significantly boosts the utilization of LLM pre-training knowledge;

-
Zeyuan Allen-Zhu, Sc.D. (@zeyuanallenzhu) 's Twitter Profile Photo

No matter how AI evolves overnight—tech, career, how it may impact me—I remain committed to using "physics of language models" approach to predict next-gen AI. Due to my limited GPU access at Meta, Part 4.1 (+new 4.2) are still in progress, but results on Canon layers are shining

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Google researchers show a 60M‑parameter text model can predict data‑center efficiency almost perfectly.

The study proves text‑to‑text regression beats classic tabular tricks by 100x and adapts in 500 examples.

Fixed‑length tensors drop structure from logs, so older regressors
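
The framing contrast can be sketched briefly. This is an assumed illustration of the text-to-text regression idea, not Google's setup (the field names and target are invented): instead of flattening each log row into a fixed-length feature tensor, the row is serialized as free-form text and a seq2seq model decodes the target number as a string, so variable-length log structure survives.

```python
# Serializing a tabular log row as text for text-to-text regression
# (toy illustration; field names and target are hypothetical).
def serialize(row: dict) -> str:
    return ", ".join(f"{k}: {v}" for k, v in row.items())

row = {"cooling": "evaporative", "load_kw": 742, "outside_temp_c": 19.5}
print(serialize(row))
# → cooling: evaporative, load_kw: 742, outside_temp_c: 19.5
# A seq2seq model would then map this string to a target such as "PUE: 1.09".
```

Because the input is just text, rows with missing, extra, or nested fields need no schema change, which is one plausible reason the approach adapts quickly from few examples.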