Tian Liang (@skytliang) 's Twitter Profile
Tian Liang

@skytliang

NLP Researcher, Tencent AI Lab
Homepage: skytliang.github.io

ID: 1470054121286041600

Joined: 12-12-2021 15:33:20

37 Tweets

69 Followers

145 Following

Tian Liang (@skytliang) 's Twitter Profile Photo

🤯Mind blown by the brilliant debate between large language models! #AI is truly the new intellectual battleground! 🤖💬 But here's a thought💡: Can these models upgrade themselves through self-debate? Imagine the possibilities! 😲🚀

Zhuosheng Zhang (@zhangzhuosheng) 's Twitter Profile Photo

🔭 Thinking of interesting problems in the era of large language models? 💡Check out our latest work for improving language model pre-training efficiency, commonsense fact verification, mitigating hallucination, aligning with human preference. 🧵

Zhuosheng Zhang (@zhangzhuosheng) 's Twitter Profile Photo

🚀Exploring Human-Like Translation Strategy with Large Language Models

Prompting LLMs to mimic the human translation process by extracting knowledge step by step, resolving 59% of hallucination mistakes.

📑Paper: arxiv.org/pdf/2305.04118… (Preprint)
⭐️Code: github.com/zwhe99/MAPS-mt
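
The strategy described in the tweet can be sketched in a few lines. This is a hedged toy illustration, not the MAPS-mt code: the function names, prompts, and the `llm`/`score` callables are all made up for the sketch. The idea is to first elicit several kinds of knowledge (keywords, topic, a related demonstration), condition one translation candidate on each, and then select the best candidate with a scoring function.

```python
# Toy sketch of a human-like, knowledge-first translation loop
# (illustrative only; not the MAPS-mt implementation).
def translate(source, llm, score):
    # Step 1: extract knowledge the way a human translator might.
    knowledge = {
        "keywords": llm(f"Extract keyword pairs for: {source}"),
        "topic": llm(f"Describe the topic of: {source}"),
        "demo": llm(f"Write a related example translation for: {source}"),
    }
    # Step 2: produce one candidate per knowledge type, plus a plain baseline.
    candidates = [llm(f"Using the {kind} '{k}', translate: {source}")
                  for kind, k in knowledge.items()]
    candidates.append(llm(f"Translate: {source}"))  # knowledge-free baseline
    # Step 3: pick the best candidate, e.g. via a quality-estimation score.
    return max(candidates, key=score)
```

In the paper's framing, the selection step would use a quality-estimation model rather than an arbitrary `score` function; here it is left abstract.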

Jiao Wenxiang (@wenxiangjiao) 's Twitter Profile Photo

Introducing GAMA-Bench to measure LLMs’ decision-making abilities through the lens of Game Theory.

Changing settings ("Guess 2/3 of the Average": 2/3 -> 5/6) can reflect if LLMs only memorize common settings in training data or indeed understand the game.
arxiv.org/pdf/2403.11807…
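
For readers unfamiliar with the game behind this probe, here is a toy level-k reasoning sketch (an illustration of the game, not GAMA-Bench code; the function is made up). In "Guess k of the Average", a level-0 player guesses 50 and each deeper reasoning level best-responds by multiplying by k; for any k < 1 the guesses converge toward the Nash equilibrium of 0, but the trajectory depends on k, which is exactly what swapping 2/3 for 5/6 tests.

```python
# Level-k reasoning in "Guess k of the Average" (toy illustration).
def level_k_guess(k: float, depth: int, start: float = 50.0) -> float:
    guess = start
    for _ in range(depth):
        guess *= k  # each level best-responds to the previous level
    return guess

for k in (2/3, 5/6):
    print([round(level_k_guess(k, d), 1) for d in range(4)])
# → [50.0, 33.3, 22.2, 14.8]
# → [50.0, 41.7, 34.7, 28.9]
```

A model that only memorized the common 2/3 variant would produce the first trajectory even when asked about 5/6.
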
Longyue Wang (@wangly0229) 's Twitter Profile Photo

🚀 Exciting News! 🚀 ✨TransAgents, the cutting-edge virtual multi-agent translation company powered by advanced LLMs, is now live! 🌍 Experience our demo system at transagents.ai and kickstart your own AI venture with LLMs today. For a limited time, we're offering

Nathan Lambert (@natolambert) 's Twitter Profile Photo

Things of note (not that much) in this longer o1 video:
1. “Model with RL is better at finding new CoT steps than humans”
2. “Emergence of self critique was a powerful moment”
3. Mentioned a literal timeout for the model, and the model was like “aha I got it” but maybe a

Tian Liang (@skytliang) 's Twitter Profile Photo

‼️Critical Tokens Matter‼️

We find that in mathematical reasoning, certain tokens, which we refer to as critical tokens, have a significant impact on the outcome. In some cases, a SINGLE critical token can undermine the entire rollout solution (even if sampled thousands of times).
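
One way to make the claim concrete: a token's criticality can be estimated contrastively, by comparing the success rate of rollouts forced through the token against rollouts without it. The sketch below is a hedged toy (not the paper's method or code); `sample_rollout` stands in for an LLM, with one deliberately "poisoned" token whose presence always sinks the rollout.

```python
# Toy contrastive estimate of how "critical" a token is (illustrative only).
import random

def sample_rollout(prefix, force_token=None):
    # Stand-in for an LLM rollout: any rollout containing the buggy
    # token "9.11>9.9" fails; all others succeed.
    token = force_token or random.choice(["ok", "9.11>9.9"])
    return prefix + [token], token != "9.11>9.9"

def success_rate(prefix, token, n=1000):
    # Fraction of rollouts that succeed when forced through `token`.
    wins = sum(sample_rollout(prefix, force_token=token)[1] for _ in range(n))
    return wins / n

print(success_rate([], "ok"), success_rate([], "9.11>9.9"))  # → 1.0 0.0
```

The gap between the two rates is the sense in which a single token can decide the whole rollout, no matter how many times you resample after it.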

Tian Liang (@skytliang) 's Twitter Profile Photo

What happens when step-level search meets critique? Enter PANEL! Our novel framework uses natural-language self-critique to guide reasoning. It's more informative, versatile, and efficient. Dive into our paper to learn more!
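
The core loop of step-level search guided by a critique can be sketched as follows. This is a hypothetical skeleton, not the PANEL implementation: `propose` and `critique` stand in for LLM calls, and the toy instance (reaching a target sum with +1/+2 steps) is invented purely so the loop runs end to end.

```python
# Skeleton of step-level search guided by a critique signal (toy sketch).
def search(problem, propose, critique, max_steps=8):
    steps = []
    for _ in range(max_steps):
        candidates = propose(problem, steps)  # e.g. sampled next steps from an LLM
        # Score each extended trajectory with the critique and keep the best.
        score, best = max((critique(problem, steps + [c]), c) for c in candidates)
        steps.append(best)
        if critique(problem, steps) >= 1.0:   # critique judges the solution complete
            break
    return steps

# Toy instance: reach a target sum of 5 using +1 / +2 steps.
target = 5
propose = lambda t, steps: [1, 2]
critique = lambda t, steps: 1.0 if sum(steps) == t else -abs(t - sum(steps))
print(search(target, propose, critique))  # → [2, 2, 1]
```

In the framework described above, the critique would be a natural-language judgment rendered by the model itself rather than a numeric toy score.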

Zhaopeng Tu (@tuzhaopeng) 's Twitter Profile Photo

How can we push AI models to master genuinely challenging math reasoning?

Introducing DeepMath-103K, a large-scale, rigorously decontaminated mathematical dataset designed specifically for RL and high-level reasoning. Constructing DeepMath-103K required 138,000 US dollars in
Zhaopeng Tu (@tuzhaopeng) 's Twitter Profile Photo

Trust your AI, but can it trust itself? 🤔

Introducing an online reinforcement learning framework, RISE (Reinforcing Reasoning with Self-Verification), enabling LLMs to simultaneously level up BOTH their problem-solving AND self-checking skills!

🧐 Problems tackled:
✅
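
The dual objective can be illustrated with a simple reward-shaping sketch. This is an assumed toy formulation, not the paper's exact objective: a single policy earns one reward component for solving the problem and another for rendering a correct verdict on its own answer, so training pressure lands on both skills at once.

```python
# Toy reward combining problem-solving and self-verification (illustrative).
def combined_reward(answer_correct: bool, self_verdict: bool) -> float:
    solve_r = 1.0 if answer_correct else 0.0
    # The verifier is rewarded for being RIGHT about the answer,
    # including correctly flagging its own failures.
    verify_r = 1.0 if (self_verdict == answer_correct) else 0.0
    return solve_r + verify_r  # in [0, 2]
```

Note the asymmetry this creates: a model that fails but honestly says so still earns the verification reward, which is what discourages blind self-trust.
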
Jiahao Xu (@jiahaox82739261) 's Twitter Profile Photo

🚨 Announcing DeepTheorem: Revolutionizing LLM Mathematical Reasoning! 🚀

𝕋𝕃𝔻ℝ:
- 🌟 Learning by exploration is the most important rationale that recent RL-zero training teaches us, since self-exploration significantly boosts the utilization of LLM pre-training knowledge;

-
Zeyuan Allen-Zhu, Sc.D. (@zeyuanallenzhu) 's Twitter Profile Photo

No matter how AI evolves overnight—tech, career, how it may impact me—I remain committed to using "physics of language models" approach to predict next-gen AI. Due to my limited GPU access at Meta, Part 4.1 (+new 4.2) are still in progress, but results on Canon layers are shining

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Google researchers show a 60M‑parameter text model can predict data‑center efficiency almost perfectly.

The study proves text‑to‑text regression beats classic tabular tricks by 100x and adapts in 500 examples.

Fixed‑length tensors drop structure from logs, so older regressors
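
The framing contrast can be sketched briefly. This is an assumed illustration of the text-to-text regression idea, not Google's setup (the field names and target are invented): instead of flattening each log row into a fixed-length feature tensor, the row is serialized as free-form text and a seq2seq model decodes the target number as a string, so variable-length log structure survives.

```python
# Serializing a tabular log row as text for text-to-text regression
# (toy illustration; field names and target are hypothetical).
def serialize(row: dict) -> str:
    return ", ".join(f"{k}: {v}" for k, v in row.items())

row = {"cooling": "evaporative", "load_kw": 742, "outside_temp_c": 19.5}
print(serialize(row))
# → cooling: evaporative, load_kw: 742, outside_temp_c: 19.5
# A seq2seq model would then map this string to a target such as "PUE: 1.09".
```

Because the input is just text, rows with missing, extra, or nested fields need no schema change, which is one plausible reason the approach adapts quickly from few examples.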