Sihao Chen (@soshsihao) 's Twitter Profile
Sihao Chen

@soshsihao

Researcher @ Microsoft #OAR. Previously: @upennnlp @cogcomp @GoogleAI; #NLProc. Opinions my own.

ID: 2430441172

Link: http://sihaoc.github.io · Joined: 23-03-2014 07:08:52

107 Tweets

864 Followers

513 Following

Longqi Yang (@ylongqi) 's Twitter Profile Photo

Internship alert! We have an immediate part-time research intern opening at Microsoft’s Office of Applied Research to improve LLM reasoning. Please reach out if you or your students are interested!

Hongming Zhang (@hongming110) 's Twitter Profile Photo

🤖 Tired of slow tree searches on LLMs?

🚀 Check out our latest research on efficient tree search! 

🔹 We introduce an upgraded transformer architecture that enables token-level self-reward modeling (TRM).
🔹 On top of that, we developed the Streaming Looking Ahead (SLA)
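The tweet is truncated, but the core idea of guiding a tree search with per-token rewards can be sketched in miniature. Everything below is illustrative assumption, not the actual TRM architecture or SLA algorithm: the reward is a toy heuristic, and the search is a plain beam-style expansion scored by cumulative token rewards.

```python
def token_reward(prefix, token):
    """Hypothetical stand-in for a token-level self-reward model (TRM):
    a toy heuristic that prefers shorter tokens. A real TRM would be
    produced by the model itself, per the tweet."""
    return 1.0 / (1 + len(token))

def tree_search(vocab, steps, beam_width=2):
    """Toy beam-style tree search scored by per-token rewards."""
    beams = [([], 0.0)]  # (token sequence, cumulative reward)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok in vocab:
                candidates.append((seq + [tok], score + token_reward(seq, tok)))
        # keep only the highest-reward partial sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

best = tree_search(["a", "bb", "ccc"], steps=3)
```

The point of the sketch is the cost structure: each expansion needs a reward per candidate token, which is why a reward signal computed inside the forward pass (rather than by a separate model call) would speed the search up.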
Taiwei Shi (@taiwei_shi) 's Twitter Profile Photo

📢 𝐖𝐢𝐥𝐝𝐅𝐞𝐞𝐝𝐛𝐚𝐜𝐤 A large-scale preference dataset built from 𝐫𝐞𝐚𝐥 𝐮𝐬𝐞𝐫 interactions with ChatGPT ✅ 𝟐𝟎𝐤+ preference pairs 🗣️ Built from 𝟏𝐌 chats 🔍 Annotated with 𝐝𝐢𝐚𝐥𝐨𝐠𝐮𝐞 𝐬𝐭𝐚𝐭𝐞, 𝐝𝐨𝐦𝐚𝐢𝐧, 𝐢𝐧𝐭𝐞𝐧𝐭, and more huggingface.co/datasets/micro…

Yu Feng (@anniefeng6) 's Twitter Profile Photo

#ICLR2025 Oral

LLMs often struggle with reliable and consistent decisions under uncertainty 😵‍💫 — largely because they can't reliably estimate the probability of each choice.

We propose BIRD 🐦, a framework that significantly enhances LLM decision making under uncertainty.

BIRD
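The problem the tweet names, estimating the probability of each choice, is usually set up by normalizing per-option log-probabilities into a distribution. The sketch below shows only that common baseline; BIRD's actual mechanism is more involved and is not reproduced here.

```python
import math

def choice_distribution(option_logprobs):
    """Softmax-normalize per-option log-probabilities into a probability
    distribution. This is the standard baseline whose unreliability the
    tweet describes, not BIRD's method."""
    m = max(option_logprobs.values())  # subtract max for numerical stability
    exps = {k: math.exp(v - m) for k, v in option_logprobs.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

probs = choice_distribution({"yes": -0.2, "no": -1.6})
```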
Bowen Jiang (Lauren) @ Penn (@laurenbjiang) 's Twitter Profile Photo

🚀 How well can LLMs know you and personalize your response? Turns out, not so much!

Introducing the PersonaMem Benchmark --
👩🏻‍💻Evaluates LLMs' ability to understand evolving personas from 180+ multi-session user-chatbot conversation histories
🎯Latest models (GPT-4.1, GPT-4.5,
Taiwei Shi (@taiwei_shi) 's Twitter Profile Photo

Want to 𝐜𝐮𝐭 𝐑𝐅𝐓 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐭𝐢𝐦𝐞 𝐛𝐲 𝐮𝐩 𝐭𝐨 𝟐× and boost performance? 🚀

Meet 𝑨𝒅𝒂𝑹𝑭𝑻 — a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithm (PPO, GRPO, REINFORCE).

Less compute. Better results. 🧵 1/n
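A curriculum that adapts to training progress can be sketched as follows. The update rule and parameter names here are my assumption for illustration, not AdaRFT's published algorithm: keep a target difficulty, sample the problem closest to it, and nudge the target up when recent reward exceeds a threshold and down otherwise.

```python
def adaptive_curriculum(problems, solve, steps, target_reward=0.5, lr=0.1):
    """Toy adaptive curriculum (illustrative only, not the AdaRFT paper's
    algorithm). `problems` is a list of dicts with a 'difficulty' key;
    `solve` returns a scalar reward for a problem."""
    difficulty = 0.0
    history = []
    for _ in range(steps):
        # sample the problem whose difficulty is closest to the current target
        prob = min(problems, key=lambda p: abs(p["difficulty"] - difficulty))
        reward = solve(prob)
        history.append((prob["difficulty"], reward))
        # move the target toward harder problems when the policy is succeeding
        difficulty += lr * (reward - target_reward)
    return difficulty, history

problems = [{"difficulty": i / 10} for i in range(11)]
final_difficulty, history = adaptive_curriculum(problems, lambda p: 1.0, steps=10)
```

Because the sampler only changes which problems are drawn, a scheme like this can wrap any RFT loop without touching the underlying policy-gradient algorithm, which matches the "plug-and-play" claim.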
Yi Tay (@yitayml) 's Twitter Profile Photo

This is so true. LLM researchers seem to like to "specialize" in either pretraining or post training. Doing intense research on both sides does unlock something.

Tong Chen @ ICLR (@tomchen0) 's Twitter Profile Photo

LLMs naturally memorize some pre-training data verbatim. We study whether post-training can be an effective way to mitigate unintentional reproduction of pre-training data.
🛠️ No changes to pre-training or decoding
🔥 Training models to latently distinguish between memorized
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

Missing nuance in the collective realization today: The non-trivial negative result is not that "RL just amplifies skills that are already there with low probability". Duh, that's obvious and not an issue actually. What got questioned today is that "dumb pretraining teaches the

Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
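The claimed position "in between" is easiest to see as asymptotic cost over sequence length T. The sketch below is my simplification (constants and exact state sizes omitted): softmax attention touches every past token at every step, linear attention and SSMs carry one fixed-size running state, and a log-linear scheme maintains on the order of log T states per step.

```python
import math

def attention_costs(T):
    """Rough per-sequence cost scaling for the three regimes the tweet
    contrasts (constants omitted; an illustrative simplification):
      - softmax attention: each of T steps attends over up to T tokens
      - linear attention / SSMs: one fixed-size state per step
      - log-linear attention: about log2(T) states per step
    """
    return {
        "softmax": T * T,
        "linear": T,
        "log_linear": T * max(1, math.ceil(math.log2(T))),
    }

costs = attention_costs(1024)
```

At T = 1024 the log-linear cost is about 10x the linear cost but about 100x cheaper than quadratic attention, which is the trade-off the tweet's "log-linear time training, log-time inference" bullets describe.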