Sihao Chen (@soshsihao) 's Twitter Profile
Sihao Chen

@soshsihao

Researcher @ Microsoft #OAR. Previously: @upennnlp @cogcomp @GoogleAI; #NLProc. Opinions my own.

ID: 2430441172

Link: http://sihaoc.github.io · Joined: 23-03-2014 07:08:52

107 Tweets

864 Followers

513 Following

Longqi Yang (@ylongqi) 's Twitter Profile Photo

Internship alert! We have an immediate part-time research intern opening at Microsoft’s Office of Applied Research to improve LLM reasoning. Please reach out if you or your students are interested!

Hongming Zhang (@hongming110) 's Twitter Profile Photo

🤖 Tired of slow tree searches on LLMs?

🚀 Check out our latest research on efficient tree search! 

🔹 We introduce an upgraded transformer architecture that enables token-level self-reward modeling (TRM).
🔹 On top of that, we developed the Streaming Looking Ahead (SLA)
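The tweet is truncated, but the core idea of guiding a tree search with per-token rewards can be sketched in miniature. Everything below is illustrative assumption, not the actual TRM architecture or SLA algorithm: the reward is a toy heuristic, and the search is a plain beam-style expansion scored by cumulative token rewards.

```python
def token_reward(prefix, token):
    """Hypothetical stand-in for a token-level self-reward model (TRM):
    a toy heuristic that prefers shorter tokens. A real TRM would be
    produced by the model itself, per the tweet."""
    return 1.0 / (1 + len(token))

def tree_search(vocab, steps, beam_width=2):
    """Toy beam-style tree search scored by per-token rewards."""
    beams = [([], 0.0)]  # (token sequence, cumulative reward)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok in vocab:
                candidates.append((seq + [tok], score + token_reward(seq, tok)))
        # keep only the highest-reward partial sequences
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

best = tree_search(["a", "bb", "ccc"], steps=3)
```

The point of the sketch is the cost structure: each expansion needs a reward per candidate token, which is why a reward signal computed inside the forward pass (rather than by a separate model call) would speed the search up.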
Taiwei Shi (@taiwei_shi) 's Twitter Profile Photo

📢 𝐖𝐢𝐥𝐝𝐅𝐞𝐞𝐝𝐛𝐚𝐜𝐤 A large-scale preference dataset built from 𝐫𝐞𝐚𝐥 𝐮𝐬𝐞𝐫 interactions with ChatGPT ✅ 𝟐𝟎𝐤+ preference pairs 🗣️ Built from 𝟏𝐌 chats 🔍 Annotated with 𝐝𝐢𝐚𝐥𝐨𝐠𝐮𝐞 𝐬𝐭𝐚𝐭𝐞, 𝐝𝐨𝐦𝐚𝐢𝐧, 𝐢𝐧𝐭𝐞𝐧𝐭, and more huggingface.co/datasets/micro…

Yu Feng (@anniefeng6) 's Twitter Profile Photo

#ICLR2025 Oral

LLMs often struggle with reliable and consistent decisions under uncertainty 😵‍💫 — largely because they can't reliably estimate the probability of each choice.

We propose BIRD 🐦, a framework that significantly enhances LLM decision making under uncertainty.

BIRD
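The problem the tweet names, estimating the probability of each choice, is usually set up by normalizing per-option log-probabilities into a distribution. The sketch below shows only that common baseline; BIRD's actual mechanism is more involved and is not reproduced here.

```python
import math

def choice_distribution(option_logprobs):
    """Softmax-normalize per-option log-probabilities into a probability
    distribution. This is the standard baseline whose unreliability the
    tweet describes, not BIRD's method."""
    m = max(option_logprobs.values())  # subtract max for numerical stability
    exps = {k: math.exp(v - m) for k, v in option_logprobs.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

probs = choice_distribution({"yes": -0.2, "no": -1.6})
```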
Bowen Jiang (Lauren) @ Penn (@laurenbjiang) 's Twitter Profile Photo

🚀 How well can LLMs know you and personalize your response? Turns out, not so much!

Introducing the PersonaMem Benchmark --
👩🏻‍💻Evaluates LLMs' ability to understand evolving personas from 180+ multi-session user-chatbot conversation histories
🎯Latest models (GPT-4.1, GPT-4.5,
Taiwei Shi (@taiwei_shi) 's Twitter Profile Photo

Want to 𝐜𝐮𝐭 𝐑𝐅𝐓 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐭𝐢𝐦𝐞 𝐛𝐲 𝐮𝐩 𝐭𝐨 𝟐× and boost performance? 🚀

Meet 𝑨𝒅𝒂𝑹𝑭𝑻 — a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithm (PPO, GRPO, REINFORCE).

Less compute. Better results. 🧵 1/n
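A curriculum that adapts to training progress can be sketched as follows. The update rule and parameter names here are my assumption for illustration, not AdaRFT's published algorithm: keep a target difficulty, sample the problem closest to it, and nudge the target up when recent reward exceeds a threshold and down otherwise.

```python
def adaptive_curriculum(problems, solve, steps, target_reward=0.5, lr=0.1):
    """Toy adaptive curriculum (illustrative only, not the AdaRFT paper's
    algorithm). `problems` is a list of dicts with a 'difficulty' key;
    `solve` returns a scalar reward for a problem."""
    difficulty = 0.0
    history = []
    for _ in range(steps):
        # sample the problem whose difficulty is closest to the current target
        prob = min(problems, key=lambda p: abs(p["difficulty"] - difficulty))
        reward = solve(prob)
        history.append((prob["difficulty"], reward))
        # move the target toward harder problems when the policy is succeeding
        difficulty += lr * (reward - target_reward)
    return difficulty, history

problems = [{"difficulty": i / 10} for i in range(11)]
final_difficulty, history = adaptive_curriculum(problems, lambda p: 1.0, steps=10)
```

Because the sampler only changes which problems are drawn, a scheme like this can wrap any RFT loop without touching the underlying policy-gradient algorithm, which matches the "plug-and-play" claim.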
Yi Tay (@yitayml) 's Twitter Profile Photo

This is so true. LLM researchers seem to like to "specialize" in either pretraining or post training. Doing intense research on both sides does unlock something.

Tong Chen @ ICLR (@tomchen0) 's Twitter Profile Photo

LLMs naturally memorize some pre-training data verbatim. We study whether post-training can be an effective way to mitigate unintentional reproduction of pre-training data.
🛠️ No changes to pre-training or decoding
🔥 Training models to latently distinguish between memorized
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

Missing nuance in the collective realization today: The non-trivial negative result is not that "RL just amplifies skills that are already there with low probability". Duh, that's obvious and not an issue actually. What got questioned today is that "dumb pretraining teaches the

Han Guo (@hanguo97) 's Twitter Profile Photo

We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
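The claimed position "in between" is easiest to see as asymptotic cost over sequence length T. The sketch below is my simplification (constants and exact state sizes omitted): softmax attention touches every past token at every step, linear attention and SSMs carry one fixed-size running state, and a log-linear scheme maintains on the order of log T states per step.

```python
import math

def attention_costs(T):
    """Rough per-sequence cost scaling for the three regimes the tweet
    contrasts (constants omitted; an illustrative simplification):
      - softmax attention: each of T steps attends over up to T tokens
      - linear attention / SSMs: one fixed-size state per step
      - log-linear attention: about log2(T) states per step
    """
    return {
        "softmax": T * T,
        "linear": T,
        "log_linear": T * max(1, math.ceil(math.log2(T))),
    }

costs = attention_costs(1024)
```

At T = 1024 the log-linear cost is about 10x the linear cost but about 100x cheaper than quadratic attention, which is the trade-off the tweet's "log-linear time training, log-time inference" bullets describe.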