Sihao Chen (@soshsihao) 's Twitter Profile
Sihao Chen

@soshsihao

Researcher @ Microsoft #OAR. Previously: @upennnlp @cogcomp @GoogleAI; #NLProc. Opinions my own.

ID: 2430441172

Website: http://sihaoc.github.io · Joined: 23-03-2014 07:08:52

107 Tweets

864 Followers

513 Following

Longqi Yang (@ylongqi) 's Twitter Profile Photo

Internship alert! We have an immediate part-time research intern opening at Microsoft's Office of Applied Research to improve LLM reasoning. Please reach out if you or your students are interested!

Hongming Zhang (@hongming110) 's Twitter Profile Photo


🤖 Tired of slow tree searches on LLMs?

🚀 Check out our latest research on efficient tree search!

🔹 We introduce an upgraded transformer architecture that enables token-level self-reward modeling (TRM).
🔹 On top of that, we developed the Streaming Looking Ahead (SLA)
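The tweet is truncated before the details of SLA, but the general idea of steering a search with a token-level reward signal can be sketched as a toy beam-style lookahead. Everything below (the vocabulary, the reward function, the beam width) is a hypothetical stand-in for illustration, not the paper's method:

```python
import heapq

def lookahead_search(vocab, reward_fn, depth, beam_width):
    """Toy beam-style lookahead: extend each partial sequence with every
    token in vocab, score the extension with reward_fn (a stand-in for a
    token-level reward model), and keep only the beam_width best."""
    beams = [(0.0, [])]
    for _ in range(depth):
        candidates = [
            (score + reward_fn(seq, tok), seq + [tok])
            for score, seq in beams
            for tok in vocab
        ]
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beams

# Hypothetical reward that prefers alternating tokens.
reward = lambda seq, tok: 1.0 if not seq or seq[-1] != tok else 0.0
best_score, best_seq = lookahead_search(["a", "b"], reward, depth=4, beam_width=2)[0]
```

The point of a learned token-level reward here is that the scoring happens per expansion step, so pruning can occur long before a full rollout is generated.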
Taiwei Shi (@taiwei_shi) 's Twitter Profile Photo

📢 WildFeedback: a large-scale preference dataset built from real user interactions with ChatGPT
✅ 20k+ preference pairs
🗣️ Built from 1M chats
🔍 Annotated with dialogue state, domain, intent, and more
huggingface.co/datasets/micro…

Yu Feng (@anniefeng6) 's Twitter Profile Photo


#ICLR2025 Oral

LLMs often struggle with reliable and consistent decisions under uncertainty 😵‍💫, largely because they can't reliably estimate the probability of each choice.

We propose BIRD 🐦, a framework that significantly enhances LLM decision making under uncertainty.

BIRD
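The thread is cut off here and does not spell out BIRD's machinery. As a deliberately simplified, hypothetical illustration of the general idea of estimating a choice's probability from decomposed evidence (not BIRD's actual Bayesian formulation), one can combine per-factor probability estimates with weights:

```python
def aggregate_choice_probability(factor_probs, factor_weights):
    """Hypothetical illustration only: combine per-factor estimates of
    P(choice is good | factor) into a single probability via a
    normalized weighted average."""
    total = sum(factor_weights.values())
    return sum(factor_probs[f] * w for f, w in factor_weights.items()) / total

# Two made-up factors: one strongly favors the choice, one weakly.
p = aggregate_choice_probability({"f1": 0.8, "f2": 0.2}, {"f1": 3.0, "f2": 1.0})
```

The contrast with direct prompting is that the model is asked for several narrow conditional estimates, which are then combined explicitly rather than trusting one holistic guess.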
Bowen Jiang (Lauren) @ Penn (@laurenbjiang) 's Twitter Profile Photo


🚀 How well can LLMs know you and personalize your responses? Turns out, not so much!

Introducing the PersonaMem Benchmark --
👩🏻‍💻 Evaluates LLMs' ability to understand an evolving persona from 180+ multi-session user-chatbot conversation histories
🎯 Latest models (GPT-4.1, GPT-4.5,
Taiwei Shi (@taiwei_shi) 's Twitter Profile Photo

Want to ๐œ๐ฎ๐ญ ๐‘๐…๐“ ๐ญ๐ซ๐š๐ข๐ง๐ข๐ง๐  ๐ญ๐ข๐ฆ๐ž ๐›๐ฒ ๐ฎ๐ฉ ๐ญ๐จ ๐Ÿร— and boost performance? ๐Ÿš€ Meet ๐‘จ๐’…๐’‚๐‘น๐‘ญ๐‘ป โ€” a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithms (PPO, GRPO, REINFORCE). Less compute. Better results. ๐Ÿงต 1/n

Want to ๐œ๐ฎ๐ญ ๐‘๐…๐“ ๐ญ๐ซ๐š๐ข๐ง๐ข๐ง๐  ๐ญ๐ข๐ฆ๐ž ๐›๐ฒ ๐ฎ๐ฉ ๐ญ๐จ ๐Ÿร— and boost performance? ๐Ÿš€

Meet ๐‘จ๐’…๐’‚๐‘น๐‘ญ๐‘ป โ€” a lightweight, plug-and-play curriculum learning method you can drop into any mainstream RFT algorithms (PPO, GRPO, REINFORCE).

Less compute. Better results. 🧵 1/n
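The thread's details aren't included here, but the general shape of difficulty-targeted curriculum sampling can be sketched as follows. The update rule, step size, and toy learner below are all assumptions for illustration, not AdaRFT's actual algorithm:

```python
def adaptive_curriculum(problems, solve, target=0.5, step=0.02, rounds=20):
    """Toy difficulty-targeted sampling: repeatedly pick the problem whose
    difficulty is closest to a moving target, then nudge the target up on
    success and down on failure. `problems` maps id -> difficulty in [0, 1];
    `solve(difficulty)` stands in for a policy rollout plus reward check."""
    history = []
    for _ in range(rounds):
        pid = min(problems, key=lambda p: abs(problems[p] - target))
        success = solve(problems[pid])
        target = min(1.0, max(0.0, target + (step if success else -step)))
        history.append((pid, success, round(target, 3)))
    return history

# Hypothetical learner that can solve anything with difficulty below 0.6.
hist = adaptive_curriculum({f"p{i}": i / 10 for i in range(11)}, lambda d: d < 0.6)
```

The intended effect is that training stays near the frontier of what the policy can solve, rather than wasting rollouts on problems that are trivially easy or hopelessly hard.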
Yi Tay (@yitayml) 's Twitter Profile Photo

This is so true. LLM researchers seem to like to "specialize" in either pretraining or post training. Doing intense research on both sides does unlock something.

Tong Chen @ ICLR (@tomchen0) 's Twitter Profile Photo


LLMs naturally memorize some pre-training data verbatim. We study whether post-training can be an effective way to mitigate unintentional reproduction of pre-training data.
🛠️ No changes to pre-training or decoding
🔥 Training models to latently distinguish between memorized
Omar Khattab (@lateinteraction) 's Twitter Profile Photo

Missing nuance in the collective realization today: The non-trivial negative result is not that "RL just amplifies skills that are already there with low probability". Duh, that's obvious and not an issue actually. What got questioned today is that "dumb pretraining teaches the

Han Guo (@hanguo97) 's Twitter Profile Photo


We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between?

Introducing Log-Linear Attention with:

- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
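As background on how "between linear and quadratic" can work: one standard trick for getting O(log T) per-step cost, in the spirit of hierarchical state summaries (the function below is a generic Fenwick-tree-style sketch, not the paper's code), is to cover any prefix with power-of-two chunks so a query reads one summary per chunk:

```python
def fenwick_chunks(t):
    """Decompose the prefix [0, t) into power-of-two chunks, Fenwick-tree
    style: repeatedly peel off a chunk whose length is the lowest set bit
    of the remaining endpoint. A prefix of length t needs popcount(t)
    chunks, which is at most log2(t) + 1."""
    chunks, hi = [], t
    while hi > 0:
        size = hi & -hi              # lowest set bit of hi
        chunks.append((hi - size, hi))
        hi -= size
    return chunks[::-1]

print(fenwick_chunks(13))   # [(0, 8), (8, 12), (12, 13)]
```

If each chunk maintains a fixed-size summary of its keys/values, a token at position t attends over O(log t) summaries instead of t past tokens, which is the flavor of the log-time inference claim.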