Xi Ye (@xiye_nlp)'s Twitter Profile
Xi Ye

@xiye_nlp

I study NLP. Postdoc fellow @PrincetonPLI. Incoming assistant professor @UAlberta (Summer 2025). CS PhD @UTAustin.

ID: 1242135548040548352

Website: https://xiye17.github.io/ · Joined: 23-03-2020 17:06:14

184 Tweets

2.2K Followers

384 Following

Jiao Sun (@sunjiao123sun_)'s Twitter Profile Photo

Mitigating racial bias from LLMs is a lot easier than removing it from humans!

Can’t believe this happened at the best AI conference NeurIPS Conference.

We have ethical reviews for authors, but missed it for invited speakers? 😡
Greg Durrett (@gregd_nlp)'s Twitter Profile Photo

Huge congrats to Prasann Singhal for being one of the 8 CRA Outstanding Undergraduate Researcher Award winners! It has been an absolute privilege to work with Prasann during his time at UT. (And he's applying for PhD programs this year... hint hint...)

Prasann's work... 🧵
Yong Lin (@yong18850571)'s Twitter Profile Photo

🚀 Introducing Goedel-Prover: a 7B LLM achieving SOTA open-source performance in automated theorem proving! 🔥

✅ +7% over the previous open-source SOTA on miniF2F
🏆 Ranked 1st on the PutnamBench Leaderboard
🤖 Solves 1.9X as many problems as prior work on Lean
Hongli Zhan (@honglizhan)'s Twitter Profile Photo

Constitutional AI works great for aligning LLMs, but its principles can be too generic to apply.

Can we guide responses with context-situated principles instead?

Introducing SPRI, a system that produces principles tailored to each query, with minimal to no human effort.

[1/5]
Alex Wettig (@_awettig)'s Twitter Profile Photo

🤔 Ever wondered how prevalent a given type of web content is during LM pre-training?

In our new paper, we propose WebOrganizer, which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐

Key takeaway: domains help us curate better pre-training data! 🧵/N
Jessy Li (@jessyjli)'s Twitter Profile Photo

🌟Job ad🌟 We (Greg Durrett, Matt Lease, and I) are hiring a postdoc fellow within the CosmicAI Institute to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve mysteries of the universe, please apply!

Association for Computing Machinery (@theofficialacm)'s Twitter Profile Photo

Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD

Fangyuan Xu (@brunchavecmoi)'s Twitter Profile Photo

Can we generate long text from a compressed KV cache? We find that existing KV cache compression methods (e.g., SnapKV) degrade rapidly in this setting. We present 𝐑𝐞𝐟𝐫𝐞𝐬𝐡𝐊𝐕, an inference method that ♻️ refreshes the smaller KV cache, better preserving performance.
Howard Yen (@howardyen1)'s Twitter Profile Photo

Llama 4 Scout claims to support a context window of 10M tokens; its needle-in-a-haystack results are perfect, but can it handle real long-context tasks? We evaluate it on HELMET, our diverse and application-centric long-context benchmark, to be presented at #ICLR2025!

Manya Wadhwa (@manyawadhwa1)'s Twitter Profile Photo

Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse evaluation criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇

Princeton PLI (@princetonpli)'s Twitter Profile Photo

In a new blog post, Howard Yen and Xi Ye introduce HELMET and LongProc, two benchmarks from a recent effort to build a holistic test suite for evaluating long-context LMs.

Read now: pli.princeton.edu/blog/2025/long…
Wenting Zhao (@wzhao_nlp)'s Twitter Profile Photo

Some personal news: I'll join UMass Amherst CS as an assistant professor in fall 2026. Until then, I'll postdoc at Meta NYC. Reasoning will continue to be my main interest, with a focus on data-centric approaches 🤩 If you're also interested, apply to work with me (PhDs & a postdoc)!

Liyan Tang (@liyantang4)'s Twitter Profile Photo

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!

✍🏻 Entirely human-written questions by 13 CS researchers
👀 Emphasis on visual reasoning – hard to verbalize via text CoTs
📉 Humans reach 93%, but Gemini-2.5-Pro only 63% and Qwen2.5-72B only 38%