Luxi (Lucy) He (@luxihelucy)'s Twitter Profile
Luxi (Lucy) He

@luxihelucy

Princeton CS PhD @PrincetonPLI. Previously @Harvard ‘23 CS & Math.

ID: 1583989164223172608

Link: https://lumos23.github.io/ · Joined: 23-10-2022 01:10:27

66 Tweets

963 Followers

370 Following

Yangsibo Huang (@yangsibohuang)'s Twitter Profile Photo

Attending Conference on Language Modeling from 10/6 to 10/9! If you want to chat about GenAI security, privacy, safety, or reasoning (I just started exploring it!), DM me :) Also, my team at Google AI is looking for interns. Email me ([email protected]) your resume if you are interested.

Tianyu Gao (@gaotianyu1350)'s Twitter Profile Photo

Very proud to introduce two of our recent long-context works:

HELMET (best long-context benchmark imo): shorturl.at/JnBHD
ProLong (a continued-training & SFT recipe + a SoTA 512K 8B model): shorturl.at/XQV7a

Here is a story of how we arrived there
Luxi (Lucy) He (@luxihelucy)'s Twitter Profile Photo

I'm attending Conference on Language Modeling next week! Excited to meet folks and chat about alignment, safety, reasoning, LM evaluations, and more! Please feel free to reach out anytime :)
Mengzhou Xia and I will present our work on data selection + safety on Tuesday afternoon, come chat with us!
Sadhika Malladi (@sadhikamalladi)'s Twitter Profile Photo

Theory + experiments in our new work show that preference tuning can move probability mass in unexpected ways, causing aligned models (across scales and settings) to unalign. For example, training a model to prefer "No" over "Never" makes the probability of "Yes" increase. arxiv.org/abs/2410.08847
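
The "No"-vs-"Never" effect is easy to probe empirically. Below is a minimal sketch (not the paper's experimental setup; the checkpoint names and the prompt are placeholders) that compares next-token probabilities for candidate answers before and after preference tuning, using Hugging Face transformers:

```python
# Minimal sketch (not the paper's setup): compare next-token probabilities
# for "Yes", "No", and "Never" between a base model and its preference-tuned
# version. The checkpoint names and the prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def next_token_probs(model_name, prompt, candidates):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    # Use each candidate's first sub-token as a proxy for its probability.
    return {c: probs[tok.encode(" " + c, add_special_tokens=False)[0]].item()
            for c in candidates}

prompt = "Q: Will you ever reveal the secret? A:"
for name in ["base-model", "preference-tuned-model"]:  # placeholder checkpoints
    print(name, next_token_probs(name, prompt, ["Yes", "No", "Never"]))
```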

Luxi (Lucy) He (@luxihelucy)'s Twitter Profile Photo

Join us today at 3 pm ET for a discussion on AI safety and alignment with David Krueger 🤩 Submit your questions in advance at the link in the post!

Yangsibo Huang (@yangsibohuang)'s Twitter Profile Photo

Unlearning allows users to request the removal of specific data from a trained model.

Sounds great, right? 

👿 BUT: we show how adversaries can exploit this to completely DESTROY model accuracy—plummeting to just 3.6% on CIFAR-10 and 0.4% on ImageNet after the attack!

(1/n)
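
The attack details are in the thread, but the threat model is easy to illustrate. Here is a toy sketch (NOT the paper's attack): "unlearning" is simulated by exactly retraining a small classifier, and an adversary greedily submits removal requests for the training points whose deletion hurts held-out accuracy the most:

```python
# Toy illustration of the threat model (NOT the paper's attack): "unlearning"
# is simulated by exact retraining of a small classifier, and an adversary
# greedily requests removal of the training points whose deletion hurts
# held-out accuracy the most.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
Xtr, ytr, Xte, yte = X[:300], y[:300], X[300:], y[300:]
keep = np.ones(len(Xtr), dtype=bool)  # which training points remain

def test_acc(mask):
    clf = LogisticRegression(max_iter=1000).fit(Xtr[mask], ytr[mask])
    return clf.score(Xte, yte)

print("accuracy before attack:", test_acc(keep))
for _ in range(30):  # 30 adversarial removal requests
    candidates = rng.choice(np.flatnonzero(keep), size=20, replace=False)
    # Request removal of the candidate whose deletion degrades accuracy most.
    worst = min(candidates, key=lambda i: test_acc(keep & (np.arange(len(Xtr)) != i)))
    keep[worst] = False
print("accuracy after attack:", test_acc(keep))
```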
Ryan Liu @ NeurIPS 2024 (@theryanliu)'s Twitter Profile Photo

Is encouraging LLMs to reason through a task always beneficial?🤔

NO🛑: inspired by cases where verbal thinking makes humans worse at tasks, we predict when CoT impairs LLMs & find 3 types of failure cases.

In one case, OpenAI o1-preview accuracy drops 36.3% compared to GPT-4o zero-shot!😱
Luxi (Lucy) He (@luxihelucy)'s Twitter Profile Photo

Excited for the talk today at 2pm ET! YouTube link here: youtube.com/@PrincetonPLI, and submit your questions via forms.gle/7GQXAr9aonfvy1… 🤩

Sadhika Malladi (@sadhikamalladi)'s Twitter Profile Photo

Congratulations to Ai2 on the exciting Tulu 3 release! We had Nathan Lambert on PASS a few weeks ago to talk all about it. Check out the recording for an easy primer to the paper: youtube.com/watch?v=ltSzUI…

Tianyu Gao (@gaotianyu1350)'s Twitter Profile Photo

Introducing MeCo (metadata conditioning then cooldown), a remarkably simple method that accelerates LM pre-training by simply prepending source URLs to training documents.

arxiv.org/abs/2501.01956
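
A minimal sketch of the preprocessing idea as described above (not the official MeCo implementation; the record format, the 10% cooldown fraction, and the training loop are assumptions for illustration):

```python
# Minimal sketch of the idea described above (not the official MeCo
# implementation): prepend each document's source URL for most of
# pre-training, then drop the metadata for the final "cooldown" phase.
# The record format and the 10% cooldown fraction are assumptions.
def format_example(doc, in_cooldown):
    if in_cooldown:
        return doc["text"]                   # cooldown: plain text only
    return f"{doc['url']}\n\n{doc['text']}"  # metadata conditioning

corpus = [
    {"url": "https://en.wikipedia.org/wiki/Language_model",
     "text": "A language model assigns probabilities to sequences ..."},
    {"url": "https://example-blog.com/post", "text": "Today I tried ..."},
]

total_steps, cooldown_frac = 100_000, 0.10
for step in range(total_steps):
    in_cooldown = step >= total_steps * (1 - cooldown_frac)
    doc = corpus[step % len(corpus)]
    training_text = format_example(doc, in_cooldown)
    # ... tokenize `training_text` and take an optimizer step ...
```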
Simon Park (@parksimon0808)'s Twitter Profile Photo

Does all LLM reasoning transfer to VLMs? In the context of simple-to-hard generalization, we show: NO! We also give ways to reduce this modality imbalance.

Paper arxiv.org/abs/2501.02669
Code github.com/princeton-pli/…

Abhishek Panigrahi, Yun (Catherine) Cheng, Dingli Yu, Anirudh Goyal, Sanjeev Arora
Yangsibo Huang (@yangsibohuang)'s Twitter Profile Photo

LLM safety guardrails can be easily removed through fine-tuning. While defenses have been proposed, our #ICLR2025 paper shows flawed evaluations can create a false sense of security. Check out the thread by Boyi Wei for more details 🧵

Alex Wettig (@_awettig)'s Twitter Profile Photo

🤔 Ever wondered how prevalent a given type of web content is during LM pre-training?

In our new paper, we propose WebOrganizer, which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐

Key takeaway: domains help us curate better pre-training data! 🧵/N
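
As a hedged sketch of the idea (not the released WebOrganizer code), one can treat each (topic, format) pair as a domain and measure its prevalence in a crawl sample; the two classifiers below are trivial placeholders standing in for the paper's trained ones:

```python
# Hedged sketch of the idea (not the released WebOrganizer code): treat each
# (topic, format) pair as a "domain" and measure its prevalence in a crawl
# sample. The two classifiers below are trivial placeholders standing in for
# the paper's trained topic/format classifiers.
from collections import Counter

def classify_topic(page):   # placeholder for a trained topic classifier
    return "science" if "experiment" in page.lower() else "other"

def classify_format(page):  # placeholder for a trained format classifier
    return "tutorial" if "step 1" in page.lower() else "article"

pages = [
    "Step 1: set up the experiment apparatus ...",
    "Breaking news from the city council ...",
    "We ran an experiment comparing optimizers ...",
]

domains = Counter((classify_topic(p), classify_format(p)) for p in pages)
for (topic, fmt), n in domains.most_common():
    print(f"{topic}/{fmt}: {n / len(pages):.0%} of sample")
```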
Peter Henderson (@peterhndrsn)'s Twitter Profile Photo

Preserving alignment during customization &amp; fine-tuning is a challenging problem! Here's another work showing how language models can be broadly misaligned by finetuning. If interested, can also check out work from our group by <a href="/LuxiHeLucy/">Luxi (Lucy) He</a> <a href="/wei_boyi/">Boyi Wei</a>, <a href="/xiangyuqi_pton/">Xiangyu Qi</a>, &amp; others!
Peter Henderson (@peterhndrsn)'s Twitter Profile Photo

Very excited that our work, "Safety Alignment Should be Made More Than Just a Few Tokens Deep," was recognized with an Outstanding Paper Award at #ICLR2025! We hope this is a step forward in improving and understanding the robustness of language model alignment. It was great working