Yoonsang Lee (@yoonsang_) 's Twitter Profile
Yoonsang Lee

@yoonsang_

Incoming CS PhD @princeton_nlp @princetonPLI; prev @SeoulNatlUni

ID: 1617847185990905856

Link: https://yoonsanglee.com · Joined: 24-01-2023 11:30:10

74 Tweets

196 Followers

549 Following

Michael Zhang (@mjqzhang) 's Twitter Profile Photo

Why and when do preference annotators disagree? And how do reward models + LLM-as-Judge evaluators handle disagreements?

We explore both these questions in a ✨new preprint✨ from my Ai2 internship!

[1/6]
Jaemin Cho (on faculty job market) (@jmin__cho) 's Twitter Profile Photo

Check out M3DocRAG -- multimodal RAG for question answering on Multi-Modal & Multi-Page & Multi-Documents (+ a new open-domain benchmark + strong results on 3 benchmarks)!

⚡️Key Highlights:

➡️ M3DocRAG flexibly accommodates various settings:
- closed & open-domain document
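The retrieval step the tweet describes can be sketched roughly as follows. This is a toy illustration under stated assumptions, not M3DocRAG's implementation: the real system uses multimodal page embeddings, while here plain dot products over toy vectors stand in for the scoring.

```python
def retrieve_pages(query_vec, page_vecs, k=3):
    """Sketch of the retrieval step: score every page embedding (pooled
    across all documents, so open- and closed-domain settings work the
    same way) against the query and keep the top-k page indices for a
    multimodal LM to answer from. Scoring here is a simple dot product."""
    scored = sorted(
        range(len(page_vecs)),
        key=lambda i: sum(q * p for q, p in zip(query_vec, page_vecs[i])),
        reverse=True,
    )
    return scored[:k]
```

Because pages from all documents share one index, scaling from a single document to an open-domain corpus only changes the size of `page_vecs`, not the flow.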
Seoyoung Kim (@seoy_kim) 's Twitter Profile Photo

I'll be presenting this work at #CSCW2024! "Is the Same Performance Really the Same?: Understanding How Listeners Perceive ASR Results Differently According to the Speaker's Accent"
Nov 12th, 11:00 – 12:30, Session 2d (Room: Central 1)
I'm also on the job market 😁

Akari Asai (@akariasai) 's Twitter Profile Photo

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
UW NLP Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts. Try out our demo!
We also introduce ꜱᴄʜᴏʟᴀʀQᴀʙᴇɴᴄʜ,

Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🔔 I'm recruiting multiple fully funded MSc/PhD students at the University of Alberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!

Jaemin Cho (on faculty job market) (@jmin__cho) 's Twitter Profile Photo

🚨 I’m on the 2024-2025 academic job market!
j-min.io

I work on ✨ Multimodal AI ✨, with a special focus on enhancing reasoning in both understanding and generation tasks by:
1⃣Making it more scalable
2⃣Making it more faithful
3⃣Evaluating and refining multimodal
Yoonsang Lee (@yoonsang_) 's Twitter Profile Photo

Heading to #NeurIPS2024 ✈️ Excited to present my work at the ENLSP workshop on Saturday, 12/14! I'm also applying for PhD programs this cycle! Please feel free to DM or chat about potential opportunities😊!!

Tianyu Gao (@gaotianyu1350) 's Twitter Profile Photo

Introducing MeCo (metadata conditioning then cooldown), a remarkably simple method that accelerates LM pre-training by simply prepending source URLs to training documents.

arxiv.org/abs/2501.01956
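The mechanism is simple enough to sketch in a few lines. Assumptions: the URL/text separator and the exact phase handling here are illustrative; see the paper for the actual recipe.

```python
def format_with_metadata(url: str, text: str, cooldown: bool = False) -> str:
    # Metadata conditioning: prepend the document's source URL during the
    # main pre-training phase. During the final "cooldown" phase, train on
    # the plain text so the model also works without metadata at inference.
    # (The "\n\n" separator is an assumption for this sketch.)
    return text if cooldown else f"{url}\n\n{text}"
```

The conditioning phase lets the model associate style and quality signals with sources; the cooldown phase removes the dependency on metadata being present at test time.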
Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🤔Now most LLMs have >= 128K context sizes, but are they good at generating long outputs, such as writing 8K token chain-of-thought for a planning problem?
🔔Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize
Yijia Shao (@echoshao8899) 's Twitter Profile Photo

LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I've already grown used to agents proactively seeking my confirmation or my deeper input.

Jinu Lee @ NAACL25 (@jinulee_v) 's Twitter Profile Photo

I am happy to announce that my first-author paper is accepted to NAACL 2025 Main!

Existing backward chaining (top-down reasoning) methods are incomplete, leading to suboptimal performance. We build SymBa, a complete neuro-symbolic backward chaining method using SLD-Resolution.
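For readers unfamiliar with the reasoning style being discussed, here is a minimal propositional backward chainer. This is purely illustrative: SymBa's SLD-Resolution operates over richer first-order programs, and the loop check below only shows one classic pitfall (cyclic goals) that naive top-down methods must handle.

```python
def backward_chain(goal, rules, facts, seen=frozenset()):
    """Minimal top-down (backward-chaining) prover.
    `rules` is a list of (head, body) pairs meaning head :- body1, body2, ...
    `facts` is a set of atoms known to be true.
    `seen` blocks cyclic goals so recursion terminates."""
    if goal in facts:
        return True
    if goal in seen:
        return False
    return any(
        all(backward_chain(b, rules, facts, seen | {goal}) for b in body)
        for head, body in rules
        if head == goal
    )
```

For example, with rules `c :- a, b` and `d :- c` and facts `{a, b}`, the prover works backward from `d` to `c` to the facts.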
Alex Wettig (@_awettig) 's Twitter Profile Photo

🤔 Ever wondered how prevalent some type of web content is during LM pre-training?

In our new paper, we propose WebOrganizer which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐

Key takeaway: domains help us curate better pre-training data! 🧵/N
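The core idea as described above can be sketched directly. Assumption: in the paper, topic and format labels come from trained classifiers; here they are given as precomputed fields, and the label names are made up.

```python
from collections import Counter

def domain_counts(pages):
    """Sketch of WebOrganizer's framing: each page carries a topic label
    and a format label, and a 'domain' is the topic-format combination.
    Counting domain membership answers 'how prevalent is this kind of
    content in the pre-training corpus?'."""
    return Counter((page["topic"], page["format"]) for page in pages)
```

Once pages are bucketed into domains this way, curation can up- or down-weight whole domains rather than scoring individual pages.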
Jessy Li (@jessyjli) 's Twitter Profile Photo

🌟Job ad🌟 We (Greg Durrett, Matt Lease and I) are hiring a postdoc fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve mysteries of the universe, please apply!

Nathan Lambert (@natolambert) 's Twitter Profile Photo

First 11 chapters of RLHF Book have v0 drafts done. Should be quite useful now.

Next:
* Crafting more blog content into future topics,
* DPO+ chapter,
* Meeting with publishers to get wheels turning on physical copies,
* Cleaning & cohesiveness
Fangyuan Xu (@brunchavecmoi) 's Twitter Profile Photo

Can we generate long text from a compressed KV cache? We find existing KV cache compression methods (e.g., SnapKV) degrade rapidly in this setting. We present 𝐑𝐞𝐟𝐫𝐞𝐬𝐡𝐊𝐕, an inference method that ♻️ refreshes the smaller KV cache and better preserves performance.
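The refresh idea can be sketched at the level of the tweet's description. Assumptions: the selection rule (top-k by recent attention score) and the data shapes here are illustrative, not the paper's exact method.

```python
def refresh_cache(entries, attention_scores, k):
    """Toy sketch: rather than compressing the KV cache once up front
    (where one-shot methods degrade over long generations), periodically
    re-select which k entries to keep, using up-to-date attention scores
    and restoring candidates from the full cache kept alongside."""
    top = sorted(range(len(entries)),
                 key=lambda i: attention_scores[i], reverse=True)[:k]
    return [entries[i] for i in sorted(top)]  # preserve positional order
```

Calling this every N decoding steps lets the small cache track what the model currently attends to, instead of freezing an early compression decision.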
Victor Wang (@victorwang37) 's Twitter Profile Photo

LLM judges have become ubiquitous, but valuable signal is often ignored at inference.

We analyze design decisions for leveraging judgment distributions from LLM-as-a-judge: 🧵

w/ Michael Zhang Eunsol Choi!
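One concrete way to use a judgment distribution rather than discard it is to take the expected rating over the judge's score tokens instead of the argmax. This sketch assumes an API that exposes per-token log-probabilities for the rating tokens; the paper analyzes a broader space of design decisions.

```python
import math

def expected_judgment(token_logprobs):
    """Use the judge's full score distribution instead of its argmax:
    renormalize the log-probabilities of the numeric rating tokens and
    return the expected rating. `token_logprobs` maps rating tokens
    (e.g. "1".."5") to log-probabilities (an assumed input shape)."""
    probs = {tok: math.exp(lp) for tok, lp in token_logprobs.items()}
    total = sum(probs.values())
    return sum(int(tok) * p / total for tok, p in probs.items())
```

A judge that puts 60% on "4", 30% on "5", and 10% on "3" yields 4.2 rather than a flat 4, retaining information about the judge's uncertainty.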
Anuj Diwan (@anuj_diwan) 's Twitter Profile Photo

Introducing ParaSpeechCaps, our large-scale style captions dataset that enables rich, expressive control for text-to-speech models!
Beyond basic pitch or speed controls, our models can generate speech that sounds "guttural", "scared", "whispered" and more; 59 style tags in total.
Yixiao Song (@yixiao_song) 's Twitter Profile Photo

Introducing 🐻 BEARCUBS 🐻, a “small but mighty” dataset of 111 QA pairs designed to assess computer-using web agents in multimodal interactions on the live web!
✅ Humans achieve 85% accuracy
❌ OpenAI Operator: 24%
❌ Anthropic Computer Use: 14%
❌ Convergence AI Proxy: 13%
Han Wang (@hanwang98) 's Twitter Profile Photo

🚨Real-world retrieval is messy: queries can be ambiguous, or documents may conflict/have incorrect/irrelevant info.
How can we jointly address all these problems?

We introduce:
➡️ RAMDocs, a challenging dataset with ambiguity, misinformation, and noise.
➡️ MADAM-RAG, a
Manya Wadhwa (@manyawadhwa1) 's Twitter Profile Photo

Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇