Yoonsang Lee (@yoonsang_) 's Twitter Profile
Yoonsang Lee

@yoonsang_

Incoming CS PhD @princeton_nlp @princetonPLI; prev @SeoulNatlUni

ID: 1617847185990905856

Link: https://yoonsanglee.com · Joined: 24-01-2023 11:30:10

74 Tweets

196 Followers

549 Following

Michael Zhang (@mjqzhang) 's Twitter Profile Photo

Why and when do preference annotators disagree? And how do reward models + LLM-as-Judge evaluators handle disagreements?

We explore both these questions in a ✨new preprint✨ from my Ai2 internship!

[1/6]
Jaemin Cho (on faculty job market) (@jmin__cho) 's Twitter Profile Photo

Check out M3DocRAG -- multimodal RAG for question answering on Multi-Modal & Multi-Page & Multi-Documents (+ a new open-domain benchmark + strong results on 3 benchmarks)!

⚡️Key Highlights:

➡️ M3DocRAG flexibly accommodates various settings:
- closed & open-domain document
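The retrieval step the tweet describes can be sketched roughly as follows. This is a toy illustration under stated assumptions, not M3DocRAG's implementation: the real system uses multimodal page embeddings, while here plain dot products over toy vectors stand in for the scoring.

```python
def retrieve_pages(query_vec, page_vecs, k=3):
    """Sketch of the retrieval step: score every page embedding (pooled
    across all documents, so open- and closed-domain settings work the
    same way) against the query and keep the top-k page indices for a
    multimodal LM to answer from. Scoring here is a simple dot product."""
    scored = sorted(
        range(len(page_vecs)),
        key=lambda i: sum(q * p for q, p in zip(query_vec, page_vecs[i])),
        reverse=True,
    )
    return scored[:k]
```

Because pages from all documents share one index, scaling from a single document to an open-domain corpus only changes the size of `page_vecs`, not the flow.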
Seoyoung Kim (@seoy_kim) 's Twitter Profile Photo

I'll be presenting this work at #CSCW2024! "Is the Same Performance Really the Same?: Understanding How Listeners Perceive ASR Results Differently According to the Speaker's Accent"
Nov 12th, 11:00 – 12:30, Session 2d (Room: Central 1)
I'm also on the job market 😁

Akari Asai (@akariasai) 's Twitter Profile Photo

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
UW NLP Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts. Try out our demo!
We also introduce ꜱᴄʜᴏʟᴀʀQᴀʙᴇɴᴄʜ,

Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🔔 I'm recruiting multiple fully funded MSc/PhD students at the University of Alberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!

Jaemin Cho (on faculty job market) (@jmin__cho) 's Twitter Profile Photo

🚨 I’m on the 2024-2025 academic job market!
j-min.io

I work on ✨ Multimodal AI ✨, with a special focus on enhancing reasoning in both understanding and generation tasks by:
1⃣Making it more scalable
2⃣Making it more faithful
3⃣Evaluating and refining multimodal
Yoonsang Lee (@yoonsang_) 's Twitter Profile Photo

Heading to #NeurIPS2024 ✈️ Excited to present my work at the ENLSP workshop on Saturday, 12/14! I'm also applying for PhD programs this cycle! Please feel free to DM or chat about potential opportunities😊!!

Tianyu Gao (@gaotianyu1350) 's Twitter Profile Photo

Introducing MeCo (metadata conditioning then cooldown), a remarkably simple method that accelerates LM pre-training by simply prepending source URLs to training documents.

arxiv.org/abs/2501.01956
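The mechanism is simple enough to sketch in a few lines. Assumptions: the URL/text separator and the exact phase handling here are illustrative; see the paper for the actual recipe.

```python
def format_with_metadata(url: str, text: str, cooldown: bool = False) -> str:
    # Metadata conditioning: prepend the document's source URL during the
    # main pre-training phase. During the final "cooldown" phase, train on
    # the plain text so the model also works without metadata at inference.
    # (The "\n\n" separator is an assumption for this sketch.)
    return text if cooldown else f"{url}\n\n{text}"
```

The conditioning phase lets the model associate style and quality signals with sources; the cooldown phase removes the dependency on metadata being present at test time.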
Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🤔Now most LLMs have >= 128K context sizes, but are they good at generating long outputs, such as writing 8K token chain-of-thought for a planning problem?
🔔Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize
Yijia Shao (@echoshao8899) 's Twitter Profile Photo

LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I've already grown used to agents proactively seeking my confirmation or my deeper input.

Jinu Lee @ NAACL25 (@jinulee_v) 's Twitter Profile Photo

I am happy to announce that my first-author paper is accepted to NAACL 2025 Main!

Existing backward chaining (top-down reasoning) methods are incomplete, leading to suboptimal performance. We build SymBa, a complete neuro-symbolic backward chaining method using SLD-Resolution.
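For readers unfamiliar with the reasoning style being discussed, here is a minimal propositional backward chainer. This is purely illustrative: SymBa's SLD-Resolution operates over richer first-order programs, and the loop check below only shows one classic pitfall (cyclic goals) that naive top-down methods must handle.

```python
def backward_chain(goal, rules, facts, seen=frozenset()):
    """Minimal top-down (backward-chaining) prover.
    `rules` is a list of (head, body) pairs meaning head :- body1, body2, ...
    `facts` is a set of atoms known to be true.
    `seen` blocks cyclic goals so recursion terminates."""
    if goal in facts:
        return True
    if goal in seen:
        return False
    return any(
        all(backward_chain(b, rules, facts, seen | {goal}) for b in body)
        for head, body in rules
        if head == goal
    )
```

For example, with rules `c :- a, b` and `d :- c` and facts `{a, b}`, the prover works backward from `d` to `c` to the facts.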
Alex Wettig (@_awettig) 's Twitter Profile Photo

🤔 Ever wondered how prevalent some type of web content is during LM pre-training?

In our new paper, we propose WebOrganizer which *constructs domains* based on the topic and format of CommonCrawl web pages 🌐

Key takeaway: domains help us curate better pre-training data! 🧵/N
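The core idea as described above can be sketched directly. Assumption: in the paper, topic and format labels come from trained classifiers; here they are given as precomputed fields, and the label names are made up.

```python
from collections import Counter

def domain_counts(pages):
    """Sketch of WebOrganizer's framing: each page carries a topic label
    and a format label, and a 'domain' is the topic-format combination.
    Counting domain membership answers 'how prevalent is this kind of
    content in the pre-training corpus?'."""
    return Counter((page["topic"], page["format"]) for page in pages)
```

Once pages are bucketed into domains this way, curation can up- or down-weight whole domains rather than scoring individual pages.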
Jessy Li (@jessyjli) 's Twitter Profile Photo

🌟Job ad🌟 We (Greg Durrett, Matt Lease and I) are hiring a postdoc fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve mysteries of the universe, please apply!

Nathan Lambert (@natolambert) 's Twitter Profile Photo

First 11 chapters of RLHF Book have v0 drafts done. Should be quite useful now.

Next:
* Crafting more blog content into future topics,
* DPO+ chapter,
* Meeting with publishers to get wheels turning on physical copies,
* Cleaning & cohesiveness
Fangyuan Xu (@brunchavecmoi) 's Twitter Profile Photo

Can we generate long text from a compressed KV cache? We find existing KV cache compression methods (e.g., SnapKV) degrade rapidly in this setting. We present 𝐑𝐞𝐟𝐫𝐞𝐬𝐡𝐊𝐕, an inference method that ♻️ refreshes the smaller KV cache and better preserves performance.
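The refresh idea can be sketched at the level of the tweet's description. Assumptions: the selection rule (top-k by recent attention score) and the data shapes here are illustrative, not the paper's exact method.

```python
def refresh_cache(entries, attention_scores, k):
    """Toy sketch: rather than compressing the KV cache once up front
    (where one-shot methods degrade over long generations), periodically
    re-select which k entries to keep, using up-to-date attention scores
    and restoring candidates from the full cache kept alongside."""
    top = sorted(range(len(entries)),
                 key=lambda i: attention_scores[i], reverse=True)[:k]
    return [entries[i] for i in sorted(top)]  # preserve positional order
```

Calling this every N decoding steps lets the small cache track what the model currently attends to, instead of freezing an early compression decision.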
Victor Wang (@victorwang37) 's Twitter Profile Photo

LLM judges have become ubiquitous, but valuable signal is often ignored at inference.

We analyze design decisions for leveraging judgment distributions from LLM-as-a-judge: 🧵

w/ Michael Zhang Eunsol Choi!
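One concrete way to use a judgment distribution rather than discard it is to take the expected rating over the judge's score tokens instead of the argmax. This sketch assumes an API that exposes per-token log-probabilities for the rating tokens; the paper analyzes a broader space of design decisions.

```python
import math

def expected_judgment(token_logprobs):
    """Use the judge's full score distribution instead of its argmax:
    renormalize the log-probabilities of the numeric rating tokens and
    return the expected rating. `token_logprobs` maps rating tokens
    (e.g. "1".."5") to log-probabilities (an assumed input shape)."""
    probs = {tok: math.exp(lp) for tok, lp in token_logprobs.items()}
    total = sum(probs.values())
    return sum(int(tok) * p / total for tok, p in probs.items())
```

A judge that puts 60% on "4", 30% on "5", and 10% on "3" yields 4.2 rather than a flat 4, retaining information about the judge's uncertainty.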
Anuj Diwan (@anuj_diwan) 's Twitter Profile Photo

Introducing ParaSpeechCaps, our large-scale style captions dataset that enables rich, expressive control for text-to-speech models!
Beyond basic pitch or speed controls, our models can generate speech that sounds "guttural", "scared", "whispered" and more; 59 style tags in total.
Yixiao Song (@yixiao_song) 's Twitter Profile Photo

Introducing 🐻 BEARCUBS 🐻, a “small but mighty” dataset of 111 QA pairs designed to assess computer-using web agents in multimodal interactions on the live web!
✅ Humans achieve 85% accuracy
❌ OpenAI Operator: 24%
❌ Anthropic Computer Use: 14%
❌ Convergence AI Proxy: 13%
Han Wang (@hanwang98) 's Twitter Profile Photo

🚨Real-world retrieval is messy: queries can be ambiguous, or documents may conflict/have incorrect/irrelevant info.
How can we jointly address all these problems?

We introduce:
➡️ RAMDocs, a challenging dataset with ambiguity, misinformation, and noise.
➡️ MADAM-RAG, a
Manya Wadhwa (@manyawadhwa1) 's Twitter Profile Photo

Evaluating language model responses on open-ended tasks is hard! 🤔 We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️. EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇