DukeNLP (@duke_nlp) 's Twitter Profile
DukeNLP

@duke_nlp

Natural Language Processing at Duke University.

ID: 1308768060187213824

linkhttps://www.cs.duke.edu/research/artificialintelligence#nlp calendar_today23-09-2020 14:00:04

29 Tweet

1,1K Followers

540 Following

Sam Wiseman (@_samwiseman) 's Twitter Profile Photo

Newish #EMNLP2021 work w/ Arturs Backurs & Karl Stratos: we try to generate text (in a data-to-text setting) by splicing together pieces of retrieved neighbor text. Paper: arxiv.org/pdf/2101.08248… 1/3

Newish #EMNLP2021 work w/ Arturs Backurs & Karl Stratos: we try to generate text (in a data-to-text setting) by splicing together pieces of retrieved neighbor text.

Paper: arxiv.org/pdf/2101.08248…

1/3
DukeNLP (@duke_nlp) 's Twitter Profile Photo

The DukeNLP group is hiring PhD students in all areas of natural language processing! Apply at gradschool.duke.edu/admissions/app… by Dec 15 to work with Sam Wiseman or Bhuwan Dhingra.

Mohit Bansal (@mohitban47) 's Twitter Profile Photo

See a glimpseπŸ‘‡of how beautiful @unc +research triangle fall colors are 😍 Come join our awesome group of UNC NLP UNC Computer Science students+staff+faculty (& great neighbors eg. DukeNLP). We are hiring at all levels (phd, postdocs, faculty); feel free to ping any of us with questions πŸ™

Bhuwan Dhingra (@bhuwandhingra) 's Twitter Profile Photo

πŸ€” When does a factoid question need a *long* answer? πŸ€– "Long" could mean multiple things: either you ask for a city with a very long name or … Read Ivan Stelmakh's internship paper to get the second part of the answer! arxiv.org/abs/2204.06092

Phyllis Ang (@phyllis_ang_) 's Twitter Profile Photo

Increasing the input length often increases accuracy on NLP tasks like summarization. But given limited time and a fixed number of GPUs, is it better to increase model size or input sequence length? Find the answer in our latest work: arxiv.org/abs/2204.07288 1/3

Bhuwan Dhingra (@bhuwandhingra) 's Twitter Profile Photo

New Preprint from Yukun Huang! Can an LLM determine when its responses are incorrect? Our latest paper dives into "Calibrating long-form generations from an LLM". Discover more at arxiv.org/abs/2402.06544 (1/n)

New Preprint from <a href="/YukunHuang9/">Yukun Huang</a>!

Can an LLM determine when its responses are incorrect? Our latest paper dives into "Calibrating long-form generations from an LLM". Discover more at arxiv.org/abs/2402.06544 (1/n)
Jinho Choi (@jinho_d_choi) 's Twitter Profile Photo

150+ people registered for the SouthNLP 2024 at Emory University on 4/5. The schedule is available on our website: southnlp.github.io/southnlp2024/ Registration is open until March 10th. If you plan to attend, please register by completing the form here: forms.gle/NBWrgtgM5KgUq3…

150+ people registered for the SouthNLP 2024 at Emory University on 4/5. The schedule is available on our website: southnlp.github.io/southnlp2024/

Registration is open until March 10th. If you plan to attend, please register by completing the form here: forms.gle/NBWrgtgM5KgUq3…
Bhuwan Dhingra (@bhuwandhingra) 's Twitter Profile Photo

🧐 Can we generate *LLM-proof* math problems❓ πŸ‘‰Check out the new preprint from @ruoyuxyz , Chengxuan Huang and Junlin Wang : arxiv.org/abs/2402.17916 #LLMs #NLProc 🧡(1/6)

🧐 Can we generate *LLM-proof* math problems❓

πŸ‘‰Check out the new preprint from @ruoyuxyz , Chengxuan Huang and <a href="/JunlinWang3/">Junlin Wang</a> : arxiv.org/abs/2402.17916 #LLMs #NLProc

🧡(1/6)
Junlin Wang (@junlinwang3) 's Twitter Profile Photo

🦝Excited to announce our work on robustness & security of LLM systems! π‘πšπœπœπ¨π¨π§: Prompt Extraction Benchmark of LLM-Integrated Applications Prompt extraction from LLM-integrated apps like GPT-s is a critical security concern. ‼️

🦝Excited to announce our work on robustness &amp; security of LLM systems! π‘πšπœπœπ¨π¨π§: Prompt Extraction Benchmark of LLM-Integrated Applications

Prompt extraction from LLM-integrated apps like GPT-s is a critical security concern. ‼️
Roy Xie (@royxie_) 's Twitter Profile Photo

🚨 Breaking: >90% AUC on the WikiMIA dataset for membership inference! Want to know if your data is in LLM's training set?πŸ” Check out our latest work "ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods" ✨ royxie.com/recall-project… 🧡1/6

🚨 Breaking: &gt;90% AUC on the WikiMIA dataset for membership inference!

Want to know if your data is in LLM's training set?πŸ” 
Check out our latest work "ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods" ✨
royxie.com/recall-project…

🧡1/6
Ghazal Khalighinejad (@ghazalkhn) 's Twitter Profile Photo

πŸŽ‰ Excited to share that IsoBench has been accepted at Conference on Language Modeling! IsoBench features isomorphic inputs across Math/Graph problems, Chess games, and Physics/Chemistry questions. Check out the dataset here: huggingface.co/datasets/isobe…

Bhuwan Dhingra (@bhuwandhingra) 's Twitter Profile Photo

🧡When should LLMs trust external contexts in RAG? New paper from Yukun Huang and Sanxing Chen enhances LLMs’ *situated faithfulness* to external contexts -- even when they are wrong!πŸ‘‡

🧡When should LLMs trust external contexts in RAG?

New paper from <a href="/YukunHuang9/">Yukun Huang</a> and <a href="/sanxing_chen/">Sanxing Chen</a>  enhances LLMs’ *situated faithfulness* to external contexts -- even when they are wrong!πŸ‘‡
Ghazal Khalighinejad (@ghazalkhn) 's Twitter Profile Photo

πŸ“’ New preprint on a benchmark for multimodal information extraction! Structured data extraction from long documents consisting of interconnected data in text, tables, and figures remains a challenge. MatViX aims to fill this gap. matvix-bench.github.io

πŸ“’ New preprint on a benchmark for multimodal information extraction! 

Structured data extraction from long documents consisting of interconnected data in text, tables, and figures remains a challenge. MatViX aims to fill this gap.

matvix-bench.github.io
Bhuwan Dhingra (@bhuwandhingra) 's Twitter Profile Photo

**New paper from Roy Xie ** Do LLMs know when they have read enough to answer a question? We show how language models can STOP processing input text early without losing accuracy – Why waste 40,000 tokens when 500 suffice? 🧡 πŸ“„ Paper: arxiv.org/abs/2502.01025

**New paper from <a href="/RoyXie_/">Roy Xie</a> **

Do LLMs know when they have read enough to answer a question?

We show how language models can STOP processing input text early without losing accuracy – Why waste 40,000 tokens when 500 suffice? 🧡

πŸ“„ Paper: arxiv.org/abs/2502.01025
Junlin Wang (@junlinwang3) 's Twitter Profile Photo

Excited to share work from my Together AI internshipβ€”a deep dive into inference‑time scaling methods 🧠 We rigorously evaluated verifier‑free inference-time scaling methods across both reasoning and non‑reasoning LLMs. Some key findings: πŸ”‘ Even with huge rollout budgets,

Excited to share work from my <a href="/togethercompute/">Together AI</a> internshipβ€”a deep dive into inference‑time scaling methods 🧠

We rigorously evaluated verifier‑free inference-time scaling methods across both reasoning and non‑reasoning LLMs. Some key findings:

πŸ”‘ Even with huge rollout budgets,
Bhuwan Dhingra (@bhuwandhingra) 's Twitter Profile Photo

πŸ“’ New Preprint from Raghuveer @ NAACL25 on Multimodal Contrastive Learning: Breaking the Batch Barrier (B3) πŸ“’ TL;DR: Smart batch mining based on community detection achieves state of the art on the MMEB benchmark. Preprint: arxiv.org/pdf/2505.11293 Code: github.com/raghavlite/B3

Bhuwan Dhingra (@bhuwandhingra) 's Twitter Profile Photo

Glad to share a new ACL Findings paper from @MaxHolsman and Yukun Huang! We introduce Fuzzy Speculative Decoding (FSD) which extends speculative decoding to allow a tunable exchange of generation quality and inference acceleration. Paper: arxiv.org/abs/2502.20704

Glad to share a new ACL Findings paper from @MaxHolsman and <a href="/YukunHuang9/">Yukun Huang</a>!

We introduce Fuzzy Speculative Decoding (FSD) which extends speculative decoding to allow a tunable exchange of generation quality and inference acceleration.

Paper: arxiv.org/abs/2502.20704
Roy Xie (@royxie_) 's Twitter Profile Photo

Can we train reasoning LLMs to generate answers as they think? Introducing 𝐈𝐧𝐭𝐞𝐫π₯𝐞𝐚𝐯𝐞𝐝 π‘πžπšπ¬π¨π§π’π§π ! We train LLMs to alternate between thinking & answering πŸš€ Reducing Time-to-First-Token (TTFT) by over 80% ⚑AND improving Pass@1 accuracy up to 19.3%!πŸ“ˆ 🧡 1/n

Can we train reasoning LLMs to generate answers as they think?
Introducing 𝐈𝐧𝐭𝐞𝐫π₯𝐞𝐚𝐯𝐞𝐝 π‘πžπšπ¬π¨π§π’π§π ! We train LLMs to  alternate between thinking &amp; answering πŸš€
Reducing Time-to-First-Token (TTFT) by over 80% ⚑AND improving Pass@1 accuracy up to 19.3%!πŸ“ˆ

🧡 1/n