Qin Liu (@qinliu_nlp)'s Twitter Profile

PhD student @UC_Davis | MS & BA @FudanUni | AI safety and Trustworthy LLMs

ID: 4536349159

Website: https://qinliu9.github.io
Joined: 12-12-2015 08:39:00

48 Tweets

107 Followers

311 Following

Qin Liu (@qinliu_nlp)

🌟 Check out our latest comprehensive survey on: 🌟
⚠️ Emergent backdoor threats to LLMs
👻 Safety challenges to LLMs
💡 Future research directions in this area

Invited paper at 60th Annual Allerton Conference: ieeexplore.ieee.org/abstract/docum…

Bowen Jin (@bowenjin13)

🚀 Introducing Search-R1 – the first reproduction of Deepseek-R1 (zero) for training reasoning and search-augmented LLM agents with reinforcement learning! This is a step towards training an open-source OpenAI “Deep

Sheng Zhang (@sheng_zh)

🚀 Excited to share MetaScale, our latest work advancing LLM reasoning capabilities! MetaScale empowers GPT-4o to match or even surpass frontier reasoning models like o1, Claude-3.5-Sonnet, and o1-mini on the challenging Arena-Hard benchmark (lmarena.ai). Additionally, MetaScale

Wenjie Jacky Mo (@wenjie_jacky_mo)

Worried about backdoors in LLMs?

🌟 Check out our #NAACL2025 work on test-time backdoor mitigation!

✅ Black-box 📦
✅ Plug-and-play 🛡️

We explore:
→ Defensive Demonstrations 🧪
→ Self-generated Prefixes 🧩
→ Self-refinement ✍️

📄 arxiv.org/abs/2311.09763

🧵[1/n]

🌴Muhao Chen🌴 (@muhao_chen)

🚨 Call for Papers! ACL 2025 🚨
LLM Security Workshop @ ACL 2025 (the first workshop of ACL SIGSEC)
🔍 Topics: Adversarial attacks, defenses, vulnerabilities, ethical & legal aspects, safe deployment of LLMs and more
📅 Submission Deadline: April 15, 2025
📍 August 1, 2025 in

Fei Wang (@fwang_nlp)

🎉 Excited to share that our paper, "MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding", will be presented at #ICLR2025!
📅 Date: April 24
🕒 Time: 3:00 PM
📍 Location: Hall 3 + Hall 2B #11
MuirBench challenges multimodal LLMs with diverse multi-image

Hadi Askari (@hadiaskari67)

🧵1/ Excited to share our #NAACL2025 work! 🎉
"Assessing LLMs for Zero-Shot Abstractive Summarization Through the Lens of Relevance Paraphrasing"
We study how robust LLM summarization is to our relevance paraphrasing method? 🧠📝
More details below: 👇 arxiv.org/abs/2406.03993

Xiaofei Wen (@xiaofei_wen_mk)

Can LLM guardrails think twice before deciding?

✨ Check out our #ACL2025 paper: THINKGUARD, a critique-augmented safety guardrail!
✅ Structured critiques
✅ Interpretable decisions
✅ Robust against adversarial prompts

📑 arxiv.org/abs/2502.13458
🧵[1/n]

Tinghui Zhu (@darthzhu_)

😴 Extending modality based on an LLM has been a common practice when we are talking about multimodal LLMs.
❓ Can it generalize to omni-modality?
We study the effects of extending modality and ask three questions: arxiv.org/abs/2506.01872
#LLM #MLLM #OmniModality

jakedineenasu (@jakedineenasu)

🔍 Introducing QA-LIGN: A reflective alignment approach using a draft→reflection→revision pipeline. We create symbolic reward models that serve as both natural language critics & general reward models, bridging rule-based rewards and RLAIF.

📄 Paper: arxiv.org/pdf/2506.08123

Wenjie Jacky Mo (@wenjie_jacky_mo)

ACLRollingReview EMNLP 2025 Urgent help needed.

acFZ: initial score 3

🧊 Complete silence during discussion.
⏰ 4am PST, 9 min before deadline: quietly drops to 2.
with “Thanks for the rebuttal. I have updated the score.”
⚠️ No explanation. No notice. No chance to respond.
(0/n)

Tenghao Huang (@tenghaohuang45)

🎉 Excited to share our ACL 2025 paper:
🤖 R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory 🧠

📄 Paper: arxiv.org/abs/2501.12485
📍 Poster: Hall 4/5, Session 4, Wednesday, July 30, 11:00-12:30

🧵👇

Dongwon Jung (@dong_w0n)

Excited to share that two of my first-author papers were accepted to #EMNLP2025! ✨📚
1️⃣ Code Execution as Grounded Supervision for LLM Reasoning (Main)
2️⃣ Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation (Findings)
Huge thanks to my collaborators 🙌

jakedineenasu (@jakedineenasu)

Thrilled to share QA-LIGN at #EMNLP2025! Bridging rule-based rewards and LLM-as-a-Judge via LLM-derived symbolic reward rubrics.
🔗 arxiv.org/pdf/2506.08123