LLM Security (@llm_sec) 's Twitter Profile
LLM Security

@llm_sec

Research, papers, jobs, and news on large language model security.

Got something relevant? DM / tag @llm_sec

ID: 1649129451815596032

linkhttp://llmsec.net calendar_today20-04-2023 19:14:47

825 Tweet

9,9K Takipรงi

296 Takip Edilen

Sizhe Chen (@_sizhe_chen_) 's Twitter Profile Photo

Safety comes first to deploying LLMs in applications like agents. For richer opportunities of LLMs, we mitigate prompt injections, the #1 security threat by OWASP, via Structured Queries (StruQ). Preserving utility, StruQ discourages all existing prompt injections to an ASR <2%.

Safety comes first to deploying LLMs in applications like agents. For richer opportunities of LLMs, we mitigate prompt injections, the #1 security threat by OWASP, via Structured Queries (StruQ). Preserving utility, StruQ discourages all existing prompt injections to an ASR &lt;2%.
LLM Security (@llm_sec) 's Twitter Profile Photo

Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis ๐ŸŒถ๏ธ "Our study evaluates prominent scanners - Garak, Giskard, PyRIT, and CyberSecEval - that adapt red-teaming practices to expose these vulnerabilities. We detail the distinctive features

Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis ๐ŸŒถ๏ธ

"Our study evaluates prominent scanners - Garak, Giskard, PyRIT, and CyberSecEval - that adapt red-teaming practices to expose these vulnerabilities. We detail the distinctive features
Nanna Inie (@nannainie) 's Twitter Profile Photo

unpopular opinion: maybe let insecure be insecure and worry about the downstream effects on end users instead of protecting the companies that bake it into their own software.

LLM Security (@llm_sec) 's Twitter Profile Photo

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge "This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information." "for unlearning methods with utility constraints, the

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge

"This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information."
"for unlearning methods with utility constraints, the
LLM Security (@llm_sec) 's Twitter Profile Photo

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents "To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. We find (1) leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking, (2) simple universal

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

"To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. We find (1) leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking, (2) simple universal
LLM Security (@llm_sec) 's Twitter Profile Photo

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models (-- look at that perf/latency pareto frontier. game on!) "State-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). We propose

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models

(-- look at that perf/latency pareto frontier. game on!)

"State-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). We propose
LLM Security (@llm_sec) 's Twitter Profile Photo

LLMmap: Fingerprinting For Large Language Models "With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM

LLMmap: Fingerprinting For Large Language Models

"With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM
LLM Security (@llm_sec) 's Twitter Profile Photo

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester "we introduce the Generative Offensive Agent Tester (GOAT), an automated agentic red teaming system that simulates plain language adversarial conversations while leveraging multiple adversarial prompting

LLM Security (@llm_sec) 's Twitter Profile Photo

ChatTL;DR โ€“ You Really Ought to Check What the LLM Said on Your Behalf ๐ŸŒถ๏ธ "assuming that in the near term itโ€™s just not machines talking to machines all the way down, how do we get people to check the output of LLMs before they copy and paste it to friends, colleagues, course

ChatTL;DR โ€“ You Really Ought to Check What the LLM Said on Your Behalf ๐ŸŒถ๏ธ

"assuming that in the near term itโ€™s just not machines talking to machines all the way down, how do we get people to check the output of LLMs before they copy and paste it to friends, colleagues, course
LLM Security (@llm_sec) 's Twitter Profile Photo

Gritty Pixy "We leverage the sensitivity of existing QR code readers and stretch them to their detection limit. This is not difficult to craft very elaborated prompts and to inject them into QR codes. What is difficult is to make them inconspicuous as we do here with Gritty

Gritty Pixy

"We leverage the sensitivity of existing QR code readers and stretch them to their detection limit. This is not difficult to craft very elaborated prompts and to inject them into QR codes. What is difficult is to make them inconspicuous as we do here with Gritty
Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ (@leonderczynski) 's Twitter Profile Photo

First keynote at LLMSEC 2025, ACL: "A Bunch of Garbage and Hoping: LLMs, Agentic Security, and Where We Go From Here" Erick Galinkin Friday 09.05 Hall B Details: sig.llmsecurity.net/workshop/ - #ACL2025NLP

First keynote at LLMSEC 2025, ACL:

"A Bunch of Garbage and Hoping: LLMs, Agentic Security, and Where We Go From Here" Erick Galinkin

Friday 09.05 Hall B

Details: sig.llmsecurity.net/workshop/ - #ACL2025NLP
Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ (@leonderczynski) 's Twitter Profile Photo

Come to LLMSEC at ACL & hear Niloofar's keynote "What does it mean for agentic AI to preserve privacy?" - Niloofar (โœˆ๏ธ ACL), Meta/CMU (Friday 1st Aug, 11.00; Austria Center Vienna Hall B) See you there! #acl2025 #acl2025nlp

Come to LLMSEC at ACL &amp; hear Niloofar's keynote

"What does it mean for agentic AI to preserve privacy?" - <a href="/niloofar_mire/">Niloofar (โœˆ๏ธ ACL)</a>, Meta/CMU

(Friday 1st Aug, 11.00; Austria Center Vienna Hall B)

See you there!

 #acl2025 #acl2025nlp
Leon Derczynski โœ๐Ÿป ๐Ÿ‚๐Ÿ (@leonderczynski) 's Twitter Profile Photo

At ACL in Vienna? Hear the world's leading prompt injector talk at LLMSEC on Friday! Johann Rehberger Johann Rehberger will be presenting the afternoon keynote at 14.00 in Hall B > sig.llmsecurity.net/workshop/ #ACL2025NLP #ACL2025

At ACL in Vienna? Hear the world's leading prompt injector talk at LLMSEC on Friday! 

Johann Rehberger <a href="/wunderwuzzi23/">Johann Rehberger</a> will be presenting the afternoon keynote at 14.00 in Hall B

&gt; sig.llmsecurity.net/workshop/

#ACL2025NLP #ACL2025
Hannah Rose Kirk (@hannahrosekirk) 's Twitter Profile Photo

Listen up all talented early-stage researchers! ๐Ÿ‘‚๐Ÿค– We're hiring for a 6-month residency in my team at AI Security Institute to assist cutting-edge research on how frontier AI influences humans! It's an exciting & well-paid role for MSc/PhD students in ML/AI/Psych/CogSci/CompSci ๐Ÿงต