LLM Security (@llm_sec) Twitter Tweets • TwiCopy

Sizhe Chen

a year ago

Safety comes first to deploying LLMs in applications like agents. For richer opportunities of LLMs, we mitigate prompt injections, the #1 security threat by OWASP, via Structured Queries (StruQ). Preserving utility, StruQ discourages all existing prompt injections to an ASR <2%.

thumb_up_off_alt52

chat_bubble_outline3

repeat13

shareShare

LLM Security

@llm_sec

a year ago

Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis 🌶️ "Our study evaluates prominent scanners - Garak, Giskard, PyRIT, and CyberSecEval - that adapt red-teaming practices to expose these vulnerabilities. We detail the distinctive features

thumb_up_off_alt51

chat_bubble_outline1

repeat11

shareShare

Nanna Inie

@nannainie

a year ago

unpopular opinion: maybe let insecure be insecure and worry about the downstream effects on end users instead of protecting the companies that bake it into their own software.

thumb_up_off_alt5

chat_bubble_outline1

repeat2

shareShare

LLM Security

@llm_sec

a year ago

Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge "This paper reveals that applying quantization to models that have undergone unlearning can restore the "forgotten" information." "for unlearning methods with utility constraints, the

thumb_up_off_alt169

chat_bubble_outline3

repeat36

shareShare

LLM Security

@llm_sec

a year ago

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents "To facilitate research on LLM agent misuse, we propose a new benchmark called AgentHarm. We find (1) leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking, (2) simple universal

thumb_up_off_alt82

chat_bubble_outline4

repeat26

shareShare

LLM Security

@llm_sec

a year ago

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models (-- look at that perf/latency pareto frontier. game on!) "State-of-the-art models suffer from over-defense issues, with accuracy dropping close to random guessing levels (60%). We propose

thumb_up_off_alt35

chat_bubble_outline2

repeat5

shareShare

LLM Security

@llm_sec

a year ago

LLMmap: Fingerprinting For Large Language Models "With as few as 8 interactions, LLMmap can accurately identify 42 different LLM versions with over 95% accuracy. More importantly, LLMmap is designed to be robust across different application layers, allowing it to identify LLM

thumb_up_off_alt95

chat_bubble_outline0

repeat26

shareShare

LLM Security

@llm_sec

a year ago

Automated Red Teaming with GOAT: the Generative Offensive Agent Tester "we introduce the Generative Offensive Agent Tester (GOAT), an automated agentic red teaming system that simulates plain language adversarial conversations while leveraging multiple adversarial prompting

thumb_up_off_alt35

chat_bubble_outline0

repeat4

shareShare

LLM Security

@llm_sec

a year ago

ChatTL;DR – You Really Ought to Check What the LLM Said on Your Behalf 🌶️ "assuming that in the near term it’s just not machines talking to machines all the way down, how do we get people to check the output of LLMs before they copy and paste it to friends, colleagues, course

thumb_up_off_alt11

chat_bubble_outline0

repeat1

shareShare

garak: LLM vulnerability scanner

@garak_llm

a year ago

garak has moved to NVIDIA! New repo link: github.com/NVIDIA/garak

thumb_up_off_alt210

chat_bubble_outline3

repeat39

shareShare

LLM Security

@llm_sec

a year ago

Gritty Pixy "We leverage the sensitivity of existing QR code readers and stretch them to their detection limit. This is not difficult to craft very elaborated prompts and to inject them into QR codes. What is difficult is to make them inconspicuous as we do here with Gritty

thumb_up_off_alt28

chat_bubble_outline1

repeat3

shareShare

Leon Derczynski ✍🏻 🍂🍏

@leonderczynski

8 months ago

Call for papers: LLMSEC 2025 Deadline 15 April, held w/ ACL 2025 in Vienna Formats: long/short/war stories More: >> sig.llmsecurity.net/workshop/

thumb_up_off_alt11

chat_bubble_outline0

repeat4

shareShare

Leon Derczynski ✍🏻 🍂🍏

@leonderczynski

4 months ago

First keynote at LLMSEC 2025, ACL: "A Bunch of Garbage and Hoping: LLMs, Agentic Security, and Where We Go From Here" Erick Galinkin Friday 09.05 Hall B Details: sig.llmsecurity.net/workshop/ - #ACL2025NLP

thumb_up_off_alt27

chat_bubble_outline0

repeat4

shareShare

Leon Derczynski ✍🏻 🍂🍏

@leonderczynski

4 months ago

Come to LLMSEC at ACL & hear Niloofar's keynote "What does it mean for agentic AI to preserve privacy?" - Niloofar (✈️ ACL), Meta/CMU (Friday 1st Aug, 11.00; Austria Center Vienna Hall B) See you there! #acl2025 #acl2025nlp

Come to LLMSEC at ACL & hear Niloofar's keynote

"What does it mean for agentic AI to preserve privacy?" - <a href="/niloofar_mire/">Niloofar (✈️ ACL)</a>, Meta/CMU

(Friday 1st Aug, 11.00; Austria Center Vienna Hall B)

See you there!

#acl2025 #acl2025nlp

thumb_up_off_alt14

chat_bubble_outline0

repeat3

shareShare

Leon Derczynski ✍🏻 🍂🍏

@leonderczynski

4 months ago

At ACL in Vienna? Hear the world's leading prompt injector talk at LLMSEC on Friday! Johann Rehberger Johann Rehberger will be presenting the afternoon keynote at 14.00 in Hall B > sig.llmsecurity.net/workshop/ #ACL2025NLP #ACL2025

At ACL in Vienna? Hear the world's leading prompt injector talk at LLMSEC on Friday!

Johann Rehberger <a href="/wunderwuzzi23/">Johann Rehberger</a> will be presenting the afternoon keynote at 14.00 in Hall B

> sig.llmsecurity.net/workshop/

#ACL2025NLP #ACL2025

thumb_up_off_alt14

chat_bubble_outline0

repeat3

shareShare

Leon Derczynski ✍🏻 🍂🍏

@leonderczynski

4 months ago

LLMSEC proceedings are up! sig.llmsecurity.net/proceedings.pdf (Anthology is processing) #ACL2025NLP

thumb_up_off_alt24

chat_bubble_outline1

repeat6

shareShare

LLM Security

@llm_sec

4 months ago

Senior Security Architect - AI and ML @ NVIDIA nvidia.wd5.myworkdayjobs.com/en-US/NVIDIAEx…

thumb_up_off_alt11

chat_bubble_outline0

repeat0

shareShare

Hannah Rose Kirk

@hannahrosekirk

4 months ago

Listen up all talented early-stage researchers! 👂🤖 We're hiring for a 6-month residency in my team at AI Security Institute to assist cutting-edge research on how frontier AI influences humans! It's an exciting & well-paid role for MSc/PhD students in ML/AI/Psych/CogSci/CompSci 🧵

thumb_up_off_alt293

chat_bubble_outline12

repeat33

shareShare