threlfall (@whitehacksec) Twitter Tweets • TwiCopy

Greg Wells

8 months ago

Massive day at Dreadnode! We built a team and suite of products that combine the best of AI and offensive security. Red teams benefit from AI's power, and AI developers receive the latest attacks and techniques. Proud of this crew!

thumb_up_off_alt15

chat_bubble_outline0

repeat2

shareShare

CackalackyCon

@cackalackycon

8 months ago

This year we were honored to received more than 80 CFP submissions across a wide range of topics and expert levels. We are so thankful for each submission and are always blown away by the quality of talks proposed. Speakers should hear from us by next week! -sq33k

thumb_up_off_alt5

chat_bubble_outline0

repeat2

shareShare

dreadnode

@dreadnode

8 months ago

Where AI meets offensive security 🤝 Dreadnode is proud to be an organizer of Offensive AI Con (OAIC), the first conference dedicated to exploring the use of AI in offensive cyber. See you in Oceanside this October? Request an invite at offensiveaicon.com.

thumb_up_off_alt27

chat_bubble_outline1

repeat7

shareShare

threlfall

@whitehacksec

7 months ago

OAI ajust published a prompting guide for GPT 4.1: "XML performed well in our long context testing." "JSON performed particularly poorly." Anthropic have posted similar instructions consistently too. Anyone know why MCPs call for JSON?

thumb_up_off_alt0

chat_bubble_outline0

repeat0

shareShare

Casey Handmer, PhD

@cjhandmer

6 months ago

caseyhandmer.wordpress.com/2025/04/25/why…

thumb_up_off_alt136

chat_bubble_outline8

repeat3

shareShare

threlfall

@whitehacksec

6 months ago

I've updated the wiki with some research into agent hacking, the limitations and strengths. Also updated is the prompt injection techniques. Increasingly there is convergence in the techniques, where a successful attack is 3 or more techniques at once. wiki.offsecml.com/Adversarial+ML…

thumb_up_off_alt5

chat_bubble_outline0

repeat0

shareShare

Maxime Rivest 🧙‍♂️🦙

@maximerivest

6 months ago

I strongly encourage anybody that ever called one llm programmatically to carve out 1 hr of your time and run through all examples in the 'get started' dspy page. It will click, I promise! Link below. It's right on the homepage. Deceptively short. Very powerful.

thumb_up_off_alt168

chat_bubble_outline3

repeat11

shareShare

dreadnode

@dreadnode

6 months ago

v3 of Rigging is out now. If you’re working with LLMs to build agents or run evaluations, check it out. We just added: - Prompt caching for supported providers - A unified tool system for function calling and fallbacks to xml/json parsing - Native MCP integration - Lots of

thumb_up_off_alt29

chat_bubble_outline4

repeat11

shareShare

Rico Angell

@rico_angell

4 months ago

What causes jailbreaks to transfer between LLMs? We find that jailbreak strength and model representation similarity predict transferability, and we can engineer model similarity to improve transfer. Details in🧵

thumb_up_off_alt51

chat_bubble_outline3

repeat11

shareShare

METR

@metr_evals

4 months ago

In measurements using our set of multi-step software and reasoning tasks, Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively.

thumb_up_off_alt275

chat_bubble_outline8

repeat35

shareShare

threlfall

@whitehacksec

4 months ago

Incalmo enables LLMs to specify Offensive high-level actions through expert agents. In 9 out of 10 networks in MHBench, LLMs using Incalmo achieve at least some of the attack goals. Code is in paper I’m keen to try this vs CAI and will update. arxiv.org/pdf/2501.16466

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

threlfall

@whitehacksec

4 months ago

If you haven't been to wiki.offsecml.com in a while, there's a few new things to check out. Namely: -Big improvements in open source hackbots. and the variety of architectures available including collaborative red/blue agents. - Explosion in MCP resources

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

threlfall

@whitehacksec

4 months ago

Not really loving these AI email summaries lol

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

threlfall

@whitehacksec

3 months ago

your code gets merged because its good mine gets merged so my mistakes are on the permanent record

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

threlfall

@whitehacksec

3 months ago

arxiv.org/abs/2501.19012 Important data to keep in mind as attackers, given that AI IDE's re-attempt the install of packages when sandboxed outside the sandbox (w/ user approval). thanks Leon Derczynski ✍🏻 🌞🏠🌲 & co.

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

Amy Deng

@amydeng_

3 months ago

I spent the past months investigating: Can we trust reasoning models' CoTs? Researchers showed that LLMs aren't always faithful, but that's not the full story. LLMs are very faithful when the reasoning is complex, and unfaithful CoTs remain monitorable! Check out my latest work🥳

thumb_up_off_alt49

chat_bubble_outline1

repeat4

shareShare