threlfall (@whitehacksec) 's Twitter Profile
threlfall

@whitehacksec

working at intersection of offensive security, ml & supply chains. sharing @ 5stars217.GitHub.io & wiki.offsecml.com

ID: 2462852076

calendar_today25-04-2014 07:45:32

647 Tweet

483 Followers

387 Following

Greg Wells (@wellsgr) 's Twitter Profile Photo

Massive day at Dreadnode! We built a team and suite of products that combine the best of AI and offensive security. Red teams benefit from AI's power, and AI developers receive the latest attacks and techniques. Proud of this crew!

CackalackyCon (@cackalackycon) 's Twitter Profile Photo

This year we were honored to received more than 80 CFP submissions across a wide range of topics and expert levels. We are so thankful for each submission and are always blown away by the quality of talks proposed. Speakers should hear from us by next week! -sq33k

dreadnode (@dreadnode) 's Twitter Profile Photo

Where AI meets offensive security 🤝 Dreadnode is proud to be an organizer of Offensive AI Con (OAIC), the first conference dedicated to exploring the use of AI in offensive cyber. See you in Oceanside this October? Request an invite at offensiveaicon.com.

threlfall (@whitehacksec) 's Twitter Profile Photo

OAI ajust published a prompting guide for GPT 4.1: "XML performed well in our long context testing." "JSON performed particularly poorly." Anthropic have posted similar instructions consistently too. Anyone know why MCPs call for JSON?

threlfall (@whitehacksec) 's Twitter Profile Photo

I've updated the wiki with some research into agent hacking, the limitations and strengths. Also updated is the prompt injection techniques. Increasingly there is convergence in the techniques, where a successful attack is 3 or more techniques at once. wiki.offsecml.com/Adversarial+ML…

Maxime Rivest 🧙‍♂️🦙 (@maximerivest) 's Twitter Profile Photo

I strongly encourage anybody that ever called one llm programmatically to carve out 1 hr of your time and run through all examples in the 'get started' dspy page. It will click, I promise! Link below. It's right on the homepage. Deceptively short. Very powerful.

dreadnode (@dreadnode) 's Twitter Profile Photo

v3 of Rigging is out now. If you’re working with LLMs to build agents or run evaluations, check it out. We just added: - Prompt caching for supported providers - A unified tool system for function calling and fallbacks to xml/json parsing - Native MCP integration - Lots of

v3 of Rigging is out now. If you’re working with LLMs to build agents or run evaluations, check it out. We just added:

- Prompt caching for supported providers
- A unified tool system for function calling and fallbacks to xml/json parsing
- Native MCP integration
- Lots of
METR (@metr_evals) 's Twitter Profile Photo

In measurements using our set of multi-step software and reasoning tasks, Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively.

In measurements using our set of multi-step software and reasoning tasks, Claude 4 Opus and Sonnet reach 50%-time-horizon point estimates of about 80 and 65 minutes, respectively.
threlfall (@whitehacksec) 's Twitter Profile Photo

Incalmo enables LLMs to specify Offensive high-level actions through expert agents. In 9 out of 10 networks in MHBench, LLMs using Incalmo achieve at least some of the attack goals. Code is in paper I’m keen to try this vs CAI and will update. arxiv.org/pdf/2501.16466

threlfall (@whitehacksec) 's Twitter Profile Photo

If you haven't been to wiki.offsecml.com in a while, there's a few new things to check out. Namely: -Big improvements in open source hackbots. and the variety of architectures available including collaborative red/blue agents. - Explosion in MCP resources

threlfall (@whitehacksec) 's Twitter Profile Photo

arxiv.org/abs/2501.19012 Important data to keep in mind as attackers, given that AI IDE's re-attempt the install of packages when sandboxed outside the sandbox (w/ user approval). thanks Leon Derczynski ✍🏻 🌞🏠🌲 & co.

Amy Deng (@amydeng_) 's Twitter Profile Photo

I spent the past months investigating: Can we trust reasoning models' CoTs? Researchers showed that LLMs aren't always faithful, but that's not the full story. LLMs are very faithful when the reasoning is complex, and unfaithful CoTs remain monitorable! Check out my latest work🥳