Harry Coppock (@harrycoppock)'s Twitter Profile
Harry Coppock

@harrycoppock

No. 10 Downing Street Innovation Fellow | Research Scientist at AISI | Visiting Lecturer at Imperial College London
Working on AI Evaluation and AI for Medicine

ID: 1370429239711969281

Joined: 12-03-2021 17:39:47

184 Tweets

190 Followers

319 Following

summerfieldlab @summerfieldlab.bsky.social (@summerfieldlab):

If you want to do world class AI research from within government, this is your chance. AISI is building a really strong technical team to work on AI safety - several positions now available: gov.uk/government/new…

Laura Gilbert (@statsarepower):

Are you a data scientist who wants to work for the team ministers have recently called "The best delivery team in government", and "the unsung heroes of Whitehall"? Do you want to make a difference? Apply now! shorturl.at/elrwD

Saffron Huang (@saffronhuang):

The UK AI Safety Institute is hiring more technical staff! I believe AISI is one of the best places to do ML research/eng for the public good. We’re having impact at the scale of government, while moving at the pace of a startup (what more could you ask for?)

Jobie Budd (@jobiebudd):

Is your AI-enabled diagnostic tool accurate, or does your dataset have confounding bias? Our Turing-RSS Health Data Lab paper, published today in Nature Machine Intelligence, investigates audio-based AI classifiers for COVID-19 screening. nature.com/articles/s4225…

Xander Davies (@alxndrdavies):

@AISafetyInst will be at DEF CON! If you'd like to chat about attacking, defending, & evaluating frontier models, DM me or fill out our form (in 🧵)

Xander Davies (@alxndrdavies):

Jailbreaking evals ~always focus on simple chatbots—excited to announce AgentHarm, a dataset for measuring harmfulness of LLM 𝑎𝑔𝑒𝑛𝑡𝑠 developed at @AISafetyInst in collaboration with Gray Swan AI! 🧵 1/N

Gray Swan AI (@grayswanai):

Brace Yourself: Our Biggest AI Jailbreaking Arena Yet

We’re launching a next-level Agent Red-Teaming Challenge—not just chatbots anymore. Think direct & indirect attacks on anonymous frontier models.

$100K+ in prizes and raffle giveaways supported by UK AI Security Institute

Xander Davies (@alxndrdavies):

My team is hiring at the AI Security Institute! I think this is one of the most important times in history to have strong technical expertise in government. Join our team understanding and fixing weaknesses in frontier models through state-of-the-art adversarial ML research & testing. 🧵 1/4

AI Security Institute (@aisecurityinst):

🧵 AI Systems are developing advanced cyber capabilities. This means they’re helping strengthen defences - but can also be used as threats. To keep on top of these risks, we need more rigorous evaluations of agentic AI, which is why we’re releasing Inspect Cyber 🔍

Andy Zou (@andyzou_jiaming):

We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through a calendar event) 🧵

AI Security Institute (@aisecurityinst):

📢 Introducing the Alignment Project: a new fund for research on urgent challenges in AI alignment and control, backed by over £15 million.

▶️ Up to £1 million per project
▶️ Compute access, venture capital investment, and expert support

Learn more and apply ⬇️

Xander Davies (@alxndrdavies):

We at AI Security Institute worked with OpenAI to test GPT-5's safeguards. We identified multiple jailbreaks, including a universal jailbreak that evades all layers of mitigations and is being patched. Excited to continue partnering with OpenAI to test & strengthen safeguards.

AI Security Institute (@aisecurityinst):

How can open-weight Large Language Models be safeguarded against malicious uses? In our new paper with EleutherAI, we find that removing harmful data before training can be over 10x more effective at resisting adversarial fine-tuning than defences added after training 🧵

Harry Coppock (@harrycoppock):

This is great news for the UK. Having worked with Jade over the past two years setting up the AI Security Institute, I am confident that there are very few people, if any, better placed to take on this role.

Robert Kirk (@_robertkirk):

Since I started working on safeguards, we've seen substantial progress in defending certain hosted models, but less progress in measuring & managing misuse risks from open weight models. Three directions I want explored more, drawn from our AI Security Institute post today 🧵

AI Security Institute (@aisecurityinst):

🔎 People are increasingly using chatbots to seek out new information, raising concerns about how they could misinform voters or distort public opinion. But how is AI actually influencing real-world political beliefs? Our new study explores this question 👇

Xander Davies (@alxndrdavies):

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6
