Bertie Vidgen (@bertievidgen) 's Twitter Profile
Bertie Vidgen

@bertievidgen

AI and LLM safety evaluation

ID: 100974895

linkhttps://rewire.online/ calendar_today01-01-2010 13:24:47

940 Tweet

889 Takipçi

549 Takip Edilen

Turing Public Policy (@turingpubpol) 's Twitter Profile Photo

"According to the new study, the majority of users who send abusive tweets to players are not necessarily hiding behind anonymous accounts as they (...) regularly send non-abusive tweets to players too." Full report👉 tinyurl.com/ycyd9hne Team up to tackle #onlineabuse ⚽️

Tristan Thrush (@tristanthrush) 's Twitter Profile Photo

We’re going to do it! We’ll train and release masked and causal language models (e.g. BERT & GPT-2) on new Common Crawl snapshots as they come out! We call this project Online Language Modeling (OLM). What applications or research questions can we enable or help answer? A 🧵:

Rewire (acquired by ActiveFence) (@rewire_online) 's Twitter Profile Photo

We're super excited to announce that as part of the ALPHA programme, our co-founders Bertie Vidgen & Paul Röttger are participating in "Lisbon's tech fest" this November 1-4 as part of the Web Summit startup programme. Be sure to come to say hello! 🤩 #websummit #techforgood

We're super excited to announce that as part of the ALPHA programme, our co-founders <a href="/bertievidgen/">Bertie Vidgen</a> &amp; <a href="/paul_rottger/">Paul Röttger</a> are participating in "Lisbon's tech fest" this November 1-4 as part of the <a href="/WebSummit/">Web Summit</a> startup programme. 

Be sure to come to say hello! 🤩 
#websummit #techforgood
Bertie Vidgen (@bertievidgen) 's Twitter Profile Photo

Sexist content is all too common online - many platforms just dont have the right tools to find and action it at scale We set out to fix this problem with our SemEval task, building explainable AI models to detect sexism. 👇 Sign up now to help make the Internet safer!

Bertie Vidgen (@bertievidgen) 's Twitter Profile Photo

Toxic and harmful behaviour is not just a problem on #socialmedia - it appears everywhere. That's why we're so excited to be working with Deutsche Bahn Personenverkehr to handle their customer feedback and identify when it becomes #toxic . DM me if you want to find out more :)

Adina Williams (@adinamwilliams) 's Twitter Profile Photo

How can we improve benchmarking? The Dynabench experiment aims to make faster progress with dynamic data collection, and today, we are pleased to introduce our next stage: @MetaAI has funded 5 exciting research proposals on the theme of "Rethinking Benchmarking”! Congrats to:

Christopher Bouzy (spoutible.com/cbouzy) (@cbouzy) 's Twitter Profile Photo

YouTube and platforms like YouTube must be regulated. We have allowed social media platforms to self-regulate, and it has been a complete disaster.

Ethio NLP (@ethionlp) 's Twitter Profile Photo

We are glad to announce the first SemEval shared task targeting African languages, AfriSenti-SemEval, Task 12. The shared task includes different prizes. Competition: codalab.lisn.upsaclay.fr/competitions/7… AfriSenti: afrisenti-semeval.github.io AfriSenti SemEval-2023 Shared Task 12 Ethio NLP EMNLP 2024

Simon Kendall (@skendallfcdo) 's Twitter Profile Photo

What a privilege to moderate the 🇬🇧 -Bavarian online harms symposium at #MTM22 Thanks to Ian Stevenson & Bertie Vidgen for highlighting the work of 🇬🇧 world-class safety tech sector and discussing future UK–Bavarian collaboration with Dr Thorsten Schmiege BLM & Jana Heigl

What a privilege to moderate the 🇬🇧 -Bavarian online harms symposium at #MTM22 Thanks to <a href="/iancyclops/">Ian Stevenson</a> &amp; <a href="/bertievidgen/">Bertie Vidgen</a> for highlighting the work of 🇬🇧 world-class safety tech sector and discussing future UK–Bavarian collaboration with Dr Thorsten Schmiege <a href="/BLM_Bayern/">BLM</a> &amp; <a href="/janaheigl/">Jana Heigl</a>
Paul Röttger (@paul_rottger) 's Twitter Profile Photo

🥳 New paper at #EMNLP2022 (Main) 🥳 Too much hate speech research focuses just on English content! To help fix this, we tried to expand hate detection models into under-resourced languages 🌍 without needing lots of new data 💸 arxiv.org/abs/2210.11359 ⬇️ Highlights below ⬇️

Hannah Rose Kirk (@hannahrosekirk) 's Twitter Profile Photo

New #EMNLP2022 paper! Do you research online harms, misinformation or negative biases? Could your datasets contain examples of harmful text? 🤔 If yes, read our paper! 🫵arxiv.org/abs/2204.14256 Shoutout to my brilliant co-authors: Abeba Birhane, Leon Derczynski ✍🏻🌹☀️ & Bertie Vidgen A 🧵

Paul Röttger (@paul_rottger) 's Twitter Profile Photo

NEW PREPRINT! LLMs should be helpful AND harmless. This is a difficult balance to get right... Some models refuse even safe requests if they superficially resemble unsafe ones. We built XSTest to systematically test for such "exaggerated safety". 🧵 arxiv.org/abs/2308.01263

NEW PREPRINT!

LLMs should be helpful AND harmless. This is a difficult balance to get right...

Some models refuse even safe requests if they superficially resemble unsafe ones. We built XSTest to systematically test for such "exaggerated safety".

🧵

arxiv.org/abs/2308.01263
PatronusAI (@patronusai) 's Twitter Profile Photo

We are launching out of stealth today with a $3M seed round led by Lightspeed, with participation from Amjad Masad, Gokul Rajaram, Matt Hartman and other fortune 500 execs and board members 🚀 Read our story here: patronus.ai/blog/patronus-…

Bertie Vidgen (@bertievidgen) 's Twitter Profile Photo

I am very biased but this is an amazing launch by great people, creating a much-needed and incredibly powerful product! If you're using an #LLM then you need to know how it works, which means #evaluating it. No-one has solved how to do that reliably and at scale ...until now 🥳

Paul Röttger (@paul_rottger) 's Twitter Profile Photo

If you’re working on LLM safety, check out SafetyPrompts.com! SafetyPrompts.com is a catalogue of open datasets for evaluating and improving LLM safety. I started building this over the holidays, and I know there are still datasets missing, so I need your help 🧵

If you’re working on LLM safety, check out SafetyPrompts.com!

SafetyPrompts.com is a catalogue of open datasets for evaluating and improving LLM safety. I started building this over the holidays, and I know there are still datasets missing, so I need your help 🧵
PatronusAI (@patronusai) 's Twitter Profile Photo

1/ Introducing Lynx - the leading hallucination detection model 🚀👀 - Beats GPT-4o on hallucination tasks - Open source, open weights, open data - Excels in real-world domains like medicine and finance We are excited to launch Lynx with Day 1 integration partners: NVIDIA,