Bertie Vidgen (@bertievidgen) Twitter Tweets • TwiCopy

Adina Williams

3 years ago

How can we improve benchmarking? The Dynabench experiment aims to make faster progress with dynamic data collection, and today, we are pleased to introduce our next stage: @MetaAI has funded 5 exciting research proposals on the theme of "Rethinking Benchmarking”! Congrats to:

thumb_up_off_alt82

chat_bubble_outline2

repeat21

shareShare

Christopher Bouzy (spoutible.com/cbouzy)

@cbouzy

3 years ago

YouTube and platforms like YouTube must be regulated. We have allowed social media platforms to self-regulate, and it has been a complete disaster.

thumb_up_off_alt340

chat_bubble_outline11

repeat65

shareShare

Ethio NLP

@ethionlp

3 years ago

We are glad to announce the first SemEval shared task targeting African languages, AfriSenti-SemEval, Task 12. The shared task includes different prizes. Competition: codalab.lisn.upsaclay.fr/competitions/7… AfriSenti: afrisenti-semeval.github.io AfriSenti SemEval-2023 Shared Task 12 Ethio NLP EMNLP 2025

thumb_up_off_alt20

chat_bubble_outline2

repeat14

shareShare

Bertie Vidgen

@bertievidgen

3 years ago

Thanks for having us! A pleasure to talk about the new SemEval task on explainable ways of detecting online sexism, just launched at Rewire (acquired by ActiveFence)

thumb_up_off_alt3

chat_bubble_outline0

repeat0

shareShare

Simon Kendall

@skendallfcdo

3 years ago

What a privilege to moderate the 🇬🇧 -Bavarian online harms symposium at #MTM22 Thanks to Ian Stevenson & Bertie Vidgen for highlighting the work of 🇬🇧 world-class safety tech sector and discussing future UK–Bavarian collaboration with Dr Thorsten Schmiege BLM & Jana Heigl

What a privilege to moderate the 🇬🇧 -Bavarian online harms symposium at #MTM22 Thanks to <a href="/iancyclops/">Ian Stevenson</a> & <a href="/bertievidgen/">Bertie Vidgen</a> for highlighting the work of 🇬🇧 world-class safety tech sector and discussing future UK–Bavarian collaboration with Dr Thorsten Schmiege <a href="/BLM_Bayern/">BLM</a> & <a href="/janaheigl/">Jana Heigl</a>

thumb_up_off_alt15

chat_bubble_outline0

repeat4

shareShare

Paul Röttger

@paul_rottger

3 years ago

🥳 New paper at #EMNLP2022 (Main) 🥳 Too much hate speech research focuses just on English content! To help fix this, we tried to expand hate detection models into under-resourced languages 🌍 without needing lots of new data 💸 arxiv.org/abs/2210.11359 ⬇️ Highlights below ⬇️

thumb_up_off_alt104

chat_bubble_outline4

repeat26

shareShare

Hannah Rose Kirk

@hannahrosekirk

3 years ago

New #EMNLP2022 paper! Do you research online harms, misinformation or negative biases? Could your datasets contain examples of harmful text? 🤔 If yes, read our paper! 🫵arxiv.org/abs/2204.14256 Shoutout to my brilliant co-authors: Abeba Birhane, Leon Derczynski ✍🏻 🌞🏠🌲 & Bertie Vidgen A 🧵

thumb_up_off_alt103

chat_bubble_outline4

repeat22

shareShare

Paul Röttger

@paul_rottger

2 years ago

NEW PREPRINT! LLMs should be helpful AND harmless. This is a difficult balance to get right... Some models refuse even safe requests if they superficially resemble unsafe ones. We built XSTest to systematically test for such "exaggerated safety". 🧵 arxiv.org/abs/2308.01263

thumb_up_off_alt122

chat_bubble_outline7

repeat20

shareShare

PatronusAI

@patronusai

2 years ago

We are launching out of stealth today with a $3M seed round led by Lightspeed, with participation from Amjad Masad, Gokul Rajaram, Matt Hartman and other fortune 500 execs and board members 🚀 Read our story here: patronus.ai/blog/patronus-…

thumb_up_off_alt174

chat_bubble_outline9

repeat28

shareShare

Bertie Vidgen

@bertievidgen

2 years ago

I am very biased but this is an amazing launch by great people, creating a much-needed and incredibly powerful product! If you're using an #LLM then you need to know how it works, which means #evaluating it. No-one has solved how to do that reliably and at scale ...until now 🥳

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

Paul Röttger

@paul_rottger

2 years ago

If you’re working on LLM safety, check out SafetyPrompts.com! SafetyPrompts.com is a catalogue of open datasets for evaluating and improving LLM safety. I started building this over the holidays, and I know there are still datasets missing, so I need your help 🧵

thumb_up_off_alt216

chat_bubble_outline9

repeat53

shareShare

PatronusAI

@patronusai

a year ago

1/ Introducing Lynx - the leading hallucination detection model 🚀👀 - Beats GPT-4o on hallucination tasks - Open source, open weights, open data - Excels in real-world domains like medicine and finance We are excited to launch Lynx with Day 1 integration partners: NVIDIA,

thumb_up_off_alt335

chat_bubble_outline7

repeat71

shareShare

Shikib Mehri

@shikibmehri

10 months ago

🚨 You're evaluating your LLM wrong 🚨 🤔 Human eval is unscalable, expensive & can be noisy, especially at the edge of capabilities 📉 Reward models compress complex notions of quality into opaque scores, and can't be steered after training 🤖 'LLM as a Judge' can't learn

thumb_up_off_alt29

chat_bubble_outline2

repeat7

shareShare

Bertie Vidgen

@bertievidgen

10 months ago

Without evaluation you have no idea what your model is doing. But using LM-as-a-judge isn't great and human annotators are expensive and noisy. LMUnit solves this tradeoff to scalably, reliably, and explainably eval your model. A lot of fun to work on this at Contextual AI

thumb_up_off_alt0

chat_bubble_outline0

repeat0

shareShare

Paul Röttger

@paul_rottger

9 months ago

Today, we are releasing MSTS, a new Multimodal Safety Test Suite for vision-language models! MSTS is exciting because it tests for safety risks *created by multimodality*. Each prompt consists of a text + image that *only in combination* reveal their full unsafe meaning. 🧵

thumb_up_off_alt48

chat_bubble_outline1

repeat20

shareShare

Bertie Vidgen

@bertievidgen

9 months ago

Thoroughly enjoying #Nero by Anthony Everitt and Roddy Ashworth -- but wasn't Germanicus Claudius' brother, not his father? Given it talks about Germanicus falling sick and dying in Syria, rather than having a horse collapse on him, I think this might be a mistake 🤯

thumb_up_off_alt0

chat_bubble_outline0

repeat0

shareShare

Shikib Mehri

@shikibmehri

8 months ago

LLMs must be factually accurate. Especially to power autonomous agents for complex, long-horizon tasks. But how do we actually do this in practice? 🧵[1/16] Learn about groundedness, why LLMs hallucinate, and how ContextualAI built the world's most grounded LLM!

thumb_up_off_alt30

chat_bubble_outline1

repeat9

shareShare

Bertie Vidgen

@bertievidgen

8 months ago

Having a grounded LM is not just about good retrievals -- even if you pass the model the right information, it still needs to *use it in the right way* I was surprised by how much LMs hallucinate even when literally told the right info... thankfully, Contextual is solving it :)

thumb_up_off_alt0

chat_bubble_outline0

repeat0

shareShare

Bertie Vidgen

@bertievidgen

8 months ago

🚨 50 million free tokens 🤯 Our reranker is SOTA and -- much more excitingly -- is the world's first **steerable** reranker. You can give it instructions in free text and watch it get to work!! This is the most effective way of handling real-world messiness in docs.

thumb_up_off_alt1

chat_bubble_outline0

repeat0

shareShare

Bertie Vidgen

@bertievidgen

3 months ago

The Mercor grad fellowship is worth $50k!! It's inspiring to work for a startup that offers exceptional people from _anywhere such incredible opportunities. As a PhD student I hustled to get extra income... Mercor would have been a godsend. Apply to work on our platform now.

thumb_up_off_alt2

chat_bubble_outline0

repeat0

shareShare