Preethi Seshadri (@preethi__s_) 's Twitter Profile
Preethi Seshadri

@preethi__s_

PhD Student interested in safety and societal issues around language and multimodal models + data and model evaluation 💜💛🏀

ID: 1249818301342965761

Joined: 13-04-2020 21:55:08

157 Tweets

324 Followers

1.1K Following

Nathaniel R. Robinson (@robinson_n8) 's Twitter Profile Photo

Many LLMs struggle to produce Dialectal Arabic. As practitioners attempt to mitigate this, new evaluation methods are needed. We present AL-QASIDA (Analyzing LLM Quality + Accuracy Systematically In Dialectal Arabic), a comprehensive eval of LLM Dialectal Arabic proficiency (1/7)

Danish Pruthi (@danish037) 's Twitter Profile Photo

Our study highlighting plagiarism concerns in AI-generated research is now accepted to ACL (main conference): arxiv.org/abs/2502.16487. Effort led by amazing Tarun Gupta. Will share other accepted papers soon. Stay tuned 🙂

Sahil Verma (@sahil1v) 's Twitter Profile Photo

🚨 New Paper! 🚨
Guard models slow, language-specific, and modality-limited?

Meet OmniGuard that detects harmful prompts across multiple languages & modalities all using one approach with SOTA performance in all 3 modalities!! while being 120X faster 🚀

arxiv.org/abs/2505.23856
Chris Toukmaji (@christoukmaji) 's Twitter Profile Photo

🧵Excited to share our paper “Prompt, Translate, Fine-Tune, Re-Initialize or Instruction-Tune? Adapting LLMs for In-Context Learning in Low-Resource Languages” was accepted to ACL GEM! The largest study of its kind; here’s what we found over 4.1K+ GPU hrs… (1/5) #ACL2025 #NLProc

Sushant Agarwal (@_sushantagarwal) 's Twitter Profile Photo

Presenting "Optimal Fair Learning Robust to Adversarial Distribution Shift" at #ICML2025 (openreview.net/pdf?id=TGcXwWd…)

📍East Exhibition Hall A-B #E-1001
⏲️16th July, 4:30-7PM  

Please have a look, and do stop by if it sounds interesting to you!
RT's appreciated 😊 Summary to follow
Yu Fei (@walter_fei) 's Twitter Profile Photo

Excited to present our work at #ACL2025NLP's Panel 2: LLM Alignment! 🚀

One of just 25 papers selected for panel out of 8300+ submissions—don't miss it!

🌐 Project: fywalter.github.io/nudging/

🆕 Code (API & caching): github.com/fywalter/nudgi…

🆕 Interactive Demo:
Lisa Alazraki (@lisaalazraki) 's Twitter Profile Photo

We have released #AgentCoMa, an agentic reasoning benchmark where each task requires a mix of commonsense and math to be solved 🧐

LLM agents performing real-world tasks should be able to combine these different types of reasoning, but are they fit for the job? 🤔

🧵⬇️
Preethi Seshadri (@preethi__s_) 's Twitter Profile Photo

Arjun and Vagrant will be presenting our work at Session 3 (11AM 🕚 poster session) at Conference on Language Modeling. If you're interested in LLM evaluation practices, probability vs. generation-based evals, harms and misgendering, please go say hi 👋! Link to paper 📜: arxiv.org/abs/2504.17075

Samuel Cahyawijaya (@scahyawijaya) 's Twitter Profile Photo

We are looking for participants to join a 30-minute user study on how people interact with AI systems. If you’re a Singaporean or someone who is comfortable conversing in Singlish 🇸🇬, your insights are invaluable! 🚀 Link to user study: tally.so/r/3xOYAJ

Lisa Alazraki (@lisaalazraki) 's Twitter Profile Photo

Just arrived at #EMNLP2025 in Suzhou. Looking forward to meeting with everyone! Will be giving an oral presentation of our paper No Need For Explanations: LLMs Can Implicitly Learn from Mistakes In Context this Friday 7th November at 11.30 am in Hall A108 🎤

Yanai Elazar (@yanaiela) 's Twitter Profile Photo

Interested in interpretability, data attribution, evaluation, and similar topics? Interested in doing a postdoc with me? Apply to the prestigious Azrieli program! Link below 👇 DMs are open (email is good too!)

Siva Reddy (@sivareddyg) 's Twitter Profile Photo

Seraphina Goldfarb-Tarrant (@seraphinagt) on not forgetting the users

Move from Static single-turn to dynamic multi-turn
A hard evaluation problem
Have a user in the loop or simulate the user

On Tau-Bench, the user LLM matters. Different LLMs result in different success rates.
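The dynamic multi-turn setup described above (a simulated user in the loop, with success checked each turn) can be sketched roughly as below. Note this is a minimal illustration only: `agent_reply`, `user_reply`, and `run_episode` are hypothetical placeholders standing in for LLM calls, not Tau-Bench's actual API.

```python
# Minimal sketch of a dynamic multi-turn eval with a simulated user.
# agent_reply and user_reply are hypothetical stand-ins for LLM calls;
# in a real harness each would query a (possibly different) model, which
# is why the choice of user LLM can change measured success rates.

def agent_reply(history):
    # Placeholder agent: asks a clarifying question first, then completes the task.
    return "booked flight" if len(history) >= 2 else "which date?"

def user_reply(history):
    # Placeholder simulated user: answers the agent's clarifying question.
    return "June 3rd"

def run_episode(max_turns=4, goal="booked flight"):
    history = []
    for _ in range(max_turns):
        a = agent_reply(history)
        history.append(("agent", a))
        if goal in a:  # success criterion checked against the agent's output
            return True, history
        history.append(("user", user_reply(history)))
    return False, history

success, transcript = run_episode()
```

Swapping in a different `user_reply` model (terser, more verbose, non-Anglosphere phrasing) changes what the agent sees each turn, which is the point made above about not forgetting the simulated user.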
Seraphina Goldfarb-Tarrant (@seraphinagt) 's Twitter Profile Photo

Thanks for the invite, IVADO! Featuring work by Preethi Seshadri @ NeurIPS 2025 on not forgetting the simulated user in dynamic evals (or you forget users outside the Anglosphere) & Nishant Balepur on how to make plans that actually increase productivity (*don't* ask humans/models their pref)

Preethi Seshadri (@preethi__s_) 's Twitter Profile Photo

I’ll be at #NeurIPS2025 ☀️ Please say hi :) If you want to chat about evaluation, data, safety, societal impact, harms, or anything related, let’s grab ☕️. I’m also looking for industry roles and would love to connect about opportunities!

Kolby Nottingham (@kolbytn) 's Twitter Profile Photo

I've added an AIxGames channel to the #NeurIPS2025 AtConf app to chat all things AI & Games. Join up and spread the word! ♟️🎲🕹️🎮 We're also planning an AIxGames mixer in SD this week. Details coming soon.

Valentina Pyatkin (@valentina__py) 's Twitter Profile Photo

I started a part-time role at ETH AI Center, mentoring students and working on post-training for the Swiss AI Initiative! 🤩Looking forward to working with interesting people like Hanna Yukhymenko Imanol Schlag Yixuan Xu Nathan Arnout Devos If you are a student at ETHZ or EPFL

Tamanna Hossain-Kay (@thossainkay) 's Twitter Profile Photo

📢 #AACL2025 oral presentation tomorrow @10pm PST on my new paper from my internship at @dataminr! We show WHAT info is lost in Mamba SSM LMs (e.g. numbers!), while prior work only shows WHEN loss occurs in this efficient alternative to transformers 🧵 arxiv.org/abs/2512.15653