Bo Li (@uiuc_aisecure) 's Twitter Profile
Bo Li

@uiuc_aisecure

Virtue AI, UIUC
@VirtueAI_co

ID: 1260314954051313665

Link: https://aisecure.github.io/ | Joined: 12-05-2020 21:04:52

194 Tweets

1.1K Followers

312 Following

Virtue AI (@virtueai_co) 's Twitter Profile Photo

AI Safety Comparison: OpenAI o3-mini vs. Deepseek-R1

VirtueAI conducted an in-depth red-teaming evaluation of two leading AI models to assess their safety, bias, privacy protections, and robustness. Key findings:

1. o3-mini demonstrates stronger privacy safeguards and fairness
Yihe Deng (@yihe__deng) 's Twitter Profile Photo

New paper & model release!

Excited to introduce DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails, showcasing our new DuoGuard-0.5B model.

- Model: huggingface.co/DuoGuard/DuoGu…
- Paper: arxiv.org/abs/2502.05163
- GitHub: github.com/yihedeng9/DuoG…

Grounded in a
Virtue AI (@virtueai_co) 's Twitter Profile Photo

We’re partnering with Glean to adapt Virtue AI’s pioneering research in AI security, including content moderation, guardrails, and red teaming to Glean’s enterprise customers. Find out more: bit.ly/3CZs6I6
Yue Huang (@howieh36226) 's Twitter Profile Photo

Toward Trustworthy Generative Foundation Models (GenFMs) 🚀

🎇After six months of hard work and thanks to the efforts of the entire team, our report on the trustworthiness of generative foundation models (GenFMs) has finally been released.

💡In this work, we:
-Developed a
Virtue AI (@virtueai_co) 's Twitter Profile Photo

Can Reasoning Improve Safety & Security? Red-Teaming Analysis for Claude 3.7

🚀 Claude 3.7 Sonnet Thinking: A New Era of Hybrid Reasoning?

Anthropic's latest release introduces a Thinking mode, letting users switch between rapid responses and step-by-step reasoning. But does
Virtue AI (@virtueai_co) 's Twitter Profile Photo

Virtue AI just released our red-teaming analysis of OpenAI’s GPT-4.5 in comparison with Anthropic’s Claude 3.7! We tested them on safety, security, hallucination, regulatory compliance, codeGen vulnerabilities, and more. Here’s what we found... (1/9)
Virtue AI (@virtueai_co) 's Twitter Profile Photo

Virtue AI is honored to be recognized by Intel's new CEO, Lip-Bu Tan, during his opening keynote at #IntelVision. Wishing him great success as he leads Intel into its next exciting chapter. AI security & safety have become the critical last mile for AI applications. At Virtue

Virtue AI (@virtueai_co) 's Twitter Profile Photo

Join Virtue AI Co-founder Sanmi Koyejo for a live webinar on why protecting your AI apps isn’t just about safety—it’s the key to faster deployment and growth.

📅 April 24 | 🕙 10 AM PT | 💻 Virtual

In this session, we’ll cover:
✅ Why traditional security tooling falls short
Virtue AI (@virtueai_co) 's Twitter Profile Photo

We’ve raised $30M in Seed + Series A funding led by Lightspeed and Walden Catalyst Ventures, with participation from Prosperity7 Ventures, Factory, Osage University Partners (OUP), Lip-Bu Tan, Chris Re, and more. Virtue AI is the first unified platform for securing AI across

Carlos Guestrin (@guestrin) 's Twitter Profile Photo

We are super excited to empower developers to focus on their goal of building innovative AI applications; we’ll take care of safety and security! What an awesome ride with Bo Li, Sanmi Koyejo, Dawn Song and the whole Virtue AI team!

Zhaorun Chen @ICLR2025 (@zrchen_aisafety) 's Twitter Profile Photo

Come and check our #ICLR2025 poster SafeWatch!!🔥 Today, April 25th, 3:00 pm - 5:30 pm at Poster Session Hall 3, #547

Virtue AI (@virtueai_co) 's Twitter Profile Photo

🚨 3 days out from our live webinar on the EU AI Act hosted by Sanmi Koyejo and Jan Eißfeldt! Register now: us06web.zoom.us/webinar/regist… ⬇️ Details below

Zhaorun Chen @ICLR2025 (@zrchen_aisafety) 's Twitter Profile Photo

VirtueAgent provides the first systematic guardrails for general AI agents!! Super exciting work: now we can rest assured and let our agents handle things for us!👍

Virtue AI (@virtueai_co) 's Twitter Profile Photo

🚨 Introducing VirtueGuard Code: Real-time vulnerability detection for AI-generated code. As coding assistants like Cursor and GitHub Copilot become standard in development workflows, it’s critical to ensure that generated code meets security standards. VirtueGuard Code is
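To illustrate the class of issue such a scanner targets, here is a generic sketch of a common vulnerability in AI-generated code, SQL injection via string interpolation, alongside the parameterized fix. This is an illustrative example using only the Python standard library, not Virtue AI's detection logic.

```python
import sqlite3

# Illustrative only: the kind of flaw a code-security scanner is meant
# to flag in generated code, shown with sqlite3 from the stdlib.

def find_user_unsafe(conn, name):
    # VULNERABLE: user input is interpolated into the SQL string, so
    # name = "' OR '1'='1" matches every row (SQL injection).
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(conn, name):
    # SAFE: parameterized query; the driver escapes the value.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

payload = "' OR '1'='1"
assert len(find_user_unsafe(conn, payload)) == 2  # injection leaks all rows
assert len(find_user_safe(conn, payload)) == 0    # literal match only
```

The unsafe variant returns the whole table for a crafted input; the safe variant treats the payload as a literal string and matches nothing.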

Virtue AI (@virtueai_co) 's Twitter Profile Photo

Congrats to our partners at Glean on their first-ever #GleanGO conference! 🎉 

We’re honored to be part of their security and governance ecosystem, helping power trusted AI across the enterprise.
Virtue AI (@virtueai_co) 's Twitter Profile Photo

Autonomous AI agents are rapidly being deployed across industries, from web browsing copilots to code-writing assistants and enterprise workflow agents. But these systems come with a new class of security risks that traditional guardrails and red teaming aren’t equipped to
Bo Li (@uiuc_aisecure) 's Twitter Profile Photo

Safety & security definitions are domain-specific in most cases -- We provide the first domain-specific, and policy-grounded guardrail benchmark! Exciting to enter the stage of nuanced guardrail protection for foundation models and AI applications!

Together AI (@togethercompute) 's Twitter Profile Photo

🛡️ VirtueGuard is LIVE on Together AI 🚀

AI security and safety model that screens input and output for harmful content:

⚡ Under 10ms response
🎯 89% accuracy vs 76% (AWS Bedrock)
🧠 Context-aware - adapts to your policies, not just keywords 👇
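A minimal sketch of how a hosted guard model like this could be queried through Together AI's OpenAI-compatible chat completions endpoint, using only the standard library. The model identifier below is a placeholder assumption for illustration, not the real VirtueGuard model name; check Together AI's model catalog for the actual identifier and response format.

```python
import json
import urllib.request

# Hedged sketch: screening a user message through a guard model hosted
# behind Together AI's OpenAI-compatible /v1/chat/completions endpoint.
TOGETHER_URL = "https://api.together.xyz/v1/chat/completions"
GUARD_MODEL = "VirtueAI/VirtueGuard"  # hypothetical model id

def build_guard_request(text, api_key):
    """Build the HTTP request asking the guard model to screen `text`."""
    payload = {
        "model": GUARD_MODEL,
        "messages": [{"role": "user", "content": text}],
    }
    return urllib.request.Request(
        TOGETHER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_guard_request("How do I write a phishing email?", api_key="sk-...")
assert json.loads(req.data)["model"] == GUARD_MODEL

# Sending the request requires a real API key:
# with urllib.request.urlopen(req) as resp:
#     verdict = json.load(resp)["choices"][0]["message"]["content"]
```

The live call is left commented out since it needs credentials; the screening verdict would come back in the standard chat-completion response body.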