Sotiris Anagnostidis (@sanagnostidis) 's Twitter Profile
Sotiris Anagnostidis

@sanagnostidis

PhD at ETH Zürich. MLP-pilled 💊. Previously @Meta GenAI, @GoogleDeepMind, @Huawei, @ntua

ID: 1471221876693377038

Link: http://sanagnos.pages.dev/ · Joined: 15-12-2021 20:53:39

51 Tweets

165 Followers

455 Following

Dimitri von Rütte (@dvruette) 's Twitter Profile Photo

Attending #NeurIPS2023 in New Orleans this week to present OpenAssistant (arxiv.org/abs/2304.07327)! Happy to chat about open-source LLMs, personalized image generation, and more. DMs are open!

Gregor Bachmann (@gregorbachmann1) 's Twitter Profile Photo

I’ll be presenting "Scaling MLPs" at #NeurIPS2023, tomorrow (Wed) at 10:45am!
Hyped to discuss things like inductive bias, the bitter lesson, compute-optimality and scaling laws 👷⚖️📈

AK (@_akhaliq) 's Twitter Profile Photo

LIME: Localized Image Editing via Attention Regularization in Diffusion Models

paper page: huggingface.co/papers/2312.09…

Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation.

Dimitri von Rütte (@dvruette) 's Twitter Profile Photo

🚨 Calling on all FABRIC users! We need your help to learn about how you’ve been using FABRIC. Help us by taking 5 minutes to fill out the survey. Haven’t tried FABRIC yet? Just try it using our Gradio demo! ✨👨‍🎨
📊 Survey: forms.gle/aMWLDW8xvyhkLb…
👾 Demo:

Dimitri von Rütte (@dvruette) 's Twitter Profile Photo

🚨📜 Announcing our latest work on LLM interpretability: We are able to control a model's humor, creativity, quality, truthfulness, and compliance by applying concept vectors to its hidden neural activations. 🧵
arxiv.org/abs/2402.14433

Dimitri von Rütte (@dvruette) 's Twitter Profile Photo

To try it out yourself and for technical implementation details, check out our HF space and GitHub. 🤗 Demo: huggingface.co/spaces/dvruett… 👾 Code: github.com/dvruette/conce…
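
For intuition, here is a minimal sketch of what steering a model with a concept vector can look like in PyTorch: a fixed vector is added to the hidden states of one transformer block via a forward hook. The model choice, layer index, scale, and the random `concept_vector` are illustrative assumptions, not the authors' implementation; see the linked GitHub repo for the real code.

```python
# Minimal sketch of concept steering via a forward hook.
# Model, layer, scale, and vector are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

hidden_size = model.config.hidden_size
concept_vector = torch.randn(hidden_size)  # stand-in for a learned concept direction
concept_vector = concept_vector / concept_vector.norm()
strength = 4.0  # guidance scale; the sign flips the concept's direction

def steer(module, inputs, output):
    # GPT-2 blocks return a tuple; the hidden states come first.
    hidden = output[0] + strength * concept_vector.to(output[0].dtype)
    return (hidden,) + output[1:]

# Hook a middle block; which layer works best is an empirical question.
handle = model.transformer.h[6].register_forward_hook(steer)

inputs = tok("Tell me a story:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```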

Bobby (@bobby_he) 's Twitter Profile Photo

Outlier Features (OFs) aka “neurons with big features” emerge in standard transformer training & prevent benefits of quantisation 🥲 but why do OFs appear & which design choices minimise them?

Our new work (+Lorenzo Noci, Daniele Paliotta, Imanol Schlag, T. Hofmann) takes a look 👀🧵
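
As a rough illustration of what “neurons with big features” means in practice, here is one hypothetical way to flag channels whose activation scale dwarfs the rest. The metric and threshold are my assumptions, not the paper's definition of an outlier feature.

```python
# Hypothetical sketch: flag channels whose peak activation dwarfs the
# typical channel, a common symptom of outlier features.
import torch

def outlier_channels(acts: torch.Tensor, ratio: float = 6.0):
    # acts: (tokens, hidden_dim) activations from one layer
    per_channel = acts.abs().amax(dim=0)   # max |activation| per channel
    typical = per_channel.median()         # typical channel scale
    return (per_channel > ratio * typical).nonzero().flatten()

acts = torch.randn(1024, 768)
acts[:, 42] *= 30.0                        # plant an artificial outlier
print(outlier_channels(acts))              # -> tensor([42])
```
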
Aurelien Lucchi (@aurelienlucchi) 's Twitter Profile Photo

The University of Basel, Switzerland, is offering an open-rank Professorship in AI and Foundation Models. For more information, visit this link: jobs.unibas.ch/offene-stellen….

Dimitri von Rütte (@dvruette) 's Twitter Profile Photo

We’re presenting our work on concept guidance at today’s 13:30 ICML poster session (poster #706). Come by and say hi! #ICML #ICML2024

Samuel Albanie 🇬🇧 (@samuelalbanie) 's Twitter Profile Photo

Are LLMs easily influenced?

Interesting work from Sotiris Anagnostidis and Jannis Bulian.

TL;DR: Having an LLM advocate for an answer in the prompt significantly influences predictions.
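
A hypothetical sketch of the kind of probe this describes: build the same question prompt with and without an “advocate” argument for one candidate answer, then compare the model's predictions. The wording and helper below are illustrative, not the paper's prompts.

```python
# Hypothetical prompt probe: does an advocated answer sway the model?
QUESTION = "What is the capital of Australia?"
CANDIDATES = ["Sydney", "Canberra"]

def build_prompt(question: str, advocated: str | None) -> str:
    prompt = f"Question: {question}\nOptions: {', '.join(CANDIDATES)}\n"
    if advocated is not None:
        # Persuasive argument for one (possibly wrong) candidate.
        prompt += (
            f"Expert argument: The answer is clearly {advocated}; "
            "it is the most widely cited choice.\n"
        )
    return prompt + "Answer:"

print(build_prompt(QUESTION, advocated=None))      # neutral baseline
print(build_prompt(QUESTION, advocated="Sydney"))  # advocated (wrong) answer
```
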
Bobby (@bobby_he) 's Twitter Profile Photo

Updated camera ready arxiv.org/abs/2405.19279. New results include:

- non-diagonal preconditioners (SOAP/Shampoo) minimise OFs compared to diagonal ones (Adam/AdaFactor)
- scaling to 7B params
- showing our methods to reduce OFs translate to ease of PTQ int8 quantisation

Check it out!
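
To see why outlier features and int8 post-training quantisation interact, here is a toy illustration (my own, not from the paper): with per-tensor absmax scaling, a single large activation inflates the scale and wastes the quantisation grid on ordinary values.

```python
# Toy illustration: one outlier activation inflates the absmax scale,
# so ordinary values lose resolution under per-tensor int8 quantisation.
import torch

def quantize_int8(x: torch.Tensor) -> torch.Tensor:
    scale = x.abs().max() / 127.0
    q = torch.clamp((x / scale).round(), -127, 127)
    return q * scale  # dequantised values

x = torch.randn(4096)
err_clean = (quantize_int8(x) - x).abs().mean()

x_outlier = x.clone()
x_outlier[0] = 100.0  # a single outlier activation
err_outlier = (quantize_int8(x_outlier) - x_outlier).abs().mean()

print(f"mean abs error without outlier: {err_clean:.4f}")
print(f"mean abs error with outlier:    {err_outlier:.4f}")  # much larger
```
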
Dimitri von Rütte (@dvruette) 's Twitter Profile Photo

🚨 NEW PAPER DROP! Wouldn't it be nice if LLMs could spot and correct their own mistakes? And what if we could do so directly from pre-training, without any SFT or RL? We present a new class of discrete diffusion models, called GIDD, that are able to do just that: 🧵1/12

Weronika Ormaniec (@wormaniec) 's Twitter Profile Photo

Ever wondered how the loss landscape of Transformers differs from that of other architectures? Or which Transformer components make its loss landscape unique?

With Sidak Pal Singh & Felix Dangel, we explore this via the Hessian in our #ICLR2025 spotlight paper!

Key insights 👇 1/8
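
For readers who want to poke at a loss landscape themselves, here is a minimal sketch of the standard Hessian-vector-product trick such analyses build on: power iteration to estimate the top Hessian eigenvalue. The toy model and data are illustrative stand-ins, not the paper's pipeline.

```python
# Minimal sketch: estimate the dominant Hessian eigenvalue (by magnitude)
# with power iteration on Hessian-vector products (double backward).
import torch

model = torch.nn.Linear(10, 1)  # toy stand-in for a real architecture
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = list(model.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)

def hvp(vecs):
    # Hessian-vector product H @ v via a second backward pass
    return torch.autograd.grad(grads, params, grad_outputs=vecs, retain_graph=True)

vecs = [torch.randn_like(p) for p in params]
for _ in range(50):  # power iteration
    vecs = hvp(vecs)
    norm = torch.sqrt(sum(v.pow(2).sum() for v in vecs))
    vecs = [v / norm for v in vecs]

# Rayleigh quotient v^T H v with a unit vector v
top_eig = sum((h * v).sum() for h, v in zip(hvp(vecs), vecs))
print(f"estimated top Hessian eigenvalue: {top_eig.item():.4f}")
```
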
Enea Monzio Compagnoni (@eneamc) 's Twitter Profile Photo

If you are at ICLR 2025, come by my poster tomorrow at 10:00 am! You’ll find me at Hall 3 + Hall 2B, poster #367! See you there! iclr.cc/virtual/2025/p… #ICLR2025

Artsiom Sanakoyeu (@artsiom_s) 's Twitter Profile Photo

Thrilled to share that our CVPR 2025 paper “𝐀𝐮𝐭𝐨𝐫𝐞𝐠𝐫𝐞𝐬𝐬𝐢𝐯𝐞 𝐃𝐢𝐬𝐭𝐢𝐥𝐥𝐚𝐭𝐢𝐨𝐧 𝐨𝐟 𝐃𝐢𝐟𝐟𝐮𝐬𝐢𝐨𝐧 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬” (ARD) has been selected as an Oral! ✨

Catch us at CVPR on Saturday, June 14
🗣 Oral Session 4A — 14:00-14:15, Karl Dean Ballroom

Edgar Schoenfeld (@schoenfeldedgar) 's Twitter Profile Photo

🚀 Want to speed up your image and video model inference?

Come see our highlight poster at #CVPR2025:
"FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute"

📍 Today at 4 PM, ExHall D – Poster #205
🔗 arxiv.org/abs/2502.20126

Work done

Neil Houlsby (@neilhoulsby) 's Twitter Profile Photo

📣 Anthropic Zurich is hiring again 🇨🇭 The team has been shaping up fantastically over the last few months, and I have re-opened applications for pre-training. We welcome applications from anywhere along the "scientist/engineer spectrum". If building the future of AI for the