![Simone Scardapane (@s_scardapane)'s Twitter profile](https://pbs.twimg.com/profile_images/1679966961524981761/a0ORHr1T_200x200.jpg)
Simone Scardapane
@s_scardapane
I fall in love with a new #machinelearning topic every month 🙄
Tenure-track Ass. Prof. @SapienzaRoma | Previously @iaml_it @SmarterPodcast | @GoogleDevExpert
ID: 1235205731747540993
https://www.sscardapane.it/
04-03-2020 14:09:51
1.4K Tweets
8.2K Followers
672 Following
*A Primer on the Inner Workings of Transformer LMs*
by Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussà
I was waiting for this! Cool comprehensive survey on interpretability methods for LLMs, with a focus on recent techniques (e.g., logit lens).
arxiv.org/abs/2405.00208
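The logit lens mentioned here is simple enough to sketch: project an intermediate hidden state straight through the unembedding matrix and softmax over the vocabulary, skipping the remaining layers. A minimal numpy sketch, where all sizes and values are random stand-ins rather than a real model:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50                    # illustrative sizes

hidden = rng.normal(size=d_model)          # residual-stream state at some layer
W_U = rng.normal(size=(d_model, vocab))    # unembedding (output) matrix

def logit_lens(h, W_U):
    """Project an intermediate hidden state directly through the unembedding
    to read off the model's 'early prediction' at that layer."""
    logits = h @ W_U
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

probs = logit_lens(hidden, W_U)            # distribution over the vocabulary
```

Running this at every layer shows how the model's current best guess evolves with depth.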
![Tweet media, 2024-05-03](https://pbs.twimg.com/media/GMqAhvrW4AAGuxr.jpg)
*Kolmogorov-Arnold Networks (KANs)* by Ziming Liu et al.
Since everyone is talking about KANs, I wrote some notes on Notion with a few research questions I find interesting.
It's the first time I've done something like this, so give me some feedback. 🙃
sscardapane.notion.site/Kolmogorov-Arn…
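For context, the core KAN idea is that every edge of the network carries a small learnable univariate function instead of a scalar weight. A toy numpy sketch, using an ad-hoc fixed basis in place of the paper's B-splines (all names and sizes are illustrative):

```python
import numpy as np

def edge_fn(x, coeffs):
    # Each edge applies a learnable 1-D function; here a linear combination of
    # three fixed basis functions stands in for the paper's B-splines.
    basis = np.stack([x, np.tanh(x), x ** 2])
    return basis @ coeffs

def kan_layer(x, C):
    """x: (n_in,) inputs; C: (n_in, n_out, n_basis) per-edge coefficients.
    Output j is the sum of each edge's function applied to its input."""
    n_out = C.shape[1]
    out = np.zeros(n_out)
    for i, xi in enumerate(x):
        for j in range(n_out):
            out[j] += edge_fn(xi, C[i, j])
    return out

rng = np.random.default_rng(0)
C = rng.normal(size=(3, 2, 3))            # 3 inputs, 2 outputs, 3 basis coeffs
y = kan_layer(rng.normal(size=3), C)
```

Training a KAN means fitting the per-edge coefficients `C`, rather than a weight matrix.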
![Tweet media, 2024-05-02](https://pbs.twimg.com/media/GMlt-JLXIAEa5gN.jpg)
![Mustafa Hajij (@HajijMustafa)'s Twitter profile photo](https://pbs.twimg.com/profile_images/1650000927439613952/pKDwp2Rr_200x200.jpg)
I am glad to announce that our position paper on Topological Deep Learning has been accepted to ICML!
Congrats to the authors for the great effort!
arxiv.org/abs/2402.08871
Theodore Papamarkou
Tolga Birdal
Nina Miolane
Michael Bronstein
Petar Veličković
Bastian Grossenbacher-Rieck
Justin Curry
Simone Scardapane
![Tweet media, 2024-05-02](https://pbs.twimg.com/media/GMjLfyhaAAA2fZ0.jpg)
*Decomposing and Editing Predictions by Modeling Model Computation*
by Harshay Shah, Aleksander Madry, Andrew Ilyas
Learning to predict the effect of ablating a model component (e.g., a head) is helpful for understanding the model behavior and also for editing.
arxiv.org/abs/2404.11534
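The idea of learning to predict ablation effects can be illustrated with a toy linear surrogate: fit a map from "which components were ablated" to the observed change in output. Everything below is synthetic and illustrative, not the paper's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)
n_components, n_samples = 8, 200
true_effect = rng.normal(size=n_components)      # hidden per-component contribution

# 1 = component ablated in that run; observe the resulting change in output.
masks = rng.integers(0, 2, size=(n_samples, n_components)).astype(float)
dy = masks @ true_effect + 0.01 * rng.normal(size=n_samples)

# Least-squares surrogate: recover each component's effect from ablation runs.
est, *_ = np.linalg.lstsq(masks, dy, rcond=None)
```

Once fitted, the surrogate both explains which components drive a prediction and suggests which to edit.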
![Tweet media, 2024-04-29](https://pbs.twimg.com/media/GMVhzUGWUAAkrsB.jpg)
*REPAIR: REnormalizing Permuted Activations for Interpolation Repair*
by Keller Jordan, Hanie Sedghi, Olga Saukh, Rahim Entezari, Behnam Neyshabur
Correcting the statistics of a layer significantly improves model fusion based on permutations of units.
arxiv.org/abs/2211.08403
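The statistical correction at REPAIR's core is easy to sketch: after merging networks, rescale each hidden unit so its mean and standard deviation match target statistics (e.g. those of the endpoint networks). Numbers below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
# Deflated post-merge activations: (batch, units), wrong mean and variance.
acts = 0.3 * rng.normal(loc=0.1, size=(1000, 4))
target_mean, target_std = 0.5, 1.0         # statistics to restore (illustrative)

def repair(a, mu, sigma):
    # Standardize each unit, then rescale to the target statistics.
    return (a - a.mean(axis=0)) / a.std(axis=0) * sigma + mu

fixed = repair(acts, target_mean, target_std)
```

In practice this is applied per-layer (e.g. by inserting and then folding batch-norm-like layers), which is what rescues the accuracy of permutation-based model fusion.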
![Tweet media, 2024-04-29](https://pbs.twimg.com/media/GMVQT9OW8AAdaiz.jpg)
*A Multimodal Automated Interpretability Agent*
by Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas
An experiment in using a multimodal VLM to generate hypotheses that explain the behavior of a given neuron.
arxiv.org/abs/2404.14394
![Tweet media, 2024-04-23](https://pbs.twimg.com/media/GL2ZZ12WYAENyBk.png)
*Patchscopes: Inspecting Hidden Representations of LLMs*
by Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva
A framework for explaining LLMs via 'patching', where a separate LLM is used to translate internal embeddings into natural-language explanations.
arxiv.org/abs/2401.06102
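Abstractly, patching means lifting a hidden state out of one forward pass and injecting it into another at a chosen layer, then letting the second pass decode it. A toy sketch where the "model" is just a stack of random matrices (everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
layers = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4)]  # toy "LLM"

def forward(h, start=0, patch=None, patch_layer=None):
    for i in range(start, len(layers)):
        if patch is not None and i == patch_layer:
            h = patch                  # inject the foreign hidden state here
        h = np.tanh(h @ layers[i])
    return h

source_hidden = forward(rng.normal(size=d))               # state to inspect
# "Patchscope": a second pass whose computation, from layer 2 on, is driven
# entirely by the patched-in source state.
decoded = forward(rng.normal(size=d), patch=source_hidden, patch_layer=2)
```

The payoff of the real framework is that the second pass is a capable LLM prompted to verbalize the patched representation.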
![Tweet media, 2024-04-19](https://pbs.twimg.com/media/GLhVdLzWIAA5X-f.jpg)
*RHO-1: Not All Tokens Are What You Need*
by Zhibin Gou, Weizhu Chen
A small reference model trained on curated data scores every token, letting language-model training focus on useful tokens and filter out noisy ones.
arxiv.org/abs/2404.07965
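The token-selection idea can be sketched as: compare the current model's per-token loss to a clean-data reference model's, and keep only the tokens with the highest excess loss for training. The losses below are random stand-ins for real model outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
token_loss = rng.uniform(0.0, 5.0, size=20)   # current model's per-token loss
ref_loss = rng.uniform(0.0, 5.0, size=20)     # reference model's per-token loss

excess = token_loss - ref_loss                # high excess = informative token
k = 10
keep = np.argsort(excess)[-k:]                # train only on the top-k tokens
selective_loss = token_loss[keep].mean()      # loss restricted to kept tokens
```

Masking the loss this way, rather than filtering whole documents, is what the paper calls selective language modeling.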
![Tweet media, 2024-04-12](https://pbs.twimg.com/media/GK-GAQ_WQAA0oIE.png)
*DiPaCo: Distributed Path Composition*
by Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Ionel Gog, Marc'Aurelio Ranzato
MoE-like models may be fundamental for transcontinental training of large models, by sharding data *and model paths* across locations.
arxiv.org/abs/2403.10616
![Tweet media, 2024-04-08](https://pbs.twimg.com/media/GKppqaVXEAAIEUs.jpg)
Need some arctic in your life?
We have open PhD/Postdocs on relational graph and temporal ML for energy analytics! 🔥
Top-tier research w/ competitive salaries, hosted at the beautiful UiT in Norway & supervised by Filippo Maria Bianchi.
All details here -> en.uit.no/project/relay
![Tweet media, 2024-04-05](https://pbs.twimg.com/media/GKaAl-PWoAEaUeg.jpg)
*Mixture-of-Depths: Dynamically allocating compute in transformer-based LMs*
by Sam Ritter, Blake Richards, Adam Santoro
A variant of MoEs with only a single expert per block, which each token can either skip or apply, up to a given capacity.
arxiv.org/abs/2404.02258
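The routing rule can be sketched directly: a learned scorer ranks the tokens, the top-capacity tokens go through the block, and the rest ride the residual stream untouched. A toy numpy version with random weights standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d, capacity = 10, 4, 4           # illustrative sizes

x = rng.normal(size=(n_tokens, d))
router_w = rng.normal(size=d)              # stand-in for a learned router
W = rng.normal(size=(d, d)) / np.sqrt(d)   # the block's single "expert"

scores = x @ router_w
chosen = np.argsort(scores)[-capacity:]    # top-capacity tokens get computed
out = x.copy()                             # every token keeps the residual path
out[chosen] += np.tanh(x[chosen] @ W)      # only chosen tokens pay for compute
```

Because capacity is fixed ahead of time, the compute cost per block is known and smaller than processing every token.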
![Tweet media, 2024-04-04](https://pbs.twimg.com/media/GKT4M0wXsAA7U37.jpg)
*Equivariant Adaptation of Large Pretrained Models*
by Arnab Mondal, Siba Smarak Panigrahi, Oumar Kaba, Sai Rajeswar
A technique to make pre-trained models equivariant to symmetries by combining them with a 'canonicalization' network.
arxiv.org/abs/2310.01647
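The canonicalization trick can be demonstrated end-to-end in a toy 2-D setting: a canonicalizer (hand-written here, a learned network in the paper) rotates the input into a fixed pose before the frozen model sees it, making the whole pipeline rotation-invariant:

```python
import numpy as np

def canonicalize(points):
    # Hand-written canonicalizer: rotate so the centroid direction aligns
    # with the x-axis. The paper learns this map with a small network.
    mx, my = points.mean(axis=0)
    angle = np.arctan2(my, mx)
    c, s = np.cos(-angle), np.sin(-angle)
    R_back = np.array([[c, -s], [s, c]])
    return points @ R_back.T

def pretrained_model(points):
    return points.sum()                # stand-in for a frozen pretrained model

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 2)) + 2.0    # keep the centroid away from the origin
theta = 1.0                            # an arbitrary test rotation
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])

out1 = pretrained_model(canonicalize(pts))
out2 = pretrained_model(canonicalize(pts @ R.T))   # rotated input, same output
```

Because rotating the input also rotates the centroid, both versions land in the same canonical pose, so the frozen model's output is unchanged.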
![Tweet media, 2024-03-25](https://pbs.twimg.com/media/GJg06V_WIAEe-RC.jpg)
*Variational Learning is Effective for Large Deep Networks*
by Nico Daheim, Emtiyaz Khan, Gian Maria Marconi, Peter Nickl, Rio Yokota, Thomas Möllenhoff
A variant of Adam provides a scalable algorithm to train networks via variational inference.
arxiv.org/abs/2402.17641
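To give a flavor of variational learning (this is a generic weight-perturbation sketch, not the paper's IVON update rule): keep a Gaussian over each weight, sample a weight at every step, and update the posterior mean from the sampled gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)        # regression target, true slope 3

m, s, lr = 0.0, 1.0, 0.05                       # posterior mean, std, step size
for _ in range(300):
    w = m + s * rng.normal()                    # sample a weight from the posterior
    grad = 2 * np.mean((w * x - y) * x)         # squared-error gradient at the sample
    m -= lr * grad                              # move the posterior mean
    s = max(0.01, s * 0.99)                     # crudely anneal the uncertainty
```

The posterior mean still converges near the true slope despite the injected noise; the paper's contribution is an Adam-like rule that also learns the variances in a principled, scalable way.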
![Tweet media, 2024-03-22](https://pbs.twimg.com/media/GJMCB7MWUAIsqgw.jpg)
*Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference*
by Piotr Nawrot, Adrian Lancucki, Edoardo Ponti
A dynamic KV cache for LLM generation that can be trained to satisfy a given memory budget.
arxiv.org/abs/2403.09636
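The caching idea can be abstracted as: at each decoding step, a decision (learned in the paper, hand-picked here) either appends the new key/value to the cache or merges it into the last slot, keeping memory within budget. A toy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(cache, kv, append):
    # Append grows the cache; merge folds the new entry into the last slot.
    if append or not cache:
        cache.append(kv)
    else:
        cache[-1] = (cache[-1] + kv) / 2
    return cache

cache = []
decisions = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]     # stand-in for learned decisions
for a in decisions:
    cache = step(cache, rng.normal(size=4), bool(a))
```

After ten steps the cache holds only as many slots as there were "append" decisions, which is how a learned policy can hit a target memory budget.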
![Tweet media, 2024-03-21](https://pbs.twimg.com/media/GJIF1d1XUAAkA5R.png)
*Vision Transformer (ViT) Prisma Library*
by Sonia Joseph, Praneet Suresh, Yash Vadi
A simple library for basic 'mechanistic interpretability' visualizations, such as the logit lens, on vision models (ViTs, CLIP).
github.com/soniajoseph/Vi…
![Tweet media, 2024-03-20](https://pbs.twimg.com/media/GJH_UWYWQAATupg.png)