Efstathios Karypidis (@k_sta8is) 's Twitter Profile
Efstathios Karypidis

@k_sta8is

PhD Candidate, Archimedes Unit | National Technical University of Athens

ID: 1868941925283143680

Link: http://www.linkedin.com/in/efstathios-karypidis
Joined: 17-12-2024 08:51:04

21 Tweets

97 Followers

243 Following

Harry Thasarathan (@hthasarathan) 's Twitter Profile Photo

🌌🛰️Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"!

arxiv.org/abs/2502.03714

(1/9)
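The cross-model idea can be illustrated with a toy sketch. This is an illustration of the concept only, not the paper's code: per-model encoders map each model's activations into one shared sparse concept space, and any model's decoder can read those concepts back out into its own activation space. All dimensions, weights, and the top-k sparsity rule below are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_a, d_b, n_concepts, k = 8, 12, 32, 4  # toy dims; k = active concepts per input

# Per-model encoders into a *shared* concept space, per-model decoders back out.
W_enc = {"model_a": rng.normal(size=(d_a, n_concepts)),
         "model_b": rng.normal(size=(d_b, n_concepts))}
W_dec = {"model_a": rng.normal(size=(n_concepts, d_a)),
         "model_b": rng.normal(size=(n_concepts, d_b))}

def encode(name, x):
    """Top-k sparse code in the shared concept space."""
    z = x @ W_enc[name]
    keep = np.argsort(np.abs(z))[-k:]   # indices of the k largest activations
    sparse = np.zeros_like(z)
    sparse[keep] = z[keep]
    return sparse

def decode(name, z):
    return z @ W_dec[name]

x_a = rng.normal(size=d_a)
z_a = encode("model_a", x_a)
assert np.count_nonzero(z_a) == k       # sparsity constraint holds
# Cross-model use: concepts found in model A decoded into model B's space.
x_b_hat = decode("model_b", z_a)
assert x_b_hat.shape == (d_b,)
```

Because the concept axis is shared, "universal" features would be concepts that activate for matched inputs across both encoders, while model-unique features would fire for only one.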
Kosta Derpanis (@csprofkgd) 's Twitter Profile Photo

Made with Sora

Input: KITTI image
Prompt 1: “Make this into a semantic segmentation map”
Prompt 2: “Make this into a depth map”
Rudy Gilman (@rgilman33) 's Twitter Profile Photo

The sdxl-VAE models a substantial amount of noise. Things we can't even see. It meticulously encodes the noise, uses precious bottleneck capacity to store it, then faithfully reconstructs it in the decoder. I grabbed what I thought was a simple black vector circle on a white

Thodoris Kouzelis (@thkouz) 's Twitter Profile Photo

1/n Introducing ReDi (Representation Diffusion): a new generative approach that leverages a diffusion model to jointly capture
– Low-level image details (via VAE latents)
– High-level semantic features (via DINOv2)🧵
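A minimal sketch of the joint-target idea, under assumptions: the diffusion model treats the concatenation of a VAE latent and a DINOv2 feature as one vector to denoise. The shapes, the toy DDPM noise schedule, and the random stand-ins below are invented; this is not the ReDi implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a VAE latent (low-level detail) and a DINOv2 feature (semantics).
vae_latent = rng.normal(size=(16,))
dino_feat = rng.normal(size=(32,))

# Joint target: both representations are noised and denoised together.
x0 = np.concatenate([vae_latent, dino_feat])

def forward_noise(x0, t, alpha_bar):
    """Standard DDPM forward process: x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps."""
    eps = rng.normal(size=x0.shape)
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps

alpha_bar = np.linspace(0.999, 0.01, 1000)  # toy noise schedule
x_t, eps = forward_noise(x0, t=500, alpha_bar=alpha_bar)
assert x_t.shape == (48,)  # one joint vector spans both modalities
```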
AI Native Foundation (@ainativef) 's Twitter Profile Photo

8. Multi-Token Prediction Needs Registers

🔑 Keywords: multi-token prediction, MuToR, language model, fine-tuning, generative tasks

💡 Category: Natural Language Processing

🌟 Research Objective: The paper introduces MuToR, a novel approach aimed at enhancing multi-token
Spyros Gidaris (@spyrosgidaris) 's Twitter Profile Photo

Better LLM training? Gregor Bachmann & Vaishnavh Nagarajan showed next-token prediction causes shortcut learning. A fix? Multi-token prediction training (thanks Fabian Gloeckle). We use register tokens: minimal architecture changes & scalable prediction horizons x.com/NasosGer/statu…

Bin Lin (@linbin46984) 's Twitter Profile Photo

🚀UniWorld: a unified model that skips VAEs and uses semantic features from SigLIP! Using just 1% of BAGEL’s data, it outperforms on image editing and excels in understanding & generation. 🌟Now data, model, training & evaluation script are open-source! github.com/PKU-YuanGroup/…

Andrei Bursuc (@abursuc) 's Twitter Profile Photo

Achievement unlocked: having Alyosha at our FUNGI poster, the one person I had in mind when working on this paper on cheap and better representations for k-nn classification and beyond #cvprinparis #cvpr2025
valeo.ai (@valeoai) 's Twitter Profile Photo

Just back from CVPR@Paris 🇫🇷, what a fantastic event!

Great talks, great posters, and great to connect with the French & European vision community.
Kudos to the organizers, hoping that it returns next year! 🤞

#CVPR2025
Sophia Sirko-Galouchenko (@sophia_sirko) 's Twitter Profile Photo

1/n 🚀New paper out - accepted at #ICCV2025!

Introducing DIP: unsupervised post-training that enhances dense features in pretrained ViTs for dense in-context scene understanding

Below: Low-shot in-context semantic segmentation examples. DIP features outperform DINOv2!
Shashank (@shawshank_v) 's Twitter Profile Photo

New paper out - accepted at #ICCV2025

We introduce MoSiC, a self-supervised learning framework that learns temporally consistent representations from video using motion cues.

Key idea: leverage long-range point tracks to enforce dense feature coherence across time.🧵
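The key idea of enforcing feature coherence along point tracks can be sketched as a toy loss. This is an assumed simplification, not MoSiC's actual objective: given features sampled at tracked points across frames, penalize each point's feature for drifting from its track's mean.

```python
import numpy as np

def track_consistency_loss(feats):
    """feats: (T, N, D) features sampled at N tracked points over T frames.
    Penalize each point's feature for drifting from its per-track mean."""
    mean = feats.mean(axis=0, keepdims=True)  # (1, N, D) mean along each track
    return float(((feats - mean) ** 2).mean())

rng = np.random.default_rng(0)
T, N, D = 8, 64, 16
# Perfectly coherent tracks: the same feature repeated in every frame.
static = np.broadcast_to(rng.normal(size=(1, N, D)), (T, N, D))
# Incoherent tracks: independent features per frame.
drifting = rng.normal(size=(T, N, D))
assert track_consistency_loss(static) < 1e-12   # ~0 for coherent tracks
assert track_consistency_loss(drifting) > 0.0
```

Minimizing such a loss pulls a point's feature toward the same value wherever the track visits, which is one way to read "dense feature coherence across time".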
Andrei Bursuc (@abursuc) 's Twitter Profile Photo

Nice trick for fine-tuning with multi-token prediction without architecture changes: interleave learnable register tokens into the input sequence & discard them at inference. It works for supervised fine-tuning, PEFT, pretraining, on both language and vision domains 👇
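The interleaving trick described above can be sketched in a few lines. The helper names are hypothetical, and the registers are shown as a placeholder id rather than learned embeddings: k register tokens follow each input position at training time, and stripping every non-original position recovers the inference-time sequence.

```python
def interleave_registers(tokens, k, register_id=-1):
    """Insert k register placeholders after every input token (training time)."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend([register_id] * k)  # each register targets a further-ahead token
    return out

def drop_registers(seq, k):
    """Inference time: original tokens sit at every (k+1)-th position."""
    return seq[::k + 1]

tokens = [10, 11, 12]
seq = interleave_registers(tokens, k=2)
assert seq == [10, -1, -1, 11, -1, -1, 12, -1, -1]
assert drop_registers(seq, k=2) == tokens  # discarding registers restores input
```

Because the backbone is unchanged and only the input sequence is rearranged, the same model runs with or without registers, which is what makes the trick architecture-free.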

Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

Interesting alternative to multi-token prediction, though the figure is a bit unintuitive. Instead of attaching a head for each +d'th prediction, pass a dummy input token for each extra prediction through the model. This is A LOT more expensive, e.g. doing 2-step prediction
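The cost point can be made concrete with back-of-envelope arithmetic, assuming self-attention FLOPs scale quadratically with sequence length and ignoring MLP and projection terms (a deliberate simplification):

```python
def attention_cost(seq_len):
    """Toy model: self-attention cost grows as the square of sequence length."""
    return seq_len ** 2

n, extra = 1024, 1  # one dummy token per position for 2-step prediction
baseline = attention_cost(n)
dummy = attention_cost(n * (1 + extra))  # interleaved dummies double the sequence
assert dummy / baseline == 4.0           # ~4x attention cost for 2-step prediction
```

Under this toy model, d-step prediction with d-1 dummy tokens per position multiplies attention cost by roughly d², which is the sense in which the approach is "A LOT more expensive" than attaching extra prediction heads.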