linker (@keylinker) 's Twitter Profile
linker

@keylinker

Computer vision / deep learning. Previously: led the AI part @ dealicious, led the AI research team @ oddconcepts, and was a Naver Search & NAVER LABS researcher @ naver.

ID: 1667766445

Joined: 13-08-2013 12:59:09

6.6K Tweets

193 Followers

927 Following

Jay Z. Wu (@jayzhangjiewu) 's Twitter Profile Photo

🚀 ChronoEdit is now open-source! Edit images in only 8 diffusion steps (~4s per image on H100).

💻 Code: github.com/nv-tlabs/Chron…
🤗 Model: huggingface.co/nvidia/ChronoE…
🎨 Demo: huggingface.co/spaces/nvidia/…

Huge thanks to AK for featuring our work 🙏
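
For intuition, distilled editors like this run a fixed, tiny number of denoising steps. A schematic few-step sampling loop (illustrative only, not ChronoEdit's actual code; `model` and `cond` are placeholder interfaces):

```python
import torch

@torch.no_grad()
def few_step_sample(model, latents, cond, num_steps=8):
    """Generic few-step Euler sampler: integrate the model's predicted
    velocity from t=1 (noise) down to t=0 (clean latent)."""
    ts = torch.linspace(1.0, 0.0, num_steps + 1)
    for i in range(num_steps):
        t, t_next = ts[i], ts[i + 1]
        v = model(latents, t.expand(latents.shape[0]), cond)  # predicted velocity
        latents = latents + (t_next - t) * v                  # one Euler step
    return latents
```

At 8 steps, wall-clock time is dominated by 8 forward passes of the backbone, which is how a figure like ~4s/image on an H100 becomes plausible.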

Unsloth AI (@unslothai) 's Twitter Profile Photo

You can now run Qwen3-VL locally! 💜

Run the 235B variant for SOTA vision/OCR on 128GB unified memory (dynamic 4-bit). Includes our chat template fixes.

Qwen3-VL-2B runs at ~40 t/s on 4GB RAM.

Fine-tune & RL via Unsloth free notebooks & export to GGUF.

docs.unsloth.ai/models/qwen3-vl
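
As a rough sketch of what a local 4-bit run can look like with plain transformers + bitsandbytes (the repo id, model class, and message schema below are assumptions that vary by transformers version; Unsloth's dynamic 4-bit quants and notebooks use their own APIs, see the linked docs):

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig

model_id = "Qwen/Qwen3-VL-2B-Instruct"  # assumed upstream repo name

# NF4 quantization so the weights fit in a few GB of memory.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/receipt.png"},  # placeholder image
    {"type": "text", "text": "OCR this image."},
]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```
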
אגי-e/acc (@murage_kibicho) 's Twitter Profile Photo

We wrote 'The Annotated Diffusion Transformer'. OpenAI's Sora uses a diffusion transformer (DiT) to generate video. DiTs answer the question: what if we replaced the U-Net in a diffusion model with a Transformer? Link in comments.
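
As a sketch of the core idea, here is a single DiT-style block in minimal PyTorch (simplified from the paper: patch tokens of the noisy latent pass through plain self-attention, with the timestep injected via adaptive LayerNorm):

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    """One Transformer block with adaLN timestep conditioning: the U-Net is
    replaced by attention over patch tokens of the noisy latent."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ada = nn.Linear(dim, 6 * dim)  # timestep embedding -> scale/shift/gate params

    def forward(self, x, t_emb):
        s1, b1, g1, s2, b2, g2 = self.ada(t_emb).unsqueeze(1).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1) + b1
        x = x + g1 * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + s2) + b2
        return x + g2 * self.mlp(h)

x = torch.randn(2, 256, 384)   # 256 latent patch tokens per image
t_emb = torch.randn(2, 384)    # projected sinusoidal timestep embedding
print(DiTBlock(384, heads=6)(x, t_emb).shape)  # torch.Size([2, 256, 384])
```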

Dmytro Mishkin 🇺🇦 (@ducha_aiki) 's Twitter Profile Photo

Instance-Level Composed Image Retrieval
<a href="/bill_psomas/">Bill Psomas</a> George Retsinas, Nikos Efthymiadis, Panagiotis Filntisis,Yannis Avrithis, Petros Maragos, Ondrej Chum, Giorgos Tolias

tl;dr: condition-based retrieval (and dataset) - old photo/sunset/night/aerial/model
arxiv.org/abs/2510.25387
机器之心 JIQIZHIXIN (@synced_global) 's Twitter Profile Photo

Can text queries help vision-language models see better?

Enhanced Language-Image Pre-training (ELIP) uses the text input to generate visual prompts that guide the image encoder—boosting text-to-image retrieval without retraining the model. It plugs seamlessly into CLIP, SigLIP, …
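
A minimal sketch of that mechanism as described above, with hypothetical module names and dimensions (not ELIP's released code): map the text-query embedding to a few prompt tokens and prepend them to the frozen image encoder's patch tokens.

```python
import torch
import torch.nn as nn

class TextToVisualPrompt(nn.Module):
    """Map a text-query embedding to n visual prompt tokens that are prepended
    to the image encoder's patch tokens, making the image features query-aware."""
    def __init__(self, text_dim=512, vis_dim=768, n_prompts=4):
        super().__init__()
        self.n_prompts, self.vis_dim = n_prompts, vis_dim
        self.proj = nn.Sequential(
            nn.Linear(text_dim, vis_dim * n_prompts),
            nn.GELU(),
            nn.Linear(vis_dim * n_prompts, vis_dim * n_prompts),
        )

    def forward(self, text_emb, patch_tokens):
        prompts = self.proj(text_emb).view(-1, self.n_prompts, self.vis_dim)
        # The concatenated sequence is fed through the (frozen) ViT blocks.
        return torch.cat([prompts, patch_tokens], dim=1)

text_emb = torch.randn(8, 512)      # embeddings of the text queries
patches = torch.randn(8, 196, 768)  # 14x14 ViT patch tokens
print(TextToVisualPrompt()(text_emb, patches).shape)  # torch.Size([8, 200, 768])
```

Only the small prompt-mapping MLP would need training; the pretrained towers can stay frozen, consistent with the "without retraining the model" claim.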
Alec Helbling (@alec_helbling) 's Twitter Profile Photo

Data often lie on a low-dimensional manifold embedded in a high-dimensional space. But these manifolds are often highly non-linear, making linear dimensionality reduction methods like PCA insufficient. This has motivated the development of non-linear dimensionality reduction.
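A standard way to see this is the swiss roll: a 2D sheet rolled up in 3D. Linear PCA can only pick a flat subspace, while a non-linear method such as Isomap recovers the unrolled sheet (scikit-learn sketch):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# 2D manifold embedded non-linearly in 3D.
X, t = make_swiss_roll(n_samples=2000, random_state=0)

# Linear projection: distant points along the roll collapse onto each other.
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear reduction: geodesic (along-the-manifold) distances unroll the sheet.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

print(X_pca.shape, X_iso.shape)  # (2000, 2) (2000, 2)
```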

Niels Rogge (@nielsrogge) 's Twitter Profile Photo

This is a phenomenal video by <a href="/jbhuang0604/">Jia-Bin Huang</a> explaining seminal papers in computer vision, including CLIP, SimCLR, DINO v1/v2/v3 in 15 minutes 

DINO is actually a brilliant idea; I found the choice of 65k neurons in the output head pretty interesting.
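
For reference, a minimal sketch of that projection head (simplified from the released DINO code): a small MLP with an L2-normalized bottleneck, then a weight-normalized linear layer producing the 65,536 prototype logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DINOHead(nn.Module):
    """Student/teacher output head: softmax over a very wide set of prototypes."""
    def __init__(self, in_dim=768, hidden=2048, bottleneck=256, out_dim=65536):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, bottleneck),
        )
        self.last = nn.utils.weight_norm(nn.Linear(bottleneck, out_dim, bias=False))

    def forward(self, x):
        x = F.normalize(self.mlp(x), dim=-1)  # L2-normalized bottleneck
        return self.last(x)                   # logits over 65k prototypes

print(DINOHead()(torch.randn(4, 768)).shape)  # torch.Size([4, 65536])
```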
Qwen (@alibaba_qwen) 's Twitter Profile Photo

🚀 Qwen3-VL Tech report is now out on arXiv!

From pretraining to post-training, architecture to infra, data to evaluation — we’ve packed in the details for anyone building on vision-language models.

🔥 3 models >1M downloads in just over a month
🏆 Qwen3-VL-8B leads with 2M+
sway (@swaystar123) 's Twitter Profile Photo

arxiv: arxiv.org/abs/2512.12386
github: github.com/SwayStar123/Sp…
huggingface (pretrained checkpoints): huggingface.co/SwayStar123/Sp…
wandb: wandb.ai/kagaku-ai/REG/
code for all ablations (available in branches): github.com/SwayStar123/REG

机器之心 JIQIZHIXIN (@synced_global) 's Twitter Profile Photo

Score Distillation of Flow Matching Models - Apple Machine Learning Research

The University of Texas at Austin, Apple
Paper: machinelearning.apple.com/research/score…
Page: yigu1008.github.io/SiD-DiT/
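
For context, the standard (conditional/rectified) flow matching objective that such score distillation starts from, in my notation: the model v_theta regresses the constant velocity of a straight path between noise and data,

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
\mathcal{L}_{\mathrm{FM}}(\theta)
  = \mathbb{E}_{\,t \sim \mathcal{U}[0,1],\; x_0 \sim \mathcal{N}(0, I),\; x_1 \sim p_{\mathrm{data}}}
    \big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2 .
```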
Martin Ziqiao Ma (@ziqiao_ma) 's Twitter Profile Photo

NEPA: Next-Embedding Predictive Autoregression

A simple objective for visual SSL and generative pretraining. Instead of reconstructing pixels or predicting discrete tokens, we train an autoregressive model to predict the next embedding given all previous embeddings.

Key ideas:
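
The tweet is cut off before listing the key ideas; as a rough schematic of the stated objective (my reading, not the paper's code), with stand-in modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, seq = 256, 16
encoder = nn.Linear(768, dim)  # stand-in for a patch/frame encoder
predictor = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2
)

feats = torch.randn(4, seq, 768)
emb = encoder(feats)                                        # (B, T, dim) embeddings
mask = nn.Transformer.generate_square_subsequent_mask(seq)  # causal mask
pred = predictor(emb, mask=mask, is_causal=True)

# Next-embedding prediction: pred at position i should match the embedding
# at position i+1; no pixels reconstructed, no discrete tokens predicted.
loss = 1 - F.cosine_similarity(pred[:, :-1], emb[:, 1:].detach(), dim=-1).mean()
loss.backward()
```
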
lucas (@lucas_flatwhite) 's Twitter Profile Photo

Bookmark and refer to this workflow shared by Boris, the creator of Claude Code!

💬 Hi, I'm Boris, and I created Claude Code.

A lot of people have asked how I actually use Claude Code, so I'd like to share a bit of my setup.

My setup might be surprisingly vanilla!
Bill Psomas (@bill_psomas) 's Twitter Profile Photo

🚀New task: Instance-level Image+Text→Image Retrieval

🔎Given a query image + an edit (“during night”), retrieve the same specific instance after the change — not just any similar object.

🛢New dataset on HF: i-CIR huggingface.co/datasets/billp…

🔥Download, run, and share results!
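
A minimal sketch of a composed-retrieval baseline for a task like this (illustrative only, not the i-CIR evaluation code): fuse the query-image and edit-text embeddings, then rank the gallery by cosine similarity. Real systems learn the fusion; here it is a weighted sum.

```python
import torch
import torch.nn.functional as F

def compose(img_emb, txt_emb, alpha=0.5):
    q = F.normalize(img_emb, dim=-1) + alpha * F.normalize(txt_emb, dim=-1)
    return F.normalize(q, dim=-1)

img_q = torch.randn(1, 512)    # CLIP-style embedding of the query image
txt_q = torch.randn(1, 512)    # embedding of the edit text, e.g. "during night"
gallery = F.normalize(torch.randn(1000, 512), dim=-1)

scores = compose(img_q, txt_q) @ gallery.T  # (1, 1000) cosine similarities
print(scores.topk(5).indices)               # indices of the top-5 candidates
```

The instance-level twist is in the labels: only images of the same specific object under the stated condition count as correct, not merely visually similar ones.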
DailyPapers (@huggingpapers) 's Twitter Profile Photo

Black Forest Labs just released a quantized FLUX.2 [dev] on Hugging Face

32B-parameter image generation & editing with multi-reference support for up to 10 images, 4MP resolution, and improved text rendering—now optimized for NVIDIA GPUs.
Pruna AI (@prunaai) 's Twitter Profile Photo

🚀 Z-Image-Turbo 1/3: Image-to-Image version of Z-Image-Turbo with LoRA support!

Our img2img version of Z-Image enables you to take existing images, mockups, or reference materials and instantly adapt them to meet specific needs.

Why is it useful:
• Reference …
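
For orientation, the typical diffusers img2img flow looks roughly like the sketch below; the repo id and whether this model loads through AutoPipelineForImage2Image are assumptions, so check the actual model card for the supported API.

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Assumed repo id; the real pipeline class/id may differ.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
).to("cuda")

init = load_image("mockup.png")  # existing image / mockup / reference material
out = pipe(
    prompt="photorealistic product shot, studio lighting",
    image=init,
    strength=0.6,            # how far to move away from the input image
    num_inference_steps=8,   # turbo models need only a few steps
).images[0]
out.save("adapted.png")
```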