Lijie Fan (@lijie_fan)'s Twitter Profile
Lijie Fan

@lijie_fan

Research Scientist @GoogleDeepMind. CS PhD @MIT

Website: http://lijiefan.me · Joined: 01-06-2023

9 Tweets

231 Followers

51 Following

AK (@_akhaliq):

Improving CLIP Training with Language Rewrites

We introduce Language augmented CLIP (LaCLIP), a simple yet highly effective approach to enhancing CLIP training through language rewrites. Leveraging the in-context learning capability of large language models, we rewrite the text…
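A minimal sketch of the training-side idea, assuming each image ships with its original caption plus LLM-generated rewrites (the names and the uniform-sampling choice here are illustrative, not the paper's exact recipe):

```python
import random

def sample_caption(original: str, rewrites: list[str]) -> str:
    """LaCLIP-style text augmentation (sketch): draw the training caption
    uniformly from the original text and its LLM-generated rewrites,
    so the text encoder sees varied phrasings of the same image."""
    return random.choice([original] + rewrites)

# Hypothetical CLIP training step using the augmented captions:
# for image, original, rewrites in loader:
#     text = sample_caption(original, rewrites)
#     loss = clip_loss(image_encoder(image), text_encoder(text))
```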
Dilip Krishnan (@dilipkay):

New paper! We show how to leverage pre-trained LLMs (ChatGPT, Bard, LLaMA) to rewrite captions, and significantly improve over CLIP embeddings: arxiv.org/abs/2305.20088

Joint work with Yonglong Tian, Phillip Isola (MIT), Dina Katabi (MIT), and Lijie Fan (MIT)
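To make the caption-rewriting step concrete, here is a hedged sketch of an in-context rewriting prompt for a chat LLM; the instruction wording and few-shot pairs below are invented for illustration, not the paper's actual prompt:

```python
# Illustrative few-shot examples; the paper samples real (caption, rewrite) pairs.
FEW_SHOT = [
    ("a dog on grass", "A happy dog playing on a sunny green lawn."),
    ("red car street", "A bright red car parked along a quiet city street."),
]

def build_rewrite_prompt(caption: str) -> str:
    """Assemble an in-context learning prompt that asks an LLM to rewrite
    an image caption while preserving its meaning."""
    parts = ["Rewrite the image caption. Keep the meaning, vary the wording.", ""]
    for src, tgt in FEW_SHOT:
        parts += [f"Caption: {src}", f"Rewrite: {tgt}", ""]
    parts += [f"Caption: {caption}", "Rewrite:"]
    return "\n".join(parts)

print(build_rewrite_prompt("a cat sitting on a windowsill"))
```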
AK (@_akhaliq):

StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

paper page: huggingface.co/papers/2306.00…

We investigate the potential of learning visual representations using synthetic images generated by text-to-image models. This is a natural…
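The core trick in StableRep is to sample several images from the same caption and treat them as mutual positives. A minimal sketch of such a multi-positive contrastive loss (tensor shapes and the temperature value are assumptions):

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(feats, prompt_ids, tau=0.1):
    """StableRep-style multi-positive contrastive loss (sketch).
    feats: L2-normalized image embeddings, shape [N, D].
    prompt_ids: shape [N]; images generated from the same caption share
    an id and are treated as positives for one another."""
    n = feats.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=feats.device)
    logits = (feats @ feats.T / tau).masked_fill(eye, -1e9)  # drop self-pairs
    pos = (prompt_ids.unsqueeze(0) == prompt_ids.unsqueeze(1)) & ~eye
    target = pos.float() / pos.sum(1, keepdim=True).clamp(min=1)
    return -(target * F.log_softmax(logits, dim=1)).sum(1).mean()
```

For example, a batch of four images rendered from two captions would use `prompt_ids = torch.tensor([0, 0, 1, 1])`, so each image's positive is the other sample from its prompt.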
Phillip Isola (@phillip_isola):

Should you train your vision system on real images or synthetic ones? In the era of Stable Diffusion, the answer seems to be: synthetic! One Stable Diffusion sample can be worth more than one real image. Paper link: arxiv.org/abs/2306.00984

Lijie Fan (@lijie_fan):

🚀 Is the future of vision models synthetic? Introducing SynCLR: our new pipeline leveraging LLMs & text-to-image models to train vision models with only synthetic data!
🔥 Outperforming SOTAs like DINOv2 & CLIP on real images! SynCLR excels in fine-grained classification &…
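A rough sketch of the pipeline as the tweet describes it (both callables are placeholders standing in for an LLM and a text-to-image model; the paper's actual generation and training recipe is more involved):

```python
def synclr_style_pipeline(llm_caption_fn, t2i_fn, concepts, imgs_per_caption=4):
    """SynCLR-style data synthesis (sketch; callables are assumptions):
    1) an LLM writes a caption for each sampled visual concept,
    2) a text-to-image model renders several images per caption,
    3) images sharing a caption become mutual positives for training.
    """
    dataset = []
    for cid, concept in enumerate(concepts):
        caption = llm_caption_fn(concept)      # stage 1: synthetic text
        for _ in range(imgs_per_caption):
            image = t2i_fn(caption)            # stage 2: synthetic image
            dataset.append((image, cid))       # cid doubles as the prompt id
    return dataset
```

The resulting `(image, cid)` pairs can feed a multi-positive contrastive loss like the StableRep sketch above.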
Yonglong Tian (@yonglongt):

HNY! Excited to share SynCLR, which rivals CLIP and DINOv2 but uses purely synthetic data.

The interesting part: it can outperform models (e.g., CLIP) trained directly on LAION-2B, the dataset used to train the SD 1.5 model that we used to generate our images.
arxiv.org/abs/2312.17742
Yonglong Tian (@yonglongt):

Do we still need codebook/quantization for scalable autoregressive visual generation?

No! Thrilled to share our latest work on scaling with continuous tokens. We observe power-law scaling behavior on validation loss, and obtain SOTA COCO FID and GenEval scores.
arxiv.org/abs/2410.13863
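One way to read "no codebook" is that each continuous token's distribution is modeled by a small denoising head conditioned on the transformer's output, replacing cross-entropy over a discrete vocabulary. A heavily hedged sketch (the sizes, the linear noising schedule, and the MLP design are assumptions, not the paper's exact head):

```python
import torch
import torch.nn as nn

class DiffusionTokenHead(nn.Module):
    """Per-token denoising loss over continuous tokens (sketch).
    The transformer's output z conditions a small MLP that learns to
    recover the noise added to the ground-truth token x."""
    def __init__(self, token_dim=16, cond_dim=768, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(token_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, token_dim),
        )

    def loss(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # x: clean continuous tokens [N, token_dim]; z: conditioning [N, cond_dim]
        t = torch.rand(x.size(0), 1, device=x.device)  # random noise level in (0, 1)
        noise = torch.randn_like(x)
        x_t = (1 - t) * x + t * noise                  # simple linear schedule (assumption)
        eps_hat = self.net(torch.cat([x_t, z, t], dim=-1))
        return ((eps_hat - noise) ** 2).mean()         # noise-prediction objective

head = DiffusionTokenHead()
print(head.loss(torch.randn(8, 16), torch.randn(8, 768)))  # scalar training loss
```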
Tianhong Li (@tianhongli6):

Check out our latest VQ-free text-to-image model, Fluid! At last, an autoregressive model can generate the face of the Mona Lisa, thanks to continuous token modeling 🤣.