Lijie Fan (@lijie_fan)'s Twitter Profile
Lijie Fan

@lijie_fan

Research Scientist @GoogleDeepMind. CS PhD @MIT

Website: http://lijiefan.me · Joined: 01-06-2023

9 Tweets

231 Followers

51 Following

AK (@_akhaliq):

Improving CLIP Training with Language Rewrites

We introduce Language augmented CLIP (LaCLIP), a simple yet highly effective approach to enhancing CLIP training through language rewrites. Leveraging the in-context learning capability of large language models, we rewrite the text…
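A minimal sketch of the training-side idea, assuming each image ships with its original caption plus LLM-generated rewrites (the names and the uniform-sampling choice here are illustrative, not the paper's exact recipe):

```python
import random

def sample_caption(original: str, rewrites: list[str]) -> str:
    """LaCLIP-style text augmentation (sketch): draw the training caption
    uniformly from the original text and its LLM-generated rewrites,
    so the text encoder sees varied phrasings of the same image."""
    return random.choice([original] + rewrites)

# Hypothetical CLIP training step using the augmented captions:
# for image, original, rewrites in loader:
#     text = sample_caption(original, rewrites)
#     loss = clip_loss(image_encoder(image), text_encoder(text))
```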
Dilip Krishnan (@dilipkay):

New paper! We show how to leverage pre-trained LLMs (ChatGPT, Bard, LLaMA) to rewrite captions, and significantly improve over CLIP embeddings: arxiv.org/abs/2305.20088

Joint work with Yonglong Tian, Phillip Isola (MIT), Dina Katabi (MIT), and Lijie Fan (MIT)
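To make the caption-rewriting step concrete, here is a hedged sketch of an in-context rewriting prompt for a chat LLM; the instruction wording and few-shot pairs below are invented for illustration, not the paper's actual prompt:

```python
# Illustrative few-shot examples; the paper samples real (caption, rewrite) pairs.
FEW_SHOT = [
    ("a dog on grass", "A happy dog playing on a sunny green lawn."),
    ("red car street", "A bright red car parked along a quiet city street."),
]

def build_rewrite_prompt(caption: str) -> str:
    """Assemble an in-context learning prompt that asks an LLM to rewrite
    an image caption while preserving its meaning."""
    parts = ["Rewrite the image caption. Keep the meaning, vary the wording.", ""]
    for src, tgt in FEW_SHOT:
        parts += [f"Caption: {src}", f"Rewrite: {tgt}", ""]
    parts += [f"Caption: {caption}", "Rewrite:"]
    return "\n".join(parts)

print(build_rewrite_prompt("a cat sitting on a windowsill"))
```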
AK (@_akhaliq):

StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners

paper page: huggingface.co/papers/2306.00…

We investigate the potential of learning visual representations using synthetic images generated by text-to-image models. This is a natural…
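The core trick in StableRep is to sample several images from the same caption and treat them as mutual positives. A minimal sketch of such a multi-positive contrastive loss (tensor shapes and the temperature value are assumptions):

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(feats, prompt_ids, tau=0.1):
    """StableRep-style multi-positive contrastive loss (sketch).
    feats: L2-normalized image embeddings, shape [N, D].
    prompt_ids: shape [N]; images generated from the same caption share
    an id and are treated as positives for one another."""
    n = feats.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=feats.device)
    logits = (feats @ feats.T / tau).masked_fill(eye, -1e9)  # drop self-pairs
    pos = (prompt_ids.unsqueeze(0) == prompt_ids.unsqueeze(1)) & ~eye
    target = pos.float() / pos.sum(1, keepdim=True).clamp(min=1)
    return -(target * F.log_softmax(logits, dim=1)).sum(1).mean()
```

For example, a batch of four images rendered from two captions would use `prompt_ids = torch.tensor([0, 0, 1, 1])`, so each image's positive is the other sample from its prompt.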
Phillip Isola (@phillip_isola):

Should you train your vision system on real images or synthetic ones? In the era of Stable Diffusion, the answer seems to be: synthetic! One Stable Diffusion sample can be worth more than one real image. Paper link: arxiv.org/abs/2306.00984

Lijie Fan (@lijie_fan):

🚀 Is the future of vision models synthetic? Introducing SynCLR: our new pipeline leveraging LLMs & text-to-image models to train vision models with only synthetic data!
🔥 Outperforming SOTAs like DINOv2 & CLIP on real images! SynCLR excels in fine-grained classification &…
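A rough sketch of the pipeline as the tweet describes it (both callables are placeholders standing in for an LLM and a text-to-image model; the paper's actual generation and training recipe is more involved):

```python
def synclr_style_pipeline(llm_caption_fn, t2i_fn, concepts, imgs_per_caption=4):
    """SynCLR-style data synthesis (sketch; callables are assumptions):
    1) an LLM writes a caption for each sampled visual concept,
    2) a text-to-image model renders several images per caption,
    3) images sharing a caption become mutual positives for training.
    """
    dataset = []
    for cid, concept in enumerate(concepts):
        caption = llm_caption_fn(concept)      # stage 1: synthetic text
        for _ in range(imgs_per_caption):
            image = t2i_fn(caption)            # stage 2: synthetic image
            dataset.append((image, cid))       # cid doubles as the prompt id
    return dataset
```

The resulting `(image, cid)` pairs can feed a multi-positive contrastive loss like the StableRep sketch above.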
Yonglong Tian (@yonglongt):

HNY! Excited to share SynCLR, which rivals CLIP and DINOv2 but uses purely synthetic data.

The interesting part: it can outperform models (e.g., CLIP) trained directly on LAION-2B, the dataset used to train the SD 1.5 model that we used to generate our images.
arxiv.org/abs/2312.17742
Yonglong Tian (@yonglongt):

Do we still need codebook/quantization for scalable autoregressive visual generation?

No! Thrilled to share our latest work on scaling with continuous tokens. We observe power-law scaling behavior on validation loss, and obtain SOTA COCO FID and GenEval scores.
arxiv.org/abs/2410.13863
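One way to read "no codebook" is that each continuous token's distribution is modeled by a small denoising head conditioned on the transformer's output, replacing cross-entropy over a discrete vocabulary. A heavily hedged sketch (the sizes, the linear noising schedule, and the MLP design are assumptions, not the paper's exact head):

```python
import torch
import torch.nn as nn

class DiffusionTokenHead(nn.Module):
    """Per-token denoising loss over continuous tokens (sketch).
    The transformer's output z conditions a small MLP that learns to
    recover the noise added to the ground-truth token x."""
    def __init__(self, token_dim=16, cond_dim=768, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(token_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, token_dim),
        )

    def loss(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # x: clean continuous tokens [N, token_dim]; z: conditioning [N, cond_dim]
        t = torch.rand(x.size(0), 1, device=x.device)  # random noise level in (0, 1)
        noise = torch.randn_like(x)
        x_t = (1 - t) * x + t * noise                  # simple linear schedule (assumption)
        eps_hat = self.net(torch.cat([x_t, z, t], dim=-1))
        return ((eps_hat - noise) ** 2).mean()         # noise-prediction objective

head = DiffusionTokenHead()
print(head.loss(torch.randn(8, 16), torch.randn(8, 768)))  # scalar training loss
```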
Tianhong Li (@tianhongli6):

Check out our latest VQ-free text-to-image model, Fluid! At last, an autoregressive model can generate the face of the Mona Lisa, thanks to continuous token modeling 🤣.