tgeo92 (@tjoskz)'s Twitter Profile
tgeo92

@tjoskz

ID: 863938419428175874

Joined: 15-05-2017 02:05:45

1.1K Tweets

441 Followers

4.4K Following

Chunyuan Li (@chunyuanli)'s Twitter Profile Photo

🚀 (1/6) Excited to release LLaVA-OneVision (OV)! An open, large multimodal model that excels across single-image, multi-image, and video tasks, while effortlessly unlocking new emerging capabilities with task transfer
Paper: arxiv.org/abs/2408.03326
Blogs: llava-vl.github.io/blog/
Lucas Beyer (bl16) (@giffmana)'s Twitter Profile Photo

I wrote a blogpost "On the speed of ViTs and CNNs".

Addresses the following concerns I often hear:

- worry about ViTs speed at high resolution.
- how high resolution do I need?
- is it super important to keep the aspect ratio?

I think Yann LeCun might like it too! Link below
Sasha Rush (@srush_nlp)'s Twitter Profile Photo

New Video: How to write an okay research paper. Reviewers all agree! Sasha Rush's papers are "reasonably structured" and "somewhat clear, despite other flaws". youtu.be/qNlwVGxkG7Q?si…

Mickael Chen (@mickael_chen)'s Twitter Profile Photo

I made a GitHub repository exploring this question. github.com/mickaelChen/To… We visualize the embeddings inside a DiT and observe how it works around this issue. We also build a very simple toy experiment to see the limits of the method. TL;DR: not a big issue, but kind of ugly.

Chanwoo Park (@chanwoopark20)'s Twitter Profile Photo

I'm thrilled that the paper Kaiqing Zhang and I co-authored has made it onto the reading list for this course! ambujtewari.com/LLM-fall2024/ Any questions are welcome.

Chunyuan Li (@chunyuanli)'s Twitter Profile Photo

LLaVA-OneVision-Chat🤖💬 significantly improves the chat experience of LLaVA-OneVision, as demonstrated on 5 multimodal tasks 🌟, achieved by preference alignment & self-critic feedback 📊

Doc: github.com/LLaVA-VL/LLaVA…
Sasha Rush (@srush_nlp)'s Twitter Profile Photo

Are you writing an ICLR paper, and wondering...

* What level of detail should the background have?
* Why do people read related work?
* How do I convey convincing results?

I have some mostly satisfactory answers to these questions. youtu.be/qNlwVGxkG7Q?si…

Nick Jiang @ ICLR (@nickhjiang)'s Twitter Profile Photo

🔥 Paper Drop 🔥

What can we understand by peering inside vision-language models (VLMs) like LLaVA?

We show that image representations inside VLMs can be directly interpreted and edited in the language space, and we apply our findings to mitigate hallucinations!
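
Not the paper's code, but a minimal sketch of what "interpreting image representations in the language space" can look like: a logit-lens-style probe that takes the hidden states at the image-token positions from some decoder layer and projects them through the language model's unembedding matrix to see which vocabulary tokens they are closest to. The function name, layer choice, and dummy tokenizer below are illustrative assumptions, not the authors' interface.

```python
import torch

@torch.no_grad()
def nearest_tokens_for_image_positions(hidden, unembed, tokenizer, image_pos, k=5):
    """hidden: (seq, dim) hidden states from one decoder layer;
    unembed: (vocab, dim) LM-head weight; image_pos: indices of the image tokens."""
    logits = hidden[image_pos] @ unembed.T        # (n_img_tokens, vocab)
    top_ids = logits.topk(k, dim=-1).indices      # k nearest vocabulary entries
    return [[tokenizer.decode([i]) for i in row.tolist()] for row in top_ids]

# Shape check with synthetic stand-ins for real model tensors.
class DummyTokenizer:
    def decode(self, ids):
        return f"<tok{ids[0]}>"

hidden = torch.randn(20, 64)       # 20 positions, hidden size 64
unembed = torch.randn(1000, 64)    # vocabulary of 1000 entries
print(nearest_tokens_for_image_positions(hidden, unembed, DummyTokenizer(), image_pos=[3, 4, 5]))
```
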
Christian Wolf (🦋🦋🦋) (@chriswolfvision)'s Twitter Profile Photo

It looks like there are some efforts to improve on vanilla attention in transformers ...

arxiv.org/abs/2410.02703 Selective Attention
arxiv.org/abs/2410.11842… MHA as Mixture of Heads Attention
arxiv.org/abs/2410.05258 Differential Transformer

Jitendra MALIK (@jitendramalikcv)'s Twitter Profile Photo

I'm happy to post course materials for my class at UC Berkeley "Robots that Learn", taught with the outstanding assistance of Toru.

Lecture videos at youtube.com/playlist?list=…
Lecture notes & other course materials at robots-that-learn.github.io

Glen Berseth (@glenberseth)'s Twitter Profile Photo

I am teaching a class on #FoundationalModels for #robotics and Scaling #DeepRL algorithms. This class expands on last year's class and my generalist robotics policies tutorial and code. I plan to share the lectures and code assignments. Starting with the first lectures below.

Sasha Rush (@srush_nlp)'s Twitter Profile Photo

Got talked into giving a DeepSeek talk this afternoon simons.berkeley.edu/workshops/llms…

Not sure I have anything new to say here! But good excuse for me to read all the blogs.
Soheil Feizi (@feizisoheil)'s Twitter Profile Photo

Re-sharing my 3.5 hr lecture on large language models as some might be interested. Will post updated lectures/materials/news in coming weeks. Stay tuned!

Link: youtu.be/2yjzZfDQxy8

Topics:
0:00 Basics of language models
2:30 Word2vec
16:27 Transfer Learning
19:23 BERT

François Fleuret (@francoisfleuret)'s Twitter Profile Photo

As expected, that was popular. Here is my attempt at consolidating all the answers into a list.

- Prenorm: normalization in the residual blocks before the attention operation and the FFN respectively
- GQA (Group Query Attention): more Q than (K, V)
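
To make those two items concrete, here is a minimal PyTorch sketch (my own illustration, not code from the thread) of a pre-norm residual block with grouped-query attention: the LayerNorms sit before the attention and before the FFN, and 8 query heads share 2 (K, V) heads. All dimensions and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQABlock(nn.Module):
    def __init__(self, dim=512, n_q_heads=8, n_kv_heads=2, ffn_mult=4):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.head_dim = dim // n_q_heads
        # Prenorm: LayerNorm applied *before* attention and before the FFN.
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.wq = nn.Linear(dim, n_q_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_q_heads * self.head_dim, dim, bias=False)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_mult * dim), nn.GELU(), nn.Linear(ffn_mult * dim, dim)
        )

    def forward(self, x):                      # x: (batch, seq, dim)
        b, t, _ = x.shape
        h = self.norm1(x)                      # prenorm before attention
        q = self.wq(h).view(b, t, self.n_q, self.head_dim).transpose(1, 2)
        k = self.wk(h).view(b, t, self.n_kv, self.head_dim).transpose(1, 2)
        v = self.wv(h).view(b, t, self.n_kv, self.head_dim).transpose(1, 2)
        # GQA: each (K, V) head is shared by n_q / n_kv query heads.
        k = k.repeat_interleave(self.n_q // self.n_kv, dim=1)
        v = v.repeat_interleave(self.n_q // self.n_kv, dim=1)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.wo(attn.transpose(1, 2).reshape(b, t, -1))   # residual around attention
        x = x + self.ffn(self.norm2(x))        # prenorm before the FFN, residual around it
        return x

x = torch.randn(2, 16, 512)
print(GQABlock()(x).shape)  # torch.Size([2, 16, 512])
```
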

Rui Li (@leedaray)'s Twitter Profile Photo

🚀 Details of the #CVPR2025 award candidate papers are out. 14 of 2967 accepted papers made the list, spanning 3D vision, embodied AI, VLMs/MLLMs, learning systems, and scene understanding. 3D vision leads with the most entries. I collected the TL;DR, paper, and project links👇

Lilian Weng (@lilianweng)'s Twitter Profile Photo

Giving your models more time to think before prediction, like via smart decoding, chain-of-thought reasoning, latent thoughts, etc., turns out to be quite effective for unblocking the next level of intelligence. New post is here :) “Why we think”: lilianweng.github.io/posts/2025-05-…