tgeo92 (@tjoskz)'s Twitter Profile
tgeo92

@tjoskz

ID: 863938419428175874

Joined: 15-05-2017 02:05:45

1.1K Tweets

441 Followers

4.4K Following

Chunyuan Li (@chunyuanli)'s Twitter Profile Photo

🚀 (1/6) Excited to release LLaVA-OneVision (OV)! An open, large multimodal model that excels across single-image, multi-image, and video tasks, while effortlessly unlocking new emerging capabilities with task transfer
Paper: arxiv.org/abs/2408.03326
Blogs: llava-vl.github.io/blog/
Lucas Beyer (bl16) (@giffmana)'s Twitter Profile Photo

I wrote a blogpost "On the speed of ViTs and CNNs".

Addresses the following concerns I often hear:

- worry about ViTs speed at high resolution.
- how high resolution do I need?
- is it super important to keep the aspect ratio?

I think Yann LeCun might like it too! Link below
Sasha Rush (@srush_nlp)'s Twitter Profile Photo

New Video: How to write an okay research paper. Reviewers all agree! Sasha Rush's papers are "reasonably structured" and "somewhat clear, despite other flaws". youtu.be/qNlwVGxkG7Q?si…

Mickael Chen (@mickael_chen)'s Twitter Profile Photo

I made a GitHub repository exploring this question. github.com/mickaelChen/To… We visualize the embeddings inside a DiT and observe how it works around this issue. We also build a very simple toy experiment to see the limits of the method. TL;DR: not a big issue, but kind of ugly.

Chanwoo Park (@chanwoopark20)'s Twitter Profile Photo

I'm thrilled that the paper Kaiqing Zhang and I co-authored has made it onto the reading list for this course! ambujtewari.com/LLM-fall2024/ Any questions are welcome.

Chunyuan Li (@chunyuanli)'s Twitter Profile Photo

LLaVA-OneVision-Chat🤖💬 significantly improves the chat experience of LLaVA-OneVision, as demonstrated on 5 multimodal tasks 🌟, achieved by preference alignment & self-critic feedback 📊

Doc: github.com/LLaVA-VL/LLaVA…
Sasha Rush (@srush_nlp)'s Twitter Profile Photo

Are you writing an ICLR paper, and wondering...

* What level of detail should the background have?
* Why do people read related work?
* How do I convey convincing results?

I have some mostly satisfactory answers to these questions. youtu.be/qNlwVGxkG7Q?si…

Nick Jiang @ ICLR (@nickhjiang)'s Twitter Profile Photo

🔥 Paper Drop 🔥

What can we understand by peering inside vision-language models (VLMs) like LLaVA?

We show that image representations inside VLMs can be directly interpreted and edited in the language space, and we apply our findings to mitigate hallucinations!
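
Not the paper's code, but a minimal sketch of what "interpreting image representations in the language space" can look like: a logit-lens-style probe that takes the hidden states at the image-token positions from some decoder layer and projects them through the language model's unembedding matrix to see which vocabulary tokens they are closest to. The function name, layer choice, and dummy tokenizer below are illustrative assumptions, not the authors' interface.

```python
import torch

@torch.no_grad()
def nearest_tokens_for_image_positions(hidden, unembed, tokenizer, image_pos, k=5):
    """hidden: (seq, dim) hidden states from one decoder layer;
    unembed: (vocab, dim) LM-head weight; image_pos: indices of the image tokens."""
    logits = hidden[image_pos] @ unembed.T        # (n_img_tokens, vocab)
    top_ids = logits.topk(k, dim=-1).indices      # k nearest vocabulary entries
    return [[tokenizer.decode([i]) for i in row.tolist()] for row in top_ids]

# Shape check with synthetic stand-ins for real model tensors.
class DummyTokenizer:
    def decode(self, ids):
        return f"<tok{ids[0]}>"

hidden = torch.randn(20, 64)       # 20 positions, hidden size 64
unembed = torch.randn(1000, 64)    # vocabulary of 1000 entries
print(nearest_tokens_for_image_positions(hidden, unembed, DummyTokenizer(), image_pos=[3, 4, 5]))
```
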
Christian Wolf (🦋🦋🦋) (@chriswolfvision)'s Twitter Profile Photo

It looks like there are some efforts to improve on vanilla attention in transformers ...

arxiv.org/abs/2410.02703 Selective Attention
arxiv.org/abs/2410.11842… MHA as Mixture of Heads Attention
arxiv.org/abs/2410.05258 Differential Transformer

Jitendra MALIK (@jitendramalikcv)'s Twitter Profile Photo

I'm happy to post course materials for my class at UC Berkeley "Robots that Learn", taught with the outstanding assistance of Toru.

Lecture videos at youtube.com/playlist?list=…
Lecture notes & other course materials at robots-that-learn.github.io

Glen Berseth (@glenberseth)'s Twitter Profile Photo

I am teaching a class on #FoundationalModels for #robotics and Scaling #DeepRL algorithms. This class expands on last year's class and my generalist robotics policies tutorial and code. I plan to share the lectures and code assignments. Starting with the first lectures below.

Sasha Rush (@srush_nlp)'s Twitter Profile Photo

Got talked into giving a DeepSeek talk this afternoon simons.berkeley.edu/workshops/llms…

Not sure I have anything new to say here! But good excuse for me to read all the blogs.
Soheil Feizi (@feizisoheil)'s Twitter Profile Photo

Re-sharing my 3.5 hr lecture on large language models as some might be interested. Will post updated lectures/materials/news in coming weeks. Stay tuned!

Link: youtu.be/2yjzZfDQxy8

Topics:
0:00 Basics of language models
2:30 Word2vec
16:27 Transfer Learning
19:23 BERT

François Fleuret (@francoisfleuret)'s Twitter Profile Photo

As expected, that was popular. Here is my attempt at consolidating all the answers into a list.

- Prenorm: normalization in the residual blocks before the attention operation and the FFN respectively
- GQA (Group Query Attention): more Q than (K, V)
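
To make those two items concrete, here is a minimal PyTorch sketch (my own illustration, not code from the thread) of a pre-norm residual block with grouped-query attention: the LayerNorms sit before the attention and before the FFN, and 8 query heads share 2 (K, V) heads. All dimensions and names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQABlock(nn.Module):
    def __init__(self, dim=512, n_q_heads=8, n_kv_heads=2, ffn_mult=4):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.head_dim = dim // n_q_heads
        # Prenorm: LayerNorm applied *before* attention and before the FFN.
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.wq = nn.Linear(dim, n_q_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_q_heads * self.head_dim, dim, bias=False)
        self.ffn = nn.Sequential(
            nn.Linear(dim, ffn_mult * dim), nn.GELU(), nn.Linear(ffn_mult * dim, dim)
        )

    def forward(self, x):                      # x: (batch, seq, dim)
        b, t, _ = x.shape
        h = self.norm1(x)                      # prenorm before attention
        q = self.wq(h).view(b, t, self.n_q, self.head_dim).transpose(1, 2)
        k = self.wk(h).view(b, t, self.n_kv, self.head_dim).transpose(1, 2)
        v = self.wv(h).view(b, t, self.n_kv, self.head_dim).transpose(1, 2)
        # GQA: each (K, V) head is shared by n_q / n_kv query heads.
        k = k.repeat_interleave(self.n_q // self.n_kv, dim=1)
        v = v.repeat_interleave(self.n_q // self.n_kv, dim=1)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.wo(attn.transpose(1, 2).reshape(b, t, -1))   # residual around attention
        x = x + self.ffn(self.norm2(x))        # prenorm before the FFN, residual around it
        return x

x = torch.randn(2, 16, 512)
print(GQABlock()(x).shape)  # torch.Size([2, 16, 512])
```
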

Rui Li (@leedaray)'s Twitter Profile Photo

🚀 Details of the #CVPR2025 award candidate papers are out. 14 of 2967 accepted papers made the list, spanning 3D vision, embodied AI, VLMs/MLLMs, learning systems, and scene understanding. 3D vision leads with the most entries. I collected the TL;DR, paper, and project links👇

Lilian Weng (@lilianweng)'s Twitter Profile Photo

Giving your models more time to think before prediction, like via smart decoding, chain-of-thought reasoning, latent thoughts, etc., turns out to be quite effective for unblocking the next level of intelligence. New post is here :) “Why we think”: lilianweng.github.io/posts/2025-05-…