Erik Daxberger (@edaxberger)'s Twitter Profile
Erik Daxberger

@edaxberger

ML @ 

ID: 810166630412066817

Joined: 17-12-2016 16:55:51

827 Tweets

1.1K Followers

387 Following

HanRong YE (@leoyerrrr)'s Twitter Profile Photo

And finally, I am on the job market (graduating in Dec)! I worked on multi-modality multi-task understanding (MM-Ego, TaskPrompter, InvPT, TaskExpert, DiffusionMTL), generation (SegGen), and the alignment of understanding and generation (X-VILA). Feel free to reach out!

Erik Daxberger (@edaxberger)'s Twitter Profile Photo

Check out MM-Ego, our new work towards building a multimodal foundation model for egocentric understanding! 😎 Led by our amazing intern HanRong YE, together with a great team that I am happy to have been part of!

Austin Tripp (@austinjtripp)'s Twitter Profile Photo

I wrote a blog post with some advice for students applying for PhDs in AI for science: austintripp.ca/blog/2024-10-2… Feedback appreciated. If you agree with the advice, feel free to share. If you disagree, let me know what you think I got wrong 🧑‍🔬

Cambridge MLG (@cambridgemlg)'s Twitter Profile Photo

✨Applications are now open for PhDs at the Cambridge Machine Learning Group!✨ We're looking for outstanding candidates interested in fundamental ML research and applications to scientific domains! More info: mlg.eng.cam.ac.uk/phd_programme_… 🧵Find more about PIs & focus areas below!

Laurence Aitchison (@laurence_ai)'s Twitter Profile Photo

I wrote a short note on why Bayesian neural networks are falling out of use with modern foundation models: because training for a single epoch on large datasets implicitly optimises the same objective as Bayes: laurencea.github.io/laurencea/dont…
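For context, a minimal sketch of the prequential argument this kind of claim usually rests on (my gloss, not a quote from the note): by the chain rule, the log marginal likelihood of a dataset factorises into one-step-ahead predictive terms, and a single pass over the data sums exactly those terms.

```latex
% Chain-rule (prequential) decomposition of the model evidence:
% each term is the predictive log-probability of example x_i given
% everything seen before it -- i.e. the per-example loss incurred
% during a single-epoch training run.
\log p(\mathcal{D}) = \sum_{i=1}^{N} \log p(x_i \mid x_1, \dots, x_{i-1})
```

Under this reading, driving down single-epoch training loss is (approximately) driving up an estimate of the Bayesian model evidence, so an explicit posterior buys little on top.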

Enrico Fini (@donkeyshot21)'s Twitter Profile Photo

We release AIMv2, the second iteration of the AIM family of large autoregressive vision encoders. This time we bring multimodality into the game 🔥

Paper: arxiv.org/abs/2411.14402
Repo: github.com/apple/ml-aim
Model Gallery: huggingface.co/collections/ap…
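For readers who want to try an encoder, a minimal usage sketch via 🤗 Transformers; the checkpoint name below is my assumption of a released model ID, so check the model gallery above for the exact names.

```python
# Hypothetical quick-start for an AIMv2 checkpoint. The model ID is
# an assumption -- consult the linked model gallery for released names.
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

model_id = "apple/aimv2-large-patch14-224"  # assumed checkpoint ID
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Standard COCO demo image, commonly used in HF examples.
image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # patch-level features (assumed field)
```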
Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

What matters for runtime optimization in Vision Language Models (VLMs)? Vision encoder latency 🤔? Image resolution 🤔? Number of visual tokens 🤔? LLM size 🤔?

In this thread, we break it all down and introduce FastVLM — a family of fast and accurate VLMs.

(1/n 🧵)
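Those four knobs compose into a simple latency model for time-to-first-token. A back-of-the-envelope sketch (the cost model and every constant are illustrative assumptions, not measurements from the FastVLM work):

```python
# Illustrative time-to-first-token (TTFT) model for a VLM:
# TTFT ~= vision encoder latency + LLM prefill over the visual tokens.
# All constants below are made up for illustration only.

def ttft_ms(encoder_ms: float, num_visual_tokens: int,
            llm_params_b: float, ms_per_token_per_b: float = 0.02) -> float:
    """Crude additive model: encoder latency plus a prefill cost that
    grows with both the visual token count and the LLM size."""
    prefill_ms = num_visual_tokens * llm_params_b * ms_per_token_per_b
    return encoder_ms + prefill_ms

# Higher resolution typically means a slower encoder AND more visual
# tokens, so both terms grow together:
print(ttft_ms(encoder_ms=30, num_visual_tokens=576, llm_params_b=7))    # low res
print(ttft_ms(encoder_ms=120, num_visual_tokens=2304, llm_params_b=7))  # high res
```

The interaction is the interesting part: raising image resolution inflates both the encoder term and the token count, which is why encoders that emit fewer tokens at high resolution are an attractive design point.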
HanRong YE (@leoyerrrr)'s Twitter Profile Photo

It's official! Our foundational model for egocentric video understanding will be showcased at #ICLR2025 in Singapore! 😃 Huge thanks to my incredible mentors at Apple 💕❤️‍🔥🤗

Afshin Dehghan (@afshin_dn)'s Twitter Profile Photo

🚀 Model and data for our CubifyAnything project are now released! 🔗 github.com/apple/ml-cubif… #SpatialReasoning #3DObjectDetection #transformers #detection #ai #genai

HanRong YE (@leoyerrrr)'s Twitter Profile Photo

#ICLR2025 The philosophy behind O3 is beautifully simple and powerful! Our long-video MM-Ego VLM actually also includes an implicit VIDEO REASONING process 🤣: it first takes a quick overview of the entire long video, then ZOOMS IN on key visual details with the question in mind 🧐
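Read literally, that two-pass behaviour is a coarse-to-fine loop. A runnable toy sketch of the pattern (my illustration of the general idea, NOT MM-Ego's actual pipeline; the two "VLM" functions are stand-in stubs):

```python
# Toy coarse-to-fine long-video QA loop. This sketches the pattern in
# the tweet only; the model calls below are hypothetical stubs.
from typing import List, Tuple

def sample_frames(num_frames: int, k: int) -> List[int]:
    """Uniformly pick k frame indices from a video of num_frames."""
    step = max(num_frames // k, 1)
    return list(range(0, num_frames, step))[:k]

def locate_relevant_spans(question: str, frames: List[int]) -> List[Tuple[int, int]]:
    """Overview pass (stub): a real model would score the coarse frames
    against the question and return time spans worth revisiting."""
    mid = frames[len(frames) // 2]
    return [(mid, mid + 256)]  # dummy: one span near the middle

def answer(question: str, frames: List[int]) -> str:
    """Zoom-in pass (stub): answer from densely sampled frames."""
    return f"answer derived from {len(frames)} frames"

def answer_long_video(question: str, num_frames: int = 100_000) -> str:
    coarse = sample_frames(num_frames, k=32)                  # quick overview
    spans = locate_relevant_spans(question, coarse)           # find key moments
    fine = [f for lo, hi in spans for f in range(lo, hi, 4)]  # dense re-sample
    return answer(question, fine)

print(answer_long_video("What did the camera wearer pick up?"))
```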