Erik Daxberger (@edaxberger)'s Twitter Profile
Erik Daxberger

@edaxberger

ML @ 

ID: 810166630412066817

Joined: 17-12-2016 16:55:51

827 Tweets

1.1K Followers

387 Following

HanRong YE (@leoyerrrr)'s Twitter Profile Photo

And finally, I am on the job market (graduating in Dec)! I worked on multi-modality multi-task understanding (MM-Ego, TaskPrompter, InvPT, TaskExpert, DiffusionMTL), generation (SegGen), and the alignment of understanding and generation (X-VILA). Feel free to reach out!

Erik Daxberger (@edaxberger)'s Twitter Profile Photo

Check out MM-Ego, our new work towards building a multimodal foundation model for egocentric understanding! 😎 Led by our amazing intern HanRong YE, together with a great team that I am happy to have been part of!

Austin Tripp (@austinjtripp)'s Twitter Profile Photo

I wrote a blog post with some advice for students applying for PhDs in AI for science: austintripp.ca/blog/2024-10-2… Feedback appreciated. If you agree with the advice, feel free to share. If you disagree, let me know what you think I got wrong 🧑‍🔬

Cambridge MLG (@cambridgemlg)'s Twitter Profile Photo

✨Applications are now open for PhDs at the Cambridge Machine Learning Group!✨ We're looking for outstanding candidates interested in fundamental ML research and applications to scientific domains! More info: mlg.eng.cam.ac.uk/phd_programme_… 🧵Find more about PIs & focus areas below!

Laurence Aitchison (@laurence_ai)'s Twitter Profile Photo

I wrote a short note on why Bayesian neural networks are falling out of use with modern foundation models: because training for a single epoch on large datasets implicitly optimises the same objective as Bayes: laurencea.github.io/laurencea/dont…
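For context, a minimal sketch of the prequential argument this kind of claim usually rests on (my gloss, not a quote from the note): by the chain rule, the log marginal likelihood of a dataset factorises into one-step-ahead predictive terms, and a single pass over the data sums exactly those terms.

```latex
% Chain-rule (prequential) decomposition of the model evidence:
% each term is the predictive log-probability of example x_i given
% everything seen before it -- i.e. the per-example loss incurred
% during a single-epoch training run.
\log p(\mathcal{D}) = \sum_{i=1}^{N} \log p(x_i \mid x_1, \dots, x_{i-1})
```

Under this reading, driving down single-epoch training loss is (approximately) driving up an estimate of the Bayesian model evidence, so an explicit posterior buys little on top.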

Enrico Fini (@donkeyshot21)'s Twitter Profile Photo

We release AIMv2, the second iteration of the AIM family of large autoregressive vision encoders. This time we bring multimodality into the game 🔥

Paper: arxiv.org/abs/2411.14402
Repo: github.com/apple/ml-aim
Model Gallery: huggingface.co/collections/ap…
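For readers who want to try an encoder, a minimal usage sketch via 🤗 Transformers; the checkpoint name below is my assumption of a released model ID, so check the model gallery above for the exact names.

```python
# Hypothetical quick-start for an AIMv2 checkpoint. The model ID is
# an assumption -- consult the linked model gallery for released names.
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

model_id = "apple/aimv2-large-patch14-224"  # assumed checkpoint ID
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Standard COCO demo image, commonly used in HF examples.
image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # patch-level features (assumed field)
```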
Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

What matters for runtime optimization in Vision Language Models (VLMs)? Vision encoder latency 🤔? Image resolution 🤔? Number of visual tokens 🤔? LLM size 🤔?

In this thread, we break it all down and introduce FastVLM — a family of fast and accurate VLMs.

(1/n 🧵)
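Those four knobs compose into a simple latency model for time-to-first-token. A back-of-the-envelope sketch (the cost model and every constant are illustrative assumptions, not measurements from the FastVLM work):

```python
# Illustrative time-to-first-token (TTFT) model for a VLM:
# TTFT ~= vision encoder latency + LLM prefill over the visual tokens.
# All constants below are made up for illustration only.

def ttft_ms(encoder_ms: float, num_visual_tokens: int,
            llm_params_b: float, ms_per_token_per_b: float = 0.02) -> float:
    """Crude additive model: encoder latency plus a prefill cost that
    grows with both the visual token count and the LLM size."""
    prefill_ms = num_visual_tokens * llm_params_b * ms_per_token_per_b
    return encoder_ms + prefill_ms

# Higher resolution typically means a slower encoder AND more visual
# tokens, so both terms grow together:
print(ttft_ms(encoder_ms=30, num_visual_tokens=576, llm_params_b=7))    # low res
print(ttft_ms(encoder_ms=120, num_visual_tokens=2304, llm_params_b=7))  # high res
```

The interaction is the interesting part: raising image resolution inflates both the encoder term and the token count, which is why encoders that emit fewer tokens at high resolution are an attractive design point.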
HanRong YE (@leoyerrrr)'s Twitter Profile Photo

It's official! Our foundational model for egocentric video understanding will be showcased at #ICLR2025 in Singapore! 😃 Huge thanks to my incredible mentors at Apple 💕❤️‍🔥🤗

Afshin Dehghan (@afshin_dn)'s Twitter Profile Photo

🚀 Model and data for our CubifyAnything project are now released! 🔗 github.com/apple/ml-cubif… #SpatialReasoning #3DObjectDetection #transformers #detection #ai #genai

HanRong YE (@leoyerrrr)'s Twitter Profile Photo

#ICLR2025 The philosophy behind O3 is beautifully simple and powerful! Our long-video MM-Ego VLM actually also includes an implicit VIDEO REASONING process 🤣: it first takes a quick overview of the entire long video, then ZOOMS IN on key visual details with the question in mind 🧐
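Read literally, that two-pass behaviour is a coarse-to-fine loop. A runnable toy sketch of the pattern (my illustration of the general idea, NOT MM-Ego's actual pipeline; the two "VLM" functions are stand-in stubs):

```python
# Toy coarse-to-fine long-video QA loop. This sketches the pattern in
# the tweet only; the model calls below are hypothetical stubs.
from typing import List, Tuple

def sample_frames(num_frames: int, k: int) -> List[int]:
    """Uniformly pick k frame indices from a video of num_frames."""
    step = max(num_frames // k, 1)
    return list(range(0, num_frames, step))[:k]

def locate_relevant_spans(question: str, frames: List[int]) -> List[Tuple[int, int]]:
    """Overview pass (stub): a real model would score the coarse frames
    against the question and return time spans worth revisiting."""
    mid = frames[len(frames) // 2]
    return [(mid, mid + 256)]  # dummy: one span near the middle

def answer(question: str, frames: List[int]) -> str:
    """Zoom-in pass (stub): answer from densely sampled frames."""
    return f"answer derived from {len(frames)} frames"

def answer_long_video(question: str, num_frames: int = 100_000) -> str:
    coarse = sample_frames(num_frames, k=32)                  # quick overview
    spans = locate_relevant_spans(question, coarse)           # find key moments
    fine = [f for lo, hi in spans for f in range(lo, hi, 4)]  # dense re-sample
    return answer(question, fine)

print(answer_long_video("What did the camera wearer pick up?"))
```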