Marija Stanojevic (@mstanojevic118)'s Twitter Profile
Marija Stanojevic

@mstanojevic118

ML, NLP, ML4Health, MultiModality, and STEM geek. Travel enthusiast. [email protected]

ID: 150602726

Link: https://marija-stanojevic.github.io/ · Joined: 01-06-2010 10:18:23

305 Tweets

287 Followers

110 Following

Soumith Chintala (@soumithchintala)'s Twitter Profile Photo

Here's details on Meta's 24k H100 Cluster Pods that we use for Llama3 training.
* Network: two versions, RoCEv2 or Infiniband.
* Llama3 trains on RoCEv2
* Storage: NFS/FUSE based on Tectonic/Hammerspace
* Stock PyTorch: no real modifications that aren't upstreamed
* NCCL with
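
The "stock PyTorch + NCCL" pairing above is just the standard torch.distributed path. As a minimal illustration of what that looks like (a generic DDP sketch, not Meta's Llama3 training code; the model and sizes are placeholders):

```python
# Minimal sketch of "stock PyTorch + NCCL" distributed training.
# Illustrative only -- not Meta's Llama3 code. Launch with:
#   torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK; NCCL handles the GPU collectives.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()          # gradients are all-reduced via NCCL
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```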

EMNLP 2025 (@emnlpmeeting)'s Twitter Profile Photo

EMNLP 2024 invites the submission of long and short papers featuring substantial, original, and unpublished research on empirical methods for Natural Language Processing. More info at: 2024.emnlp.org/calls/main_con… #EMNLP2024

Jim Fan (@drjimfan)'s Twitter Profile Photo

We live in such strange times. Apple, a company famous for its secrecy, published a paper with staggering amount of details on their multimodal foundation model. Those who are supposed to be open are now wayyy less than Apple.

MM1 is a treasure trove of analysis. They discuss
Marija Stanojevic (@mstanojevic118)'s Twitter Profile Photo

Why is this done only up to age 84, when that's the average life expectancy in some countries? It would also be interesting to see how it compares with data from other countries.

Jim Fan (@drjimfan)'s Twitter Profile Photo

Today is the beginning of our moonshot to solve embodied AGI in the physical world. I’m so excited to announce Project GR00T, our new initiative to create a general-purpose foundation model for humanoid robot learning. The GR00T model will enable a robot to understand multimodal

Marija Stanojevic (@mstanojevic118)'s Twitter Profile Photo

While everyone is talking about the NVIDIA Blackwell GPU, I am equally impressed by the speed and quality of NVIDIA's AI software development. With support for all kinds of tasks and data, including preprocessing, training, finetuning, and postprocessing, they make it easy for anyone to add ML.

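As one concrete example on the preprocessing side of that stack, NVIDIA's RAPIDS cuDF mirrors the pandas API on the GPU. A minimal sketch (the file and column names below are made up for illustration):

```python
# Hypothetical example: GPU-accelerated preprocessing with NVIDIA RAPIDS cuDF.
# The CSV path and column names are placeholders.
import cudf

df = cudf.read_csv("events.csv")             # parsed on the GPU
df = df.dropna(subset=["duration_ms"])       # same API surface as pandas
df["duration_s"] = df["duration_ms"] / 1000.0
per_user = df.groupby("user_id")["duration_s"].mean()
print(per_user.head())
```
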
Sasha Rush (@srush_nlp)'s Twitter Profile Photo

If you know Torch, I think you can code for GPU now with OpenAI's Triton language. We made some puzzles to help you rewire your brain. Starts easy, but gets quickly to fun modern models like FlashAttention and GPT-Q. Good luck! github.com/srush/Triton-P…
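
For a flavor of what the puzzles build toward, here is the canonical Triton vector-add kernel (a generic sketch, not one of the repo's puzzles; sizes are arbitrary):

```python
# Canonical Triton vector-add: each program instance handles one BLOCK_SIZE tile.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```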

Chelsea Finn (@chelseabfinn)'s Twitter Profile Photo

Introducing a new, fully open robotics dataset!
- 76k episodes
- 564 unique scenes
- 100 contributors
- 13 labs/institutions
- 3 continents

droid-dataset.github.io

A short 🧵 on the backstory

Jim Fan (@drjimfan)'s Twitter Profile Photo

Foundation Agent: a roadmap to build generally capable embodied AI that acts skillfully across many worlds, virtual or real. Project GR00T, the Humanoid robot foundation model, is a cornerstone for Foundation Agent. It's the North Star, the next grand challenge in our quest for

Kyunghyun Cho (@kchonyc)'s Twitter Profile Photo

a tiny bit of a cat is out now; we train our own large (medium) sized LM on our own proprietary data from scratch ourselves at Prescient Design and Genentech. very easy in my opinion, and Keunwoo Choi hates it whenever i say this 😂
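
For a sense of what training a (medium-sized) LM from scratch involves in the open-source stack, here is a generic Hugging Face sketch. This is not the Prescient Design / Genentech pipeline; the dataset, model size, and hyperparameters are placeholders:

```python
# Generic from-scratch LM training sketch with Hugging Face Transformers.
# Dataset and sizes are placeholders, not the pipeline from the tweet.
from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          GPT2Config, GPT2LMHeadModel, Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token                # GPT-2 has no pad token by default

config = GPT2Config(n_layer=12, n_head=12, n_embd=768)  # ~124M parameters
model = GPT2LMHeadModel(config)              # random init: truly from scratch

ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512),
            batched=True, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lm-scratch",
                           per_device_train_batch_size=8,
                           num_train_epochs=1),
    train_dataset=ds,
    # mlm=False gives causal-LM labels (inputs shifted by one)
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```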

AI21 Labs (@ai21labs)'s Twitter Profile Photo

Introducing Jamba, our groundbreaking SSM-Transformer open model!

As the first production-grade model based on Mamba architecture, Jamba achieves an unprecedented 3X throughput and fits 140K context on a single GPU.

🥂Meet Jamba ai21.com/jamba

🔨Build on Hugging Face
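
For anyone who wants to try it, loading the checkpoint through transformers looks roughly like this (a sketch assuming the ai21labs/Jamba-v0.1 model id and a transformers release with Jamba support; device_map="auto" also needs accelerate installed):

```python
# Sketch of loading Jamba from Hugging Face; assumes the "ai21labs/Jamba-v0.1"
# checkpoint id and a transformers version that includes Jamba support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",           # shard the weights across available GPUs
)

inputs = tok("The key idea behind the Mamba architecture is",
             return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```
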
Marija Stanojevic (@mstanojevic118)'s Twitter Profile Photo

While companies are trying to find the talent they don't have, interviews mostly test how well a candidate's knowledge aligns with the talent they already have.

jack morris (@jxmnop)'s Twitter Profile Photo

New Research:

a lot of talk today about "what happens" inside a language model, since they spend the exact same amount of compute on each token, regardless of difficulty.

we touch on this question on our new theory paper, Do Language Models Plan for Future Tokens?
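
The premise is easy to make concrete: a decoder-only transformer's forward pass costs roughly 2 × (parameter count) FLOPs per token, regardless of how hard that token is to predict. A back-of-the-envelope sketch using that standard scaling-law approximation:

```python
# Back-of-the-envelope: forward-pass FLOPs per token ~= 2 * n_params,
# independent of which token it is (the standard scaling-law approximation).
def flops_per_token(n_params: float) -> float:
    return 2.0 * n_params

for name, n in [("GPT-2 small", 124e6), ("7B model", 7e9), ("70B model", 70e9)]:
    # An "easy" token ("the") and a "hard" token (a tricky reasoning step)
    # cost exactly the same forward compute.
    print(f"{name}: ~{flops_per_token(n):.2e} FLOPs per token, easy or hard")
```
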
Marija Stanojevic (@mstanojevic118)'s Twitter Profile Photo

I truly enjoyed giving this lecture on Machine Unlearning and was pleasantly surprised by the audience's interest! Hope to do it again next year!