Marija Stanojevic (@mstanojevic118)'s Twitter Profile
Marija Stanojevic

@mstanojevic118

ML, NLP, ML4Health, MultiModality, and STEM geek. Travel enthusiast. [email protected]

ID: 150602726

Link: https://marija-stanojevic.github.io/ · Joined: 01-06-2010 10:18:23

305 Tweets

287 Followers

110 Following

Soumith Chintala (@soumithchintala)'s Twitter Profile Photo

Here's details on Meta's 24k H100 Cluster Pods that we use for Llama3 training.
* Network: two versions, RoCEv2 or Infiniband.
* Llama3 trains on RoCEv2
* Storage: NFS/FUSE based on Tectonic/Hammerspace
* Stock PyTorch: no real modifications that aren't upstreamed
* NCCL with
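
The "stock PyTorch + NCCL" pairing above is just the standard torch.distributed path. As a minimal illustration of what that looks like (a generic DDP sketch, not Meta's Llama3 training code; the model and sizes are placeholders):

```python
# Minimal sketch of "stock PyTorch + NCCL" distributed training.
# Illustrative only -- not Meta's Llama3 code. Launch with:
#   torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK; NCCL handles the GPU collectives.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()          # gradients are all-reduced via NCCL
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```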

EMNLP 2025 (@emnlpmeeting)'s Twitter Profile Photo

EMNLP 2024 invites the submission of long and short papers featuring substantial, original, and unpublished research on empirical methods for Natural Language Processing. More info at: 2024.emnlp.org/calls/main_con… #EMNLP2024

Jim Fan (@drjimfan)'s Twitter Profile Photo

We live in such strange times. Apple, a company famous for its secrecy, published a paper with staggering amount of details on their multimodal foundation model. Those who are supposed to be open are now wayyy less than Apple.

MM1 is a treasure trove of analysis. They discuss
Marija Stanojevic (@mstanojevic118)'s Twitter Profile Photo

Why is this done only up to age 84, when that's the average life expectancy in some countries? It would also be interesting to see how it compares with data from other countries.

Jim Fan (@drjimfan)'s Twitter Profile Photo

Today is the beginning of our moonshot to solve embodied AGI in the physical world. I’m so excited to announce Project GR00T, our new initiative to create a general-purpose foundation model for humanoid robot learning. The GR00T model will enable a robot to understand multimodal

Marija Stanojevic (@mstanojevic118)'s Twitter Profile Photo

While everyone is talking about the NVIDIA Blackwell GPU, I am equally impressed by the speed and quality of NVIDIA's AI software development. With support for all kinds of tasks and data, including preprocessing, training, finetuning, and postprocessing, they make it easy for anyone to add ML.

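As one concrete example on the preprocessing side of that stack, NVIDIA's RAPIDS cuDF mirrors the pandas API on the GPU. A minimal sketch (the file and column names below are made up for illustration):

```python
# Hypothetical example: GPU-accelerated preprocessing with NVIDIA RAPIDS cuDF.
# The CSV path and column names are placeholders.
import cudf

df = cudf.read_csv("events.csv")             # parsed on the GPU
df = df.dropna(subset=["duration_ms"])       # same API surface as pandas
df["duration_s"] = df["duration_ms"] / 1000.0
per_user = df.groupby("user_id")["duration_s"].mean()
print(per_user.head())
```
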
Sasha Rush (@srush_nlp)'s Twitter Profile Photo

If you know Torch, I think you can code for GPU now with OpenAI's Triton language. We made some puzzles to help you rewire your brain. Starts easy, but gets quickly to fun modern models like FlashAttention and GPT-Q. Good luck! github.com/srush/Triton-P…
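
For a flavor of what the puzzles build toward, here is the canonical Triton vector-add kernel (a generic sketch, not one of the repo's puzzles; sizes are arbitrary):

```python
# Canonical Triton vector-add: each program instance handles one BLOCK_SIZE tile.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```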

Chelsea Finn (@chelseabfinn)'s Twitter Profile Photo

Introducing a new, fully open robotics dataset!
- 76k episodes
- 564 unique scenes
- 100 contributors
- 13 labs/institutions
- 3 continents

droid-dataset.github.io

A short 🧵 on the backstory

Jim Fan (@drjimfan)'s Twitter Profile Photo

Foundation Agent: a roadmap to build generally capable embodied AI that acts skillfully across many worlds, virtual or real. Project GR00T, the Humanoid robot foundation model, is a cornerstone for Foundation Agent. It's the North Star, the next grand challenge in our quest for

Kyunghyun Cho (@kchonyc)'s Twitter Profile Photo

a tiny bit of a cat is out now; we train our own large (medium) sized LM on our own proprietary data from scratch ourselves at Prescient Design and Genentech. very easy in my opinion, and Keunwoo Choi hates it whenever i say this 😂
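
For a sense of what training a (medium-sized) LM from scratch involves in the open-source stack, here is a generic Hugging Face sketch. This is not the Prescient Design / Genentech pipeline; the dataset, model size, and hyperparameters are placeholders:

```python
# Generic from-scratch LM training sketch with Hugging Face Transformers.
# Dataset and sizes are placeholders, not the pipeline from the tweet.
from datasets import load_dataset
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          GPT2Config, GPT2LMHeadModel, Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token                # GPT-2 has no pad token by default

config = GPT2Config(n_layer=12, n_head=12, n_embd=768)  # ~124M parameters
model = GPT2LMHeadModel(config)              # random init: truly from scratch

ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512),
            batched=True, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lm-scratch",
                           per_device_train_batch_size=8,
                           num_train_epochs=1),
    train_dataset=ds,
    # mlm=False gives causal-LM labels (inputs shifted by one)
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```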

AI21 Labs (@ai21labs)'s Twitter Profile Photo

Introducing Jamba, our groundbreaking SSM-Transformer open model!

As the first production-grade model based on Mamba architecture, Jamba achieves an unprecedented 3X throughput and fits 140K context on a single GPU.

🥂Meet Jamba ai21.com/jamba

🔨Build on Hugging Face
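
For anyone who wants to try it, loading the checkpoint through transformers looks roughly like this (a sketch assuming the ai21labs/Jamba-v0.1 model id and a transformers release with Jamba support; device_map="auto" also needs accelerate installed):

```python
# Sketch of loading Jamba from Hugging Face; assumes the "ai21labs/Jamba-v0.1"
# checkpoint id and a transformers version that includes Jamba support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",           # shard the weights across available GPUs
)

inputs = tok("The key idea behind the Mamba architecture is",
             return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```
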
Marija Stanojevic (@mstanojevic118)'s Twitter Profile Photo

While companies are trying to find the talent they don't have, interviews mostly test how well a candidate's knowledge aligns with the talent they already have.

jack morris (@jxmnop)'s Twitter Profile Photo

New Research:

a lot of talk today about "what happens" inside a language model, since they spend the exact same amount of compute on each token, regardless of difficulty.

we touch on this question on our new theory paper, Do Language Models Plan for Future Tokens?
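
The premise is easy to make concrete: a decoder-only transformer's forward pass costs roughly 2 × (parameter count) FLOPs per token, regardless of how hard that token is to predict. A back-of-the-envelope sketch using that standard scaling-law approximation:

```python
# Back-of-the-envelope: forward-pass FLOPs per token ~= 2 * n_params,
# independent of which token it is (the standard scaling-law approximation).
def flops_per_token(n_params: float) -> float:
    return 2.0 * n_params

for name, n in [("GPT-2 small", 124e6), ("7B model", 7e9), ("70B model", 70e9)]:
    # An "easy" token ("the") and a "hard" token (a tricky reasoning step)
    # cost exactly the same forward compute.
    print(f"{name}: ~{flops_per_token(n):.2e} FLOPs per token, easy or hard")
```
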
Marija Stanojevic (@mstanojevic118)'s Twitter Profile Photo

I truly enjoyed giving this lecture on Machine Unlearning and was pleasantly surprised by the audience's interest! Hope to do it again next year!