Naman Goyal (@namangoyal21) 's Twitter Profile
Naman Goyal

@namangoyal21

Research @thinkymachines, previously pretraining Llama at GenAI Meta

ID: 941156280

Joined: 11-11-2012 12:10:24

197 Tweets

1.1K Followers

591 Following

Aran Komatsuzaki (@arankomatsuzaki) 's Twitter Profile Photo

The False Promise of Imitating Proprietary LLMs

Open-sourced LLMs are adept at mimicking ChatGPT’s style but not its factuality. There exists a substantial capabilities gap, which requires better base LM.

arxiv.org/abs/2305.15717
Armen Aghajanyan (@armenagha) 's Twitter Profile Photo

I’m excited to release our most recent work setting a new SOTA FID of 4.88 on text-to-image generation we call CM3Leon (pronounced chameleon)! ai.meta.com/research/publi…

Naman Goyal (@namangoyal21) 's Twitter Profile Photo

Finished 30/30 radiation therapy sessions today. The past 3-4 months have been one of the most challenging parts of my life. Recovery from surgery and radiation therapy was quite physically and mentally challenging. With due respect, Cancer, please stay away from me from now on.

Mannat Singh (@mannat_singh) 's Twitter Profile Photo

Excited to share Emu Video, for high quality video generation! Our factorized {text}-to-image generation followed by {image, text}-to-video generation approach outperforms all prior work & commercial solutions in human evals. Demo + blog + paper: emu-video.metademolab.com #emuvideo

Mike Lewis (@ml_perception) 's Twitter Profile Photo

Excited to share a preview of Llama3, including the release of an 8B and 70B (82 MMLU, should be the best open weights model!), and preliminary results for a 405B model (still training, but already competitive with GPT4). Lots more still to come... ai.meta.com/blog/meta-llam…

Ahmad Al-Dahle (@ahmad_al_dahle) 's Twitter Profile Photo

It’s here! Meet Llama 3, our latest generation of models that is setting a new standard for state-of-the art performance and efficiency for openly available LLMs.

Key highlights

  • 8B and 70B parameter openly available pre-trained and fine-tuned models.
  • Trained on more
Naman Goyal (@namangoyal21) 's Twitter Profile Photo

Really proud of the work that went into making this possible; hope this helps the community push the field forward. Also, in case anyone missed it, there's a sneak peek of what's to come next at the end of the blog post ai.meta.com/blog/meta-llam…

Naman Goyal (@namangoyal21) 's Twitter Profile Photo

Got curious about this. Suggests an average case of reaching a 1e6 × GPT-4 (or ~3e31 FLOPs) model by 2028. At 2500 bf16 TFLOPS and 1.2 kW per B100, that would require roughly ~456 GW of power to train in 6 months. Which, afaik, is roughly the United States's entire electricity usage in 2023.
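The estimate above can be sketched directly. All inputs are the tweet's stated assumptions (~3e31 total FLOPs, 2500 bf16 TFLOPS and 1.2 kW per B100, a 6-month window, 100% utilization); note that taken at face value these inputs land near ~900 GW, so the quoted ~456 GW figure implicitly assumes roughly 2× the effective throughput or a doubled training window.

```python
# Back-of-envelope power estimate for a 1e6x-GPT-4-scale training run.
# Every constant below is an assumption taken from the tweet, not a
# measured figure.

TOTAL_FLOPS = 3e31            # ~1e6 x an assumed GPT-4-scale compute budget
TFLOPS_PER_GPU = 2500         # assumed bf16 throughput of one B100
WATTS_PER_GPU = 1200          # assumed per-B100 board power (1.2 kW)
SECONDS = 6 * 30 * 24 * 3600  # ~6-month training window

# FLOPs one GPU can deliver over the whole window, at full utilization
flops_per_gpu = TFLOPS_PER_GPU * 1e12 * SECONDS

# GPUs needed to hit the total budget, and their aggregate power draw
num_gpus = TOTAL_FLOPS / flops_per_gpu
power_gw = num_gpus * WATTS_PER_GPU / 1e9

print(f"{num_gpus:.2e} GPUs, ~{power_gw:.0f} GW sustained")
```

For reference, average US electricity consumption in 2023 was on the order of 450-470 GW, which is why the tweet compares the two.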

Naman Goyal (@namangoyal21) 's Twitter Profile Photo

This is extremely exciting; looking forward to the impact it will have on biology. The team behind EvolutionaryScale is one of the most talented and passionate groups of people I have interacted with.

Naman Goyal (@namangoyal21) 's Twitter Profile Photo

Very excited to release the technical report and the model weights for all 3 sizes of llama3 models. It has been an exciting past 12 months. Really looking forward to the incredible research this will unlock from the community. Now on to llama4 🚀

lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

Does style matter over substance in Arena? Can models "game" human preference through lengthy and well-formatted responses?

Today, we're launching style control in our regression model for Chatbot Arena — our first step in separating the impact of style from substance in
Naman Goyal (@namangoyal21) 's Twitter Profile Photo

Congrats to amazing friends and ex-colleagues on a killer release! Pushing the frontier of open-source models pushes the field collectively forward!

Vijay (@__tensorcore__) 's Twitter Profile Photo

🚨🔥 CUTLASS 4.0 is released 🔥🚨

pip install nvidia-cutlass-dsl

4.0 marks a major shift for CUTLASS: towards native GPU programming in Python

docs.nvidia.com/cutlass/media/…
Naman Goyal (@namangoyal21) 's Twitter Profile Photo

The past 4 months have been among the most rewarding of my career—filled with learning and building alongside some of the most talented ML research and infra folks I know. I truly believe magic happens when driven, talented people are aligned on a shared mission.