George Grigorev (@iamgrigorev)'s Twitter Profile
George Grigorev

@iamgrigorev

fine-tuning, evals at @togethercompute, rare specialty coffee lover

ID: 610717197

Joined: 17-06-2012 08:40:34

7.7K Tweets

1.1K Followers

854 Following

George Grigorev (@iamgrigorev)

it's interesting that 80% of people are kind of locked into 5 apps on their phone: maps, X, youtube, messenger, mail. I wonder if it would be possible to optimize for this form factor, the way we see the transformer architecture embedded onto a chip (Groq, SambaNova, Etched)

Andre Saraiva (@andresnds)

1/N Yesterday in Tokyo we @OpenAI ran a 10‑hour live Humans vs AI exhibition at the AtCoder World Tour Finals Heuristic. We pointed an OpenAI reasoning model at the same brutal problem the finalists tackled—no human help, same rules, same clock. Buckle up. 👇

George Grigorev (@iamgrigorev)

I'm sure there has been some significant progress in humanoid robots already (in China and in the US), but there's no leader in the field, and there is probably a lot of secrecy right now. When one such successful product appears on the market (aka a chatgpt moment), even if fully

Tilde (@tilderesearch)

Mixture‑of‑Experts (MoE) powers many frontier models like R1, K2, & Qwen3

⚡️ To make frontier-scale MoE models accessible to train, we open-source MoMoE, a hyper-performant MoE implementation built for training and inference, outpacing the fastest existing ones by up to:

- 70%
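
For readers new to MoE, a minimal top-k routed MoE layer looks roughly like the sketch below. This is a generic PyTorch illustration of the idea only, not MoMoE's actual kernels; all names and sizes are made up for the example.

import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy top-k routed Mixture-of-Experts layer (illustration, not MoMoE)."""

    def __init__(self, d_model: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # token -> expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, d_model]
        # Pick the top_k experts per token and renormalize their weights.
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        # Each token is processed by only its top_k experts (sparse compute).
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

moe = TinyMoE(d_model=64)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
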
George Grigorev (@iamgrigorev)

Some interesting insights about BPE tokenization during inference – especially if you're trying to reuse training-time logic.
1. We have a pre-defined set of merges and we just want to apply them, in order, to a set of pre-tokens.
2. The sequence of pre-tokens is no longer represented
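
A minimal sketch of what inference-time BPE with a fixed merge table can look like. This is my own illustration of the general idea, not the specific implementation the thread discusses; the merge table is a toy example.

from typing import Dict, List, Tuple

# Hypothetical merge table: pair -> rank (lower rank = learned earlier = applied first).
MERGE_RANKS: Dict[Tuple[str, str], int] = {
    ("l", "o"): 0,
    ("lo", "w"): 1,
    ("e", "r"): 2,
}

def bpe_encode(pre_token: str, merge_ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Apply the pre-defined merges to one pre-token, best-ranked pair first."""
    symbols = list(pre_token)  # start from individual characters
    while len(symbols) > 1:
        # Find the adjacent pair with the lowest (highest-priority) rank.
        pairs = [(merge_ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        best_rank, best_i = min(pairs)
        if best_rank == float("inf"):  # no mergeable pair left
            break
        symbols[best_i:best_i + 2] = [symbols[best_i] + symbols[best_i + 1]]
    return symbols

print(bpe_encode("lower", MERGE_RANKS))  # ['low', 'er']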

Jiayi Weng (@trinkle23897)

Harmony format is finally open-sourced. I still remember 3 years ago (before the ChatGPT release) Shengjia Zhao, Daniel, and I were brainstorming about the right abstraction for RL training, and that was the starting point of the entire harmony library. github.com/openai/harmony
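
For context, a rough, hand-written illustration of what a harmony-formatted conversation looks like. This is reconstructed from memory of the public gpt-oss docs, not produced by the openai/harmony renderer itself; the exact special tokens and channel names may not match the library's output precisely.

# Hand-built example string (assumed token names; not the official renderer's output).
conversation = (
    "<|start|>system<|message|>You are a helpful assistant.<|end|>"
    "<|start|>user<|message|>What is 2 + 2?<|end|>"
    "<|start|>assistant<|channel|>analysis<|message|>Simple arithmetic.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>4<|return|>"
)
print(conversation)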

George Grigorev (@iamgrigorev)

interesting way to make an open-source model safe - nerf the pre-training data, then check during RL training whether it can be pushed to answer unsafe prompts with unsafe data. If quality is still lower than the already-released o3 - you're good to go.

It's also a cool marketing trick -- they say that even after
George Grigorev (@iamgrigorev)

OpenAI doesn't disclose the amount of data used or the number of GPUs, but we can estimate! We know that they used 2.1M H100 hours.
Considering that sama said they re-trained gpt-oss at least once since it didn't meet their needs, I would expect the training run took 15-30
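
A quick back-of-the-envelope check on that estimate, assuming the truncated "15-30" refers to days of wall-clock time. The cluster sizes below are my own guesses; only the 2.1M H100-hours figure comes from the tweet.

# Back-of-the-envelope: wall-clock days for 2.1M H100-hours at assumed cluster sizes.
TOTAL_H100_HOURS = 2.1e6

for num_gpus in (3_000, 4_000, 6_000):  # hypothetical cluster sizes
    wall_clock_days = TOTAL_H100_HOURS / num_gpus / 24
    print(f"{num_gpus:>5} H100s -> ~{wall_clock_days:.0f} days of wall-clock training")

Under those assumptions, a 15-30 day run corresponds to a cluster of very roughly 3,000-6,000 H100s.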