Tolga Bolukbasi (@tolgab0) 's Twitter Profile
Tolga Bolukbasi

@tolgab0

AI research/Gemini pretraining @GoogleDeepmind, PhD, opinions my own.

ID: 2886508144

Website: http://www.tolgabolukbasi.com · Joined: 21-11-2014 03:37:31

92 Tweets

296 Followers

239 Following

Tim Rocktäschel (@_rockt) 's Twitter Profile Photo

I am really excited to reveal what Google DeepMind's Open Endedness Team has been up to 🚀. We introduce Genie 🧞, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts.

Kelvin Guu (@kelvin_guu) 's Twitter Profile Photo

Great new work from our team and colleagues at Google DeepMind! On the Massive Text Embedding Benchmark (MTEB), Gecko is the strongest model to fit under 768-dim. Try it on Google Cloud. Use it for RAG, retrieval, vector databases, etc.
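
Roughly what "use it for RAG / retrieval" looks like in practice: embed documents and the query with the model, then rank documents by cosine similarity. A minimal sketch; `embed` here is a hypothetical placeholder rather than the Gecko API, and only the 768-dim size comes from the tweet.

```python
# Embedding-based retrieval sketch. `embed` is a stand-in for whatever
# text-embedding endpoint you actually call (e.g., Gecko on Google Cloud);
# here it returns deterministic random unit vectors just so the example runs.
import numpy as np

DIM = 768  # Gecko-sized embeddings, per the tweet

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # placeholder, NOT a real embedding
    v = rng.standard_normal(DIM)
    return v / np.linalg.norm(v)

docs = [
    "Gecko is a compact text embedding model.",
    "MTEB benchmarks embedding models across many tasks.",
    "Vector databases store embeddings for fast similarity search.",
]
doc_vecs = np.stack([embed(d) for d in docs])

query_vec = embed("Which model should I use for a vector database?")
scores = doc_vecs @ query_vec          # cosine similarity (vectors are unit-norm)
best = docs[int(np.argmax(scores))]    # retrieved passage to feed into the RAG prompt
print(best)
```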

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Nice new read on tokenization! You've heard about the SolidGoldMagikarp token, which breaks GPT-2 because it was present in the training set of the tokenizer, but not the LLM later. This paper digs in with a lot more depth and detail, on a lot more models, discovering a less
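
For a concrete feel of the failure mode, a hedged sketch using the Hugging Face GPT-2 tokenizer (an assumption, not the paper's tooling): if a string encodes to a single token, the tokenizer learned it from its own training data, even if the LLM later saw that token rarely or never.

```python
# Inspect how strings tokenize under GPT-2's BPE vocabulary.
# " SolidGoldMagikarp" is reported to map to a single rare token.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

for text in [" SolidGoldMagikarp", " hello world"]:
    ids = tok.encode(text)
    pieces = tok.convert_ids_to_tokens(ids)
    print(f"{text!r} -> {len(ids)} token(s): {pieces}")
```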

Tolga Bolukbasi (@tolgab0) 's Twitter Profile Photo

It was great to work with Minsuk, and I'm excited to see this released. Looking at individual model outputs this way helps one see which examples/tasks are truly wins across model versions and which ones are just due to randomness of generation or raters.

Jeff Dean (@jeffdean) 's Twitter Profile Photo

We have an experimental updated version of Gemini 1.5 Pro that is #1 on the LMSYS Org Chatbot Arena. This model is a significant improvement over earlier versions of Gemini 1.5 Pro (it cracks into 1300+ Elo score territory). I'm really proud of the whole team of people that

Tolga Bolukbasi (@tolgab0) 's Twitter Profile Photo

I have been thinking about this since ChatGPT came out. Using RLHF never fully made sense to me given how restricted it is compared to regular RL. There should be a way simpler non-exploring method to distill RM knowledge into the main model.
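
One non-exploring recipe in that spirit (my illustration, not the tweet's proposal) is best-of-N distillation: sample candidates from the current model, score them with the reward model, and fine-tune only on the top-scoring completions. A sketch with hypothetical `sample`, `reward_model`, and `finetune` stand-ins:

```python
# Best-of-N (rejection sampling) distillation sketch: push reward-model
# preferences into the policy without on-policy exploration. `sample`,
# `reward_model`, and `finetune` are hypothetical stand-ins for your actual
# generation, RM scoring, and supervised fine-tuning code.
from typing import Callable, List, Tuple

def best_of_n_dataset(
    prompts: List[str],
    sample: Callable[[str, int], List[str]],    # prompt, n -> n candidate completions
    reward_model: Callable[[str, str], float],  # prompt, completion -> scalar reward
    n: int = 8,
) -> List[Tuple[str, str]]:
    data = []
    for p in prompts:
        candidates = sample(p, n)
        best = max(candidates, key=lambda c: reward_model(p, c))
        data.append((p, best))  # keep only the RM-preferred completion
    return data

# finetune(model, best_of_n_dataset(prompts, sample, reward_model))
# would then be ordinary supervised fine-tuning on the selected pairs.
```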

Jeff Dean (@jeffdean) 's Twitter Profile Photo

Welcome, AlphaChip! Today, we are sharing some exciting updates on our work published in Nature in 2021 on using reinforcement learning for ASIC chip floorplanning and layout. We’re also naming this work AlphaChip. Since we first published this work, our use of this approach

Andrew Ilyas (@andrew_ilyas) 's Twitter Profile Photo

Machine unlearning ("removing" training data from a trained ML model) is a hard, important problem. Datamodel Matching (DMM): a new unlearning paradigm with strong empirical performance! w/ Kristian Georgiev, Roy Rinberg, Sam Park, Shivam Garg, Aleksander Madry, Seth Neel (1/4)

Tolga Bolukbasi (@tolgab0) 's Twitter Profile Photo

I will be at the ATTRIB workshop tomorrow (attrib-workshop.cc). Stop by if you’d like to chat with me and connect with other great researchers in this area.

Mike Morton (@morteymike) 's Twitter Profile Photo

andi (twocents.money) I worked on the M series while at Apple. The main advantage that stuck out to me was actually that they were able to acquire dozens of top Intel engineers 5-10 years ago as Intel started struggling and making poor decisions. For example, Intel had a couple sites around the

Susan Zhang (@suchenzang) 's Twitter Profile Photo

arxiv.org/abs/2411.03923 "From Figure 3(a), it is apparent that many of the benchmarks we considered are substantially contaminated in the Llama 1 pre-training corpus as well as in the Pile. For 8 of the 13 datasets that we considered, on average more than 50% of the samples are
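
A rough idea of how such contamination is typically measured (a generic n-gram overlap sketch, not necessarily the method in arxiv.org/abs/2411.03923): a benchmark example is flagged if enough of its long n-grams also occur in the pretraining corpus.

```python
# Generic n-gram overlap contamination check (illustrative only).
def ngrams(text: str, n: int = 8):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated(example: str, corpus_ngrams: set, n: int = 8, threshold: float = 0.5) -> bool:
    ex = ngrams(example, n)
    if not ex:
        return False
    overlap = len(ex & corpus_ngrams) / len(ex)  # fraction of the example's n-grams seen in training data
    return overlap >= threshold

# corpus_ngrams would be built by streaming the pretraining corpus once, e.g.:
# corpus_ngrams = set().union(*(ngrams(doc) for doc in corpus_docs))
```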

Noam Shazeer (@noamshazeer) 's Twitter Profile Photo

This model’s “thinking” capabilities are driving major gains:
🧑‍🔬 Top performance on math and science benchmarks (AIME, GPQA)
💻 Exceptional coding performance (LiveCodeBench)
📈 Impressive performance on complex prompts (Humanity’s Last Exam)
#1 on lmarena.ai (formerly lmsys.org) leaderboard 🏆

Tyler Chang (@tylerachang) 's Twitter Profile Photo

Presenting our work on training data attribution for pretraining this morning: iclr.cc/virtual/2025/p… -- come stop by in Hall 2/3 #526 if you're here at ICLR!

Sundar Pichai (@sundarpichai) 's Twitter Profile Photo

Our latest Gemini 2.5 Pro update is now in preview. It’s better at coding, reasoning, science + math, shows improved performance across key benchmarks (AIDER Polyglot, GPQA, HLE to name a few), and leads lmarena.ai with a 24pt Elo score jump since the previous version. We also
