Hamza Harkous (@hamzaharkous)'s Twitter Profile
Hamza Harkous

@hamzaharkous

Research Scientist at @Google, focusing on synthetic data generation. Previously at Amazon & EPFL.

ID: 306785297

Website: https://hamzaharkous.com

Joined: 28-05-2011 12:57:43

704 Tweets

602 Followers

774 Following

Hamza Harkous (@hamzaharkous)

ML-powered code completion at Google has been a significant productivity boost for me:
- Guesses (almost) all python type hints and TypeScript types.
- Autocompletes (+comments) protocol buffers.
- Reduces copy-paste in cases of similar patterns spanning multiple lines.

Adept (@adeptailabs)

1/7 We built a new model! It’s called Action Transformer (ACT-1) and we taught it to use a bunch of software tools. In this first video, the user simply types a high-level request and ACT-1 does the rest. Read on to see more examples ⬇️

Alhussein Fawzi (@alhusseinfawzi)

Excited to share our paper in nature: We revisit the 50+ year-old maths problem with AI: how efficiently can we multiply two matrices? Surprisingly, the answer is still not known - even for 3x3 matrices! With AI, we discover many new efficient and exact algorithms. 1/13

John Schulman (@johnschulman2)

Certain software skills are exceptionally useful for machine learning. In a previous era, it was GPU programming. Now in the era of pretrained models, it's front-end development -- to quickly whip up a UI to collect a fine-tuning or eval dataset.

Hamza Harkous (@hamzaharkous)

I usually refer to it as treating your data as a Tamagotchi (digital pet) that needs continuous nurturing. This "data engine" is missing from the vast majority of pipelines in the industry nowadays (minus a few big teams at a few big companies).

Jan Leike (@janleike)

Constitutional AI doesn't let you avoid labeling data by writing down some rules. You still need to figure out how good your rules are. So you need to label a validation set. Then you'll get some accuracy on the validation set. How can you increase this accuracy?

Jimmy Lin (@lintool)

GPT-4 and its ilk are awesome for rapid prototyping and one-offs, but at the end of the day, enterprises will deploy far smaller distilled models in production. Here's my contrarian take -

Tim Davidson @ICLR25 (@im_td)

synthetic data is the future. but — generating explainable and controllable synth data at Google-scale is **really** hard. stop by Peridot 202-203 at 1130-1230 today to chat w/ Hamza Harkous and me about “Orchestrating Synthetic Data w/ Reasoning”! 🦾🧠 openreview.net/pdf?id=VOoeogZ…