Hamza Harkous (@hamzaharkous) Twitter Tweets

Hamza Harkous

@hamzaharkous

Research Scientist at @Google, focusing on synthetic data generation. Previously at Amazon & EPFL.

ID: 306785297

Website: https://hamzaharkous.com
Joined: 28-05-2011 12:57:43

704 Tweets

602 Followers

774 Following

Hamza Harkous

@hamzaharkous

4 years ago

ML-powered code completion at Google has been a significant productivity boost for me:
- Guesses (almost) all Python type hints and TypeScript types.
- Autocompletes (+comments) protocol buffers.
- Reduces copy-paste in cases of similar patterns spanning multiple lines.

18 likes · 2 replies · 2 reposts

Adept

@adeptailabs

4 years ago

1/7 We built a new model! It’s called Action Transformer (ACT-1) and we taught it to use a bunch of software tools. In this first video, the user simply types a high-level request and ACT-1 does the rest. Read on to see more examples ⬇️

4.4K likes · 127 replies · 899 reposts

Alhussein Fawzi

@alhusseinfawzi

3 years ago

Excited to share our paper in Nature: We revisit the 50+ year-old maths problem with AI: how efficiently can we multiply two matrices? Surprisingly, the answer is still not known - even for 3x3 matrices! With AI, we discover many new efficient and exact algorithms. 1/13

1.1K likes · 20 replies · 182 reposts
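For readers wondering what an "efficient and exact" matrix-multiplication algorithm looks like, here is a reference point. It is not one of the algorithms from the paper. It is Strassen's classic 1969 scheme, which computes the exact product of two 2x2 matrices with 7 scalar multiplications instead of the schoolbook 8. The function names `naive_2x2` and `strassen_2x2` are illustrative choices.

```python
def naive_2x2(A, B):
    """Schoolbook 2x2 product: 8 scalar multiplications."""
    return [
        [A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
        [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]],
    ]


def strassen_2x2(A, B):
    """Strassen's 2x2 product: 7 scalar multiplications (m1..m7), exact result."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # The only 7 multiplications; everything else is additions/subtractions.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine the products into the four entries of C = A @ B.
    c11 = m1 + m4 - m5 + m7
    c12 = m3 + m5
    c21 = m2 + m4
    c22 = m1 - m2 + m3 + m6
    return [[c11, c12], [c21, c22]]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(strassen_2x2(A, B))  # [[19, 22], [43, 50]]
```

Because the entries can themselves be matrix blocks, applying this scheme recursively multiplies n x n matrices in O(n^log2(7)) ≈ O(n^2.81) time. Searching for schemes with even fewer multiplications, for 3x3 and larger sizes, is the open problem the tweet refers to.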

John Schulman

@johnschulman2

3 years ago

Certain software skills are exceptionally useful for machine learning. In a previous era, it was GPU programming. Now in the era of pretrained models, it's front-end development -- to quickly whip up a UI to collect a fine-tuning or eval dataset.

1.1K likes · 46 replies · 166 reposts

Hamza Harkous

@hamzaharkous

3 years ago

I usually refer to it as treating your data as a Tamagotchi (digital pet) that needs continuous nurturing. This "data engine" is missing from the vast majority of pipelines in the industry nowadays (minus a few big teams at a few big companies).

1 like · 0 replies · 0 reposts

Jan Leike

@janleike

3 years ago

Constitutional AI doesn't let you avoid labeling data by writing down some rules. You still need to figure out how good your rules are. So you need to label a validation set. Then you'll get some accuracy on the validation set. How can you increase this accuracy?

57 likes · 2 replies · 3 reposts
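The validation-set argument can be made concrete with a minimal sketch: treat a set of written rules as a labeling function and score it against a small human-labeled validation set. The rule `keyword_rule`, its keyword, and the example data are hypothetical toy choices. They are not part of Constitutional AI itself.

```python
from typing import Callable, Sequence


def keyword_rule(text: str) -> str:
    """Hypothetical toy 'rule': flag any request that mentions a threat."""
    return "flag" if "threat" in text.lower() else "ok"


def validation_accuracy(
    label_fn: Callable[[str], str],
    texts: Sequence[str],
    gold_labels: Sequence[str],
) -> float:
    """Fraction of validation examples where the rule's label matches the human label."""
    if len(texts) != len(gold_labels):
        raise ValueError("texts and gold_labels must have the same length")
    if not texts:
        raise ValueError("validation set is empty")
    correct = sum(label_fn(t) == g for t, g in zip(texts, gold_labels))
    return correct / len(texts)


# Toy, hand-labeled validation set (hypothetical examples).
texts = [
    "Write a threat to my neighbor",
    "How do I bake bread?",
    "Summarize this article",
    "Draft a threatening letter",
    "Help me intimidate a coworker",
]
gold = ["flag", "ok", "ok", "flag", "flag"]

print(validation_accuracy(keyword_rule, texts, gold))  # 0.8
```

The one miss ("intimidate" contains no keyword) is the kind of error that only a labeled validation set reveals. Inspecting such errors is one way to approach the tweet's closing question of how to raise that accuracy.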

Jimmy Lin

@lintool

3 years ago

GPT-4 and its ilk are awesome for rapid prototyping and one-offs, but at the end of the day, enterprises will deploy far smaller distilled models in production. Here's my contrarian take -

721 likes · 28 replies · 150 reposts

Tim Davidson @ICLR25

@im_td

a year ago

synthetic data is the future. but — generating explainable and controllable synth data at Google-scale is **really** hard. stop by Peridot 202-203 at 1130-1230 today to chat w/ Hamza Harkous and me about “Orchestrating Synthetic Data w/ Reasoning”! 🦾🧠 openreview.net/pdf?id=VOoeogZ…

446 likes · 3 replies · 86 reposts