Dave (@dmvaldman)'s Twitter Profile
Dave

@dmvaldman

weak supervisor

ID:49700785

Link: https://dmvaldman.github.io
Joined: 22-06-2009 17:38:36

6.3K Tweets

6.7K Followers

989 Following


Great paper, arguing that emergent abilities are a function only of pre-training loss, not of model or dataset size.

i.e., if you (inefficiently) overtrain a small model to the loss of GPT-4, you'd get all the abilities of GPT-4.

arxiv.org/abs/2403.15796


Fun thought experiment: what if the input to Sora wasn't text, but the motion sensor data of a robot?

It turns its head, and the scene rotates. It lifts its arm, and a hand comes into view, etc. Doesn't need eyes.


If the outputs are the same, but the means are different, Yann would be so much happier.

Too bad no one else would care.


I was so worried the big AI labs were no longer publishing their research and I'd be left behind.

But it turns out it's all still 'train big models on lots of data'.


An interesting AI math question: can you generate text with higher entropy than human text with an LLM? I'm looking at you 'Backdoors of Claude' people.

If so, how can a compression machine also be a decompression machine?


When I read something that changes my mind, I find it hard to believe that this was caused by a change in the strengths of my neurons. Am I wrong?


This paper is an implementation of self-awareness masquerading as 'making quadratic attention more efficient'.


300,000 years ago System 2 came out from System 1. But suddenly, a few years ago and to everyone's shock, System 1 came out of System 2! Now, there's a rush to build System 2 again. And then, sometime in the future, it will build a new System 1, on some distant planet, probably.


Viscerally feeling that making a clean dataset for training AI is itself an AI problem. So many edge cases!
