Samip Dahal (@samipddd)'s Twitter Profile
Samip Dahal

@samipddd

working on the theory of general intelligence.

ID: 740409191291097089

Joined: 08-06-2016 05:04:42

157 Tweets

722 Followers

97 Following

Samip Dahal (@samipddd)'s Twitter Profile Photo

the interesting thing about turing machines is that despite being crazy powerful, the first thing we knew about them was what they couldn't do in theory (all the non-computable things), and not what they could do
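
the classic diagonalization argument makes the "what they couldn't do" part concrete: hand any claimed halting decider a program built to contradict it. a minimal sketch (`make_diagonal` and `always_no` are illustrative names; a real decider can't exist, so the one tested here is a deliberately wrong stand-in):

```python
def make_diagonal(halts):
    # given any claimed halting decider halts(f) -> bool,
    # construct a program the decider must get wrong
    def g():
        if halts(g):
            while True:  # decider said "halts", so loop forever
                pass
        # decider said "doesn't halt", so return immediately
    return g

# a (necessarily wrong) decider that claims nothing halts
always_no = lambda f: False
g = make_diagonal(always_no)
g()  # halts right away, contradicting always_no's claim
```

running the same construction against a decider that always answers "halts" would loop forever, so every fixed decider is refuted one way or the other.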

physics people have mostly stopped asking the most fundamental questions (what the universe actually is, how it exists, the role of minds/observers like us). the community as a whole seems to have convinced itself that those answers are out of reach.

all the physics discovered so far describes properties of the dream universe that our minds create, not of the underlying reality that allows such minds to exist in the first place.

flexing parameter count in LLMs is going to look pretty stupid in retrospect. better models have **fewer** parameters, not more. you should instead be flexing how much compute your model can spend at inference time. current models cannot spend much time on inference.

Energy-based models will replace all generative models (including diffusion):
- they learn verifiers, regardless of domain
- they're a natural way to do test-time compute
- they're insanely underresearched

Optimization people seem to think that more compute means second-order methods will win, and they are completely wrong. It actually means *zeroth-order* methods will win. That's what scales with compute. The hierarchy runs in reverse.
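
one concrete reading of "scales with compute": evolution-strategies-style estimators recover a search direction from nothing but function evaluations, and the estimate sharpens as you throw more samples at it. a minimal sketch with a made-up quadratic objective (function names are illustrative, not from any library):

```python
import random

def es_gradient(f, x, sigma=0.1, samples=32):
    # antithetic zeroth-order gradient estimate: uses only evaluations
    # of f, never its derivatives; more samples -> lower variance
    g = 0.0
    for _ in range(samples):
        eps = random.gauss(0.0, 1.0)
        g += (f(x + sigma * eps) - f(x - sigma * eps)) / (2 * sigma) * eps
    return g / samples

def minimize(f, x0, steps=200, lr=0.05):
    x = x0
    for _ in range(steps):
        x -= lr * es_gradient(f, x)
    return x
```

the `samples` parameter is the embarrassingly parallel part: every evaluation is independent, which is why this family keeps absorbing compute where second-order machinery stalls.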

evolution will completely replace gradient descent in the next 1-2 years:
- it can optimize for what we actually want (including generalization), not just differentiable proxies
- unlike backprop, search scales with compute
- early results show it's computationally feasible
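
a minimal sketch of the first bullet: a (1+λ) evolution loop optimizing a black-box fitness directly — the objective below is non-differentiable at its optimum, and the loop never asks for a gradient. the toy objective and all the numbers are made up for illustration:

```python
import random

def evolve(fitness, genome, generations=200, pop=32, sigma=0.1):
    # (1 + lambda) evolution: keep the parent (elitism), spawn pop
    # gaussian mutants, select the fittest candidate as the next parent.
    # fitness is a black box -- it does not have to be differentiable.
    for _ in range(generations):
        candidates = [genome] + [
            [g + random.gauss(0.0, sigma) for g in genome] for _ in range(pop)
        ]
        genome = max(candidates, key=fitness)
    return genome
```

because the parent always survives, fitness is monotone non-decreasing, and any metric you can evaluate (held-out accuracy, a generalization proxy, anything) can sit in the `fitness` slot.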

Ilya's a great marketer but a pretty average researcher. He's barely come to terms with the need to solve generalization, and his best idea is... value functions? Doesn't seem like he has thought much about it.