Samip Dahal (@samipddd)'s Twitter Profile
Samip Dahal

@samipddd

working on the theory of general intelligence.

ID: 740409191291097089

Joined: 08-06-2016 05:04:42

157 Tweets

722 Followers

97 Following

Samip Dahal (@samipddd)'s Twitter Profile Photo

the interesting thing about turing machines is that despite being crazy powerful, the first thing we knew about them was what they couldn't do in theory (all the non-computable things), and not what they could do
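
the classic diagonalization argument makes the "what they couldn't do" part concrete: hand any claimed halting decider a program built to contradict it. a minimal sketch (`make_diagonal` and `always_no` are illustrative names; a real decider can't exist, so the one tested here is a deliberately wrong stand-in):

```python
def make_diagonal(halts):
    # given any claimed halting decider halts(f) -> bool,
    # construct a program the decider must get wrong
    def g():
        if halts(g):
            while True:  # decider said "halts", so loop forever
                pass
        # decider said "doesn't halt", so return immediately
    return g

# a (necessarily wrong) decider that claims nothing halts
always_no = lambda f: False
g = make_diagonal(always_no)
g()  # halts right away, contradicting always_no's claim
```

running the same construction against a decider that always answers "halts" would loop forever, so every fixed decider is refuted one way or the other.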

physics people have mostly stopped asking the most fundamental questions (what the universe actually is, how it exists, the role of minds/observers like us). the community as a whole seems to have convinced itself that those answers are out of reach.

all the physics discovered so far describes properties of the dream universe that our minds create, not of the underlying reality that allows such minds to exist in the first place.

flexing parameter count in LLMs is going to look pretty stupid in retrospect. better models have **fewer** parameters, not more. you should instead be flexing how much compute your model can spend at inference time. current models cannot spend much time on inference.

Energy-based models will replace all generative models (including diffusion):
- they learn verifiers, regardless of domain
- they're a natural way to do test-time compute
- they're insanely underresearched

Optimization people seem to think that more compute means second-order methods will win, and they are completely wrong. It actually means *zeroth-order* methods will win. That's what scales with compute. The hierarchy runs in reverse.
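
one concrete reading of "scales with compute": evolution-strategies-style estimators recover a search direction from nothing but function evaluations, and the estimate sharpens as you throw more samples at it. a minimal sketch with a made-up quadratic objective (function names are illustrative, not from any library):

```python
import random

def es_gradient(f, x, sigma=0.1, samples=32):
    # antithetic zeroth-order gradient estimate: uses only evaluations
    # of f, never its derivatives; more samples -> lower variance
    g = 0.0
    for _ in range(samples):
        eps = random.gauss(0.0, 1.0)
        g += (f(x + sigma * eps) - f(x - sigma * eps)) / (2 * sigma) * eps
    return g / samples

def minimize(f, x0, steps=200, lr=0.05):
    x = x0
    for _ in range(steps):
        x -= lr * es_gradient(f, x)
    return x
```

the `samples` parameter is the embarrassingly parallel part: every evaluation is independent, which is why this family keeps absorbing compute where second-order machinery stalls.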

evolution will completely replace gradient descent in the next 1-2 years:
- it can optimize for what we actually want (including generalization), not just differentiable proxies
- unlike backprop, search scales with compute
- early results show it's computationally feasible
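
a minimal sketch of the first bullet: a (1+λ) evolution loop optimizing a black-box fitness directly — the objective below is non-differentiable at its optimum, and the loop never asks for a gradient. the toy objective and all the numbers are made up for illustration:

```python
import random

def evolve(fitness, genome, generations=200, pop=32, sigma=0.1):
    # (1 + lambda) evolution: keep the parent (elitism), spawn pop
    # gaussian mutants, select the fittest candidate as the next parent.
    # fitness is a black box -- it does not have to be differentiable.
    for _ in range(generations):
        candidates = [genome] + [
            [g + random.gauss(0.0, sigma) for g in genome] for _ in range(pop)
        ]
        genome = max(candidates, key=fitness)
    return genome
```

because the parent always survives, fitness is monotone non-decreasing, and any metric you can evaluate (held-out accuracy, a generalization proxy, anything) can sit in the `fitness` slot.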

Ilya's a great marketer but a pretty average researcher. He's barely come to terms with the need to solve generalization, and his best idea is... value functions? Doesn't seem like he has thought much about it.