Simon Storf (@syghmon)'s Twitter Profile
Simon Storf

@syghmon

AI/ML; building things

ID: 1632460463891001345

Joined: 05-03-2023 19:18:13

56 Tweets

116 Followers

153 Following

Simon Storf (@syghmon)'s Twitter Profile Photo

Building a platform to manage my personal research paper library: paper-bank.com (best viewed on desktop). It features a simple workflow to add and manage papers from arXiv. Feedback appreciated! cc: buildspace nights & weekends

François Chollet (@fchollet)'s Twitter Profile Photo

The question of whether LLMs can reason is, in many ways, the wrong question. The more interesting question is whether they are limited to memorization / interpolative retrieval, or whether they can adapt to novelty beyond what they know. (They can't, at least until you start

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

# RLHF is just barely RL

Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely

Simon Storf (@syghmon)'s Twitter Profile Photo

Life is just a long meditation session, and whenever we're not meditating, we're merely distracted -- until we return to the practice once more.

Simon Storf (@syghmon)'s Twitter Profile Photo

But even if intelligence has a ceiling, that limit could be so far beyond us it's practically irrelevant. An intelligent system could surpass us so greatly that any theoretical bound becomes insignificant, with the bottleneck appearing far too late to matter. Am I missing something?

Simon Storf (@syghmon)'s Twitter Profile Photo

Language is a way to communicate reasoning. Communicating reasoning is not the same as reasoning. There is lots of evidence that current LLMs' "reasoning" is constrained to their dataset. Many such cases, cope and seethe.

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

🧵What's next in DSPy 2.5? And DSPy 3.0?

I'm excited to share an early sketch of the DSPy Roadmap, a document we'll expand and maintain as more DSPy releases ramp up.

The goal is to communicate our objectives, milestones, & efforts and to solicit input—and help!—from everyone.

Simon Storf (@syghmon)'s Twitter Profile Photo

I understand MPC offers efficiency, but how do we reach superhuman performance in, for example, a very complex environment without letting NNs learn through trial and error? The way I see it, MPC is only good for well-understood, simple environments. What am I missing here?

lmsys.org (@lmsysorg)'s Twitter Profile Photo

Does style matter over substance in Arena? Can models "game" human preference through lengthy and well-formatted responses?

Today, we're launching style control in our regression model for Chatbot Arena — our first step in separating the impact of style from substance in