Peter Barnett (@csgosmorf)'s Twitter Profile
Peter Barnett

@csgosmorf

ID: 1381867369916588040

Joined: 13-04-2021 07:10:39

462 Tweets

35 Followers

223 Following

Peter Barnett (@csgosmorf):

Shouldn't [x,y] = xy - yx be called the "anti-commutator", since [x,y] = -[y,x], and since it "measures anti-commutativity"? Then {x,y} = xy + yx would be called the "commutator", since {x,y} = {y,x} and since it measures commutativity. Or am I missing something?
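The two symmetry claims in the tweet are easy to verify numerically. A minimal sketch in plain Python, using an arbitrary pair of non-commuting 2×2 matrices chosen for illustration, confirms that [x,y] is antisymmetric and {x,y} is symmetric under swapping arguments:

```python
# Check the bracket symmetries with two non-commuting 2x2 matrices
# (here the Pauli matrices sigma_x and sigma_z, chosen for illustration).
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def bracket(a, b):   # [a, b] = ab - ba
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def curly(a, b):     # {a, b} = ab + ba
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(2)] for i in range(2)]

x = [[0, 1], [1, 0]]
y = [[1, 0], [0, -1]]

neg = lambda m: [[-e for e in row] for row in m]
assert bracket(x, y) == neg(bracket(y, x))   # [x,y] = -[y,x]: antisymmetric
assert curly(x, y) == curly(y, x)            # {x,y} = {y,x}: symmetric
```

Any pair of square matrices of the same size would do; non-commuting ones make the check non-trivial, since for commuting matrices both brackets collapse to 0 and 2xy.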

Peter Barnett (@csgosmorf):

Alexey Guzey **Explaining Induction** 1/? To grok it we need predicates. A predicate is a function that maps each input to a truth value. E.g., let n%2 denote the remainder after dividing n by 2 (so it is 0 for evens and 1 for odds). Let isEven(n) = (n%2 = 0) and let isOdd(n) = (n%2 = 1).
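The predicates in the thread translate directly into Python. A small sketch (the names isEven/isOdd follow the tweet) also checks the induction-flavored fact that parity holds at the base case and flips at every successor step:

```python
# Predicates from the thread, as Python functions (names follow the tweet).
def isEven(n):
    return n % 2 == 0   # maps n to True iff the remainder n%2 is 0

def isOdd(n):
    return n % 2 == 1   # maps n to True iff the remainder n%2 is 1

# Induction flavor: the base case holds, and parity flips at every
# successor step, so exactly one predicate holds for each natural n.
assert isEven(0)                         # base case
for n in range(1000):
    assert isEven(n + 1) == isOdd(n)     # step: n -> n+1 flips parity
    assert isEven(n) != isOdd(n)         # even/odd partition the naturals
```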

Peter Barnett (@csgosmorf):

If the entirety of Earth’s surface had a tiling like this and you stepped on 2 new tiles per second, 8 hours per day, every day, it would take you around 100 million years to step on every last tile. If you could observe 1M new tiles per second you still wouldn’t live to see it all.

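The tweet's figures can be sanity-checked with back-of-envelope arithmetic. The tile size is not stated (the referenced image is unavailable), so the sketch below infers the implied tile area from Earth's surface area (~510 million km²); the inferred tile (roughly half a meter on a side) and the ~67-year observation time are consistent with the tweet's claims:

```python
# Sanity check of the tweet's numbers (tile size is inferred, not given).
tiles_per_day = 2 * 8 * 3600            # 2 new tiles/second, 8 hours/day
years_walking = 100_000_000
total_tiles = tiles_per_day * 365 * years_walking   # ~2.1e15 tiles

earth_surface_m2 = 510e12               # Earth's surface ~510 million km^2
tile_area_m2 = earth_surface_m2 / total_tiles
assert 0.2 < tile_area_m2 < 0.3         # ~0.24 m^2 -> ~0.5 m square tiles

# Observing 1M new tiles per second, nonstop:
seconds_observing = total_tiles / 1_000_000
years_observing = seconds_observing / (365 * 24 * 3600)
assert 60 < years_observing < 70        # ~67 years: longer than most remaining lifespans
```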
Alex Albert (@alexalbert__):

Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.

For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of
Peter Barnett (@csgosmorf):

System 2 thinking was a step up. Maybe the next step up is introspection via meta-representations. It could not only know things, but also know what it knows

Peter Barnett (@csgosmorf):

“Error in message stream” almost every time I prompt o1, after it thinks for a couple of minutes and writes almost the entire answer. I can only imagine how much compute is being wasted by this.

Peter Barnett (@csgosmorf):

Is it just me or does o1-pro perform worse when source code files are uploaded than when they’re each copy-pasted into the chat window?

AK (@_akhaliq):

Microsoft presents rStar-Math

Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

On the MATH benchmark, it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%. On the USA Math Olympiad
Phrases (@phrases1439078):

Major update to my free app “Lexi: Vocabulary Crosswords” for expanding vocab via crosswords with spaced repetition! Now featuring:
—  fill-in-the-blank clues testing in-context application
— pronunciation
— improved definitions
50% of profits go to Qualia Research Institute. #Vocabulary #QRI