_Gooofy_ (@_gooofy_) 's Twitter Profile
_Gooofy_

@_gooofy_

Free Software, Linux, Amiga, artificial intelligence, hardware, embedded systems

ID: 2188905007

Link: https://github.com/gooofy · Joined: 11-11-2013 19:33:33

4.4K Tweets

244 Followers

421 Following

MatthewBerman (@matthewberman) 's Twitter Profile Photo

.Anthropic just published a WILD new AI jailbreaking technique

Not only does it crack EVERY frontier model, but it's also super easy to do.

ThIS iZ aLL iT TakE$ 🔥

Here's everything you need to know: 🧵
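For context, a minimal sketch of the kind of random prompt augmentation the "ThIS iZ aLL iT TakE$" screenshot alludes to, assuming the thread refers to Anthropic's best-of-N approach of resampling randomly perturbed prompts until one slips through. The model call and success check are caller-supplied placeholders; none of this is Anthropic's actual code.

```python
import random
from typing import Callable, Optional

# Leetspeak-style look-alike substitutions, as hinted at by "ThIS iZ aLL iT TakE$".
LEET = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}

def augment(prompt: str, swap_prob: float = 0.1) -> str:
    """Randomly flip letter case and swap a few characters for look-alikes."""
    out = []
    for ch in prompt:
        if ch.isalpha() and random.random() < 0.5:
            ch = ch.swapcase()
        if ch.lower() in LEET and random.random() < swap_prob:
            ch = LEET[ch.lower()]
        out.append(ch)
    return "".join(out)

def best_of_n(prompt: str,
              query_model: Callable[[str], str],    # placeholder: call the target LLM
              is_success: Callable[[str], bool],    # placeholder: compliance classifier
              n: int = 100) -> Optional[str]:
    """Resample augmented prompts until one elicits a compliant reply (or give up)."""
    for _ in range(n):
        candidate = augment(prompt)
        if is_success(query_model(candidate)):
            return candidate
    return None

print(augment("This is all it takes"))
```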
Nat McAleese (@__nmca__) 's Twitter Profile Photo

o1 was the first large reasoning model — as we outlined in the original “Learning to Reason” blog, it’s “just” an LLM trained with RL. o3 is powered by further scaling up RL beyond o1, and the resulting model is very, very impressive. (2/n)
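To make the "just an LLM trained with RL" framing concrete, here is a toy outcome-reward REINFORCE loop with a tiny stand-in model and a single arithmetic prompt; it is only an illustration of the idea, not OpenAI's recipe or scale.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sshleifer/tiny-gpt2"   # tiny stand-in model; an assumption for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.Adam(model.parameters(), lr=1e-5)

prompt, reference = "Q: 2 + 3 = ?\nA:", "5"

for step in range(10):
    inputs = tok(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # Sample a continuation ("reasoning" + answer) from the current policy.
    out = model.generate(**inputs, do_sample=True, max_new_tokens=8,
                         pad_token_id=tok.eos_token_id)
    completion_ids = out[0, prompt_len:]
    completion = tok.decode(completion_ids, skip_special_tokens=True)

    # Outcome-only reward: 1 if the final answer is correct, else 0.
    reward = 1.0 if reference in completion else 0.0

    # REINFORCE: scale the log-prob of the sampled completion tokens by the reward.
    logits = model(out).logits[:, :-1, :]
    logprobs = torch.log_softmax(logits, dim=-1)
    token_lp = logprobs[0, prompt_len - 1:, :].gather(
        -1, completion_ids.unsqueeze(-1)).squeeze(-1)
    loss = -(reward * token_lp.sum())

    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: reward={reward} completion={completion!r}")
```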

Wes Roth (@wesrothmoney) 's Twitter Profile Photo

AGI ACHIEVED

OpenAI just announced the o3 model that broke the ARC AGI benchmark 🔥

this is UNPRECEDENTED....

here's what you need to know 🧵:
Denny Zhou (@denny_zhou) 's Twitter Profile Photo

When the name Q* was leaked, many thought Q* would be a combination of Q-learning and A* search, or some other more advanced RL-powered search over the generation space. I commented that that was a dead end. Now my comment should be clear to them.

Mike Knoop (@mikeknoop) 's Twitter Profile Photo

Raising visibility on this note we added to address ARC "tuned" confusion:

> OpenAI shared they trained the o3 we tested on 75% of the Public Training set. This is the explicit purpose of the training set. It is designed to expose a system to the core knowledge priors needed to

Alvaro Cintas (@dr_cintas) 's Twitter Profile Photo

Stanford has launched an incredible research AI tool. It’s called STORM, and basically you enter a topic and it will search hundreds of websites to write an article about its major findings. Available to everyone for free!

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Microsoft’s new rStar-Math technique upgrades small models to outperform OpenAI’s o1-preview at math problems 🤯

rStar-Math technique enhances small language models (SLMs) using Monte Carlo Tree Search (MCTS) and self-evolution strategies.

Applied to models like Qwen-7B and
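For readers unfamiliar with the search component, here is a generic MCTS-over-reasoning-steps sketch in the spirit of what the tweet describes; the step proposer and solution scorer are caller-supplied placeholders (e.g. an SLM proposing next steps and a reward model scoring solutions), and this is not the rStar-Math implementation.

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                         # partial solution text so far
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

def uct(node: Node, c: float = 1.4) -> float:
    """Upper-confidence bound used to pick which child to explore next."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(question, propose_steps, score_solution, iters=50, max_depth=6):
    root = Node(state=question)
    for _ in range(iters):
        # 1. Selection: descend by UCT until a leaf node.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: ask the proposer (e.g. an SLM) for candidate next steps.
        if node.visits > 0 and len(node.state.split("\n")) < max_depth:
            for step in propose_steps(node.state):
                node.children.append(Node(state=node.state + "\n" + step, parent=node))
            if node.children:
                node = random.choice(node.children)
        # 3. Evaluation: score the (partial) solution, e.g. with a reward model.
        reward = score_solution(node.state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited continuation of the question.
    return max(root.children, key=lambda n: n.visits).state if root.children else question
```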
Daniel Han (@danielhanchen) 's Twitter Profile Photo

Phi-4 bug fixes:
1. EOS should be <|im_end|>, not <|endoftext|>
2. Pad token should be <|dummy_87|>, not the EOS token
3. Chat template shouldn't add "assistant" by default

Also Llama-fied Phi-4, split QKV to increase accuracy for fine-tuning, and made dynamic 4-bit quants!

Details:
1. The EOS should
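For anyone applying fixes 1-3 by hand, a rough sketch of what they might look like with a Hugging Face tokenizer; the checkpoint name is assumed, and this is not Unsloth's actual patch.

```python
from transformers import AutoTokenizer

# "microsoft/phi-4" is the assumed checkpoint name for this sketch.
tok = AutoTokenizer.from_pretrained("microsoft/phi-4")

# 1. Use <|im_end|> (the chat turn terminator) as EOS instead of <|endoftext|>.
tok.eos_token = "<|im_end|>"

# 2. Use a dedicated dummy token for padding so padding never collides with EOS.
tok.pad_token = "<|dummy_87|>"

# 3. Only append the assistant header when generation is actually requested,
#    rather than having the template add it unconditionally.
messages = [{"role": "user", "content": "Hello!"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```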
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

MiniMax-Text-01 maintains higher accuracy even with a 1M-token context window, outperforming others like Claude-3.5 and Gemini models, which show steep declines beyond 256K tokens.
MatthewBerman (@matthewberman) 's Twitter Profile Photo

1/ Google Research unveils new paper: "Titans: Learning to Memorize at Test Time"

It introduces human-like memory structures to overcome the limits of Transformers, with one "SURPRISING" feature.

Here's why this is huge for AI. 🧵👇
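As a rough illustration of the test-time memory idea the thread points at (a small memory module updated during inference, with a surprise-like signal driving the update), here is a hedged PyTorch sketch; the dimensions, gating, and decay are illustrative placeholders, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class NeuralMemory(nn.Module):
    """Tiny memory MLP whose weights are updated at inference time."""

    def __init__(self, dim: int = 64, lr: float = 0.1, decay: float = 0.01):
        super().__init__()
        self.memory = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.lr, self.decay = lr, decay

    @torch.enable_grad()
    def write(self, key: torch.Tensor, value: torch.Tensor) -> float:
        """One test-time update: the reconstruction error acts as a 'surprise' signal."""
        pred = self.memory(key)
        loss = (pred - value).pow(2).mean()            # surprising pairs -> large loss
        grads = torch.autograd.grad(loss, self.memory.parameters())
        with torch.no_grad():
            for p, g in zip(self.memory.parameters(), grads):
                p.mul_(1.0 - self.decay)               # gradual forgetting
                p.add_(-self.lr * g)                   # memorize the surprising pair
        return loss.item()

    def read(self, query: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            return self.memory(query)

# Usage: stream (key, value) pairs during inference, then query the memory.
mem = NeuralMemory()
k, v = torch.randn(1, 64), torch.randn(1, 64)
print("surprise:", mem.write(k, v))
print("recalled shape:", mem.read(k).shape)
```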
MatthewBerman (@matthewberman) 's Twitter Profile Photo

This blew my mind. PhD student Jiayi Pan reproduced the emergent “thinking” behavior in a 1.5b model using the DeepSeek R1 technique for just $30. This means we can give “thinking” to pretty much any model!! I broke down the findings in my YouTube video below 👇

MatthewBerman (@matthewberman) 's Twitter Profile Photo

New research paper shows how LLMs can "think" internally before outputting a single token!

Unlike Chain of Thought, this "latent reasoning" happens in the model's hidden space.

TONS of benefits from this approach.

Let me break down this fascinating paper...
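A toy sketch of the latent-reasoning idea as described above: a recurrent block is applied to the hidden state several times before any token is decoded, so extra "thinking" costs compute but emits no tokens. The module sizes and wiring are illustrative, not the paper's model.

```python
import torch
import torch.nn as nn

class LatentReasoner(nn.Module):
    def __init__(self, dim: int = 128, vocab: int = 1000):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)                    # prelude: tokens -> latent
        self.recurrent = nn.TransformerEncoderLayer(             # block iterated in latent space
            d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)                        # coda: latent -> next-token logits

    def forward(self, tokens: torch.Tensor, thinking_steps: int = 8) -> torch.Tensor:
        h = self.embed(tokens)
        for _ in range(thinking_steps):      # "think" in hidden space, no tokens emitted
            h = self.recurrent(h)
        return self.head(h[:, -1])           # only now produce logits for the next token

# More thinking_steps means more test-time compute with the same visible output length.
model = LatentReasoner()
logits = model(torch.randint(0, 1000, (1, 16)), thinking_steps=32)
print(logits.shape)   # torch.Size([1, 1000])
```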
Marcel Pociot 🧪 (@marcelpociot) 's Twitter Profile Photo

The Command and Conquer source code was open sourced today and it's full of amazing comments 😂

Exhibit A: The "we will fix it later"
Qwen (@alibaba_qwen) 's Twitter Profile Photo

Introducing Qwen3! 

We are releasing Qwen3, our latest open-weight large language models, including 2 MoE models and 6 dense models ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general
Andrew Zhao (@andrewz45732491) 's Twitter Profile Photo

❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains.
🧵 1/
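A bare-bones sketch of the propose/solve self-play loop described above: the proposer is rewarded for "learnability" (tasks solved only some of the time) and the solver for correctness. All callables are placeholders standing in for the model, verifier, and update step; this is not the Absolute Zero Reasoner code.

```python
def learnability_reward(success_rate: float) -> float:
    """Highest for tasks solved some of the time; zero for trivial or impossible ones."""
    return 0.0 if success_rate in (0.0, 1.0) else 1.0 - abs(0.5 - success_rate) * 2

def self_play_round(propose_task, solve, verify, update, attempts: int = 8):
    """One self-play round: the same policy plays proposer and solver roles."""
    task = propose_task()                                                # proposer role
    successes = sum(verify(task, solve(task)) for _ in range(attempts))  # solver role
    success_rate = successes / attempts
    # Proposer is pushed toward tasks at the edge of the solver's ability;
    # solver is pushed toward getting them right.
    update(role="proposer", task=task, reward=learnability_reward(success_rate))
    update(role="solver", task=task, reward=success_rate)
    return task, success_rate
```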
Commodore Computer Museum 🕹 (@museumcommodore) 's Twitter Profile Photo

It's official: Perifractic's mission to buy Commodore Computers succeeded! Mind-blowing! After decades in limbo, Commodore has a new owner! 

I personally tried for years without success to buy Commodore, or at least to convince wealthy businessmen to buy it, but I failed