yseeker (@yseeker0) Twitter Tweets • TwiCopy

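o1 pro is genuinely impressive. I gave it only the title of a paper I wrote years ago (solid-state physics / experimental) and asked it to explain the paper. The references were almost spot-on and the explanation was worth about 70 out of 100. (Note: it still contained hallucinations.) With the same prompt, o1 produced many hallucinated references and an explanation worth about 50 out of 100. Claude and GPT-4o felt like garbage.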


yseeker

@yseeker0

5 months ago

With o1 pro, feeding in the full paper (or its key sections) together with as much of the source code as possible at the same time gives good output.


yseeker

@yseeker0

5 months ago

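o1 pro's Japanese phrasing is so natural it startles me lol
Example 1) ただし、「decode & augment」できる利点が"ハマる"ほど、、、 (roughly: "However, the more the advantage of being able to 'decode & augment' clicks...")
Example 2) 一見12.8GBなら"載るかも……"と思うかもしれませんが、実際には以下の要素で、、、 (roughly: "At first glance you might think 12.8GB might just fit, but in practice, because of the following factors...")
Expressions like 「ハマる」 ("clicks") and 「載るかも……」 ("might just fit...") make it feel like it really knows how to use words.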


yseeker

@yseeker0

4 months ago

Isn't the ordering wrong??


Update: Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have readers, like main test our CUDA kernels, to identify that the system had found a way to “cheat”. For example, the system


yseeker

@yseeker0

3 months ago

Our new house…


yseeker

@yseeker0

3 months ago

After trying out various models, I ended up with this conclusion:
Almost everything: o1 pro
When I want a quick answer: o1
Coding: o3-mini-high
Research: Deep Research (ChatGPT)
OpenAI lives up to its reputation.


yseeker

@yseeker0

3 months ago

(I haven't been able to try Grok, but I did try Claude, Gemini, and DeepSeek.)


yseeker

@yseeker0

3 months ago

My days as a Wi-Fi refugee are over.


yseeker

@yseeker0

3 months ago

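Watching all kinds of web services and tools get turned into MCP servers and hooked up to LLMs reminds me of my time in the US, when I set up the dilution refrigerator controller and the lock-in amplifiers as labrad servers so that everything could be controlled centrally. If labrad were wrapped as an MCP server, it could probably be connected to LLMs through a unified standard.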


yseeker

@yseeker0

2 months ago

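Gemini 2.5 pro is seriously smart. It's on par with o1 pro, and on some tasks it's better. Its responses are fast, too. o3 pro will probably be the slow type, but can it really pull ahead of Gemini, which sits at roughly IQ 120?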


yseeker

@yseeker0

2 months ago

I hope Grok and Claude hold their own too and blow past o3.


yseeker

@yseeker0

2 months ago

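On tracking ai (trackingai.org/home), roughly the average of the mensa norway and offline test scores (add them and divide by 2) is close to the performance I actually experience. My use cases are writing, looking things up, organizing ideas, and so on (not coding).
o3 =~ gemini 2.5 pro > o1 pro >> o1 >= o3-mini-high ~ > Claude (extend thinking) >~ Grok (think)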
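The rule of thumb above (add the two scores and divide by 2) can be written as a tiny Python sketch. The function name `blended_score` and the input values are hypothetical placeholders for illustration, not actual trackingai.org numbers.

```python
def blended_score(mensa_norway: float, offline: float) -> float:
    """Return the plain average of the two test scores."""
    return (mensa_norway + offline) / 2


# Placeholder inputs only; substitute the scores listed for a given model.
example = blended_score(100.0, 110.0)
print(example)
```

This is just the arithmetic mean; the ranking in the post is a subjective impression and is not derived from it.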
