Ryan Sun (@sun_hanchi) Twitter Tweets • TwiCopy

Ryan Sun

@sun_hanchi


Large Language Mystificator🧐 | Member of Non-Technical Staff @ Lehigh | Converting to JEPAism🙏

ID: 1543864209322156032

04-07-2022 07:48:24

1.1K Tweets

239 Followers

426 Following

Ryan Sun

@sun_hanchi

a month ago

I stopped using the Microsoft Office bundle:
- slides --> LaTeX Beamer
- docs --> Markdown
- excel --> csv/json + python (matplotlib)
The three replacements are text only, so I can use LLM Agents to work on them
Claude Code + Cursor is my new UI for everything

Likes: 5 · Replies: 0 · Retweets: 0
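A minimal sketch of the "csv + python (matplotlib)" replacement for a spreadsheet mentioned in the tweet above. The data, the column names `month` and `revenue`, and the helper names `load_rows` and `plot_rows` are all hypothetical, made up for illustration; the plotting step is skipped if matplotlib is not installed.

```python
import csv
import io

# Hypothetical data standing in for a small spreadsheet; in practice this
# would live in a plain-text .csv file that an agent can read and edit.
CSV_TEXT = """month,revenue
Jan,120
Feb,135
Mar,150
"""

def load_rows(text):
    """Parse CSV text into a list of (month, revenue) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["month"], float(row["revenue"])) for row in reader]

def plot_rows(rows, path="revenue.png"):
    """Save a bar chart with matplotlib if it is installed; return True on success."""
    try:
        import matplotlib
        matplotlib.use("Agg")  # headless backend, no display needed
        import matplotlib.pyplot as plt
    except ImportError:
        return False
    months = [m for m, _ in rows]
    values = [v for _, v in rows]
    fig, ax = plt.subplots()
    ax.bar(months, values)
    ax.set_xlabel("month")
    ax.set_ylabel("revenue")
    fig.savefig(path)
    plt.close(fig)
    return True

rows = load_rows(CSV_TEXT)
total = sum(v for _, v in rows)  # the spreadsheet "SUM" cell, as code
```

Calling `plot_rows(rows)` would write the chart to `revenue.png`. Everything here is plain text, which is the property the tweet cares about.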

Ryan Sun

@sun_hanchi

a month ago

No!!! EC!! U deserve better!! Revive the glory of EC!!!

Likes: 0 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

Kaiming once used a similar analogy: one trusts turbojets not because we figured out aerodynamics or solved the Navier-Stokes equations, but because we tested the turbojets tens of millions of times

Likes: 1 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

The “so done” and “so back” oscillations

Likes: 0 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

Is the difference caused by the implicit bias induced by length normalization?

Likes: 0 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

That means 5-TPG would win 42 of 44 trades
Guys, the market might have been solved

Likes: 0 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

Does MuP just work for MoE?
I suspect the top-k operation increases variance, so maybe extreme value theory should be considered
Maybe shift init further by 1/log, 1/loglog, or pi^2/6

Likes: 0 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

Why Claude for Excel when you can use Claude Code + matplotlib + .csv?

Likes: 0 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

I won't be surprised if Gemini is actually expert choice

Likes: 5 · Replies: 2 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

TL;DR: Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3 Qwen3… (*38)

Likes: 1 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

A reminder that 11 months have passed, and we still have no open-source implementation of o1-pro
A multi-agent long-reasoning framework that can use 100x test-time compute to produce well-thought-out results

Likes: 0 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

a month ago

Have you noticed that chatbots never say "sorry"?

Likes: 0 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

24 days ago

It’s a good mental model to think model training cost is essentially 0
It gets amortized by the increasing demand
One should only care about inference cost in the long run

Likes: 0 · Replies: 0 · Retweets: 0
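The amortization argument in the tweet above can be sketched as simple arithmetic: if a one-time training cost is spread over every query served, its per-query share shrinks as demand grows, and the per-query cost approaches the inference cost alone. The function name `cost_per_query` and all numbers below are hypothetical, chosen only to show the trend, not real figures for any model.

```python
def cost_per_query(training_cost, queries_served, inference_cost):
    """Average cost of one query: amortized training share plus inference cost."""
    if queries_served <= 0:
        raise ValueError("queries_served must be positive")
    return training_cost / queries_served + inference_cost

# Hypothetical numbers, for illustration only.
TRAINING_COST = 1_000_000.0   # one-time cost, arbitrary units
INFERENCE_COST = 0.01         # per-query cost, same units

for n in (10**6, 10**9, 10**12):
    # The training share (TRAINING_COST / n) shrinks as n grows.
    print(n, cost_per_query(TRAINING_COST, n, INFERENCE_COST))
```

Under these assumptions the training term is 1/n of a fixed amount, so it vanishes in the limit of large demand, while the inference term does not depend on n.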

Ryan Sun

@sun_hanchi

24 days ago

I remember seeing the demo during NeurIPS 2023
Still don’t see the point

Likes: 0 · Replies: 0 · Retweets: 0

Ryan Sun

@sun_hanchi

20 days ago

❌ Conference shitty peer review
✅ Decentralized public voting system (e.g., huggingface daily🤗)
✅✅✅ Recommendation system for papers based on engagements
I thought we solved decentralized review a while ago in social media

Likes: 0 · Replies: 0 · Retweets: 0

Jia-Bin Huang

@jbhuang0604

18 days ago

writing the intro of an ML paper be like:

Likes: 61 · Replies: 1 · Retweets: 6