Sicong (Sheldon) Huang (@sicong_huang)'s Twitter Profile
Sicong (Sheldon) Huang

@sicong_huang

Pretrained by evolution, finetuned by experience, prompted by situations. | AI PhDing @UofT, sharing ideas on AI, forecasting research and the human condition.

ID: 3720756561

Link: https://www.cs.toronto.edu/~huang/ · Joined: 20-09-2015 19:46:52

588 Tweets

2.2K Followers

695 Following

Sicong (Sheldon) Huang (@sicong_huang)

Also a great place to meet friends. I'd guess most students there are conscientious and highly open. Could be your future boss or employee. Or just friends to grow together.

Sicong (Sheldon) Huang (@sicong_huang)

I think inference-time compute could be more efficient for scaling sequential decision making than only using the depth of a NN, so that during train time the model is (in weights) learning to search (in context). But it seems human researchers are still deciding which parts

Sicong (Sheldon) Huang (@sicong_huang)

i was like, this AI timeline seems more based than a lot of the more serious-looking ones i've seen — and then bro shared his chatgpt conversation history...

Sicong (Sheldon) Huang (@sicong_huang)

Are LLMs ready for scientific hypothesis generation and inductive reasoning? We built HypoBench to put them to the test. See what we found. 👇

Ruiqi Zhong (@zhongruiqi)

Last day of PhD! I pioneered using LLMs to explain datasets & models. It's used by interp at OpenAI and societal impact at Anthropic. Tutorial here. It's a great direction & someone should carry the torch :) Thesis available, if you wanna read my acknowledgement section =P

John (Yueh-Han) Chen (@jcyhc_ai)

New paper: We developed an LLM system that predicts which machine learning research idea from a set of candidates will yield superior empirical results. We showed that, in certain domains like NLP, our system significantly outperforms human experts (64.4% vs. 48.9%)! See more

Daniel Wurgaft (@danielwurgaft)

🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵 1/

Sid (@sid_srk)

Announcing The Toronto School Of Foundation Modelling, a Toronto exclusive, in-person only school for learning to build Foundation Models. Coming to New Stadium and Youthful Vengeance in late August 2025.

Ken Liu (@kenziyuliu)

New paper! We explore a radical paradigm for AI evals: assessing LLMs on *unsolved* questions. Instead of contrived exams where progress ≠ value, we eval LLMs on organic, unsolved problems via reference-free LLM validation & community verification. LLMs solved ~10/500 so far:

Hugo Larochelle (@hugo_larochelle)

Excited to share that I begin today as Scientific Director at Mila - Institut québécois d'IA! Truly honored by this opportunity to serve this community of AI leaders and innovators, which I've always cherished and have benefited from myself. mila.quebec/en/news/hugo-l…

François Chollet (@fchollet)

The most important skill for a researcher is not technical ability. It's taste. The ability to identify interesting and tractable problems, and recognize important ideas when they show up. This can't be taught directly. It's cultivated through curiosity and broad reading.

François Chollet (@fchollet)

The idea that we will automate work by building artificial versions of ourselves to do exactly the things we were previously doing, rather than redesigning our old workflows to make the most out of existing automation technology, has a distinct “mechanical horse” flavor

Danijar Hafner (@danijarh)

Excited to introduce Dreamer 4, an agent that learns to solve complex control tasks entirely inside of its scalable world model! 🌎🤖 Dreamer 4 pushes the frontier of world model accuracy, speed, and learning complex tasks from offline datasets. co-led with Wilson Yan

Forecasting Research Institute (@research_fri)

⬆️ LLMs’ forecasting abilities are steadily improving. GPT-4 (released March 2023) achieved a difficulty-adjusted Brier score of 0.131. Nearly two years later, GPT-4.5 (released Feb 2025) scored 0.101—a substantial improvement. A linear extrapolation of state-of-the-art LLM

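As a rough illustration of the numbers in the tweet above (not FRI's actual difficulty-adjusted methodology, which the thread doesn't detail), here is a minimal sketch of the plain Brier score and the linear trend implied by the two reported data points; the release-month-to-decimal-year conversion is an assumption for the example:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities (0..1)
    and binary outcomes (0 or 1); lower is better."""
    assert len(probs) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A perfect forecaster scores 0.0; always answering 0.5 scores 0.25.
print(brier_score([1.0, 0.0, 0.5], [1, 0, 1]))  # (0 + 0 + 0.25) / 3

# Linear trend through the two reported points:
# GPT-4 (Mar 2023): 0.131, GPT-4.5 (Feb 2025): 0.101.
# Treating the gap as roughly 1.9 years gives a slope in Brier points/year.
slope = (0.101 - 0.131) / ((2025 + 2 / 12) - (2023 + 3 / 12))
print(round(slope, 4))  # about -0.0157 per year
```

Note that a plain Brier score is not difficulty-adjusted; FRI's adjustment would rescale scores by question hardness, so the sketch only conveys the direction of the trend.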