Aaron (@aaronmasuba) 's Twitter Profile
Aaron

@aaronmasuba

Electrical and Computer Engineer.
Intelligent Systems and Robotics Specialist (Computer Vision Research)
AI Innovation Fellow @Intel
Technical Member @ACM

ID: 1101337682767200256

Link: https://www.linkedin.com/in/aaronmasuba/ · Joined: 01-03-2019 04:25:40

488 Tweets

171 Followers

1.1K Following

chester (@chesterzelaya) 's Twitter Profile Photo

< Choosing a Vision Backbone >

your model’s backbone is its perspective

pick ResNet, and it sees in edges
pick a ViT, and it sees in patches

the backbone decides how your model thinks

here are some of the most practical backbones and when you should choose them, from the
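The ResNet-vs-ViT contrast above can be sketched in plain Python: a convolution aggregates local neighborhoods (so it responds to edges), while a ViT-style tokenizer simply slices the image into fixed patches. This is an illustrative toy (tiny 4x4 image, made-up kernel), not code from the thread:

```python
# Toy sketch (not from the thread): how a conv "sees edges"
# versus how a ViT "sees patches", on a tiny 4x4 grayscale image.

image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

def conv_edge_response(img):
    """Slide a 1x2 horizontal-gradient kernel [-1, +1] over each row."""
    return [
        [img[r][c + 1] - img[r][c] for c in range(len(img[r]) - 1)]
        for r in range(len(img))
    ]

def vit_patches(img, p=2):
    """Split the image into non-overlapping p x p patches (flattened tokens)."""
    tokens = []
    for r in range(0, len(img), p):
        for c in range(0, len(img[0]), p):
            tokens.append([img[r + i][c + j] for i in range(p) for j in range(p)])
    return tokens

edges = conv_edge_response(image)   # strong response only at the 0 -> 9 boundary
tokens = vit_patches(image)         # four patch tokens of length 4
```

The conv output is nonzero only where intensity changes (the edge), while the ViT tokens carry raw patch content and leave relating them to attention.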
Kshitij Mishra (@kshitijmis37062) 's Twitter Profile Photo

11 FREE Books from MIT for Absolute Beginners

- Machine Learning (ML)
- Deep Learning (DL)
- Reinforcement Learning (RL)
- Artificial Intelligence (AI)

To get:
1. Follow (so I can DM you)
2. Like & retweet
3. Reply "Send"
Kshitij Mishra (@kshitijmis37062) 's Twitter Profile Photo

I'm deleting this soon because it's a legit cash-printing formula.

𝗣𝗮𝗶𝗱 𝗖𝗼𝘂𝗿𝘀𝗲 𝗙𝗥𝗘𝗘 (PART - 3)

1. Artificial Intelligence + Data Analyst 
2. Machine Learning + Data Science
3. Cloud Computing + Web Development 
4. Ethical Hacking + Hacking 
5. Data Analytics + DSA
机器之心 JIQIZHIXIN (@synced_global) 's Twitter Profile Photo

Yes, it turns out diffusion models can learn from feedback as effectively as language models do with RL!

Tsinghua, NVIDIA, and Stanford introduced Diffusion Negative-aware FineTuning (DiffusionNFT), a new online reinforcement learning paradigm that finally makes RL practical for
Tom Yeh (@proftomyeh) 's Twitter Profile Photo

At MIT, I learned about RNNs in my NLP class with Prof. Michael Collins. He built a model from my keystrokes to predict who I was. To me, it felt like a magic box.

Years later, when I had to teach RNNs, I forced myself to go inside the box. ⬇️

Download: byhand.ai/rnn

Elliot Arledge (@elliotarledge) 's Twitter Profile Photo

I split my 12 hr CUDA course into sections.

We'll cover:
1) the deep learning ecosystem
2) cuda setup/installation
3) gentle intro to gpus
4) writing your first kernels
5) kernel and system level profiling, atomics, and the cuda programming model
6) how and when to use
Dwarkesh Patel (@dwarkesh_sp) 's Twitter Profile Photo

The Andrej Karpathy interview

0:00:00 – AGI is still a decade away
0:30:33 – LLM cognitive deficits
0:40:53 – RL is terrible
0:50:26 – How do humans learn?
1:07:13 – AGI will blend into 2% GDP growth
1:18:24 – ASI
1:33:38 – Evolution of intelligence & culture
1:43:43 – Why self

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

New Harvard paper shows training‑free sampling lets a base LLM rival reinforcement learning on reasoning.

No training, dataset, or verifier.

The method samples from a power distribution, which means reweighting full sequences the model already thinks are likely.

That bias
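The paper's exact algorithm isn't in the excerpt, but "sampling from a power distribution" generically means raising each sequence probability to an exponent α > 1 and renormalizing, which shifts mass onto sequences the base model already rates likely. A toy sketch under that assumption (the candidate probabilities and α are made up for illustration):

```python
# Toy sketch (assumption: "power distribution" = p_i ** alpha, renormalized).
# Raising probabilities to alpha > 1 concentrates mass on already-likely
# sequences -- no training, dataset, or verifier involved.

def power_reweight(probs, alpha):
    """Reweight a categorical distribution by p -> p^alpha / sum(p^alpha)."""
    powered = [p ** alpha for p in probs]
    z = sum(powered)
    return [p / z for p in powered]

base = [0.5, 0.3, 0.2]   # base-model probabilities over 3 candidate sequences
sharpened = power_reweight(base, alpha=4.0)
```

With α = 4 the most likely candidate's weight jumps well above 0.5 while the tail shrinks; α = 1 would recover the base distribution unchanged.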
Tom Yeh (@proftomyeh) 's Twitter Profile Photo

Evolution of Deep Learning by Hand ✍️

As my tribute to Geoff Hinton's Nobel Prize, I drew this animation to illustrate the key idea behind Hinton's major contributions to deep learning over the years, with artistic liberty.

----
100% original, made by hand ✍️

Join 40k readers

Swapna Kumar Panda (@swapnakpanda) 's Twitter Profile Photo

Stanford's ALL FREE Courses [2024 & 2025]

❯ CS230 - Deep Learning
❯ CS234 - Reinforcement Learning
❯ CS236 - Deep Generative Models
❯ CME295 - Transformers & LLMs
❯ CS336 - Language Model from Scratch
❯ CS224N - NLP with DL

Find links inside:

ℏεsam (@hesamation) 's Twitter Profile Photo

Stanford just released a new course for this Fall: Transformers & Large Language Models by the Amidi brothers. Three videos are already available for free on YouTube.

SYLLABUS: 
> Transformers (tokenization, embeddings, attention, architecture)
> LLM foundations (MoEs, types of
Aadit Sheth (@aaditsh) 's Twitter Profile Photo

Stanford packed 1.5 hours with everything you need to know about LLMs.

It explains why scale beats architecture and data beats genius.

The clearest crash course on how AI actually works. Save this for later.
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too.

First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my

ℏεsam (@hesamation) 's Twitter Profile Photo

fantastic simple visualization of the self attention formula. this was one of the hardest things for me to deeply understand about LLMs.

the formula seems easy. you can even memorize it fast. but to really get an intuition of what Q, K, V represent and how they interact, that's hard.
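The formula in question is softmax(QKᵀ/√d)V. A minimal pure-Python version (toy 2-token Q, K, V, not from the post) makes the roles concrete: Q·Kᵀ scores how much each query token attends to each key token, softmax turns scores into weights, and the output is a weighted mix of the V rows:

```python
import math

# Minimal self-attention: softmax(Q K^T / sqrt(d)) V on toy 2-token inputs.
# Q asks "what am I looking for", K advertises "what I contain",
# V carries the content that actually gets mixed into the output.

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        # scaled dot-product score of this query against every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # output row = attention-weighted average of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(Q, K, V)
```

Because each query aligns with its own key, each output row leans toward its own value row; since softmax weights sum to 1, each output row's entries sum to 10 here.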

Tanmay Gupta (@tanmay2099) 's Twitter Profile Photo

Had the surreal experience of telling a room full of computer vision researchers at the ICCV25 AC workshop why “computer vision researcher” won’t be a thing in 5 years 🌶️

Of course, this was an extreme stance to keep things lively in a fun debate setting but it echoed some of my
Kirk Borne (@kirkdborne) 's Twitter Profile Photo

Practical Linear Algebra for #DataScience — From Core Concepts to Applications Using #Python — amzn.to/3WWJKR4
————
#DataScientist #AI #ML #MachineLearning #Mathematics #LinearAlgebra #Coding
Avi Chawla (@_avichawla) 's Twitter Profile Photo

Here's a neural net optimization trick that leads to ~4x faster CPU to GPU transfers.

Imagine an image classification task.

- We define the network, load the data and transform it.
- In the training loop, we transfer the data to the GPU and train.

Here's the problem with this:
Santiago (@svpino) 's Twitter Profile Photo

This is still the way I recommend most people start with machine learning:

1. Start with Python
2. Learn to use Google Colab
3. Take a Pandas tutorial
4. Then a Seaborn tutorial
5. Learn how to use Decision Trees
6. Finish Kaggle's "Intro to Machine Learning"
7. Solve the

Avinash Singh (@avinashsingh_20) 's Twitter Profile Photo

Complete Advanced DSA Resources in One Place👇

From beginner sheets to advanced problem-solving guides, logic-building notes, and real-world DSA applications, everything you need to master Data Structures & Algorithms is HERE!

Perfect for coding prep & product-based interviews.