Danial Khosravi (@danial_kh)'s Twitter Profile
Danial Khosravi

@danial_kh

Functional Programming, Machine Learning, Deep Learning, Bayesian Statistics. Full-stack Software Engineer and Data Scientist

ID: 41543977

Website: http://danialk.github.io/
Joined: 21-05-2009 06:49:01

1.1K Tweets

282 Followers

1.1K Following

Shreya Shankar (@sh_reya)'s Twitter Profile Photo

does anyone have LLM agents running in prod or at scale, automatically? forget about cost, how did you get the end-to-end latency low enough & the accuracy high enough?

Jeremy Howard (@jeremyphoward)'s Twitter Profile Photo

I used to find writing CUDA code rather terrifying. But then I discovered a couple of tricks that actually make it quite accessible. In this video I introduce CUDA in a way that will be accessible to Python programmers, and I even show how to do it all in Colaboratory!

Hamel Husain (@hamelhusain)'s Twitter Profile Photo

LLM bullshit knife, to cut through bs:
RAG -> Provide relevant context
Agentic -> Function calls that work
CoT -> Prompt model to think/plan
FewShot -> Add examples
PromptEng -> Someone w/good written comm skills.
Prompt Optimizer -> For
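
Two of the definitions above (FewShot and CoT) are concrete enough to sketch in code. This is an illustrative, hypothetical prompt builder, not any particular library's API; `build_prompt` and its parameter names are made up for the example.

```python
# Hypothetical sketch: FewShot -> add worked examples to the prompt;
# CoT -> append a cue that prompts the model to think/plan before answering.

def build_prompt(task, examples, question, chain_of_thought=True):
    """Assemble a few-shot prompt string. All names here are illustrative."""
    parts = [task]
    for q, a in examples:          # FewShot: show worked input/output pairs
        parts.append(f"Q: {q}\nA: {a}")
    cue = "Let's think step by step." if chain_of_thought else ""
    parts.append(f"Q: {question}\nA: {cue}")   # CoT: nudge the model to plan
    return "\n\n".join(parts)

prompt = build_prompt(
    "Answer the arithmetic question.",
    examples=[("2+2?", "4"), ("10-3?", "7")],
    question="6*7?",
)
print(prompt)
```

The model's completion would then continue after the CoT cue, producing intermediate steps before the answer.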

Liliang Ren (@liliang_ren)'s Twitter Profile Photo

Introducing Samba 3.8B, a simple Mamba+Sliding Window Attention architecture that outperforms Phi3-mini on major benchmarks (e.g., MMLU, GSM8K and HumanEval) by a large margin.😮 And it has an infinite context length with linear complexity.🤯

Paper: arxiv.org/abs/2406.07522
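
As an illustration of the sliding-window half of that architecture (this is not the Samba code, just a toy sketch): each token attends only to itself and the previous w-1 tokens, which is what keeps per-token cost constant and total complexity linear in sequence length.

```python
# Toy sketch of a causal sliding-window attention mask of width `window`.
# Token i may attend to positions max(0, i-window+1) .. i, never ahead.

def sliding_window_mask(seq_len, window):
    """mask[i][j] is True iff token i may attend to token j."""
    return [
        [max(0, i - window + 1) <= j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=6, window=3)
# Token 5 sees positions 3, 4, 5 only:
print([j for j in range(6) if mask[5][j]])  # -> [3, 4, 5]
```

Because every row has at most `window` True entries, attention cost per token is O(window) rather than O(seq_len).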

Xenova (@xenovacom)'s Twitter Profile Photo

I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️ Install it from NPM with: npm i @huggingface/transformers or via CDN (example below) 👇

Eugene Yan (@eugeneyan)'s Twitter Profile Photo

Evals are "too expensive" until you:

• Can't migrate underlying models safely
• Can't add new features with confidence
• Can't ship w/o HITL evals, which takes >100x longer
• Product development/iteration grinds to a halt
• Lose customer trust due to poor user experience
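
The alternative to slow human-in-the-loop review is a set of cheap automated checks that run on every model or prompt change. A minimal, hypothetical sketch (the harness, metric, and stand-in model below are all made up for illustration):

```python
# Minimal eval harness sketch: run fixed test cases against the model
# on every change, so migrations don't require manual review each time.

def exact_match(output, expected):
    """Cheapest possible metric; real evals would use task-specific checks."""
    return output.strip().lower() == expected.strip().lower()

def run_evals(model_fn, cases):
    """cases: list of (input, expected) pairs; returns the pass rate."""
    passed = sum(exact_match(model_fn(x), want) for x, want in cases)
    return passed / len(cases)

# Stand-in "model" for illustration; in practice this calls your LLM.
fake_model = {"capital of France?": "Paris", "2+2?": "4"}.get

score = run_evals(fake_model, [("capital of France?", "paris"), ("2+2?", "4")])
print(score)  # -> 1.0
```

Gating deploys on a score like this is what makes model migrations and new features safe without 100x-slower human review.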

Denny Zhou (@denny_zhou)'s Twitter Profile Photo

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-…

Key points:
1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial
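
That definition is mechanical enough to show in code. A hypothetical sketch (the marker string and helper are made up, not from the lecture): the model's raw output is just one token stream, and "reasoning" is everything emitted before the final answer.

```python
# Sketch: "reasoning" here is just the intermediate tokens a model emits
# before its final answer. A toy split of raw output into the two parts.

def split_reasoning(generated, answer_marker="Final answer:"):
    """Separate intermediate tokens from the answer; marker is hypothetical."""
    head, _, tail = generated.partition(answer_marker)
    return head.strip(), tail.strip()

out = "6*7 means six groups of seven. 6*7 = 42. Final answer: 42"
reasoning, answer = split_reasoning(out)
print(answer)  # -> 42
```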

Adi Polak (@adipolak)'s Twitter Profile Photo

Every major AI lab is hiring people who can:
– ship eval pipelines
– scale training infra
– write interpretable logs

MLE ≠ "fine-tune a llama"
It’s how to make reasoning reliable at scale. Get in. It’s day 1.

Justin Johnson (@jcjohnss)'s Twitter Profile Photo

10 years ago, deep learning was in its infancy. PyTorch didn't exist. Language models were recurrent, and not large. But it felt important: a new technology that would change everything. That's why Fei-Fei Li, Andrej Karpathy, and I started CS231n back in 2015 - to teach the world's

Eugene Yan (@eugeneyan)'s Twitter Profile Photo

i’ve seen teams try to apply evals, get the same outcome, and grow reluctant to eval again. this is because they used off-the-shelf evals (from evals platforms) like “faithfulness”, “cohesion”, “helpfulness” 😔 generic evals aren’t useful. your evals must be aligned with your user
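
To make the contrast concrete, here is a hypothetical task-specific eval, as opposed to a generic "helpfulness" score. The product, ground truth, and function names are all invented for the example: a support bot must state the correct refund window from your own docs.

```python
# Task-specific eval sketch: pass/fail is defined by YOUR product's ground
# truth, not by a generic off-the-shelf rubric like "helpfulness".

REFUND_WINDOW = "30 days"   # ground truth pulled from your own policy docs

def refund_answer_eval(output):
    """Pass iff the answer states the correct refund window."""
    return REFUND_WINDOW in output

print(refund_answer_eval("You can return items within 30 days."))  # -> True
print(refund_answer_eval("Our generous policy has you covered!"))  # -> False
```

A generic judge might score the second answer as perfectly "helpful" and "cohesive" while it fails the only check your users actually care about.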

Keenan Crane (@keenanisalive)'s Twitter Profile Photo

“Everyone knows” what an autoencoder is… but there's an important complementary picture missing from most introductory material.

In short: we emphasize how autoencoders are implemented—but not always what they represent (and some of the implications of that representation).🧵
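
One way to see the representational view is a toy "autoencoder" whose weights are fixed by hand rather than learned (so this is purely illustrative, not an implementation): the point is what the encoder/decoder pair represents, namely a low-dimensional manifold that the data lives near.

```python
# Toy hand-built "autoencoder": 2-D data near the line y = 2x is encoded
# to a 1-D code and decoded back. What it represents is the 1-D manifold
# (the line) embedded in R^2 -- reconstruction projects points onto it.

def encode(p):               # R^2 -> R: scaled projection onto (1, 2)
    x, y = p
    return (x + 2 * y) / 5   # 1-D latent code

def decode(z):               # R -> R^2: map the code back onto the line y = 2x
    return (z, 2 * z)

p = (1.0, 2.0)               # a point exactly on the manifold
print(decode(encode(p)))     # -> (1.0, 2.0): reconstructed perfectly

q = (1.0, 2.5)               # an off-manifold point is projected onto the line
print(decode(encode(q)))
```

On-manifold points reconstruct exactly; off-manifold points snap to the nearest representable point, which is the "what it represents" picture the thread is pointing at.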

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

Excited to release new repo: nanochat!
(it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,

Anshuman Mishra (@heyyanshuman)'s Twitter Profile Photo

"Just use KV cache for LLM inference"

Until you need:
- Same 5000-token system prompt for every request
- 10M+ requests/day burning $50k on redundant compute
- RAG pipelines reprocessing the same docs 1000x/day

Then you implement prefix caching.

You know why basic KV cache
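
The core of prefix caching can be sketched in a few lines. This is a hypothetical illustration, not vLLM's or any engine's real implementation: key the computed KV state of a shared prefix (e.g. the 5000-token system prompt) by a hash of its tokens, so repeated requests skip the redundant prefill.

```python
# Hypothetical prefix-caching sketch: reuse the KV state computed for a
# shared prompt prefix instead of re-running prefill on every request.
import hashlib

class PrefixCache:
    def __init__(self):
        self._store = {}

    def _key(self, tokens):
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def get_or_compute(self, prefix_tokens, compute_kv):
        """Return the cached KV state for this prefix, computing it once."""
        k = self._key(prefix_tokens)
        if k not in self._store:
            self._store[k] = compute_kv(prefix_tokens)
        return self._store[k]

calls = []
def fake_kv(tokens):          # stand-in for the real (expensive) prefill
    calls.append(1)
    return {"kv_for": len(tokens)}

cache = PrefixCache()
system_prompt = list(range(5000))      # same 5000-token prefix every request
for _ in range(3):
    kv = cache.get_or_compute(system_prompt, fake_kv)
print(len(calls))  # -> 1: prefill ran once instead of three times
```

Real engines additionally match partial prefixes block-by-block and evict entries under memory pressure; the dictionary above only shows the basic reuse idea.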

Abhishek Singh (@natoshi_sakmoto)'s Twitter Profile Photo

When to set Kubernetes limits — the version nobody tells juniors but every senior engineer actually follows:

CPU limits:
→ Almost never. They throttle your pods, destroy latency, and create artificial bottlenecks. Let the kernel do its job — not Kubernetes micromanagement.

CPU
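
In a container spec, the "almost never set CPU limits" advice looks like the fragment below. This is an illustrative sketch, not a prescription for any particular workload:

```yaml
# Illustrative only: a CPU request for scheduling, with no CPU limit,
# so the kernel's CFS scheduler shares spare cycles instead of
# quota-throttling the pod and adding latency.
resources:
  requests:
    cpu: "250m"     # guides the scheduler; not a cap
  # no cpu limit: avoids CFS throttling-induced latency spikes
```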

apolinario 🌐 (@multimodalart)'s Twitter Profile Photo

LLaDA 2.1 is now merged to the diffusers library 🧨

language diffusion models are mature and usable. and now integrated combining transformers 🤝 diffusers

I've just built a demo where you can play with the diffusion process 👇