Danial Khosravi (@danial_kh)'s Twitter Profile
Danial Khosravi

@danial_kh

Functional Programming, Machine Learning, Deep Learning, Bayesian Statistics. Full-stack Software Engineer and Data Scientist

ID: 41543977

Website: http://danialk.github.io/
Joined: 21-05-2009 06:49:01

1.1K Tweets

282 Followers

1.1K Following

Shreya Shankar (@sh_reya)'s Twitter Profile Photo

does anyone have LLM agents running in prod or at scale, automatically? forget about cost, how did you get the end-to-end latency low enough & the accuracy high enough?

Jeremy Howard (@jeremyphoward)'s Twitter Profile Photo

I used to find writing CUDA code rather terrifying. But then I discovered a couple of tricks that actually make it quite accessible. In this video I introduce CUDA in a way that will be accessible to Python programmers, and I even show how to do it all in Colaboratory!

Hamel Husain (@hamelhusain)'s Twitter Profile Photo

LLM bullshit knife, to cut through bs:
RAG -> Provide relevant context
Agentic -> Function calls that work
CoT -> Prompt model to think/plan
FewShot -> Add examples
PromptEng -> Someone w/good written comm skills.
Prompt Optimizer -> For
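
Two of the definitions above (FewShot and CoT) are concrete enough to sketch in code. This is an illustrative, hypothetical prompt builder, not any particular library's API; `build_prompt` and its parameter names are made up for the example.

```python
# Hypothetical sketch: FewShot -> add worked examples to the prompt;
# CoT -> append a cue that prompts the model to think/plan before answering.

def build_prompt(task, examples, question, chain_of_thought=True):
    """Assemble a few-shot prompt string. All names here are illustrative."""
    parts = [task]
    for q, a in examples:          # FewShot: show worked input/output pairs
        parts.append(f"Q: {q}\nA: {a}")
    cue = "Let's think step by step." if chain_of_thought else ""
    parts.append(f"Q: {question}\nA: {cue}")   # CoT: nudge the model to plan
    return "\n\n".join(parts)

prompt = build_prompt(
    "Answer the arithmetic question.",
    examples=[("2+2?", "4"), ("10-3?", "7")],
    question="6*7?",
)
print(prompt)
```

The model's completion would then continue after the CoT cue, producing intermediate steps before the answer.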

Liliang Ren (@liliang_ren)'s Twitter Profile Photo

Introducing Samba 3.8B, a simple Mamba+Sliding Window Attention architecture that outperforms Phi3-mini on major benchmarks (e.g., MMLU, GSM8K and HumanEval) by a large margin.😮 And it has an infinite context length with linear complexity.🤯

Paper: arxiv.org/abs/2406.07522
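
As an illustration of the sliding-window half of that architecture (this is not the Samba code, just a toy sketch): each token attends only to itself and the previous w-1 tokens, which is what keeps per-token cost constant and total complexity linear in sequence length.

```python
# Toy sketch of a causal sliding-window attention mask of width `window`.
# Token i may attend to positions max(0, i-window+1) .. i, never ahead.

def sliding_window_mask(seq_len, window):
    """mask[i][j] is True iff token i may attend to token j."""
    return [
        [max(0, i - window + 1) <= j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=6, window=3)
# Token 5 sees positions 3, 4, 5 only:
print([j for j in range(6) if mask[5][j]])  # -> [3, 4, 5]
```

Because every row has at most `window` True entries, attention cost per token is O(window) rather than O(seq_len).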

Xenova (@xenovacom)'s Twitter Profile Photo

I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️ Install it from NPM with: npm i @huggingface/transformers or via CDN (example below) 👇

Eugene Yan (@eugeneyan)'s Twitter Profile Photo

Evals are "too expensive" until you:

• Can't migrate underlying models safely
• Can't add new features with confidence
• Can't ship w/o HITL evals, which takes >100x longer
• Product development/iteration grinds to a halt
• Lose customer trust due to poor user experience
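
The alternative to slow human-in-the-loop review is a set of cheap automated checks that run on every model or prompt change. A minimal, hypothetical sketch (the harness, metric, and stand-in model below are all made up for illustration):

```python
# Minimal eval harness sketch: run fixed test cases against the model
# on every change, so migrations don't require manual review each time.

def exact_match(output, expected):
    """Cheapest possible metric; real evals would use task-specific checks."""
    return output.strip().lower() == expected.strip().lower()

def run_evals(model_fn, cases):
    """cases: list of (input, expected) pairs; returns the pass rate."""
    passed = sum(exact_match(model_fn(x), want) for x, want in cases)
    return passed / len(cases)

# Stand-in "model" for illustration; in practice this calls your LLM.
fake_model = {"capital of France?": "Paris", "2+2?": "4"}.get

score = run_evals(fake_model, [("capital of France?", "paris"), ("2+2?", "4")])
print(score)  # -> 1.0
```

Gating deploys on a score like this is what makes model migrations and new features safe without 100x-slower human review.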

Denny Zhou (@denny_zhou)'s Twitter Profile Photo

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-…

Key points:
1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial
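
That definition is mechanical enough to show in code. A hypothetical sketch (the marker string and helper are made up, not from the lecture): the model's raw output is just one token stream, and "reasoning" is everything emitted before the final answer.

```python
# Sketch: "reasoning" here is just the intermediate tokens a model emits
# before its final answer. A toy split of raw output into the two parts.

def split_reasoning(generated, answer_marker="Final answer:"):
    """Separate intermediate tokens from the answer; marker is hypothetical."""
    head, _, tail = generated.partition(answer_marker)
    return head.strip(), tail.strip()

out = "6*7 means six groups of seven. 6*7 = 42. Final answer: 42"
reasoning, answer = split_reasoning(out)
print(answer)  # -> 42
```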

Adi Polak (@adipolak)'s Twitter Profile Photo

Every major AI lab is hiring people who can:
– ship eval pipelines
– scale training infra
– write interpretable logs

MLE ≠ "fine-tune a llama"
It’s how to make reasoning reliable at scale. Get in. It’s day 1.

Justin Johnson (@jcjohnss)'s Twitter Profile Photo

10 years ago, deep learning was in its infancy. PyTorch didn't exist. Language models were recurrent, and not large. But it felt important: a new technology that would change everything. That's why Fei-Fei Li, Andrej Karpathy, and I started CS231n back in 2015 - to teach the world's

Eugene Yan (@eugeneyan)'s Twitter Profile Photo

i’ve seen teams try to apply evals, get the same outcome, and grow reluctant to eval again. this is because they used off-the-shelf evals (from evals platforms) like “faithfulness”, “cohesion”, “helpfulness” 😔 generic evals aren’t useful. your evals must be aligned with your user
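
To make the contrast concrete, here is a hypothetical task-specific eval, as opposed to a generic "helpfulness" score. The product, ground truth, and function names are all invented for the example: a support bot must state the correct refund window from your own docs.

```python
# Task-specific eval sketch: pass/fail is defined by YOUR product's ground
# truth, not by a generic off-the-shelf rubric like "helpfulness".

REFUND_WINDOW = "30 days"   # ground truth pulled from your own policy docs

def refund_answer_eval(output):
    """Pass iff the answer states the correct refund window."""
    return REFUND_WINDOW in output

print(refund_answer_eval("You can return items within 30 days."))  # -> True
print(refund_answer_eval("Our generous policy has you covered!"))  # -> False
```

A generic judge might score the second answer as perfectly "helpful" and "cohesive" while it fails the only check your users actually care about.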

Keenan Crane (@keenanisalive)'s Twitter Profile Photo

“Everyone knows” what an autoencoder is… but there's an important complementary picture missing from most introductory material.

In short: we emphasize how autoencoders are implemented—but not always what they represent (and some of the implications of that representation).🧵
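
One way to see the representational view is a toy "autoencoder" whose weights are fixed by hand rather than learned (so this is purely illustrative, not an implementation): the point is what the encoder/decoder pair represents, namely a low-dimensional manifold that the data lives near.

```python
# Toy hand-built "autoencoder": 2-D data near the line y = 2x is encoded
# to a 1-D code and decoded back. What it represents is the 1-D manifold
# (the line) embedded in R^2 -- reconstruction projects points onto it.

def encode(p):               # R^2 -> R: scaled projection onto (1, 2)
    x, y = p
    return (x + 2 * y) / 5   # 1-D latent code

def decode(z):               # R -> R^2: map the code back onto the line y = 2x
    return (z, 2 * z)

p = (1.0, 2.0)               # a point exactly on the manifold
print(decode(encode(p)))     # -> (1.0, 2.0): reconstructed perfectly

q = (1.0, 2.5)               # an off-manifold point is projected onto the line
print(decode(encode(q)))
```

On-manifold points reconstruct exactly; off-manifold points snap to the nearest representable point, which is the "what it represents" picture the thread is pointing at.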

Andrej Karpathy (@karpathy)'s Twitter Profile Photo

Excited to release new repo: nanochat!
(it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,

Anshuman Mishra (@heyyanshuman)'s Twitter Profile Photo

"Just use KV cache for LLM inference"

Until you need:
- Same 5000-token system prompt for every request
- 10M+ requests/day burning $50k on redundant compute
- RAG pipelines reprocessing the same docs 1000x/day

Then you implement prefix caching.

You know why basic KV cache
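
The core of prefix caching can be sketched in a few lines. This is a hypothetical illustration, not vLLM's or any engine's real implementation: key the computed KV state of a shared prefix (e.g. the 5000-token system prompt) by a hash of its tokens, so repeated requests skip the redundant prefill.

```python
# Hypothetical prefix-caching sketch: reuse the KV state computed for a
# shared prompt prefix instead of re-running prefill on every request.
import hashlib

class PrefixCache:
    def __init__(self):
        self._store = {}

    def _key(self, tokens):
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def get_or_compute(self, prefix_tokens, compute_kv):
        """Return the cached KV state for this prefix, computing it once."""
        k = self._key(prefix_tokens)
        if k not in self._store:
            self._store[k] = compute_kv(prefix_tokens)
        return self._store[k]

calls = []
def fake_kv(tokens):          # stand-in for the real (expensive) prefill
    calls.append(1)
    return {"kv_for": len(tokens)}

cache = PrefixCache()
system_prompt = list(range(5000))      # same 5000-token prefix every request
for _ in range(3):
    kv = cache.get_or_compute(system_prompt, fake_kv)
print(len(calls))  # -> 1: prefill ran once instead of three times
```

Real engines additionally match partial prefixes block-by-block and evict entries under memory pressure; the dictionary above only shows the basic reuse idea.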

Abhishek Singh (@natoshi_sakmoto)'s Twitter Profile Photo

When to set Kubernetes limits — the version nobody tells juniors but every senior engineer actually follows:

CPU limits:
→ Almost never. They throttle your pods, destroy latency, and create artificial bottlenecks. Let the kernel do its job — not Kubernetes micromanagement.

CPU
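
In a container spec, the "almost never set CPU limits" advice looks like the fragment below. This is an illustrative sketch, not a prescription for any particular workload:

```yaml
# Illustrative only: a CPU request for scheduling, with no CPU limit,
# so the kernel's CFS scheduler shares spare cycles instead of
# quota-throttling the pod and adding latency.
resources:
  requests:
    cpu: "250m"     # guides the scheduler; not a cap
  # no cpu limit: avoids CFS throttling-induced latency spikes
```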

apolinario 🌐 (@multimodalart)'s Twitter Profile Photo

LLaDA 2.1 is now merged to the diffusers library 🧨

language diffusion models are mature and usable. and now integrated combining transformers 🤝 diffusers

I've just built a demo where you can play with the diffusion process 👇