Victor (@denverrxx)'s Twitter Profile
Victor

@denverrxx

math and LLMs

ID: 1432383045085106185

Link: http://github.com/heisdenverr · Joined: 30-08-2021 16:42:02

1.1K Tweets

363 Followers

598 Following

Saptak Bhoumik (@saptakbhoumik)

Finished my work early. It explores a new memory system for AI that I came up with. It's my first paper, so I would appreciate any kind of feedback. Link to the paper: zenodo.org/records/152201… Make sure to share it with people who you think might be interested.

status effects (@status_effects)

New work with Alec Radford and David Duvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:

Mistral AI (@mistralai)

🆕 Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI. 🌎 Enterprise teams have capable models. What they don't have is a way to run them reliably in production. That's the gap Workflows fills. It takes AI-powered business processes

Tensara (@tensarahq)

PyPTX by Patrick C Toulme is available on Tensara now! Submit and benchmark PTX kernels directly from Python across 80+ problems like GEMM, MXFP/NVFP, Attention, and more!

Mistral Vibe (@mistralvibe)

Introducing remote agents in Vibe and Mistral Medium 3.5. You can now launch remote agents in the cloud, including from the CLI or Le Chat. Plus, new Work mode in Le Chat for complex, multi-step tasks. 🧵

Qwen (@alibaba_qwen)

Today we’re releasing Qwen-Scope 🔭, an open suite of sparse autoencoders for the Qwen model family. It turns SAE features into practical tools:

🎯 Inference — Steer model outputs by directly manipulating internal features, no prompt engineering needed
📂 Data — Classify &
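The "steer by manipulating internal features" idea above can be sketched in a few lines: a trained SAE assigns each feature a decoder direction, and steering adds a scaled copy of that direction to a hidden activation. Everything here (dimensions, vectors, the `steer` helper) is a made-up toy, not Qwen-Scope's actual API.

```python
def steer(activation, feature_direction, alpha):
    """Add alpha times an SAE feature's decoder direction to a hidden activation.
    In a real setup, feature_direction would be a column of a trained SAE decoder."""
    return [h + alpha * d for h, d in zip(activation, feature_direction)]

h = [0.2, -0.5, 1.0]    # hypothetical hidden activation
d = [1.0, 0.0, -1.0]    # hypothetical feature direction
steered = steer(h, d, alpha=2.0)   # nudges the activation along the feature
```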
Massimo (@rainmaker1973)

When you start a chess game, you have 20 possible moves available. After the first full move (White then Black), there are already over 400 possible positions. By the third move, that number jumps to around 8,900, and after the fourth it reaches nearly 200,000.

By the time you
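The first two figures in the tweet follow from simple counting; the later ones require enumerating legal moves (the commonly cited perft values are 8,902 positions after three plies and 197,281 after four). A quick sanity check of the counting part:

```python
# Counting behind the tweet's first two numbers (not a chess engine):
pawn_moves = 16      # each of the 8 pawns can advance one or two squares
knight_moves = 4     # each knight has two legal first moves
white_openings = pawn_moves + knight_moves   # 20 first moves for White
after_one_full_move = white_openings * 20    # Black also has 20 replies -> 400
```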
Victor (@denverrxx)

Spent this week building a Sparse Autoencoder on Qwen2-0.5B from scratch — hooking into layers 9-14 MLP activations, training on CPU, finding neurons that fire on NHL trades and measurement units across 18k token activations. Then Qwen drops this. Timing 🤝 Qwen
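A sparse autoencoder like the one described above can be sketched minimally: an overcomplete ReLU encoder produces (mostly zero) feature activations, a linear decoder reconstructs the input, and training minimizes reconstruction error plus an L1 sparsity penalty. Dimensions here are toy-sized and the weights untrained; the real setup hooks Qwen2-0.5B MLP activations.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class TinySAE:
    """Toy sparse autoencoder: ReLU encoder into an overcomplete feature space,
    linear decoder back to the input space. Random weights, i.e. untrained."""
    def __init__(self, d_in, d_feat, seed=0):
        rng = random.Random(seed)
        self.W_enc = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(d_feat)]
        self.W_dec = [[rng.gauss(0, 0.5) for _ in range(d_feat)] for _ in range(d_in)]

    def encode(self, x):
        return relu(matvec(self.W_enc, x))   # sparse feature activations

    def decode(self, f):
        return matvec(self.W_dec, f)         # reconstruction of the activation

sae = TinySAE(d_in=4, d_feat=16)
x = [0.5, -1.0, 2.0, 0.0]                   # stand-in for an MLP activation
feats = sae.encode(x)
recon = sae.decode(feats)
# training objective: MSE reconstruction loss + L1 penalty that induces sparsity
loss = sum((a - b) ** 2 for a, b in zip(x, recon)) + 1e-3 * sum(feats)
```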

Mike (@elbasatguy)

Just pushed my CUDA-based spectrum analyzer to GitHub.

Fast FFT, GPU-powered, built for SDR workflows.

github.com/mebrown47/CUDA…
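For context on what a spectrum analyzer computes: the magnitude of the discrete Fourier transform of a sampled signal. A naive O(n²) DFT in plain Python shows the idea; the linked project presumably uses a GPU FFT, which computes the same result in O(n log n).

```python
import cmath
import math

def dft_magnitude(signal):
    """Naive O(n^2) DFT magnitude spectrum over all n frequency bins."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

# One full sine cycle over 8 samples: energy lands in bins 1 and 7 (magnitude n/2 = 4).
sig = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spec = dft_magnitude(sig)
```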
Denis Wirtz (@deniswirtz)

Big paper coming out soon.

Using AI, we mapped embryos of mice, alligators, turtles, rhesus macaques, and chickens in 3D and at single-cell resolution.

We discovered something truly remarkable...stay tuned!
Sebastian Raschka (@rasbt)

Here is a 2nd batch of April architecture drops. What a month!
- Ant Ling 2.6 1T
- Minimax M2.7
- Xiaomi MiMo V2.5
- Poolside Laguna XS.2
- Tencent Hy3-preview
- IBM Granite 4.1
Tomasz Limisiewicz (@tomlimi)

We present Compute Optimal Tokenization! 🔡 LLM scaling works commonly stick to one tokenizer while sweeping data and model size. But what happens when we also control the tokenizer’s compression rate (bytes/token)? Here we sweep tokenizers, params, and data across compute budgets: [1/N]
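The controlled variable in this thread, compression rate in bytes per token, is easy to pin down with a toy example. The whitespace and character tokenizers here are illustrative stand-ins, not the paper's tokenizers:

```python
def compression_rate(text, tokens):
    """Bytes per token: UTF-8 length of the text divided by the token count."""
    return len(text.encode("utf-8")) / len(tokens)

text = "compute optimal tokenization"
coarse = text.split()   # word-level tokens: fewer tokens, higher bytes/token
fine = list(text)       # character-level tokens: bytes/token of 1.0 for ASCII
r_coarse = compression_rate(text, coarse)
r_fine = compression_rate(text, fine)
```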

himanshu dubey (@himanshustwts)

Today we introduce Physera to the world! Physera is an applied research and product lab working at the intersection of model efficiency and behavioural simulations. We are rethinking each layer of the AI stack from first principles. 1. We believe there has been no better time to