Victor (@denverrxx)'s Twitter Profile
Victor

@denverrxx

math and LLMs

ID: 1432383045085106185

Link: http://github.com/heisdenverr · Joined: 30-08-2021 16:42:02

1.1K Tweets

363 Followers

598 Following

Saptak Bhoumik (@saptakbhoumik)

Finished my work early. It explores a new memory system for AI that I came up with. It's my first paper, so I would appreciate any kind of feedback. Link to the paper: zenodo.org/records/152201… Make sure to share it with people who you think might be interested.

status effects (@status_effects)

New work with Alec Radford and David Duvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:

Mistral AI (@mistralai)

🆕 Today, we're releasing the public preview of Workflows, the orchestration layer for enterprise AI. 🌎 Enterprise teams have capable models. What they don't have is a way to run them reliably in production. That's the gap Workflows fills. It takes AI-powered business processes

Tensara (@tensarahq)

PyPTX by Patrick C Toulme is available on Tensara now! Submit and benchmark PTX kernels directly from Python across 80+ problems like GEMM, MXFP/NVFP, Attention, and more!

Mistral Vibe (@mistralvibe)

Introducing remote agents in Vibe and Mistral Medium 3.5. You can now launch remote agents in the cloud, including from the CLI or Le Chat. Plus, new Work mode in Le Chat for complex, multi-step tasks. 🧵

Qwen (@alibaba_qwen)

Today we’re releasing Qwen-Scope 🔭, an open suite of sparse autoencoders for the Qwen model family. It turns SAE features into practical tools:

🎯 Inference — Steer model outputs by directly manipulating internal features, no prompt engineering needed
📂 Data — Classify &
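The "steer by manipulating internal features" idea above can be sketched in a few lines: a trained SAE assigns each feature a decoder direction, and steering adds a scaled copy of that direction to a hidden activation. Everything here (dimensions, vectors, the `steer` helper) is a made-up toy, not Qwen-Scope's actual API.

```python
def steer(activation, feature_direction, alpha):
    """Add alpha times an SAE feature's decoder direction to a hidden activation.
    In a real setup, feature_direction would be a column of a trained SAE decoder."""
    return [h + alpha * d for h, d in zip(activation, feature_direction)]

h = [0.2, -0.5, 1.0]    # hypothetical hidden activation
d = [1.0, 0.0, -1.0]    # hypothetical feature direction
steered = steer(h, d, alpha=2.0)   # nudges the activation along the feature
```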
Massimo (@rainmaker1973)

When you start a chess game, you have 20 possible moves available. After the first full move (White then Black), there are already over 400 possible positions. By the third move, that number jumps to around 8,900, and after the fourth it reaches nearly 200,000.

By the time you
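The first two figures in the tweet follow from simple counting; the later ones require enumerating legal moves (the commonly cited perft values are 8,902 positions after three plies and 197,281 after four). A quick sanity check of the counting part:

```python
# Counting behind the tweet's first two numbers (not a chess engine):
pawn_moves = 16      # each of the 8 pawns can advance one or two squares
knight_moves = 4     # each knight has two legal first moves
white_openings = pawn_moves + knight_moves   # 20 first moves for White
after_one_full_move = white_openings * 20    # Black also has 20 replies -> 400
```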
Victor (@denverrxx)

Spent this week building a Sparse Autoencoder on Qwen2-0.5B from scratch — hooking into layers 9-14 MLP activations, training on CPU, finding neurons that fire on NHL trades and measurement units across 18k token activations. Then Qwen drops this. Timing 🤝 Qwen
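A sparse autoencoder like the one described above can be sketched minimally: an overcomplete ReLU encoder produces (mostly zero) feature activations, a linear decoder reconstructs the input, and training minimizes reconstruction error plus an L1 sparsity penalty. Dimensions here are toy-sized and the weights untrained; the real setup hooks Qwen2-0.5B MLP activations.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class TinySAE:
    """Toy sparse autoencoder: ReLU encoder into an overcomplete feature space,
    linear decoder back to the input space. Random weights, i.e. untrained."""
    def __init__(self, d_in, d_feat, seed=0):
        rng = random.Random(seed)
        self.W_enc = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(d_feat)]
        self.W_dec = [[rng.gauss(0, 0.5) for _ in range(d_feat)] for _ in range(d_in)]

    def encode(self, x):
        return relu(matvec(self.W_enc, x))   # sparse feature activations

    def decode(self, f):
        return matvec(self.W_dec, f)         # reconstruction of the activation

sae = TinySAE(d_in=4, d_feat=16)
x = [0.5, -1.0, 2.0, 0.0]                   # stand-in for an MLP activation
feats = sae.encode(x)
recon = sae.decode(feats)
# training objective: MSE reconstruction loss + L1 penalty that induces sparsity
loss = sum((a - b) ** 2 for a, b in zip(x, recon)) + 1e-3 * sum(feats)
```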

Mike (@elbasatguy)

Just pushed my CUDA-based spectrum analyzer to GitHub.

Fast FFT, GPU-powered, built for SDR workflows.

github.com/mebrown47/CUDA…
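For context on what a spectrum analyzer computes: the magnitude of the discrete Fourier transform of a sampled signal. A naive O(n²) DFT in plain Python shows the idea; the linked project presumably uses a GPU FFT, which computes the same result in O(n log n).

```python
import cmath
import math

def dft_magnitude(signal):
    """Naive O(n^2) DFT magnitude spectrum over all n frequency bins."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

# One full sine cycle over 8 samples: energy lands in bins 1 and 7 (magnitude n/2 = 4).
sig = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spec = dft_magnitude(sig)
```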
Denis Wirtz (@deniswirtz)

Big paper coming out soon.

Using AI, we mapped embryos of mice, alligators, turtles, rhesus macaques, and chickens in 3D and at single-cell resolution.

We discovered something truly remarkable...stay tuned!
Sebastian Raschka (@rasbt)

Here is a 2nd batch of April architecture drops. What a month!
- Ant Ling 2.6 1T
- Minimax M2.7
- Xiaomi MiMo V2.5
- Poolside Laguna XS.2
- Tencent Hy3-preview
- IBM Granite 4.1
Tomasz Limisiewicz (@tomlimi)

We present Compute Optimal Tokenization! 🔡 LLM scaling works commonly stick to one tokenizer while sweeping data and model size. But what happens when we also control the tokenizer’s compression rate (bytes/token)? Here we sweep tokenizers, params, and data across compute budgets: [1/N]
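The controlled variable in this thread, compression rate in bytes per token, is easy to pin down with a toy example. The whitespace and character tokenizers here are illustrative stand-ins, not the paper's tokenizers:

```python
def compression_rate(text, tokens):
    """Bytes per token: UTF-8 length of the text divided by the token count."""
    return len(text.encode("utf-8")) / len(tokens)

text = "compute optimal tokenization"
coarse = text.split()   # word-level tokens: fewer tokens, higher bytes/token
fine = list(text)       # character-level tokens: bytes/token of 1.0 for ASCII
r_coarse = compression_rate(text, coarse)
r_fine = compression_rate(text, fine)
```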

himanshu dubey (@himanshustwts)

Today we introduce Physera to the world! Physera is an applied research and product lab working at the intersection of model efficiency and behavioural simulations. We are rethinking each layer of the AI stack from first principles. 1. We believe there has been no better time to