Mario Sieg (@_mario_neo_)'s Twitter Profile
Mario Sieg

@_mario_neo_

Software Engineer | Hacker | Math @TUBerlin.
ML, Game Engines, Compilers, HPC.

ID: 1762642751114674176

Link: https://github.com/MarioSieg | Joined: 28-02-2024 00:56:14

54 Tweets

2.2K Followers

44 Following

Vincent Weisser (@vincentweisser)'s Twitter Profile Photo

Awesome work by Mario Sieg to accelerate quantization of pseudo-gradients in decentralized training settings like DiLoCo - already integrated in pccl (prime collective communication library)
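
For context: in DiLoCo-style training, each peer periodically exchanges a pseudo-gradient, the difference between the last synchronized weights and its locally updated ones, and quantizing that tensor is what shrinks the communication. A minimal NumPy sketch of the idea; the function names and the int8, per-tensor-scale scheme are illustrative assumptions, not pccl's actual API.

```python
# Minimal sketch of pseudo-gradient quantization (illustrative only; these
# function names and the int8 scheme are assumptions, not pccl's API).
import numpy as np

def quantize_pseudo_grad(prev_global: np.ndarray, local: np.ndarray):
    """Quantize the pseudo-gradient (prev_global - local) to int8 with a per-tensor scale."""
    pseudo_grad = prev_global - local                 # the "outer gradient" peers exchange
    scale = max(float(np.abs(pseudo_grad).max()) / 127.0, 1e-12)
    q = np.clip(np.rint(pseudo_grad / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w_prev = np.random.randn(1024).astype(np.float32)     # last synchronized weights
w_local = w_prev - 0.01 * np.random.randn(1024).astype(np.float32)  # after local steps
q, s = quantize_pseudo_grad(w_prev, w_local)          # int8 payload: 4x smaller than fp32
```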

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

To implement a GPT-2 in my custom PyTorch-like ML framework, I added boolean tensors.

Boolean tensors are used for filtering, indexing, attention masks, loss masks, and much more.
The main logical operators AND, OR, XOR, and NOT are now supported.

One more step towards LLMs.
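
A sketch of what these boolean tensors enable, written against the PyTorch API that Magnetron mirrors (torch stands in here because the post doesn't show Magnetron's own calls):

```python
# What boolean tensors enable, shown with the PyTorch API that Magnetron
# mirrors (the exact Magnetron calls may differ; this is a sketch).
import torch

T = 4
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))  # causal attention mask
pad = torch.tensor([True, True, True, False])            # padding / loss mask

both = causal & pad        # AND: combine the two masks (pad broadcasts over rows)
either = causal | pad      # OR
diff = causal ^ pad        # XOR
flipped = ~causal          # NOT

scores = torch.randn(T, T)
scores = scores.masked_fill(~both, float("-inf"))  # mask disallowed positions
picked = scores[both]                              # boolean indexing / filtering
```

Masking with `-inf` before a softmax is exactly the attention-mask use case the post mentions.
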
Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

This is not PyTorch.

It’s Magnetron - my tiny ML framework with a PyTorch-like API, designed for microcontrollers and IoT.
Now supports nn.Module, nn.Linear, nn.Sequential, nn.ModuleList, nn.ModuleDict, and more.
The API has gotten very close to PyTorch's over the last month, with more to come!
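
A sketch of the module API being described, with torch.nn standing in for Magnetron's (the post doesn't show Magnetron's import path; only the class names are confirmed to match):

```python
# The module API in question, with torch.nn standing in for Magnetron's.
import torch
from torch import nn

class MLP(nn.Module):                      # nn.Module: base class with parameter tracking
    def __init__(self, dim: int, hidden: int, classes: int):
        super().__init__()
        self.body = nn.Sequential(         # nn.Sequential chains layers in order
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, classes),
        )

    def forward(self, x):
        return self.body(x)

model = MLP(dim=16, hidden=32, classes=4)
out = model(torch.randn(8, 16))            # -> shape (8, 4)
```
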
Prime Intellect (@primeintellect)'s Twitter Profile Photo

Launching SYNTHETIC-2: our next-gen open reasoning dataset and planetary-scale synthetic data generation run. Powered by our P2P inference stack and DeepSeek-R1-0528, it verifies traces for the hardest RL tasks. Contribute towards AGI via open, permissionless compute.

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

Our fast quantization library piquant will support 2-bit quantization and new 4-bit kernels for even higher performance on AVX-512 CPUs in the next release. Get ready to crunch those packed integers!
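
What "packed integers" means at 2 bits: four codes per byte. A NumPy sketch of one possible layout; piquant's actual bit order, API, and AVX-512 kernels are not shown here.

```python
# 2-bit packing sketch: four codes per byte (illustrative layout only).
import numpy as np

def pack_int2(codes: np.ndarray) -> np.ndarray:
    """Pack codes in [0, 3] (length divisible by 4) into bytes, low bits first."""
    c = codes.astype(np.uint8).reshape(-1, 4)
    return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

def unpack_int2(packed: np.ndarray) -> np.ndarray:
    return np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)

codes = np.array([0, 1, 2, 3, 3, 2, 1, 0])
assert np.array_equal(unpack_int2(pack_int2(codes)), codes)
```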

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

Sometimes I have creative "attacks" where I build random stuff. Last time it was techno music generated with pure code; this time it's a small cryptocurrency... It's not about money, it's about exploring, learning, and having fun. This approach taught me 99% of what I

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

Upcoming features of piquant, our blazingly fast quantization library:
- int2 quantization
- direct quantization of bf16 tensors
- sign quantization
- SIMD kernels for stochastic rounding
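
Of those features, stochastic rounding is worth a sketch: round up with probability equal to the fractional part, so the quantization error is zero in expectation. Plain NumPy for illustration; piquant's SIMD kernels will look very different.

```python
# Stochastic rounding: round up with probability equal to the fractional part,
# making the rounding unbiased in expectation (NumPy illustration only).
import numpy as np

def stochastic_round(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    floor = np.floor(x)
    frac = x - floor                              # distance to the lower integer
    return floor + (rng.random(x.shape) < frac)   # E[result] == x

rng = np.random.default_rng(0)
x = np.full(100_000, 0.25)
print(stochastic_round(x, rng).mean())  # ~0.25, vs. 0.0 under round-to-nearest
```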

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

magnetron's GPT-2 inference example is working!!

One year ago I wrote the first C file to build a small PyTorch clone; today LLMs can be implemented with it.
It took a lot of hours and work to get everything right, and I can't wait to continue with even more advanced models like
Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

My attempt at creating a creepy "Jarvis-like" sci-fi horror AI assistant. Built him a year ago in C++ with raylib for rendering and whisper.cpp for speech recognition. The mouth animation definitely needs some work

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

my matrix multiplication kernels for magnetron now beat PyTorch's performance on my CPU:

Magnetron matmul: 1955.2 GFLOP/s
Torch matmul: 1752.3 GFLOP/s

check out the kernel code: github.com/MarioSieg/magn…

magnetron detects cache sizes and uses state-of-the-art block tuning
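
The idea behind that cache-aware block tuning: split the matrices into tiles small enough to stay resident in cache, so each loaded block is reused many times before eviction. A NumPy sketch of the loop structure only; the real kernels are hand-tuned C/SIMD, and the block size below is an arbitrary illustrative choice, not a detected one.

```python
# Blocked (tiled) matmul: the loop structure behind cache-aware block tuning.
import numpy as np

def blocked_matmul(A: np.ndarray, B: np.ndarray, block: int = 64) -> np.ndarray:
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, block):                  # tile rows of A / C
        for k in range(0, K, block):              # tile the shared dimension
            for j in range(0, N, block):          # tile columns of B / C
                C[i:i+block, j:j+block] += A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
    return C

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
assert np.allclose(blocked_matmul(A, B), A @ B)
```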