Mario Sieg (@_mario_neo_)'s Twitter Profile
Mario Sieg

@_mario_neo_

Software Engineer | Hacker | Math @TUBerlin.
ML, Game Engines, Compilers, HPC.

ID: 1762642751114674176

Link: https://github.com/MarioSieg | Joined: 28-02-2024 00:56:14

54 Tweets

2.2K Followers

44 Following

Vincent Weisser (@vincentweisser)'s Twitter Profile Photo

Awesome work by Mario Sieg to accelerate quantization of pseudo-gradients in decentralized training settings like DiLoCo - already integrated in pccl (prime collective communication library)
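
For context: in DiLoCo-style training, each peer periodically exchanges a pseudo-gradient, the difference between the last synchronized weights and its locally updated ones, and quantizing that tensor is what shrinks the communication. A minimal NumPy sketch of the idea; the function names and the int8, per-tensor-scale scheme are illustrative assumptions, not pccl's actual API.

```python
# Minimal sketch of pseudo-gradient quantization (illustrative only; these
# function names and the int8 scheme are assumptions, not pccl's API).
import numpy as np

def quantize_pseudo_grad(prev_global: np.ndarray, local: np.ndarray):
    """Quantize the pseudo-gradient (prev_global - local) to int8 with a per-tensor scale."""
    pseudo_grad = prev_global - local                 # the "outer gradient" peers exchange
    scale = max(float(np.abs(pseudo_grad).max()) / 127.0, 1e-12)
    q = np.clip(np.rint(pseudo_grad / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w_prev = np.random.randn(1024).astype(np.float32)     # last synchronized weights
w_local = w_prev - 0.01 * np.random.randn(1024).astype(np.float32)  # after local steps
q, s = quantize_pseudo_grad(w_prev, w_local)          # int8 payload: 4x smaller than fp32
```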

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

To implement a GPT-2 in my custom PyTorch-like ML framework, I added boolean tensors.

Boolean tensors are used for filtering, indexing, attention masks, loss masks, and much more.
The main logical operators AND, OR, XOR, and NOT are now supported.

One more step towards LLMs.
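
A sketch of what these boolean tensors enable, written against the PyTorch API that Magnetron mirrors (torch stands in here because the post doesn't show Magnetron's own calls):

```python
# What boolean tensors enable, shown with the PyTorch API that Magnetron
# mirrors (the exact Magnetron calls may differ; this is a sketch).
import torch

T = 4
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))  # causal attention mask
pad = torch.tensor([True, True, True, False])            # padding / loss mask

both = causal & pad        # AND: combine the two masks (pad broadcasts over rows)
either = causal | pad      # OR
diff = causal ^ pad        # XOR
flipped = ~causal          # NOT

scores = torch.randn(T, T)
scores = scores.masked_fill(~both, float("-inf"))  # mask disallowed positions
picked = scores[both]                              # boolean indexing / filtering
```

Masking with `-inf` before a softmax is exactly the attention-mask use case the post mentions.
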
Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

This is not PyTorch.

It’s Magnetron - my tiny ML framework with a PyTorch-like API, designed for microcontrollers and IoT.
Now supports nn.Module, nn.Linear, nn.Sequential, nn.ModuleList, nn.ModuleDict, and more.
The API has gotten very close to PyTorch's over the last month, with more to come!
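
A sketch of the module API being described, with torch.nn standing in for Magnetron's (the post doesn't show Magnetron's import path; only the class names are confirmed to match):

```python
# The module API in question, with torch.nn standing in for Magnetron's.
import torch
from torch import nn

class MLP(nn.Module):                      # nn.Module: base class with parameter tracking
    def __init__(self, dim: int, hidden: int, classes: int):
        super().__init__()
        self.body = nn.Sequential(         # nn.Sequential chains layers in order
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, classes),
        )

    def forward(self, x):
        return self.body(x)

model = MLP(dim=16, hidden=32, classes=4)
out = model(torch.randn(8, 16))            # -> shape (8, 4)
```
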
Prime Intellect (@primeintellect)'s Twitter Profile Photo

Launching SYNTHETIC-2: our next-gen open reasoning dataset and planetary-scale synthetic data generation run. Powered by our P2P inference stack and DeepSeek-R1-0528, it verifies traces for the hardest RL tasks. Contribute towards AGI via open, permissionless compute.

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

Our fast quantization library piquant will support 2-bit quantization and new 4-bit kernels for even higher performance on AVX-512 CPUs in the next release. Get ready to crunch those packed integers!
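
What "packed integers" means at 2 bits: four codes per byte. A NumPy sketch of one possible layout; piquant's actual bit order, API, and AVX-512 kernels are not shown here.

```python
# 2-bit packing sketch: four codes per byte (illustrative layout only).
import numpy as np

def pack_int2(codes: np.ndarray) -> np.ndarray:
    """Pack codes in [0, 3] (length divisible by 4) into bytes, low bits first."""
    c = codes.astype(np.uint8).reshape(-1, 4)
    return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

def unpack_int2(packed: np.ndarray) -> np.ndarray:
    return np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1).reshape(-1)

codes = np.array([0, 1, 2, 3, 3, 2, 1, 0])
assert np.array_equal(unpack_int2(pack_int2(codes)), codes)
```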

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

Sometimes I have creative "attacks" where I build random stuff. Last time it was techno music generated with pure code; this time it's a small cryptocurrency... It's not about money, it's about exploring, learning, and having fun. This approach taught me 99% of what I

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

Upcoming features of piquant, our blazingly fast quantization library:
- int2 quantization
- direct quantization of bf16 tensors
- sign quantization
- SIMD kernels for stochastic rounding
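
Of those features, stochastic rounding is worth a sketch: round up with probability equal to the fractional part, so the quantization error is zero in expectation. Plain NumPy for illustration; piquant's SIMD kernels will look very different.

```python
# Stochastic rounding: round up with probability equal to the fractional part,
# making the rounding unbiased in expectation (NumPy illustration only).
import numpy as np

def stochastic_round(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    floor = np.floor(x)
    frac = x - floor                              # distance to the lower integer
    return floor + (rng.random(x.shape) < frac)   # E[result] == x

rng = np.random.default_rng(0)
x = np.full(100_000, 0.25)
print(stochastic_round(x, rng).mean())  # ~0.25, vs. 0.0 under round-to-nearest
```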

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

magnetron's GPT-2 inference example is working!!

One year ago I wrote the first C file to build a small PyTorch clone; today LLMs can be implemented with it.
It took a lot of hours and work to get everything right, and I can't wait to continue with even more advanced models like
Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

My attempt at creating a creepy "Jarvis-like" sci-fi horror AI assistant. Built him a year ago in C++ with raylib for rendering and whisper.cpp for speech recognition. The mouth animation definitely needs some work

Mario Sieg (@_mario_neo_)'s Twitter Profile Photo

my matrix multiplication kernels for magnetron now beat PyTorch's performance on my CPU:

Magnetron matmul: 1955.2 GFLOP/s
Torch matmul: 1752.3 GFLOP/s

check out the kernel code: github.com/MarioSieg/magn…

magnetron detects cache sizes and uses state-of-the-art block tuning
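
The idea behind that cache-aware block tuning: split the matrices into tiles small enough to stay resident in cache, so each loaded block is reused many times before eviction. A NumPy sketch of the loop structure only; the real kernels are hand-tuned C/SIMD, and the block size below is an arbitrary illustrative choice, not a detected one.

```python
# Blocked (tiled) matmul: the loop structure behind cache-aware block tuning.
import numpy as np

def blocked_matmul(A: np.ndarray, B: np.ndarray, block: int = 64) -> np.ndarray:
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, block):                  # tile rows of A / C
        for k in range(0, K, block):              # tile the shared dimension
            for j in range(0, N, block):          # tile columns of B / C
                C[i:i+block, j:j+block] += A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
    return C

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
assert np.allclose(blocked_matmul(A, B), A @ B)
```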