y7xyz (@y7xyz_) 's Twitter Profile
y7xyz

@y7xyz_

ID: 1917477549774626819

Joined: 30-04-2025 07:14:22

33 Tweets

13 Followers

78 Following

David Hendrickson (@teksedge):

🚀 IBM just soft-launched Granite 4.1! 🔥

It's a new family of dense, open-source models (Apache 2.0) built for real enterprise workloads.

Another personal inferencing candidate.

📦 Full Family (128K context):
• 30B: Highest performance
• 8B: Sweet spot (GSM8K 92.5% •
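
A quick way to kick the tires on a dense family like this is plain transformers. The sketch below is a minimal example; the repo id is a guess at the naming scheme rather than a confirmed model card, so treat it as a placeholder.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: the exact Granite 4.1 model names are not confirmed in the post.
model_id = "ibm-granite/granite-4.1-8b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # a dense 8B model is roughly 16 GB of weights at bf16
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the key risks in this quarter's incident log."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
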
🧬Maxpein🧬 (@maximumpain333):

WE LIVE IN A MIND MATTER UNIVERSE. Without a deep understanding of our own energy field we are missing out on the number one contributor to our health and well-being. Even more, your field holds all of your trauma, memory, loops, patterns, and is essentially the fingerprint of your

Mario Nawfal’s Roundtable (@roundtablespace):

Someone just publicly committed to beating Claude Code with a fully local alternative by end of year. They're building vllm-studio - a control panel for VLLM, SGLang, llama.cpp, and exllamav3. The local AI war just got a named target.

AJ 💙 (@itsmeajaykv):

Qwen3.6-35B-A3B (TQ3_4S ~4bpw) on RTX 3060 (12GB) via llama.cpp-tq3 (TurboQuant):

• ~619 t/s prompt (4K ctx)
• ~60 t/s generation (128K ctx)
• fits in ~12.4GB VRAM
128K context with usable decode speed on a single 3060 is kind of wild
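
Throughput claims like this are easy to sanity-check. The sketch below times a running llama.cpp llama-server through its OpenAI-compatible endpoint: a one-token request approximates prompt-processing speed, a longer request approximates decode speed. The port and prompt are placeholders, the method is a rough approximation (prompt caching between the two calls will skew it), and it is not the setup the author used.

import time, requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumes llama-server is already running here
PROMPT = "lorem ipsum " * 2000                     # crude stand-in for a ~4K-token prompt

def timed(max_tokens):
    body = {"messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": max_tokens, "temperature": 0}
    t0 = time.time()
    resp = requests.post(URL, json=body, timeout=600).json()
    return time.time() - t0, resp["usage"]

# A 1-token generation is dominated by prefill (prompt processing).
prefill_s, usage = timed(1)
print(f"prompt: ~{usage['prompt_tokens'] / prefill_s:.0f} t/s")

# Subtract the prefill estimate from a longer run to approximate pure decode speed.
total_s, usage = timed(512)
print(f"generation: ~{usage['completion_tokens'] / max(total_s - prefill_s, 1e-6):.0f} t/s")
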
ollama (@ollama):

🤯 Ollama now supports Claude Desktop via Claude’s built-in third party inference.

ollama launch claude-desktop

This allows all models from Ollama's Cloud to be used across Claude Cowork and Claude Code from the Claude Desktop app.
kosovi (@nimses1010):

The Rocket Sorcerer: the secret the intelligence agencies hid about "opening the gates of the dimensions" in 1946!

A rocket scientist and founder of NASA's JPL laboratory, he led rocket-fuel research by day and practiced "Thelema magic" by night under the guidance of "Aleister Crowley".
🔹The document: FBI files (No. 100-245448) confirm his membership in a cult that practiced
ハカセ アイ(Ai-Hakase)🐾最新トレンドAIのためのX 🐾 (@ai_hakase_):

Llama.cpp finally gets MTP support! Local AI generation speed moves to another dimension 🚀

Long-awaited beta support for Multi-Token Prediction (MTP) has landed in Llama.cpp! By predicting multiple tokens at once, it makes local LLMs run dramatically faster.

🌟 Highlights
・Generation speed jumps by up to 1.5-2.0x
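
For context on why predicting multiple tokens at once speeds things up: extra tokens are drafted cheaply and then checked against the full model, and every draft token that survives verification is a token you did not pay a full sequential decode step for. The toy loop below illustrates only that accept/reject idea; it is not llama.cpp's actual MTP code, and a real implementation verifies all draft positions in one batched forward pass instead of looping.

def draft_and_verify(prompt, full_model_next, draft_next_k, n_new, k=4):
    """Toy accept/reject loop behind MTP-style speculative decoding.

    full_model_next(tokens) -> the full model's next token for this context
    draft_next_k(tokens, k) -> k cheaply drafted candidate tokens
    """
    out = list(prompt)
    while len(out) < len(prompt) + n_new:
        drafts = draft_next_k(out, k)          # cheap guesses for the next k tokens
        ctx = list(out)
        for guess in drafts:
            target = full_model_next(ctx)      # in practice one batched pass checks all k
            ctx.append(target)
            if target != guess:                # first mismatch: keep the correction, stop
                break
        out = ctx                              # accepts up to k tokens per full-model pass
    return out
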
Witcheer | b/era (@0xwitcheer):

benched 5 open-source models on my windows tower (RTX 4060 Ti 8GB, Ryzen 5 7600x, 32GB DDR5). 

all q4_k_m, LM Studio, full GPU offload, 16K context.

results:

> nemotron-3-nano-4b

80.7 t/s, 3.6GB VRAM. fastest small model I've benched on 8GB.

> gemma-4-e4b

68.5 t/s, 6.0GB
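
To run a comparison like this yourself, one rough approach is to loop over whatever models the LM Studio local server exposes and time a fixed generation per model. A hedged sketch, assuming the server is on its default port with the OpenAI-compatible API enabled; VRAM use still has to be read separately (e.g. from nvidia-smi), and wall-clock t/s here lumps prompt processing in with decode.

import time, requests

BASE = "http://localhost:1234/v1"   # LM Studio local server default address (assumption)

models = [m["id"] for m in requests.get(f"{BASE}/models").json()["data"]]

for model_id in models:
    body = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Explain KV caching in two short paragraphs."}],
        "max_tokens": 256,
        "temperature": 0,
    }
    t0 = time.time()
    resp = requests.post(f"{BASE}/chat/completions", json=body, timeout=600).json()
    elapsed = time.time() - t0
    gen_tokens = resp["usage"]["completion_tokens"]
    print(f"{model_id}: ~{gen_tokens / elapsed:.1f} t/s ({elapsed:.1f}s wall clock)")
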
Unsloth AI (@unslothai):

We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw.

Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM

Run with self-healing tool calls, code execution, web search via the Unsloth API endpoint and llama.cpp

Guide: unsloth.ai/docs/basics/api
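
The guide is the authoritative reference; purely as a sketch of the local serving half, the snippet below starts llama.cpp's llama-server on a GGUF and waits for its health check, which yields an OpenAI-compatible endpoint that coding agents can be pointed at. The model path and port are placeholders.

import subprocess, time, requests

MODEL = "models/Qwen3.6-coder-Q4_K_M.gguf"   # placeholder path to a downloaded GGUF
PORT = 8080

# -ngl 99 offloads all layers to the GPU; -c sets the context window the agent can use.
server = subprocess.Popen(
    ["llama-server", "-m", MODEL, "-c", "32768", "-ngl", "99", "--port", str(PORT)]
)

# Wait for the model to finish loading before pointing a client at it.
for _ in range(180):
    try:
        if requests.get(f"http://127.0.0.1:{PORT}/health", timeout=2).ok:
            print(f"OpenAI-compatible endpoint ready at http://127.0.0.1:{PORT}/v1")
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(1)
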
Hugging Models (@huggingmodels):

Meet Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-int4-AutoRound. A massive 27B parameter model that combines Qwen3.5's reasoning with Claude Opus distillation. This is advanced text and image reasoning compressed into a 4-bit quantized package, making it runnable on consumer
Hugging Models (@huggingmodels):

Just dropped: storagejuju/kimi-k2.6-ud-q8-k-xl-juju. A custom compressed model using kimi_k25 architecture. It's designed for efficient, high-performance inference in the US region. Get ready for a new level of AI speed.
Google AI Developers (@googleaidevs):

Speed up your Gemma 4 workflows by up to 3x with Multi-Token Prediction (MTP) drafters.

Standard LLM inference is fundamentally memory-bandwidth bound, creating a latency bottleneck as billions of parameters travel from VRAM just to generate a single token. We're working to ease
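
The bandwidth-bound point can be made with simple arithmetic: at batch size 1, each decoded token needs roughly one full read of the weights from VRAM, so memory bandwidth divided by model size bounds tokens per second. The numbers below are illustrative assumptions, not published Gemma 4 or GPU specs.

# Back-of-envelope decode ceiling: tokens/s <= memory bandwidth / bytes read per token.
weights_gb = 9 * 2        # hypothetical 9B-parameter model at bf16 (2 bytes per parameter)
bandwidth_gb_s = 960      # hypothetical GPU memory bandwidth

print(f"~{bandwidth_gb_s / weights_gb:.0f} tokens/s upper bound per sequence")  # ~53 t/s

# An MTP drafter lets one pass of the big model accept several drafted tokens,
# raising the effective tokens produced per weight read, which is where "up to 3x" comes from.
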
Ton Incubator (@ton_incubator):

Elon Musk said he once had dinner with a top physicist and a top computer scientist and asked them what they thought the probability was that we are living in a simulation. They answered simultaneously, 0% and 100% respectively. It was like a double-slit experiment, but with
vLLM (@vllm_project):

🚀 Day-0 MTP support for Gemma4 now available at vLLM with ready-to-use docker image!

⚡️Enjoy up to 3x faster decoding performance to supercharge your development with zero quality degradation!

Check out the full vLLM recipes for Gemma 4 model series👇
recipes.vllm.ai/Google/gemma-4…
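
For the non-Docker route, here is a minimal offline sketch with the vLLM Python API; the model id is a placeholder, and the MTP/speculative settings that deliver the advertised decode speedup are deliberately left to the linked recipes rather than guessed at here.

from vllm import LLM, SamplingParams

# Placeholder model id; see the recipes page for the exact Gemma 4 checkpoints
# and the speculative/MTP configuration that enables the faster decode path.
llm = LLM(model="google/gemma-4-instruct-placeholder", max_model_len=8192)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Write a short note on speculative decoding."], params)
print(outputs[0].outputs[0].text)
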
طريق البيتكوين (@bitcoin_way):

📶 A new Silicon Data index tracks the cost of "digital fuel", that is, the cost of running AI models per million tokens 📊 The core idea: just as there are indices for oil or metals prices, this is an index for AI.

This index measures "spending on AI large language model tokens" (LLM Token
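
As a concrete reading of the unit being tracked, here is a trivial cost calculation; the price and volume are made-up numbers purely to show how "per million tokens" translates into spend.

# Hypothetical numbers, only to illustrate the "per million tokens" unit.
price_per_million_usd = 0.60        # made-up price per 1M generated tokens
tokens_per_month = 2_000_000_000    # made-up monthly volume (2B tokens)

monthly_spend = price_per_million_usd * tokens_per_month / 1_000_000
print(f"${monthly_spend:,.0f} per month")   # -> $1,200 per month
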
Akshay 🚀 (@akshay_pachaar):

NVIDIA + Unsloth just dropped a guide on making fine-tuning 25% faster.

this is hands-down the cleanest systems-level writeup i've read.

you'll learn how 3 optimizations help your gpu train models faster:

1. packed-sequence metadata caching
2. double-buffered checkpoint
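
The list is cut off above, but the first optimization is easy to illustrate: sequence packing concatenates short tokenized samples into fixed-length rows so the GPU spends almost no compute on padding, and the per-sample boundary metadata that packing requires is exactly what the caching trick avoids recomputing. The greedy sketch below shows the packing idea only; it is not the NVIDIA/Unsloth implementation.

def pack_sequences(tokenized_samples, max_len, pad_id=0):
    """Greedy packing: fill each row of max_len tokens with as many samples as fit."""
    rows, current = [], []
    for sample in sorted(tokenized_samples, key=len, reverse=True):
        sample = sample[:max_len]                 # truncate anything longer than a row
        if len(current) + len(sample) <= max_len:
            current.extend(sample)
        else:
            rows.append(current)
            current = list(sample)
    if current:
        rows.append(current)
    # Only the tail of each packed row is padded; a real trainer also builds
    # per-sample position ids / attention boundaries so samples don't attend to each other.
    return [row + [pad_id] * (max_len - len(row)) for row in rows]

# Example: three short samples packed into one 16-token row instead of three padded rows.
print(pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_len=16))
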
Perplexity (@perplexity_ai):

We’ve developed our own inference engine Runtime-Optimized Serving Engine (ROSE) to serve models ranging from embeddings to trillion-parameter LLMs.

With CuTeDSL integrated into our inference engine, Perplexity can build the specialized GPU kernels faster to bring models up to