NicoNico🦇🔊 (@niangao_g) Twitter Tweets • TwiCopy

NicoNico🦇🔊

@niangao_g

🎓 PhD student @ Hasso Plattner Institute | 🤖 Crafting smarter, not harder AI | Exploring the edges of efficiency

ID: 802190125757124608

Joined: 25-11-2016 16:40:04

592 Tweets

168 Followers

2.2K Following

NicoNico🦇🔊

@niangao_g

2 years ago

This saves all the GPU-poor 😃 #llms #mlx #ChatGPT #Startup

Likes: 1 | Replies: 0 | Retweets: 0

nat | localhost: auriel 🇺🇦🇬🇪🇪🇺

@theaiobserverx

2 years ago

Green-Bit-LLM github.com/GreenBitAI/gre…

Likes: 1 | Replies: 1 | Retweets: 1

Lucas Atkins

Happy to share DeepMixtral-8x7b-Instruct. A direct extraction/transfer of Mixtral Instruct's experts into Deepseek's architecture. Performance is identical, if not even a bit better, and seems more malleable to training. Collaborators: Eric Hartford, Fernando Fernandes Neto.

Likes: 122 | Replies: 9 | Retweets: 14

NicoNico🦇🔊

@niangao_g

2 years ago

Got some nice pictures from GPT-4 👉huggingface.co/GreenBitAI 👉github.com/GreenBitAI/gre…

Likes: 7 | Replies: 1 | Retweets: 2

Ramsri Goutham Golla

@ramsri_goutham

2 years ago

Honestly my thought currently running 2 production AI SaaS apps! 1. Anthropic's Claude is too weak - Ask it to be sarcastic, it says I can't hurt sentiments etc. Bruh! Lacking JSON mode is a bummer. You can try workarounds but any JSON with nesting is a pain! 2. Even if open

Likes: 21 | Replies: 3 | Retweets: 1

Awni Hannun

@awnihannun

2 years ago

pip install -U mlx

Likes: 85 | Replies: 3 | Retweets: 10

NicoNico🦇🔊

@niangao_g

2 years ago

It is good to know that Apple's edge deployment utilizes the same low-bit layout (a mix of INT4/INT2) as green-bit-llm. huggingface.co/blog/NicoNico/…

Likes: 2 | Replies: 0 | Retweets: 1

Awni Hannun

@awnihannun

5 months ago

I'm super excited about M5. It's going to help a lot with compute-bound workloads in MLX. For example:
- Much faster prefill. In other words time-to-first-token will go down.
- Faster image / video generation
- Faster fine-tuning (LoRA or otherwise)
- Higher throughput for

Likes: 1.1K | Replies: 53 | Retweets: 106

Awni Hannun

@awnihannun

5 months ago

LoRA fine-tuning Qwen3 4B on the DGX Spark with mlx / mlx-lm. Gets a very respectable ~1200 tok/sec.

Likes: 273 | Replies: 15 | Retweets: 16

Ido Salomon

@idosal1

2 months ago

Building AgentCraft v1 with AgentCraft v0 is 🤌 Managed up to 9 Claude Code agents with the RTS interface so far. There's a lot to explore, but it feels right. v1 coming soon

Likes: 2.2K | Replies: 195 | Retweets: 159
