Avinash Sooriyarachchi (@avitwit3) Twitter Tweets • TwiCopy

Andrej Karpathy

2 years ago

# on shortification of "learning" There are a lot of videos on YouTube/TikTok etc. that give the appearance of education, but if you look closely they are really just entertainment. This is very convenient for everyone involved: the people watching enjoy thinking they are

Avinash Sooriyarachchi

@avitwit3

2 years ago

I wanted to extend the simple from-scratch MoE LM implementation I wrote with expert capacity. Given Grok-1 is open source, hope this helps understand MoEs a bit better. Again, the base for this is makemore/nanoGPT from Andrej Karpathy. huggingface.co/blog/AviSoori1…

Avinash Sooriyarachchi

@avitwit3

2 years ago

I took a stab at implementing a vision language model from scratch in pure PyTorch. The inspiration for this is moondream 2 from vik. I basically modified makemore from Andrej Karpathy and built everything else around it. Here’s my write-up: huggingface.co/blog/AviSoori1…

Devendra Chaplot

@dchaplot

2 years ago

Excited to announce Mistral-NeMo 12B trained in collab with NVIDIA! - Outperforms Gemma2 9B and Llama3 8B - 128K context - Multilingual in 100+ languages: excels in European, Asian & Indian languages - Quant-Aware Training at FP8 - Apache 2.0 Blog: mistral.ai/news/mistral-n…

Jeremy Howard

@jeremyphoward

2 years ago

Why didn't anyone tell me how amazingly great Bootstrap has gotten in recent years? Wish I'd known sooner -- would have saved me so much time futzing around with Tailwind classes. getbootstrap.com

Guillaume Lample @ NeurIPS 2024

@guillaumelample

2 years ago

Today, we release Mistral Large 2, the new version of our largest model. Mistral Large 2 is a 123B-parameter model with a 128k context window. On many benchmarks (notably in code generation and math), it is superior or on par with Llama 3.1 405B. Like Mistral NeMo, it was trained

Mistral AI

@mistralai

2 years ago

mistral.ai/news/build-twe…

Avinash Sooriyarachchi

@avitwit3

2 years ago

I see a bunch of people look at benchmarks and think of Large 2 as ‘the other model’ and not as performant as 3.5 Sonnet and 4o. Honestly till you try it out for your particular use case, you really wouldn’t know. If it doesn’t cut it, it’s fine. But at least you know

Avinash Sooriyarachchi

@avitwit3

2 years ago

This is super cool!

Mistral AI

@mistralai

2 years ago

mistral.ai/news/ministrau…

Avinash Sooriyarachchi

@avitwit3

2 years ago

I’ve seen a lot of interest from developers in reducing cost and deploying LLMs on device. With these new models from Mistral AI and our QAT stack, on-device deployment without degradation is a reality. Amazing work, Pierre Stock, Sandeep Subramanian, Teven Le Scao, and team!!

Grant Sanderson

@3blue1brown

a year ago

I learned yesterday the video I made in 2017 explaining how Bitcoin works was taken down, and my channel received a copyright strike (despite it being 100% my own content). The request seems to have been issued by a company chainpatrol, on behalf of Arbitrum, whose website says

Jeremy Howard

@jeremyphoward

a year ago

The first pytorch release without official conda support:

Mistral AI

@mistralai

a year ago

magnet:?xt=urn:btih:11f2d1ca613ccf5a5c60104db9f3babdfa2e6003&dn=Mistral-Small-3-Instruct&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=http%3A%2F%https://t.co/ua2yzvEYLu%3A1337%2Fannounce

Mistral AI

@mistralai

9 months ago

Introducing Mistral Medium 3.1. Overall performance boost, tone improvement, smarter web searches. Try it now in Le Chat (default model) or via our API (`mistral-medium-2508`).

Avinash Sooriyarachchi

@avitwit3

9 months ago

We do still ship in August FYI ;)

Avinash Sooriyarachchi

@avitwit3

5 months ago

Proud to share the first public model I worked on at Mistral AI. A decoder-only LLM optimized for creative writing, narrative generation, roleplay, and character-driven dialogue. Now live via API as labs-mistral-small-creative docs.mistral.ai/models/mistral…
