Daniel Han (@danielhanchen)'s Twitter Profile
Daniel Han

@danielhanchen

Building @UnslothAI. Finetune LLMs 30x faster https://t.co/aRyAAgKOR7. Prev ML at NVIDIA. Hyperlearn used by NASA. I like maths, making code go fast

ID:717359704226172928

Link: https://unsloth.ai/ · Joined: 05-04-2016 14:34:16

703 Tweets

7.0K Followers

930 Following

Daniel Han (@danielhanchen):

Phi-3's sliding window is 2048 and not 2047! So not an odd number! Glad it got resolved quickly!

Also, it looks like Phi-3 (3.8B) does in fact use sliding window attention like Mistral: 2048 context length, but SWA up to 4096.

Link to PR: huggingface.co/microsoft/Phi-…
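A quick way to verify this against the released config (a minimal sketch; the repo id is assumed to be the 4k-context Phi-3 mini instruct checkpoint on the Hugging Face Hub):

```python
# Minimal sketch: inspect the sliding window value in the released config.
# The repo id is an assumption (the 4k-context Phi-3 mini instruct checkpoint).
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("microsoft/Phi-3-mini-4k-instruct",
                                 trust_remote_code=True)
print(cfg.sliding_window)           # expected 2048 after the PR, not 2047
print(cfg.max_position_embeddings)  # 4096 for the 4k-context variant
```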

Daniel Han (@danielhanchen):

Phi 3 (3.8B) got released! The paper said it was just a Llama arch, but I found some quirks while adding this to Unsloth AI:

1. Sliding window of 2047? Mistral v1 4096. So does Phi mini have SWA? (And odd num?) Max RoPE position is 4096?
2. Upcasted RoPE? Like Gemma?
3. Dynamic
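On point 2: "upcasted RoPE" usually means the rotary tables are built in float32 even when the model runs in half precision. A rough illustration of the idea, not Phi-3's or Gemma's actual code:

```python
import torch

def rope_cos_sin(position_ids, dim, base=10000.0, out_dtype=torch.bfloat16):
    # position_ids: 1-D tensor of token positions; dim: head dimension.
    # Build the rotary frequencies in float32 for numerical accuracy,
    # then cast the cos/sin tables back to the model's compute dtype.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    freqs = position_ids.to(torch.float32)[:, None] * inv_freq[None, :]
    emb = torch.cat([freqs, freqs], dim=-1)
    return emb.cos().to(out_dtype), emb.sin().to(out_dtype)
```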

Unsloth AI (@UnslothAI):

Unsloth is currently trending on GitHub! 🙌🦥

If you want to finetune LLMs like Llama 3 or Mistral, now is a good time to try!⭐️

github.com/unslothai/unsl…
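For anyone trying it, a minimal setup looks roughly like the public notebooks; the model name and hyperparameters below are illustrative, not prescriptive:

```python
from unsloth import FastLanguageModel

# Load a 4-bit base model (a Mistral checkpoint works the same way).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; from here the model drops into a standard TRL SFTTrainer.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```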

Daniel Han (@danielhanchen):

Phi-3 Mini 3.8b Instruct is out!!
68.8 MMLU vs Llama-3 8b Instruct's 66.0 MMLU (Phi team's own evals)

The long context 128K model is also out at huggingface.co/microsoft/Phi-…

Working on adding this into Unsloth AI! Some fused linear modules need unfusing :)
huggingface.co/microsoft/Phi-…
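On the "fused linear modules" point: Phi-3's reference modeling code fuses the q/k/v (and gate/up) projections into single linear layers. A hypothetical helper sketching how such a fused projection could be split back into separate layers (names and shapes are illustrative):

```python
import torch.nn as nn

def unfuse_qkv(qkv: nn.Linear, num_heads: int, num_kv_heads: int, head_dim: int):
    # Split a fused [q; k; v] projection into three separate nn.Linear layers
    # so kernels written for per-projection weights can be reused.
    q_dim, kv_dim = num_heads * head_dim, num_kv_heads * head_dim
    q_w, k_w, v_w = qkv.weight.split([q_dim, kv_dim, kv_dim], dim=0)

    def to_linear(w):
        lin = nn.Linear(w.shape[1], w.shape[0], bias=False)
        lin.weight.data.copy_(w)
        return lin

    return to_linear(q_w), to_linear(k_w), to_linear(v_w)
```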

Aran Komatsuzaki (@arankomatsuzaki):

Microsoft just released Phi-3

- phi-3-mini: 3.8B model trained on 3.3T tokens rivals Mixtral 8x7B and GPT-3.5
- phi-3-medium: 14B model trained on 4.8T tokens w/ 78% on MMLU and 8.9 on MT-bench

arxiv.org/abs/2404.14219

Jeremy Howard (@jeremyphoward):

Today at Answer.AI we've got something new for you: FSDP/QDoRA. We've tested it with AI at Meta Llama3 and the results blow away anything we've seen before.

I believe that this combination is likely to create better task-specific models than anything else at any cost. 🧵
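For context, DoRA reparameterises each frozen weight into a learnable per-row magnitude and a direction updated through a low-rank delta; QDoRA keeps the base weights quantised (as in QLoRA) and FSDP shards the whole thing across GPUs. A purely conceptual sketch of the DoRA part, not Answer.AI's implementation:

```python
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    # Conceptual DoRA layer: frozen base weight, low-rank direction update,
    # learnable per-row magnitude initialised to the base weight's norm.
    def __init__(self, base: nn.Linear, r: int = 16):
        super().__init__()
        self.register_buffer("base_weight", base.weight.detach())  # frozen (4-bit in QDoRA)
        out_f, in_f = base.weight.shape
        self.lora_A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, r))
        self.magnitude = nn.Parameter(base.weight.norm(dim=1, keepdim=True).detach())

    def forward(self, x):
        direction = self.base_weight + self.lora_B @ self.lora_A
        weight = self.magnitude * direction / direction.norm(dim=1, keepdim=True)
        return x @ weight.t()
```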

Daniel Han (@danielhanchen):

Highly recommend Kaggle! You get Tesla T4 GPUs with (I think) 12-hour runs, and 30 GPU hours free per week!

I also have an Unsloth AI Kaggle notebook for Llama-3 8B which makes finetuning 2x faster and uses 60% less VRAM! Kaggle notebook: kaggle.com/code/danielhan…

Andrej Karpathy (@karpathy):

🔥llm.c update: Our single file of 2,000 ~clean lines of C/CUDA code now trains GPT-2 (124M) on GPU at speeds ~matching PyTorch (fp32, no flash attention)
github.com/karpathy/llm.c…

On my A100 I'm seeing 78ms/iter for llm.c and 80ms/iter for PyTorch. Keeping in mind this is fp32,
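A rough sketch of how the PyTorch side of such a number could be measured (fp32, no flash attention; the batch size and sequence length here are assumptions, not Karpathy's exact setup):

```python
import time
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").cuda()   # GPT-2 (124M), fp32
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
tokens = torch.randint(0, 50257, (4, 1024), device="cuda")  # assumed batch/seq

for step in range(20):
    torch.cuda.synchronize(); start = time.time()
    loss = model(tokens, labels=tokens).loss   # forward with LM loss
    loss.backward()
    optimizer.step(); optimizer.zero_grad()
    torch.cuda.synchronize()
    print(f"iter {step}: {(time.time() - start) * 1e3:.1f} ms")
```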

Teknium (e/λ) (@Teknium1):

I think Meta and Llama-3 is the final nail in the coffin to several misconceptions I've been fighting against for the last year.

Llama-3 Chat was trained on over 10M Instruction/Chat samples, and is one of the only finetunes that shows significant improvements to MMLU.

Annie ❤️‍🔥💫 (@AnnieLiao_2000):

Come build with us and get $100K+ in credits!

Build Club has just launched a 6-week AI accelerator for top builders, backed by AWS 🚀✨

Build with the likes of Daniel Han, Micah Hill-Smith and receive mentorship from swyx, Logan Kilpatrick + more...

Details 🧵

Daniel Han (@danielhanchen):

For burning Qs, if you don't know, Unsloth AI has a wiki page at github.com/unslothai/unsl…!
1. How to update Unsloth so Llama-3 works
2. Fix OOM in eval loops
3. Train lm_head, embed_tokens
4. Resume from checkpoint
5. Saving to GGUF
6. Chat templates
7. Enable 2x faster inference
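As a taste of what the wiki covers, items 5 and 7 look roughly like this with the Unsloth API (method names as documented; the exact model name and arguments are illustrative):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit", max_seq_length=2048, load_in_4bit=True,
)

# 7. Enable 2x faster inference after finetuning
FastLanguageModel.for_inference(model)

# 5. Save the merged model to GGUF for llama.cpp
model.save_pretrained_gguf("model_gguf", tokenizer, quantization_method="q4_k_m")
```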

Andrej Karpathy (@karpathy):

Congrats to AI at Meta on Llama 3 release!! 🎉
ai.meta.com/blog/meta-llam…
Notes:

Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we'll see when the rankings come in @ lmsys.org :))
400B is still training, but already encroaching
