scissorstail (@scissorstail_) 's Twitter Profile
scissorstail

@scissorstail_

Web dev | Exploring LLM as a hobby after work.

ID: 1948956471594758144

Joined: 26-07-2025 04:00:25

65 Tweets

9 Followers

771 Following

scissorstail (@scissorstail_) 's Twitter Profile Photo

I’m running the experiments on two RTX 6000 Ada GPUs. Hmm… honestly, I’m not sure if this is the best approach. Sometimes I wonder if it might be better to just rent 8× H100s for a short time.

scissorstail (@scissorstail_) 's Twitter Profile Photo

Why is it that when two processes are caching the same dataset, I still have to wait for the second one to process the exact same data all over again, even after the first has finished? It’s really puzzling, but I’ve decided to just be content with setting NCCL_TIMEOUT=64000000.
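
A minimal sketch of the usual "main process first" workaround for this, assuming torch.distributed is already initialized (e.g. via torchrun); build_fn is a hypothetical placeholder for whatever preprocessing step writes the on-disk cache:

```python
import torch.distributed as dist

def load_or_build_dataset(build_fn):
    # build_fn() is a placeholder for the expensive preprocessing that
    # populates the cache (e.g. a dataset map/tokenization pass).
    if dist.get_rank() == 0:
        ds = build_fn()   # rank 0 pays the cost once and writes the cache
    dist.barrier()        # other ranks wait here instead of idling in NCCL collectives
    if dist.get_rank() != 0:
        ds = build_fn()   # should now resolve to a cache hit rather than a recompute
    return ds
```

Whether the non-zero ranks actually hit the cache depends on the caching layer producing a deterministic cache key across processes; if the key differs per process, they recompute regardless, which would explain the behaviour described above.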

scissorstail (@scissorstail_) 's Twitter Profile Photo

Most of the training issues I’ve faced seem to be solved by finding the "init_process_group" function somewhere, or the function that calls it, and then adding the necessary code on top of it.

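For context, the kind of code typically added on top of init_process_group is simply a longer collective timeout, so that a slow step on one rank (such as dataset caching) doesn't trip the watchdog. A minimal sketch, assuming PyTorch's torch.distributed; the backend and duration are illustrative, not taken from the original posts:

```python
from datetime import timedelta
import torch.distributed as dist

def init_distributed():
    # Pass a longer timeout when the process group is created, instead of
    # (or in addition to) environment-variable workarounds. The default
    # timeout is much shorter and can be exceeded by long preprocessing.
    dist.init_process_group(backend="nccl", timeout=timedelta(hours=4))
```
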
scissorstail (@scissorstail_) 's Twitter Profile Photo

I tried running a benchmark, but it doesn’t seem very meaningful. I’ll push a little further and then probably go back to the starting point to look for a new approach. This time, I really thought it was going to work...
