scissorstail (@scissorstail_) 's Twitter Profile
scissorstail

@scissorstail_

Web dev | Exploring LLM as a hobby after work.

ID: 1948956471594758144

Joined: 26-07-2025 04:00:25

65 Tweets

9 Followers

771 Following

scissorstail (@scissorstail_) 's Twitter Profile Photo

I’m running the experiments on two RTX 6000 Ada GPUs. Hmm… honestly, I’m not sure if this is the best approach. Sometimes I wonder if it might be better to just rent 8× H100s for a short time.

scissorstail (@scissorstail_) 's Twitter Profile Photo

Why is it that when two processes are caching the same dataset, I still have to wait for the second one to process the exact same data all over again, even after the first has finished? It’s really puzzling, but I’ve decided to just be content with setting NCCL_TIMEOUT=64000000.
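
A minimal sketch of the usual "main process first" workaround for this, assuming torch.distributed is already initialized (e.g. via torchrun); build_fn is a hypothetical placeholder for whatever preprocessing step writes the on-disk cache:

```python
import torch.distributed as dist

def load_or_build_dataset(build_fn):
    # build_fn() is a placeholder for the expensive preprocessing that
    # populates the cache (e.g. a dataset map/tokenization pass).
    if dist.get_rank() == 0:
        ds = build_fn()   # rank 0 pays the cost once and writes the cache
    dist.barrier()        # other ranks wait here instead of idling in NCCL collectives
    if dist.get_rank() != 0:
        ds = build_fn()   # should now resolve to a cache hit rather than a recompute
    return ds
```

Whether the non-zero ranks actually hit the cache depends on the caching layer producing a deterministic cache key across processes; if the key differs per process, they recompute regardless, which would explain the behaviour described above.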

scissorstail (@scissorstail_) 's Twitter Profile Photo

Most of the training issues I’ve faced seem to be solved by finding the "init_process_group" function somewhere, or the function that calls it, and then adding the necessary code on top of it.

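For context, the kind of code typically added on top of init_process_group is simply a longer collective timeout, so that a slow step on one rank (such as dataset caching) doesn't trip the watchdog. A minimal sketch, assuming PyTorch's torch.distributed; the backend and duration are illustrative, not taken from the original posts:

```python
from datetime import timedelta
import torch.distributed as dist

def init_distributed():
    # Pass a longer timeout when the process group is created, instead of
    # (or in addition to) environment-variable workarounds. The default
    # timeout is much shorter and can be exceeded by long preprocessing.
    dist.init_process_group(backend="nccl", timeout=timedelta(hours=4))
```
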
scissorstail (@scissorstail_) 's Twitter Profile Photo

I tried running a benchmark, but it doesn’t seem very meaningful. I’ll push a little further and then probably go back to the starting point to look for a new approach. This time, I really thought it was going to work...
