Chris Krempel (@nudelbrot)'s Twitter Profile
Chris Krempel

@nudelbrot

I'm hacking transformers on arch btw.

ID: 80350379

Link: https://ohmytofu.ai · Joined: 06-10-2009 17:07:14

1.1K Tweets

136 Followers

354 Following

Chris Krempel (@nudelbrot)'s Twitter Profile Photo


Listen, you do not need LangSmith or an ML observability platform for tracing agents.
All you need is OpenTelemetry (e.g. jaeger-all-in-one) and a few lines of Python.
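
For concreteness, a minimal sketch of what those few lines could look like, assuming the opentelemetry-sdk and opentelemetry-exporter-otlp packages and a local jaeger-all-in-one accepting OTLP on localhost:4317; the span names and the call_llm stub are placeholders, not the author's actual code.

```python
# Minimal sketch: trace an agent run with OpenTelemetry and export the spans
# to a local jaeger-all-in-one instance (OTLP gRPC on localhost:4317 assumed).
# Requires: pip install opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "my-agent"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-tracing")


def call_llm(prompt: str) -> str:
    # Placeholder for whatever model client you actually use.
    return f"(model answer to: {prompt})"


def run_agent(task: str) -> str:
    # One span for the whole agent run, one child span per model call.
    with tracer.start_as_current_span("agent.run") as span:
        span.set_attribute("agent.task", task)
        with tracer.start_as_current_span("llm.call") as llm_span:
            answer = call_llm(task)
            llm_span.set_attribute("llm.response.chars", len(answer))
        return answer


if __name__ == "__main__":
    print(run_agent("summarize the build logs"))
```

The traces then show up in the Jaeger UI like any other service's spans, with no agent-specific platform in the loop.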
Chris Krempel (@nudelbrot)'s Twitter Profile Photo


Anthropic is scaling Sparse Autoencoders to their Sonnet model. IMO model interpretability is one of the most exciting research directions right now.

(Disclaimer: the field acknowledges the many unknowns; researchers are quite certain that they don't understand much yet.)
transformer-circuits.pub/2024/scaling-m…
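
For readers new to the technique, a toy sketch of what a sparse autoencoder over model activations looks like: an overcomplete dictionary trained to reconstruct activations with an L1 penalty keeping the feature code sparse. This is not Anthropic's implementation; the dimensions and sparsity coefficient are made up for illustration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        # Overcomplete dictionary: many more features than activation dimensions.
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))   # sparse, non-negative feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder(d_model=512, d_dict=8192)
acts = torch.randn(16, 512)                      # stand-in for residual-stream activations
recon, feats = sae(acts)
# Training objective: reconstruct the activations while keeping the features sparse (L1).
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
loss.backward()
```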
Chris Krempel (@nudelbrot)'s Twitter Profile Photo


Fast pre-training runs with a high learning rate, over many epochs (45) and at small batch sizes (16k tokens), can give you a pretty good estimate of how your actual slow, large-batch (0.5 million tokens) run will behave. Pre-training a GPT-2/3-medium-sized model here. The preview runs are only 15 min each.
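
A rough sketch of the two configurations being contrasted; only the 16k-token preview batch, the 45 preview epochs, and the ~0.5M-token full batch come from the tweet, while the learning rates and the full-run epoch count are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class PretrainConfig:
    batch_size_tokens: int   # tokens per optimizer step
    learning_rate: float
    epochs: int

# Fast preview run: small batches, high learning rate, many epochs (~15 min each).
preview = PretrainConfig(batch_size_tokens=16_000, learning_rate=3e-3, epochs=45)

# Actual run: ~0.5M-token batches, lower learning rate, a single pass over the data.
full = PretrainConfig(batch_size_tokens=500_000, learning_rate=3e-4, epochs=1)
```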
Chris Krempel (@nudelbrot)'s Twitter Profile Photo

What's the current best bang-for-buck desktop GPU machine that at least matches an A6000 in memory bandwidth (768 GB/sec)? Mac Studios are only at 500 GB/sec, NV Spark even lower.