tbh it feels beyond stupid that every token is weighted equally in the llm training loss, particularly for instruct or task-specific use cases. most tokens are filler and thus noise relative to what you actually want the model to learn.
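for concreteness, here's a minimal sketch of what non-uniform weighting could look like, assuming a PyTorch-style setup; `weighted_lm_loss` and its arguments are made-up names, not any library's API:

```python
import torch
import torch.nn.functional as F

# Standard LM training averages cross-entropy uniformly over target tokens.
# This sketch adds a per-token weight so filler (e.g. prompt/template tokens)
# can be down-weighted and the tokens you care about up-weighted.
def weighted_lm_loss(logits, targets, token_weights):
    # logits:        (batch, seq_len, vocab_size)
    # targets:       (batch, seq_len) token ids
    # token_weights: (batch, seq_len), e.g. 0.0 for prompt, 1.0 for answer
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten to (N, vocab)
        targets.reshape(-1),                  # flatten to (N,)
        reduction="none",                     # keep per-token losses
    ).reshape_as(targets)
    w = token_weights.float()
    # weighted mean instead of the usual uniform mean
    return (nll * w).sum() / w.sum().clamp(min=1.0)
```

the 0/1 special case (loss masking on prompt tokens) is already common in instruct-tuning pipelines; continuous weights are the less-explored generalization.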
can anyone point me to research on using chain-of-thought (or tree-of-thought) to generate a synthetic dataset that basically converts system 2 thought (the chain or tree) into system 1 thought (just standard next token prediction)?
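to make the question concrete, here's a rough sketch of the pipeline shape I mean, with `teacher_generate` as a stand-in for whatever model call you'd use; all names and the "Answer:" convention are illustrative assumptions, not any paper's method:

```python
import json

# System 2 -> system 1 distillation sketch:
# 1) have a teacher model reason step by step (the chain/tree),
# 2) keep only (question, final_answer) pairs,
# 3) fine-tune a student on those pairs so plain next-token
#    prediction reproduces the answer without the reasoning trace.

def teacher_generate(prompt: str) -> str:
    raise NotImplementedError("call your model of choice here")

def extract_final_answer(cot_output: str) -> str:
    # assumes the teacher was told to end with "Answer: <answer>"
    marker = "Answer:"
    return cot_output.rsplit(marker, 1)[-1].strip()

def build_distillation_set(questions: list[str], out_path: str) -> None:
    with open(out_path, "w") as f:
        for q in questions:
            cot = teacher_generate(
                f"{q}\nThink step by step, then end with 'Answer: <answer>'."
            )
            # the chain itself is discarded; only the compressed result is kept
            record = {"prompt": q, "completion": extract_final_answer(cot)}
            f.write(json.dumps(record) + "\n")
```

a variant worth noting: some setups keep the chain as an auxiliary training target instead of discarding it; either way the goal is a student that produces the answer in one forward pass.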
cora, since the aughts we've all been subjected to nonstop, pervasive media/social-media pipeline-gen activity from VCs and founders, dressed up as self-actualization performance narratives. this has shifted egoistic preferences, materializing in everyone thinking they need to be a founder.
Many people are hating on this video, but I actually think it's a fascinating display of two very distinct modes of relating to reality: mimesis vs. first-principles thinking.
95% of people operate by mimesis. Truth doesn't matter to them as much as getting