Anirudh Goyal (@anirudhg9119)'s Twitter Profile
Anirudh Goyal

@anirudhg9119

Thinking about thinking.

Spent time at @Berkeley_EECS, @MPI_IS, @GoogleDeepMind.

ID: 2816636344

Link: https://anirudh9119.github.io/ · Joined: 08-10-2014 19:47:25

901 Tweets

5.5K Followers

518 Following

Anirudh Goyal (@anirudhg9119)'s Twitter Profile Photo

Temporal Latent Bottleneck combines recurrence and self-attention in a unified way. Recurrence integrates information over time, while self-attention models local dependencies within a "short" context.

arxiv.org/abs/2205.14794
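
A minimal PyTorch sketch of that division of labor; the dimensions, module layout, and exact update order here are my assumptions, not the paper's architecture. Each short chunk is processed with self-attention, while a small set of latent vectors is updated recurrently to carry information across chunks:

```python
import torch
import torch.nn as nn

class TemporalLatentBottleneck(nn.Module):
    """Sketch: local self-attention within chunks plus a small recurrent
    latent state that integrates information across chunks
    (cf. arxiv.org/abs/2205.14794; details here are assumptions)."""

    def __init__(self, dim=256, n_latents=8, chunk=32, n_heads=4):
        super().__init__()
        self.chunk = chunk
        self.init_latents = nn.Parameter(torch.randn(n_latents, dim))
        self.local_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.read = nn.MultiheadAttention(dim, n_heads, batch_first=True)   # tokens <- latents
        self.write = nn.MultiheadAttention(dim, n_heads, batch_first=True)  # latents <- tokens

    def forward(self, x):  # x: (B, T, dim)
        B, T, _ = x.shape
        latents = self.init_latents.unsqueeze(0).expand(B, -1, -1)
        outs = []
        for s in range(0, T, self.chunk):
            c = x[:, s:s + self.chunk]
            c = c + self.read(c, latents, latents)[0]         # condition chunk on history
            c = c + self.local_attn(c, c, c)[0]               # "short"-context self-attention
            latents = latents + self.write(latents, c, c)[0]  # slow recurrent update
            outs.append(c)
        return torch.cat(outs, dim=1), latents
```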
Anirudh Goyal (@anirudhg9119) 's Twitter Profile Photo

Discrete Key-Value Bottleneck (Updated)

Compresses the information of a pre-trained model into a learnable "key-value" codebook so that knowledge can be quickly adapted in a continual-learning fashion.

arxiv.org/abs/2207.11240
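
A minimal sketch of the mechanism; codebook size, value dimension, and initialization are assumptions (the paper additionally discusses how keys are initialized and how features are split across multiple codebooks). The point is that only the sparsely selected values receive gradients, so adapting to a new task overwrites little of what earlier tasks stored:

```python
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Sketch of a discrete key-value bottleneck (cf. arxiv.org/abs/2207.11240):
    frozen encoder features are snapped to their nearest codebook *key*;
    the paired *value* is what flows onward, and only values are trained."""

    def __init__(self, dim=512, codebook_size=1024, value_dim=128):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(codebook_size, dim),
                                 requires_grad=False)   # keys stay frozen
        self.values = nn.Parameter(torch.zeros(codebook_size, value_dim))

    def forward(self, z):                # z: (B, dim) frozen encoder features
        # Nearest-key lookup: a discrete, non-differentiable selection
        idx = torch.cdist(z, self.keys).argmin(dim=-1)   # (B,)
        return self.values[idx]          # (B, value_dim); gradients hit values only
```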
Anirudh Goyal (@anirudhg9119)'s Twitter Profile Photo

Interesting progress from Rafael Rafailov (@rm_rafailov) and FTP (@DivGarg9) et al., following our work (applied to mathematical and commonsense reasoning):

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

arxiv.org/abs/2405.00451

(Also discussed in the Llama-3 paper from AI at Meta, @AIatMeta.)
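
The preference-learning half of that recipe is DPO-style. Here is a minimal sketch of that loss under the usual DPO formulation; the MCTS side, which would supply the (preferred, dispreferred) reasoning-step pairs from search values, is omitted, and the variable names are mine:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO objective on (preferred, dispreferred) reasoning steps.

    logp_* are sequence log-probs under the current policy, ref_logp_*
    under a frozen reference model; in the paper's setup the pairs come
    from MCTS value estimates rather than human labels."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()  # maximize the preferred-step margin
```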
Nan Rosemary Ke (@rosemary_ke)'s Twitter Profile Photo

Boosting LLM Performance with Dynamic Skill Selection! 1/ 🚀 What if LLMs could get better at solving math problems by understanding the skills they need? We explored this idea by having LLMs identify and label the skills required for each problem. arxiv.org/abs/2405.12205
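
A hypothetical sketch of such a pipeline; `llm` stands for any text-completion callable, and the prompts and skill taxonomy are illustrative, not the paper's exact ones:

```python
def solve_with_skills(problem, llm, exemplars_by_skill):
    """Have the model name the skill a problem needs, then solve it
    conditioned on that label and matching worked examples."""
    # 1. Ask the model which skill the problem requires.
    skill = llm(f"Name the single math skill needed to solve:\n{problem}\nSkill:").strip()
    # 2. Retrieve a few in-context exemplars labeled with that skill.
    shots = "\n\n".join(exemplars_by_skill.get(skill, [])[:3])
    # 3. Solve conditioned on the skill label and the exemplars.
    return llm(f"Skill: {skill}\n\n{shots}\n\nProblem: {problem}\nSolution:")
```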

Michael Qizhe Shieh (@mpulsewidth)'s Twitter Profile Photo

To me, diffusion LMs work because they remove unnecessary inductive biases. The left-to-right inductive bias is natural for humans but unlikely to be natural for AI. Removing it gives our models more capacity, much as a Transformer has more capacity than an LSTM. Our experiment
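
To make the order-free point concrete, here is a minimal sketch of one common order-agnostic recipe (absorbing-state / masked diffusion); `model` and `mask_id` are assumed inputs, and this is not necessarily the exact setup of the experiments referenced above:

```python
import torch
import torch.nn.functional as F

def masked_diffusion_step(model, tokens, mask_id):
    """One training step of a masked-diffusion LM: a random fraction of
    positions is corrupted with no left-to-right ordering, and the model
    predicts all masked tokens in parallel."""
    B, T = tokens.shape
    t = torch.rand(B, 1)                 # per-sequence noise level in (0, 1)
    mask = torch.rand(B, T) < t          # random, order-free corruption
    noisy = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(noisy)                # (B, T, vocab)
    return F.cross_entropy(logits[mask], tokens[mask])
```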