Robin Jia

@robinomial

Assistant Professor @CSatUSC | Previously Visiting Researcher @facebookai | Stanford CS PhD @StanfordNLP

Joined 28-06-2018

174 Tweets

3.2K Followers

764 Following

Robin Jia (@robinomial):

How do Transformers really do in-context linear regression? 1 TF layer = 3 steps of a second-order method! Can’t be GD, which converges exponentially more slowly. Meanwhile, LSTMs are more like online GD; they don’t learn second-order optimization (likely due to limited memory).
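To make the GD-vs-second-order contrast concrete, here is a minimal sketch (not from the thread or the paper's construction; the problem setup and variable names are illustrative). It compares plain gradient descent with a Newton-style second-order update on a least-squares linear regression problem: GD shrinks the error by only a constant factor per step, while the Hessian-preconditioned update solves the quadratic objective essentially in one step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic in-context linear regression problem: y = X @ w_true + noise.
n, d = 32, 8
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.01 * rng.standard_normal(n)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

# Plain gradient descent: first-order, converges linearly (error shrinks
# by a constant factor per step), so it needs many iterations.
w_gd = np.zeros(d)
lr = 0.1
for _ in range(50):
    grad = X.T @ (X @ w_gd - y) / n
    w_gd -= lr * grad

# A second-order (Newton-style) update preconditions the gradient by the
# inverse Hessian (here H = X^T X / n); for a quadratic loss this lands on
# the optimum in a single step, which is why second-order methods converge
# exponentially faster than GD on this task.
w_newton = np.zeros(d)
H = X.T @ X / n
grad = X.T @ (X @ w_newton - y) / n
w_newton -= np.linalg.solve(H, grad)

print(f"GD loss after 50 steps:   {loss(w_gd):.2e}")
print(f"Newton loss after 1 step: {loss(w_newton):.2e}")
```

The tweet's claim is about what the trained models implement internally: one Transformer layer behaves like roughly three steps of a second-order method of this kind, whereas LSTMs track something closer to online gradient descent.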
