๐Ÿ‘โ€ ษษฏษu ๐Ÿ‘โ€ (@aman_gif) 's Twitter Profile
๐Ÿ‘โ€ ษษฏษu ๐Ÿ‘โ€

@aman_gif

searching for the truth

ID: 364804401

30-08-2011 11:11:41

2,2K Tweet

874 Followers

859 Following

Iro Armeni (@ir0armeni) 's Twitter Profile Photo

Just dropped: Rectified Point Flow. Can we automate 3D assembly from unposed point clouds without supervision? Yes, we use a generative model that learns symmetry & part interchangeability entirely from shape. Enabling robotics, AR/VR, & reverse engineering 🌐 rectified-pointflow.github.io

๐Ÿ‘โ€ ษษฏษu ๐Ÿ‘โ€ (@aman_gif) 's Twitter Profile Photo

after reading interpretability works that found antipodal dense features in NNs, i tried initializing networks with them (i.e. by setting half of each linear layer to the negated weights of the first half), but it didn't seem to help on modded-nanogpt (val loss 3.32 @ 0.8B tokens)
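
A minimal sketch of the initialization described in this tweet, in plain Python. The function name `antipodal_init`, the Gaussian draw with `std=0.02`, and the choice to pair output rows (rather than input columns) are assumptions for illustration; the tweet does not say which axis was split or which base initializer was used.

```python
import random


def antipodal_init(out_features, in_features, std=0.02, seed=0):
    """Build an out_features x in_features weight matrix (list of rows)
    whose second half of rows is the exact negation of the first half.

    Hypothetical helper: the axis and the Gaussian std are assumptions.
    """
    if out_features % 2 != 0:
        raise ValueError("out_features must be even so every row has a partner")
    rng = random.Random(seed)
    half = out_features // 2
    # Draw the first half of the rows from a zero-mean Gaussian.
    first = [[rng.gauss(0.0, std) for _ in range(in_features)] for _ in range(half)]
    # Each row in the second half points in the opposite (antipodal) direction.
    second = [[-w for w in row] for row in first]
    return first + second


weights = antipodal_init(out_features=8, in_features=4)
```

Row `i + out_features // 2` is the negation of row `i`, so every output unit starts with an antipodal partner.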

Simo Ryu (@cloneofsimo) 's Twitter Profile Photo

ReLU MLP with width / depth going to infinity. Note how different parameterizations produce pathological scaling behavior (yellow / blue on activations / gradients of the weights). muP solves this.

lyra bubbles~ (@_lyraaaa_) 's Twitter Profile Photo

@kalomaze @willccbb @xXshaurizardXx this is a VERY consistent pattern it has when you get it in completions mode. oh and it does really weird stuff like this sometimes when you try and crack it like this. just repeating "We can comply" forever

Feng Yao (@fengyao1909) 's Twitter Profile Photo

(3/3) What's beyond? Quantization! Applying quantized rollout (e.g., FP8) can greatly boost throughput, but also amplify the mismatch and hurt performance. We show that TIS can mitigate the gap and preserve performance near
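
A rough sketch of the kind of correction this tweet refers to, assuming TIS stands for truncated importance sampling: each sampled token is reweighted by the ratio of its probability under the training policy to its probability under the (possibly quantized) rollout policy, capped at a constant. The function name, the `cap` value, and the per-token log-probability inputs are hypothetical; the thread's actual formulation may differ.

```python
import math


def truncated_is_weights(train_logprobs, rollout_logprobs, cap=2.0):
    """Per-token importance weights min(p_train / p_rollout, cap).

    Hypothetical helper: inputs are log-probabilities of the same sampled
    tokens under the trainer and under the rollout engine.
    """
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("both sequences must score the same tokens")
    return [
        # exp(log p_train - log p_rollout) is the probability ratio;
        # the cap bounds the weight when the two policies disagree strongly.
        min(math.exp(t - r), cap)
        for t, r in zip(train_logprobs, rollout_logprobs)
    ]


tis_weights = truncated_is_weights([-1.0, -1.0, -1.0], [-1.0, -2.0, -0.5])
```

Tokens where the two policies agree get weight 1; tokens the rollout policy under-sampled are up-weighted only as far as the cap.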

Jack Merullo (@jack_merullo_) 's Twitter Profile Photo

It's maybe possible to use this to understand reasoning chains. Rollouts tend to have super flat curvature, so spikes really stand out. We see spikes when the model recites the formula for the length of a chord, or computes some really specific arithmetic

Ahmad Beirami @ ICLR 2025 (@abeirami) 's Twitter Profile Photo

This led to a very simple algorithm where the multiple rollouts happened offline prior to training, and the raw values of the reward were recorded so that the reward of any response could be calibrated via an empirical CDF inverse at training time.
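
One way to picture the offline step, as a sketch only: record the raw rewards of the offline rollouts for a prompt, then at training time score any response by looking its reward up in the empirical CDF of those recorded values. The helper name `make_cdf_calibrator` and the sample rewards are made up for illustration, and the tweet does not spell out the exact transform (including how the inverse it mentions enters), so this covers only the empirical-CDF lookup.

```python
from bisect import bisect_right


def make_cdf_calibrator(offline_rewards):
    """Return a function mapping a raw reward to its empirical CDF value,
    i.e. the fraction of recorded offline rewards that are <= it.

    Hypothetical helper; the rewards are assumed to come from rollouts
    generated once, before training.
    """
    sorted_rewards = sorted(offline_rewards)
    n = len(sorted_rewards)
    if n == 0:
        raise ValueError("need at least one offline reward")

    def calibrate(raw_reward):
        # bisect_right counts how many recorded rewards are <= raw_reward.
        return bisect_right(sorted_rewards, raw_reward) / n

    return calibrate


# Illustrative rewards from five offline rollouts of one prompt.
calibrate = make_cdf_calibrator([0.1, 0.4, 0.4, 0.9, 1.3])
```

Because the sorted rewards are stored once, calibrating a new response at training time is a single binary search rather than a fresh batch of rollouts.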

Peter Yichen Chen (@peterchencyc) 's Twitter Profile Photo

#BestPaperAward #SIGGRAPH2025 One neural PDE model, hundreds of shapes, simulated at lightning speed. 🚀 Introducing Shape Space Spectra: first eigenanalysis across shapes. Come see ChangYue's talk today 👉 changy1506.github.io

Jenny (@_jemeny) 's Twitter Profile Photo

I would like to understand better the niche of tech talents who choose to live suburban lives in states that aren't cali, ny, or even texas. Like florida but not miami. Georgia but not atlanta. Who are they, where are they, how many are there