Fereshte Khani (@fereshte_khani)'s Twitter Profile
Fereshte Khani

@fereshte_khani

@OpenAI, CS Ph.D. @stanfordAILab

ID: 994846342118871040

Link: https://fereshte-khani.github.io/
Joined: 11-05-2018 07:47:06

333 Tweets

4.4K Followers

883 Following

AI Coffee Break with Letitia (@aicoffeebreak)

We simply explain and illustrate Mamba and (Selective) State Space Models – SSMs. 📺 youtu.be/vrF3MtGwD0Y SSMs match performance of transformers, but are faster and more memory-efficient than them. This is crucial for long sequences! Incredible work by Albert Gu and Tri Dao!

Zeyuan Allen-Zhu (@zeyuanallenzhu)

Our 12 scaling laws (for LLM knowledge capacity) are out: arxiv.org/abs/2404.05405. Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope this is a new direction to study scaling laws + help practitioners make informed decisions

Lilian Weng (@lilianweng)

🎨 Spent some time refactoring the 2021 post on diffusion model with new content: lilianweng.github.io/posts/2021-07-… ⬇️ ⬇️ ⬇️ 🎬 Then another short piece on diffusion video models: lilianweng.github.io/posts/2024-04-… (Yes, I had an intensive weekend 🥹)

lmsys.org (@lmsysorg)

Exciting update -- Llama-3 full result is out, now reaching top-5 on the Arena leaderboard 🔥 We've got stable enough CIs with over 12K votes. No question now Llama-3 70B is the new king of open model. Its powerful 8B variant has also surpassed many larger-size models. What an

Michael Black (@michael_j_black)

Young scientists regularly ask me for career advice. Academia or industry? Big company or startup? US or Europe? Good scientists in AI disciplines are fortunate to have many choices. But choosing can be stressful. I always give the same advice. 1/10

Behnam Neyshabur (@bneyshabur)

I'm excited about this! Our team has been working really hard to improve Gemini 1.5 capabilities significantly on multiple fronts and in particular MATH/STEM! Please see the report here: goo.gle/GeminiV1-5

Tri Dao (@tri_dao)

With Albert Gu, we've built a rich theoretical framework of state-space duality, showing that many linear attn variants and SSMs are equivalent! The resulting model, Mamba-2, is better & faster than Mamba-1, and still matching strong Transformer arch on language modeling. 1/

Rohan Paul (@rohanpaul_ai)

Brilliant work by the Android agents team at Google DeepMind 📌 The authors introduce ANDROIDCONTROL, a new dataset of 15,283 human demonstrations of everyday tasks across 833 Android apps. Each task includes both high-level and low-level instructions. This allows studying agent

Robin Jia (@robinomial)

For many years as a Stanford NLP Group PhD student, I loved attending these seminars. It's good to be back, this time as a guest speaker! I'll discuss my group's recent progress on understanding and auditing large language models

Sam Altman (@sama)

way back in 2022, the best model in the world was text-davinci-003. it was much, much worse than this new model. it cost 100x more.

Rosanne Liu (@savvyrl)

New fundraiser to support 25 Nigerian students to attend Deep Learning Indaba in September! In 2022 we supported 8 Nigerian students to attend Indaba. This year we are raising $20k to support 25(!) of them to travel to Senegal for likely the most important career event in their lives!

Zeyuan Allen-Zhu (@zeyuanallenzhu)

Incredibly honored and humbled by the overwhelming response to my tutorial, and thank you everyone who attended in person. Truly heartwarming to hear how much you enjoyed it. Many have been asking for a recording, and I prepared one with my own subtitles youtu.be/yBL7J0kgldU

Zeyuan Allen-Zhu (@zeyuanallenzhu)

Bad news (1/2): video taken down by ICML (brockmeyer@icml.cc) for copyright. While I can't agree (the consent I signed allows me to publish elsewhere) - I will respect it to save time for more important things. Too bad I delayed many things and spent 20+ hrs preparing the video.
Zico Kolter (@zicokolter)

I'm excited to announce that I am joining the OpenAI Board of Directors. I'm looking forward to sharing my perspectives and expertise on AI safety and robustness to help guide the amazing work being done at OpenAI.

Qinyuan Ye (@qinyuan_ye)

I'll be presenting our work on investigating the role of meta-prompt components in automatic prompt engineering, i.e., "prompt engineering a prompt engineer", at Findings Session 1 (Mon 12:45) and NLRSE Workshop (Thu 4pm)! Please come say hi! 👋
