Hiroki Naganuma (@_hiroki11x)'s Twitter Profile
Hiroki Naganuma

@_hiroki11x

PhD Candidate at @UMontreal, @Mila_Quebec / HPC, Generalization, Optimization / ex-@GoogleDeepMind, @MSFTResearch, @IBMResearch

ID: 1067249533501898752

Link: https://hiroki11x.github.io/ · Joined: 27-11-2018 02:51:32

112 Tweets

923 Followers

721 Following

Kotaro Yoshida (@katoro13___)'s Twitter Profile Photo

I will present our work "Mastering Task Arithmetic: τJp as a Key Indicator for Weight Disentanglement" at the #NeurIPS2024 workshop FITML on Dec 14. Our "τJp" regularization mitigates interference between task vectors and reduces coefficient-tuning costs.

Ryan D'Orazio (@ryandorazio)'s Twitter Profile Photo

I'll be at #NeurIPS24 until Sunday. If you're interested in solving variational inequality problems with deep learning (e.g. min-max and projected Bellman error), come and check out our poster on surrogate losses at the OPT workshop. arxiv.org/abs/2411.05228

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

I will present our work on an efficient distributed training algorithm at the optimization workshop. Join us during the poster session from 15:00-16:00. #NeurIPS2024

Masanari Kimura (@machinery81)'s Twitter Profile Photo

Our paper "Density Ratio Estimation via Sampling along Generalized Geodesics on Statistical Manifolds" got accepted to AISTATS'25.

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Our paper "Mastering Task Arithmetic: τJp as a Key Indicator for Weight Disentanglement" has been accepted at ICLR 2025🇸🇬! Grateful to my collaborators for their efforts. This work introduces an approach to enhance task arithmetic by mitigating interference.

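For readers unfamiliar with the technique the paper builds on: in task arithmetic, a "task vector" is the difference between fine-tuned and pre-trained weights, and models are edited by adding scaled task vectors back to the pre-trained weights. A minimal sketch with illustrative names (not the paper's code):

```python
import numpy as np

# Task arithmetic in a nutshell: tau = theta_finetuned - theta_pretrained,
# and a merged model is theta_pre + sum_i alpha_i * tau_i.

def task_vector(theta_pre, theta_ft):
    # One task vector per fine-tuned model, computed per parameter tensor.
    return {k: theta_ft[k] - theta_pre[k] for k in theta_pre}

def merge(theta_pre, taus, alphas):
    # Add each scaled task vector back to the pre-trained weights.
    merged = {k: v.astype(float) for k, v in theta_pre.items()}
    for tau, alpha in zip(taus, alphas):
        for k in merged:
            merged[k] = merged[k] + alpha * tau[k]
    return merged

# Tiny example with a single "layer".
pre = {"w": np.zeros(3)}
ft_a = {"w": np.array([1.0, 0.0, 0.0])}   # fine-tuned on task A
ft_b = {"w": np.array([0.0, 2.0, 0.0])}   # fine-tuned on task B
merged = merge(pre, [task_vector(pre, ft_a), task_vector(pre, ft_b)], [0.5, 0.5])
```

Interference arises when the task vectors overlap in the same weight directions; the τJp regularization announced above targets this interference so that the α coefficients need less tuning.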
Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Our latest paper, "Geometric Insights into Focal Loss: Reducing Curvature for Enhanced Model Calibration," is now available in Pattern Recognition Letters. We explore how focal loss influences model calibration through curvature reduction, offering new theoretical insights.

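As background, focal loss (Lin et al.) down-weights well-classified examples by the factor (1 − p_t)^γ and reduces to cross-entropy at γ = 0. A minimal binary-classification sketch, not taken from the paper:

```python
import numpy as np

# Focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the
# predicted probability of the true class. gamma = 0 recovers cross-entropy.

def focal_loss(p, y, gamma=2.0):
    p = np.clip(p, 1e-12, 1 - 1e-12)       # numerical safety for log
    p_t = np.where(y == 1, p, 1 - p)       # probability of the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

p = np.array([0.9, 0.6])   # predicted P(y = 1): one confident, one uncertain
y = np.array([1, 1])
ce = focal_loss(p, y, gamma=0.0)   # plain cross-entropy
fl = focal_loss(p, y, gamma=2.0)   # focal loss
```

With γ = 2, the confident example's loss is scaled by (1 − 0.9)² = 0.01 while the harder example's is scaled by only 0.16, which is the down-weighting of easy examples that the calibration analysis concerns.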
Sourabh Medapati (@activelifetribe)'s Twitter Profile Photo

The AlgoPerf results paper is on arXiv now!!! Congratulations to all contributors who made it happen 🎉🎉 ... special thanks to Priya Kasimbeg, George E. Dahl, and Frank Schneider for leading this effort! arxiv.org/abs/2502.15015

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Last month I finished my Student Researcher position at Google DeepMind. Since there seems to be little information about this in Japanese, I wrote up a memoir of the internship for students in Japan. hiroki11x.github.io/posts/research…

Ryan D'Orazio (@ryandorazio)'s Twitter Profile Photo

This week I'll be at #ICLR25. If you like fundamental optimization results, I'll be presenting our work on surrogate losses for non-convex-concave min-max problems and learning value functions in deep RL (VIs more generally). Poster #377, Thursday, April 24, 10am-12:30pm.

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Presenting our work, “Mastering Task Arithmetic: τJp as a Key Indicator for Weight Disentanglement,” this Friday, Apr 25, 3:00–5:30 p.m. Interested in task arithmetic? Please stop by our poster! #ICLR25 Mila - Institut québécois d'IA

Divyat Mahajan (@divyat09)'s Twitter Profile Photo

Happy to share that Compositional Risk Minimization has been accepted at #ICML2025 📌 Extensive theoretical analysis along with a practical approach for extrapolating classifiers to novel compositions! 📜 arxiv.org/abs/2410.06303

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

I'm delighted to share that our paper has been accepted by #TMLR! We empirically observed signs of scaling laws regarding how the choice of pre-trained models affects OOD test errors and Expected Calibration Error on downstream tasks.

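Expected Calibration Error, one of the metrics studied above, bins predictions by confidence and averages the gap between accuracy and mean confidence per bin. A minimal version assuming equal-width bins (an illustrative implementation, not the paper's):

```python
import numpy as np

# ECE = sum_b (n_b / N) * |acc(b) - conf(b)| over equal-width confidence bins.

def expected_calibration_error(conf, correct, n_bins=10):
    conf = np.asarray(conf, dtype=float)       # predicted confidence in (0, 1]
    correct = np.asarray(correct, dtype=float) # 1 if prediction was right
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap           # weight by bin population
    return ece

# Perfectly calibrated toy case: 75% accuracy at 75% confidence -> ECE = 0.
conf_good = np.array([0.75, 0.75, 0.75, 0.75])
correct_good = np.array([1, 1, 1, 0])

# Overconfident case: 90% confidence but only 50% accuracy -> ECE = 0.4.
conf_bad = np.full(4, 0.9)
correct_bad = np.array([1, 0, 1, 0])
```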
Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Our research on regularization in Task Arithmetic, a form of model merging, was featured in the August 2025 issue of Nikkei Robotics. xtech.nikkei.com/atcl/nxt/mag/r…

Divyat Mahajan (@divyat09)'s Twitter Profile Photo

Presenting CRM at #ICML2025 📌 Wednesday, July 16, 11 am 📍 East Exhibition Hall A-B (E-2101) Let's chat about distribution shifts! I've been deep into causality- and invariance-based perspectives, and recently exploring robust LLM pretraining architectures.

Reyhane Askari (@reyhaneaskari)'s Twitter Profile Photo

Excited to present our work "Improving the scaling laws of synthetic data with deliberate practice", tomorrow at #ICML2025 📢 Oral: Wed. 10:45 AM 📍 West Ballroom B (Oral 3C Data-Centric ML) 🖼️ Poster: 🕚 11:00 AM – 1:30 PM 📍 East Exhibition Hall A-B (Poster Session 3 East)

Ryan D'Orazio (@ryandorazio)'s Twitter Profile Photo

I'm also excited to be presenting this work (openreview.net/forum?id=4ZX2a…) at ICCOPT at USC. Theory aside, there are some applications that may interest people in RL, games, and performative prediction. Let me know if you're in the area and want to chat!

Kotaro Yoshida (@katoro13___)'s Twitter Profile Photo

🚨New preprint! Real‑world robustness of multi‑task model merging was unknown. We uncover two failure modes—(i) norm mismatch (ii) low‑confidence predictions—in source models and introduce DisTaC, a distillation‑based conditioning that fixes both. 🧵 (1/6)

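To illustrate the first failure mode above in isolation: if one task vector's norm dwarfs another's, the merged model is dominated by that task. A hypothetical diagnostic that compares and rescales norms (this is only the symptom check the tweet motivates, not the DisTaC procedure):

```python
import numpy as np

# Task vectors are dicts of parameter tensors; compare their overall norms.

def flat_norm(tau):
    # L2 norm over all parameter tensors, as if flattened into one vector.
    return np.sqrt(sum(np.sum(v ** 2) for v in tau.values()))

def rescale_to(tau, target_norm):
    # Scale every tensor so the flattened task vector has the target norm.
    scale = target_norm / flat_norm(tau)
    return {k: v * scale for k, v in tau.items()}

tau_a = {"w": np.array([3.0, 4.0])}   # norm 5
tau_b = {"w": np.array([0.3, 0.4])}   # norm 0.5 -> a 10x mismatch
tau_b_fixed = rescale_to(tau_b, flat_norm(tau_a))
```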
Cohere Labs (@cohere_labs)'s Twitter Profile Photo

🧠 Ever wondered why the Adam optimizer works so well for training large language models? Join us on August 20th to learn more about the "secret sauce" of Adam from Antonio Orvieto, who trained over 1,300 models to uncover what makes it so effective - and discover a simplified…

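For context, Adam (Kingma & Ba) keeps exponential moving averages of the gradient and its elementwise square, then takes bias-corrected steps. A single-step sketch:

```python
import numpy as np

# One Adam step: m and v are EMAs of g and g^2; the "hat" terms correct
# the initialization bias from starting m and v at zero.

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g            # first-moment EMA
    v = b2 * v + (1 - b2) * g ** 2       # second-moment EMA
    m_hat = m / (1 - b1 ** t)            # bias correction (t is 1-indexed)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(2)
m, v = np.zeros(2), np.zeros(2)
g = np.array([0.1, -0.2])
theta, m, v = adam_step(theta, g, m, v, t=1)
```

Note that at the first step the bias-corrected update is approximately -lr * sign(g) regardless of the gradient's magnitude; this sign-like, per-coordinate scaling is one commonly cited reason Adam behaves so well on transformer training.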