Hiroki Naganuma (@_hiroki11x)'s Twitter Profile
Hiroki Naganuma

@_hiroki11x

PhD Candidate at @UMontreal, @Mila_Quebec / HPC, Generalization, Optimization / ex-@GoogleDeepMind, @MSFTResearch, @IBMResearch

ID: 1067249533501898752

Link: https://hiroki11x.github.io/ · Joined: 27-11-2018 02:51:32

112 Tweets

923 Followers

721 Following

Kotaro Yoshida (@katoro13___)'s Twitter Profile Photo

I will present our work "Mastering Task Arithmetic: τJp as a Key Indicator for Weight Disentanglement" at the #NeurIPS2024 workshop FITML on Dec 14. Our "τJp" regularization mitigates interference between task vectors and reduces coefficient-tuning costs.

Ryan D'Orazio (@ryandorazio)'s Twitter Profile Photo

I'll be at #NeurIPS24 until Sunday. If you're interested in solving variational inequality problems with deep learning (e.g. min-max and projected Bellman error), come and check out our poster on surrogate losses at the OPT workshop. arxiv.org/abs/2411.05228

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

I will present our work on an efficient distributed training algorithm at the optimization workshop. Join us during the poster session from 15:00-16:00. #NeurIPS2024

Masanari Kimura (@machinery81)'s Twitter Profile Photo

Our paper "Density Ratio Estimation via Sampling along Generalized Geodesics on Statistical Manifolds" got accepted to AISTATS'25.

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Our paper "Mastering Task Arithmetic: τJp as a Key Indicator for Weight Disentanglement" has been accepted at ICLR 2025🇸🇬! Grateful to my collaborators for their efforts. This work introduces an approach to enhance task arithmetic by mitigating interference.

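For readers unfamiliar with the technique the paper builds on: in task arithmetic, a "task vector" is the difference between fine-tuned and pre-trained weights, and models are edited by adding scaled task vectors back to the pre-trained weights. A minimal sketch with illustrative names (not the paper's code):

```python
import numpy as np

# Task arithmetic in a nutshell: tau = theta_finetuned - theta_pretrained,
# and a merged model is theta_pre + sum_i alpha_i * tau_i.

def task_vector(theta_pre, theta_ft):
    # One task vector per fine-tuned model, computed per parameter tensor.
    return {k: theta_ft[k] - theta_pre[k] for k in theta_pre}

def merge(theta_pre, taus, alphas):
    # Add each scaled task vector back to the pre-trained weights.
    merged = {k: v.astype(float) for k, v in theta_pre.items()}
    for tau, alpha in zip(taus, alphas):
        for k in merged:
            merged[k] = merged[k] + alpha * tau[k]
    return merged

# Tiny example with a single "layer".
pre = {"w": np.zeros(3)}
ft_a = {"w": np.array([1.0, 0.0, 0.0])}   # fine-tuned on task A
ft_b = {"w": np.array([0.0, 2.0, 0.0])}   # fine-tuned on task B
merged = merge(pre, [task_vector(pre, ft_a), task_vector(pre, ft_b)], [0.5, 0.5])
```

Interference arises when the task vectors overlap in the same weight directions; the τJp regularization announced above targets this interference so that the α coefficients need less tuning.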
Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Our latest paper, "Geometric Insights into Focal Loss: Reducing Curvature for Enhanced Model Calibration," is now available in Pattern Recognition Letters. We explore how focal loss influences model calibration through curvature reduction, offering new theoretical insights.

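As background, focal loss (Lin et al.) down-weights well-classified examples by the factor (1 − p_t)^γ and reduces to cross-entropy at γ = 0. A minimal binary-classification sketch, not taken from the paper:

```python
import numpy as np

# Focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the
# predicted probability of the true class. gamma = 0 recovers cross-entropy.

def focal_loss(p, y, gamma=2.0):
    p = np.clip(p, 1e-12, 1 - 1e-12)       # numerical safety for log
    p_t = np.where(y == 1, p, 1 - p)       # probability of the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

p = np.array([0.9, 0.6])   # predicted P(y = 1): one confident, one uncertain
y = np.array([1, 1])
ce = focal_loss(p, y, gamma=0.0)   # plain cross-entropy
fl = focal_loss(p, y, gamma=2.0)   # focal loss
```

With γ = 2, the confident example's loss is scaled by (1 − 0.9)² = 0.01 while the harder example's is scaled by only 0.16, which is the down-weighting of easy examples that the calibration analysis concerns.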
Sourabh Medapati (@activelifetribe)'s Twitter Profile Photo

The AlgoPerf results paper is on arXiv now!!! Congratulations to all contributors who made it happen 🎉🎉 ... special thanks to Priya Kasimbeg, George E. Dahl, and Frank Schneider for leading this effort! arxiv.org/abs/2502.15015

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Last month I finished my Student Researcher position at Google DeepMind. Since there seems to be little information about this in Japanese, I wrote up a memoir of the internship for students in Japan. hiroki11x.github.io/posts/research…

Ryan D'Orazio (@ryandorazio)'s Twitter Profile Photo

This week I'll be at #ICLR25. If you like fundamental optimization results, I'll be presenting our work on surrogate losses for non-convex-concave min-max problems and learning value functions in deep RL (VIs more generally). Poster #377, Thursday, April 24, 10am-12:30pm.

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Presenting our work, “Mastering Task Arithmetic: τJp as a Key Indicator for Weight Disentanglement,” this Friday, Apr 25, 3:00–5:30 p.m. Interested in task arithmetic? Please stop by our poster! #ICLR25 Mila - Institut québécois d'IA

Divyat Mahajan (@divyat09)'s Twitter Profile Photo

Happy to share that Compositional Risk Minimization has been accepted at #ICML2025 📌 Extensive theoretical analysis along with a practical approach for extrapolating classifiers to novel compositions! 📜 arxiv.org/abs/2410.06303

Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

I'm delighted to share that our paper has been accepted by #TMLR! We empirically observed signs of scaling laws regarding how the choice of pre-trained models affects OOD test errors and Expected Calibration Error on downstream tasks.

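Expected Calibration Error, one of the metrics studied above, bins predictions by confidence and averages the gap between accuracy and mean confidence per bin. A minimal version assuming equal-width bins (an illustrative implementation, not the paper's):

```python
import numpy as np

# ECE = sum_b (n_b / N) * |acc(b) - conf(b)| over equal-width confidence bins.

def expected_calibration_error(conf, correct, n_bins=10):
    conf = np.asarray(conf, dtype=float)       # predicted confidence in (0, 1]
    correct = np.asarray(correct, dtype=float) # 1 if prediction was right
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap           # weight by bin population
    return ece

# Perfectly calibrated toy case: 75% accuracy at 75% confidence -> ECE = 0.
conf_good = np.array([0.75, 0.75, 0.75, 0.75])
correct_good = np.array([1, 1, 1, 0])

# Overconfident case: 90% confidence but only 50% accuracy -> ECE = 0.4.
conf_bad = np.full(4, 0.9)
correct_bad = np.array([1, 0, 1, 0])
```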
Hiroki Naganuma (@_hiroki11x)'s Twitter Profile Photo

Our research on regularization in Task Arithmetic, a form of model merging, was featured in the August 2025 issue of Nikkei Robotics. xtech.nikkei.com/atcl/nxt/mag/r…

Divyat Mahajan (@divyat09)'s Twitter Profile Photo

Presenting CRM at #ICML2025 📌 Wednesday, July 16, 11 am 📍 East Exhibition Hall A-B (E-2101) Let's chat about distribution shifts! I've been deep into causality- and invariance-based perspectives, and recently exploring robust LLM pretraining architectures.

Reyhane Askari (@reyhaneaskari)'s Twitter Profile Photo

Excited to present our work "Improving the scaling laws of synthetic data with deliberate practice", tomorrow at #ICML2025 📢 Oral: Wed. 10:45 AM 📍 West Ballroom B (Oral 3C Data-Centric ML) 🖼️ Poster: 🕚 11:00 AM – 1:30 PM 📍 East Exhibition Hall A-B (Poster Session 3 East)

Ryan D'Orazio (@ryandorazio)'s Twitter Profile Photo

I'm also excited to be presenting this work (openreview.net/forum?id=4ZX2a…) at ICCOPT at USC. Theory aside, there are some applications that may interest people in RL, games, and performative prediction. Let me know if you're in the area and want to chat!

Kotaro Yoshida (@katoro13___)'s Twitter Profile Photo

🚨New preprint! Real‑world robustness of multi‑task model merging was unknown. We uncover two failure modes—(i) norm mismatch (ii) low‑confidence predictions—in source models and introduce DisTaC, a distillation‑based conditioning that fixes both. 🧵 (1/6)

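To illustrate the first failure mode above in isolation: if one task vector's norm dwarfs another's, the merged model is dominated by that task. A hypothetical diagnostic that compares and rescales norms (this is only the symptom check the tweet motivates, not the DisTaC procedure):

```python
import numpy as np

# Task vectors are dicts of parameter tensors; compare their overall norms.

def flat_norm(tau):
    # L2 norm over all parameter tensors, as if flattened into one vector.
    return np.sqrt(sum(np.sum(v ** 2) for v in tau.values()))

def rescale_to(tau, target_norm):
    # Scale every tensor so the flattened task vector has the target norm.
    scale = target_norm / flat_norm(tau)
    return {k: v * scale for k, v in tau.items()}

tau_a = {"w": np.array([3.0, 4.0])}   # norm 5
tau_b = {"w": np.array([0.3, 0.4])}   # norm 0.5 -> a 10x mismatch
tau_b_fixed = rescale_to(tau_b, flat_norm(tau_a))
```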
Cohere Labs (@cohere_labs)'s Twitter Profile Photo

🧠 Ever wondered why the Adam optimizer works so well for training large language models? Join us on August 20th to learn more about the "secret sauce" of Adam from Antonio Orvieto, who trained over 1,300 models to uncover what makes it so effective - and discover a simplified…

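For context, Adam (Kingma & Ba) keeps exponential moving averages of the gradient and its elementwise square, then takes bias-corrected steps. A single-step sketch:

```python
import numpy as np

# One Adam step: m and v are EMAs of g and g^2; the "hat" terms correct
# the initialization bias from starting m and v at zero.

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g            # first-moment EMA
    v = b2 * v + (1 - b2) * g ** 2       # second-moment EMA
    m_hat = m / (1 - b1 ** t)            # bias correction (t is 1-indexed)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(2)
m, v = np.zeros(2), np.zeros(2)
g = np.array([0.1, -0.2])
theta, m, v = adam_step(theta, g, m, v, t=1)
```

Note that at the first step the bias-corrected update is approximately -lr * sign(g) regardless of the gradient's magnitude; this sign-like, per-coordinate scaling is one commonly cited reason Adam behaves so well on transformer training.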