Torsten Hoefler 🇨🇭 (@thoefler)'s Twitter Profile
Torsten Hoefler 🇨🇭

@thoefler

Professor @ETH_en, head of @spcl_eth, Chief Architect ML @cscsch researching large-scale #HPC and #AI systems and #Climate computing - youtube: https://t.co/dxrzSPIFVb

ID:62059377

Link: http://htor.inf.ethz.ch
Joined: 01-08-2009 15:54:19

2.8K Tweets

4.6K Followers

211 Following

Saleh Ashkboos (@AshkboosSaleh)

[1/7] Happy to release 🥕QuaRot, a post-training quantization scheme that enables 4-bit inference of LLMs by removing the outlier features.
With Amirkeivan Mohtashami (@akmohtashami_a), Max Croci (@max_croci), Dan Alistarh (@DAlistarh), Torsten Hoefler 🇨🇭 (@thoefler), James Hensman (@jameshensman), and others

Paper: arxiv.org/abs/2404.00456
Code: github.com/spcl/QuaRot

Torsten Hoefler 🇨🇭 (@thoefler)

Digital Intelligence Index - a country's advancement in #AI is on the y axis and adoption of #AI on the x axis. Quite interesting.

digitalintelligence.fletcher.tufts.edu/trajectory

Watch China and Singapore!

Fast compute is clearly the future - and whole countries need to watch to not be left behind.

Torsten Hoefler 🇨🇭 (@thoefler)

Google's TPUv4 OCS topology is basically a 3D HammingMesh 😇.

The main difference is optical vs. electrical in the fully connected dimensions. We found optical too inflexible (coming from Fat Trees) and the optical switch had limited cycles.

youtube.com/watch?v=wIdssz… (15:51)

Torsten Hoefler 🇨🇭 (@thoefler)

How to communicate with your #RDMA network?

Twenty ways to message in modern networks! Some quite cool ideas, interesting tradeoffs, limitations, and design ideas for protocols and #HPC network interfaces.

arxiv.org/abs/2212.09134

Torsten Hoefler 🇨🇭 (@thoefler)

Are you wondering how to make your #SC24 @Supercomputing submission even better?

Check out the scientific benchmarking paper for parallel systems.

Twelve rules to improve the scientific quality of your #HPC paper.

htor.inf.ethz.ch/publications/i…

Nature Computational Science (@NatComputSci)

In the area of Earth systems science, Peter Bauer, Torsten Hoefler 🇨🇭 and colleagues argue that flexible human-in-the-loop interaction is essential for understanding and making efficient use of the data provided by digital twins of Earth. nature.com/articles/s4358…

🧵9/11

PULP Platform (@pulp_platform)

Samuel (@saem_r) presented LRSCwait, where we introduce new atomic operations to eliminate polling and retries during synchronization in manycore systems: arxiv.org/abs/2401.09359 pulp-platform.org/docs/date2024/… Big thank you to Samuel for doing such an amazing job reporting from #DATE2024.

Torsten Hoefler 🇨🇭 (@thoefler)

Thinking about it - 'RDMA over Converged Ethernet' (RoCE) is really a misnomer and should be called 'InfiniBand over CE'.

RDMA is much broader, and there will be a fundamentally different specification in the Ultra Ethernet Consortium (@ultraethernet)!

An old discussion on the terms: htor.inf.ethz.ch/blog/index.php…

Torsten Hoefler 🇨🇭 (@thoefler)

Neat idea from @CerebrasSystems to implement a transposed multiplication during #AI training backward passes for free simply by streaming one matrix in a different pattern. Sparsity by streaming also seems nice!

Maybe useful for #HPC as well?

youtube.com/watch?v=wIdssz… (48:30)

Vala Afshar (@ValaAfshar)

NVIDIA CEO: your job as a leader is to architect the right conditions for your employees to do their life's work

Torsten Hoefler 🇨🇭 (@thoefler)

Fortran is dead - long live Fortran!

A take on how to move @fortranlang #HPC codes to modern computing devices using parametric dataflow and performance metaprogramming. Work led by Calotoiu Alexandru (@calotoiu) at SPCL@ETH (@spcl_eth)!

Using @ECMWF's CLOUDSC dwarf as demonstrator.

youtu.be/1s8TT3jE8Tg

Torsten Hoefler 🇨🇭 (@thoefler)

'Therefore, if you are a chip designer in a bandwidth constrained world, you are making your chip roughly 3x worse when you chose to go with PCIe 5.0 instead of 112G Ethernet-style SerDes.'

Ultra Ethernet standing by 😇.

Tim Prickett Morgan (@TDaytonPM)

Researchers play Tetris with chiplet topologies to come up with the Goldilocks arrangement, called HexaMesh. Not to be confused with TexMex, which also comes with chips.
nextplatform.com/2024/03/11/hex…

Torsten Hoefler 🇨🇭 (@thoefler)

Maurice Steinman from @lightelligence shows the Hummingbird optical substrate at @hotchipsorg.

Decouples distance and enables free bcast!

Combining HexaMesh-style high-bandwidth neighbor with SlimFly or Polar Fly topologies for new breakthroughs?

youtube.com/watch?v=l5Fg2c…
