Why Supercomputer Benchmarking Is So Important

SC2024 had a number of sessions on the significance of benchmarking methods. The relevance of LINPACK, which is used to rank the Top500, has been a hot topic of discussion lately. Panelists said supercomputing benchmarking will remain important for many reasons, but it will need to keep up with applications, hardware, and codebases.

The relevance depends on the audience and applications. It is not obvious, but benchmarks also have geopolitical, social, environmental, and economic implications.

For science, LINPACK measures the time-to-science and the value extracted from hardware.

For AI, benchmarks such as MLPerf help hyperscalers get a real-world view of performance and optimize new models to hardware. The benchmark tells a company whether a model can be fine-tuned further for GPUs or ASICs.
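As a rough illustration only (not the actual HPL code, and with an arbitrary problem size chosen for this sketch), a LINPACK-style score boils down to timing a dense linear solve and dividing a conventional operation count by the elapsed time:

```python
# Minimal sketch of how an HPL-style score is derived: time a dense solve
# and convert the textbook operation count into a FLOP/s figure.
# Illustration only; the real HPL benchmark is far more involved.
import time
import numpy as np

n = 4096  # assumed problem size; production HPL runs use far larger matrices
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

start = time.perf_counter()
x = np.linalg.solve(A, b)  # LU factorization plus triangular solves
elapsed = time.perf_counter() - start

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # textbook count for LU + solves
print(f"~{flops / elapsed / 1e9:.1f} GFLOP/s on a {n}x{n} dense system")
```

The appeal, and the limitation, is that the entire run compresses into that single FLOP/s number.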

Here are some reasons why benchmarks are important.

Top500 is a historical record of computing progress

The Top500, which has run for 32 years, is a historical record of how humanity has moved forward in computing.

“The word that comes to his mind about Linpack is, of course, the word legacy. It’s an important legacy that we continue so we have historical continuity,” said Piotr Luszczek of MIT Lincoln Laboratory and the University of Tennessee during a session.

A one-liner for politicians

The Top500 list is a yardstick to measure U.S. progress in computing. For example, the Top500 provides an easy one-liner to tell politicians that the U.S. is leading over China.

Of course, the Top500 is much more complicated than that. But impatient politicians want a quick summary, and the Top500 provides those answers.

“Composite metrics are always problematic. Having lots of numbers is good for the engineers, but it’s not good for politicians, so to speak. They just want the one number,” Luszczek said.

“I’ve listened to many hours of science policy testimony with members of Congress asking our agency leaders about these things. They don’t know squat about HPL, but they know a list, and they ask, ‘Are we in the lead or are we still in the lead?’” said Jack Wells, a scientific program manager at Nvidia.

Wells previously worked as director of science for ORNL.

The stock market measure of computing

Wells said the Top500 is to computing what the Dow Jones Industrial Average is to the stock market.

“It has the same impact when the community says, ‘Oh, the Dow Industrials did this today.’ It’s the same thing. But we know it changes,” Wells said.

The Top500 showed that Dennard scaling stopped in 2017-18. Computing progress shows that chiplets in AMD’s MI-series GPUs are impacting the landscape of computing performance.

Generating value

Supercomputing modules are put into production as soon as they are ready. Benchmarking provides a way to measure the readiness of the system.

“When the first part of it becomes available, it goes into production. When the next part of it is available, it goes into production… And that way, you start to generate value from that machine sooner and sooner,” said Andrew Jones of Microsoft Azure at a technical session discussing benchmarks.

“If you look at the hyperscaler market training the large language models, there’s clear enough business value to having a more capable AI model out before your competitors,” Jones said.

Customers typically look at AI models’ accuracy and response time. MLPerf has many benchmarking tools that consider many factors, including environmental impact, to measure AI performance.

Optimizing Workloads

Jones said that significantly more science can be done on a system that is better operated than on a larger one that is more poorly operated.

Deploying the latest generation or a new generation of technology may not be more efficient than “one that has chosen to optimize its actual science per dollar or science per megawatt,” Jones said.
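As a purely hypothetical sketch of that argument, with made-up throughput, utilization, and power figures, a smaller but well-operated machine can come out ahead on science per megawatt:

```python
# Hypothetical numbers only: a smaller, well-operated system can beat a
# larger, poorly operated one on science delivered per megawatt.
def science_per_megawatt(jobs_per_year: float, utilization: float, power_mw: float) -> float:
    """Useful jobs completed per megawatt of average power draw."""
    return jobs_per_year * utilization / power_mw

big_but_idle = science_per_megawatt(jobs_per_year=100_000, utilization=0.55, power_mw=30)
small_but_tuned = science_per_megawatt(jobs_per_year=70_000, utilization=0.92, power_mw=15)
print(big_but_idle, small_but_tuned)  # the smaller, better-run machine wins per megawatt
```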

David Kanter, head of MLPerf, pointed out that benchmarking helps build out targeted systems. He gave the example of RIKEN’s Fugaku supercomputer.

“One of the supercomputer sites that has consistently impressed me is the folks at RIKEN and with Fugaku… they wrote a paper that basically said… we aren’t optimizing for the most flops because most workloads are sparse… we’re going to sacrifice half our peak flops. But gosh, on real workloads, we’re going to come out,” Kanter said.

Environmental Impact

Talk of optimizing the power efficiency of computing happens under the label of sustainability.

“What we really mean is we don’t want to spend money on megawatts; we’d rather spend money on compute,” Jones said.

Building the physical infrastructure, including the manufacturing processes, mining, and laying concrete for data centers, has a “much larger, much bigger carbon output than the actual electricity consumed during the operation of supercomputers,” Jones said.

“One of the most sustainable things you can do with your supercomputer is not to make it slightly more efficient in energy terms but to move it somewhere else in the world,” Jones said.

Evaluating Expertise

Beyond hardware, benchmarks also measure the software stack, tuning skills, and team expertise.

Kanter said single numbers don’t capture the system complexity of benchmarking.

The panelists said there may be statistical variations for a given application or code on a particular combination of hardware and software.

“The user doesn’t run an application 100 times on the supercomputer and then pick the one with the fastest result,” Jones said.

Benchmarking can indirectly be a way to measure the expertise of your IT staff.

“You’re measuring the performance of your benchmarking code, the portability of your benchmarking code, how well you’ve been able to tune it. You’re measuring the performance of your benchmarking team and the skill of the benchmarking team,” Jones said.

Communicating with stakeholders

Kanter said that not everyone is looking for one metric; different teams look for different numbers.

The benchmark can explain things with different metrics to teams and then facilitate cooperation.

There could be one metric on one codebase, but there may be variations in sizes and conditions. The rules could be different.

The panelists said that could be different when you are trying to decide which system to buy versus helping users effectively utilize that system.
