When scientists need to study and simulate complex phenomena like forest fires, climate change, or viral infections, they turn to supercomputing to do what regular computers can’t. The University of New Mexico Center for Advanced Research Computing (UNM CARC), the University of Tennessee at Chattanooga (UTC) Center for Excellence in Applied Computational Science and Engineering (SimCenter), and the University of Alabama at Birmingham (UAB) Collaborative Computing Lab (CCL) have long been centers of innovation in supercomputing systems deployment and research. Now, researchers at these three institutions seek to revolutionize the field by developing new methods for communication in supercomputers.
The US Department of Energy’s National Nuclear Security Administration (NNSA) has selected UNM CARC, UTC SimCenter, and UAB CCL to receive a $4 million Predictive Science Academic Alliance Program (PSAAP) award to improve the speed and functionality of next-generation supercomputers. The award will allow for the creation of the Center for Understandable, Performant Exascale Communication Systems (CUP-ECS), which will research more efficient mechanisms for high-speed computer-to-computer communication. The center’s primary focus is improving the simulation capabilities of DOE applications, but the technological advances yielded by the project will also affect the performance of supercomputers around the globe and supercomputing applications in a wide range of disciplines.
UNM professor of Computer Science and CARC director Patrick Bridges will direct research at CUP-ECS in collaboration with the associate center directors, professor Anthony Skjellum at the UTC SimCenter and professor Purushotham Bangalore at UAB. According to Bridges, the method of communication currently used by supercomputing systems has fundamental limitations. These limitations reduce system performance and force application designers to take a trial-and-error approach to optimizing communication patterns.
“Current communication systems are built on assumptions from the computer architectures of 30 years ago that are no longer true today,” Bridges said.
The new DOE-funded center will research and refine more efficient methods for communication in supercomputers and supercomputing applications. This will significantly increase the performance of cutting-edge scientific applications and the scale of problems that scientists can research.
“Basically, every modern supercomputer is based on high-performance communication. We can build faster individual computers but the fabric that makes it all work is the communication system,” Bridges explained.
One of the most important aspects of the project is the development of a communication system that provides programmers with system performance feedback that is “both meaningful and useful” so that they can optimize application efficiency.
Bridges commented, “There’s a lot we can do... to make the communication system perform better. Designers make a lot of choices and they need to be well-informed. Current communication systems don’t provide application designers the information they need.”
Skjellum, director of UTC’s SimCenter, added, “Research at UNM, UTC, and UAB in high performance computing will enhance the NNSA exascale mission while educating and advancing computer science concepts at the NNSA labs and at our three institutions.”
Bangalore, director of CCL at UAB, said, “Working on realistic applications and revealing ways to improve performance at exascale is an outstanding problem area for our center to help solve over the next five years.”
CARC business manager Tracy Wenzl, who will manage the hiring and provide budgetary oversight for the project, writes, “This project is part of CARC’s mission to lead research and development on next-generation research computing systems. The insights we gain from the project will also help provide better computational facilities to the wide range of researchers that CARC serves.”
The project is expected to last for five years and will be conducted using UNM, UTC, and UAB’s high-performance computing resources. The funding CARC receives from the DOE NNSA will go primarily toward paying students and postdoctoral fellows to perform research under the guidance of Bridges, Skjellum, and Bangalore, and to collaborate with researchers at Sandia, Los Alamos, and Lawrence Livermore National Labs.