Showing posts with label algebraic systems. Show all posts
Showing posts with label algebraic systems. Show all posts

Nov 27, 2020

[paper] Trillion-transistor chip breaks speed record

Kamil Rocki∗, Dirk Van Essendelft†, Ilya Sharapov∗, Robert Schreiber∗, Michael Morrison∗, Vladimir Kibardin∗, Andrey Portnoy∗, Jean Francois Dietiker†‡, Madhava Syamlal†
and Michael James∗
Fast Stencil-Code Computation on a Wafer-Scale Processor
Online SC20 Supercomputing Conference
arXiv:2010.03660 [cs.DC] (2020)

∗ Cerebras Systems Inc., Los Altos, California, USA
† National Energy Technology Laboratory, Morgantown, West Virginia, USA
‡ Leidos Research Support Team, Pittsburgh, Pennsylvania, USA

Abstract: The performance of CPU-based and GPUbased systems is often low for PDE codes, where large, sparse, and often structured systems of linear equations must be solved. Iterative solvers are limited by data movement, both between caches and memory and between nodes. Here we describe the solution of such systems of equations on the Cerebras Systems CS-1, a wafer-scale processor that has the memory bandwidth and communication latency to perform well. We achieve 0.86 PFLOPS on a single wafer-scale system for the solution by BiCGStab of a linear system arising from a 7-point finite difference stencil on a 600 × 595 × 1536 mesh, achieving about one third of the machine’s peak performance. We explain the system, its architecture and programming, and its performance on this problem and related problems. We discuss issues of memory capacity and floating point precision. We outline plans to extend this work towards full applications.
Fig: CS-1 Wafer Scale Engine (WSE). A single wafer (rightmost) contains one CS-1 processor. Each processor is a collection of dies arranged in a 2D fashion (middle). Dies are then further subdivided into a grid of tiles. One die hosts thousands of computational cores, memory and routers (leftmost). There is no logical discontinuity between adjacent dies and there is no additional bandwidth penalty for crossing the die-die barrier. In total, there are 1.2 trillion transistors in an area of 462.25 cm2.

Acknowledgement: The authors would like to thank Natalia Vassilieva for initiating the collaboration between Cerebras Systems and NETL and for her subsequent help with the project.