- Huawei-linked LineShine supercomputer crams 2.45 million Arm cores into one huge AI cluster
- Huawei’s processors power one of China’s largest AI computing installations today
- CPU-only supercomputers eliminate costly data transfers between processors and accelerators during workloads
China has deployed a massive CPU-only supercomputer called LineShine that delivers 1.54 exaflops of AI training performance without using any GPUs at all.
The system packs 20,480 compute nodes, each containing two LX2 processors, for a total of 40,960 chips across the entire machine.
Each LX2 processor has 304 CPU cores, which means the entire supercomputer uses roughly 2.45 million Armv9 cores in total.
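As a quick sanity check on those totals, the arithmetic can be sketched in a few lines of Python. The node, processor, and core counts are the figures quoted above; the function and variable names are illustrative only. Taken at face value, the quoted figures multiply out to about 12.45 million cores, so the 2.45 million figure may be counting something different or may be a misprint; the reporting does not say which.

```python
# Figures quoted in the article.
NODES = 20_480
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 304


def system_totals(nodes: int = NODES,
                  processors_per_node: int = PROCESSORS_PER_NODE,
                  cores_per_processor: int = CORES_PER_PROCESSOR) -> dict:
    """Return the processor and core counts implied by the per-node figures."""
    processors = nodes * processors_per_node
    cores = processors * cores_per_processor
    return {"processors": processors, "cores": cores}


totals = system_totals()
print(f"Processors: {totals['processors']:,}")
print(f"Cores:      {totals['cores']:,}")
```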
Inside the LX2 processor’s unusual architecture
The processor was developed either by Huawei or through a joint design with China’s National Supercomputing Center, although its exact origin remains undisclosed.
Each LX2 processor uses two compute chiplets, with cores organized into eight clusters of 38 cores each.
Each core includes Arm’s Scalable Vector Extension and Scalable Matrix Extension units, which accelerate the matrix operations used in AI training.
The processor delivers 60.3 teraflops of FP64 performance, 240 teraflops of BF16 throughput, and 960 teraops of INT8 performance from a single chip.
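Scaling those per-chip peaks to the whole machine gives a rough upper bound on system throughput. The sketch below assumes all 40,960 processors hit their quoted peak at the same time, which real workloads do not, and it assumes that the 1.54 exaflops training figure is best compared against the BF16 peak, which the reporting does not confirm. All names are illustrative.

```python
CHIPS = 20_480 * 2  # 40,960 LX2 processors, per the article

# Per-chip peak throughput in tera-operations per second, as quoted.
PER_CHIP_TERA = {"FP64": 60.3, "BF16": 240.0, "INT8": 960.0}

REPORTED_TRAINING_EXAFLOPS = 1.54  # numeric precision not stated in the article


def aggregate_exa(per_chip_tera: float, chips: int = CHIPS) -> float:
    """Convert a per-chip tera-scale peak into a system-wide exa-scale peak."""
    return per_chip_tera * chips / 1_000_000  # 1 exa = 1,000,000 tera


peaks = {name: aggregate_exa(value) for name, value in PER_CHIP_TERA.items()}

# ASSUMPTION: the reported training figure is compared against BF16 peak.
bf16_fraction = REPORTED_TRAINING_EXAFLOPS / peaks["BF16"]

for name, exa in peaks.items():
    print(f"{name}: {exa:.2f} exa-ops/s theoretical peak")
print(f"Reported figure as a share of BF16 peak: {bf16_fraction:.1%}")
```

Under those assumptions, the system-wide peaks come to roughly 2.47 exaflops of FP64, 9.83 exaflops of BF16, and 39.32 exaops of INT8.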
The memory subsystem combines 32GB of on-package HBM delivering up to 4TB/s of bandwidth with up to 256GB of off-package DDR5 memory.
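To put those per-chip memory figures in system terms, here is a minimal sketch. It assumes the 256GB DDR5 ceiling applies per processor rather than per node, that every processor is fully populated, and that units are decimal (1PB = 1,000,000GB); none of this is confirmed by the reporting, and the names are illustrative.

```python
CHIPS = 20_480 * 2            # 40,960 LX2 processors, per the article
CORES_PER_CHIP = 304          # per the article
HBM_GB_PER_CHIP = 32          # on-package HBM, per the article
HBM_TB_PER_S_PER_CHIP = 4     # "up to" 4TB/s, per the article
DDR5_GB_PER_CHIP_MAX = 256    # ASSUMPTION: the "up to 256GB" limit is per processor


def memory_totals(chips: int = CHIPS) -> dict:
    """Aggregate memory capacity and HBM bandwidth using decimal units."""
    return {
        "hbm_pb": chips * HBM_GB_PER_CHIP / 1_000_000,
        "ddr5_pb_max": chips * DDR5_GB_PER_CHIP_MAX / 1_000_000,
        "hbm_bandwidth_pb_per_s": chips * HBM_TB_PER_S_PER_CHIP / 1_000,
        "hbm_mb_per_core": HBM_GB_PER_CHIP * 1_000 / CORES_PER_CHIP,
    }


mem = memory_totals()
for key, value in mem.items():
    print(f"{key}: {value:.2f}")
```

On these assumptions the machine would hold about 1.31PB of HBM and up to about 10.49PB of DDR5, with roughly 105MB of HBM per core.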
CPU-only systems offer several advantages for complex scientific tasks that combine AI training with massive data ingestion and preprocessing.
Since everything runs on the same processor and in the same memory space, they avoid costly and bandwidth-hungry CPU-to-GPU data transfers.
Homogeneous CPU-based systems can also expose much larger coherent memory pools by combining HBM with large DDR capacities.
This is useful for handling massive scientific datasets, retrieval-augmented generation, and long context windows that GPU memory limitations cannot easily accommodate.
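The cost of the CPU-to-GPU transfers mentioned above can be illustrated with a back-of-the-envelope estimate. The sketch below is not a benchmark: the 64GB/s link speed is an assumption roughly matching one direction of a PCIe 5.0 x16 connection, the 32GB working set is chosen only because it matches the LX2's HBM capacity, and the 4TB/s figure is the quoted HBM peak. All names are hypothetical.

```python
def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to move size_gb at a sustained bandwidth (no latency or overhead)."""
    return size_gb / bandwidth_gb_per_s


WORKING_SET_GB = 32.0        # illustrative: equal to one LX2's HBM capacity
HOST_LINK_GB_PER_S = 64.0    # ASSUMPTION: about one direction of PCIe 5.0 x16
HBM_GB_PER_S = 4_000.0       # "up to 4TB/s", per the article

# Time to copy the working set from host memory to an accelerator.
link_time = transfer_seconds(WORKING_SET_GB, HOST_LINK_GB_PER_S)

# Time for the CPU to stream the same data once from its own on-package HBM.
hbm_time = transfer_seconds(WORKING_SET_GB, HBM_GB_PER_S)

print(f"Host-to-accelerator copy: {link_time * 1000:.0f} ms")
print(f"One pass over local HBM:  {hbm_time * 1000:.0f} ms")
print(f"Ratio: {link_time / hbm_time:.1f}x")
```

Real systems overlap transfers with compute and may use faster interconnects, so this only indicates the order of magnitude at stake.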
The big caveat that comes with this approach
CPU-only systems are usually less power efficient and deliver lower-density AI throughput than GPU-based supercomputers.
This is the main reason most of the industry bets on heterogeneous CPU plus GPU architectures for large-scale AI workloads.
China is pursuing this path largely because of US bans on GPU exports, not because CPU-only systems are technically superior for AI tasks.
LineShine shows that CPUs can successfully perform GPU jobs, but the efficiency gap between the two approaches remains substantial and is unlikely to close anytime soon.
China is making a strategic trade-off, accepting lower performance and higher power consumption in exchange for independence from foreign hardware and software ecosystems like Nvidia’s GPUs and CUDA.
Whether that trade-off makes sense for long-term AI development depends entirely on how quickly Chinese manufacturers can close the performance gap with their own GPU designs.
Until then, LineShine will remain a remarkable engineering achievement and a practical necessity, but probably not a blueprint for how most of the world will build AI supercomputers.
Via Tom's Hardware


