GOOGLE REVEALS THE SUPERCOMPUTERS IT USES FOR ARTIFICIAL INTELLIGENCE
Google has released new details about the supercomputers it uses to train its artificial intelligence models, saying the systems are both faster and more energy efficient than comparable systems from Nvidia Corp.
The company designs its own custom chip, the Tensor Processing Unit (TPU), and uses it for more than 90% of its work on training artificial intelligence – the process of feeding data through models to make them useful at tasks such as answering queries with human-like text or generating images.
The TPU is now in its fourth generation. On Tuesday, Google published a research paper describing how it strung together more than 4,000 of the chips into a supercomputer, using its own custom-designed optical switches to connect the individual machines.
Improving these connections has become a key point of competition between companies building AI supercomputers, because the so-called large language models that power technologies such as Google’s Bard or OpenAI’s ChatGPT have grown so big that they can no longer be stored on a single chip.
Instead, the models are split across thousands of chips, which then have to work together for weeks or more to train the model. Google’s PaLM model – its largest publicly disclosed language model to date – was trained by splitting it across two of the 4,000-chip supercomputers over 50 days.
Google said its supercomputers make it easy to reconfigure the connections between chips on the fly, helping to route around problems and to tune the system for performance gains.
“Circuit switching makes it easy to route around failed components,” Google Fellow Norm Jouppi and Google Distinguished Engineer David Patterson wrote in a blog post about the system. “This flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of an ML (machine learning) model.”
Although Google is only now releasing details about its supercomputer, the system has been online inside the company since 2020, in a data center in Oklahoma. Google said the startup Midjourney has used the system to train its model, which generates crisp images from a few words of text.
In the paper, Google says that for systems of comparable size, its chips are up to 1.7 times faster and 1.9 times more power-efficient than a system based on Nvidia’s A100 chip, which came to market around the same time as the fourth-generation TPU.
An Nvidia spokesman declined to comment.
Google said it did not compare its fourth-generation chip to Nvidia’s current flagship H100 chip because the H100 came to market after Google’s chip and is made with newer technology.
The tech company has hinted that it is working on a new TPU that would compete with Nvidia’s H100, but it provided no details, with Jouppi telling Reuters that Google has a “robust line of future chips”.