Hardware Acceleration for AI Workloads

Summary

In this NLP Cloud course we explain why specific hardware is often necessary to speed up machine learning workloads. We also review the best accelerators available on the market in 2023: GPUs, TPUs, IPUs, Inferentia, Habana Gaudi...

Here is the structure of the course:

- Why AI workloads need specific hardware: matrix multiplications and floating point calculations
- CPUs versus GPUs
- The main hardware accelerators on the market: NVIDIA and AMD GPUs, Google TPUs, Graphcore IPUs, AWS Inferentia and Trainium, Intel Habana Gaudi

Transcript

Hello everyone, this is Julien Salinas from NLP Cloud.

In this course, we are going to see which kind of hardware accelerators we can use today to speed up our inference workloads.

In order to understand hardware acceleration, it is important to remember that AI applications are mostly based on neural networks nowadays, also known as deep learning.

Matrix multiplication is an essential operation in neural networks because it allows them to learn complex data and complex representations of the data.

In a neural network, the input data is represented by a matrix and the weights of the connections between neurons are also represented by a matrix.

When these two matrices are multiplied, the result is a new matrix that represents the output of the neurons.

This process is repeated through multiple layers of neurons, allowing the network to learn increasingly abstract and complex features of the input data.
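Here is a minimal sketch of what this looks like in practice. It uses NumPy, which is not part of this course, and the numbers and layer sizes are purely hypothetical; the point is simply that each layer is a matrix multiplication between the data and the weights.

```python
import numpy as np

# A batch of 2 input samples, each with 3 features (hypothetical values).
inputs = np.array([[0.5, -1.2, 3.0],
                   [1.1, 0.7, -0.3]])

# Weights connecting 3 input neurons to 4 hidden neurons.
weights_hidden = np.random.randn(3, 4)

# Multiplying the input matrix by the weight matrix gives the
# output of the hidden layer: a 2x4 matrix.
hidden = inputs @ weights_hidden

# The same operation is repeated for the next layer,
# here mapping 4 hidden neurons to 2 output neurons.
weights_output = np.random.randn(4, 2)
outputs = hidden @ weights_output

print(outputs.shape)  # (2, 2)
```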

Matrices are core components of AI models, so it is important to use some hardware that is very good at performing operations on matrices.

Another important aspect is floating point numbers.

Floating-point numbers are important in neural networks because they allow for the representation of fractional values.

As we just said, neural networks involve large matrices with many entries.

Using only integer values would discard these fractions and quickly lead to rounding and overflow errors.

By using floating point values, neural networks can represent values with many decimal places, allowing for more precise calculations and better accuracy in the outputs.
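As a quick illustration (a hypothetical NumPy snippet, not part of the original course), here is what happens to fractional weight values when they are stored as integers instead of floating-point numbers:

```python
import numpy as np

# The same small weight values stored as 32-bit floats and as 32-bit integers.
weights_float = np.array([0.25, -0.003, 1.5], dtype=np.float32)
weights_int = np.array([0.25, -0.003, 1.5], dtype=np.int32)

print(weights_float)  # [ 0.25  -0.003  1.5 ]
print(weights_int)    # [0 0 1] -> the fractional information is lost
```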

So, as a summary, in order to efficiently process AI workloads, we need hardware that is good at matrix multiplications and floating point calculations.

The two main options you can consider today for your machine learning workloads are CPUs and GPUs.

A CPU, or Central Processing Unit, is a general purpose processor that handles a wide range of tasks in a computer system, including running applications, managing the operating system, and performing mathematical calculations.

CPUs are designed to be versatile and can handle many types of tasks, but they are not optimized for any specific type of workload.

A GPU, or Graphics Processing Unit, is a specialized processor that is designed to handle complex, parallel workloads like graphics rendering and machine learning.

GPUs have thousands of smaller cores that work together to handle large amounts of data at once, making them much faster than CPUs for certain types of workloads.

Matrix operations can be easily parallelized on several small cores, which is why GPUs excel in this domain.

Also, GPUs typically have many more floating point units than CPUs, which allows them to perform floating point operations much faster.
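Here is a minimal sketch of how the same matrix multiplication can be moved from the CPU to a GPU. It assumes PyTorch (not mentioned in this course) and an NVIDIA CUDA-capable GPU; the matrix sizes are arbitrary.

```python
import torch

# A large matrix multiplication, first on the CPU, then on the GPU.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

result_cpu = a @ b  # runs on the CPU cores

if torch.cuda.is_available():
    a_gpu = a.to("cuda")
    b_gpu = b.to("cuda")
    # The same multiplication is split across thousands of GPU cores.
    result_gpu = a_gpu @ b_gpu
```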

You now know why a CPU is often not enough for today's AI workloads, and why specific hardware is often very important.

Now let's dive into the choices that you have when it comes to specific hardware accelerators.

NVIDIA GPUs are a powerful tool for graphics processing, and they offer a range of features that make them ideal for gaming, machine learning, video editing, and design and engineering applications.

As an AI software engineer, you will no doubt have to work with NVIDIA GPUs, as they hold a central position in the GPU market today.

Their most powerful cards for AI in 2023 are the A100 and the H100.

AMD also offers a wide range of GPUs, including for machine learning.

Their ROCm platform is interesting, and I encourage you to have a look at it.

Google also builds its own AI chips, called TPUs, for Tensor Processing Units.

They use these chips internally, but also offer them as part of Google Cloud.

You cannot purchase a TPU for yourself though.

TPUs work slightly differently from GPUs, but that will be a topic for another dedicated video.

Graphcore is a UK-based company making dedicated AI hardware called the IPU, or Intelligence Processing Unit, comparable to Google's TPUs.

You can either purchase IPUs or use them in the cloud through one of their partners.

AWS also builds its own AI chips.

They have a chip dedicated to inference, called Inferentia, and another one dedicated to training, called Trainium.

These chips are relatively cheap.

You cannot purchase these chips for yourself, but you can use them on AWS EC2 or SageMaker.

Intel also builds its own AI chip, called Habana Gaudi, which is a very powerful but very expensive alternative.

These hardware accelerators are powerful, but they are also very expensive and not easy to purchase because of the global semiconductor shortage.

This is why it's wise to optimize your AI workload as much as possible so that it can run on smaller hardware.

CPUs can actually be a decent option for many machine learning workloads.

As you can see, in 2023 NVIDIA is the de facto standard when it comes to hardware acceleration in AI and machine learning.

But interestingly, some alternatives are appearing.

So maybe in a couple of years, for your next AI projects, you will be using other types of accelerators.

I hope this course was useful and I wish you a pleasant day.