Deep learning has seen a huge rise in popularity over the last five years in both enterprise and scientific applications. While the underlying algorithms, artificial neural networks, date back decades, the technology has come of age thanks to massive increases in compute power, the development of GPU technologies, and the availability of data to train these systems.
Today the technology is used across many scientific disciplines, from earthquake prediction, high-energy particle physics and weather and climate modelling to precision medicine and even the development of clean fusion energy. With so many possible applications, it can be difficult for scientists to work out whether artificial intelligence (AI) or deep learning (DL) fits into their workflow. While some applications, such as speech or image recognition, are well documented, others are only now coming to light, such as the use of language-processing DL frameworks to decipher protein folding.
Christoph Angerer, Nvidia’s AI developer technologies manager for EMEA, Russia and India, breaks down the use of DL into two basic use cases. ‘In general, it depends on the use case, but you can think of two cases where AI is useful. The first is to solve problems that are hard to solve in a rule-based way – a similar domain to what you see outside science with speech recognition or image recognition.
‘Those are examples of problems that are hard to solve with a classic algorithm – people tried for a long time to sit down and write feature extractors to identify faces and so on. That was all manually crafted, and that has changed with more compute, the availability of neural networks and more data,’ explained Angerer.
‘These are cases that are hard to write in a manual way, and that also exists in science, of course. You could think of the image as satellite data and the analysis could be looking at how to find hurricanes, or to find potential areas of drought,’ Angerer added.
‘You may want to sift through CERN data to find collisions that may be of particular interest. All those cases where a classical algorithm may be difficult to design, and AI can come to the rescue and help you come up with a solution,’ stated Angerer.
In these cases, the objective is to do science that could not be done any other way. Here, DL opens up possibilities that were simply too complicated to tackle with classical computing techniques.
The other case given by Angerer covers tasks that were already possible, but only at massive computational cost, or with teams of people developing algorithms that run orders of magnitude slower than what can be achieved with DL.
‘There is another case in scientific applications where you use AI as a surrogate model for existing solutions. For example, if you go into weather and climate simulation, you may know the kind of physics you want to model, but often in those cases you do not model the individual air molecules, because that would be too fine-grained. Instead, you resolve a square kilometre for a weather forecast, or in ECMWF-style worldwide forecasts you may have a grid of 100 x 100km,’ stated Angerer.
‘What you do there traditionally is you come up with algorithms from physicists that are approximations – they are parametrised models. They come up with formulas and then tune the parameters, so that the model matches up with an existing understanding of real-world data, which can then be used to verify the approximations made by the parametrised model,’ Angerer continued.
In this case, the use of AI means that you do not need to work on the models manually. Just as writing feature detectors for image recognition is now a job passed along to the neural network rather than coded by people, the increase in compute and training data has supplanted much of the hand-coded work of scientists modelling weather and climate.
‘You do not have to come up with these surrogate models in detail; you can focus on the physics to produce training data, but then in order to speed it up or make it usable in a production run, you can use AI models,’ added Angerer. ‘In this case, it is not about getting it done at all, because there are existing algorithms that can do what you want the AI to do. It is more about the speed-up. Depending on the example you pick, there is anecdotal evidence of not only 10x but 10,000x speed-ups, an improvement of many orders of magnitude.’
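To make the surrogate idea concrete, the sketch below is a hypothetical illustration in PyTorch, not any production weather code: a small network is trained on input-output pairs generated by a stand-in parametrised routine, so that the network can later replace the routine as a fast, GPU-friendly approximation. The variables and the formula are placeholder assumptions.

```python
# Minimal sketch of the surrogate-model idea: train a small neural network to
# reproduce the output of an existing (hypothetical) parametrised physics routine,
# then use the network as a fast stand-in. Illustrative only.
import torch
import torch.nn as nn

def parametrised_model(x):
    # Stand-in for a hand-tuned physics approximation (hypothetical formula).
    return torch.sin(3.0 * x[:, :1]) + 0.5 * x[:, 1:2] ** 2

# Generate training data by running the existing model.
inputs = torch.rand(10_000, 2)            # e.g. temperature, moisture (normalised)
targets = parametrised_model(inputs)      # "ground truth" from the classic code

surrogate = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
optimiser = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimiser.zero_grad()
    loss = loss_fn(surrogate(inputs), targets)
    loss.backward()
    optimiser.step()

# At inference time the surrogate replaces the original routine,
# running as one batched, GPU-friendly forward pass.
with torch.no_grad():
    prediction = surrogate(torch.rand(5, 2))
```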
Making gains
While this may seem an outlandish figure, several factors combine to drive the speed-up by orders of magnitude. Some of these codes were originally developed many years ago and are sequential in nature, so simply porting them to GPUs and rewriting the algorithms to exploit the hardware's parallelism can provide a big boost in performance.
‘This comes down to two main reasons: one is that with AI you have an easy onboarding strategy onto the GPU, almost like a porting strategy to get these algorithms onto the GPU in the first place. The second reason is that AI frameworks, like modern GPUs, are finely tuned to the right kind of workloads, so you get the full advantage of the hardware and software stack, including perhaps mixed-precision calculations and so on, which could be more difficult to achieve otherwise in the scientific domain,’ stated Angerer.
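As a rough illustration of the mixed-precision support Angerer mentions, the following sketch uses PyTorch's automatic mixed precision; the model and data are placeholder assumptions, and the snippet simply shows where half-precision arithmetic and loss scaling enter an ordinary training loop.

```python
# Minimal sketch of mixed-precision training with PyTorch automatic mixed
# precision (AMP); falls back to full precision if no CUDA device is present.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 1)).to(device)
optimiser = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(256, 32, device=device)
y = torch.randn(256, 1, device=device)

for step in range(100):
    optimiser.zero_grad()
    # Matrix multiplications run in half precision where safe, using tensor cores.
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()   # scale the loss to avoid FP16 underflow
    scaler.step(optimiser)
    scaler.update()
```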
An example of this can be found in work by the European Centre for Medium-Range Weather Forecasts (ECMWF), which has been working to improve its forecasts by applying DL to the prediction of weather effects from radiation transport.
In this example, Angerer highlights how DL can parallelise a previously sequential computing problem, providing a huge increase in calculation speed. ‘You have this radiation transport, which has multiple layers, and each layer has some variables like moisture, aerosol concentrations, temperature and so on. They have an understanding of the underlying physics they want to model, and they come up with an algorithm which is then fairly sequential.
‘Originally you go from A to B, from layer to layer – you go down once and then you go up again; it is somewhat like a tridiagonal solver in some sense, but that is a very sequential way of solving the problem. Now if you rephrase it as an AI problem, one thing you have in an AI model is significantly more parameters than in a handcrafted parametrised model, which might have 10 to 15 parameters that you tune. In an AI model, you could have millions,’ stated Angerer.
‘Now this could have the effect that those models pick up on underlying processes that the original modeller may not have deemed important, or may not even be aware of. That is one case, but the second is that the AI model itself has a certain fixed structure which makes it very amenable to acceleration – convolutions and fully connected layers, for example. You could not really handcraft a neural network to solve this problem, but if you train a neural network to solve it, the structure of the network is far more parallelisable than it would be if you handcrafted a model to directly or indirectly describe the underlying physics,’ added Angerer.
‘From the way that you design an algorithm, a human could not design a neural network – the weights in a neural network – to achieve this performance, because that is not how our brain works. But if you teach the computer to come up with those weights, the algorithm that comes out of it is much more parallel, and that is where a big chunk of the speed-up comes from,’ Angerer explained.
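The toy sketch below contrasts the two approaches Angerer describes: a hypothetical layer-by-layer sweep in which each step depends on the previous one, and a neural emulator that maps a whole atmospheric column to per-layer outputs in a single, parallel forward pass. The variables, formulas and network shape are illustrative assumptions, not the ECMWF scheme.

```python
# Toy contrast between a sequential, layer-by-layer sweep (loosely in the spirit
# of a tridiagonal-style radiation solve) and a neural emulator that handles the
# whole column at once. All physics here is made up for illustration.
import torch
import torch.nn as nn

N_LAYERS, N_VARS = 60, 3   # e.g. moisture, aerosol, temperature per layer

def sequential_solver(column):
    """Hypothetical downward-then-upward sweep: each step depends on the last."""
    down = torch.zeros(N_LAYERS)
    for i in range(N_LAYERS):                      # downward pass
        prev = down[i - 1] if i > 0 else 1.0
        down[i] = prev * torch.exp(-column[i].sum() * 0.01)
    up = torch.zeros(N_LAYERS)
    for i in reversed(range(N_LAYERS)):            # upward pass
        nxt = up[i + 1] if i < N_LAYERS - 1 else 0.1
        up[i] = nxt + 0.05 * down[i]
    return up

# Emulator: the whole column goes in at once; no step-by-step dependency.
emulator = nn.Sequential(
    nn.Flatten(),
    nn.Linear(N_LAYERS * N_VARS, 256), nn.ReLU(),
    nn.Linear(256, N_LAYERS),
)

columns = torch.rand(256, N_LAYERS, N_VARS)        # a batch of synthetic columns
targets = torch.stack([sequential_solver(c) for c in columns])

optimiser = torch.optim.Adam(emulator.parameters(), lr=1e-3)
for epoch in range(50):
    optimiser.zero_grad()
    loss = nn.functional.mse_loss(emulator(columns), targets)
    loss.backward()
    optimiser.step()
```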
Revolution in a neural network
Speed is not the only benefit; there are even more revolutionary effects that come from the adoption of AI. A clear example comes from image recognition, an area of interest for scientists, enterprise and even defence long before AI and DL were used to tackle it. Whole careers were dedicated to designing individual filters for edge detection, which were then combined to detect faces or skin colour.
The introduction of AI solved these problems much faster than was previously possible. While significantly more data is needed to train the algorithms, the rapid increase in compute and available data means it is now much cheaper to use AI than to try to craft image-detection algorithms by hand.
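The contrast can be seen in a few lines of PyTorch: a fixed, hand-designed Sobel kernel of the kind researchers once crafted for edge detection, next to a convolution layer whose kernels are left as free parameters for training to determine. The image and layer sizes below are placeholder assumptions.

```python
# Sketch contrasting a hand-crafted edge-detection filter with a learned
# convolution layer. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

image = torch.rand(1, 1, 64, 64)            # a dummy greyscale image batch

# Hand-crafted approach: a fixed Sobel kernel for horizontal edges.
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)
edges = F.conv2d(image, sobel_x, padding=1)

# Learned approach: the same convolution operation, but the kernel weights are
# free parameters that training will set, rather than a human designer.
learned = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
features = learned(image)                   # 8 learned feature maps
```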
Other examples can also be found in science, such as the use of language models to predict protein folding. ‘A TU Munich paper essentially used a language model trained on protein sequences. You feed a protein sequence into the system and it outputs another sequence, which tells you about the secondary structures when this protein folds,’ said Angerer.
‘Talking with them, they said a similar thing: one of the researchers had dedicated their career to the assumption that human ingenuity was needed to work out how these proteins fold, but it now turns out that if you take an off-the-shelf language model and let it train for a couple of days, it can outperform 40 years of research,’ Angerer concluded.
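As a rough sketch of that ‘language model’ framing – not the TU Munich model itself – the snippet below treats amino acids as tokens and predicts a secondary-structure class per residue with a small transformer encoder. The alphabet, labels and architecture are illustrative assumptions.

```python
# Toy sequence-to-sequence tagger: amino-acid tokens in, one secondary-structure
# label (helix / sheet / coil) per position out. Illustrative only.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
aa_to_idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_CLASSES = 3                                   # helix, sheet, coil

class SecondaryStructureTagger(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(len(AMINO_ACIDS), d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(d_model, N_CLASSES)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

model = SecondaryStructureTagger()
sequence = "MKTAYIAKQR"                          # dummy protein fragment
tokens = torch.tensor([[aa_to_idx[aa] for aa in sequence]])
logits = model(tokens)                           # shape: (1, len(sequence), 3)
prediction = logits.argmax(dim=-1)               # one structure label per residue
```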