News + Stories: AI-generated texts and images are currently big topics of discussion. What are the technical differences between them?
Thomas Pock: Neural networks used for language and for images are now built very similarly. The buzzword here is “transformer”. The underlying idea is that if a word occurs in one place in a text, it is very likely that a related word occurs in another place. With the word “forest”, for example, it is very likely that “green” or “wood” will also be found. If such related words occur, the word “forest” also becomes more important in the sentence. So the AI tries to find words elsewhere in the text that reinforce the one word. This mechanism is called “attention”, and it is used when generating sentences. These probabilities can be learned with transformers. Similar architectures are also used in image processing. For example, when I see a street, there may be a car or a person or a house in another part of the picture. The image is broken down into so-called patches, i.e. small sections of the image, which are then treated like words in a text.
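As a rough illustration of the two ideas in this answer, the following sketch cuts a small image into patches and lets every patch "attend" to every other one. It is a minimal sketch under stated assumptions, not the architecture of any particular model: the function names `image_to_patches` and `self_attention` are hypothetical, the image is random data, and the learned query, key and value projections of a real transformer are left out, so the raw patch values play all three roles.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def image_to_patches(img, p):
    """Split a (H, W) image into non-overlapping p x p patches.

    Assumes H and W are divisible by p. Returns an array of shape
    (num_patches, p * p): each row is one flattened patch, i.e. one "word".
    """
    h, w = img.shape
    return (img.reshape(h // p, p, w // p, p)
               .transpose(0, 2, 1, 3)
               .reshape(-1, p * p))

def self_attention(x):
    """Scaled dot-product self-attention without learned projections.

    x has shape (n_tokens, d). Each output row is a weighted average of all
    rows of x; the weights say how strongly one token attends to the others.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)        # similarity of every token pair
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ x, weights

rng = np.random.default_rng(0)
img = rng.random((8, 8))                 # toy grayscale image (assumption)
patches = image_to_patches(img, 4)       # 4 patches with 16 values each
out, weights = self_attention(patches)
```

Each row of `weights` shows how much one patch draws on the others, which is the image counterpart of “forest” being reinforced by “green” or “wood”.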
How can we explain the functionality of an AI that works with images in simple terms?
Pock: It depends on what you want to do with the image processing. If you want to recognise objects, then the image is the input for the AI and then computational operations are carried out in which certain correlations with the image are calculated. In the next steps, these correlations are processed further and finally a yes-no decision is reached. Is that a car or not, is that a person or not?
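The path from "correlations with the image" to a yes-no decision can be sketched in a few lines. This is a toy example under stated assumptions: the filter is a hand-written vertical-edge detector rather than a learned one, the threshold is chosen by hand, and `correlate` and `contains_pattern` are hypothetical helper names. A real recognition network stacks many learned filters in several layers before it decides.

```python
import numpy as np

def correlate(image, kernel):
    """Slide the kernel over the image and store the sum of elementwise
    products at every position (valid region only, no padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def contains_pattern(image, kernel, threshold):
    """Yes-no decision: does the strongest filter response reach the threshold?"""
    return bool(correlate(image, kernel).max() >= threshold)

# Hand-written filter that responds to a dark-to-bright vertical edge.
edge_kernel = np.array([[-1.0, 1.0],
                        [-1.0, 1.0]])

flat = np.zeros((4, 4))      # image without any edge
step = np.zeros((4, 4))
step[:, 2:] = 1.0            # right half bright: contains a vertical edge

has_edge_flat = contains_pattern(flat, edge_kernel, threshold=1.5)
has_edge_step = contains_pattern(step, edge_kernel, threshold=1.5)
```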
When images are altered in photo filters, as we know from social media, what happens?
Pock: Mostly it’s an architecture that takes in an image, extracts information from it, and then converts that information back into an image. The buzzword here is the so-called U-Net, which comes from one of the most frequently cited papers in the field of neural networks and gets its name because the architecture looks like a U. An image comes in, is broken down into its features and reduced in size, then it is enlarged again by processing the previously extracted features, and in the end another image comes out.
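The reduce-then-enlarge data flow can be shown at the level of array shapes. The sketch below is not a trained U-Net. It assumes a single down and up step, replaces the learned convolutions with fixed average pooling and nearest-neighbour upsampling, and mixes the two feature maps with hypothetical fixed weights where a real network would use learned ones. The name `tiny_u` is made up for this illustration.

```python
import numpy as np

def downsample(x):
    """2x2 average pooling: halves height and width (both must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour upsampling: doubles height and width."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def tiny_u(image):
    """One step down the U and one step back up, with a skip connection."""
    skip = image                       # fine features kept at full resolution
    bottom = downsample(image)         # left side of the U: reduce
    up = upsample(bottom)              # right side of the U: enlarge
    features = np.stack([up, skip])    # skip connection: shape (2, H, W)
    mix = np.array([0.5, 0.5])         # stand-in for a learned 1x1 convolution
    return np.tensordot(mix, features, axes=1)   # back to shape (H, W)

image = np.arange(16, dtype=float).reshape(4, 4)
result = tiny_u(image)
```

The output has the same size as the input, which is what makes this kind of architecture suitable for image-to-image tasks such as filters.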
For the image output, however, does the AI then only use what it has learned beforehand or what it has been fed with?
Pock: For the AI, an image is just a field of numbers, like a big matrix. Each intensity value is a number and the artificial intelligence works with these fields of numbers. It performs arithmetic operations with it and in the end a number field comes out again, which results in a picture if you visualise it. Mathematically, then, it is relatively banal.
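A small example makes the "field of numbers" view concrete. The 3x3 intensity values below are made up for illustration, and the two operations (brightening and inverting) are simple stand-ins for the far larger chains of arithmetic a neural network performs.

```python
import numpy as np

# A tiny grayscale "image": each entry is an intensity from 0 (black) to 255 (white).
image = np.array([[  0,  50, 100],
                  [ 50, 100, 150],
                  [100, 150, 200]], dtype=np.uint8)

# Brighten by 60: compute in a wider integer type, then clip back to 0..255
# so that bright pixels saturate instead of wrapping around.
brighter = np.clip(image.astype(np.int16) + 60, 0, 255).astype(np.uint8)

# Invert: dark becomes bright and vice versa.
inverted = 255 - image
```

Both results are again number fields of the same shape, so they can be displayed as pictures just like the input.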
Yes, briefly summarised like this, it may sound banal. I assume, however, that the path leading up to this involves several years of previous research...
Pock: Yes, of course. If you know how to do it, it’s always easy. At the moment, however, it works so well because we have a large amount of data due to digitalisation and digital cameras. In the meantime, we have access to billions of images. There are data sets with five billion images and, additionally, we also have the computing power that is available through modern graphics cards.
The constantly growing computing power is clearly also important. What else can be expected as computing power continues to skyrocket?
Pock: You can then train larger and larger networks. The networks are already huge, with parameter sizes ranging from 150 million to a billion parameters, the latest ChatGPT networks even more. It might be a bit overstated, but you could say that these networks do nothing more than efficiently compress the learned data. More computing power here of course increases the speed enormously and thus also the possibilities of learning even larger networks with even more parameters.
What can AI already do in the field of images, and where is it still struggling? Hands and feet in particular are repeatedly cited here as examples of problem areas, but where do the major challenges lie?
Pock: What AI is very good at in image processing is generating new images from those it has been trained with. However, it cannot create fundamentally new images, but rather dissects the data and puts it back together again like a puzzle. There is, for example, the Stable Diffusion application, which was trained with billions of images. Here you can enter a prompt saying that you would like, for example, an alpine hut in the style of Van Gogh in the French Alps, and you get a picture. But the AI did not generate this image anew; it saw Van Gogh pictures, it saw alpine huts and simply combined the image information. This is similar to language processing with ChatGPT. But who knows, maybe this is already “real” artificial intelligence?
Where do you see the benefit of AI in image processing?
Pock: That depends on how you define benefit. When it comes to running a business that makes money from it, there are of course many possibilities. What benefit for humanity or for human beings per se will result from this is a difficult question. Personally, the medical field is closest to my heart. Artificial intelligence can support radiologists very well, for example, in order to spot pathologies in images even faster and better. Due to the improving imaging techniques with higher resolutions, there is a huge amount of data and therefore it is becoming increasingly difficult to find small pathologies. When it has been trained with a lot of data, the AI can find tumours or changes here very quickly and very reliably. This is a good support for radiologists, but by no means a substitute. Of course, in the end, doctors still have to make the diagnosis themselves.
So, it’s purely about increasing efficiency and assistance?
Pock: Yes, the necessary assistance to increase efficiency or accuracy goes hand in hand with having better and better imaging methods with higher and higher available resolutions. On the other hand, AI can also help to generate better images. For example, magnetic resonance imaging takes an incredibly long time due to certain physical limits, and with the help of AI you can improve image reconstruction so that you get better images from less data.
This interview with Thomas Pock is part of the TU Graz dossier "Artificial Intelligence".
On the other hand, we often hear that questionable or even dangerous things are being done with AI, especially in the field of images: social media filters that throw up strange or falsified results, for instance, or image and video fakes that keep doing the rounds. Where are the greatest dangers or challenges in this area?
Pock: The danger is that you polarise, change political decisions and steer the population in the wrong direction. This is already happening every day with trolls, chatbots and deepfakes. For example, the mayor of Kiev, Vitali Klitschko, was “emulated” using deepfake methods in order to make fake phone calls to politicians in Germany. These dangers exist and we have to make sure that the population is aware of them. Nowadays, it is relatively easy to slip into the guise of a famous person and call someone with their voice. There is a very high probability that it will not be recognised. What you saw on television decades ago in Mission Impossible has become reality, at least via video transmission.
Are there possibilities or are procedures being worked on to be able to recognise something like this? Or is it a cat-and-mouse game because fake technology is constantly improving, making it harder and harder to detect?
Pock: If there is fake technology, then you will probably be able to invent technologies that can detect it. It’s like game theory. There are two players and one always tries to be better than the other. In game theory, such optimisation problems are called min-max games. If the fake technology knows how it can be detected, then it can be improved again, and so on. This leads up to the point where the checking AI, which is called the “discriminator”, can no longer detect the fake. An example of this cat-and-mouse game is adversarial training. This has been used very successfully with so-called generative neural networks, among other things. There is one network that generates images and then there is a second network that has to recognise whether this is a generated image or a real image. You train the image generator and the controlling discriminator so that one always tries to be better than the other. This continues until the generated images are so good that the discriminator can no longer distinguish them from real images.
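The min-max game described here is commonly written as a single objective. The formula below is the standard textbook formulation for generative adversarial networks, not something quoted from the interview, and the symbols are notation introduced for this illustration: G is the image generator, D is the discriminator that outputs the probability that its input is real, p_data is the distribution of real images, and p_z is the distribution of the random input given to the generator.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator tries to make this value large by scoring real images near 1 and generated images near 0, while the generator tries to make it small by producing images that the discriminator scores as real. At the point the answer describes, where fakes can no longer be told apart, the discriminator can do no better than output 1/2 for every image.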
Let’s look at the positive side. What good applications are there from your point of view and what can we expect in the near future?
Pock: Especially in medicine, I have some really good insights. In the area of pattern recognition, for example birthmark recognition, there are already very good algorithms. They are already being used routinely. There are algorithms that can detect pathologies in the ECG better than cardiologists and, above all, completely fatigue-free. Some work on this has already been done. I myself work with cardiologists and researchers who do heart modelling, for example, to detect cardiac arrhythmias. And this is not done by means of an ECG, but with normal smart watches that record pulse curves, so-called PPG signals. In addition, images can now be reconstructed with much better accuracy, so less and less computed tomography and magnetic resonance imaging data is needed. You can automatically model organs from MR and CT images. Personalised medicine is the big buzzword here.
I make an MRI of a person, can segment the heart, make a heart model for it and can then check, for example, how well certain devices such as pacemakers will work. Such solutions already exist. What can we expect in the future? In the field of generative models, one can probably expect a lot. It will probably go in the direction of videos, so that you can create short videos with just a few keywords. The first attempts have already been made. This will certainly be very interesting for the film industry and the games industry. You can create music, you can create speech and at some point you can create music and speech and films together. That’s where I think it’s going. True artificial intelligence, as defined in textbooks, I don’t quite see that yet. Although I know it’s fashionable these days to refer to all sorts of learned algorithms as AI. But then I always jokingly say, AI means “Absent Intelligence”. What does artificial intelligence actually mean? Everyone has their own favourite definition. We are still a long way from imitating the intelligent behaviour of living beings, of humans, of animals, or from moving in that direction.
Isn’t that a general problem with AI – that there are always people behind it, that design, develop, model it?
Pock: On that point, you have to ask yourself: can a human being ever create a system that surpasses them? Is that even possible? Or can you only try to get closer and closer to it? There are two tracks. There is narrow AI, which is what we are seeing at the moment. This means that there are AI modules that can solve specific tasks very well. These are mostly in areas where a lot of data is available. These modules are very good at recognising cars in images, movements in images or tumours in CT images. But there really is no such thing as broad AI yet. So you can’t say there are fundamental algorithms that provide a suitable answer no matter what question you ask. It has to be said, though, that the latest developments around ChatGPT are already moving very strongly in the direction of very broad artificial intelligence, but for the time being only in the area of language.
Is it always a factor as to which particular human develops an AI since a certain bias may then be included in it?
Pock: Yes, bias is an extremely important aspect. When you look at who the typical AI developer is, they’re probably in their mid-30s, live somewhere in America and are mostly white. Of course, this bias is then also reproduced. And it is very important to make sure that in the future all strata of the population, all age groups, male, female, diverse, are represented. This is certainly not the case at the moment. In the data that you train with – we mentioned social media filters before – there are certain stereotypes and the trained models then reflect exactly these stereotypes (or bias).
So, is it always a question of training, of what I put in it? The saying is: bias in, bias out.
Pock: Bias is not bad in principle. Every person has a bias, because bias means an a priori knowledge. For example, when I’m driving and I see a child walking on the side of the road, I have a bias towards being careful because it could be dangerous. So bias is basically not a bad thing. You just have to be careful of biases that lead to unequal treatment or disadvantages because they are prejudiced. How can you recognise them? Unfortunately, it’s not that simple.