How machine learning and artificially generated images might replace photography as we know it.
When hearing the words ‘AI’, ‘Machine Learning’ or ‘bot’, most people tend to picture a walking, talking android straight out of a sci-fi movie and immediately think of a time far off in the future.
Sorry folks! AI has been around for years now. It currently resides in your smartphone (we love you, Siri/Google Assistant!) and your car’s GPS system, and is even working out which article to recommend to you once you’re done reading this. However, no domain has been more affected by it in the past few years than computer vision.
As imaging technology advances, it is becoming increasingly common to see visually appealing images at ultra-high resolution. People no longer need to learn to use tools like Photoshop and CorelDRAW to enhance and alter their images. AI is already being used in nearly every aspect of image augmentation and manipulation in order to produce the best possible pictures. However, the latest idea to emerge is using AI to generate images synthetically.
Nearly every image you have seen was either captured as a photograph or created manually by a living, breathing person. There are possibly hundreds of tools for producing images manually, but they all require a human to preside over the process. Now imagine a computer program that draws from scratch whatever you tell it to. Microsoft’s Drawing Bot might be one of the first technologies to make this possible. Envision a time in the near future when you can just download an app on your smartphone and give it a few instructions such as “I want an image of me standing next to the Eiffel Tower.” (Make sure you word it correctly, though.)
Generative Adversarial Networks (GANs)
“GANs are the most interesting idea in the last 10 years in ML”
— Yann LeCun
The foundation that makes such synthetic image generation possible lies in Generative Adversarial Networks. Ever since their introduction in 2014 by Ian Goodfellow and his colleagues in their research paper, GANs have remained one of the most fascinating and widely used ideas in Deep Learning. The applications of this technology, which lies at the heart of what is called adversarial training, span not only computer vision but also data analytics, robotics and predictive modelling.
So what is the big deal about GANs?
Generative Adversarial Networks belong to the set of generative models. This means that their job is to create or “generate” new data in a completely automated procedure.
As the name suggests, a GAN is composed of two individual neural networks which compete against each other (in an adversarial manner). One neural network, called the generator, creates new data instances from random noise, while the other, the discriminator, evaluates them for authenticity. In other words, the discriminator decides whether each instance of data it reviews belongs to the actual training dataset or not.
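For readers who want the formal version, the contest between the two networks is usually written as a minimax game, as in the original 2014 paper. The symbols here are taken from that paper rather than from this article: D(x) is the discriminator's estimate of the probability that x is real, G(z) is the generator's output for a noise sample z, and p_data and p_z are the data and noise distributions.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator tries to make V large by scoring real samples near 1 and generated samples near 0, while the generator tries to make V small by producing samples the discriminator scores as real.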
A simple example
Let’s say you’ve been tasked with making a painting identical to one made by a very famous artist. Unfortunately, you don’t know who this artist is and have never seen one of his paintings. Your task is to forge a painting and present it at an auction as one of the originals. So, you decide to give it a try. All you’re going to need is some paints and a canvas, right? However, the auctioneers only want genuine articles, not random stuff, so they’ve hired a detective who will first verify every item presented at the auction. As luck would have it, the detective has his own samples of the famous artist’s original paintings, and when you present your random painting, he knows at once that it is nothing like the originals.
He rejects it and you decide to give it another try. But this time, you have a few useful tips that the detective let slip, while evaluating your canvas, about what the painting should actually look like.
Now when you try your luck again, the painting should be a little better. But the detective still isn’t convinced and rejects you again. So you keep trying again and again, each time using some form of feedback to alter the painting, and it gets better and better. (We’re going to assume the detective is OK with you returning endless times.) In the end, after a thousand or so tries, you’re finally able to come up with something that’s close to a perfect replica. As the detective looks at his sample paintings, he is unsure whether what you handed him is one of them, or perhaps something else with the same style and strokes as the famous artist’s work.
How does a GAN work, step by step?
Applying the same thought process to a combination of neural networks, the training of GANs consists of the following steps:
The generator initially takes in some random noise, turns it into an image and passes that image to the discriminator.
As the discriminator already has access to a dataset of real images, it compares them to the image it received from the generator and evaluates its authenticity.
Since the initial image is little more than random noise, it is evaluated as fake.
The generator keeps trying its luck, adjusting its parameters based on the discriminator’s feedback so that its images start getting a bit better.
Both networks keep getting smarter as the training progresses: the generator at generating fake images and the discriminator at detecting them.
Eventually the generator manages to create an image indistinguishable from one in the dataset of real images. The discriminator can no longer tell whether the given image is real or fake.
At this point, the training ends and the generated image is our final result.
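The steps above can be sketched as a toy program. This is a minimal sketch under loud assumptions, not how real image GANs are built: the "images" are single numbers, the generator is the line a*z + b, the discriminator is a single logistic unit, and the gradients are derived by hand. The class name TinyGAN and every parameter name are hypothetical, and the generator uses the common "non-saturating" loss -log D(G(z)) rather than anything this article specifies.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


class TinyGAN:
    """Toy 1-D GAN (hypothetical): G(z) = a*z + b, D(x) = sigmoid(w*x + c)."""

    def __init__(self, lr=0.01, seed=0):
        self.rng = random.Random(seed)
        self.a, self.b = 1.0, 0.0  # generator parameters
        self.w, self.c = 0.1, 0.0  # discriminator parameters
        self.lr = lr

    def generate(self, z):
        # Step 1: turn random noise into a (fake) sample.
        return self.a * z + self.b

    def discriminate(self, x):
        # Step 2: estimated probability that x is a real sample.
        return sigmoid(self.w * x + self.c)

    def discriminator_step(self, x_real, z):
        # Gradient descent on -[log D(x_real) + log(1 - D(G(z)))].
        x_fake = self.generate(z)
        d_real = self.discriminate(x_real)
        d_fake = self.discriminate(x_fake)
        grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
        grad_c = (d_real - 1.0) + d_fake
        self.w -= self.lr * grad_w
        self.c -= self.lr * grad_c

    def generator_step(self, z):
        # Gradient descent on -log D(G(z)): move fakes toward "looks real".
        d_fake = self.discriminate(self.generate(z))
        common = (d_fake - 1.0) * self.w  # dLoss/dx_fake
        self.a -= self.lr * common * z
        self.b -= self.lr * common

    def train(self, steps, real_mean=4.0, real_std=0.5):
        # Alternate one discriminator update and one generator update.
        for _ in range(steps):
            x_real = self.rng.gauss(real_mean, real_std)
            self.discriminator_step(x_real, self.rng.gauss(0.0, 1.0))
            self.generator_step(self.rng.gauss(0.0, 1.0))
```

Calling `TinyGAN().train(2000)` and then inspecting `generate(0.0)` gives a rough sense of where the generator has moved. Toy setups like this one can oscillate instead of settling, so the sketch illustrates the alternating updates rather than promising a clean result.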
Pros and Cons
Like all technologies, GANs also have their own unique set of pros and cons. Let’s summarize a few of these without going too deep into details.
Here are some potential advantages of using GANs:
· GANs don’t always need labelled examples to train.
· They are faster at generating samples than other generative models such as belief nets because they have no need to generate the different entries in the sample sequentially.
· They are much easier to train than generative models which rely on Monte Carlo approximations to the gradient of the log partition function. Because Monte Carlo methods don’t work very well in high-dimensional spaces, such generative models cannot perform well on realistic tasks like training with ImageNet.
· They don’t introduce any deterministic bias. Certain generative methods, like variational autoencoders (VAEs), introduce deterministic bias because they optimize a lower bound on the log-likelihood rather than the likelihood itself. This seems to result in VAEs learning to generate blurry samples compared to GANs.
In the same way, there are also the following downsides:
· GANs are particularly hard to train. Instead of minimizing a single fixed loss (like log-loss or squared error), the two networks are locked in a game in which each one’s objective shifts as the other improves. Finding a stable balance between them is very hard and requires a lot of trial and error regarding the network structure and training protocol.
· Specifically for image generation, there is no single agreed-upon measure of quality. Since a synthetic image may appear passable to the computer itself, judging the actual result is very subjective and depends on a human observer. In practice, proxy metrics like the Inception Score and the Fréchet Inception Distance are used to measure performance.
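As a pointer for the curious, the Fréchet Inception Distance is commonly defined as follows. The notation is an assumption brought in from the wider literature, not from this article: μ_r and Σ_r are the mean and covariance of features extracted from real images by a pretrained Inception network, and μ_g and Σ_g are the same statistics for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

A lower value means the generated images are statistically closer to the real ones, with 0 reached only when the two sets of feature statistics match exactly.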
Applications of GANs
Here comes the fun part: a list of the amazing stuff we can do using GANs. Among all their potential uses, GANs have found a tremendous number of applications in the field of computer vision.
Text to Image Conversion
There are several implementations of this concept, such as TAC-GAN (Text Conditioned Auxiliary Classifier Generative Adversarial Network). They are used for synthesizing images from their text descriptions.
A related application is image-to-image translation using a special type of GAN called a CGAN (Conditional Generative Adversarial Network). Painting and concept design have never been as easy as this. However, although GANs can complete simple drawings, like a purse from its sketch, drawing more complex things like perfect human faces is currently not a GAN’s strong point. In fact, the results are quite nightmarish for certain objects.
Results from the CGAN pix2pix (Source: GitHub)
Two exciting applications of generative networks can be seen in inpainting and outpainting. The first involves filling in gaps or removing noise within an image, which can be thought of as image repair. For example, given an image with holes or gaps, a GAN should be able to correct it in a “passable” fashion. Outpainting, on the other hand, involves using the network’s own learning to imagine what an image might look like outside its current boundaries.
Thanks to generative networks, face synthesis is possible, which involves generating a single face image from different angles. This is one reason facial recognition may not need hundreds of samples of your face and can sometimes work from just one. Not only that, generating artificial faces has also become possible. NVIDIA recently used their GAN 2.0 to generate artificial human faces in HD resolution using the CelebA-HQ dataset, one of the first instances of synthetic image generation at high resolution.
Finer manipulations, such as changing facial expressions, are also becoming possible. GANimation is a research effort using PyTorch that describes itself as “Anatomically-aware Facial Animation from a Single Image”.
Official implementation of GANimation. (Source: GitHub)
Painting to Photograph Translation
Another example of using GANs to make images more realistic is to simply turn a (pretty good) painting into a photograph. This is done using a special type of GAN called CycleGAN, which uses two generators and two discriminators. We call one generator G, and have it convert images from the X domain to the Y domain. The other generator is called F, and converts images from Y to X. Each generator has a corresponding discriminator, which attempts to tell its synthesized images apart from real ones.
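What ties G and F together in CycleGAN is a cycle-consistency loss, which this article does not spell out: translating an image to the other domain and back again should return something close to the original. The sketch below illustrates that idea on plain numbers rather than images. The toy functions G, F and F_bad and the helper name are hypothetical stand-ins, not the real networks.

```python
def cycle_consistency_loss(G, F, xs, ys):
    """Mean absolute round-trip error for x -> G -> F and y -> F -> G."""
    forward = sum(abs(F(G(x)) - x) for x in xs) / len(xs)
    backward = sum(abs(G(F(y)) - y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical toy "generators" acting on numbers instead of images.
def G(x):
    # Maps domain X to domain Y.
    return 2.0 * x + 1.0


def F(y):
    # Maps domain Y back to X; the exact inverse of G.
    return (y - 1.0) / 2.0


def F_bad(y):
    # Maps Y to X but does NOT undo G, so round trips drift.
    return y - 1.0


xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
perfect = cycle_consistency_loss(G, F, xs, ys)        # round trips are exact
imperfect = cycle_consistency_loss(G, F_bad, xs, ys)  # round trips drift
```

In a real CycleGAN this term is added to the two adversarial losses, so the generators are rewarded both for fooling their discriminators and for keeping translations reversible.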
Where do we go from here?
Machine Learning and GANs are sure to have a tremendous impact on imaging and photography in the near future. Currently, this technology is capable of generating simple images from text inputs. However, in the foreseeable future, it may be able to create not only precise images in high resolution but also entire videos. Imagine an entire movie generated by simply feeding the script into a GAN. Not only that, every person could use simple interactive apps to create their own movies (possibly even starring themselves!). Would this technology be the end of real photography, direction and acting?
Impressive technology also means potential for misuse. Perfectly fake images will require a means to identify and detect them, and regulation of such image generation would be needed. GANs are already being used to create fake videos, or “Deepfakes”, which are put to harmful uses such as generating fake pornographic videos of celebrities or showing people saying things they never said. The consequences of making technology that synthesizes audio and video available to the general population are scary.
Artificial image generation is a double-edged technology, especially at a time when little about it is widely known. Generative Adversarial Networks are a tremendously useful, as well as dangerous, tool. That they will reshape the world of technology seems certain; how they will do so, we can only ponder.