Introduction:
Hello world! After spending a long time with electronics & robotics, I am now working with neural networks & deep learning. We all know that, unlike electronics, the human brain doesn't work with binary logic. The brain's thinking is not like a pre-programmed electronic system; instead, it is dynamic and possesses fuzziness. In simple words, fuzziness can be defined as a state of "maybe", neither high nor low. Simulating the human brain on a computer therefore gives the computer the ability to learn on its own, and the well-known computing term for such a brain-inspired model is a "neural network".
[Image: Structure of neurons in your brain]
Similar to the human brain, an artificial neural network builds a mathematical model of thousands of neurons. These neurons are connected to each other and can communicate. I hope you already know the basic architecture of biological neurons, the architecture of the artificial neuron (perceptron), and terms like weights, activation functions, etc.
After that, you need to learn the fundamentals of deep learning and the details of convolutional neural networks. Long story short, if you don't know anything about deep learning & CNNs, then Google is your friend. Read about them and then come back so that you can understand what I am trying to explain further :)
- What is neural art?
I know you're wondering: what exactly is neural art? Let me explain with an example.
[Image: The Great Wave off Kanagawa by Katsushika Hokusai]
A neural network uses its weights to predict an output. So for any input there are two outputs to compare: the target (expected) output and the predicted output. The difference between the expected & predicted output is called the error, or loss. We therefore calculate the loss at every step and keep minimising it. Remember: the lower the loss, the better the painting effect.
So there are 3 main types of loss:
1. Content loss: the difference between the output image and the input base image. Content loss ensures that your output painting doesn't deviate too much from your input image.
2. Style loss: the difference between the styles of the style image and the output image. It is calculated by computing the Gram matrix of each image and then taking the difference between the two. The Gram matrix is formed by multiplying an image's flattened feature maps with their own transpose (a kind of covariance of the image's features with themselves). Style loss preserves the painting style in the output image.
3. Total variation loss: the difference between each pixel of the resulting image and its neighbouring pixels. Minimising it keeps the image visually coherent.
Optimization: the process of minimising the loss by changing the input parameters. The optimization function tells us how much each parameter must change to reach the minimum loss. Most of you will know about gradient descent or the Adam optimizer. Here we use the BFGS algorithm (in practice, SciPy's memory-efficient L-BFGS-B variant) because it is well suited to this application. Click here for the in-depth mathematics of BFGS.
Code it..!
First, you need to set up Python & Keras (with the Theano backend) on your system. You can easily install the keras module using pip once Python is installed.
Make sure your keras.json & .theanorc files look like this:
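For reference, the two files might look roughly like the snippets below. The exact keys depend on your Keras and Theano versions (newer Keras releases use "image_data_format": "channels_first" instead of "image_dim_ordering"):

```json
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}
```

```ini
[global]
device = gpu
floatX = float32
```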
(Change device to cpu if you don't have a GPU set up, but remember that it will take very long, even a few hours, to complete the process on a CPU instead of a GPU.)
(I am using the Theano backend for Keras. You can use TensorFlow if you want, but you may need to modify the source code accordingly.)
Then create a new file and save it with a .py extension. Let's import all the required libraries and declare the global variables.
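A minimal sketch of those imports and globals is shown below. The file paths, the OUTPUT_PREFIX name, the tv_wt weight, and the example weight values are my own placeholders; tune them for your images.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave                      # removed in newer SciPy; use imageio.imwrite instead
from keras import backend as K
from keras.applications import vgg16
from keras.preprocessing.image import load_img, img_to_array

CONTENT_IMG_PATH = 'puppy.jpg'    # base (content) image -- placeholder path
STYLE_IMG_PATH = 'wave.jpg'       # style image (The Great Wave) -- placeholder path
OUTPUT_PREFIX = 'painted'         # prefix for the saved results

EPOCHS = 5                        # number of optimisation iterations
img_nrows, img_ncols = 300, 300   # output resolution; smaller = faster

content_wt = 0.025                # weight of the content loss (example value)
style_wt = 1.0                    # weight of the style loss (example value)
tv_wt = 1.0                       # weight of the total variation loss (example value)
```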
EPOCHS defines the number of iterations. In each iteration we calculate the loss and try to optimize it, so the output painting gets better as we increase the number of iterations. (Warning: this program is processor intensive; it took around 70 seconds on my system to complete 1 iteration.)
img_nrows & img_ncols define the size of the output matrix, i.e. your painted image will have dimensions of 300x300 pixels. [#pro-tip: reduce the dimensions for faster processing]
Run the same code with different style_wt & content_wt values to understand their significance.
Now, let's write some helper functions to pre-process our images. Pre-processing is necessary because our neural network requires input in a particular format; it is similar to the basic visual pre-processing that happens in your brain. After converting your image into a painting, we need to reconstruct the neural network's output back into image format, and for that purpose we use a deprocessing function.
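A sketch of these two helpers, continuing from the imports above and assuming channels-first (Theano) image ordering, could look like this. The three constants in deprocess_image are the standard ImageNet BGR means that VGG pre-processing subtracts:

```python
def preprocess_image(image_path):
    # Load and resize the image, add a batch dimension, then apply the
    # VGG-16 pre-processing (subtract the ImageNet mean, convert RGB -> BGR).
    img = load_img(image_path, target_size=(img_nrows, img_ncols))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg16.preprocess_input(img)
    return img

def deprocess_image(x):
    # Undo the pre-processing so the network output can be saved as a normal image.
    x = x.reshape((3, img_nrows, img_ncols))   # channels-first (Theano ordering)
    x = x.transpose((1, 2, 0))
    x[:, :, 0] += 103.939                      # add the ImageNet means back (BGR)
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    x = x[:, :, ::-1]                          # BGR -> RGB
    return np.clip(x, 0, 255).astype('uint8')
```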
Now let's read both images, i.e. the style image (the wave art) and the content image (the puppy), and store them in Keras backend variables for further processing. Then create a placeholder to hold the generated image. A placeholder is simply a variable whose value can be assigned later; it is similar to declaring a variable or object as None initially and filling it with data later.
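A sketch of this step, continuing from the code above:

```python
# Load both images into Keras backend variables.
base_image = K.variable(preprocess_image(CONTENT_IMG_PATH))
style_image = K.variable(preprocess_image(STYLE_IMG_PATH))

# Placeholder for the generated (painted) image, channels-first for Theano.
combination_image = K.placeholder((1, 3, img_nrows, img_ncols))

# Stack the three images into one batch so a single forward pass through
# VGG-16 produces feature maps for all of them at once.
input_tensor = K.concatenate([base_image, style_image, combination_image], axis=0)
```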
Now we will create the neural network. For our purpose we use a special type of convolutional neural network (CNN) called the VGG network. There are several variants of VGG networks, out of which we will use the standard model called VGG-16. Keras provides a pre-implemented VGG-16 class along with pre-trained weights. We are going to use these pre-trained weights because training VGG-16 from scratch is very processor intensive and would take many hours. Thankfully, VGG-16 weights trained on the ImageNet dataset are available as open source. The weights (roughly 58 MB for the convolution-only model) will be downloaded automatically into your Keras directory the first time you run this program.
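Loading the pre-trained network on top of our 3-image batch might look like this (the outputs_dict helper name is my own):

```python
# Build VGG-16 on our input tensor, without the fully connected classifier
# layers, using the ImageNet weights (downloaded automatically by Keras).
model = vgg16.VGG16(input_tensor=input_tensor,
                    weights='imagenet',
                    include_top=False)

# Map layer names to their symbolic outputs so we can pick feature maps later.
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])
```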
Here are the functions to calculate losses.
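Below is a sketch of the three losses described earlier, written against the channels-first feature maps coming out of VGG-16; the normalisation constants are the commonly used ones:

```python
def gram_matrix(x):
    # Flatten each feature map and multiply with its own transpose; this
    # captures which filter responses tend to occur together (the "style").
    features = K.batch_flatten(x)              # shape: (channels, rows*cols)
    return K.dot(features, K.transpose(features))

def style_loss(style, combination):
    # Squared difference between the Gram matrices of the style and output images.
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_nrows * img_ncols
    return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))

def content_loss(base, combination):
    # Keeps the output close to the base (content) image's features.
    return K.sum(K.square(combination - base))

def total_variation_loss(x):
    # Squared difference between neighbouring pixels; keeps the result smooth.
    a = K.square(x[:, :, :img_nrows - 1, :img_ncols - 1] - x[:, :, 1:, :img_ncols - 1])
    b = K.square(x[:, :, :img_nrows - 1, :img_ncols - 1] - x[:, :, :img_nrows - 1, 1:])
    return K.sum(K.pow(a + b, 1.25))
```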
Now, we will set the style & content attributes. After that, we will calculate the gradients & set up the final outputs.
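The layer choices below ('block4_conv2' for content, the first conv layer of each block for style) are a common selection rather than the only valid one; a sketch of this step:

```python
# Combine the three weighted losses into one scalar.
loss = K.variable(0.)

# Content loss: compare base and generated features at one deep layer.
layer_features = outputs_dict['block4_conv2']
base_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]
loss = loss + content_wt * content_loss(base_features, combination_features)

# Style loss: compare Gram matrices at several layers.
feature_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                  'block4_conv1', 'block5_conv1']
for layer_name in feature_layers:
    layer_features = outputs_dict[layer_name]
    style_features = layer_features[1, :, :, :]
    combination_features = layer_features[2, :, :, :]
    loss = loss + (style_wt / len(feature_layers)) * style_loss(style_features,
                                                                combination_features)

# Total variation loss on the generated image itself.
loss = loss + tv_wt * total_variation_loss(combination_image)

# Gradient of the loss w.r.t. the generated image, and one backend function
# that returns the loss and the gradient together.
grads = K.gradients(loss, combination_image)
outputs = [loss]
if isinstance(grads, (list, tuple)):
    outputs += grads
else:
    outputs.append(grads)
f_outputs = K.function([combination_image], outputs)
```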
Now we will compute the loss & gradients and evaluate them through an Evaluator class. What happens here is that the BFGS routine calls the loss function from the Evaluator class, and this function calculates the loss (which depends on the style & base images) together with its gradient with respect to the generated image.
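A sketch of that class: the optimiser wants separate loss and gradient callbacks, but computing them together in one backend call is cheaper, so the gradients are cached during the loss call:

```python
def eval_loss_and_grads(x):
    # Reshape the optimiser's flat vector back into an image tensor, run the
    # backend function, and return the loss plus flattened gradients.
    x = x.reshape((1, 3, img_nrows, img_ncols))
    outs = f_outputs([x])
    loss_value = outs[0]
    grad_values = np.array(outs[1:]).flatten().astype('float64')
    return loss_value, grad_values

class Evaluator(object):
    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        # Called first by the optimiser: compute and cache loss + gradients.
        self.loss_value, self.grad_values = eval_loss_and_grads(x)
        return self.loss_value

    def grads(self, x):
        # Called second: return the cached gradients and clear the cache.
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()
```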
Finally, we will create a for loop that optimizes the calculated loss using the BFGS algorithm in each epoch.
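Using SciPy's L-BFGS-B routine, that loop could be sketched as follows (maxfun=20 simply caps the inner function evaluations per epoch):

```python
# Start from the pre-processed content image and let L-BFGS-B refine it.
x = preprocess_image(CONTENT_IMG_PATH)

for i in range(EPOCHS):
    print('Iteration', i)
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
                                     fprime=evaluator.grads, maxfun=20)
    print('Current loss value:', min_val)
    img = deprocess_image(x.copy())
    imsave('%s_at_iteration_%d.png' % (OUTPUT_PREFIX, i), img)  # save each epoch's result
```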
I have iterated the wave style on the dog image 5 times. I am attaching a GIF below for comparison of the results. You can observe that the loss reduces after every epoch and that the correlation between the input & output images increases.
So now you can try various art images as the styling image to apply different painting effects to your own pictures. Just like your very own Prisma!!
[Image: Base image & 5 iterations]
What's next?
So far, we have applied neural art to an image. Similarly, you can try to artify videos too. All you have to do is feed each and every frame to the CNN and then reconstruct the video from the processed frames. But it will take a huge amount of processing time, so if any of you are working with a powerful GPU like a GTX 1080 or a K-series product, try this video styling and let me know about it in the comments!
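If you want to try it, here is a rough sketch of the frame-by-frame plumbing using OpenCV. The stylize_frame function is hypothetical: it stands for running the optimisation loop above on a single frame and returning the painted result. Codecs, FPS handling, and colour ordering may need adjusting for your setup:

```python
import cv2  # OpenCV (pip install opencv-python)

def stylize_video(in_path, out_path, stylize_frame):
    # stylize_frame: assumed helper that takes one BGR frame (NumPy array)
    # and returns the painted version of that frame as a uint8 BGR array.
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    writer = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        painted = stylize_frame(frame)
        if writer is None:
            h, w = painted.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*'XVID')
            writer = cv2.VideoWriter(out_path, fourcc, fps, (w, h))
        writer.write(painted)
    cap.release()
    if writer is not None:
        writer.release()
```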