
July 14, 2019

Alla Prima

Result: Starry Night Over Rhone, Neural Style Transfer

Last month I watched Loving Vincent. The movie tells the story of Van Gogh after Saint-Rémy, in his final year in Auvers-sur-Oise. It reconstructs the lonely artist’s mental breakdown from the perspectives of the people close to him, or at least the ones who appear in his paintings. Beyond the narrative, the movie is also a reconstruction in material: the entire film consists of animations of reimagined Van Gogh paintings. It recreated 94 of Van Gogh’s originals while creating about 67K oil paintings around them as animation keyframes. Needless to say, it was a huge collaboration of thousands of artists and dedicated hours. You can find behind-the-scenes details here.

This blog post is about my experiments with Van Gogh’s paintings and the Neural Style Transfer (NST) algorithm. Most of my previous attempts with NST had produced mediocre results. The problem is that without a proper metric for judging styles that are intrinsically aesthetic, or without benchmark images to compare against, I didn’t know how or what to improve in the algorithm. Loving Vincent did exactly that: it provided me a test bench. In this post we will use NST to recreate some keyframes of the film and compare the results with the movie’s recreations of Van Gogh’s paintings.. maybe ponder whether the NST technique could have aided the production of the movie and saved countless hours.. Well, it is said that the pen is mightier than the sword. At times, a brush is mightier than a pen. This post explores what happens if we bring a machine gun to a brush fight.

Theory

This section is my attempt to simplify some concepts that I found confusing about NST. I will assume you know what a neural network is and the lingo around it. NST utilizes Convolutional Neural Networks and combines the style of one image with the content of another. Most style transfer mechanisms before neural networks sucked, since the problem is highly non-linear in nature (I plan to dedicate another post to the non-linearity and the mathematics of art). For NST, the original paper [1] does a great job of explaining the architecture, and there are a number of tutorials available to help one build it. I played with a couple of them but found the Keras implementation [2] to be simple and effective to begin with. God is always in the details, but at its core NST is a simple a + b = c algorithm. We use a pre-trained CNN architecture, VGG-19, and combine losses from a Content Image (a) and a Style Image (b) that add up to the final result (c).
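To make the code chunks below self-contained, here is a minimal sketch of the setup, closely following the Keras example [2]: the content, style, and generated ("combination") images are stacked into a single tensor so one forward pass through VGG-19 produces features for all three. The file paths, image size, and the channels-last assumption are placeholders of mine, not values from the original run.

from keras import backend as K
from keras.applications import vgg19
from keras.preprocessing.image import load_img, img_to_array
import numpy as np

img_nrows, img_ncols = 400, 533   # placeholder output size

def preprocess_image(image_path):
    # load, resize and apply VGG-19 preprocessing (BGR + mean subtraction)
    img = load_img(image_path, target_size=(img_nrows, img_ncols))
    img = np.expand_dims(img_to_array(img), axis=0)
    return vgg19.preprocess_input(img)

# index 0: content image, index 1: style image, index 2: generated image
base_image = K.variable(preprocess_image('content.jpg'))
style_reference_image = K.variable(preprocess_image('style.jpg'))
combination_image = K.placeholder((1, img_nrows, img_ncols, 3))  # channels_last

# one batch of three images: a single VGG-19 forward pass gives the
# feature maps for content, style and generated image at every layer
input_tensor = K.concatenate([base_image,
                              style_reference_image,
                              combination_image], axis=0)

# loss weights (the fine-tuned values discussed in the Experiments section)
content_weight = 1.0
style_weight = 10.0
total_variation_weight = 0.1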

a) Content Loss

Let the content image be denoted by the vector p and the generated output image by x. After a feed-forward step, the pre-trained VGG-19 encodes each image at every convolution layer l into feature representations: F^l for the generated image x and P^l for the content image p. The content loss is the measure that minimizes the squared distance between P^l and F^l. Or,

\[ \mathcal{L}_{content}(p, x, l) = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2} \]

where F^l_{ij} is the activation of the i-th filter at position j in layer l. Below is the chunk of the Keras code specific to the content loss. However, instead of the sum I used the mean, which gave the best results. The code also uses 'block5_conv2', or layer 15 in VGG-19, instead of the 'block4_conv2' that the paper recommends.

model = vgg19.VGG19(input_tensor=input_tensor,
                    weights='imagenet', include_top=False)
                    
# get the symbolic outputs of each "key" layer (we gave them unique names).
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

def content_loss(base, combination):
    #return K.sum(K.square(combination - base))
    return 0.5*K.mean(K.square(combination - base))

# combine these loss functions into a single scalar
loss = K.variable(0.0)
layer_features = outputs_dict['block5_conv2']
base_image_features = layer_features[0, :, :, :]      # content image features
combination_features = layer_features[2, :, :, :]     # generated image features
loss += content_weight * content_loss(base_image_features, combination_features)

Below are some content-only loss outputs. The first 10 are the outputs for 10 iterations using block5_conv2; the next three are the 10th iterations for block4_conv2, block3_conv3, and block2_conv2; the final image is the content image used. Note that early convolutions are easily reconstructed by the network.

b) Style Loss

The style loss is built from the Gram matrix, the inner product between the feature representations of filters i and j in layer l. Or,

\[ G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk} \]

Let w_l be the weight of each style layer. The contribution of layer l to the style loss is E_l, the squared distance between the Gram matrix A^l of the style image and the Gram matrix G^l of the generated image, normalized by N_l, the number of feature maps in layer l, and M_l, the number of pixels in each feature map. The total style loss is the weighted sum of all E_l:

\[ E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}, \qquad \mathcal{L}_{style} = \sum_{l} w_{l} E_{l} \]

Below is the code pertaining to the style loss. The minor modification is to use user-defined weights for each layer instead of equal, normalized weights for all layers.

# compute the neural style loss

# first we need to define 4 util functions
# the gram matrix of an image tensor (feature-wise outer product)
def gram_matrix(x):
    assert K.ndim(x) == 3
    if K.image_data_format() == 'channels_first':
        features = K.batch_flatten(x)
    else:
        features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    gram = K.dot(features, K.transpose(features))
    return gram

# the "style loss" is designed to maintain
# the style of the reference image in the generated image.
# It is based on the gram matrices (which capture style) of
# feature maps from the style reference image
# and from the generated image

def style_loss(style, combination):
    assert K.ndim(style) == 3
    assert K.ndim(combination) == 3
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_nrows * img_ncols
    #return K.sum(K.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))
    return 0.25*K.mean(K.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))
    
feature_layers = ['block1_conv1', 'block2_conv1',
                  'block3_conv1', 'block4_conv1',
                  'block5_conv1']
style_layer_weights = [0.2, 0.4, 0.5, 0.1, 0.6]
#style_layer_weights = [0.2, 0.2, 0.3, 0.5, 0.6]  # playing with different params
#style_layer_weights = [0.2, 0.2, 0.2, 0.2, 0.2]

for i_, layer_name in enumerate(feature_layers):
    layer_features = outputs_dict[layer_name]
    style_reference_features = layer_features[1, :, :, :]   # style image features
    combination_features = layer_features[2, :, :, :]       # generated image features
    #sl = style_loss(style_reference_features, combination_features)
    #loss += (style_weight / len(feature_layers)) * sl
    sl = style_layer_weights[i_] * style_loss(style_reference_features, combination_features)
    loss += style_weight * sl

Below are some style-only loss outputs. The first 10 are the outputs for 10 iterations of block2_conv1; the next three are the 10th iterations for block3_conv1, block4_conv1, and block5_conv1; the final image is the style image used. Note that early convolutions show more localized feature representations.

  • Style Image. Portrait of Armand Roulin, Van Gogh

c) Total Loss

In addition to the content and style losses, a total variation loss is added that works as a regularizer, encouraging local smoothness in the generated image. Hence the total loss:

\[ \mathcal{L}_{total} = \alpha \, \mathcal{L}_{content} + \beta \, \mathcal{L}_{style} + \gamma \, \mathcal{L}_{TV} \]

where α, β, and γ are the content, style, and total variation weights.

which is a + b = c as promised. Below is the Keras code for the total variation loss and the final combined loss:

def total_variation_loss(x):
    assert K.ndim(x) == 4
    if K.image_data_format() == 'channels_first':
        a = K.square(
            x[:, :, :img_nrows - 1, :img_ncols - 1] - x[:, :, 1:, :img_ncols - 1])
        b = K.square(
            x[:, :, :img_nrows - 1, :img_ncols - 1] - x[:, :, :img_nrows - 1, 1:])
    else:
        a = K.square(
            x[:, :img_nrows - 1, :img_ncols - 1, :] - x[:, 1:, :img_ncols - 1, :])
        b = K.square(
            x[:, :img_nrows - 1, :img_ncols - 1, :] - x[:, :img_nrows - 1, 1:, :])
    #return K.sum(K.pow(a + b, 1.25))
    return K.mean(K.pow(a + b, 1.25))

loss += total_variation_weight * total_variation_loss(combination_image)
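Though not shown in the loss-focused chunks above, the generated image is obtained by minimizing this total loss directly over its pixels. Below is a condensed sketch of the optimization loop, following the Keras example [2] but omitting its Evaluator class that caches loss and gradients between calls (so this version is slower but shorter); the content path is again a placeholder.

from scipy.optimize import fmin_l_bfgs_b
import numpy as np

# gradients of the total loss with respect to the generated image's pixels
grads = K.gradients(loss, combination_image)
f_outputs = K.function([combination_image], [loss] + grads)

def eval_loss_and_grads(x_flat):
    x = x_flat.reshape((1, img_nrows, img_ncols, 3))
    outs = f_outputs([x])
    return outs[0], np.array(outs[1]).flatten().astype('float64')

# start from the preprocessed content image and refine it iteratively
x = preprocess_image('content.jpg').flatten()
for i in range(10):  # 10 iterations of L-BFGS-B, as used for the results below
    x, min_val, _ = fmin_l_bfgs_b(lambda v: eval_loss_and_grads(v)[0], x,
                                  fprime=lambda v: eval_loss_and_grads(v)[1],
                                  maxfun=20)
    print('Iteration %d, loss: %.4g' % (i, min_val))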

Experiments

Fine Tuning:

The first four images resulted from some non-Keras implementations of NST and different optimization attempts. The fourth used the original Keras version with sums instead of means for all the loss functions, and style_weight/content_weight ~ 10000 as recommended in the paper. The fifth and sixth images resulted from using means instead of sums in the style and content loss functions respectively. The parameters for the eighth image were fine-tuned to --content_weight 1.0 --style_weight 10 --tv_weight 0.1, with style_layer_weights = [0.2, 0.4, 0.5, 0.1, 0.6] and with means substituted for sums in all the loss functions. Almost all other images worked well in this range. All results are from the 10th iteration of the L-BFGS-B function minimizer.
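For reference, assuming the stock Keras example script [2] (with the per-layer style weights patched in as shown in the Style Loss section, since the script itself only exposes a single --style_weight flag), the eighth image's settings correspond to an invocation roughly like:

python neural_style_transfer.py content.jpg style.jpg result_prefix --iter 10 --content_weight 1.0 --style_weight 10 --tv_weight 0.1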

Color Consistency:

The strength of the NST algorithm, at least for the best fine-tuned implementation I found, decreased significantly as the style image departed from the content image, i.e. when the pixel histogram distances were high. As a quick fix, I added some similar imagery to the background and maintained ambient color consistency. Observe the results in each pair below (left being the content image and right the NST result); a rough histogram-distance check is sketched after the pairs.

  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use, Modified
  • Result: Girl in White, Neural Style Transfer
  • Content Image: Current view of the Church in Auvers-sur-Oise, Photo by Alex Roediger. Fair Use, modified
  • Result: The Church in Auvers-sur-Oise, Neural Style Transfer
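As a rough, illustrative check of that histogram-distance intuition (my own addition, not something used to produce the results), one can compare normalized per-channel color histograms of the content and style images; larger distances tended to coincide with weaker transfers:

import numpy as np
from PIL import Image

def channel_histograms(path, bins=32):
    # normalized per-channel color histograms of an RGB image
    img = np.asarray(Image.open(path).convert('RGB'))
    return [np.histogram(img[..., c], bins=bins, range=(0, 255), density=True)[0]
            for c in range(3)]

def histogram_distance(content_path, style_path):
    # mean L1 distance between the R, G and B histograms of the two images
    hc = channel_histograms(content_path)
    hs = channel_histograms(style_path)
    return np.mean([np.abs(a - b).sum() for a, b in zip(hc, hs)])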

Size variations:

The Keras version of the algorithm is written such that the style image is resized to the content image. So, to get more granular brush strokes, I used the quick hack of making the content image smaller (there must be a better way). For all the triads of images below, the second is the result of the reduced content image size, while the third is the result of custom overlays between the first and the second, where the second is resized back to the first; a small code sketch of this hack follows the result image below.

  • Result: Portrait of Armand Roulin, Neural Style Transfer
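A minimal sketch of the resize-then-overlay hack described above, using Pillow. run_nst is a stand-in name for whatever NST pipeline was set up earlier, and the final blend in my own runs was a custom manual overlay rather than a plain resize:

from PIL import Image

def stylize_with_finer_strokes(content_path, scale=0.5):
    # 1) shrink the content image: VGG-19 features then cover relatively
    #    larger regions of the scene, giving more granular brush strokes
    content = Image.open(content_path)
    small = content.resize((int(content.width * scale),
                            int(content.height * scale)), Image.LANCZOS)
    small.save('content_small.jpg')

    # 2) run NST on the reduced content image
    #    (run_nst is a placeholder for the optimization loop shown earlier)
    stylized_small = run_nst('content_small.jpg')

    # 3) scale the stylized result back up so it can be overlaid on the
    #    full-resolution result for the final composite
    return stylized_small.resize(content.size, Image.LANCZOS)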

Results

Portraits
  • Style Image. Portrait of Armand Roulin, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Result: Portrait of Armand Roulin, Neural Style Transfer
  • Artist's reinterpretation of Armand Roulin. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Style Image. Portrait of Postman Joseph Roulin, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Result: Portrait of Postman Joseph Roulin, Neural Style Transfer.
  • Artist's reinterpretation of Postman Joseph Roulin. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Style Image. Marguerite Gachet at the Piano, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Result: Marguerite Gachet at the Piano, Neural Style Transfer.
  • Artist's reinterpretation of Marguerite Gachet at the Piano. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Style Image. Girl in White, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use, Modified
  • Result: Girl in White, Neural Style Transfer
  • Artist's reinterpretation of Girl in White. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Style Image. Dr Gachet, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Result: Dr. Gachet, Neural Style Transfer.
  • Artist's reinterpretation of Dr. Gachet. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Style Image. Portrait of Adeline Ravoux, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Result: Portrait of Adeline Ravoux, Neural Style Transfer.
  • Artist's reinterpretation of Adeline Ravoux. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Style Image. Portrait of a Young Man with Cornflower, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Result: Portrait of a Young Man with Cornflower, Neural Style Transfer.
  • Artist's reinterpretation of a Young Man with Cornflower. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Style Image. Portrait of Paul-Eugène Milliet, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Result: Portrait of Paul-Eugène Milliet, Neural Style Transfer.
  • Artist's reinterpretation of Paul-Eugène Milliet. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Style Image. The Boatman, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Result: The Boatman, Neural Style Transfer
  • Artist's reinterpretation of The Boatman. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Style Image. Self Portrait, Van Gogh
  • Content Image. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
  • Result: Self Portrait, Neural Style Transfer.
  • Artist's reinterpretation of Self Portrait. Copyright © 2013-2019 Loving Vincent (http://lovingvincent.com). Fair Use
Scenes/Landscapes
  • Style Image: "Cafe Terrace at Night", Van Gogh
  • Content Image. Café Van Gogh, Arles, France. Fair Use
  • Result: Cafe Terrace at Night, Neural Style Transfer
  • Style Image: Wheatfield with Crows, Van Gogh
  • Content Image: The field where Van Gogh painted Wheatfield with Crows. Photo by Alex Roediger. Fair Use
  • Result: Wheatfield with Crows, Neural Style Transfer
  • Style Image: The Olive Trees. Saint Rémy, Van Gogh
  • Content Image: View from across the street of The Olive Trees. Photo by Alex Roediger. Fair Use
  • Result: The Olive Trees. Saint Rémy, Neural Style Transfer
  • Style Image: Starry Night Over Rhone, Van Gogh
  • Content Image: Current view of the Rhone at night. Fair Use
  • Result: Starry Night Over Rhone, Neural Style Transfer
  • Style Image: Stairway at Auvers, Van Gogh
  • Content Image: Current view of stairway at Auvers. Photo by Alex Roediger. Fair Use
  • Result: Stairway at Auvers, Neural Style Transfer
  • Style Image: The Church in Auvers-sur-Oise, Van Gogh
  • Content Image: Current view of the Church in Auvers-sur-Oise, Photo by Alex Roediger. Fair Use, modified
  • Result: The Church in Auvers-sur-Oise, Neural Style Transfer
  • Style Image: Garden of the Hospital in Arles, Van Gogh
  • Content Image: Current view of the garden, Photo by Alex Roediger. Fair Use
  • Result: Garden of the Hospital in Arles, Neural Style Transfer
  • Style Image: The Langlois Bridge at Arles, Van Gogh
  • Content Image: Current View of the Langlois Bridge. Fair Use
  • Result: The Langlois Bridge at Arles, Neural Style Transfer

Summary

NST, you’ve been cool.

So did Neural Style Transfer do justice to the artist? I think we tried our best. The neural paintings look structurally and aesthetically good after the optimizations, and quite post-impressionist, like Van Gogh’s originals. The brush strokes, however (maybe I need to fine-tune the architecture better), I will leave to your judgment. If you did not know the source of the style images, would you have guessed the neural paintings were Van Gogh-styled? How far away are they from Gauguin, Cezanne, Monet, or Degas..

Could NST have helped in creating keyframes for Loving Vincent? At this point, it’s a yes for me.. but the algorithm would quickly break down when subjects from multiple Van Gogh paintings had to interact in a frame. Not to mention, the movie’s soul rests in all the artists’ dedicated work.

The issue of multiple painting subjects is related to what we observed in the Experiments section: the strength of the algorithm decreased significantly as the style image departed from the content image. That makes sense, as the algorithm only learns the style of an individual painting and not that of the artist, which would have to be learned from a collection of the artist’s work. I will try to tackle this issue of Artist Style Transfer in my next Fine Arts post. We will bring some bad boy GANs into the brush fight. Until then, I hope you keep looking up at the starry nights..

References:

  1. Gatys, Leon A.; Ecker, Alexander S.; Bethge, Matthias (2016). “Image Style Transfer Using Convolutional Neural Networks”. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2414–2423. Retrieved 13 February 2019.
  2. Keras Examples Directory, Neural Style Transfer (2018), GitHub repository, https://github.com/keras-team/keras/blob/master/examples/neural_style_transfer.py
  3. Loving Vincent, http://lovingvincent.com



1 Comment

  1. Binaya Bogati
    November 17, 2019

    Really interesting work. The effect is much more pronounced on pictures with a higher contrast setting and fewer details. In detailed images (example: the garden view), the algorithm seems to skim over a bit and loses its touch. It's fascinating how NST recreated those brush strokes so clearly in some of the paintings. I think you should keep working on it. Loved the concept!

