Sunday, May 29, 2016

Elasticsearch based Image Search using RGB Signatures


In my previous post, I described some experiments I was doing to reduce images to a Bag of Visual Words (BOVW). My goal is to build a Content Based Image Retrieval (CBIR) system, i.e., one that searches images based on their pixel content rather than the text captions or tags associated with them. Furthermore, I would like to use a standard text search engine to do this - a lot of effort has gone into making these engines robust and scalable, so if I model the image search as a text search over a BOVW and deploy it to one of these engines, I get the robustness and scalability for free.

Just like text search, image search is a balance between precision and recall. You want the "right" result to appear on top, but you also want to see other results "like" the one you asked for. My attempt to model this fuzziness is to bin the pixels into coarser buckets along each channel; that way the post-processed image looks a little more like other images (improving recall), but still looks more similar to similar images than to dissimilar ones (so precision is not hurt too much).
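
As a toy illustration of the idea (a minimal sketch with made-up intensity values, not the pipeline code shown later in this post), three nearby red-channel intensities all land in the same coarse bucket and would therefore produce the same pixel "word":

import numpy as np

# coarse bucket edges over the 0-255 intensity range, mirroring the binning used below
bins = np.linspace(0, 256, 25)
values = np.array([118, 121, 126])    # three nearby red-channel intensities
print(np.searchsorted(bins, values))  # all three fall into the same bucket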

In my previous post, I had experimented with KMeans clustering for binning the pixels along each channel, but it turned out to be too compute-intensive when run across my entire butterfly corpus of 200 images from PhotoRack. In any case, I later read that KMeans is not very effective in a single dimension, so I switched to binning the pixels along each channel into 25 equal-sized bins. At the end of this process, each image becomes a document drawn from a vocabulary of 75 unique "words" (25 bins x 3 channels). Here is the code to read the images and write the corresponding data for loading into a search index.

# -*- coding: utf-8 -*-
# Source: es_build_rgb.py
import matplotlib.pyplot as plt
import os

import image_search_utils

INPUT_DIR = "../data/butterflies"
OUTPUT_FILE = "../data/butterflies_rgb.txt"

fout = open(OUTPUT_FILE, 'wb')
for fname in os.listdir(INPUT_DIR):
    print("Processing file: %s" % (fname))
    img = plt.imread(os.path.join(INPUT_DIR, fname))
    words = image_search_utils.get_vector_rgb(img)
    words_str = " ".join([w[0] + "|" + ("%.3f" % (w[1])) for w in words])
    fout.write("%s\t%s\n" % (fname, words_str))
fout.close()

For convenience, I factored some of the common functionality into a utils package. One of these functions converts an image of shape (640, 480, 3) into a vector as described above. The same transformation is needed for an incoming query image, so this function is also called by the search code. The code for the package is shown below.

# -*- coding: utf-8 -*-
# Source: image_search_utils.py
from __future__ import division, print_function
import collections
import numpy as np
import operator

def get_vector_rgb(img):
    """ Vectorizes an RGB image. Each channel's pixel intensities are
        binned separately into 25 equal size bins, and the frequencies
        counted. The end result is a vector of bin "pixel-words" that 
        are formed by one of ["r", "g", "b"] and the mid point of the
        bin. Each such pixel-word is associated with its frequency,
        arbitrarily normalized to 100. Output is a list of (pixel-word,
        normalized-frequency) tuples ordered by highest to lowest 
        normalized frequency.
    """
    colors = ["r", "g", "b"]
    bins = np.linspace(0, 256, 25)
    means = 0.5 * (bins[1:] + bins[:-1]).astype("uint8")
    words = []
    for i in range(len(colors)):
        px_orig = img[:, :, i].flatten()
        labels = np.searchsorted(bins, px_orig)
        px_reduced = np.choose(labels.ravel(), means, 
                               mode="clip").astype("uint8").tolist()
        counter = collections.Counter(px_reduced)
        words.extend([(colors[i] + str(x[0]), x[1]) for x in counter.items()])
    words_sorted = sorted(words, key=operator.itemgetter(1), reverse=True)
    max_freq = words_sorted[0][1]
    words_sorted = [(x[0], 100.0 * x[1] / max_freq) for x in words_sorted]
    return words_sorted
    
def search(vec, es, index_name, doc_type, top_n=35, start=0, size=10):
    """ Does a payload search on the underly Elasticsearch index.
        The top N features of the image vector are used to search.
        The function query called implements cosine similarity so
        the output scores are between 1 and 0. Output is a list of
        file names and scores tuples, ordered by descending order 
        of score.
    """
    top_vec = vec[0:top_n]
    top_vec_as_text = " ".join([x[0] for x in top_vec])
    top_vec_as_param = ",".join(["\""+x[0]+"|"+str(x[1])+"\"" for x in top_vec])
    query = """
{
    "from": %d,
    "size": %d,
    "query": {
        "function_score": {
            "query": {
                "match": {
                    "imgsig": "%s"
                }
            },
            "script_score": {
                "script": {
                    "lang": "groovy",
                    "file": "payload",
                    "params": {
                        "params": [ %s ]
                    }
                }
            }
        }
    }
}
    """ % (start, size, top_vec_as_text, top_vec_as_param)
    resp = es.search(index=index_name, doc_type=doc_type, body=query)
    hits = resp["hits"]["hits"]
    return [(hit["_source"]["filename"], hit["_score"]) for hit in hits]

The output of the vectorization step looks something like this. Each line represents a single image: the file name, a tab, and then up to 75 space-separated pixel words, each paired with its frequency via a pipe. Each pixel word starts with r, g, or b to indicate the channel; the number that follows is the mid-point of one of the bins. The frequencies have been somewhat arbitrarily normalized so that the top value is 100. In retrospect I don't think I needed to do this.

1132868439-121.jpg     g122|100.000 r122|81.176 b122|71.557 r112|31.578 ...
1132868439-1210.jpg    r122|100.000 g122|94.113 b122|92.171 b37|24.449 ...
1132868439-12100.jpg   r122|100.000 g122|93.931 b122|92.702 b37|16.903 ...
1132868439-12101.jpg   r122|100.000 g122|96.838 b122|95.064 g26|34.409 ...
1132868439-12102.jpg   b122|100.000 g122|95.313 r122|94.820 r69|17.510 ...

For the index, I used Elasticsearch (ES) 2.3.3. Looking at the data format above, you probably guessed that I plan to use Lucene's Payloads feature. I have used Payloads before with Solr, but I am using ES more nowadays, so I figured it would be good to explore how to use ES for Payloads as well.

With the help of the O'Reilly ES book (a freebie from ElasticON 2016) and some awesome advice on Stack Overflow, I was finally able to get this data loaded. The code for creating the index schema and loading the data is shown below. If you look past the boilerplate, I am basically just declaring the analyzer chain for payload data, and declaring the two fields in my index - filename, a string that is not analyzed, and imgsig, an analyzed string whose analyzer preserves payloads. Then I read the file I just generated and write the records into the index.

# -*- coding: utf-8 -*-
# Source: es_create_rgb.py + es_load_rgb.py
import elasticsearch

es = elasticsearch.Elasticsearch(hosts=[{
    "host": "localhost",
    "port": "9200"
}])
create_index = """
{
    "settings": {
        "analysis": {
            "analyzer": {
                "payloads": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "delimited_payload_filter"
                    ]
                }
            }
        }
    },
    "mappings": {
        "rgb": {
            "properties": {
                "filename": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "imgsig": {
                    "type": "string",
                    "analyzer": "payloads",
                    "term_vector": "with_positions_offsets_payloads"
                }
            }
        }
    }
}
"""
resp = es.indices.create(index="butterflies", ignore=400, body=create_index)
print(resp)

fin = open("../data/butterflies_rgb.txt", 'rb')
line_nbr = 1
for line in fin:
    filename, imgsig = line.strip().split("\t")
    es.index(index="butterflies", doc_type="rgb", id=line_nbr, body={
        "filename": "%s" % (filename),
        "imgsig": "%s" % (imgsig)
    })
    line_nbr += 1
fin.close()
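
To confirm the load worked, a quick count against the index should match the number of lines in butterflies_rgb.txt (a minimal check using the same Python client; host, port and index name are the ones assumed throughout this post):

import elasticsearch

es = elasticsearch.Elasticsearch(hosts=[{"host": "localhost", "port": "9200"}])
resp = es.count(index="butterflies", doc_type="rgb")
print("indexed documents: %d" % (resp["count"]))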

For querying the data, however, the advice on Stack Overflow did not work for me. Specifically, ES complained that it couldn't compile the Groovy script. I ended up moving the Groovy script out of the request and into the config/scripts directory of the ES server, per the ES Scripting page docs. I also modified the script a little to make it emit cosine similarity scores instead of the sum of payload scores of the matched result, as it did before. My initial objective was to make the score reflect the importance of the query image vector elements as well, but then I realized I could simply normalize the number to get the cosine similarity. Here is the Groovy script for payload scoring. Some of the functions used are explained in more detail in the ES Advanced Scripting docs.

numer = 0.0;
denom_l = 0.0;
denom_r = 0.0;
for (param in params) {
    def (word, weight) = param.tokenize('|');
    weight_l = weight.toFloat();
    denom_l = denom_l + (weight_l * weight_l);
    termInfo = _index["imgsig"].get(word, _PAYLOADS);
    for (pos in termInfo) {
        weight_r = pos.payloadAsFloat(0);
        numer = numer + (weight_l * weight_r);
        denom_r = denom_r + (weight_r * weight_r);
    }
}
return numer / Math.sqrt(denom_l * denom_r);
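
To make the scoring concrete, here is the equivalent computation as a plain Python sketch (not part of the pipeline; doc_payloads is a hypothetical dict standing in for the payload weights stored in the index for a single matched document, and each pixel-word is assumed to occur at most once per document):

import math

def payload_cosine(query_params, doc_payloads):
    # query_params: list of "word|weight" strings, as passed to the Groovy script
    # doc_payloads: pixel-word -> indexed payload weight for one matched document
    numer = denom_l = denom_r = 0.0
    for param in query_params:
        word, weight = param.split("|")
        weight_l = float(weight)
        denom_l += weight_l * weight_l
        if word in doc_payloads:
            weight_r = doc_payloads[word]
            numer += weight_l * weight_r
            denom_r += weight_r * weight_r
    return numer / math.sqrt(denom_l * denom_r) if denom_r > 0 else 0.0

# example: a document that contains one of the two query words
print(payload_cosine(["r80|10.0", "r90|10.0"], {"r80": 80.0, "b122": 100.0}))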

The image_search_utils package above contains the query that calls this scoring function. A typical query looks like the one below. The query part matches records containing the words r80 or r90, and the params provide the query image's words and frequencies to score against. Remember that each image has at most 75 features, ordered by importance, so we can increase precision by using more features from the query image and increase recall by using fewer.

{
    "query": {
        "function_score": {
            "query": {
                "match": {
                    "imgsig": "r80 r90"
                }
            },
            "script_score": {
                "script": {
                    "lang": "groovy",
                    "file": "payload",
                    "params": {
                        "params": [ "r80|10.0", "r90|10.0" ]
                    }
                }
            }
        }
    }
}
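
For completeness, here is a sketch of running the example query above through the same Python client used elsewhere in this post (host, port, index and type names are the ones assumed throughout, and the payload script must already be installed as described earlier):

from __future__ import print_function
import elasticsearch

es = elasticsearch.Elasticsearch(hosts=[{"host": "localhost", "port": "9200"}])
query = {
    "query": {
        "function_score": {
            "query": {"match": {"imgsig": "r80 r90"}},
            "script_score": {
                "script": {
                    "lang": "groovy",
                    "file": "payload",
                    "params": {"params": ["r80|10.0", "r90|10.0"]}
                }
            }
        }
    }
}
resp = es.search(index="butterflies", doc_type="rgb", body=query)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["filename"], hit["_score"])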

For data exploration, I built a little web application with CherryPy. The main page shows me thumbnails of all the images in my corpus. Clicking on an image does a search using that image as the query. Here is the code for the web application.

# -*- coding: utf-8 -*-
# Source: image_search_web.py
from __future__ import print_function
import cherrypy
import elasticsearch
import matplotlib.pyplot as plt
import os

import image_search_utils

IMAGE_DIR = "/full/path/to/image-search/data/butterflies"
ES_HOST = "127.0.0.1"
ES_PORT = "9200"
ES_INDEXNAME = "butterflies"
ES_DOCTYPE = "rgb"

class ImageSearchService(object):
    
    def __init__(self):
        self.img_fnames = self._load_images()
        self.es = elasticsearch.Elasticsearch(hosts=[{
            "host": ES_HOST,
            "port": ES_PORT
        }])
        self.index_name = ES_INDEXNAME
        self.doc_type = ES_DOCTYPE

    def _load_images(self):
        img_fnames = []
        for fname in os.listdir(IMAGE_DIR):        
            img_fnames.append(fname)
        return img_fnames

    @cherrypy.expose
    def index(self):
        html = ("""<table cellspacing="0" cellpadding="0" border="1" width="100%">""")
        for i in range(20):
            html += ("""<tr>""")
            for j in range(10):
                curr_fname = self.img_fnames[i * 10 + j]
                html += ("""<td><a href="/search?q=%s"><img src="images/%s" width="100" height="75"/></a></td>""" % 
                    (curr_fname, curr_fname))
            html += ("""</tr>""")
        html += ("""</table>""")
        return html

    @cherrypy.expose    
    def search(self, q):
        img = plt.imread(os.path.join(IMAGE_DIR, q))
        term_vec = image_search_utils.get_vector_rgb(img)
        results = image_search_utils.search(term_vec, self.es, self.index_name, self.doc_type)
        html = """<h3>Query:</h3><br/>"""
        html += """<b>Filename:</b> %s<br/>""" % (q)
        html += """<b>Image:</b><br/>"""
        html += ("""<img src="images/%s" height="75" width="100"/><br/><hr/>"""
            % (q))
        html += """<h3>Results</h3><hr/>"""
        html += """<table cellspacing="0" cellpadding="0" border="1" width="100%">"""
        html += """<tr><th>Rank</th><th>Filename</th><th>Image</th><th>Score</th></tr>"""
        rank = 1
        for result in results:
            html += """<tr valign="top">"""
            html += """<td align="center">%d</td>""" % (rank)
            html += """<td align="center">%s</td>""" % (result[0])
            html += """<td align="center"><img src="images/%s" height="75" width="100"/></td>""" % (result[0])
            html += """<td align="center">%.5f</td>""" % (result[1])
            html += """</tr>"""
            rank += 1
        html += """</table>"""
        return html
    
if __name__ == "__main__":
    cherrypy.config.update({
        "server.socket_host": "127.0.0.1",
        "server.socket_port": 8080
    })
    conf = {
        "/images": {
            "tools.staticdir.on": True,
            "tools.staticdir.dir": IMAGE_DIR
        }
    }
    cherrypy.quickstart(ImageSearchService(), config=conf)

The screenshots below show the search query page (also the home page) and a results page. Notice that the query image and the first result image on the results page are identical. This was true for most searches I ran, although in a few cases the query image appeared in the second position, and in fewer cases at lower positions.






In order to figure out how good the search is overall, I ran an evaluation measuring how the Mean Reciprocal Rank (MRR) changes, over a random set of 50 images, as I vary the number of query image features. MRR is the mean, over all test queries, of the reciprocal of the rank at which the query image itself is returned. Here is the code to do the evaluation.

# -*- coding: utf-8 -*-
# Source: eval_feats_rgb.py
from __future__ import division, print_function
import elasticsearch
import matplotlib.pyplot as plt
import numpy as np
import os

import image_search_utils

IMAGE_DIR = "../data/butterflies"
ES_HOST = "127.0.0.1"
ES_PORT = "9200"
ES_INDEXNAME = "butterflies"
ES_DOCTYPE = "rgb"

# gather 50 random unique images from collection
np.random.seed(42)
test_image_ids = set(np.random.choice(200, 50, replace=False).tolist())
test_images = []
curr_idx = 0
for fname in os.listdir(IMAGE_DIR):
    if curr_idx in test_image_ids:
        test_images.append(fname)
    curr_idx += 1

es = elasticsearch.Elasticsearch(hosts = [{
    "host": ES_HOST,
    "port": ES_PORT
}])
mrrs = []
top_ns = range(0, 80, 5)
top_ns[0] = 1
for top_n in top_ns:
    print("Now running with top %d features..." % (top_n))
    mrr = 0.0
    for test_image in test_images:
        print("... querying with %s" % (test_image))
        img = plt.imread(os.path.join(IMAGE_DIR, test_image))
        img_vec = image_search_utils.get_vector_rgb(img)
        results = image_search_utils.search(img_vec, es, ES_INDEXNAME, 
                                            ES_DOCTYPE, top_n)
        result_images = [result[0] for result in results]                                            
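        # if the query image does not appear among the returned results at all,
        # fall back to a small constant reciprocal rank (0.1)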
        result_mrr = 0.1        
        for rank in range(len(result_images)):
            if result_images[rank] == test_image:
                result_mrr = 1.0 / (rank + 1)
                break
        mrr += result_mrr
    mrrs.append(mrr /  len(test_images))

plt.plot(top_ns, mrrs)
plt.xlabel("Number of features (Top N)")
plt.ylabel("Mean Reciprocal Rank (MRR)")
plt.grid(True)
plt.show()        

The output of this code is the chart below, which shows that we get the best MRR with around 35 features, with a plateau thereafter. The MRR at the plateau is around 0.8, so most of our query images are returned within the top 2 positions of the search results, which I think is quite good for such a simple model.


The flip side of this analysis is to see how the scores decrease by position. Understanding this behavior would let us set a threshold for "good" matches. Keeping the top_n parameter set to 35, we run a small subset of images and chart their scores by position. The code to do this is shown below.

# -*- coding: utf-8 -*-
# Source: eval_scores_rgb.py
from __future__ import division, print_function
import elasticsearch
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import os

import image_search_utils

IMAGE_DIR = "../data/butterflies"
ES_HOST = "127.0.0.1"
ES_PORT = "9200"
ES_INDEXNAME = "butterflies"
ES_DOCTYPE = "rgb"
NUM_TESTS = 5

# gather 5 random unique images from collection
np.random.seed(42)
test_image_ids = set(np.random.choice(200, NUM_TESTS, replace=False).tolist())
test_images = []
curr_idx = 0
for fname in os.listdir(IMAGE_DIR):
    if curr_idx in test_image_ids:
        test_images.append(fname)
    curr_idx += 1

es = elasticsearch.Elasticsearch(hosts = [{
    "host": ES_HOST,
    "port": ES_PORT
}])

cmap = matplotlib.cm.get_cmap("Spectral")
color = np.linspace(0, 1, NUM_TESTS)

for i in range(NUM_TESTS):
    img = plt.imread(os.path.join(IMAGE_DIR, test_images[i]))
    img_vec = image_search_utils.get_vector_rgb(img)
    results = image_search_utils.search(img_vec, es, ES_INDEXNAME, ES_DOCTYPE,
                                        top_n=35, start=0, size=50)
    scores = [result[1] for result in results]
    plt.plot(range(len(scores)+1)[1:], scores, color=cmap(color[i]))   
plt.axhline(0.9, 0, 50, color='r', linewidth=1.5, linestyle="dotted")
plt.xlabel("Result Position (Rank)")
plt.ylabel("Score")
plt.grid(True)
plt.show()

And the chart generated by the above program is shown below. As can be seen, each query yields 5-10 images above the 0.9 mark (the red dotted line), which might be a good threshold to use if we were trying to find very similar images.
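
If we adopt 0.9 as the cutoff, applying it is a simple filter over the results returned by image_search_utils.search (a minimal sketch that reuses img_vec, es and the constants from the script above):

results = image_search_utils.search(img_vec, es, ES_INDEXNAME, ES_DOCTYPE,
                                    top_n=35, start=0, size=50)
good_matches = [(fname, score) for (fname, score) in results if score >= 0.9]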


That's all I have for today. In this post, I described how ES handles payloads and function queries. Compared with the work I did with Solr payloads, I found the ES implementation quite intuitive and easy to use. I was also able to build an index that stores pre-built vectors (as opposed to having the index create vectors out of the text) and compute cosine similarities using function queries. I am quite pleasantly surprised at how well this functionality works. In the coming weeks I plan on making some improvements, and will have more to share once I do.

Sunday, May 15, 2016

Feature Reduction in Images


At ElasticON 2016 earlier this year, Doug Turnbull spoke about image search in his talk The Ghost in The Search Machine. The idea is simple - treat an image as a bag of pixel intensity "words". Thus a 640 x 480 color image is represented by 307,200 words, each word representing a single pixel. Since the image is in color, each pixel has 3 numbers, one each for red, green, and blue. In this representation, an example word might be 251_253_248, standing for the RGB triple (251, 253, 248), and each image would be represented by 307,200 such words. Now that the image is just a bunch of words, one can use Lucene (Doug's example was Elasticsearch) to look up similar images given an image.

A similar approach was demonstrated by Adrian Rosebrock in his blog post Hobbits and Histograms - A How-To Guide to Building your First Image Search Engine in Python, where he uses histograms to detect similar images. Instead of a Lucene index, he uses the Chi-squared distance between the histogram of the query image and the histograms of the other images in his collection.

I have been thinking of a hybrid approach where I use a Lucene index to store images as a sequence of pixel words, but increase the recall somewhat by doing a bit of feature reduction to get fewer distinct words. Looking at my original example, since each pixel word is composed of three values in the range 0-255, it can take 16,777,216 possible values. However, if I bin the intensities into 25 possible values per channel, the vocabulary size drops to just 15,625. This post explores a couple of ideas around binning the pixels.
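
As a quick check of those vocabulary sizes:

print(256 ** 3)  # 16777216 possible pixel words with raw 8-bit RGB values
print(25 ** 3)   # 15625 possible pixel words with 25 values per channel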

For my test, I used this image of a butterfly from the free stock photos at PhotoRack. The accompanying charts show histograms of the pixel counts at different intensities for each of the RGB channels.






The code to generate the two figures is as follows:

import matplotlib.pyplot as plt

img = plt.imread("/path/to/1132868439-12104.jpg")

# original image
plt.imshow(img)
plt.show()

# channel histograms
channels = ["red", "green", "blue"]
for ix, ch in enumerate(channels):
    plt.subplot(3, 1, ix+1)
    plt.hist(img[:, :, ix].reshape(img.shape[0] * img.shape[1],), color=ch[0])
    plt.grid(True)
plt.tight_layout()
plt.show()

One way to reduce the number of features is to bin the pixel intensities for each channel in equally spaced blocks. Since I want only 25 values in each channel, I create 25 bins. I then create a new image where each pixel is replaced with the mid-point of the range for the bin it belongs to. The code to do this is shown below.

import numpy as np

# binning each matrix into 25 bins and replacing with binned value
bins = np.linspace(0, 255, 25)
binned_img = ((10 * np.digitize(img, bins)) + 5).astype("uint8")
plt.imshow(binned_img)
plt.show()

# channel histograms (binned image)
for ix, ch in enumerate(channels):
    plt.subplot(3, 1, ix+1)
    plt.hist(binned_img[:, :, ix].reshape(img.shape[0] * img.shape[1],), 
             color=ch[0])
    plt.grid(True)
plt.tight_layout()
plt.show()

And here is the resulting image and its histograms. Notice that parts of the red and green channel histograms are slightly taller, and parts of the blue channel slightly shorter, than in the original histograms.






Finally, instead of equal sized bins, I use K-Means to cluster the pixel intensities in each channel into 25 clusters, and replace the pixel intensities with the value of the centroid of the cluster it belongs to. Here is the code to do this:

from sklearn.cluster import KMeans

temp_img = np.zeros((img.shape[0] * img.shape[1], 3), dtype="uint8")
for ix, ch in enumerate(channels):
    kmeans = KMeans(n_clusters=25)
    kmeans.fit(img[:, :, ix].reshape(img.shape[0] * img.shape[1], 1))
    centroids = kmeans.cluster_centers_
    labels = kmeans.labels_
    for lx, label in enumerate(labels):
        temp_img[lx, ix] = int(centroids[label])
clustered_img = temp_img.reshape((img.shape[0], img.shape[1], img.shape[2])) 
plt.imshow(clustered_img)
plt.show()

# channel histogram (clustered image)
for ix, ch in enumerate(channels):
    plt.subplot(3, 1, ix+1)
    plt.hist(clustered_img[:, :, ix].reshape(img.shape[0] * img.shape[1],), 
             color=ch[0])
    plt.grid(True)
plt.tight_layout()
plt.show()

And here are the corresponding image and channel histograms. These histograms seem closer to the originals than those produced by the equal-sized bins above.






The nice thing about the two approaches is that the change in image quality seems to be quite minor to the human eye (at least my human eye), but these images can be represented by much smaller vectors. Hopefully this will translate to faster performance and/or smaller Lucene term indexes.
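
To put a rough number on that reduction, here is a quick sketch that counts the distinct pixel words (RGB triples) in the original and binned versions of the butterfly image, reusing img and binned_img from the snippets above:

def count_pixel_words(image):
    # treat each pixel's (r, g, b) triple as one "word" and count the distinct words
    pixels = image.reshape(-1, 3)
    return len(set(map(tuple, pixels.tolist())))

print(count_pixel_words(img))         # distinct words in the original image
print(count_pixel_words(binned_img))  # distinct words after binning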

Friday, May 06, 2016

Experiments with Image Convolutions


I was looking for some help on the Internet around Convolutional Neural Networks (ConvNets) recently, and stumbled upon this interesting page on how GIMP uses convolutions for image transformations. Of course, the process described here is the opposite of what happens with ConvNets. Here, we take a known filter and apply it to an image to get a known effect. In the case of ConvNets, you start with a random filter and let the magic of backpropagation nudge the values in the filter so that the training objective is maximized.

Anyway, I thought it might be fun to (mis)use the Tensorflow API to compute convolutions on an image with the filters listed in the GIMP documentation page above. For the image, I chose this image of a Red Macaw from FreeDigitalPhotos.net.


Here is the code to apply various convolutions to this image, first using the Scipy API and then using the Tensorflow API.

# -*- coding: utf-8 -*-
from scipy import signal
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

IMG_PATH = "/path/to/ID-100137333.jpg"
FILTERS = {
    "Sharpen"     : np.array([[0,  0,  0,  0, 0], 
                              [0,  0, -1,  0, 0],
                              [0, -1,  5, -1, 0],
                              [0,  0, -1,  0, 0],
                              [0,  0,  0,  0, 0]]),
    "Blur"        : np.array([[0, 0, 0, 0, 0],
                              [0, 1, 1, 1, 0],
                              [0, 1, 1, 1, 0],
                              [0, 1, 1, 1, 0],
                              [0, 0, 0, 0, 0]]),
    "Edge Enhance": np.array([[ 0, 0, 0],
                              [-1, 1, 0],
                              [ 0, 0, 0]]),
    "Edge Detect" : np.array([[0,  1, 0],
                              [1, -4, 1],
                              [0,  1, 0]]),
    "Emboss"      : np.array([[-2, -1, 0],
                              [-1,  1, 1],
                              [ 0,  1, 2]])}
                             
def plot_image(label, image):
    print label, image.shape
    plt.imshow(image)
    plt.xticks([])
    plt.yticks([])
    plt.show()

def do_scipy_convolution(image, fltr):
    layers = []
    for z in range(image.shape[2]):
        layers.append(signal.convolve2d(image[:, :, z], fltr, mode="same"))
    return np.array(layers, dtype="uint8").swapaxes(0, 2).swapaxes(0, 1)

def do_tensorflow_convolution(image, fltr, do_pooling=False):
    batched_image = np.array([image[:, :, 0], 
                              image[:, :, 1], 
                              image[:, :, 2]], dtype="float32")
    batched_image = batched_image.reshape((batched_image.shape[0],
                                           batched_image.shape[1],
                                           batched_image.shape[2], 1))
    conv_fltr = fltr.reshape((fltr.shape[0], 
                               fltr.shape[1], 1, 1)).astype("float32")
    conv_image = tf.nn.conv2d(batched_image, conv_fltr,
        [1, 1, 1, 1], padding="SAME")
    if do_pooling:
        conv_image = tf.nn.max_pool(conv_image, [1, 3, 3, 1], [1, 3, 3, 1], 
                                    padding="SAME")
    with tf.Session() as sess:
        output_image = sess.run(conv_image)
        output_image = output_image.swapaxes(0, 2).swapaxes(0, 1)
        output_image = output_image.reshape((output_image.shape[0],
                                             output_image.shape[1],
                                             output_image.shape[2]))
        output_image = output_image.astype("uint8")
        return output_image
    
def plot_images(label, image_sp, image_tf, image_tfp):
    print label, image_sp.shape, image_tf.shape, image_tfp.shape
    plt.subplot(131)
    plt.imshow(image_sp)
    plt.xticks([])
    plt.yticks([])
    plt.subplot(132)
    plt.imshow(image_tf)
    plt.xticks([])
    plt.yticks([])
    plt.subplot(133)
    plt.imshow(image_tfp)
    plt.xticks([])
    plt.yticks([])
    plt.tight_layout()
    plt.show()
    
# show original image
image = plt.imread(IMG_PATH)
plot_image("Original", image)

for f in FILTERS.keys():
    conv_output_sp = do_scipy_convolution(image, FILTERS[f])
    conv_output_tf = do_tensorflow_convolution(image, FILTERS[f])
    conv_output_tfp = do_tensorflow_convolution(image, FILTERS[f], True)
    plot_images(f, conv_output_sp, conv_output_tf, conv_output_tfp)

I used the Scipy convolve2d API because the results of Tensorflow's conv2d seemed somewhat non-intuitive compared to the results in the GIMP docs. I tried the depthwise_conv2d call instead but did not get noticeably better results. Finally, I simulated the layer-by-layer way I was calling Scipy's convolve2d by treating the three channels as single-channel images in a batch of 3, and got results that were identical for both Scipy and Tensorflow.
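
For reference, here is a small sketch of the shape bookkeeping behind that batching trick (the 480 x 640 image size is an assumption; only the shapes matter here):

import numpy as np

image = np.zeros((480, 640, 3), dtype="float32")   # height x width x RGB

# stack the three channels as a batch of three single-channel "images"
batched = np.array([image[:, :, 0], image[:, :, 1], image[:, :, 2]])
batched = batched.reshape(batched.shape + (1,))
print(batched.shape)    # (3, 480, 640, 1) -> [batch, height, width, in_channels]

# a filter in the 4-D layout tf.nn.conv2d expects
conv_fltr = np.zeros((3, 3, 1, 1), dtype="float32")
print(conv_fltr.shape)  # (3, 3, 1, 1) -> [filter_h, filter_w, in_channels, out_channels]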

I also tried max pooling just to see what happens. The images below show how the image is transformed by each filter. The leftmost image is the result of Scipy's convolve2d, the center one is the result of Tensorflow's conv2d, and the rightmost one is the result of Tensorflow's conv2d followed by max_pool. As you can see, in the case of edge detection, the edges seem to show up better after max pooling. Here are the results for the various filters.

Sharpen


Blur


Edge Enhance


Edge Detect


Emboss


I realize this was probably a bit pointless, but I found the idea of doing image transformation using convolutions quite fascinating. I guess this idea might be fairly well known among image processing folks, and that's probably the reason convolutions are used in ConvNets. In any case, it was fun to see yet another area where linear algebra proves useful.