In this article, I will apply the K-means clustering algorithm to three more realistic image processing problems: i) clustering handwritten digits, ii) separating objects (image segmentation), and iii) image compression. Along the way, I would also like to introduce some simple techniques in image processing, an important area of Machine Learning. The source code for the examples in this post can be found here.

Warning: This post does not contain much math.

- Clustering handwritten digits

The MNIST database

The MNIST database is the largest database of handwritten digits and is used to benchmark most image classification algorithms.

MNIST consists of two subsets: the training set, with a total of 60,000 examples of handwritten digits from 0 to 9, and the test set, with 10,000 different examples. All of them are labeled. The image below shows some examples extracted from MNIST.

Each image is a grayscale image (one channel), sized 28×28 pixels (784 pixels in total). Each pixel carries a value that is a natural number from 0 to 255. Black pixels have value 0; the whiter a pixel, the higher its value (but never more than 255). Below is an example of the digit 7 and the values of its pixels. (For the purpose of displaying the pixel matrix on the right, I resized the image to about 14×14.)

Problem statement

The problem: suppose we do not know the labels of these digits, and we want to group nearly identical images into the same cluster.

One more assumption: we only know the K-means clustering algorithm from the earlier post on this basic Machine Learning blog. (Sorry, readers; to make the lessons more meaningful, we sometimes need assumptions that are not practical.) How would we solve this problem?

Before applying the K-means clustering algorithm, we need to treat each image as a data point. Since each data point must be a vector (row or column) rather than a matrix like the digit 7 above, we must perform a simple intermediate step called vectorization. That is, to obtain a vector, we can separate the rows of the pixel matrix and place them side by side, giving us a very long row vector representing one digit image.

Important: this is just the simplest way to describe image data by a vector. In practice, people apply many different techniques to create feature vectors that help algorithms achieve better results.
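As a small sketch, this row-by-row flattening can be done with NumPy's `reshape` (the array `digit` here is a hypothetical stand-in for one MNIST image):

```python
import numpy as np

# a hypothetical 28x28 grayscale digit image with values in [0, 255]
digit = np.arange(784).reshape(28, 28) % 256

# vectorization: place the rows of the pixel matrix side by side
x = digit.reshape(-1)    # one row vector of length 784
print(x.shape)           # (784,)
```

The first 28 entries of `x` are exactly the first row of the pixel matrix, the next 28 are the second row, and so on.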

Working in Python

First, visit the MNIST homepage to download the database. Although in this post we only use the test set with 10,000 images and no labels are required, you still need to download both t10k-images-idx3-ubyte.gz and t10k-labels-idx1-ubyte.gz, because the python-mnist library requires both files to load data from the test set.

First, we need to declare some libraries:

NumPy for matrix-related mathematics; MNIST to read data from MNIST; matplotlib to display images; sklearn, i.e. scikit-learn, which we got acquainted with in the previous post. (On installing these libraries, I hope you can Google a bit. If you have difficulty with the installation, leave a comment below the post. Note that working on Windows will be a bit more difficult than on Linux.)

```python
# %reset
import numpy as np
from mnist import MNIST            # requires `pip install python-mnist`
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
```

To display many digit images at once, I use the function in display_network.py.

Now we run the K-means clustering algorithm on all 10,000 digits:

```python
from display_network import *

mndata = MNIST('../MNIST/')    # path to your MNIST folder
mndata.load_testing()
X = mndata.test_images

K = 10                         # cluster the digits into 10 groups
kmeans = KMeans(n_clusters=K).fit(X)
pred_label = kmeans.predict(X)
```

(The rest of the source code can be found here.)

Now that we have found the centers and assigned the data points to clusters, I want to show what the centers look like and whether the images assigned to the same cluster really are similar. Below is the result when I randomly choose 20 images from each cluster.

Applying K-means clustering to the test set of the MNIST database with K = 10 clusters. First column: centers of the clusters. Remaining columns: each row shows 20 random data points selected from the corresponding cluster.

Each row corresponds to a cluster; the first column, with the green background on the left, contains the centers found for each cluster (the redder a pixel, the higher its value). We see that each center either resembles a certain digit or is a combination of two or three digits. For example, the center in the fourth row is a combination of the digits 4, 7, and 9; the one in the seventh row is a combination of the digits 7, 8, and 9.

However, the images drawn at random from each cluster do not really look alike. The reason may be that these images lie far from the center of their cluster (even though that center was the closest one). So K-means clustering does not work really well in this case. (Then again, this means we still have a lot to learn.)

We can still extract some useful information after running this algorithm. Now, instead of picking images from each cluster at random, I choose the 20 images nearest to the center of each cluster; the closer to the center, the higher the confidence. Take a look at the image below:

- Applying K-means clustering to the test set of the MNIST database with K = 10 clusters. First column: centers of the clusters. Remaining columns: each row shows the 20 data points nearest the center of the corresponding cluster.
- You can see that the data in each row is quite similar, and resembles the center in the first column on the left. There are a few interesting observations we can draw from here:
- 1. There are two styles of writing the digit 1, one straight and one slanted, and K-means clustering thinks they are two different digits. This is understandable because K-means clustering is an unsupervised learning algorithm. With human intervention, we could merge these two clusters into one.
- 2. In the ninth row, the digits 4 and 9 are placed in the same cluster. In truth, these two digits really are quite similar. The same thing happens in the seventh row, where the digits 7, 8, and 9 are folded into one cluster. With such a cluster, we could continue applying K-means clustering to split it further.
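Selecting the points nearest to each center, as done for the figure above, can be sketched like this (a minimal sketch on toy data; variable names follow the code above, and the exact selection code in the full source may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

# toy data standing in for the 10,000 vectorized MNIST digits
X = np.random.RandomState(0).rand(500, 784)
K = 10
kmeans = KMeans(n_clusters=K, n_init=10).fit(X)
pred_label = kmeans.predict(X)

n_show = 20
for k in range(K):
    Xk = X[pred_label == k]                                    # points assigned to cluster k
    d = np.linalg.norm(Xk - kmeans.cluster_centers_[k], axis=1)
    nearest = Xk[np.argsort(d)[:n_show]]                       # up to 20 points closest to the center
```

The same `pred_label == k` mask could also be used to pull out one mixed cluster and run K-means again on just its points.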

- In clustering, there is a commonly used technique called hierarchical clustering. There are two types of hierarchical clustering:

o Agglomerative, i.e. "bottom-up". Initially, each data point is considered a separate cluster; then the most similar pairs of clusters are merged into larger clusters. This process is repeated until an acceptable result is obtained.

o Divisive, i.e. "top-down". Initially, all data points are considered to belong to the same cluster; then each cluster is split using a clustering algorithm.
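The agglomerative ("bottom-up") variant happens to be available in scikit-learn; a minimal sketch on toy 2-D data:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# toy data: each point starts in its own cluster, and the most
# similar clusters are merged repeatedly until only 3 remain
X = np.random.RandomState(0).rand(100, 2)
agg = AgglomerativeClustering(n_clusters=3, linkage='ward').fit(X)
print(agg.labels_[:10])
```

Each entry of `agg.labels_` is the index (0, 1, or 2) of the final merged cluster that the corresponding point ended up in.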

- Object segmentation (separating objects in an image)

Problem statement

Let us apply the K-means clustering algorithm to another image processing problem: separating objects.

Suppose we have the image below and want an algorithm to recognize the face region and separate it out.

The idea

(Again assuming that we know nothing other than K-means clustering, let's stop for a few seconds and think about how we could handle this. Hint: there are three dominant colors in the photo.)

OK, three colors, three clusters!

The photo has three dominant colors: pink in the scarf and lips; black in the eyes, hair, and background; and skin color on the rest of the face. So we can apply the K-means clustering algorithm to group the pixels into 3 clusters, then select the cluster containing the face (this part is done by a human).

Since this is a color photo, each pixel is represented by 3 values corresponding to red, green, and blue (each of which is also a natural number not exceeding 255). If we treat each data point as a 3-dimensional vector containing these values and then apply the K-means clustering algorithm, we may get the desired result. Let's try.

Working in Python

Declare the libraries and load the image:


```python
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

img = mpimg.imread('girl3.jpg')
imgplot = plt.imshow(img)
plt.axis('off')
plt.show()
```

Transform the image into a matrix in which each row is one pixel with 3 color values:

```python
X = img.reshape((img.shape[0]*img.shape[1], img.shape[2]))
```

(The rest of the source code can be viewed here.)

After finding the clusters, I replace the value of each pixel with the center of the cluster containing it, and the result is as follows:

The three colors, pink, black, and skin color, have been grouped. And the face can be separated out from the skin-colored cluster (and the regions inside it). So K-means clustering produces an acceptable result here.
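Replacing each pixel by the center of its cluster can be sketched as follows (a sketch on a random toy "image", since `girl3.jpg` is not bundled here):

```python
import numpy as np
from sklearn.cluster import KMeans

# toy image standing in for girl3.jpg: 40x30 pixels, 3 color channels
img = np.random.RandomState(0).randint(0, 256, size=(40, 30, 3)).astype(np.float64)
X = img.reshape((img.shape[0]*img.shape[1], img.shape[2]))

kmeans = KMeans(n_clusters=3, n_init=10).fit(X)
label = kmeans.predict(X)

# replace every pixel by the center of the cluster containing it
X2 = kmeans.cluster_centers_[label]
img2 = X2.reshape(img.shape).astype(np.uint8)
```

The resulting `img2` has the same shape as the original but contains at most 3 distinct colors, one per cluster center.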

- Image compression (compressing images, and data in general)

Note that each pixel can take one of 256³ = 16,777,216 colors (the "16 million colors" we still hear in screen advertisements). This is a huge number (equivalent to 24 bits per pixel). If we want to store each pixel with a smaller number of bits and accept some loss of information, is there a way, given that we only know K-means clustering?

The answer is yes. In the segmentation example above, we had 3 clusters, and each pixel after processing is represented by a number corresponding to a cluster. However, the quality of the image clearly dropped considerably. I ran a small experiment with the number of clusters increased to 5, 10, 15, and 20. After finding the centers for each cluster, I replace the value of each pixel with the value of the corresponding center:
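The experiment can be sketched like this (again on a toy image; with K clusters, each pixel needs only about log2(K) bits for its cluster index, plus a small table of K centers, instead of 24 bits):

```python
import numpy as np
from sklearn.cluster import KMeans

# toy image standing in for the real photo
img = np.random.RandomState(0).randint(0, 256, size=(40, 30, 3)).astype(np.float64)
X = img.reshape((-1, 3))

for K in [5, 10, 15, 20]:
    kmeans = KMeans(n_clusters=K, n_init=10).fit(X)
    # replace each pixel with the center of its cluster
    img_K = kmeans.cluster_centers_[kmeans.labels_].reshape(img.shape).astype(np.uint8)
    bits_per_pixel = int(np.ceil(np.log2(K)))   # vs. 24 bits per pixel originally
    print(K, bits_per_pixel)
```

For example, K = 20 needs only 5 bits per pixel, roughly a fivefold reduction, at the cost of allowing only 20 distinct colors.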