Producing a Cross Stitch Pattern using K-Means

Introduction

This vignette shows how to make a cross stitch pattern from an image. The image I will use is the Marilyn Diptych, a silkscreen painting by American artist Andy Warhol depicting Marilyn Monroe. To make a realistic cross stitch pattern, the image needs some adjustment. I will use k-means clustering to reduce the image to a small number of colors, and I will lower the resolution using a function called change_resolution. Displaying the correct color to stitch is also a problem, since not every color in the image has a matching embroidery floss color. And since we are simplifying the image as much as possible, you may wonder how many thread colors are actually needed to stitch the pattern. We will answer this using k-means clustering together with a scree plot and color strips. By the end of this vignette, we will have an actual cross stitch pattern along with the thread colors needed to stitch it. I go over 4 functions in this vignette, all of which are provided in the resources panel. A few packages are required to run the functions:

imager -> to load the image and its RGB values
tidyverse & tidymodels -> to do some data wrangling
sp -> to change the resolution of the image
scales -> to conveniently show different colors
cowplot -> to plot the cross stitch pattern
dmc -> to find the nearest DMC color so the pattern can actually be stitched with DMC floss (install it with devtools::install_github("sharlagelfand/dmc"))

Let’s first load the required packages.

library(imager)
library(tidyverse)
library(tidymodels)
library(sp)
library(scales)
library(cowplot)
library(dmc)

Exploring the Image and Explaining the Output of process_image

Before demonstrating how to use the functions to make a cross stitch pattern of the image, let’s actually see the image to get a sense of what the colors look like. This will help us choose the right combination of colors to cross stitch and notice if anything weird happens later on.

As you can see, small amounts of the dark blue color are scattered all over the image. Stitching such scattered specks would be inconvenient, so we use k-means clustering to simplify the image. To do so, we first project the image into RGB color space, so that every pixel becomes a point (R,G,B) that we can cluster. Before clustering, we also need to choose the number of cluster centers (k), which in this context is the number of colors in our cross stitch. Given k, k-means clustering groups the (R,G,B) points into k clusters, and we can then match each cluster center to its nearest DMC color. Finally, we plot the pattern according to the clusters and their associated DMC colors.
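
To make the clustering step concrete, here is a minimal sketch in base R. It does not use the real image: the pixels are synthetic (R,G,B) points drawn around three made-up colors, and make_group is a hypothetical helper written only for this illustration.

```r
# Minimal sketch: k-means on synthetic (R, G, B) points, base R only.
set.seed(123)

# Hypothetical helper: n noisy points around one colour, clamped to [0, 1].
make_group <- function(n, r, g, b) {
  data.frame(
    R = pmin(pmax(rnorm(n, r, 0.03), 0), 1),
    G = pmin(pmax(rnorm(n, g, 0.03), 0), 1),
    B = pmin(pmax(rnorm(n, b, 0.03), 0), 1)
  )
}

# Three made-up colour groups: reddish, greenish, bluish.
pixels <- rbind(
  make_group(100, 0.9, 0.1, 0.1),
  make_group(100, 0.1, 0.8, 0.2),
  make_group(100, 0.1, 0.2, 0.9)
)

# Group the points into k = 3 clusters; nstart repeats the random start.
fit <- kmeans(pixels, centers = 3, nstart = 10)

# Each centre is an (R, G, B) triple, which rgb() turns into a hex colour.
centre_hex <- rgb(fit$centers[, "R"], fit$centers[, "G"], fit$centers[, "B"])
print(centre_hex)
print(table(fit$cluster))
```

Every pixel ends up with a cluster label, and every cluster has one representative color, which is the simplification the pattern relies on.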

The difficulty is choosing k, the number of cluster centers that gives the best cross stitch pattern. The solution is to experiment with different values of k using the function process_image(). process_image() takes a vector of k values, say (2,3,4,5,6,7,8,9), runs k-means clustering for k=2, k=3,…,k=9, and summarizes the results in a tibble of 8 rows. The first row holds the information for the clustering with k=2, the second row for k=3, and so on up to k=9. With this information for the different k values, we can draw a scree plot to get a sense of what k should be. The output of process_image() is essential, since it is the input for the scree plot, the color strips, and the pattern itself.
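
The shape of such a tibble can be sketched as follows. This is not the actual process_image(), whose source is not shown here; it is an assumed layout with one row per k and list-columns built with the broom tidiers tidy(), glance() and augment(), run on synthetic pixels. The name sketch_info and the column name kclust are my own choices.

```r
library(tibble)
library(dplyr)
library(purrr)
library(broom)

set.seed(123)
# Synthetic stand-in for the image's (R, G, B) values.
pixels <- tibble(R = runif(300), G = runif(300), B = runif(300))

# One row per k; each list-column stores one summary of that clustering.
sketch_info <- tibble(k = 2:5) %>%
  mutate(
    kclust    = map(k, ~ kmeans(pixels, centers = .x, nstart = 5)),
    tidied    = map(kclust, tidy),            # one row per cluster centre
    glanced   = map(kclust, glance),          # one row for the whole clustering
    augmented = map(kclust, augment, pixels)  # one row per point, with .cluster
  )

print(sketch_info)
```

With this layout, the third, fourth and fifth columns hold the per-cluster, whole-clustering and per-point summaries respectively.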

Let’s use the function process_image() to get the clustering info for k = c(2:9):

set.seed(123)
cluster_info <- process_image("Marilyn_Monroe.jpg", c(2:9))
cluster_info

Now that we have the clustering information for the different values of k, I will use k = 6 to explain what each column means.

Note: I am not choosing k = 6 as the final number of cluster centers. I am only using it for the explanation; any other k would have worked just as well, but I like the number 6.

First, the “tidied” column for k = 6:

cluster_info_6 <- filter(cluster_info, k == 6)
cluster_info_6[1,3][[1]][[1]]

Recall that k-means clustering with k = 6 outputs 6 cluster centers and gives each point a cluster label. The tidied info above summarizes each cluster: it shows the coordinates of the cluster center, the color of the cluster center, and the DMC thread color nearest to that color. In other words, our (R,G,B) points are grouped into 6 clusters, and each cluster has a DMC color associated with it. When we later plot the (x,y) point that belongs to each (R,G,B) point, we give it the DMC color of its cluster center, and that DMC color comes from this “tidied” column.
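
The idea of matching a cluster center to its nearest thread color can be sketched in base R. This is an illustration under assumptions, not the lookup the dmc package performs: the palette below is made up (it is not real DMC data), nearest_floss and hex_to_rgb are hypothetical helpers, and plain Euclidean distance in RGB space is assumed as the measure of closeness.

```r
# Hypothetical floss palette; names and hex values are made up for illustration.
palette <- data.frame(
  name = c("floss_dark", "floss_red", "floss_light"),
  hex  = c("#202020", "#D02030", "#F0F0E0"),
  stringsAsFactors = FALSE
)

# Convert hex colours to a matrix with one row per colour and RGB in [0, 1].
hex_to_rgb <- function(hex) t(col2rgb(hex)) / 255

# Return the palette row closest to a cluster centre (assumed Euclidean in RGB).
nearest_floss <- function(centre_rgb, palette) {
  pal_rgb <- hex_to_rgb(palette$hex)
  d2 <- rowSums(sweep(pal_rgb, 2, centre_rgb)^2)  # squared distances
  palette[which.min(d2), ]
}

centre <- c(0.85, 0.15, 0.20)  # a reddish cluster centre
print(nearest_floss(centre, palette))
```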

Now, the “augmented” column for k = 6:

glimpse(cluster_info_6[1,5][[1]][[1]])

## Rows: 1,050,624
## Columns: 6
## $ x        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
## $ y        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ R        <dbl> 0.9725490, 0.9882353, 0.9960784, 0.9921569, 0.9921569, 1.0000…
## $ G        <dbl> 0.4156863, 0.4313725, 0.4392157, 0.4352941, 0.4352941, 0.4470…
## $ B        <dbl> 0.5647059, 0.5803922, 0.5882353, 0.5843137, 0.5843137, 0.5960…
## $ .cluster <fct> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…

The augmented column gives the cluster label of every point. This is what we need to plot the actual pattern: once we know a point’s (x,y) position and its cluster label, we know where to plot it and what color to give it.

Finally, the “glanced” column for k = 6:

cluster_info_6[1,4][[1]][[1]]

This is information about the clustering as a whole. The total within sum of squares in this case is 5453.807; it is the sum of the squared distances between each point and its cluster center. It tells us how much error we make when we assign a point to a cluster and give the point the color of its cluster center. The higher k is, the more clusters we have, so the points lie closer to their cluster centers and the error is smaller. At some point, however, increasing k no longer decreases the error by much. The glanced column provides the information we need to make the scree plot.
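
As a small check of that definition, the sketch below recomputes the total within sum of squares by hand on synthetic points and compares it with the value reported by kmeans(). The data are random and have nothing to do with the image.

```r
set.seed(123)
# Synthetic (R, G, B) points.
pts <- matrix(runif(600), ncol = 3, dimnames = list(NULL, c("R", "G", "B")))
fit <- kmeans(pts, centers = 4, nstart = 5)

# Look up, row by row, the centre each point was assigned to.
assigned <- fit$centers[fit$cluster, ]

# Sum of squared distances from each point to its own centre.
manual <- sum((pts - assigned)^2)

print(manual)
print(fit$tot.withinss)
print(all.equal(manual, fit$tot.withinss))
```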

Making the Scree Plot

Now that I have explained the output of process_image(), we can use that output to draw the scree plot. A scree plot has the error (the total within sum of squares) on the y axis and the value of k on the x axis, so it shows how the error changes with the number of clusters.

The function scree_plot() draws the scree plot, taking cluster_info (the output of process_image()) as input. We look for the point where increasing k stops decreasing the error by much, which gives us an idea of the range k should be in.

scree_plot(cluster_info)
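
For readers who want to see what goes into such a plot, here is a rough ggplot2 sketch. It is not the source of scree_plot(); it runs kmeans() directly on synthetic points and plots the total within sum of squares against k.

```r
library(ggplot2)

set.seed(123)
# Synthetic (R, G, B) points standing in for the image.
pts <- data.frame(R = runif(300), G = runif(300), B = runif(300))

# Total within sum of squares for each candidate k.
ks <- 2:9
scree <- data.frame(
  k = ks,
  tot.withinss = sapply(ks, function(k) {
    kmeans(pts, centers = k, nstart = 5)$tot.withinss
  })
)

# Error on the y axis, number of clusters on the x axis.
p <- ggplot(scree, aes(x = k, y = tot.withinss)) +
  geom_line() +
  geom_point() +
  labs(x = "Number of clusters (k)", y = "Total within sum of squares")
print(p)
```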

We can see from the scree plot that the error decreases much more slowly after k = 5. I would pick k = 5, 6, 7 as my candidates for the optimal number of clusters, because I don’t want too many colors in the cross stitch, yet the pattern should still represent the image accurately. So I need to test several values of k. The scree plot alone does not tell me the optimal k, since it only measures the clustering error. But it does tell me that a k of 5 or slightly more could be a good choice, since the error decreases more slowly after that. We can also tell from the image of Marilyn Monroe that there are at least 5 colors, so we definitely do not want fewer than 5 clusters, or we would miss a color.

Moving on, I will filter cluster_info down to k = 5, 6, 7, then eliminate two of these values to arrive at the (hopefully) optimal value of k.

update_cluster_info <- filter(cluster_info, k %in% c(5,6,7))
update_cluster_info
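
A note on selecting several k values at once: in R, %in% tests membership, whereas == compares element by element and recycles the shorter vector. The base R sketch below, on made-up vectors, shows that the two can disagree depending on the order of the values.

```r
k <- 2:9

# %in% asks, for each k, whether it is one of 5, 6, 7.
print(k %in% c(5, 6, 7))

# == recycles c(5, 6, 7) along k and compares position by position
# (R also warns that the lengths do not match).
print(suppressWarnings(k == c(5, 6, 7)))

# With the values in a different order, == silently misses matches.
k_shuffled <- c(7, 6, 5, 2, 3)
by_equal <- suppressWarnings(k_shuffled[k_shuffled == c(5, 6, 7)])
by_in    <- k_shuffled[k_shuffled %in% c(5, 6, 7)]
print(by_equal)
print(by_in)
```

For k = 2:9 the two happen to agree, but only because the values 5, 6, 7 line up with the recycled vector; %in% does not depend on that coincidence.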

To arrive at a single value of k, we need to look at the color strips, which show the nearest DMC color associated with each cluster. This information comes from the “tidied” column of cluster_info discussed earlier. The nearest DMC color of each cluster is the thread color used to stitch the pattern, so taking a closer look at these colors lets us make a reasoned decision about how many colors to use. The scree plot only measures the clustering error; it does not know which colors, or how many, make a good cross stitch pattern. The color strips let us choose a number of colors that gives a simple but good cross stitch, and because we know what the image looks like, we can tell whether a combination of colors makes sense.

The function color_strips() creates the color strips.

color_strips(update_cluster_info)
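
A single color strip can be sketched with show_col() from the scales package. This is not the source of color_strips(), and the hex codes below are made up for illustration; they are not the DMC colors found for this image.

```r
library(scales)

# Made-up hex colours standing in for the nearest DMC colours of one clustering.
strip <- c("#1B2A6B", "#FF5773", "#F4C430", "#7A4E2D", "#F2E6D8", "#2E8B57")

# Draw the colours side by side in one row, labelled with their hex codes.
show_col(strip, labels = TRUE, ncol = length(strip))
```

Drawing one such strip per candidate k makes it easy to compare the color sets side by side.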

From the color strip with 7 colors, we see a purplish color that does not appear in the image. It is probably the background blended with some other color during clustering. Since this extra color does not represent the image, we rule out k = 7.

For k = 5, the yellow of the lips is missing, and we would like to include it if possible. The colors in the image are all very distinct, and k = 6 provides 6 distinct colors, including the yellow that k = 5 lacks.

In conclusion, 6 clusters is probably a good choice, since it gives distinct colors that match the image and covers the most important ones. To make a cross stitch pattern that portrays the image correctly without being too complicated, let’s go with k = 6.

This conclusion cannot be drawn from the scree plot alone, which is why it is important to look at the color strips. Choosing k is hard because it cannot simply be computed; we must weigh the whole picture and our goal to make a good judgment.

Usage of make_pattern

The output of process_image(), together with our final choice of k = 6, lets us plot the cross stitch pattern. The function that plots the pattern is make_pattern(). It also reduces the resolution of the image to make the pattern easier to stitch. It has a few useful arguments that let you plot the pattern in black and white, leave out the background color, and choose the approximate number of stitches in the horizontal direction.

The approximate number of stitches in the horizontal direction should be large enough to render the pattern accurately, yet small enough to keep the cross stitch doable.
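
One way to picture the resolution reduction is to bin pixels into square cells, one cell per stitch, and give each cell its most common cluster label. The sketch below does this with dplyr on a toy image. It is an assumption about the general idea only, not the author’s change_resolution (which relies on sp); the names x_size, cell, sx and sy are hypothetical.

```r
library(dplyr)

# Toy "augmented" data: a 6 x 4 image, left half cluster 1, right half cluster 2.
img <- expand.grid(x = 1:6, y = 1:4) %>%
  mutate(.cluster = factor(ifelse(x <= 3, 1, 2)))

x_size <- 2                             # target number of horizontal stitches
cell   <- ceiling(max(img$x) / x_size)  # pixels per stitch along each axis

low_res <- img %>%
  mutate(sx = ceiling(x / cell), sy = ceiling(y / cell)) %>%  # stitch coordinates
  count(sx, sy, .cluster) %>%                                 # votes per cell
  group_by(sx, sy) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%                  # most common label
  ungroup() %>%
  select(sx, sy, .cluster)

print(low_res)
```

A smaller x_size gives larger cells and therefore fewer, coarser stitches.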

Let’s first plot the pattern assuming we include the background color and we don’t want black/white. Also assume the number of horizontal stitches is around 55.

Since black_white defaults to FALSE and background_colour defaults to NULL, we don’t have to pass those arguments.

Lastly, the number 6 is included in the argument to tell the function we are making a pattern using k = 6 clusters.

make_pattern(cluster_info, 6, 55)

What if you want the cross stitch pattern to be in black/white? You simply add the argument black_white = TRUE.

make_pattern(cluster_info, 6, 55, black_white = TRUE)

What if you bought fabric in the background color of the cross stitch and don’t want to include that color in the pattern? You would simply use the background_colour = "hex_code" argument.

For our image, the hex code of the background color is "#FF5773". It is the pink color seen in the first cross stitch output. So we use the argument background_colour = "#FF5773" to leave this background color out.

Here, I set the number of horizontal stitches to 60, since removing the background leaves us with fewer points.

make_pattern(cluster_info, 6, 60, background_colour = "#FF5773")

The last possible combination is to have black/white and remove the background.

make_pattern(cluster_info, 6, 60, black_white = TRUE, background_colour = "#FF5773")

Anson Lai