Digital image processing is concerned not only with image enhancement and analysis, but also with morphological operations. These operations remove imperfections in binary images, such as those produced when simple thresholding was applied to the grayscale version of my fingerprint in Activity #6. Moreover, converting grayscale images into binary images involves a process known as image segmentation, which is a necessary step before morphological operations. In this activity, I am going to perform image segmentation not only on grayscale images, but also on colored images. Some of the images here are resized to fit the blog, but you can click an image to see it in full size.
I. Grayscale Image Segmentation
In image processing, we are usually concerned with only some parts of an image. Image segmentation allows a region of interest (ROI) to be isolated from the whole image so that further processing can be done on it [1]. This is particularly easy with grayscale images, such as the one given below (sorry for the profanity).
To segment this image, one can simply threshold the grayscale image into a binary image, where the assignment of 0 and 1 depends on the ROI. If we want to process the handwritten and printed parts of the image, which are all darker than the background, we can set a threshold above which intensities are omitted. A Scilab code for grayscale image segmentation is shown below.
In the code shown, the histogram of the image, which serves as a reference for pixel values, is first calculated and plotted, as shown below. The imhist() function takes two inputs: the first is the image, and the second is the number of bins. Since the pixel values of a grayscale image are integers in the range 0–255, the specified number of bins is 256.
The histogram relates each grayscale value to the number of pixels having that value in the given image. The sharp peak signifies that most of the pixels have grayscale values in the range 190–200, which corresponds to the paper of the check. To segment the handwritten and printed parts, I set the threshold to a gray value of 125. The resulting image is shown below.
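To make the thresholding step concrete, here is a minimal Python/NumPy sketch of the same idea (the activity itself used Scilab); the synthetic "check" image, with paper at gray value 195 and writing at 80, is an assumption for illustration, while the 256 bins and the threshold of 125 follow the text.

```python
import numpy as np

# Synthetic stand-in for the check scan (assumed values):
# bright paper at gray value 195, a darker "handwritten" strip at 80
img = np.full((100, 100), 195, dtype=np.uint8)
img[40:60, 20:80] = 80

# Histogram with 256 bins, one per possible gray value (like imhist(I, 256))
hist, _ = np.histogram(img, bins=256, range=(0, 256))
print(hist.argmax())   # -> 195, the paper peak of the histogram

# Thresholding: pixels darker than 125 become 1 (the ROI), the rest 0
binary = (img < 125).astype(np.uint8)
print(binary.sum())    # -> 1200, the 20x60 "handwritten" strip
```

Reversing the comparison to `img > 125` yields the inverted binary image instead.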
As you can see, almost all of the handwritten and printed elements in the check are highlighted. This is because the code picks out all pixels with gray values less than 125 and sets them to 1. If I reverse the condition (I > 125), the result is the inverted version of Fig. 3, shown below.
One can then specify the thresholding condition and values in order to pick out the ROI. Below are the resulting images when the threshold value in the code shown was set to 50, 100, and 175.
The differences in the amount of imperfections at each threshold are observable. One can properly set the threshold by referring to the histogram in Fig. 2; in this case, the value that best highlights all the handwritten and printed text is 125.
II. Colored Image Segmentation
Not all images can be segmented by converting them into grayscale and setting threshold values, especially when the image is colored. Take for example the image shown below, together with its grayscale equivalent.
As can be observed, the rusty box in Fig. 8b has grayscale pixel values similar to those of the background elements. Thus, grayscale image segmentation is useless here. To remedy the situation, we resort to colored image segmentation.
In reality, objects with a solid, monochromatic color show shading variations depending on the orientation of the observer, the intensity of the light source, the geometry of the object, etc. These variations can be taken into account in image segmentation by representing the image in a color space that encodes brightness and chromaticity (pure color) information separately. We now utilize the Normalized Chromaticity Coordinates (NCC) [1].
Basically, colored images have red, green, and blue channels, whose matrices are designated R, G, and B. The NCC is then given by

r = R / (R + G + B),  g = G / (R + G + B),  b = B / (R + G + B)
where the sum of r, g, and b is equal to 1, and each coordinate can take values from 0 to 1 [1]. Moreover, b can be expressed in terms of r and g, since b = 1 − (r + g). This means that the NCC can be represented in the two-dimensional r–g space, where the values of r and g determine the color of the pixel. This 2D space is shown below.
The primary colors are obtained when exactly one of these coordinates is equal to 1. Any color can then be represented in this color space, which we will use for colored image segmentation.
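As a sketch of how the NCC mapping works, here is a small Python/NumPy example (the activity used Scilab); the three sample pixels are assumed values chosen to show the behavior:

```python
import numpy as np

# A toy 1x3 RGB "image": pure red, mid gray, and an orange-ish pixel
rgb = np.array([[[200, 0, 0], [100, 100, 100], [200, 100, 0]]], dtype=float)

R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
total = R + G + B
total[total == 0] = 1e-9   # guard against division by zero on black pixels

# Normalized chromaticity coordinates: brightness divides out,
# leaving only the pure-color information
r = R / total
g = G / total
b = B / total

print(r[0, 0], g[0, 0])   # pure red -> r = 1.0, g = 0.0
print(r[0, 1], g[0, 1])   # gray -> r = g = 1/3, the "white" center of the space
```

Note that b is redundant here: b = 1 − (r + g) for every pixel, which is why the 2D r–g space suffices.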
In this activity, we are not only concerned with image segmentation, but also a comparison with the techniques involved. There are two main techniques involved in colored image segmentation: Parametric and Non-Parametric Probability Distribution Estimation.
A. Parametric Probability Distribution Estimation
This first technique involves the parameters associated with probability distributions, mainly the mean and the standard deviation. Basically, we can segment colored images by determining the probability that a pixel belongs to a color distribution of interest. Similar to grayscale image segmentation, one first picks the ROI, then obtains its probability distribution function by normalizing its histogram, which is done for each NCC coordinate. One must then determine whether a pixel from the desired image belongs to the probability distribution of the ROI. The probability that a pixel with chromaticity r belongs to the ROI can be assumed to be Gaussian, and is given by

p(r) = [1 / (σr √(2π))] exp(−(r − µr)² / (2σr²))
where µr and σr are the mean and standard deviation of the ROI's values in chromaticity coordinate r [1]. The same holds for g and b. Since we are only concerned with the 2D NCC space, the membership probability can be approximated by a joint probability distribution, given by the product of p(r) and p(g) [1].
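The parametric recipe above can be sketched in Python/NumPy as follows (the activity used Scilab); the pinkish ROI patch and the two test colors are made-up values for illustration, not the actual images:

```python
import numpy as np

def chromaticity(rgb):
    """Map an RGB float array to its (r, g) NCC coordinates."""
    total = rgb.sum(axis=-1)
    total[total == 0] = 1e-9
    return rgb[..., 0] / total, rgb[..., 1] / total

# Assumed pinkish ROI patch with slight per-pixel noise
rng = np.random.default_rng(0)
roi = np.clip([220.0, 60.0, 120.0] + rng.normal(0, 5, (50, 50, 3)), 0, 255)

# Tiny test image: left column pink (like the ROI), right column green
img = np.zeros((2, 2, 3))
img[:, 0] = [220, 60, 120]
img[:, 1] = [40, 200, 60]

# Parameters of the ROI's chromaticity distribution
r_roi, g_roi = chromaticity(roi)
mu_r, sd_r = r_roi.mean(), r_roi.std()
mu_g, sd_g = g_roi.mean(), g_roi.std()

def gauss(x, mu, sd):
    """Gaussian probability density with the ROI's mean and standard deviation."""
    return np.exp(-(x - mu) ** 2 / (2 * sd ** 2)) / (sd * np.sqrt(2 * np.pi))

# Joint probability p(r) * p(g) gives each pixel's membership likelihood
r_im, g_im = chromaticity(img)
p = gauss(r_im, mu_r, sd_r) * gauss(g_im, mu_g, sd_g)

print(p[:, 0] > p[:, 1])   # pink pixels score higher than green ones
```

Normalizing p to the display range (or thresholding it) then gives the segmented image.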
Let’s try this technique for a pattern shown below.
For my first ROI, I wanted to segment all the pink colors outlining each teardrop-like pattern. The image of the ROI and the Scilab code used are shown below.
The necessary parameters were calculated and used for the joint probability distribution. The resulting image is shown below.
What can I say? The technique works like a charm. The outlines of the teardrop patterns were highlighted, together with the small circles having the same color as the ROI. Since the teardrop patterns were almost identical, high correlations were observed for every segmented element.
Now, let’s move on to another technique.
B. Non-Parametric Probability Distribution Estimation: Histogram Backprojection
We now move on to non-parametric image segmentation. As the name suggests, this technique does not use the probability distribution parameters of an ROI. One non-parametric technique is called Histogram Backprojection. According to OpenCV, backprojection is a way of recording how well the pixels of a given image fit the distribution of pixels in a histogram model of an ROI. Basically, we are just replacing every pixel in a given image by its probability of occurring in the ROI. This removes the need to calculate the mean and standard deviation, as well as the assumption that the probability is Gaussian. We are merely looking up histogram values in NCC space.
To implement this technique, one must be able to produce a 2D histogram of an image. Our professor, Ma'am Jing, provided us with a Scilab code that helped us create one. For the image in Fig. 10 and the ROI in Fig. 11, the Scilab code is shown below.
Let me first discuss the given code. The code is divided into two parts: the calculation of the 2D histogram of the ROI, and then the backprojection. As before, the ROI image was mapped into NCC space. The number of bins, which divides the entire range of values into intervals, was set to 32 (integers from 0 to 31). The resulting 2D histogram is thus a 32×32 image, in which the grayscale intensity at each bin gives its probability in NCC space, with r and g along the x and y axes, respectively. For the ROI given in Fig. 11, the 2D histogram is shown below, together with the NCC space diagram.
If one properly maps the 2D histogram onto the NCC space, one can verify that the high intensities lie near the pinkish region of the diagram. In the next part of the code, the original image in Fig. 10 is mapped into NCC space, as in the parametric technique. Only this time, we build a backprojection array, in which each pixel is assigned a value based on its bin in the ROI histogram. The result is shown below.
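A minimal Python/NumPy sketch of this two-part procedure, 2D histogram then backprojection (the original code was in Scilab; the pink and green sample colors are assumptions), looks like this:

```python
import numpy as np

def chromaticity(rgb):
    """Map an RGB float array to its (r, g) NCC coordinates."""
    total = rgb.sum(axis=-1)
    total[total == 0] = 1e-9
    return rgb[..., 0] / total, rgb[..., 1] / total

BINS = 32   # r and g each quantized into 32 bins, as in the given code

def bin_index(x):
    """Quantize chromaticity values in [0, 1] to bin indices 0..31."""
    return np.clip((x * (BINS - 1)).round().astype(int), 0, BINS - 1)

# Assumed uniform pink ROI and a 2x2 test image (pink left, green right)
roi = np.tile([220.0, 60.0, 120.0], (20, 20, 1))
img = np.zeros((2, 2, 3))
img[:, 0] = [220, 60, 120]
img[:, 1] = [40, 200, 60]

# Part 1: 2D histogram of the ROI in (r, g) space, normalized to probabilities
r_roi, g_roi = chromaticity(roi)
hist2d = np.zeros((BINS, BINS))
np.add.at(hist2d, (bin_index(r_roi), bin_index(g_roi)), 1)
hist2d /= hist2d.sum()

# Part 2: backprojection -- each image pixel is replaced by the
# histogram value of its (r, g) bin
r_im, g_im = chromaticity(img)
backproj = hist2d[bin_index(r_im), bin_index(g_im)]

print(backproj)   # pink column -> 1.0, green column -> 0.0
```

Because the ROI here is a single flat color, the whole histogram mass sits in one bin; a real ROI spreads over several bins, which is exactly why imperfect ROI extraction directly degrades the backprojection.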
I have obtained almost similar results with the parametric approach. To provide a better analysis on the figures, I placed them side-by-side, as shown in Fig. 15.
So, what’s the difference? Well, based on what I can see, the parametric technique produced a much smoother and cleaner segmentation than the non-parametric technique. Here’s the mechanism behind each. The parametric technique uses a Gaussian probability distribution, whose joint probability determines pixel membership. Gaussian bell curves have significant width, so a Gaussian fit admits more pixels into the segmentation at significant intensity. The histogram backprojection technique, on the other hand, relates pixel membership directly to the histogram of the ROI; the segmentation will therefore only be as good as the ROI extracted from the original image. One can thus improve the quality of the non-parametric result by providing a very accurate ROI. Remember that the ROI here is riddled with imperfections, such as darker shades of pink, which directly affect the backprojection. For me, if we want to maximize color extraction, the parametric technique should be used. But if we are concerned with the exact color pattern of the ROI and want to accurately obtain the regions with the same hues, then we need the non-parametric technique.
In addition, the method used also affects the total running time of the segmentation. Theoretically, the parametric technique should be slower, since it requires calculations not only of the means and standard deviations in NCC space, but also of the probability distribution for each pixel's membership, while backprojection only looks up a histogram value for each pixel of the original image. However, the backprojection code took a few seconds longer than the parametric approach, simply because its double for loop runs over every pixel of the original image, whereas Scilab is optimized for element-wise matrix operations, which the parametric approach exploits.
Since I have so much time left for this activity, it is high time to test the methods on different photos and ROIs. Using the same pattern in Fig. 10, I extracted a different ROI: the pattern at the center of the teardrops, which has both yellow and green hues. The ROI is shown below.
And here are the results.
In this case, even though the ROI has a larger area of green hue, the segmentations highlighted not only the center pattern but also the whole teardrop, since the ROI contains yellow hues. Even though the ROI is not highly monochromatic, the method will still accept pixels that do not share the ROI's pattern but do contain colors found in the ROI. Let us now compare the two methods. As with the pink ROI segmentation, the parametric method produced a smoother and brighter segmentation than the non-parametric method. Again, for the non-parametric approach, the segmentation is only as good as the ROI image. I guess the Gaussian fit really does admit a lot more pixels into the segmentation.
Now, not all images are colored with the same brightness. Real-life photos such as human skin and landscapes are affected by the intensity and color of the light source, orientation of the camera, etc. I have here a photo of my three minions, having shadows on different parts of their bodies, together with their segmented images.
Yet again, the parametric approach showed brighter intensities in the segmented regions. The shadows across the bodies affected the segmentations, which is most apparent on the body of the left minion.
How about we try segmenting a single body and color, only with different brightness levels throughout the body? Let’s try segmenting a shiny Lamborghini’s photo from weknowyourdreams.com. (Yes, you know my dream.) We start by using an ROI that has a smaller color range.
In this case, the non-parametric method captured higher intensities than the parametric method. Moreover, the ROI has a narrow color range, meaning the orange had almost a single brightness. Therefore, the resulting segmentations did not capture the whole car. To remedy this, I extracted a larger ROI than the one previously used. The images are shown below in Figs. 20 and 21.
In Fig. 20, we can see the difference between the parametric and non-parametric approaches: the parametric approach produced a “cleaner” segmentation than the non-parametric one. In Fig. 21, the difference lies in the covered pixels. The non-parametric approach covered more car parts than the parametric method. Again, the backprojection is only as good as the extracted ROI. To show that the ROIs indeed have varying color ranges, I present their 2D histograms here. Using the NCC space, one can confirm that they are indeed in the red-orange region. One must also be careful in extracting ROIs, since an extracted ROI may contain a specular (white) spot from the car, which will cause the background of the image to be segmented as well.
Before I end this blog post, let me just add one more thing. Let me try segmenting my face. Human skin can be troublesome to segment because of its color variety and brightness gradient. Here is a selfie of my own.
I will now try to segment my own face using these particular regions: my nose, right cheek and forehead, to see what facial feature will segment my face in this photo the best. Here are the said ROI’s and their 2D histograms.
The nose ROI covered a lot of space in the histograms shown. This is affected not only by the nostrils, but also by the specular spots. In our AP 187 class with Ma’am Jing, we discussed different types of color loci, each consisting of a set of characteristic points for a color. One type is the skin locus, which consists of the color pigments of various skin tones. The 2D histogram shown in Fig. 22 falls within the range of the presented locus, although the nose part seems to cover a lot of points. Below are the results of the segmentation.
One criterion for determining the best segmentation is how well it differentiates facial features such as the mouth, teeth, and eyes from the facial skin. Referring to Fig. 22a, the nose ROI covered a greater area of the histogram, which resulted in more facial-skin segmentation in Fig. 23, but less differentiation between the facial features. Moreover, the parametric segmentation in Fig. 23a included stray pixels, again because its color range includes the white center of the NCC space. Both the cheek and forehead segmentations gave good results, owing to their narrower color ranges in the 2D histograms. For me, Figure 24a had the best facial segmentation, since the other segmentations produced imperfections in the skin segments. For the best possible results, the extracted ROI must include only the segments necessary for skin detection, and the image to be segmented must have nearly uniform brightness throughout.
Image segmentation is indeed useful for extracting parts of an image. I may not appreciate it fully right now, since I have only applied it to random images. From what I have read on the internet, image segmentation allows for easier image manipulation and analysis. One can segment an MRI image to differentiate brain-tumor cells from healthy cells. Weather forecasting applies image segmentation to satellite images, in which the segmented regions locate areas of high and low atmospheric activity. Segmentation is also a vital element in topology. There are other methods of image segmentation, such as the Level Set method by Osher and Sethian and the Fast Marching method (from Wikipedia).
Judging from the way I handled the activity, all images seem to be intact and analyzed properly. And yes, I had fun segmenting my own face, even though some of the results were kinda creepy. The coding part was fairly straightforward, since we were provided with a very helpful code by our dearest professor, Ma’am Jing. I was also able to peer into the advantages and disadvantages of the parametric and non-parametric approaches. I would give myself a 12/10 for face value. I would like to thank Shia LaBeouf for this highly motivational GIF.
References:
[1] Soriano, M., “Image Segmentation,” 2014.