Monday, April 9, 2007

Solving Set

Our latest project was to create an image processing program that could play the game Set (rules here). The hardest part of this assignment was recognizing the cards and their four attributes: color, number of figures, texture and shape. Here is how we handled each one.

Color
Since the images of the cards were all taken under imperfect lighting, we had to determine color very carefully. We looked at the RGB value of each pixel in the image: if one of the three channels was significantly high and the other two were low, we recolored the pixel with that dominant color. This worked well enough that, once the entire image had been processed, we could simply take the most common color as the color of that Set card and get accurate results.
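As a rough illustration of the idea, here is a minimal Python sketch of the per-pixel test followed by a majority vote. The threshold values HIGH and LOW and the function names are assumptions made up for this example, not the values or code we actually used, and the sketch only reports which RGB channel dominates.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
HIGH = 150  # a channel at or above this counts as "significantly high"
LOW = 110   # the other two channels must be at or below this

def classify_pixel(r, g, b):
    """Return 'red', 'green' or 'blue' if one channel dominates, else None."""
    channels = {"red": r, "green": g, "blue": b}
    for name, value in channels.items():
        others = [v for n, v in channels.items() if n != name]
        if value >= HIGH and all(v <= LOW for v in others):
            return name
    return None

def card_color(pixels):
    """Majority vote over all pixels that have a dominant channel.

    pixels is an iterable of (r, g, b) tuples. Returns None if no pixel
    has a dominant channel.
    """
    labels = (classify_pixel(r, g, b) for r, g, b in pixels)
    votes = Counter(label for label in labels if label is not None)
    return votes.most_common(1)[0][0] if votes else None
```

Near-white background pixels have all three channels high, so they never vote, which is why the vote can run over the whole image.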

Number of Figures
To determine the number of figures on a card, we took the leftmost pixel which was not a part of the background, and the rightmost pixel which was not background. We then looked at the distance between these two points: if they were close, we knew there was only one figure; if they were very far apart, there were three figures; and if the distance was somewhere in the middle, there were two. It seems simple, but the approach worked very well and was quite easy to code!
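A minimal sketch of this check in Python might look like the following. It assumes the background has already been separated out into a boolean mask, and the two cutoffs (one_max and two_max) are hypothetical parameters that would have to be tuned to the image size.

```python
def horizontal_extent(mask):
    """Distance between the leftmost and rightmost non-background pixels.

    mask is a list of rows, each a list of booleans (True = not background).
    Returns None if the mask contains no foreground pixels.
    """
    columns = [x for row in mask for x, on in enumerate(row) if on]
    if not columns:
        return None
    return max(columns) - min(columns)

def count_figures(mask, one_max, two_max):
    """Guess the number of figures from the horizontal extent.

    one_max and two_max are assumed cutoffs: an extent up to one_max means
    one figure, up to two_max means two, and anything wider means three.
    """
    extent = horizontal_extent(mask)
    if extent is None:
        return 0
    if extent <= one_max:
        return 1
    if extent <= two_max:
        return 2
    return 3
```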

Texture
Texture was a bit of a strange problem. We identified solid figures first, by calculating the average number of colored pixels per figure. For solid figures this number was much higher than for non-solid figures, so we could easily pick them out. To identify the shaded figures, we discovered that the shaded sections of the images had distinctly lower values for all three color channels than the unshaded sections, so we colored these pixels yellow in our processed image to set them apart.

We counted up all the yellow pixels, and if this number was sufficiently high, we declared that figure to be shaded. Finally, if a figure was neither solid nor shaded, we declared it hollow.
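The decision order described above can be sketched in Python as follows. The function names and all three thresholds (ceiling, solid_min_per_figure, shaded_min) are assumptions for illustration. In particular, the fixed ceiling is a simplification of "lower than the unshaded sections".

```python
def is_shaded_pixel(pixel, ceiling):
    """True if all three channels fall below an assumed ceiling value.

    These are the pixels that would be painted yellow in the processed image.
    """
    r, g, b = pixel
    return r < ceiling and g < ceiling and b < ceiling

def classify_texture(colored_count, yellow_count, num_figures,
                     solid_min_per_figure, shaded_min):
    """Classify a card as 'solid', 'shaded' or 'hollow'.

    colored_count: number of colored pixels on the whole card
    yellow_count:  number of pixels marked as shaded (yellow)
    num_figures:   number of figures on the card (1, 2 or 3)
    The two thresholds are hypothetical and would need tuning.
    """
    if num_figures <= 0:
        raise ValueError("num_figures must be positive")
    # Solid first: many colored pixels per figure.
    if colored_count / num_figures >= solid_min_per_figure:
        return "solid"
    # Then shaded: enough yellow-marked pixels.
    if yellow_count >= shaded_min:
        return "shaded"
    # Neither solid nor shaded.
    return "hollow"
```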

Shape
The hardest problem was determining shape. We finally decided to measure the horizontal distance across the figure at a fixed height. We chose this height by first finding the highest colored pixel in the image, then looking 30 pixels below it to measure the width of the figure. This allowed us to keep a consistent measurement even when the images were centered differently from each other. The diamonds had the smallest width, then came the squiggles, and the capsules were the widest. We had to allow some special cases to measure cards with 3 figures correctly, as well as to measure the different shapes accurately, but this method turned out to be extremely accurate: the average difference in width between images with the same shape was very small, even when they had different numbers of figures or different textures.
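Here is a minimal Python sketch of the width measurement. It assumes a boolean mask containing a single figure, so it leaves out the special cases for multiple figures. The cutoffs diamond_max and squiggle_max are hypothetical values, not the ones we used.

```python
def width_below_top(mask, offset=30):
    """Width of the figure on the row `offset` pixels below its top.

    mask is a list of rows, each a list of booleans (True = colored pixel).
    Returns None if there is no colored pixel or the row cannot be measured.
    """
    # Row index of the highest colored pixel.
    top = next((y for y, row in enumerate(mask) if any(row)), None)
    if top is None:
        return None
    y = top + offset
    if y >= len(mask):
        return None
    xs = [x for x, on in enumerate(mask[y]) if on]
    if not xs:
        return None
    return xs[-1] - xs[0]

def classify_shape(width, diamond_max, squiggle_max):
    """Narrowest is a diamond, then a squiggle, widest is a capsule."""
    if width <= diamond_max:
        return "diamond"
    if width <= squiggle_max:
        return "squiggle"
    return "capsule"
```

Measuring relative to the highest colored pixel, rather than at a fixed row of the image, is what keeps the measurement consistent when the figure sits slightly higher or lower in the frame.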

Conclusions
Our solutions ended up being significantly uglier and much more hackish than we had anticipated, but we still found value in the process. In an ideal world, the morphological operations of opening and closing could have been used to determine texture, and some sort of shape evaluation could have been used to find shape, but our approaches were simpler and, at least in the case of shape, probably much more accurate than what we would have achieved otherwise.

Finally, our image processing depends on the figures being centered in the image, or very close to it. This was an unfortunate limitation we were not able to avoid, and it seems like a natural problem for any vision-based system. The software is only as accurate as the pictures it is processing.
