More often than not, we need only a certain part or element of an image: the region of interest (ROI). It may be an object or a piece of text. Either way, we need image processing techniques to extract it. In this activity, we apply these techniques to text retrieval, one application of which is handwriting recognition.
We used a scanned image containing text, shown in Fig. 1 (left), as our raw image.
Figure 1. Raw image used in pre-processing text.
We cropped a portion of Fig. 1 (left) to be cleaned. The scanned image is slightly rotated, so we first rotate it back to make the later processing easier. To find the rotation, we took the Fourier transform (FT) of the cropped image in Fig. 1 (upper right). The frequencies of the rotated horizontal lines show up as a tilted vertical line in Fourier space. We measured the angle of the tilt using MS Paint and used it to rotate the image back with the mogrify command in Scilab. Alternatively, one can measure the tilt angle directly from Fig. 1 (upper left) without taking the FT.
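The tilt can also be estimated numerically instead of by hand. The following is a minimal sketch in Python with NumPy, not the Scilab and MS Paint workflow used in this activity; the function name `spectral_tilt_deg` and the synthetic test image are assumptions made for illustration. It finds the strongest non-DC peak of the FT and reports its angle from the vertical frequency axis.

```python
import numpy as np

def spectral_tilt_deg(img):
    """Angle in degrees between the strongest non-DC Fourier peak
    and the vertical frequency axis (hypothetical helper)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Subtract the mean so the DC term does not win the peak search.
    spectrum = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    row, col = np.unravel_index(np.argmax(np.abs(spectrum)), spectrum.shape)
    # After fftshift the zero frequency sits at index (h // 2, w // 2).
    fy = (row - h // 2) / h   # cycles per pixel along y
    fx = (col - w // 2) / w   # cycles per pixel along x
    # The spectrum of a real image is symmetric; use the upper-half peak.
    if fy < 0:
        fy, fx = -fy, -fx
    return float(np.degrees(np.arctan2(fx, fy)))

# Synthetic stand-in for ruled paper: near-horizontal lines, slightly tilted.
N = 128
y, x = np.mgrid[0:N, 0:N]
ruled = np.cos(2 * np.pi * (16 * y + 2 * x) / N)
tilt = spectral_tilt_deg(ruled)
```

The sign of the rotation needed to straighten the image depends on the convention of the rotation routine, so it is worth checking on a test image first.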
Figure 2. Removal of horizontal lines on the selected portion of Fig.1(left).
Now that the image has been rotated back, the next step is to remove the horizontal lines. To do this, we take the FT of the image, remove the frequencies of the horizontal lines, and take the FT again to obtain the reconstructed image.
- Removal of frequencies: As explained earlier, horizontal lines have their frequencies along the vertical axis of the Fourier domain. Blocking these frequencies effectively removes the horizontal lines. We do this by multiplying the FT (Fig. 2, upper right) by a mask (Fig. 2, lower left) that completely blocks the frequencies along the vertical axis, excluding the center so as not to darken the image.
After the multiplication, we take the FT of the product to obtain a partially cleaned image without the horizontal lines, shown in Fig. 2 (lower right). With this image, we are ready to separate the region of interest (the text) from the background.
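The masking step can be sketched as follows in Python with NumPy. This is an illustration under assumptions, not the code used in the activity: the function name, the band width `half_width`, and the size `keep` of the preserved center are hypothetical parameters, and the sketch uses an inverse FFT for the reconstruction where the text takes a second forward FT (which gives the same image up to a flip and a scale factor).

```python
import numpy as np

def remove_horizontal_lines(img, half_width=1, keep=2):
    """Zero a thin band along the vertical frequency axis,
    sparing a small box around the center (DC and low frequencies)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    cy, cx = h // 2, w // 2
    mask = np.ones((h, w))
    # Block the vertical axis of the shifted spectrum.
    mask[:, cx - half_width: cx + half_width + 1] = 0.0
    # Keep the center so the overall brightness is preserved.
    mask[cy - keep: cy + keep + 1, cx - half_width: cx + half_width + 1] = 1.0
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

# Synthetic page: vertical pen strokes plus ruled lines every 16 rows.
N = 128
lines = np.zeros((N, N))
lines[::16, :] = 1.0
strokes = np.zeros((N, N))
strokes[:, 40:43] = 1.0
cleaned = remove_horizontal_lines(strokes + lines)
```

In this synthetic case the ruled lines collapse to a uniform offset equal to their mean, while the vertical strokes pass through unchanged. On a real scan the text also has energy along the blocked band, which is why parts of the strokes are lost where the lines used to be.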
We use a simple technique for the separation: thresholding. Below is a comparison between Fig. 2 (lower right) and the thresholded image.
Figure 3. (left) the pre-cleaned image and (right) a thresholded image of the left.
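Thresholding itself is a one-line operation. A minimal sketch in Python with NumPy, assuming dark ink on a light page; the threshold value of 128 and the tiny sample array are hypothetical, since the value used in the activity is not stated.

```python
import numpy as np

def binarize(gray, threshold):
    """Return True where a pixel is darker than `threshold` (treated as text)."""
    return np.asarray(gray) < threshold

# Hypothetical 8-bit gray levels: bright background, dark ink.
gray = np.array([[250, 30, 240],
                 [245, 120, 20]])
text_mask = binarize(gray, 128)
```

A lower threshold keeps only the darkest strokes, so faint pen strokes drop out of the mask.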
Here we see the text separated from the background. Notice, however, the black line cutting through most of the text, a side effect of the horizontal line removal. The next step is to remove these black lines by performing a closing of the image. A vertical structuring element would normally be used, but since the text in the image is tilted, we used a 4x2 diagonal structuring element instead.
Figure 4. The structuring element used and the resulting image after closing
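The closing operation (a dilation followed by an erosion with the same structuring element) can be sketched in Python with NumPy as below. The exact shape of the 4x2 diagonal element is not given in the text, so the array `se` is an assumed layout, and the helper names are hypothetical.

```python
import numpy as np

def _shifted(img, dy, dx, fill):
    """out[y, x] = img[y + dy, x + dx], with `fill` outside the image."""
    h, w = img.shape
    pad = int(max(abs(dy), abs(dx)))
    padded = np.pad(img, pad, constant_values=fill)
    return padded[pad + dy: pad + dy + h, pad + dx: pad + dx + w]

def _offsets(se):
    """Offsets of the set pixels relative to the element's center."""
    cy, cx = se.shape[0] // 2, se.shape[1] // 2
    return [(int(i) - cy, int(j) - cx) for i, j in zip(*np.nonzero(se))]

def dilate(img, se):
    out = np.zeros_like(img, dtype=bool)
    for dy, dx in _offsets(se):
        out |= _shifted(img, -dy, -dx, False)
    return out

def erode(img, se):
    out = np.ones_like(img, dtype=bool)
    for dy, dx in _offsets(se):
        out &= _shifted(img, dy, dx, True)
    return out

def binary_close(img, se):
    """Closing: dilation followed by erosion with the same element."""
    return erode(dilate(img, se), se)

# Assumed 4x2 diagonal element (4 rows, 2 columns).
se = np.array([[0, 1],
               [0, 1],
               [1, 0],
               [1, 0]], dtype=bool)

# A vertical stroke cut by a one-pixel gap, like the lines left by filtering.
stroke = np.zeros((15, 11), dtype=bool)
stroke[2:13, 5] = True
stroke[7, 5] = False
closed = binary_close(stroke, se)
```

Closing never removes foreground pixels; it only fills gaps that are too small to contain the structuring element, which is why the element's orientation should follow the direction of the strokes.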
Here we see that most of the black lines were removed, which is promising and suggests that the structuring element suits the morphological operation. Some letters are also distinguishable. In Fig. 4, we labeled the blobs found in the image (each indicated by a different gray level), and we do see some single-letter blobs. The word "cable" (encircled) is fairly readable, although other words, such as "power", are not. We attribute this to the quality of the handwriting: some pen strokes are so faint that they disappeared upon thresholding.
One problem we encountered is that it is hard to reduce the letters to a thickness of 1 pixel. This would require an erosion with a 1x2 structuring element. However, some strokes of the letters are thin enough that erosion would remove them entirely, which we do not want, so this operation was not done. It can also be observed that the word "DESCRIPTION" is now almost unreadable. This is the trade-off of the closing operation: since "DESCRIPTION" was not cut by black lines like the rest of the text, closing only made it worse.
Next, we check whether we can find instances of the word "DESCRIPTION" in the whole image, using the following steps:
Step 1: Rotate the image.
Step 2: Binarize the image.
Step 3: Create a "DESCRIPTION" pattern with the same font and size as in Fig. 1 (bottom right).
Step 4: Correlate the FT of the binarized image with the FT of the produced pattern.
Step 5: Take the FT of the result of Step 4.
Below are the images resulting from Steps 1-3.
Figure 5. (left) Binarized, rotated image of Fig. 1 (left); and (right) the pattern used for word detection. The pattern is set in italicized Arial Black at font size 9.
We then proceed with Steps 4 and 5. Regions of the image that match the pattern in Fig. 5 (right) produce a peak at their location, an indication of high correlation. Below is the image resulting from Steps 4 and 5.
Figure 6. Image as a result of correlation
We observed three peaks in the image which correspond to the word "DESCRIPTION". This can be readily verified in Fig.1 (left).
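Steps 4 and 5 can be sketched in Python with NumPy as follows. This is a sketch under assumptions rather than the code used in the activity: it multiplies the FT of the image by the complex conjugate of the FT of the zero-padded pattern and uses an inverse FFT for the last step, and the function names, the small stand-in pattern, and the 0.99 peak threshold are all hypothetical.

```python
import numpy as np

def correlate_fft(image, pattern):
    """Circular cross-correlation of `image` with `pattern` via the FFT."""
    image = np.asarray(image, dtype=float)
    padded = np.zeros_like(image)
    ph, pw = pattern.shape
    padded[:ph, :pw] = pattern          # pattern in the top-left corner
    product = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
    return np.fft.ifft2(product).real

def find_matches(corr, rel_threshold=0.99):
    """(row, col) positions whose correlation is close to the maximum."""
    peaks = np.argwhere(corr >= rel_threshold * corr.max())
    return [(int(r), int(c)) for r, c in peaks]

# Small binary stand-in for the "DESCRIPTION" pattern.
pattern = np.array([[1, 1, 1, 0, 1, 0, 1, 1, 1],
                    [1, 0, 0, 0, 1, 0, 1, 0, 0],
                    [1, 1, 0, 0, 1, 0, 1, 1, 0],
                    [1, 0, 0, 0, 1, 0, 1, 0, 0],
                    [1, 0, 0, 0, 1, 0, 1, 0, 0]], dtype=float)

# Synthetic page containing three copies of the pattern.
page = np.zeros((64, 64))
for r, c in [(10, 20), (30, 5), (50, 40)]:
    page[r:r + 5, c:c + 9] = pattern

matches = find_matches(correlate_fft(page, pattern))
```

Because the pattern is placed in the top-left corner before transforming, each peak marks the top-left corner of a match. On a real scan the peaks are broader and less uniform, so the threshold would need tuning.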
For this activity, I give myself a 10 for doing the job well.
You used the correct structuring element, a tilted one. For this, an 11 is deserved.