Basic OCR in OpenCV

In this tutorial we will build a basic OCR for numbers: it classifies a handwritten digit into its class.

To do so, we will use what we learned in the previous tutorials: the simple basic painter and the basic pattern recognition and classification with OpenCV tutorial.

A typical pattern recognition classifier consists of three modules:

Preprocessing: in this module we process the input image, for example normalizing its size or converting color to black and white…

Feature extraction: in this module we convert the processed image into a feature vector to classify. It can be the pixel matrix flattened into a vector, or another representation such as contour chain codes.

Classification: this module takes the feature vectors and either trains the system or classifies an input feature vector with a classification method such as knn.

In this basic OCR we use the following scheme:

We take a training set and a test set of images to train and test our classification method (knn).

We have 1000 handwritten images, 100 of each digit. We use 50 images of each digit (class) to train the system and the other 50 to test it.

The first step is to preprocess all the training images. To do so we create a preprocessing function that takes an image and the new width and height we want as the result; the function returns the image cropped to its bounding box and normalized to that size. The process is shown more clearly in this graph:

Pre-processing code:

// Find the first and last columns that contain ink (non-white pixels)
void findX(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    //Sum of an all-white column: one 255 per row, so height*255
    CvScalar maxVal=cvRealScalar(imgSrc->height * 255);
    CvScalar val=cvRealScalar(0);
    //For each col sum, if sum < height*255 the column contains ink.
    //The first such column is the min; we continue to the end and
    //every later column with sum < height*255 becomes the new max
    for (i=0; i< imgSrc->width; i++){
        cvGetCol(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max= i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}
 
// Find the first and last rows that contain ink (non-white pixels)
void findY(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    //Sum of an all-white row: one 255 per column, so width*255
    CvScalar maxVal=cvRealScalar(imgSrc->width * 255);
    CvScalar val=cvRealScalar(0);
    //For each row sum, if sum < width*255 the row contains ink.
    //The first such row is the min; we continue to the end and
    //every later row with sum < width*255 becomes the new max
    for (i=0; i< imgSrc->height; i++){
        cvGetRow(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max=i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}
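
// Bounding box of the ink in the image
CvRect findBB(IplImage* imgSrc){
    CvRect aux;
    int xmin, xmax, ymin, ymax;
    xmin=xmax=ymin=ymax=0;

    findX(imgSrc, &xmin, &xmax);
    findY(imgSrc, &ymin, &ymax);

    //xmax and ymax are the last column and row with ink, so the
    //size is max-min+1 (max-min would drop the last column and row)
    aux=cvRect(xmin, ymin, xmax-xmin+1, ymax-ymin+1);

    //printf("BB: %d,%d - %d,%d\n", aux.x, aux.y, aux.width, aux.height);

    return aux;

}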
 
IplImage preprocessing(IplImage* imgSrc,int new_width, int new_height){
    IplImage* result;
    IplImage* scaledResult;

    CvMat data;
    CvMat dataA;
    CvRect bb;//bounding box
    CvRect bba;//bounding box maintaining aspect ratio (unused)

    //Find bounding box
    bb=findBB(imgSrc);

    //Get the bounding box data, still without aspect ratio correction
    cvGetSubRect(imgSrc, &data, cvRect(bb.x, bb.y, bb.width, bb.height));
    //Create a square image (aspect ratio 1) to hold this data:
    //its side is the larger of the bounding box width and height
    int size=(bb.width>bb.height)?bb.width:bb.height;
    result=cvCreateImage( cvSize( size, size ), 8, 1 );
    cvSet(result,CV_RGB(255,255,255),NULL);
    //Copy the data to the center of the image
    int x=(int)floor((float)(size-bb.width)/2.0f);
    int y=(int)floor((float)(size-bb.height)/2.0f);
    cvGetSubRect(result, &dataA, cvRect(x,y,bb.width, bb.height));
    cvCopy(&data, &dataA, NULL);
    //Scale result
    scaledResult=cvCreateImage( cvSize( new_width, new_height ), 8, 1 );
    cvResize(result, scaledResult, CV_INTER_NN);

    //The intermediate square image is no longer needed
    cvReleaseImage(&result);

    //Return processed data (a copy of the header; the pixel data is shared)
    return *scaledResult;

}
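
To make the three preprocessing steps easier to follow without the OpenCV calls, here is a minimal sketch of the same idea on a plain pixel buffer. It is not the code of this tutorial: the Gray and Box types and the findInkBox and normalize functions are hypothetical names, the image is assumed to be 8-bit grayscale with a white (255) background, and the bounding box is found by scanning pixels directly instead of comparing column and row sums.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal grayscale image: row-major pixels, 255 = white background.
struct Gray {
    int width;
    int height;
    std::vector<std::uint8_t> px;
    std::uint8_t at(int x, int y) const { return px[y * width + x]; }
};

struct Box { int x, y, width, height; };

// Smallest rectangle containing every non-white pixel.
// Returns an empty box (width == height == 0) for a blank image.
Box findInkBox(const Gray& img) {
    int xmin = img.width, xmax = -1, ymin = img.height, ymax = -1;
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            if (img.at(x, y) != 255) {
                xmin = std::min(xmin, x); xmax = std::max(xmax, x);
                ymin = std::min(ymin, y); ymax = std::max(ymax, y);
            }
        }
    }
    if (xmax < 0) return {0, 0, 0, 0};
    return {xmin, ymin, xmax - xmin + 1, ymax - ymin + 1};
}

// Crop to the ink box, center it on a white square (aspect ratio 1),
// then scale to newSize x newSize with nearest-neighbour sampling.
Gray normalize(const Gray& img, int newSize) {
    assert(newSize > 0);
    const Box bb = findInkBox(img);
    const int side = std::max(1, std::max(bb.width, bb.height));
    Gray square{side, side, std::vector<std::uint8_t>(side * side, 255)};
    const int offX = (side - bb.width) / 2;
    const int offY = (side - bb.height) / 2;
    for (int y = 0; y < bb.height; ++y)
        for (int x = 0; x < bb.width; ++x)
            square.px[(y + offY) * side + (x + offX)] = img.at(bb.x + x, bb.y + y);

    Gray out{newSize, newSize, std::vector<std::uint8_t>(newSize * newSize, 255)};
    for (int y = 0; y < newSize; ++y)
        for (int x = 0; x < newSize; ++x)
            out.px[y * newSize + x] = square.at(x * side / newSize, y * side / newSize);
    return out;
}
```

Centering the crop on a square before scaling keeps the digit's proportions: a thin "1" stays thin instead of being stretched to fill the whole output.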

We use the getData function of the basicOCR class to create the training data and training classes. This function reads all the images under the OCR folder. The OCR folder contains one folder per class, and each file is a pbm file named cnn.pbm, where c is the class {0..9} and nn is the image number {00..99}.

Each image we load is preprocessed and then converted into the feature vector we use.
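
As a small illustration of this naming scheme, the sketch below builds the path of one sample. The samplePath function is a hypothetical helper, not part of basicOCR, and the "OCR/" base path in the usage note is an assumption about how the folder is referenced.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds "<basePath><c>/<c><nn>.pbm" for class c and image number nn.
// basePath is assumed to end with a path separator.
std::string samplePath(const std::string& basePath, int digitClass, int index) {
    assert(digitClass >= 0 && digitClass <= 9);
    assert(index >= 0 && index <= 99);
    char buf[512];
    // %02d pads the image number with a leading zero, so 7 becomes "07"
    std::snprintf(buf, sizeof buf, "%s%d/%d%02d.pbm",
                  basePath.c_str(), digitClass, digitClass, index);
    return std::string(buf);
}
```

For example, samplePath("OCR/", 3, 7) gives "OCR/3/307.pbm", the eighth image of class 3.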

basicOCR.cpp getData code:

void basicOCR::getData()
{
    IplImage* src_image;
    IplImage prs_image;
    CvMat row,data;
    char file[255];
    int i,j;
    for(i =0; i<classes; i++){
        for( j = 0; j< train_samples; j++){

            //Load file cnn.pbm: c is the class, nn the image number
            if(j<10)
                sprintf(file,"%s%d/%d0%d.pbm",file_path, i, i , j);
            else
                sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Cant load image %s\n", file);
                //Skip this sample: preprocessing a NULL image would crash
                continue;
            }
            //process file
            prs_image = preprocessing(src_image, size, size);

            //Set class label
            cvGetRow(trainClasses, &row, i*train_samples + j);
            cvSet(&row, cvRealScalar(i));
            //Set data
            cvGetRow(trainData, &row, i*train_samples + j);

            IplImage* img = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
            //convert 8 bits image to 32 float image (0.0039215 is about 1/255)
            cvConvertScale(&prs_image, img, 0.0039215, 0);

            cvGetSubRect(img, &data, cvRect(0,0, size,size));

            CvMat row_header, *row1;
            //convert the size x size data matrix to a vector
            row1 = cvReshape( &data, &row_header, 0, 1 );
            cvCopy(row1, &row, NULL);

            //Release the images created in this iteration
            cvReleaseImage(&img);
            cvReleaseImage(&src_image);
        }
    }
}
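
The feature extraction step in getData (scale the pixels to floats, then reshape the matrix into one row) can be sketched without OpenCV as follows. This is only an illustration: toFeatureVector is a hypothetical name, and it divides by exactly 255 where the code above multiplies by the rounded constant 0.0039215.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flattens a size x size 8-bit image (row-major) into one row of floats
// in [0, 1]; this row is the feature vector of the sample.
std::vector<float> toFeatureVector(const std::vector<std::uint8_t>& pixels, int size) {
    assert(size > 0);
    assert(pixels.size() == static_cast<std::size_t>(size) * size);
    std::vector<float> features;
    features.reserve(pixels.size());
    for (std::uint8_t p : pixels)
        features.push_back(static_cast<float>(p) / 255.0f);
    return features;
}
```

With this representation every pixel is one feature, so an image of size x size pixels gives a vector of size*size values, and the training matrix has one such row per training image.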

After preprocessing the images and building the training data and classes, we train our model with this data. In our sample we use the knn method:

knn=new CvKNearest( trainData, trainClasses, 0, false, K );
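
For readers who have not used knn before, the sketch below shows roughly what a k-nearest-neighbour classifier does with the training rows. It is not OpenCV's implementation: Sample, squaredDistance, nearestLabels and majorityLabel are hypothetical names, the distance is assumed to be Euclidean, and ties in the vote are resolved in favour of the smaller label as an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One training row: its feature vector and its class label.
struct Sample {
    std::vector<float> features;
    int label;
};

// Squared Euclidean distance between two feature vectors of equal length.
float squaredDistance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Labels of the k training samples closest to the query, nearest first.
std::vector<int> nearestLabels(const std::vector<Sample>& train,
                               const std::vector<float>& query, std::size_t k) {
    assert(k > 0 && k <= train.size());
    std::vector<std::pair<float, int>> byDistance;
    for (const Sample& s : train)
        byDistance.emplace_back(squaredDistance(s.features, query), s.label);
    // Only the first k entries need to be in order
    std::partial_sort(byDistance.begin(),
                      byDistance.begin() + static_cast<std::ptrdiff_t>(k),
                      byDistance.end());
    std::vector<int> labels;
    for (std::size_t i = 0; i < k; ++i) labels.push_back(byDistance[i].second);
    return labels;
}

// Majority vote among the neighbour labels; ties go to the smaller label.
int majorityLabel(const std::vector<int>& labels) {
    assert(!labels.empty());
    std::map<int, int> votes;
    for (int l : labels) ++votes[l];
    int best = labels.front();
    int bestVotes = 0;
    for (const auto& kv : votes) {
        if (kv.second > bestVotes) { best = kv.first; bestVotes = kv.second; }
    }
    return best;
}
```

Note that "training" here only means storing the samples; all the distance work happens when a query is classified, which is why the cost grows with both the number of training images and the length of the feature vector.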

Now we can test our model, and use the test result to compare against other methods we could use, or to see the effect of changes such as reducing the image scale. Our basicOCR class has a function for this, the test function.

This function takes the other 500 samples, classifies them with our selected method and checks the result obtained.

void basicOCR::test(){
    IplImage* src_image;
    IplImage prs_image;
    CvMat row,data;
    char file[255];
    int i,j;
    int error=0;
    int testCount=0;
    for(i =0; i<classes; i++){
        for( j = 50; j< 50+train_samples; j++){

            sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Cant load image %s\n", file);
                //Skip this sample: preprocessing a NULL image would crash
                continue;
            }
            //process file (classify() preprocesses its input again)
            prs_image = preprocessing(src_image, size, size);
            float r=classify(&prs_image,0);
            //Count a misclassification when the result is not the real class
            if((int)r!=i)
                error++;

            testCount++;
            cvReleaseImage(&src_image);
        }
    }
    float totalerror=100*(float)error/(float)testCount;
    printf("System Error: %.2f%%\n", totalerror);

}
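
A single error percentage does not say which digits are confused with which. One possible extension, sketched below under the assumption of 10 classes, is to count the results in a confusion matrix and derive the error from it. The names kClasses, Confusion, confusionMatrix and errorPercent are hypothetical and not part of basicOCR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kClasses = 10;  // digits 0..9
using Confusion = std::array<std::array<int, kClasses>, kClasses>;

// m[expected][predicted] counts how often a sample of class `expected`
// was classified as `predicted`.
Confusion confusionMatrix(const std::vector<int>& expected,
                          const std::vector<int>& predicted) {
    assert(expected.size() == predicted.size());
    Confusion m{};  // all counts start at zero
    for (std::size_t i = 0; i < expected.size(); ++i) {
        assert(expected[i] >= 0 && expected[i] < kClasses);
        assert(predicted[i] >= 0 && predicted[i] < kClasses);
        ++m[expected[i]][predicted[i]];
    }
    return m;
}

// Percentage of misclassified samples: everything off the diagonal.
float errorPercent(const Confusion& m) {
    int total = 0;
    int correct = 0;
    for (int e = 0; e < kClasses; ++e) {
        for (int p = 0; p < kClasses; ++p) {
            total += m[e][p];
            if (e == p) correct += m[e][p];
        }
    }
    if (total == 0) return 0.0f;
    return 100.0f * static_cast<float>(total - correct) / static_cast<float>(total);
}
```

The value returned by errorPercent corresponds to the "System Error" printed by test, while the individual cells show, for example, how many 7s were read as 1s.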

The test function uses the classify function, which takes the image to classify, preprocesses it, builds the feature vector and classifies it with the find_nearest function of the knn class. This is also the function we use to classify the user's input images:

float basicOCR::classify(IplImage* img, int showResult)
{
    IplImage prs_image;
    CvMat data;
    //Receives the class of each of the K nearest neighbours
    CvMat* nearest=cvCreateMat(1,K,CV_32FC1);
    float result;
    //process file
    prs_image = preprocessing(img, size, size);

    //Set data: convert to a float image and reshape it to one row
    IplImage* img32 = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
    cvConvertScale(&prs_image, img32, 0.0039215, 0);
    cvGetSubRect(img32, &data, cvRect(0,0, size,size));
    CvMat row_header, *row1;
    row1 = cvReshape( &data, &row_header, 0, 1 );

    result=knn->find_nearest(row1,K,0,0,nearest,0);

    //Count how many of the K neighbours belong to the winning class
    int accuracy=0;
    for(int i=0;i<K;i++){
        if( nearest->data.fl[i] == result)
            accuracy++;
    }
    float pre=100*((float)accuracy/(float)K);
    if(showResult==1){
        printf("|\t%.0f\t| \t%.2f%%  \t| \t%d of %d \t| \n",result,pre,accuracy,K);
        printf(" ---------------------------------------------------------------\n");
    }

    //Release the temporary matrix and image
    cvReleaseMat(&nearest);
    cvReleaseImage(&img32);

    return result;

}
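
The percentage printed by classify is the share of the K neighbours that agree with the winning class. A minimal standalone sketch of that computation, with the hypothetical name voteShare and integer labels instead of the float matrix used above, could look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Percentage of the K neighbours whose label equals the winning label.
float voteShare(const std::vector<int>& neighbourLabels, int winner) {
    assert(!neighbourLabels.empty());
    const auto agreeing =
        std::count(neighbourLabels.begin(), neighbourLabels.end(), winner);
    return 100.0f * static_cast<float>(agreeing) /
           static_cast<float>(neighbourLabels.size());
}
```

This value measures agreement among the neighbours, not the probability that the answer is right: a drawing that is not a digit at all can still get a high share if its nearest training samples happen to belong to one class.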

All the training and test work is in the basicOCR class: once we create a basicOCR instance, we only need to call the classify function to classify our input image. We then use the basic Painter we created in an earlier tutorial to let the user draw an image interactively and classify it.

Demo Source

122 Comments to “Basic OCR in OpenCV”

  1. Tõnu Samuel 2 November 2009 at 10:13 pm #

    You can replace four lines by one:
    ———————————-
    if(j<10)
    sprintf(file,"%s%d/%d0%d.pbm",file_path, i, i , j);
    else
    sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);

    ———————————-
    sprintf(file,"%s%d/%d%02d.pbm",file_path, i, i , j);
    ———————————-
    This is basic C know-how :P

  2. damiles 3 November 2009 at 9:45 am #

    Thanks Tönu XD

  3. Qazi 3 November 2009 at 1:04 pm #

    Hello damiles,

    you are a good teacher for my favorite computer vision library :)

    I am looking for a segmentation solution in opencv, esp. for handwritten letters (described in http://yann.lecun.com/exdb/publis/#lecun-98 – e.g. Heuristic Over Segmentation). As I am still a beginner with c/c++ and also not very familiar with complex mathematical formulas, I’d like to know if you know about any code example with this method, or if you could give some helpful hints/starting points.
    Any help would be great!

    Thank you,

    Qazi

  4. Arnaud 17 November 2009 at 2:06 am #

    Hey,

    Thanks for your article.
    However, I’m quite disappointed with this very line :
    cvCopy(row1, &row, NULL);
    which basically means that you train the classifier with the *whole image*.
    How long did the knn classifier take for those 50 samples of 40² pixels ?

    More importantly : which “high level” criterion would you recommend (area, spline contour parameters, moments…) ?

    thanks!

  5. damiles 17 November 2009 at 10:23 am #

    Arnaud, you are the first to analyze the code and give me good questions, doubts or disagreement. THANKS!!

    First, to train the classifier I use all the image pixels. I know it’s not the best criterion to use, but I use it to explain the basics of pattern recognition in OCR.

    Then in cvCopy(row1, &row, NULL); I copy all the pixels of each image into a row for classification. First I get a row of my training matrix “trainData”

    cvGetRow(trainData, &row, i*train_samples + j);

    then, when I have the pixels in an array (row1), I copy it into that row, which points to a row of trainData.

    cvCopy(row1, &row, NULL);

    For the knn classifier I take 40² samples. The number 50, I think, you took from the function void basicOCR::test(), line 10 “for( j = 50;…”. In this function I classify 50 random numbers to get an error; this function is only a statistic of my classifier, where I can test the knn with 50 random numbers and get the resulting error.

    And of course, most important, the criterion: I recommend other criteria instead of pixels, such as contours or moments. To select the criterion for this case you must test them all (moments, pixels, splines, area, eigenvalues…) and select the best, and also test different image sizes and every set you can think of.

    I hope this answers all your questions. I’m not an expert.

    Regards, David.

  6. karta 18 November 2009 at 11:48 am #

    Hi
    first of all thanks for this wonderful tutorial.
    I am doing a project that needs to extract a string of numbers from an image. I can get an image where it only contains numbers.
    My problem is how can i segment the image so i can get each number and test it against the sample?

  7. damiles 18 November 2009 at 1:32 pm #

    You can use the cvBlobsLib library http://opencv.willowgarage.com/wiki/cvBlobsLib

    Or use any other type of segmentation, or use contours or similar.

    Regards.

  8. Sotiris Karavarsamis 19 November 2009 at 12:23 am #

    damiles, thanks for this great tutorial. it was actually very interesting and complete!

  9. eric 20 December 2009 at 11:41 pm #

    Hey…very nice tutorials. I was wondering if you have any tutorial on human action recognition as well…using some feature point descriptors and then training the model to classify the actions….

  10. vasakan 5 January 2010 at 5:12 am #

    Hi,

    I compiled and ran your code in vc++. I changed your pbm images into jpg, but it did not give a good result: if I draw a single horizontal line, it is classified as class 4. Is any code change needed? Please help me

    Thanks in advance
    Manivasakan B

  11. damiles 5 January 2010 at 10:04 am #

    Sorry vasakan, I don’t understand what error you have… Can you explain it in more detail?

    Thanks

  12. vasakan 5 January 2010 at 10:33 am #

    Thank you very much for your reply.

    If I draw a number from 0 to 9, it is classified into the corresponding class, but if I draw a shape that is not a number (a different shape), that shape is also classified into a number class, with 100% accuracy.

    How can I solve this problem?

    Thanks
    Manivasakan.

  13. damiles 5 January 2010 at 10:52 am #

    Ah, ok, the system is prepared to handle only 10 classes, the numbers, so you must only draw numbers. If you draw something that is not a number, the system returns the number class closest to what you drew. If you want a “not a number” class, you must train with a new class that contains the non-number objects, but that is more complex.

    The accuracy is how many of the k neighbours belong to the same class that the knn algorithm selects as the best. So if you have 5 of 10 neighbours, you have 50%.

    To solve the problem of an undefined class, you must work with a different learning algorithm such as EM, which is unsupervised and in some cases more flexible.

    Regards.

  14. vasakan 5 January 2010 at 11:11 am #

    Thanks

    I really appreciate your valuable help. Currently I am working on pit pattern classification. I want to classify patterns from cancer tumor images; right now I am trying to classify the Type III L pit pattern, which looks like a tubular shape. I would appreciate your help.

    I have basic knowledge in image processing with opencv

    thanks
    Manivasakan .

  15. damiles 5 January 2010 at 11:17 am #

    vasakan, your project is more advanced than this example, but it can help you get started. You must have a good database of image properties, train your classifier with it, and create 2 classes: cancer tumor and no cancer tumor.

    There are a lot of medical papers on these topics; I recommend you search the technical papers.

    Regards.

  16. vasakan 5 January 2010 at 11:33 am #

    thanks

    Is a KNN classifier a good choice for pattern classification? I have good knowledge of the Type III L pit pattern. I would like to take contour points for processing. Can you give me your mail id? I will send the image to you for clarification.

    Regards
    Manivasakan.

  17. damiles 5 January 2010 at 11:35 am #

    KNN is a good classifier for most tasks, but there are others that you should check

  18. vasakan 5 January 2010 at 11:51 am #

    Thank you very much.

    I have an image of size 640X480 and I want to create a template of size 128X128 using contour points in opencv.

    Can you suggest a good idea for template creation?

    I have tried to convert the contour points manually from 640X480 to 128X128, but it did not give a good template.

    Regards
    Manivasakan.

  19. damiles 5 January 2010 at 11:59 am #

    First, try to use parameters other than pixels; I use pixels because they are the simplest to understand. I suggest reading scientific papers to see what the best template and algorithm are. I can’t say for certain what is best for these computer vision tasks

  20. vasakan 5 January 2010 at 12:08 pm #

    Thank you very much for spending a lot of time on me.

    I will read good papers on the best template and algorithm, and then I will get back to you.

    Thanks
    Manivasakan

  21. vasakan 6 January 2010 at 7:31 am #

    Hi,

    Have you seen my images? I want to create a class for those images. Can you give me some ideas for template creation? I am going to create two classes, one for the Type III L pit pattern and another one for non Type III L.

    Is there any tool for template creation?

    Thanks
    Manivasakan B.

  22. leque 2 February 2010 at 9:33 am #

    Hi!

    Can I use OpenCV to recognize the font used in a document image? I am a noob in OpenCV. Can you give me some tips to make my project possible?

    Thank you in advance!

