Basic OCR in OpenCV

Update: the demo now builds with CMake, the cross-platform, open-source build system.

In this tutorial we create a basic OCR for numbers. It classifies a handwritten digit into its class.

To do this, we use everything we learned in the previous tutorials: the simple basic painter and the basic pattern recognition and classification with OpenCV tutorial.

A typical pattern recognition classifier consists of three modules:

Preprocessing: in this module we process the input image, for example normalizing its size or converting color to black and white.

Feature extraction: in this module we convert the processed image into a feature vector to classify. It can be the pixel matrix flattened into a vector, or a contour chain-code representation.

Classification: this module takes the feature vectors and either trains the system or classifies an input feature vector with a classification method such as kNN.
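
As a rough illustration of the feature extraction step, the sketch below flattens an 8-bit grayscale image into a vector of floats in the range 0 to 1, one entry per pixel. It deliberately avoids OpenCV so it can be compiled on its own; the name toFeatureVector and the row-major std::vector layout are assumptions made for this example, not part of the demo source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not in the demo source): turn a row-major 8-bit
// grayscale image into a feature vector of floats in [0, 1].
std::vector<float> toFeatureVector(const std::vector<std::uint8_t>& pixels,
                                   int width, int height) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> features(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        // Rescale 0..255 to 0..1, the same idea as multiplying by 1/255.
        features[i] = pixels[i] / 255.0f;
    }
    return features;
}
```

A 40x40 image would give a vector of 1600 values, and every training and test image must be reduced to the same length before it reaches the classifier.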

In this basic OCR we use this scheme:

Here we take a training set and a test set of images to train and test our classifier (kNN).

We have 1000 handwritten images, 100 of each digit. We take 50 images of each digit (class) to train and the other 50 to test our system.

The first job is to pre-process all the training images, so we create a preprocessing function. It takes an image plus the new width and height we want as the result, and returns the image cropped to its bounding box and normalized to that size. You can see the process more clearly in this diagram:

Pre-processing code:

void findX(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    //A column holds imgSrc->height pixels, so an all-white column sums to height*255
    CvScalar maxVal=cvRealScalar(imgSrc->height * 255);
    CvScalar val=cvRealScalar(0);
    //For each column sum: if sum < height*255 the column contains ink.
    //The first such column is the min; every later one becomes the new max.
    for (i=0; i< imgSrc->width; i++){
        cvGetCol(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max= i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}

void findY(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    //A row holds imgSrc->width pixels, so an all-white row sums to width*255
    CvScalar maxVal=cvRealScalar(imgSrc->width * 255);
    CvScalar val=cvRealScalar(0);
    //For each row sum: if sum < width*255 the row contains ink.
    //The first such row is the min; every later one becomes the new max.
    for (i=0; i< imgSrc->height; i++){
        cvGetRow(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max=i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}

CvRect findBB(IplImage* imgSrc){
    CvRect aux;
    int xmin, xmax, ymin, ymax;
    xmin=xmax=ymin=ymax=0;

    findX(imgSrc, &xmin, &xmax);
    findY(imgSrc, &ymin, &ymax);

    //xmax and ymax are the last column and row with ink, so add 1 to include them
    aux=cvRect(xmin, ymin, xmax-xmin+1, ymax-ymin+1);

    //printf("BB: %d,%d - %d,%d\n", aux.x, aux.y, aux.width, aux.height);

    return aux;

}

IplImage preprocessing(IplImage* imgSrc,int new_width, int new_height){
    IplImage* result;
    IplImage* scaledResult;

    CvMat data;
    CvMat dataA;
    CvRect bb;//bounding box

    //Find bounding box
    bb=findBB(imgSrc);

    //Get the bounding box data (aspect ratio not corrected yet)
    cvGetSubRect(imgSrc, &data, cvRect(bb.x, bb.y, bb.width, bb.height));
    //Create a square image (aspect ratio 1) whose side is the larger
    //of the bounding box width and height
    int size=(bb.width>bb.height)?bb.width:bb.height;
    result=cvCreateImage( cvSize( size, size ), 8, 1 );
    cvSet(result,CV_RGB(255,255,255),NULL);
    //Copy the data into the center of the image
    int x=(int)floor((float)(size-bb.width)/2.0f);
    int y=(int)floor((float)(size-bb.height)/2.0f);
    cvGetSubRect(result, &dataA, cvRect(x,y,bb.width, bb.height));
    cvCopy(&data, &dataA, NULL);
    //Scale result
    scaledResult=cvCreateImage( cvSize( new_width, new_height ), 8, 1 );
    cvResize(result, scaledResult, CV_INTER_NN);
    //The intermediate square image is no longer needed
    cvReleaseImage(&result);

    //Return processed data (a copy of the header; the pixel data is shared)
    return *scaledResult;

}
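
The bounding-box and centering steps can also be sketched without OpenCV, which makes them easy to compile and check in isolation. The version below is only an illustration under a few assumptions: the image is a row-major buffer where 255 is the white background, the struct Box and the functions findInkBox and centerOffset are hypothetical names for this example, and the box width and height count both the first and the last column or row that contains ink.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain struct standing in for CvRect.
struct Box { int x, y, width, height; };

// Bounding box of all non-white pixels in a row-major 8-bit image
// (255 = white background). Returns a zero-sized box for a blank image.
Box findInkBox(const std::vector<std::uint8_t>& pixels, int width, int height) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    int xmin = width, xmax = -1, ymin = height, ymax = -1;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (pixels[static_cast<std::size_t>(y) * width + x] != 255) {
                if (x < xmin) xmin = x;
                if (x > xmax) xmax = x;
                if (y < ymin) ymin = y;
                if (y > ymax) ymax = y;
            }
        }
    }
    if (xmax < 0) return Box{0, 0, 0, 0};  // no ink found
    return Box{xmin, ymin, xmax - xmin + 1, ymax - ymin + 1};
}

// Offset that centers a span of `extent` pixels inside a square of side `side`.
int centerOffset(int side, int extent) {
    assert(side >= extent);
    return (side - extent) / 2;
}
```

For a box of 2x3 pixels the square side is 3, so the digit is copied at offset centerOffset(3, 2) horizontally and centerOffset(3, 3) vertically before the square is scaled to the requested size.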

We use the getData function of the basicOCR class to create the training data and training classes. This function reads all the images under the OCR folder to build the training data. The OCR folder contains one folder per class, and each file is a PBM file named cnn.pbm, where c is the class {0..9} and nn is the image number {00..99}.

Each image we load is pre-processed and then converted into the feature vector we use.
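
The naming scheme described above can be captured in a small helper. This is only a sketch: samplePath is a hypothetical name, and the base path "OCR/" used in the usage note is an assumption about where the folder lives.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: path of image `index` (0..99) of digit `digit` (0..9),
// following the <base><c>/<c><nn>.pbm layout described in the text.
std::string samplePath(const std::string& base, int digit, int index) {
    assert(digit >= 0 && digit <= 9);
    assert(index >= 0 && index <= 99);
    char buf[512];
    // %02d pads the image number with a leading zero when it is below 10.
    std::snprintf(buf, sizeof(buf), "%s%d/%d%02d.pbm",
                  base.c_str(), digit, digit, index);
    return std::string(buf);
}
```

With these assumptions, samplePath("OCR/", 3, 7) gives "OCR/3/307.pbm", so a single format string covers both the one-digit and two-digit image numbers.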

basicOCR.cpp getData code:

void basicOCR::getData()
{
    IplImage* src_image;
    IplImage prs_image;
    CvMat row,data;
    char file[255];
    int i,j;
    for(i =0; i<classes; i++){
        for( j = 0; j< train_samples; j++){

            //Load file; %02d pads the image number to two digits
            sprintf(file,"%s%d/%d%02d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Can't load image %s\n", file);
                continue; //skip this sample instead of processing a NULL image
            }
            //process file
            prs_image = preprocessing(src_image, size, size);

            //Set class label
            cvGetRow(trainClasses, &row, i*train_samples + j);
            cvSet(&row, cvRealScalar(i));
            //Set data
            cvGetRow(trainData, &row, i*train_samples + j);

            IplImage* img = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
            //convert 8-bit image to 32-bit float image, rescaling 0..255 to 0..1
            cvConvertScale(&prs_image, img, 0.0039215, 0);

            cvGetSubRect(img, &data, cvRect(0,0, size,size));

            CvMat row_header, *row1;
            //convert the size x size data matrix to a vector
            row1 = cvReshape( &data, &row_header, 0, 1 );
            cvCopy(row1, &row, NULL);

            //the row is copied, so the temporary images can be released
            cvReleaseImage(&img);
            cvReleaseImage(&src_image);
        }
    }
}

After processing the images and building the training data and classes, we train our model with this data. In our sample we use the kNN method:

knn=new CvKNearest( trainData, trainClasses, 0, false, K );
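
To make the idea behind k-nearest neighbours concrete, here is a brute-force sketch in plain C++. It is not how CvKNearest is implemented; Sample, squaredDistance and knnClassify are hypothetical names for this example, and ties between classes are broken in favour of the smaller label, which is an arbitrary choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical training sample: a feature vector plus its class label.
struct Sample { std::vector<float> features; int label; };

// Squared Euclidean distance between two vectors of equal length.
float squaredDistance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Brute-force kNN: find the k closest training samples and return the
// label with the most votes (the smaller label wins a tie).
int knnClassify(const std::vector<Sample>& train,
                const std::vector<float>& query, std::size_t k) {
    assert(!train.empty() && k > 0);
    k = std::min(k, train.size());
    std::vector<std::pair<float, int> > dist;
    dist.reserve(train.size());
    for (std::size_t i = 0; i < train.size(); ++i) {
        dist.push_back(std::make_pair(
            squaredDistance(train[i].features, query), train[i].label));
    }
    // Only the k smallest distances need to be ordered.
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    std::map<int, int> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[dist[i].second];
    int best = dist[0].second, bestVotes = 0;
    for (std::map<int, int>::const_iterator it = votes.begin();
         it != votes.end(); ++it) {
        if (it->second > bestVotes) { best = it->first; bestVotes = it->second; }
    }
    return best;
}
```

There is no real training phase in this approach: the model is the stored training set, so classification cost grows with the number of training samples.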

Now we can test our model, and use the test result to compare against other methods, or to see the effect of changes such as reducing the image scale. The basicOCR class has a function for this, the test function.

This function takes the other 500 samples, classifies them with our selected method, and checks the result.

void basicOCR::test(){
    IplImage* src_image;
    IplImage prs_image;
    char file[255];
    int i,j;
    int error=0;
    int testCount=0;
    for(i =0; i<classes; i++){
        for( j = 50; j< 50+train_samples; j++){

            sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Can't load image %s\n", file);
                continue; //skip this sample instead of processing a NULL image
            }
            //process file
            prs_image = preprocessing(src_image, size, size);
            float r=classify(&prs_image,0);
            //count a miss when the predicted class differs from the folder class
            if((int)r!=i)
                error++;

            testCount++;
            cvReleaseImage(&src_image);
        }
    }
    float totalerror=100*(float)error/(float)testCount;
    printf("System Error: %.2f%%\n", totalerror);

}
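
The error figure printed by the test is simply the share of misclassified samples. A stand-alone sketch of that calculation, with the hypothetical name errorRate and plain vectors of labels as assumed inputs, could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: percentage of predictions that differ from the
// expected labels. Both vectors must have the same, non-zero length.
float errorRate(const std::vector<int>& predicted,
                const std::vector<int>& expected) {
    assert(!predicted.empty());
    assert(predicted.size() == expected.size());
    std::size_t errors = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        if (predicted[i] != expected[i]) ++errors;
    }
    return 100.0f * static_cast<float>(errors)
                  / static_cast<float>(predicted.size());
}
```

One wrong answer out of four samples gives an error of 25%, and the same number can be used to compare different values of K or different image sizes.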

The test uses the classify function, which takes the image to classify, processes it, gets the feature vector, and classifies it with find_nearest of the knn class. We also use this function to classify the user's input images:

float basicOCR::classify(IplImage* img, int showResult)
{
    IplImage prs_image;
    CvMat data;
    CvMat* nearest=cvCreateMat(1,K,CV_32FC1);
    float result;
    //process file
    prs_image = preprocessing(img, size, size);

    //Set data: convert to 32-bit float in 0..1 and reshape to a row vector
    IplImage* img32 = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
    cvConvertScale(&prs_image, img32, 0.0039215, 0);
    cvGetSubRect(img32, &data, cvRect(0,0, size,size));
    CvMat row_header, *row1;
    row1 = cvReshape( &data, &row_header, 0, 1 );

    result=knn->find_nearest(row1,K,0,0,nearest,0);

    //Count how many of the K nearest neighbours agree with the result
    int accuracy=0;
    for(int i=0;i<K;i++){
        if( nearest->data.fl[i] == result)
            accuracy++;
    }
    float pre=100*((float)accuracy/(float)K);
    if(showResult==1){
        printf("|\t%.0f\t| \t%.2f%%  \t| \t%d of %d \t| \n",result,pre,accuracy,K);
        printf(" ---------------------------------------------------------------\n");
    }

    //Release the temporaries created in this call
    cvReleaseImage(&img32);
    cvReleaseMat(&nearest);

    return result;

}
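
The percentage printed by classify is the share of the K nearest neighbours whose label matches the returned class. As a small stand-alone sketch (voteShare is a hypothetical name, and the labels are passed as floats to mirror the values stored in the nearest matrix):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: percentage of neighbour labels equal to `result`.
float voteShare(const std::vector<float>& neighborLabels, float result) {
    assert(!neighborLabels.empty());
    std::size_t agree = 0;
    for (std::size_t i = 0; i < neighborLabels.size(); ++i) {
        if (neighborLabels[i] == result) ++agree;
    }
    return 100.0f * static_cast<float>(agree)
                  / static_cast<float>(neighborLabels.size());
}
```

If four of five neighbours are labelled 3 and the result is 3, the share is 80%. This is a rough confidence indicator, not a calibrated probability.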

All the training and testing work is in the basicOCR class. Once we create a basicOCR instance, we only need to call the classify function to classify our input image. We then use the basic Painter we created in an earlier tutorial to let the user draw an image interactively and classify it.

Demo Source
Demo Source with CMake build

164 Comments to “Basic OCR in OpenCV”

  1. mithun 19 July 2010 at 3:44 pm #

    hi,
    i am a student working on hand gesture recognition.1st i have to go for hand gesture segmentation. please help me

  2. Mhd 24 July 2010 at 11:22 pm #

    hello,
    Thanks for this tutorial. Actually I am using the latest version of OpenCV, which requires CMake. Frankly, I’m not that familiar with CMake, so if you could post the CMakeLists.txt for this tutorial, I would appreciate it!

  3. mithun 26 July 2010 at 9:04 am #

    please tell me how to get a boundingbox over a specified pixel area of a binary image

  4. damiles 26 July 2010 at 9:16 am #

    Hi mithun, you must run findContours on your binary image, and then you can use the boundingRect function to get the bounding box of your contour.

    Findcontour function: http://opencv.willowgarage.com/documentation/cpp/structural_analysis_and_shape_descriptors.html?highlight=findcontours#findContours

    boundingRect function:
    http://opencv.willowgarage.com/documentation/cpp/structural_analysis_and_shape_descriptors.html?highlight=findcontours#cv-boundingrect

    Regards.

  5. damiles 26 July 2010 at 9:19 am #

    Hi Mhd, OpenCV requires CMake to build and install OpenCV itself, but your projects can be compiled with makefiles, CMake, sh, VC++ or anything you want. CMake is a makefile generator, so you can still use a makefile.

    In this tutorial I use a shell script, OCRbuild.sh, to compile; you can execute this script to build.

    But in a few days I will create the CMake build for you ;) Regards David.

  6. yanlin 1 August 2010 at 10:02 am #

    First of all, thanks a lot for your BasicOCR, it helps me a lot in my study. Well done.
    But I have several questions about your code:
    (1) In both functions “findX” and “findY” you calculate maxVal using
    “CvScalar maxVal=cvRealScalar(imgSrc->width * 255)”.
    But I think maxVal for X and Y should be different:
    in “findX” it should be CvScalar maxVal=cvRealScalar(imgSrc->height * 255)
    (2) In functions “getData” and “classify”, in order to use OpenCV’s knn, you convert the image depth from 8-bit to 32-bit using the param 0.0039215. Why?

    Thank you very much

  7. mithun 2 August 2010 at 9:53 am #

    hi, thanks for your suggestion. Now my problem is how to select the starting point for the bounding box using the boundingRect function, so that in the cam the bounding box appears around my hand only

  8. damiles 2 August 2010 at 10:43 am #

    Hi Yanlin

    1.- Yes, it should be height instead of width, but my OCR images have the same width and height, so you can use either the width or the height for the max value.

    2.- To convert 8-bit to 32-bit: 8-bit values are in the range 0 to 255, but the 32-bit values should be in the range 0 to 1. So you do not only convert the internal format, you also rescale the values. When you convert 0 – 255 to 0 – 1, you get this formula: pixel_src/255, that is pixel_src * 1/255, and 1/255 ≈ 0.0039215.

  9. damiles 2 August 2010 at 10:47 am #

    Hi mithun, I don’t understand what your problem is, sorry, my English is not very good.

    To get the bounding rect you should first find the contours, and then use these contours to get the bounding rect.

  10. Utkarsh 22 August 2010 at 3:40 pm #

    Awesome! Exactly what I was looking for!! Thank you!

  11. pbharrin 26 August 2010 at 3:13 am #

    This is a great example!
    Did you do this on Mac in Xcode?

  12. damiles 26 August 2010 at 3:24 pm #

    With the latest demo download, you can generate an Xcode project with the cmake tool.

    cmake -G Xcode

  13. Seongman 31 August 2010 at 8:13 am #

    Hello.
    Thanks for this example.

    I think you want maxVal with cvRealScalar(imgSrc->height) in findX() func. (about 34 lines)
    Isn’t it?

  14. damiles 31 August 2010 at 11:45 am #

    Hello Seongman, this line is correct; the findY function, line 55, must be imgSrc->height, but it’s not a problem in my example because the images have the same width and height.

    Regards. David

