Basic OCR in OpenCV

Update: the demo now builds with CMake, the cross-platform, open-source build system.


In this tutorial we are going to create a basic number OCR. It consists of classifying a handwritten digit into its class.

To do it, we are going to use everything we learned in the previous tutorials: the simple basic painter, and the basic pattern recognition and classification with OpenCV tutorial.

A typical pattern recognition classifier consists of three modules:

Preprocessing: in this module we process our input image, for example normalizing its size, converting color to black and white…

Feature extraction: in this module we convert the processed image into a feature vector to classify. It can be the pixel matrix converted to a vector, or a contour chain code representation.

Classification: this module takes the feature vectors and either trains our system or classifies an input feature vector with a classification method such as kNN.

In this basic OCR we are going to follow this graph:

Here we take a training set and a test set of images to train and test our classification method (kNN).

We have 1000 handwritten images, 100 of each digit. We take 50 images of each digit (class) to train our system and the other 50 to test it.

The first job is then to pre-process all the training images. To do it we create a preprocessing function. It receives an image plus the new width and height we want as the result, and returns the image cropped to its bounding box and normalized to that size. You can see the process more clearly in this graph:

Pre-processing code:

void findX(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    //A column has height pixels, so an all-white column sums to height*255
    CvScalar maxVal=cvRealScalar(imgSrc->height * 255);
    CvScalar val=cvRealScalar(0);
    //For each column sum: if sum < height*255 the column contains ink.
    //The first such column is the min; every later one becomes the new max.
    for (i=0; i< imgSrc->width; i++){
        cvGetCol(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max= i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}

void findY(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    //A row has width pixels, so an all-white row sums to width*255
    CvScalar maxVal=cvRealScalar(imgSrc->width * 255);
    CvScalar val=cvRealScalar(0);
    //For each row sum: if sum < width*255 the row contains ink.
    //The first such row is the min; every later one becomes the new max.
    for (i=0; i< imgSrc->height; i++){
        cvGetRow(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max=i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}

CvRect findBB(IplImage* imgSrc){
    CvRect aux;
    int xmin, xmax, ymin, ymax;
    xmin=xmax=ymin=ymax=0;

    findX(imgSrc, &xmin, &xmax);
    findY(imgSrc, &ymin, &ymax);

    //min and max are both inclusive indices, so the extent is max-min+1
    //(this also keeps the box at least 1x1 for a blank image)
    aux=cvRect(xmin, ymin, xmax-xmin+1, ymax-ymin+1);

    //printf("BB: %d,%d - %d,%d\n", aux.x, aux.y, aux.width, aux.height);

    return aux;
}

IplImage preprocessing(IplImage* imgSrc,int new_width, int new_height){
    IplImage* result;
    IplImage* scaledResult;

    CvMat data;
    CvMat dataA;
    CvRect bb;//bounding box

    //Find bounding box
    bb=findBB(imgSrc);

    //Get the bounding box data. It does not have aspect ratio 1 yet,
    //so scaling it directly would distort x and y
    cvGetSubRect(imgSrc, &data, cvRect(bb.x, bb.y, bb.width, bb.height));
    //Create a square image (aspect ratio 1) whose side is the larger of
    //the width and height of our bounding box, filled with white
    int size=(bb.width>bb.height)?bb.width:bb.height;
    result=cvCreateImage( cvSize( size, size ), 8, 1 );
    cvSet(result,CV_RGB(255,255,255),NULL);
    //Copy the data into the center of the square image
    int x=(int)floor((float)(size-bb.width)/2.0f);
    int y=(int)floor((float)(size-bb.height)/2.0f);
    cvGetSubRect(result, &dataA, cvRect(x,y,bb.width, bb.height));
    cvCopy(&data, &dataA, NULL);
    //Scale result
    scaledResult=cvCreateImage( cvSize( new_width, new_height ), 8, 1 );
    cvResize(result, scaledResult, CV_INTER_NN);
    //The intermediate square image is no longer needed
    cvReleaseImage(&result);

    //Return processed data (the header is copied by value)
    return *scaledResult;

}
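The crop, pad and scale steps can also be sketched without OpenCV. The following is only a minimal illustration, not the demo's code: `GrayImage`, `Box`, `makeImage`, `findBoundingBox` and `normalize` are hypothetical names, the image is assumed to be 8-bit grayscale with a white (255) background, and the scaling is a plain nearest-neighbor lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal grayscale image: row-major, 255 = white background.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // size == width * height

    std::uint8_t at(int x, int y) const { return pixels[y * width + x]; }
    std::uint8_t& at(int x, int y) { return pixels[y * width + x]; }
};

struct Box { int x, y, width, height; };

GrayImage makeImage(int width, int height, std::uint8_t fill) {
    GrayImage img;
    img.width = width;
    img.height = height;
    img.pixels.assign(static_cast<std::size_t>(width) * height, fill);
    return img;
}

// Smallest box containing every non-white pixel (both extremes included).
// Falls back to the whole image when the image is entirely white.
Box findBoundingBox(const GrayImage& img) {
    int xmin = img.width, xmax = -1, ymin = img.height, ymax = -1;
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            if (img.at(x, y) < 255) {
                xmin = std::min(xmin, x);
                xmax = std::max(xmax, x);
                ymin = std::min(ymin, y);
                ymax = std::max(ymax, y);
            }
        }
    }
    if (xmax < 0) return {0, 0, img.width, img.height};
    return {xmin, ymin, xmax - xmin + 1, ymax - ymin + 1};
}

// Crop to the bounding box, center it on a white square, then scale the
// square to newWidth x newHeight with nearest-neighbor sampling.
GrayImage normalize(const GrayImage& src, int newWidth, int newHeight) {
    assert(src.width > 0 && src.height > 0 && newWidth > 0 && newHeight > 0);
    const Box bb = findBoundingBox(src);
    const int side = std::max(bb.width, bb.height);

    GrayImage square = makeImage(side, side, 255);
    const int offX = (side - bb.width) / 2;
    const int offY = (side - bb.height) / 2;
    for (int y = 0; y < bb.height; ++y)
        for (int x = 0; x < bb.width; ++x)
            square.at(offX + x, offY + y) = src.at(bb.x + x, bb.y + y);

    GrayImage out = makeImage(newWidth, newHeight, 255);
    for (int y = 0; y < newHeight; ++y)
        for (int x = 0; x < newWidth; ++x)
            out.at(x, y) = square.at(x * side / newWidth, y * side / newHeight);
    return out;
}
```

Padding to a square before scaling is what keeps the digit's proportions: a narrow "1" stays narrow instead of being stretched to fill the whole output.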

We use the getData function of the basicOCR class to create the training data and training classes. This function reads all the images under the OCR folder to create the training data. The OCR folder is structured with one folder per class, and each file is a pbm file named cnn.pbm, where c is the class {0..9} and nn is the image number {00..99}.

Each image we read is pre-processed, and its data is then converted into the feature vector we use.
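As a small illustration of this naming scheme, the sketch below builds the path of one sample. `samplePath` is a hypothetical helper and the root folder passed to it is an assumption; the only thing taken from the tutorial is the `<class>/<class><nn>.pbm` layout.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds "<root><c>/<c><nn>.pbm", where c is the digit class (0..9) and
// nn is the zero-padded sample number (00..99).
// root is expected to end with a path separator.
std::string samplePath(const std::string& root, int digitClass, int sampleIndex) {
    assert(digitClass >= 0 && digitClass <= 9);
    assert(sampleIndex >= 0 && sampleIndex <= 99);
    char buffer[512];
    // %02d pads single-digit sample numbers with a leading zero
    std::snprintf(buffer, sizeof(buffer), "%s%d/%d%02d.pbm",
                  root.c_str(), digitClass, digitClass, sampleIndex);
    return std::string(buffer);
}
```

With a root of `../OCR/`, class 3 and sample 7 this gives `../OCR/3/307.pbm`. Samples 00 to 49 would be the training half and 50 to 99 the test half.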

basicOCR.cpp getData code:

void basicOCR::getData()
{
    IplImage* src_image;
    IplImage prs_image;
    CvMat row,data;
    char file[255];
    int i,j;
    for(i =0; i<classes; i++){
        for( j = 0; j< train_samples; j++){

            //Load file: <file_path><class>/<class><two-digit number>.pbm
            if(j<10)
                sprintf(file,"%s%d/%d0%d.pbm",file_path, i, i , j);
            else
                sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Can't load image %s\n", file);
                //Skip it: preprocessing a NULL image would crash.
                //Note that the row for this sample stays unfilled.
                continue;
            }
            //process file
            prs_image = preprocessing(src_image, size, size);

            //Set class label
            cvGetRow(trainClasses, &row, i*train_samples + j);
            cvSet(&row, cvRealScalar(i));
            //Set data: one row of trainData per sample
            cvGetRow(trainData, &row, i*train_samples + j);

            IplImage* img = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
            //convert the 8-bit image to a 32-bit float image;
            //0.0039215 is about 1/255, so the values end up in 0..1
            cvConvertScale(&prs_image, img, 0.0039215, 0);

            cvGetSubRect(img, &data, cvRect(0,0, size,size));

            CvMat row_header, *row1;
            //convert the size x size data matrix to a vector
            row1 = cvReshape( &data, &row_header, 0, 1 );
            cvCopy(row1, &row, NULL);

            //The data is now copied into trainData
            cvReleaseImage(&img);
            cvReleaseImage(&src_image);
        }
    }
}
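The feature extraction step itself is small: scale every pixel into the range 0 to 1 and lay the image out as a single row. A minimal sketch without OpenCV, assuming a row-major 8-bit buffer (`toFeatureVector` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Flattens a row-major 8-bit image into a vector of floats in [0, 1].
// Dividing by 255 plays the role of the 0.0039215 scale factor.
std::vector<float> toFeatureVector(const std::vector<std::uint8_t>& pixels,
                                   int width, int height) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> features;
    features.reserve(pixels.size());
    for (std::uint8_t p : pixels)
        features.push_back(static_cast<float>(p) / 255.0f);
    return features;
}
```

A size x size image therefore becomes a vector of size*size values, one per pixel, which is what fills one row of the training matrix.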

Once the training data and classes have been processed and collected, we train our model with them. In our sample we use the kNN method:

knn=new CvKNearest( trainData, trainClasses, 0, false, K );
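To make clear what a kNN classifier does with this data, here is a minimal brute-force sketch in plain C++. It is not how CvKNearest is implemented: `Sample`, `squaredDistance`, `nearestLabels`, `majorityLabel` and `knnClassify` are hypothetical names, the distance is squared Euclidean, and ties are broken in favor of the smaller label, which is a choice made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Sample {
    std::vector<float> features;
    int label;
};

float squaredDistance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Labels of the k training samples closest to the query, nearest first.
// k is clamped to the number of training samples.
std::vector<int> nearestLabels(const std::vector<Sample>& train,
                               const std::vector<float>& query, std::size_t k) {
    assert(!train.empty() && k > 0);
    std::vector<std::pair<float, int>> scored;
    scored.reserve(train.size());
    for (const Sample& s : train)
        scored.push_back({squaredDistance(s.features, query), s.label});
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end());
    std::vector<int> labels;
    for (std::size_t i = 0; i < k; ++i) labels.push_back(scored[i].second);
    return labels;
}

// Most frequent label; the smaller label wins a tie.
int majorityLabel(const std::vector<int>& labels) {
    assert(!labels.empty());
    std::map<int, int> votes;
    for (int label : labels) ++votes[label];
    int best = labels.front();
    int bestVotes = 0;
    for (const auto& entry : votes) {
        if (entry.second > bestVotes) {
            best = entry.first;
            bestVotes = entry.second;
        }
    }
    return best;
}

int knnClassify(const std::vector<Sample>& train,
                const std::vector<float>& query, std::size_t k) {
    return majorityLabel(nearestLabels(train, query, k));
}
```

There is no real training phase here: "training" a kNN model only means storing the samples, and all the work happens at classification time.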

Now we can test our model, and use the test result to compare against other methods we could use, or to see the effect of changes such as reducing the image scale. Our basicOCR class has a function that runs this test, the test function.

This function takes the other 500 samples, classifies them with our selected method and checks the results obtained.

void basicOCR::test(){
    IplImage* src_image;
    char file[255];
    int i,j;
    int error=0;
    int testCount=0;
    for(i =0; i<classes; i++){
        //The second half of each class (samples 50 onwards) is the test set
        for( j = 50; j< 50+train_samples; j++){

            sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Can't load image %s\n", file);
                continue; //skip it: classifying a NULL image would crash
            }
            //classify() pre-processes the image itself, so pass the raw
            //image; this way test data is processed once, like train data
            float r=classify(src_image,0);
            if((int)r!=i)
                error++;

            testCount++;
            cvReleaseImage(&src_image);
        }
    }
    if(testCount==0){
        printf("Error: no test images could be loaded\n");
        return;
    }
    float totalerror=100*(float)error/(float)testCount;
    printf("System Error: %.2f%%\n", totalerror);

}
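The figure printed at the end is simply the percentage of test samples whose predicted class differs from the real one. A minimal sketch of that calculation (`errorPercent` is a hypothetical name, and returning 0 for an empty test set is a choice made for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Percentage of predictions that differ from the expected labels.
float errorPercent(const std::vector<int>& predicted,
                   const std::vector<int>& expected) {
    assert(predicted.size() == expected.size());
    if (predicted.empty()) return 0.0f;  // avoid dividing by zero
    std::size_t errors = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i)
        if (predicted[i] != expected[i]) ++errors;
    return 100.0f * static_cast<float>(errors) /
           static_cast<float>(predicted.size());
}
```

For example, one wrong answer out of four test samples gives an error of 25%.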

The test uses the classify function, which takes the image to classify, pre-processes it, gets its feature vector and classifies it with the find_nearest method of the knn class. We use this same function to classify the images the user draws:

float basicOCR::classify(IplImage* img, int showResult)
{
    IplImage prs_image;
    CvMat data;
    //Receives the class of each of the K nearest neighbours
    CvMat* nearest=cvCreateMat(1,K,CV_32FC1);
    float result;
    //process file
    prs_image = preprocessing(img, size, size);

    //Set data: same conversion as in getData (0.0039215 is about 1/255)
    IplImage* img32 = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
    cvConvertScale(&prs_image, img32, 0.0039215, 0);
    cvGetSubRect(img32, &data, cvRect(0,0, size,size));
    CvMat row_header, *row1;
    row1 = cvReshape( &data, &row_header, 0, 1 );

    result=knn->find_nearest(row1,K,0,0,nearest,0);

    //Count how many of the K neighbours agree with the result
    int accuracy=0;
    for(int i=0;i<K;i++){
        if( nearest->data.fl[i] == result)
            accuracy++;
    }
    float pre=100*((float)accuracy/(float)K);
    if(showResult==1){
        printf("|\t%.0f\t| \t%.2f%%  \t| \t%d of %d \t| \n",result,pre,accuracy,K);
        printf(" ---------------------------------------------------------------\n");
    }

    //row1 points into img32, so release only after it has been used
    cvReleaseImage(&img32);
    cvReleaseMat(&nearest);

    return result;

}
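The percentage printed by classify is the share of the K nearest neighbours that agree with the returned class, which works as a rough confidence value. A minimal sketch of that count (`agreementPercent` is a hypothetical name; the labels are floats holding whole numbers, as in the `nearest` matrix, so comparing them with == is safe here):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Percentage of neighbour labels equal to the predicted label.
float agreementPercent(const std::vector<float>& neighbourLabels,
                       float predicted) {
    assert(!neighbourLabels.empty());
    std::size_t agree = 0;
    for (float label : neighbourLabels)
        if (label == predicted) ++agree;
    return 100.0f * static_cast<float>(agree) /
           static_cast<float>(neighbourLabels.size());
}
```

With K = 5 and neighbours labelled 3, 3, 5, 3, 3, a result of 3 is backed by 4 of 5 neighbours, that is 80%.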

All the training and test work is in the basicOCR class. Once we create a basicOCR instance, we only need to call the classify function to classify our input image. We then use the basic Painter we created in an earlier tutorial to let the user draw an image interactively and classify it.

Demo Source
Demo Source with CMake build

164 Comments to “Basic OCR in OpenCV”

  1. Sergio 12 April 2009 at 11:18 am #

    Hi ,
    I m having dificulties in compiling /binding your package on MAC OS ,, I have OpenCV done ,,, any sugestions ?
    Thaks,

  2. damiles 12 April 2009 at 9:32 pm #

    Hi Sergio, what is the problem? What error do you get? It compiles perfectly for me on Mac OS…. I can create a CMake cross-platform build if that is better for you.

  3. Tom 14 April 2009 at 2:39 am #

    This might be a stupid question, but I’m trying to understand how and where in your code you extract features from the images in your samples. More specifically how are you extracting the features and what features are you looking at for the images of the numbers? How are you evaluating these features when comparing it to the image file you are testing the accuracy of the classifier on?

  4. damiles 14 April 2009 at 9:30 am #

    There are no stupid questions.

    The features are the pixels of the image.
    This is done in the “basicOCR::getData()” function.
    I use the trainData variable to store the features.
    I use trainClasses to store the class identifier.

    First you get the row of the trainData variable that we are going to fill
    cvGetRow(trainData, &row, i*train_samples + j);

    Then create the image that will hold the data
    IplImage* img = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
    And convert the processed image into a 32-bit float image
    cvConvertScale(&prs_image, img, 0.0039215, 0);

    I convert the image matrix into a vector
    cvGetSubRect(img, &data, cvRect(0,0, size,size));
    CvMat row_header, *row1;
    row1 = cvReshape( &data, &row_header, 0, 1 );

    We store the data in row, which points to the corresponding row of trainData
    cvCopy(row1, &row, NULL);

  5. Tongan 20 April 2009 at 3:41 am #

    Hi damiles,

    I really appreciate your example code.
    I am a newbie of OpenCV. I were assigned a project to separate the text from the image, then to recognize the text and also analysis the image structure(with text removed). But I really have no any idea how to separate the text from image. Any suggestions or example code?

    Thank you in advance!

  6. has 21 April 2009 at 7:07 pm #

    hello,

    can help me?
    I yet study openCV and I would compile your application with VC++ (MS-Visual Studio C++), do you have any example?

  7. damiles 22 April 2009 at 9:10 am #

    Hi Tongan, I don’t have any example of separating the text from the image, sorry.

    Has, it’s not difficult to compile with VC++, you only need to set up the libraries correctly. I am going to create a CMake build for this example, for correct cross-platform compiling.

  8. raj 29 April 2009 at 9:06 pm #

    hello,
    i just downloaded openCV and was trying to get a hang of it. I was going through the OCR code, becuase I want to work on OCR too….when i checked the findX and findY functions it seems to me, this will work for only text on white backgrounds. Am i right? how do i work around that? also, how do i get the color of a pixel eg i want to test if (20,30) is black?

  9. damiles 29 April 2009 at 11:41 pm #

    yes you’re right.

    To test the pixel color at position x,y you can check the FAQ documentation that comes with the OpenCV source (doc/faq.htm)

    The FAQ says:

    How to access image pixels

    (The coordinates are 0-based and counted from image origin, either top-left (img->origin=IPL_ORIGIN_TL) or bottom-left (img->origin=IPL_ORIGIN_BL))

    * Suppose, we have 8-bit 1-channel image I (IplImage* img):

    I(x,y) ~ ((uchar*)(img->imageData + img->widthStep*y))[x]

    * Suppose, we have 8-bit 3-channel image I (IplImage* img):

    I(x,y)blue ~ ((uchar*)(img->imageData + img->widthStep*y))[x*3]
    I(x,y)green ~ ((uchar*)(img->imageData + img->widthStep*y))[x*3+1]
    I(x,y)red ~ ((uchar*)(img->imageData + img->widthStep*y))[x*3+2]

    e.g. increasing brightness of point (100,100) by 30 can be done this way:

    CvPoint pt = {100,100};
    ((uchar*)(img->imageData + img->widthStep*pt.y))[pt.x*3] += 30;
    ((uchar*)(img->imageData + img->widthStep*pt.y))[pt.x*3+1] += 30;
    ((uchar*)(img->imageData + img->widthStep*pt.y))[pt.x*3+2] += 30;

    or more efficiently

    CvPoint pt = {100,100};
    uchar* temp_ptr = &((uchar*)(img->imageData + img->widthStep*pt.y))[pt.x*3];
    temp_ptr[0] += 30;
    temp_ptr[1] += 30;
    temp_ptr[2] += 30;

    * Suppose, we have 32-bit floating point, 1-channel image I (IplImage* img):

    I(x,y) ~ ((float*)(img->imageData + img->widthStep*y))[x]

    * Now, the general case: suppose, we have N-channel image of type T:

    I(x,y)c ~ ((T*)(img->imageData + img->widthStep*y))[x*N + c]
    or you may use macro CV_IMAGE_ELEM( image_header, elemtype, y, x_Nc )
    I(x,y)c ~ CV_IMAGE_ELEM( img, T, y, x*N + c )

    There are functions that work with arbitrary (up to 4-channel) images and matrices (cvGet2D, cvSet2D), but they are pretty slow.

  10. raj 1 May 2009 at 2:37 pm #

    Hi,
    Could you please explain your getData function to me….I’m not understanding the functions you’ve used and why you’ve used them.

    1.How are you exactly setting trainData? Are trainClasses and trainData initialized to something or do I just declare them as CvMat* variables?

    2.I know cvGetRow gets the row value from a matrix but I what is this line doing exactly:
    cvGetRow(trainData, &row, i*train_samples + j);

  11. raj 1 May 2009 at 5:56 pm #

    Hi again,

    I got it going….took me time to understand but finally did…thanks for the earlier reply btw…. just a lingering doubt, everytime I run the program do I need to train it?

  12. damiles 1 May 2009 at 6:24 pm #

    Yes, you need to do this if you don’t save the training data, because you need this information to classify.

    You can save this data with the OpenCV data saving functions, so you don’t have to process the training images again.

  13. Rahul 8 May 2009 at 8:04 am #

    Sir,
    I have to train a set of retinal images for classifying another test set. Can you explain the working of knn classifier and cvNearest method.

    Thanking You

  14. damiles 8 May 2009 at 8:24 am #

    Hi Rahul, you can look at the previous post, where I explain a simple demo about kNN. Here is the link: http://blog.damiles.com/?p=84 (The basic pattern recognition and classification with OpenCV)

  15. hossein 13 May 2009 at 10:12 pm #

    Dear damiles

    do you have a c# version of this code??

    thanks

  16. damiles 13 May 2009 at 10:42 pm #

    No, i don’t have a c# version of this.

  17. Karl Krukow 15 May 2009 at 9:09 pm #

    Hello,
    Thanks for a good post! I’ve downloaded the basicOCR.tar.gz however the makefile is empty. Is this a mistake?

    /Karl

  18. damiles 16 May 2009 at 1:00 pm #

    Karl Krukow, there is an OCRbuild.sh script to compile it.

  19. Karl Krukow 16 May 2009 at 7:13 pm #

    Ok thanks for helping out ;-) It is probably just me that doesn’t understand what it needs to compile. When I run OCRbuild.sh I get:

    krukow:~/libraries/ocr/basicOCR/basicOCR$ ./OCRbuild.sh
    i686-apple-darwin9-g++-4.0.1: -lcxcore: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: -lcv: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: -lhighgui: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: -lcvaux: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: -lml: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: -lcxcore: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: -lcv: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: -lhighgui: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: -lcvaux: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: -lml: linker input file unused because linking not done
    i686-apple-darwin9-g++-4.0.1: preprocessing.o: linker input file unused because linking not done
    ld: can’t open output file for writing: OCR, errno=21
    collect2: ld returned 1 exit status
    krukow:~/libraries/ocr/basicOCR/basicOCR$

    Any chance you can see the problem?

    Thanks,
    - Karl

  20. Karl Krukow 16 May 2009 at 7:14 pm #

    I’m using opencv1.1.pre1, btw.

    /K

  21. angela 8 June 2009 at 1:39 pm #

    hi damiles how r u, im a studing computer science i need some help in a program is like ur program but i have some difficulties, could u help me plz??

  22. damiles 8 June 2009 at 1:41 pm #

    How I can help you angela?

  23. angela 8 June 2009 at 1:42 pm #

    befor, could u give me ur mail to contact with u cuz it will be difficult to talk there

  24. wei 7 July 2009 at 9:58 am #

    Sir,
    May I know how to compile using OCRbuild.sh, because I have never used this kind of compilation method before?

    regards,
    wei

  25. damiles 7 July 2009 at 10:03 am #

    This is a simple shell script for Unix systems; it only executes these commands:

    g++ -ggdb `pkg-config opencv --cflags --libs` preprocessing.c -c
    g++ -ggdb `pkg-config opencv --cflags --libs` preprocessing.o basicOCR.cpp -c
    g++ -ggdb `pkg-config opencv --cflags --libs` preprocessing.o basicOCR.o main.c -o OCR

  26. wei 7 July 2009 at 10:06 am #

    Sir,
    I get the compilation method already.
    May i know that if i wan do text recognition in real time, then how can i extract the text feature from camera captured image.

    regards,
    wei

  27. Jonas 28 July 2009 at 6:59 am #

    Very good info. Thank you Damiles.

  28. christian 2 August 2009 at 3:23 pm #

    hi, do know how can i connect a scanner to opencv? for scanned images to be processed in a program, thanks

  29. damiles 2 August 2009 at 4:07 pm #

    christian, you must save the scanner image data to read it in OpenCV.

    Otherwise, you need access to the scanner through the scanner’s library to get the image data and pass it to OpenCV as an IplImage.

    Regards

  30. troi 2 August 2009 at 6:48 pm #

    hello sir, im a friend of christian’s. did you mean the TWAIN library?

  31. christian 2 August 2009 at 6:55 pm #

    thank u for d reply. :)

    but do u have an idea how can i process multiple images one after another without saving the scanned images?

  32. damiles 2 August 2009 at 7:20 pm #

    troi, TWAIN for example is good.

    christian, if you use TWAIN, you get a TWAIN image structure in memory that you can save or convert to another structure, such as an IplImage for OpenCV. When you finish using this image memory you can save the image, or release it and capture another….

  33. Ali 10 August 2009 at 7:45 am #

    Hi Damiles;
    first I want to thank you for this code and also the idea.

    I read the forum from the beginning to end, you answered one question about how to compile this code and you said using VC++, do you mean visual C ++?
    can you give me link to download the compile for this language?

    I appreciated your reply.
    thanks.

  34. christian 10 August 2009 at 4:48 pm #

    are you familiar with computer vision? what library could you suggest? thank you for the info

  35. damiles 10 August 2009 at 9:01 pm #

    christian, on this page you can find more links (many are broken, but you can look them up with a web search): http://www.cs.cmu.edu/~cil/v-source.html
    I like OpenCV as a good library, but the link lists more libraries.

    Ali, search Google for VC++, go to the MSDN web page, look for VC++ and download the free Visual Studio Express.

    Regards. David

  36. Ali 11 August 2009 at 9:11 am #

    Thanks for reply Damiles.

    I download the VC++ visual studio express, but the debug button (F5) is disabled. I couldn’t compile the files, any idea.

    note that I am using windows vista SP1, and visual studio 2008.

  37. nivasan 24 August 2009 at 9:15 pm #

    hi damils

    thanks for your demo source it really helped me understanding more about opencv

    but i have some doubts..

    what is the signiicance of using 0.0039215 in the cvConvertScale function?

    i understand that the scale parameter of the function multiplies the value of each pixel and the shift parameter adds to every pixel

    but how multiplying by 0.0039215 gets the image to the 32 floating point image?? i want to know how to get this number

    and if i am using jpeg images for the sample data than pbm files, do i have to use the cvConvertScale function?

  38. damiles 24 August 2009 at 9:55 pm #

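    Yes, the 0.0039215 is for the conversion to a 32-bit floating point image: it is approximately 1/255, so it scales the 8-bit values (0 to 255) into the range 0 to 1.

    If you use a JPEG, the image is also loaded with values from 0 to 255, so you need the same conversion.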

    Regards.

  39. nivasan 24 August 2009 at 10:16 pm #

    hi Damiles

    thanks for the immediate reply.. :)

    but i tried it with the following code

    //for accessing file paths

    sprintf(file,"%s%d/%d0%d.jpg",file_path, i, i , j);

    and
    //loading image in CV_LOAD_GRAYSCALE
    src_image = cvLoadImage(file,0);

    and i didnt change the cvConvertScale function

    cvConvertScale(&prs_image, img, 0.0039215, 0);

    but i am getting number 0 for all the possible inputs in the loadwindow from main any ideas?? can you give me an exact idea for using jpeg for sample data??

    thanks

    nivasan

  40. damiles 24 August 2009 at 10:35 pm #

    nivasan, you must debug your application and make sure you are loading the images correctly and not overwriting pointers. Very often the image is not loaded correctly and then we have an array filled with 0’s….

  41. nivasan 25 August 2009 at 12:05 am #

    oh it was my mistake

    well i only gave 4 training data and two classes and my k was 10 so obviously it will allways return the first value as the nearest match ( at least i hope this is the explanation) i reduced the k to 5 so it gave me some ok results

    sorry for bad English

    thanks for your support and guidance,

    Nivasan Sharma

  42. Amin 25 August 2009 at 6:55 am #

    Hi Damiles,

    I try to run your OCR program but there is an errors such as:

    error C2258: illegal pure syntax, must be ‘= 0′
    error C2252: ‘K’ : pure specifier can only be specified error C2065: ‘K’ : undeclared identifier

    basicOCR.obj – 3 error(s), 0 warning(s)

    Please help me.

    Amin

  43. Amin 25 August 2009 at 7:05 am #

    Hi again.

    I try run again and the error reduce from 3 to 2 errors:
    which are:

    error C2252: ‘K’ : pure specifier can only be specified for functions
    error C2065: ‘K’ : undeclared identifier

    basicOCR.obj – 2 error(s), 0 warning(s)

    Please help me.

    Thanks again.

    Amin

  44. nivasan 25 August 2009 at 9:06 pm #

    Hi damiles

    in the preprocessing file you can use the cvfindcontours and take the bounding box easily rather than using the findx and findy and findbb functions

    please clarify me if i am wrong.

    Regards

    Nivasan Sharma

  45. damiles 26 August 2009 at 12:56 pm #

    nivasan, that is correct, you can use findcontours…. And you can use these contours to classify, but you must train with contours too.

  46. phen 4 September 2009 at 2:03 pm #

    Hello!

    have you tried statistical moments (cvMoments) as features?

    By using moments you dont have to finding the smallest bounding box, because they are scale and place invariant.

    it would be interesting how the performance of your pretty straightforward approach differs to the complicated one.

  47. damiles 4 September 2009 at 2:06 pm #

    hi phen, thanks for your comment.

  48. phen 7 September 2009 at 11:06 am #

    thanks for this great blog!

    keep us updated if you continue to work on it :)

  49. sandra407 9 September 2009 at 5:55 pm #

    Hi! I was surfing and found your blog post… nice! I love your blog. :) Cheers! Sandra. R.

  50. Jurgen 22 October 2009 at 2:08 pm #

    Hi
    I am using Dev-cpp and I am trying to compile the program. I recieve following error :
    File format not recognized
    ld returned 1 exit status
    D:\Panto\OCR\Makefile.win [Build Error] [OCR.exe] Error 1

    Anybody an idea ?

