Basic OCR in OpenCV

Nov 20, 2008. Posted under: OpenCV, Tutorials

Demo Source from GitHub

In this tutorial we create a basic OCR for numbers: it assigns a handwritten digit to its class (0 to 9).

To do it we use everything from the previous tutorials, in particular the simple basic painter and the basic pattern recognition and classification with OpenCV tutorial.

A typical pattern recognition classifier consists of three modules:

Preprocessing: in this module we process the input image, for example normalizing its size or converting color to black and white…

Feature extraction: in this module we convert the processed image into a feature vector to classify. It can be the pixel matrix flattened into a vector, or a contour chain code representation.

Classification: this module takes the feature vectors and either trains the system or classifies an input feature vector with a classification method such as kNN.

In this basic OCR we use this graph:

We take a training set and a test set of images to train and test our classifier (kNN).

We have 1000 handwritten images, 100 of each digit. We use 50 images of each digit (class) to train and the other 50 to test the system.

The first task is to preprocess all the training images, so we create a preprocessing function. It takes an image plus the new width and height we want, and returns the image cropped to its bounding box, centered on a square, and scaled to that size. The process is clearer in this graph:

Pre-processing code:

    //Find the first and last columns that contain at least one non-white pixel
    void findX(IplImage* imgSrc,int* min, int* max){
        int i;
        int minFound=0;
        CvMat data;
        //A column has imgSrc->height pixels, so an all-white column sums to height*255
        CvScalar maxVal=cvRealScalar(imgSrc->height * 255);
        CvScalar val=cvRealScalar(0);
        //For each col sum, if sum < height*255 then we find the min
        //then continue to end to search the max, if sum < height*255 then is new max
        for (i=0; i< imgSrc->width; i++){
            cvGetCol(imgSrc, &data, i);
            val= cvSum(&data);
            if(val.val[0] < maxVal.val[0]){
                *max= i;
                if(!minFound){
                    *min= i;
                    minFound= 1;
                }
            }
        }
    }

    //Find the first and last rows that contain at least one non-white pixel
    void findY(IplImage* imgSrc,int* min, int* max){
        int i;
        int minFound=0;
        CvMat data;
        //A row has imgSrc->width pixels, so an all-white row sums to width*255
        CvScalar maxVal=cvRealScalar(imgSrc->width * 255);
        CvScalar val=cvRealScalar(0);
        //For each row sum, if sum < width*255 then we find the min
        //then continue to end to search the max, if sum < width*255 then is new max
        for (i=0; i< imgSrc->height; i++){
            cvGetRow(imgSrc, &data, i);
            val= cvSum(&data);
            if(val.val[0] < maxVal.val[0]){
                *max=i;
                if(!minFound){
                    *min= i;
                    minFound= 1;
                }
            }
        }
    }

    CvRect findBB(IplImage* imgSrc){
        CvRect aux;
        int xmin, xmax, ymin, ymax;
        xmin=xmax=ymin=ymax=0;

        findX(imgSrc, &xmin, &xmax);
        findY(imgSrc, &ymin, &ymax);

        //xmax and ymax are inclusive indexes, so the size needs +1
        aux=cvRect(xmin, ymin, xmax-xmin+1, ymax-ymin+1);

        //printf("BB: %d,%d - %d,%d\n", aux.x, aux.y, aux.width, aux.height);

        return aux;
    }

    IplImage preprocessing(IplImage* imgSrc,int new_width, int new_height){
        IplImage* result;
        IplImage* scaledResult;

        CvMat data;
        CvMat dataA;
        CvRect bb;//bounding box
        CvRect bba;//bounding box maintaining aspect ratio (unused)

        //Find bounding box
        bb=findBB(imgSrc);

        //Get the bounding box data, without keeping the aspect ratio yet
        cvGetSubRect(imgSrc, &data, cvRect(bb.x, bb.y, bb.width, bb.height));
        //Create a square image (aspect ratio 1) whose side is the
        //larger of the bounding box width and height
        int size=(bb.width>bb.height)?bb.width:bb.height;
        result=cvCreateImage( cvSize( size, size ), 8, 1 );
        cvSet(result,CV_RGB(255,255,255),NULL);
        //Copy the data to the center of the square image
        int x=(int)floor((float)(size-bb.width)/2.0f);
        int y=(int)floor((float)(size-bb.height)/2.0f);
        cvGetSubRect(result, &dataA, cvRect(x,y,bb.width, bb.height));
        cvCopy(&data, &dataA, NULL);
        //Scale result
        scaledResult=cvCreateImage( cvSize( new_width, new_height ), 8, 1 );
        cvResize(result, scaledResult, CV_INTER_NN);
        //The intermediate square image is no longer needed
        cvReleaseImage(&result);

        //Return processed data (a copy of the header; the pixel data is shared)
        return *scaledResult;
    }
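To make the three steps (bounding box, centering on a square, scaling) easier to follow without the OpenCV types, here is a minimal sketch in plain C++. The `Gray` and `Box` structs and the `boundingBox` and `normalize` functions are hypothetical names used only for this illustration, and it assumes a white background (255) with darker ink, as in the tutorial images.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal 8-bit grayscale image, row-major. Assumption: 255 is background.
struct Gray {
    int width, height;
    std::vector<std::uint8_t> px;
    Gray(int w, int h, std::uint8_t fill = 255)
        : width(w), height(h), px(static_cast<std::size_t>(w) * h, fill) {}
    std::uint8_t& at(int x, int y) { return px[static_cast<std::size_t>(y) * width + x]; }
    std::uint8_t at(int x, int y) const { return px[static_cast<std::size_t>(y) * width + x]; }
};

struct Box { int x, y, width, height; };

// Smallest box containing every non-white pixel (width 0 if the image is blank).
Box boundingBox(const Gray& img) {
    int xmin = img.width, ymin = img.height, xmax = -1, ymax = -1;
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x)
            if (img.at(x, y) < 255) {
                xmin = std::min(xmin, x); xmax = std::max(xmax, x);
                ymin = std::min(ymin, y); ymax = std::max(ymax, y);
            }
    if (xmax < 0) return {0, 0, 0, 0};
    // Both ends are inclusive, hence the +1.
    return {xmin, ymin, xmax - xmin + 1, ymax - ymin + 1};
}

// Crop to the bounding box, center on a white square, resize with nearest neighbour.
Gray normalize(const Gray& img, int newW, int newH) {
    Box bb = boundingBox(img);
    int side = std::max(1, std::max(bb.width, bb.height));
    Gray square(side, side, 255);
    int ox = (side - bb.width) / 2, oy = (side - bb.height) / 2;
    for (int y = 0; y < bb.height; ++y)
        for (int x = 0; x < bb.width; ++x)
            square.at(ox + x, oy + y) = img.at(bb.x + x, bb.y + y);
    Gray out(newW, newH, 255);
    for (int y = 0; y < newH; ++y)
        for (int x = 0; x < newW; ++x)
            out.at(x, y) = square.at(x * side / newW, y * side / newH);
    return out;
}
```

Calling `normalize(img, 40, 40)` on a drawn digit would give a 40x40 image in which the digit fills the frame regardless of where it was drawn, which is what makes pixel-by-pixel comparison between samples meaningful.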

We use the getData function of the basicOCR class to create the training data and training classes. This function reads all the images under the OCR folder to build the training data. The OCR folder has one subfolder per class, and each file is a pbm file named cnn.pbm, where c is the class {0..9} and nn is the image number {00..99}.
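As a small illustration of that naming scheme, the path of one sample can be built as below. `sampleFileName` is a hypothetical helper, not part of basicOCR, and the "OCR/" root used in the example is an assumption about where the folder lives.

```cpp
#include <cstdio>
#include <string>

// Builds "<root><class>/<class><nn>.pbm", with nn zero-padded to two digits.
// Hypothetical helper for illustration only.
std::string sampleFileName(const std::string& root, int digitClass, int index) {
    char buf[512];
    std::snprintf(buf, sizeof(buf), "%s%d/%d%02d.pbm",
                  root.c_str(), digitClass, digitClass, index);
    return std::string(buf);
}
```

For example, `sampleFileName("OCR/", 3, 7)` yields `OCR/3/307.pbm`. The `%02d` format does the zero padding in a single call, so no separate branch is needed for indexes below 10.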

Each image we load is preprocessed, and its data is then converted into the feature vector we use.

basicOCR.cpp getData code:

    void basicOCR::getData()
    {
        IplImage* src_image;
        IplImage prs_image;
        CvMat row,data;
        char file[255];
        int i,j;
        for(i =0; i<classes; i++){
            for( j = 0; j< train_samples; j++){

                //Load file: <file_path><class>/<class><nn>.pbm, nn padded to two digits
                sprintf(file,"%s%d/%d%02d.pbm",file_path, i, i , j);
                src_image = cvLoadImage(file,0);
                if(!src_image){
                    printf("Error: Cant load image %s\n", file);
                    //skip the sample instead of using a null image;
                    //its row in trainData is left unset
                    continue;
                }
                //process file
                prs_image = preprocessing(src_image, size, size);

                //Set class label
                cvGetRow(trainClasses, &row, i*train_samples + j);
                cvSet(&row, cvRealScalar(i));
                //Set data
                cvGetRow(trainData, &row, i*train_samples + j);

                IplImage* img = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
                //convert 8 bits image to 32 float image, scaling by 1/255
                cvConvertScale(&prs_image, img, 0.0039215, 0);

                cvGetSubRect(img, &data, cvRect(0,0, size,size));

                CvMat row_header, *row1;
                //convert data matrix size x size to a single-row vector
                row1 = cvReshape( &data, &row_header, 0, 1 );
                cvCopy(row1, &row, NULL);

                //the row has been copied into trainData, so free the temporaries
                cvReleaseImage(&img);
                cvReleaseImage(&src_image);
            }
        }
    }

After processing the images and building the training data and classes, we train our model with this data. In our sample we use the kNN method:

knn=new CvKNearest( trainData, trainClasses, 0, false, K );
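For readers who want to see what a k-nearest-neighbours classifier does with those rows, here is a minimal sketch in plain C++. It is not how CvKNearest is implemented internally; `Sample`, `squaredDistance` and `knnClassify` are hypothetical names, and the tie-breaking rule is an arbitrary choice made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Sample { std::vector<float> features; int label; };

// Squared Euclidean distance between two vectors of equal length.
float squaredDistance(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the majority label among the k training samples nearest to the query,
// or -1 if there is no training data. Ties go to the smaller label.
int knnClassify(const std::vector<Sample>& train,
                const std::vector<float>& query, std::size_t k) {
    std::vector<std::pair<float, int>> byDistance;
    for (const Sample& s : train)
        byDistance.push_back({squaredDistance(s.features, query), s.label});
    k = std::min(k, byDistance.size());
    std::partial_sort(byDistance.begin(),
                      byDistance.begin() + static_cast<std::ptrdiff_t>(k),
                      byDistance.end());
    std::map<int, int> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[byDistance[i].second];
    int best = -1, bestVotes = 0;
    for (const auto& v : votes)
        if (v.second > bestVotes) { best = v.first; bestVotes = v.second; }
    return best;
}
```

"Training" here is only storing the samples; all the work happens at classification time, which is why kNN gets slower as the training set grows.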

Now we can test our model, and use the test result to compare against other methods, or to see the effect of changes such as reducing the image scale. The test function in our basicOCR class runs this test.

This function takes the other 500 samples, classifies them with our selected method, and checks the results.

    void basicOCR::test(){
        IplImage* src_image;
        IplImage prs_image;
        CvMat row,data;
        char file[255];
        int i,j;
        int error=0;
        int testCount=0;
        for(i =0; i<classes; i++){
            //the test images are the second half of each class folder (50..99)
            for( j = 50; j< 50+train_samples; j++){

                sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
                src_image = cvLoadImage(file,0);
                if(!src_image){
                    printf("Error: Cant load image %s\n", file);
                    continue; //skip the sample instead of using a null image
                }
                //process file (classify() runs preprocessing again on its input)
                prs_image = preprocessing(src_image, size, size);
                float r=classify(&prs_image,0);
                if((int)r!=i)
                    error++;

                testCount++;
                cvReleaseImage(&src_image);
            }
        }
        float totalerror=100*(float)error/(float)testCount;
        printf("System Error: %.2f%%\n", totalerror);
    }
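The figure printed by test is just the share of misclassified samples. A minimal sketch of the same calculation on plain label vectors, where `errorRatePercent` is a hypothetical name and the guard for an empty test set is an addition not present in the code above:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Percentage of predictions that differ from the expected labels.
// Returns 0 for an empty test set to avoid dividing by zero.
double errorRatePercent(const std::vector<int>& expected,
                        const std::vector<int>& predicted) {
    std::size_t n = std::min(expected.size(), predicted.size());
    if (n == 0) return 0.0;
    std::size_t errors = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (expected[i] != predicted[i]) ++errors;
    return 100.0 * static_cast<double>(errors) / static_cast<double>(n);
}
```

Keeping this number for each configuration (image size, value of K, feature type) is what lets us compare them.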

test uses the classify function, which takes an image, preprocesses it, builds the feature vector, and classifies it with the find_nearest method of the knn class. We use the same function to classify the images the user draws:

    float basicOCR::classify(IplImage* img, int showResult)
    {
        IplImage prs_image;
        CvMat data;
        CvMat* nearest=cvCreateMat(1,K,CV_32FC1);
        float result;
        //process file
        prs_image = preprocessing(img, size, size);

        //Set data: convert to float, scale by 1/255 and reshape to one row
        IplImage* img32 = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
        cvConvertScale(&prs_image, img32, 0.0039215, 0);
        cvGetSubRect(img32, &data, cvRect(0,0, size,size));
        CvMat row_header, *row1;
        row1 = cvReshape( &data, &row_header, 0, 1 );

        //nearest receives the class labels of the K nearest neighbours
        result=knn->find_nearest(row1,K,0,0,nearest,0);

        //count how many of the K neighbours agree with the winning class
        int accuracy=0;
        for(int i=0;i<K;i++){
            if( nearest->data.fl[i] == result)
                accuracy++;
        }
        float pre=100*((float)accuracy/(float)K);
        if(showResult==1){
            printf("|\t%.0f\t| \t%.2f%%  \t| \t%d of %d \t| \n",result,pre,accuracy,K);
            printf(" ---------------------------------------------------------------\n");
        }

        //free the temporaries created in this call
        cvReleaseImage(&img32);
        cvReleaseMat(&nearest);

        return result;
    }
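The percentage printed by classify is not an accuracy in the usual sense: it is the share of the K nearest neighbours whose label equals the winning class, which can be read as a rough confidence. A minimal sketch of that calculation, with `voteShare` as a hypothetical name:

```cpp
#include <cstddef>
#include <vector>

// Percentage of neighbour labels equal to the winning label.
// Returns 0 when there are no neighbours.
float voteShare(const std::vector<float>& neighbourLabels, float winner) {
    if (neighbourLabels.empty()) return 0.0f;
    std::size_t agree = 0;
    for (float label : neighbourLabels)
        if (label == winner) ++agree;
    return 100.0f * static_cast<float>(agree) /
           static_cast<float>(neighbourLabels.size());
}
```

A result such as 3 with a share of 75% (3 of 4 neighbours) is less trustworthy than the same result at 100%, so the share can be used to reject doubtful inputs.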

All the training and testing work is in the basicOCR class: once we create a basicOCR instance, we only need to call the classify function to classify our input image. We then use the basic Painter we created in an earlier tutorial to let the user draw an image interactively and classify it.

Demo Source

296 Comments

  • I am kind of new to OCR. I am currently working with Sikuli and have an issue with its inconsistent results, so I am looking for a way to improve them. My question is: if the font (type, size) to be scanned and extracted is predetermined, can you use the font information to improve performance?

  • Hi, this is difficult to answer. You will surely get an improvement, but you will get bad results when you change the font or size.

    It is not easy to improve a pattern recognition system. I suggest you create different datasets and different features for training, evaluate the resulting errors, and select the best.

  • Hi, thank you for this good blog. Do you have a good tutorial for the SVM classifier?

    Another question: I hope to develop on my MacBook laptop, but I don't know if I can find everything necessary for using OpenCV?

    Thanks.

  • Hi, I don't have any SVM tutorial right now.

    And you can use OpenCV on your Mac. I use a Mac and I have never had problems compiling and using OpenCV. I only recommend that you use CMake for the build scripts.

  • Thanks for your response.

    Can I develop for a real-time application? My objective is to use the GPU, but my MacBook has an ATI card and not an NVIDIA one, so I don't know if it's possible to use the GPU with OpenCV?

    Thanks.

  • Great work damiles, thank you

    I have two questions about the code:

    In findX shouldn’t we do:
    CvScalar maxVal=cvRealScalar(imgSrc->height * 255);
    instead of
    CvScalar maxVal=cvRealScalar(imgSrc->width * 255);

    and in findBB
    aux=cvRect(xmin, ymin, xmax-xmin+1, ymax-ymin+1);
    instead of
    aux=cvRect(xmin, ymin, xmax-xmin, ymax-ymin);

    Thank you again for this great article
    David

  • I can't compile the project with VC++. Please help me!

  • What does the VC++ log say?

  • Yes, you are right, but my code works correctly because my images have the same width and height.

  • Hi,
    I need to get OCR software for numbers (1-9) in Arial running on a microcontroller (I can get one with Linux if it helps…). I'm new here and don't know that much about OCR.
    Googling didn't help much.
    Can someone help me?

    Thanks!

    Matthias

  • Hi Damiles, thanks for your helpful tutorial.
    How do you make the .pbm files in the OCR folder?

  • It's an awesome tutorial.
    I'd like to know how to extract the number:
    I mean, I have an IplImage and I want to get a char from it, so which function should I use?

  • Thank you very much for sharing this source code. Your code gives me more information about how OCR works :)

  • Great sharing!
    Well done!!
    Keep up with this :)

  • Can I use this for text detection from subtitles in video?

  • Hi damiles,
    (I am Vietnamese and my English is poor.)
    In main I added this code:
    IplImage* img=cvLoadImage("out.png");
    Then I call classify:
    ocr.classify(img,1);

    But when I press the "c" key, this error is shown:
    http://i546.photobucket.com/albums/hh425/sonth8x1/nhchpmnhnh_2012-02-27_213402.png

    I hope someone can help me.

  • Hi, I think it does not accept 4-channel or RGB images. Please check with a single-channel image.

  • Yes, of course, but the problem is detecting the subtitle region in the video.

  • You can use GIMP to create pbm files.

  • You can use this app by training the system with Arial numbers, or you can use Tesseract.

  • Thanks damiles.
    I converted to a 1-channel image and the example runs very well. Thank you very much!

  • Sorry to bother you, damiles. Could I ask a few more questions about the .pbm files?
    - Did you use software to export the ".pbm" files,
    - or should I threshold the image to a binary image myself?
    if (pixel (0,0,0))
    {
    file==0
    }
    else
    {
    file==1
    }
    Did you use any particular method to convert the images to .pbm?
    Thank you.

  • Sorry damiles,
    I forgot to read "You can use gimp for create pbm files".

  • Hi Damiles,

    I read somewhere that you should first extract features from the dataset to use with kNN or SVM. Have you done anything like that? What kind of features did you extract?

  • A feature is any specific characteristic that you choose in order to differentiate one character from the others.

    In this demo I use the most basic feature, the pixels of the image. But we can use others, such as the histogram, or the sum of all black pixels in each column (X), or whatever you want.
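As an illustration of the "sum of black pixels in each X" idea, here is a minimal sketch that counts dark pixels per column and per row and concatenates the counts into one feature vector. `projectionFeatures` is a hypothetical name, and treating any value below 128 as black is an assumption made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts dark pixels per column, then per row, of a row-major 8-bit image.
// The result has width + height entries: columns first, rows after.
std::vector<float> projectionFeatures(const std::vector<std::uint8_t>& px,
                                      int width, int height) {
    std::vector<float> f(static_cast<std::size_t>(width + height), 0.0f);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (px[static_cast<std::size_t>(y) * width + x] < 128) {
                f[static_cast<std::size_t>(x)] += 1.0f;          // column count
                f[static_cast<std::size_t>(width + y)] += 1.0f;  // row count
            }
    return f;
}
```

For a 40x40 image this gives 80 features instead of 1600 pixels, which makes the nearest-neighbour search cheaper. Whether it also classifies better would have to be measured on the test set.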

  • So you just gave all the pixel values of the digit?
    How much accuracy do you think it will give, if you have tested?

    Can you specify a few more features for better accuracy which are normally used in OCR apps?

  • Great work damiles, really helpful for my undergraduate research project.
    Have you tried multi layer perception? Can you guide me on how to do this with MLP? Are
    there any drawbacks if I use MLP?

  • I think you meant perceptron, not perception.

    OpenCV has good documentation, and its machine learning functions/classes are all used in much the same way, so using the kNN shown here is very similar to using an MLP.

    Here is the documentation about MLP on OpenCV. http://opencv.willowgarage.com/documentation/cpp/neural_networks.html

  • Can you make an example of neural networks in OpenCV? Thanks damiles.

  • Can you please guide me on how to create those pbm files? I used GIMP and created ASCII pbm files for my images, yet they are not as neat as yours. In your images the background is 0 and only the letter is 1 (if we open the file in Notepad), but when I create one, the background is 0 and the letters contain both 1s and 0s.
    Can you guide me on how to do it properly? I'm stuck creating my data set because of it.
    damiles rocks.
    Thank you
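For anyone generating their own samples, one way to get clean files is to threshold the grayscale image yourself and write the plain (ASCII) PBM text directly. The sketch below assumes the standard plain PBM layout: the magic number P1, then width and height, then one character per pixel where 1 is black and 0 is white. `toPlainPbm` is a hypothetical name and the default threshold of 128 is an arbitrary choice. One possible cause of letters containing both 1s and 0s is dithering or antialiasing applied before export, but that is a guess and is not confirmed in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Serialises a row-major 8-bit grayscale image as plain PBM ("P1") text.
// Pixels darker than the threshold become 1 (black), the rest 0 (white).
// Output lines are broken at row ends and after 70 characters.
std::string toPlainPbm(const std::vector<std::uint8_t>& gray,
                       int width, int height, std::uint8_t threshold = 128) {
    std::string out = "P1\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n";
    for (int y = 0; y < height; ++y) {
        int col = 0;
        for (int x = 0; x < width; ++x) {
            bool black = gray[static_cast<std::size_t>(y) * width + x] < threshold;
            out += black ? '1' : '0';
            if (++col == 70 || x == width - 1) {
                out += '\n';
                col = 0;
            }
        }
    }
    return out;
}
```

Writing the returned string to a file named like the tutorial's samples would give a file that opens in a text editor as a grid of 0s and 1s.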

  • Hey, great blog!
    I'm getting the same error as Đức Sơn when I try to load my own image for classification.
    I loaded the image with:
    IplImage* img=cvLoadImage("abc.pbm");

    I wasn't sure how I was supposed to convert to a single channel, so I used:

    IplImage* temp = cvCreateImage(cvGetSize(img), img->depth, 1);
    cvSetImageCOI(img, 1);
    cvCopy(img, temp);

    but I'm still getting the same error when I try to classify.
    Please help. How can I solve it?
    How did you solve it, Đức Sơn?
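Note that the tutorial's own getData loads images with cvLoadImage(file,0), where the second argument 0 asks for a single-channel grayscale image, so passing 0 when loading is the simplest route. For readers who want to see what the conversion itself involves, here is a minimal sketch on a raw interleaved BGR or BGRA buffer. `bgrToGray` is a hypothetical name, and the weights are the common luminance coefficients (0.299 R, 0.587 G, 0.114 B).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts interleaved BGR (3 channels) or BGRA (4 channels) pixels to one
// gray value per pixel. Any fourth channel is ignored. Buffers with fewer
// than 3 channels are returned unchanged.
std::vector<std::uint8_t> bgrToGray(const std::vector<std::uint8_t>& bgr,
                                    int channels) {
    if (channels < 3) return bgr;
    std::vector<std::uint8_t> gray;
    std::size_t step = static_cast<std::size_t>(channels);
    gray.reserve(bgr.size() / step);
    for (std::size_t i = 0; i + step <= bgr.size(); i += step) {
        float lum = 0.114f * bgr[i] + 0.587f * bgr[i + 1] + 0.299f * bgr[i + 2];
        gray.push_back(static_cast<std::uint8_t>(lum + 0.5f));
    }
    return gray;
}
```

The output has one byte per pixel, which is the layout the preprocessing code expects.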

  • I can't understand this line of code:
    http://i546.photobucket.com/albums/hh425/sonth8x1/baiviet/Capture.png
    It sets data in the trainData matrix. I understand this line:
    cvGetRow(trainData, &row, i*train_samples + j);
    It gets row i*train_samples + j of trainData into the matrix row.
    What I can't understand is how the data gets set in trainData.
    Please help me. Thanks all.

  • @Rebecca You are right: convert the 4-channel image to a 1-channel image.

  • [...] DAMILES has published online an example of using OpenCV for OCR, http://blog.damiles.com/2008/11/basic-ocr-in-opencv/. [...]

  • Hi Damiles,
    thank you for your code, it was really helpful to me.

    Anyway, a little correction. as David Fernandes already pointed out, in findBB it should be
    aux=cvRect(xmin, ymin, xmax-xmin+1, ymax-ymin+1);
    instead of
    aux=cvRect(xmin, ymin, xmax-xmin, ymax-ymin);

    You can’t notice much in single digits image, but since in my code I have to recursively crop an image looking for digits in different positions, as time goes on, the missing pixels start to be noticed.

  • Hi Damiles, great blog! I want to use your code and extend it to recognise other characters, and use it as a base to recognise a set of characters in a document or page. I will publish the code after finishing and make it open source, and I will give you the credit you deserve when I publish. I hope I can go ahead.

  • Hello,

    Great tutorial! It really helped me to make first steps in character recognition.
    The benefit of this article is its simplicity, and the source code gives the final details of how it works.

    I’m grateful for sharing knowledge.

  • dear damiles!
    thanks so much.

  • I want to ask you about your OCR program. I use OCR to recognize printed characters. Can this program be used for my problem?
    Thanks

  • What programming language are you using?

  • The programming language is C/C++, and this app is a sample to show how an OCR works, not for commercial use. I recommend using Tesseract or another commercial or complete OCR app.

  • damiles, this is a great effort and a help for others. I appreciate your work.
    I want to share something with you: I am going to read business card data such as email, phone number, etc. Is it possible to do this on Android using OpenCV, and does the above code work well for recognition of any text in an image? If yes, then please upload the complete source code.
    Please, please help.

  • The code is complete for numbers, but it's a basic tutorial to understand how an OCR works. If you want a more powerful OCR you can use Tesseract.

  • Great tutorial Damiles, although most of it is too complicated for me right now!
    I’m totally new to OCR stuff, and trying to do something for an Android app that needs to recognise single typed letters (not numbers, not handwritten).
    I have a few very basic questions to make sure I’m understanding the process properly:
    1) I assume you have to train (the system) only once, then ONLY use the classify() (or test()?) methods in the user application?
    2) If I have characters in different orientations (just 4, every 90 degrees) do you think it would be easier to a) check, rotate, check etc, or b) just train with letter images in all 4 orientations?

    Graeme

  • Hi Graeme,

    1) In almost all systems that do not have to keep learning, it is sufficient to train only once and save the result to a file for future use. Then you can classify with this training set as many times as you need.
    2) You can use either option you mention, a or b, and check which of the two is better, or you can look into more advanced scale/rotation invariant features.
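Option b from the question (training with the letter in all four orientations) only needs a 90 degree rotation of each square training image. A minimal sketch, assuming a square row-major image of size x size; `rotate90` and `fourOrientations` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rotates a square row-major image by 90 degrees clockwise.
std::vector<std::uint8_t> rotate90(const std::vector<std::uint8_t>& src, int size) {
    std::vector<std::uint8_t> dst(src.size());
    for (int y = 0; y < size; ++y)
        for (int x = 0; x < size; ++x)
            // pixel (x, y) moves to column size-1-y of row x
            dst[static_cast<std::size_t>(x) * size + (size - 1 - y)] =
                src[static_cast<std::size_t>(y) * size + x];
    return dst;
}

// Returns the image at 0, 90, 180 and 270 degrees (clockwise).
std::vector<std::vector<std::uint8_t>> fourOrientations(
        const std::vector<std::uint8_t>& src, int size) {
    std::vector<std::vector<std::uint8_t>> out;
    out.push_back(src);
    for (int i = 0; i < 3; ++i)
        out.push_back(rotate90(out.back(), size));
    return out;
}
```

Each rotated copy would be added to the training set with the same class label as the original, which makes the training set four times larger. Keep in mind that some characters look like others when rotated, so the error rate should be checked again after this change.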
