Basic OCR in OpenCV
Update: the demo now builds with CMake, the cross-platform, open-source build system.
In this tutorial we are going to create a basic number OCR. It consists of classifying a handwritten digit into its class.
To do it, we will use everything we learned in the previous tutorials: the simple basic painter and the basic pattern recognition and classification with OpenCV tutorial.
A typical pattern recognition classifier consists of three modules:
Preprocessing: in this module we process our input image, for example normalizing its size, converting colour to black and white…
Feature extraction: in this module we convert our processed image into a feature vector to classify; it can be the pixel matrix converted to a vector, or a contour chain code representation.
Classification: this module takes the feature vectors and either trains our system or classifies an input feature vector with a classification method such as kNN.
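To make the feature extraction step concrete, here is a minimal sketch in plain C++ with no OpenCV dependency. The helper name pixelsToFeatures and the row-major std::vector layout are assumptions made for illustration; they are not part of the demo source. It turns an 8-bit grayscale pixel matrix into a float vector scaled to [0, 1], which is the simplest feature vector mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in the demo): flatten a row-major 8-bit grayscale
// image into a float feature vector with every value scaled to [0, 1].
std::vector<float> pixelsToFeatures(const std::vector<unsigned char>& pixels,
                                    std::size_t width, std::size_t height) {
    // The buffer must hold exactly width * height pixels.
    assert(pixels.size() == width * height);
    std::vector<float> features;
    features.reserve(pixels.size());
    for (unsigned char p : pixels) {
        features.push_back(static_cast<float>(p) / 255.0f);
    }
    return features;
}
```

A size x size image therefore becomes a vector of size*size floats, one per pixel.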
In this basic OCR we are going to use this graph:
Here we take a training set and a test set of images to train and test our classifier method (kNN).
We have 1000 handwritten images, 100 images of each digit. We take 50 images of each digit (class) to train and the other 50 to test our system.
The first thing we do is pre-process all the training images. To do this we create a preprocessing function. It takes an image plus the new width and height we want as a result, and returns the image cropped to its bounding box and normalized to that size. You can see the process more clearly in this graph:
Pre-processing code:
void findX(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    //A column holds height pixels, so an all-white column sums to height*255
    CvScalar maxVal=cvRealScalar(imgSrc->height * 255);
    CvScalar val=cvRealScalar(0);
    //For each column sum, if sum < height*255 the column has ink: the first one is the min
    //then continue to the end to search the max; each later column with ink is the new max
    for (i=0; i< imgSrc->width; i++){
        cvGetCol(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max= i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}
void findY(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    CvScalar maxVal=cvRealScalar(imgSrc->width * 255);
    CvScalar val=cvRealScalar(0);
    //For each row sum, if sum < width*255 the row has ink: the first one is the min
    //then continue to the end to search the max; each later row with ink is the new max
    for (i=0; i< imgSrc->height; i++){
        cvGetRow(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max=i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}
CvRect findBB(IplImage* imgSrc){
    CvRect aux;
    int xmin, xmax, ymin, ymax;
    xmin=xmax=ymin=ymax=0;
    findX(imgSrc, &xmin, &xmax);
    findY(imgSrc, &ymin, &ymax);
    aux=cvRect(xmin, ymin, xmax-xmin, ymax-ymin);
    //printf("BB: %d,%d - %d,%d\n", aux.x, aux.y, aux.width, aux.height);
    return aux;
}
IplImage preprocessing(IplImage* imgSrc,int new_width, int new_height){
    IplImage* result;
    IplImage* scaledResult;
    CvMat data;
    CvMat dataA;
    CvRect bb;//bounding box
    CvRect bba;//bounding box maintaining aspect ratio
    //Find bounding box
    bb=findBB(imgSrc);
    //Get the bounding box data; the aspect ratio is handled below
    cvGetSubRect(imgSrc, &data, cvRect(bb.x, bb.y, bb.width, bb.height));
    //Create a square image (aspect ratio 1) for this data:
    //its side is the larger of the bounding box width and height
    int size=(bb.width>bb.height)?bb.width:bb.height;
    result=cvCreateImage( cvSize( size, size ), 8, 1 );
    cvSet(result,CV_RGB(255,255,255),NULL);
    //Copy the data into the center of the image
    int x=(int)floor((float)(size-bb.width)/2.0f);
    int y=(int)floor((float)(size-bb.height)/2.0f);
    cvGetSubRect(result, &dataA, cvRect(x,y,bb.width, bb.height));
    cvCopy(&data, &dataA, NULL);
    //Scale result
    scaledResult=cvCreateImage( cvSize( new_width, new_height ), 8, 1 );
    cvResize(result, scaledResult, CV_INTER_NN);
    //The square intermediate image is no longer needed
    cvReleaseImage(&result);
    //Return processed data
    return *scaledResult;
}
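The bounding box search can also be sketched without OpenCV, which may help when checking the logic by hand. The code below is an illustration under stated assumptions: the Box struct and inkBoundingBox function are hypothetical names, the image is a row-major buffer with a white (255) background, and any darker pixel counts as ink. Unlike the listing above, this sketch counts the last ink row and column as part of the box (width = max - min + 1) and returns an empty box for a blank image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types and names, for illustration only.
struct Box { int x, y, width, height; };

// Bounding box of every pixel darker than white (255) in a row-major
// grayscale image. Returns a zero-sized box when the image is blank.
Box inkBoundingBox(const std::vector<unsigned char>& pixels, int width, int height) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    int xmin = width, xmax = -1, ymin = height, ymax = -1;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            if (pixels[idx] < 255) {
                if (x < xmin) xmin = x;
                if (x > xmax) xmax = x;
                if (y < ymin) ymin = y;
                if (y > ymax) ymax = y;
            }
        }
    }
    if (xmax < 0) return Box{0, 0, 0, 0};  // no ink found
    return Box{xmin, ymin, xmax - xmin + 1, ymax - ymin + 1};
}
```

Scanning every pixel once is a different route from summing whole rows and columns, but it finds the same extremes.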
We use the getData function of the basicOCR class to create the training data and training classes. This function reads all the images under the OCR folder to build the training data. The OCR folder is structured with one folder per class, and each file is a pbm file named cnn.pbm, where c is the class {0..9} and nn is the image number {00..99}.
Each image we load is pre-processed and then converted into the feature vector we use.
basicOCR.cpp getData code:
void basicOCR::getData()
{
    IplImage* src_image;
    IplImage prs_image;
    CvMat row,data;
    char file[255];
    int i,j;
    for(i =0; i<classes; i++){
        for( j = 0; j< train_samples; j++){
            //Load file
            if(j<10)
                sprintf(file,"%s%d/%d0%d.pbm",file_path, i, i , j);
            else
                sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Can't load image %s\n", file);
                //exit(-1);
            }
            //process file
            prs_image = preprocessing(src_image, size, size);
            //Set class label
            cvGetRow(trainClasses, &row, i*train_samples + j);
            cvSet(&row, cvRealScalar(i));
            //Set data
            cvGetRow(trainData, &row, i*train_samples + j);
            IplImage* img = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
            //convert 8 bits image to 32 float image
            cvConvertScale(&prs_image, img, 0.0039215, 0);
            cvGetSubRect(img, &data, cvRect(0,0, size,size));
            CvMat row_header, *row1;
            //convert the size x size data matrix to a vector
            row1 = cvReshape( &data, &row_header, 0, 1 );
            cvCopy(row1, &row, NULL);
            //free the temporary images; their data is already copied
            cvReleaseImage(&img);
            cvReleaseImage(&src_image);
        }
    }
}
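The cnn.pbm naming scheme can be produced with a single zero-padded format specifier instead of two branches. The sketch below is only an illustration: samplePath is a hypothetical helper and the "../OCR/" base path in the usage note is an assumed location, not one taken from the demo.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: builds "<basePath><c>/<c><nn>.pbm", where c is the
// class digit (0..9) and nn is the two-digit image number (00..99).
std::string samplePath(const std::string& basePath, int digitClass, int index) {
    assert(digitClass >= 0 && digitClass <= 9);
    assert(index >= 0 && index <= 99);
    char buffer[512];
    // %02d pads the image number with a leading zero when it is below 10.
    // snprintf never writes past the buffer; an overlong path is truncated.
    std::snprintf(buffer, sizeof(buffer), "%s%d/%d%02d.pbm",
                  basePath.c_str(), digitClass, digitClass, index);
    return std::string(buffer);
}
```

For example, samplePath("../OCR/", 3, 7) yields "../OCR/3/307.pbm".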
After processing the images and getting the training data and classes, we train our model with this data. In our sample we use the kNN method:
knn=new CvKNearest( trainData, trainClasses, 0, false, K );
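For readers who want to see what a k-nearest-neighbours classifier does internally, here is a small self-contained sketch. It is not the CvKNearest implementation: the Sample struct, the knnClassify function, the squared Euclidean distance and the tie rule (the smallest label wins) are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical training sample: a feature vector plus its class label.
struct Sample {
    std::vector<float> features;
    int label;
};

// Majority vote among the k training samples closest to the query
// (squared Euclidean distance). On a tie the smallest label wins.
int knnClassify(const std::vector<Sample>& training,
                const std::vector<float>& query, std::size_t k) {
    assert(!training.empty() && k >= 1);
    k = std::min(k, training.size());

    std::vector<std::pair<float, int> > distances;  // (distance, label)
    distances.reserve(training.size());
    for (const Sample& s : training) {
        assert(s.features.size() == query.size());
        float d = 0.0f;
        for (std::size_t i = 0; i < query.size(); ++i) {
            const float diff = s.features[i] - query[i];
            d += diff * diff;
        }
        distances.push_back(std::make_pair(d, s.label));
    }
    // Only the k smallest distances need to be in sorted order.
    std::partial_sort(distances.begin(),
                      distances.begin() + static_cast<std::ptrdiff_t>(k),
                      distances.end());

    std::map<int, int> votes;  // label -> number of neighbours with it
    for (std::size_t i = 0; i < k; ++i) ++votes[distances[i].second];

    int bestLabel = distances[0].second;
    int bestVotes = 0;
    for (const auto& v : votes) {
        if (v.second > bestVotes) {
            bestLabel = v.first;
            bestVotes = v.second;
        }
    }
    return bestLabel;
}
```

There is no real training step here: the "model" is just the stored samples, which is why classification cost grows with the size of the training set.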
Now we can test our model, and we can use the test result to compare against other methods, or to see the effect of reducing the image scale and similar changes. Our basicOCR class has a function to run this test, the test function.
This function takes the other 500 samples, classifies them with our selected method and checks the result.
void basicOCR::test(){
    IplImage* src_image;
    IplImage prs_image;
    CvMat row,data;
    char file[255];
    int i,j;
    int error=0;
    int testCount=0;
    for(i =0; i<classes; i++){
        for( j = 50; j< 50+train_samples; j++){
            sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Can't load image %s\n", file);
                //exit(-1);
            }
            //process file
            prs_image = preprocessing(src_image, size, size);
            float r=classify(&prs_image,0);
            if((int)r!=i)
                error++;
            testCount++;
        }
    }
    float totalerror=100*(float)error/(float)testCount;
    printf("System Error: %.2f%%\n", totalerror);
}
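The error figure printed by the test is simply the share of misclassified samples. As a standalone illustration (errorRatePercent is a hypothetical helper, not part of basicOCR), the same computation over two label lists looks like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: percentage of positions where the predicted label
// differs from the expected one.
double errorRatePercent(const std::vector<int>& predicted,
                        const std::vector<int>& expected) {
    assert(predicted.size() == expected.size());
    assert(!expected.empty());
    std::size_t errors = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (predicted[i] != expected[i]) ++errors;
    }
    return 100.0 * static_cast<double>(errors) / static_cast<double>(expected.size());
}
```

With 500 test samples, each wrong answer adds 0.2 percentage points to the reported error.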
The test uses the classify function, which takes the image to classify, processes it, gets the feature vector and classifies it with the find_nearest method of the knn class. We also use this function to classify the user's input images:
float basicOCR::classify(IplImage* img, int showResult)
{
    IplImage prs_image;
    CvMat data;
    CvMat* nearest=cvCreateMat(1,K,CV_32FC1);
    float result;
    //process file
    prs_image = preprocessing(img, size, size);
    //Set data
    IplImage* img32 = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
    cvConvertScale(&prs_image, img32, 0.0039215, 0);
    cvGetSubRect(img32, &data, cvRect(0,0, size,size));
    CvMat row_header, *row1;
    row1 = cvReshape( &data, &row_header, 0, 1 );
    result=knn->find_nearest(row1,K,0,0,nearest,0);
    //count how many of the K neighbours agree with the result
    int accuracy=0;
    for(int i=0;i<K;i++){
        if( nearest->data.fl[i] == result)
            accuracy++;
    }
    float pre=100*((float)accuracy/(float)K);
    if(showResult==1){
        printf("|\t%.0f\t| \t%.2f%% \t| \t%d of %d \t| \n",result,pre,accuracy,K);
        printf(" ---------------------------------------------------------------\n");
    }
    //free the temporary matrix and image
    cvReleaseMat(&nearest);
    cvReleaseImage(&img32);
    return result;
}
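The percentage printed by classify is the share of the K nearest neighbours whose label equals the winning class. A minimal standalone version of that count, using a hypothetical voteSharePercent helper and labels stored as floats as in the listing above, could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: percentage of neighbour labels equal to the winning
// label. Labels are whole numbers stored as floats, so == is exact here.
float voteSharePercent(const std::vector<float>& neighbourLabels, float winner) {
    assert(!neighbourLabels.empty());
    std::size_t agreeing = 0;
    for (float label : neighbourLabels) {
        if (label == winner) ++agreeing;
    }
    return 100.0f * static_cast<float>(agreeing) /
           static_cast<float>(neighbourLabels.size());
}
```

This number measures agreement among neighbours, not correctness: a shape that is not a digit at all can still score 100% if all its nearest neighbours share one class.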
All the training and testing work is in the basicOCR class; once we create a basicOCR instance, we only need to call the classify function to classify our input image. We then use the basic Painter we created in an earlier tutorial to let the user draw an image interactively and classify it.
Demo Source
Demo Source with CMake build
164 Comments to “Basic OCR in OpenCV”
You can replace four lines by one:
———————————-
if(j<10)
sprintf(file,"%s%d/%d0%d.pbm",file_path, i, i , j);
else
sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
———————————-
sprintf(file,"%s%d/%d%02d.pbm",file_path, i, i , j);
———————————-
This is basic C know-how
Thanks Tönu XD
Hello damiles,
you are a good teacher for my favorite computer vision library
I am looking for a segmentation solution in opencv, esp. for handwritten letters (described in http://yann.lecun.com/exdb/publis/#lecun-98, e.g. Heuristic Over Segmentation). As I am still a beginner with c/c++ and also not very familiar with complex mathematical formulas, I’d like to know if you know about any code example with this method, or if you could give some helpful hints/starting points.
Any help would be great!
Thank you,
Qazi
Hey,
Thanks for your article.
However, I’m quite disappointed with this very line :
cvCopy(row1, &row, NULL);
which basically means that you train the classifier with the *whole image*.
How long did the knn classifier take for those 50 samples of 40² pixels ?
More importantly : which “high level” criterion would you recommend (area, spline contour parameters, moments…) ?
thanks!
Arnaud, you are the first to analyze the code and send me good questions, doubts or disagreements, THANKS!!
First, to train the classifier I use all the image pixels. I know it’s not the best criterion to use, but I use it to explain the basics of pattern recognition in OCR.
Then in cvCopy(row1, &row, NULL); I copy all the pixels of each image into a row for classification. First I get a row of my training matrix “trainData”
cvGetRow(trainData, &row, i*train_samples + j);
then, once I have the pixels in an array (row1), I copy it into that row, which points into the trainData row.
cvCopy(row1, &row, NULL);
For the knn classifier I take 40² samples. The number 50 I think you took from the function void basicOCR::test(), line 10, “for( j = 50;…”. In this function I classify 50 random numbers to get an error; this function is only a statistic of my classifier, where I can test the knn with 50 random numbers and get the resulting error.
And of course, most important, the criterion: I recommend a criterion other than pixels, such as contours or moments. To select the criterion for this case you must test them all (moments, pixels, spline, area, eigenvalues…), select the best, and test different image sizes and every set you can think of.
I hope this answers all your questions. I’m not an expert.
Regards, David.
Hi
first of all thanks for this wonderful tutorial.
I am doing a project that needs to extract a string of numbers from an image. I can get an image where it only contains numbers.
My problem is how can i segment the image so i can get each number and test it against the sample?
You can use the cvBlobsLib library http://opencv.willowgarage.com/wiki/cvBlobsLib
Or use any type of segmentation, or use contours or similar.
Regards.
damiles, thanks for this great tutorial. it was actually very interesting and complete!
Hey…very nice tutorials. I was wondering if you have any tutorial on human action recognition as well…using some feature point descriptors and then training the model to classify the actions….
Hi,
i compiled and ran your code in vc++. I changed your pbm images into jpg, but it did not give a good result: if i draw a single horizontal line, it is classified as class 4. Is any code change needed? please help me
Thanks in advance
Manivasakan B
Sorry vasakan, i don’t understand what error you have… Can you explain it in more detail?
Thanks
Thank you very much for your reply.
if i draw a number from 0 to 9 it is classified into the corresponding class, but if i draw a shape that is not a number (a different shape), that shape is also classified into a number class, with 100% accuracy.
how can i solve this problem?
Thanks
Manivasakan.
Ah, ok, the system is prepared for only 10 classes, the numbers, so you must only draw numbers. If you draw something that is not a number, the system returns the number class closest to what you drew. If you want a “not a number” class you must train with a new class that covers the non-number objects, but it’s more complex.
The accuracy is how many of the k neighbours belong to the class the knn algorithm selected as best. So if 5 of 10 neighbours agree, you have 50%.
To solve the problem of an undefined class, you must work with a different learning algorithm such as EM, which is unsupervised and in some cases more flexible.
Regards.
Thanks
I really appreciate your valuable help. currently i am working on pit pattern classification. i want to classify patterns from cancer tumor images; right now i am trying to classify the Type III L pit pattern, which looks like a tubular shape. i hope for help from you.
I have basic knowledge of image processing with opencv
thanks
Manivasakan .
vasakan, your project is more advanced than this example, but it can help you get started. you must have a good database of image properties, train your classifier with it, and create 2 classes: tumor cancer and no tumor cancer.
There are a lot of medical papers on these themes; i recommend you search the technical papers.
Regards.
thanks
Is the KNN classifier a good choice for pattern classification? I have good knowledge of the Type III L pit pattern. I would like to use contour points for processing. can you give me your mail id? i will send the image to you for clarification.
Regards
Manivasakan.
KNN is a good classifier for most tasks, but there are more that you must check
Thank you very much.
i have an image of size 640X480 and i want to create a 128X128 template using contour points in opencv.
can you suggest a good idea for template creation?
i have tried to convert the contour points manually from 640X480 to 128X128 but it did not give a good template.
Regards
Manivasakan.
First, try using something other than pixels as parameters; i use them because they are the simplest to understand. I suggest reading scientific papers to see what the best template and algorithm are. I can’t correctly explain what is best for these computer vision tasks
Thank you very much for spending a lot of time on me.
i will read good papers on the best template and algorithm and then i will get back to you.
Thanks
Manivasakan
Hi,
Have you seen my images? I want to create a class for those images. Can you give me some ideas for template creation? I am going to create two classes, one for the Type III L pit pattern and another for non Type III L.
Is there any tool for template creation?
Thanks
Manivasakan B.
Hi!
Can I use OpenCV in recognizing the font used in a document image? I am a noob in OpenCV. Can you give me some tips to make my project possible?
Thank you in advance!
Hi,
Can I use opencv in my code for ocr on a mobile phone? because i am having a hard time working out where or how to start my project. by the way i’m a student.
i hope you can give me a more basic tutorial.
thanks in advance.
Hi, there are some opencv ports for mobiles such as symbian os or iphone; search the web for opencv mobile ports.
This tutorial is the most basic one about ocr and pattern recognition, but it’s not the best way to do pattern recognition for ocr; you must use other approaches, such as chains or similar, that simplify the data and run faster on mobile.
Regards David.
how do i segment textlines and words of a document using opencv?? this is for offline handwritten recognition..pls help
I’m using Tesseract OCR engine for my hand written character recognition project. The letters are recognized one by one in the system and finally output as words. But I want to improve the accuracy by reviewing the entire words again after the character recognition. In that case we can check whether the word has any meaning by comparing it with the words in our database.
But I don’t have any idea of how to do it with OpenCV. So please can you help me with this issue. Thanks in advance.
Thilanka.
Hi Damiles!! Your basic OCR is very good)) But I have some questions: do you use Neural Networks? And if you do, what’s the recognition method you use in this OpenCV OCR? Is it the neural network back propagation method? Where in this code are you using Neural Networks? Thanks)))
Hi EstateMX, I don’t use Neural Networks, i use a simple knn algorithm
hey there.do you have a code for motion tracking of multiple objects that doesn’t necessarily appear in the 1st frame?
Jay, i am going to prepare a new tutorial for these tasks. Wait a few days.
I would like for my robot to do two things. First to simply be able to read type written pages. Hand written is nice also like this lesson,but typewritten is what I wish for. Next, I would like for him to be able to recognize objects such as a medicine bottle, coke can, etc. Is this possible with a few mods to THIS program. Oh, I also need for it to be in OpenSource because I am on a fixed income. Your tutorial was very good!
Thanks!
Ooops! I forgot. The robot is using standard web-cams to see with.
Thanks again,
Hi MovieMaker, yes you can do it with some mods to this app; other people have used this simple sample as the base for sign language recognition. But this is a sample and it is not the best way for a real, professional project. It is good for simple projects, to learn how the functions work and the basics of pattern recognition.
Regards.
Hi!! thanks for the post
Please help me,
i’ve tried to implement the classifier but i always get the greater K; i paste some of my code:
#define cantidad 100
int clases = 2;
int K = 2;
CvMat* trainData = cvCreateMat( clases*cantidad , 36 , CV_32FC1 );
CvMat* trainClasses = cvCreateMat( clases*cantidad , 1 , CV_32FC1 );
printf("\nObteniendo datos…");
//Get the data from the file
printf("…");
trainData = ( CvMat* ) cvLoad( "datos.xml" );
printf("%d,%d",trainData->rows,trainData->cols);
CvMat row;
for( int j = 0 ; j < clases ; j++ )
    for( int i = 0 ; i < cantidad ; i++ )
    {
        //Set class label
        cvGetRow( trainClasses , &row , j*cantidad + i );
        cvSet(&row, cvRealScalar( j+1 ));
    }
printf( "…Datos obtenidos" );
printf("\nEntrenando…");
//Train the classifier
printf( "…" );
CvKNearest knn( trainData, trainClasses, 0, false, K );
printf( "…Entrenamiento terminado" );
printf("\nProbando…");
//Test it
printf( "…\n\n" );
int error = 0 , cuenta = 0 , k;
CvMat *prueba = cvCreateMat( clases*50 , 36 , CV_32FC1 );
prueba = ( CvMat* ) cvLoad( "test.xml" );
for( int j = 0 ; j < clases ; j++ )
{
    for( int i = 0 ; i <50 ; i++ )
    {
        cvGetRow( prueba , &row , j*50 + i );
        float r = knn.find_nearest( &row , K , 0 , 0 , 0 , 0 );
        printf( "%f, " , r );
        k = j+1;
        if( (int)r != k )
            error++;
        cuenta++;
    }
}
float porcentaje = ( 100. * ( (float)error ) ) / ( (float)cuenta );
cvReleaseMat( &prueba );
printf( "…prueba terminada\nError promedio: %.2f" , porcentaje );
printf("\nTodo salio bien ");
The problem is that find_nearest always returns 2 ¬_¬. i thought the problem was in the files "datos.xml" & "test.xml", but i've seen that the matrix is correct; and if instead of cvRealScalar( j+1 ) i put cvRealScalar(j), the only response is 1 ¬_¬'.
Please help me,
Thanks in advance
Greetings
Hi Ricardo, I see you use a low K value, 2. i recommend using 8 or higher for the Knn constructor, and then for find a value of 4 or higher and lower than 8.
Regards David.
ohhh, lot of thanks man!!!!
what a stupid error =P, you’re the man, i put k=10 and it’s ok ohh my god, just 2 hours before i have to present it =D
Oh by the way, when i was looking for help on the web i saw somebody who copy-pasted your post, http://leemin.tistory.com/7
Thank u again =D
is this only for handwritten images? or can it be used for other digit images (those in photos or pictures)?
Hi lalafell, this is a tutorial with an example of handwritten images, and i want to explain how to train and use basic pattern recognition with handwriting, but you can use this for any other kind of image, vehicle registration plates…; i know this tutorial was used for a sign language project.
Hi damiles ,
I am using OpenCV 1.0 and VS 2005.
I got a compilation error after adding in main.c, preprocessing.c and basicOCR.cpp.
Can you please advise what is missing in my environment to get your code compiled?
Many thanks.
Errors like
———–
Error 3 error C2061: syntax error : identifier ‘CvVectors’ c:\opencv110\ml\include\ml.h 75
Error 4 error C2059: syntax error : ‘}’ c:\opencv110\ml\include\ml.h 82
Error 5 error C2061: syntax error : identifier ‘CvStatModel’ c:\opencv110\ml\include\ml.h 128
Error 6 error C2059: syntax error : ‘;’ c:\opencv110\ml\include\ml.h 128
Error 7 error C2449: found ‘{‘ at file scope (missing function header?) c:\opencv110\ml\include\ml.h 129
Error 8 error C2059: syntax error : ‘}’ c:\opencv110\ml\include\ml.h 144
Error 9 error C2061: syntax error : identifier ‘CvKNearest’ c:\opencv110\ml\include\ml.h 188
Error 10 error C2059: syntax error : ‘;’ c:\opencv110\ml\include\ml.h 188
Hello
Sorry but i am a new programmer here and i want to make an ocr application, but unfortunately i can’t even run your code; i have visual studio 2005!!
is there any readme file or something to help?
Thanks,
Are you using opencv 1.10? try updating to the latest opencv version…
Regards.
Hi Mena, there is no readme for this example. i work on a unix system and don’t check my examples on other OSes, so i can’t help with VC++. First check that you can install and run the opencv examples; then you must create a new console project in VC++ and import my project to run it.
Regards.
Hi damiles,
it works well for me under windows xp and the system error is 9.80%
but there is something i want to ask about: when i tried to load an image instead of drawing one with the cursor, using imagen=cvLoadImage("pic.png");
it causes an exception
"OpenCV Error: Assertion failed (src.channels() == dst.channels()) in unknown function, file ..\..\..\src\cxcore\cxcopy.cpp, line 484"
Thanks,
Hi Damiles,
It’s an awesome tutorial…. But it’s for handwritten text…. I want to do character recognition on computer fonts only. Specifically, I want to remove the labels in a map. Can you please tell me how to go about it?
Thanks.
It’s the same process; you only need to train your model with computer fonts instead of handwritten text.
Hi Damiles,
it seems to me a great tutorial, but I’m a real beginner with OpenCV. Can you tell me which version of the library was used in your project?
I’m working on Windows XP with VC 2005. With the version 2.0.0 I couldn’t compile (the constructor CvKNearest::CvKNearest you used in basicOCR::train was changed) and with the 2.1.0 version I can compile but at runtime I always get an error message that the application couldn’t start.
i think i compiled it with opencv2.0. But what error do you get? what runtime error? can you debug to see where the runtime error occurs?
With version 2.0 the main problem was at runtime where I got messages like “cannot find function cvDestroyWinodow in cv200.dll” (the wrong DLL); so, I guess it’s a linking problem.
With version 2.1 first I have a compilation problem in “cvinternal.h” at line 117 (#include “pmmintrin.h”, the file is missing); I guess it’s a bug. I commented the line out and then it compiles with no problem.
At runtime in debug, when I link with the libraries ending in “d” (like: cv210d.lib), the application fails with the message: “The application was unable to start correctly (0x0150002)”, so I guess it’s another bug: the libraries are not linked correctly; it doesn’t work even if I make a program with only one call to cvNamedWindow.
At runtime release it works perfectly (I tried it today) when I link with the libraries with no ending “d” (like cv210.lib)
It’s strange. i don’t use VC++, but it could be incorrect project settings…
Regards
thanks
very nice