Basic OCR in OpenCV
In this tutorial we build a basic OCR for handwritten digits: given an image of a handwritten number, it assigns the image to its class (0 to 9).
To do this we use what we learned in earlier tutorials, namely the simple basic painter and the basic pattern recognition and classification with OpenCV tutorial.
A typical pattern recognition classifier consists of three modules:
Preprocessing: in this module we process the input image, for example normalizing its size or converting color to black and white…
Feature extraction: in this module we convert the processed image into a feature vector to classify. It can be the pixel matrix converted to a vector, or a contour chain code representation.
Classification: this module takes the feature vectors and either trains the system or classifies an input feature vector with a classification method such as kNN.
This basic OCR follows this diagram:
Here we take a training set and a test set of images to train and test our classification method (kNN).
We have 1000 handwritten images, 100 of each digit. We use 50 images of each digit (class) for training and the other 50 to test our system.
The first step is to pre-process all the training images, so we create a preprocessing function. It takes an image plus the width and height we want for the result. It finds the bounding box of the digit, copies it into the center of a white square image so the aspect ratio is kept, and scales that square to the requested size. The process is clearer in this graph:
Pre-processing code:
#include <cv.h>   // legacy OpenCV C API
#include <math.h> // floor

void findX(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    //A column has imgSrc->height pixels, so an all-white column sums to height*255
    CvScalar maxVal=cvRealScalar(imgSrc->height * 255);
    CvScalar val=cvRealScalar(0);
    //For each column sum: if sum < height*255 the column contains ink.
    //The first such column is the min, every later one becomes the new max
    for (i=0; i< imgSrc->width; i++){
        cvGetCol(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max= i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}

void findY(IplImage* imgSrc,int* min, int* max){
    int i;
    int minFound=0;
    CvMat data;
    //A row has imgSrc->width pixels, so an all-white row sums to width*255
    CvScalar maxVal=cvRealScalar(imgSrc->width * 255);
    CvScalar val=cvRealScalar(0);
    //For each row sum: if sum < width*255 the row contains ink.
    //The first such row is the min, every later one becomes the new max
    for (i=0; i< imgSrc->height; i++){
        cvGetRow(imgSrc, &data, i);
        val= cvSum(&data);
        if(val.val[0] < maxVal.val[0]){
            *max=i;
            if(!minFound){
                *min= i;
                minFound= 1;
            }
        }
    }
}

CvRect findBB(IplImage* imgSrc){
    CvRect aux;
    int xmin, xmax, ymin, ymax;
    xmin=xmax=ymin=ymax=0;

    findX(imgSrc, &xmin, &xmax);
    findY(imgSrc, &ymin, &ymax);

    aux=cvRect(xmin, ymin, xmax-xmin, ymax-ymin);

    //printf("BB: %d,%d - %d,%d\n", aux.x, aux.y, aux.width, aux.height);

    return aux;
}

IplImage preprocessing(IplImage* imgSrc,int new_width, int new_height){
    IplImage* result;
    IplImage* scaledResult;

    CvMat data;
    CvMat dataA;
    CvRect bb;//bounding box

    //Find bounding box
    bb=findBB(imgSrc);

    //Get the bounding box data, without aspect ratio correction yet
    cvGetSubRect(imgSrc, &data, cvRect(bb.x, bb.y, bb.width, bb.height));
    //Create a square image (aspect ratio 1) whose side is the larger
    //of the width and height of our bounding box
    int size=(bb.width>bb.height)?bb.width:bb.height;
    result=cvCreateImage( cvSize( size, size ), 8, 1 );
    cvSet(result,CV_RGB(255,255,255),NULL);
    //Copy the data into the center of the image
    int x=(int)floor((float)(size-bb.width)/2.0f);
    int y=(int)floor((float)(size-bb.height)/2.0f);
    cvGetSubRect(result, &dataA, cvRect(x,y,bb.width, bb.height));
    cvCopy(&data, &dataA, NULL);
    //Scale result
    scaledResult=cvCreateImage( cvSize( new_width, new_height ), 8, 1 );
    cvResize(result, scaledResult, CV_INTER_NN);

    //The square intermediate image is no longer needed
    cvReleaseImage(&result);

    //Return processed data. The header is copied by value and the
    //pixel buffer of scaledResult stays allocated for the caller
    return *scaledResult;
}
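To make the bounding box idea easier to follow, here is a minimal sketch of the same search without OpenCV. It is only an illustration: Box and findInkBox are hypothetical names that do not exist in the tutorial code, the image is assumed to be an 8-bit grayscale buffer stored row by row with a white (255) background, and unlike findBB above it counts the last ink column and row inside the box (width = xmax - xmin + 1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for this sketch: top-left corner plus size.
struct Box {
    int x, y, width, height;
};

// Bounding box of every non-white pixel (value < 255) in a rows x cols
// grayscale image stored row by row. Returns an empty box (all zeros)
// when the image contains no ink at all.
Box findInkBox(const std::vector<std::uint8_t>& pixels, int rows, int cols) {
    assert(rows > 0 && cols > 0);
    assert(pixels.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));

    int xmin = cols, xmax = -1, ymin = rows, ymax = -1;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * static_cast<std::size_t>(cols)
                                  + static_cast<std::size_t>(x);
            if (pixels[idx] < 255) {
                xmin = std::min(xmin, x);
                xmax = std::max(xmax, x);
                ymin = std::min(ymin, y);
                ymax = std::max(ymax, y);
            }
        }
    }
    if (xmax < 0) {
        return Box{0, 0, 0, 0};  // no ink found
    }
    return Box{xmin, ymin, xmax - xmin + 1, ymax - ymin + 1};
}
```

A single pass over the pixels is enough here; the tutorial code reaches the same goal by summing whole columns and rows with cvSum.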
We use the getData function of the basicOCR class to create the training data and training classes. This function reads all the images under the OCR folder to build the training data. The OCR folder has one subfolder per class, and each file is a pbm file named cnn.pbm, where c is the class {0..9} and nn is the image number {00..99}.
Each image is pre-processed and then converted into the feature vector we use.
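The file naming scheme described above can be sketched in a small helper. This is only an illustration: samplePath and its parameters are hypothetical names, not part of the basicOCR class, and the sketch assumes the base path already ends with a slash.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: path of image number `index` (0..99) of class
// `digit` (0..9), following the cnn.pbm naming inside one folder per class.
std::string samplePath(const std::string& basePath, int digit, int index) {
    assert(digit >= 0 && digit <= 9);
    assert(index >= 0 && index <= 99);
    char name[32];
    // %02d pads the image number with a leading zero when it is below 10.
    std::snprintf(name, sizeof(name), "%d/%d%02d.pbm", digit, digit, index);
    return basePath + name;
}
```

With a base path of "OCR/", class 3 and image 7 this gives "OCR/3/307.pbm".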
basicOCR.cpp getData code:
void basicOCR::getData()
{
    IplImage* src_image;
    IplImage prs_image;
    CvMat row,data;
    char file[255];
    int i,j;
    for(i =0; i<classes; i++){
        for( j = 0; j< train_samples; j++){

            //Load file
            if(j<10)
                sprintf(file,"%s%d/%d0%d.pbm",file_path, i, i , j);
            else
                sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Cant load image %s\n", file);
                //exit(-1);
            }
            //process file
            prs_image = preprocessing(src_image, size, size);

            //Set class label
            cvGetRow(trainClasses, &row, i*train_samples + j);
            cvSet(&row, cvRealScalar(i));
            //Set data
            cvGetRow(trainData, &row, i*train_samples + j);

            IplImage* img = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
            //convert 8 bits image to 32 float image (0.0039215 is about 1/255)
            cvConvertScale(&prs_image, img, 0.0039215, 0);

            cvGetSubRect(img, &data, cvRect(0,0, size,size));

            CvMat row_header, *row1;
            //convert data matrix sizexsize to a row vector
            row1 = cvReshape( &data, &row_header, 0, 1 );
            cvCopy(row1, &row, NULL);
            //the pixels are now stored in trainData, so release the temporary image
            cvReleaseImage(&img);
        }
    }
}
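The feature extraction step inside getData (scale the 8-bit pixels to floats, then flatten the matrix into one row) can be sketched without OpenCV. This is a simplified illustration, not the tutorial's implementation: toFeatureVector is a hypothetical name, and the image is assumed to be stored row by row in a plain buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: turn a rows x cols 8-bit grayscale image, stored
// row by row, into a feature vector of rows*cols floats in the range [0, 1].
std::vector<float> toFeatureVector(const std::vector<std::uint8_t>& pixels,
                                   int rows, int cols) {
    assert(rows > 0 && cols > 0);
    assert(pixels.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));

    std::vector<float> features;
    features.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        // Same idea as the 0.0039215 (about 1/255) scale factor in getData.
        features.push_back(static_cast<float>(p) / 255.0f);
    }
    return features;
}
```

Because the buffer is already stored row by row, flattening needs no extra work: the vector simply keeps the pixels in reading order, which is what the reshape to a single row does in the tutorial code.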
After processing the images and building the training data and classes, we train our model with this data. In our sample we use the kNN method:
knn=new CvKNearest( trainData, trainClasses, 0, false, K );
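For readers who have not met kNN before, here is a minimal brute-force sketch of what a k-nearest-neighbour classifier does in general. It is not the CvKNearest implementation: knnClassify and Sample are hypothetical names, the distance is assumed to be squared Euclidean, and ties between classes are resolved in favour of the lowest label.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Sample = std::vector<float>;  // one feature vector

// Hypothetical brute-force kNN: returns the label that is most frequent
// among the k training samples closest to the query.
int knnClassify(const std::vector<Sample>& trainData,
                const std::vector<int>& trainLabels,
                const Sample& query, std::size_t k) {
    assert(!trainData.empty());
    assert(trainData.size() == trainLabels.size());
    assert(k >= 1 && k <= trainData.size());

    // Squared Euclidean distance from the query to every training sample.
    std::vector<std::pair<float, int>> dist;  // (distance, label)
    dist.reserve(trainData.size());
    for (std::size_t i = 0; i < trainData.size(); ++i) {
        assert(trainData[i].size() == query.size());
        float d = 0.0f;
        for (std::size_t j = 0; j < query.size(); ++j) {
            const float diff = trainData[i][j] - query[j];
            d += diff * diff;
        }
        dist.emplace_back(d, trainLabels[i]);
    }

    // Bring the k smallest distances to the front, in sorted order.
    std::partial_sort(dist.begin(),
                      dist.begin() + static_cast<std::ptrdiff_t>(k),
                      dist.end());

    // Majority vote among the labels of the k nearest samples.
    std::map<int, int> votes;
    for (std::size_t i = 0; i < k; ++i) {
        ++votes[dist[i].second];
    }
    int best = dist[0].second;
    int bestVotes = 0;
    for (const auto& v : votes) {
        if (v.second > bestVotes) {  // strict '>' keeps the lowest label on ties
            best = v.first;
            bestVotes = v.second;
        }
    }
    return best;
}
```

There is no real training step in this sketch: the "model" is just the stored samples and labels, which is why kNN gets slower as the training set or the feature vector grows.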
Now we can test our model, and use the test result to compare against other methods, or to see the effect of changes such as reducing the image scale. Our basicOCR class has a function that runs this test, the test function.
This function takes the other 500 samples, classifies them with our selected method and checks the results.
void basicOCR::test(){
    IplImage* src_image;
    IplImage prs_image;
    CvMat row,data;
    char file[255];
    int i,j;
    int error=0;
    int testCount=0;
    for(i =0; i<classes; i++){
        for( j = 50; j< 50+train_samples; j++){

            sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
            src_image = cvLoadImage(file,0);
            if(!src_image){
                printf("Error: Cant load image %s\n", file);
                //exit(-1);
            }
            //process file
            prs_image = preprocessing(src_image, size, size);
            float r=classify(&prs_image,0);
            if((int)r!=i)
                error++;

            testCount++;
        }
    }
    float totalerror=100*(float)error/(float)testCount;
    printf("System Error: %.2f%%\n", totalerror);
}
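The error figure printed by test is simply the share of misclassified test samples. A minimal sketch of that computation follows, assuming the predicted and expected labels have already been collected into two vectors; errorPercent is a hypothetical name that does not appear in basicOCR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: percentage of samples whose predicted label
// differs from the expected one.
float errorPercent(const std::vector<int>& predicted,
                   const std::vector<int>& expected) {
    assert(!expected.empty());
    assert(predicted.size() == expected.size());

    std::size_t errors = 0;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (predicted[i] != expected[i]) {
            ++errors;
        }
    }
    return 100.0f * static_cast<float>(errors) / static_cast<float>(expected.size());
}
```

One wrong answer out of four test samples, for example, gives an error of 25%.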
test uses the classify function, which takes the image to classify, pre-processes it, builds the feature vector and classifies it with the find_nearest method of the knn class. This is also the function we use to classify the user's input images:
float basicOCR::classify(IplImage* img, int showResult)
{
    IplImage prs_image;
    CvMat data;
    CvMat* nearest=cvCreateMat(1,K,CV_32FC1);
    float result;
    //process file
    prs_image = preprocessing(img, size, size);

    //Set data
    IplImage* img32 = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
    //convert 8 bits image to 32 float image (0.0039215 is about 1/255)
    cvConvertScale(&prs_image, img32, 0.0039215, 0);
    cvGetSubRect(img32, &data, cvRect(0,0, size,size));
    CvMat row_header, *row1;
    //convert data matrix sizexsize to a row vector
    row1 = cvReshape( &data, &row_header, 0, 1 );

    //result is the winning class, nearest receives the labels of the K neighbours
    result=knn->find_nearest(row1,K,0,0,nearest,0);

    //count how many of the K neighbours agree with the result
    int accuracy=0;
    for(int i=0;i<K;i++){
        if( nearest->data.fl[i] == result)
            accuracy++;
    }
    float pre=100*((float)accuracy/(float)K);
    if(showResult==1){
        printf("|\t%.0f\t| \t%.2f%% \t| \t%d of %d \t| \n",result,pre,accuracy,K);
        printf(" ---------------------------------------------------------------\n");
    }

    //release the temporaries created in this call
    cvReleaseImage(&img32);
    cvReleaseMat(&nearest);
    return result;
}
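The percentage that classify prints is the share of the K nearest neighbours whose label matches the winning class. A minimal sketch of that calculation is shown here; voteShare is a hypothetical name, and the neighbour labels are assumed to be available as a plain vector of floats, as they are in the nearest matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: percentage of neighbour labels equal to the winner.
float voteShare(const std::vector<float>& neighbourLabels, float winner) {
    assert(!neighbourLabels.empty());

    std::size_t agree = 0;
    for (float label : neighbourLabels) {
        // Labels are small whole numbers stored as floats, so an exact
        // comparison is safe here.
        if (label == winner) {
            ++agree;
        }
    }
    return 100.0f * static_cast<float>(agree) / static_cast<float>(neighbourLabels.size());
}
```

Note that this value only says how unanimous the neighbours were. A drawing that is not a digit can still get 100% if all of its nearest neighbours happen to belong to the same class.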
All the work of training and testing is done inside the basicOCR class. Once we create a basicOCR instance, we only need to call the classify function to classify our input image. We then use the basic Painter we created in an earlier tutorial to let the user draw an image interactively and classify it.
122 Comments to “Basic OCR in OpenCV”
You can replace four lines by one:
———————————-
if(j<10)
sprintf(file,"%s%d/%d0%d.pbm",file_path, i, i , j);
else
sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
———————————-
sprintf(file,"%s%d/%d%02d.pbm",file_path, i, i , j);
———————————-
This is basic C know-how
Thanks Tönu XD
Hello damiles,
you are a good teacher for my favorite computer vision library
I am looking for a segmentation solution in OpenCV, especially for handwritten letters (described in http://yann.lecun.com/exdb/publis/#lecun-98, e.g. Heuristic Over Segmentation). As I am still a beginner with C/C++ and not very familiar with complex mathematical formulas, I’d like to know if you know of any code example using this method, or if you could give some helpful hints or starting points.
Any help would be great!
Thank you,
Qazi
Hey,
Thanks for your article.
However, I’m quite disappointed with this very line :
cvCopy(row1, &row, NULL);
which basically means that you train the classifier with the *whole image*.
How long did the knn classifier take for those 50 samples of 40² pixels ?
More importantly : which “high level” criterion would you recommend (area, spline contour parameters, moments…) ?
thanks!
Arnaud, you are the first to analyze the code and give me good questions, doubts or disagreement, THANKS!!
First, to train the classifier I use all the image pixels. I know it’s not the best criterion, but I use it to explain the basics of pattern recognition in OCR.
Then in cvCopy(row1, &row, NULL); I copy all the pixels of each image into a row for classification. First I get a row of my training matrix “trainData”
cvGetRow(trainData, &row, i*train_samples + j);
then, once I have the pixels in an array (row1), I copy it into that row, which points into trainData.
cvCopy(row1, &row, NULL);
For the kNN classifier I take 40² samples. I think you took the number 50 from the function void basicOCR::test(), line 10, “for( j = 50;…”. In this function I classify 50 random numbers to get an error. This function only gives statistics for my classifier: I can test the kNN with 50 random numbers and get the resulting error.
And of course, most important, the criterion: I recommend a criterion other than raw pixels, such as contours or moments. To select the criterion for this case you must test them all (moments, pixels, splines, area, eigenvalues…), select the best, and test different image sizes and every setup you can think of.
I hope this answers all your questions. I’m not an expert.
Regards, David.
Hi
first of all thanks for this wonderful tutorial.
I am doing a project that needs to extract a string of numbers from an image. I can get an image where it only contains numbers.
My problem is how can i segment the image so i can get each number and test it against the sample?
You can use the cvBlobsLib library http://opencv.willowgarage.com/wiki/cvBlobsLib
Or use any type of segmentation, or use contours or similar.
Regards.
damiles, thanks for this great tutorial. it was actually very interesting and complete!
Hey…very nice tutorials. I was wondering if you have any tutorial on human action recognition as well…using some feature point descriptors and then training the model to classify the actions….
Hi,
I compiled and ran your code in VC++. I changed your pbm images to jpg, but it does not give a good result: if I draw a single horizontal line, it is classified as class 4. Is any code change needed? Please help me.
Thanks in advance
Manivasakan B
Sorry vasakan, I don’t understand what error you have… Can you explain it in more detail?
Thanks
Thank you very much for your reply.
If I draw a number from 0 to 9, it is classified into the corresponding class. But if I draw a shape that is not a number (a different shape), that shape is also classified into a number class, with 100% accuracy.
How can I solve this problem?
Thanks
Manivasakan.
Ah, OK. The system is prepared for only 10 classes, the digits, so you must draw only digits. If you draw something that is not a digit, the system returns the digit class closest to what you drew. If you want a "not a number" class, you must train with a new class for the non-number objects, but that is more complex.
The accuracy is how many of the k neighbours belong to the class that the kNN algorithm selects as best. So if 5 of 10 neighbours agree, you get 50%.
To solve the problem of inputs with no defined class, you must work with a different learning algorithm such as EM, which is unsupervised and in some cases more flexible.
Regards.
Thanks
I really appreciate your valuable help. Currently I am working on pit pattern classification. I want to classify patterns from cancer tumor images. Now I am trying to classify the Type III L pit pattern, which has a tubular shape. I hope for your help.
I have basic knowledge in image processing with opencv
thanks
Manivasakan .
vasakan, your project is more advanced than this example, but it can help you get started. You must have a good database of image properties, train your classifier with it, and create 2 classes: tumor and no tumor.
There are a lot of medical papers on these topics; I recommend you search the technical papers.
Regards.
thanks
Is a KNN classifier a good choice for pattern classification? I have good knowledge of the Type III L pit pattern. I would like to use contour points for processing. Can you give me your mail id? I will send you the image for clarification.
Regards
Manivasakan.
KNN is a good classifier for most tasks, but there are others that you should check.
Thank you very much.
I have an image of size 640x480 and I want to create a 128x128 template using contour points in OpenCV.
Can you suggest a good approach for template creation?
I tried to convert the contour points manually from 640x480 to 128x128, but it did not give a good template.
Regards
Manivasakan.
First, try to use parameters other than raw pixels; I use pixels because they are the simplest to understand. I suggest reading scientific papers to see which template and algorithm are best. I can’t properly explain which are best for these computer vision tasks.
Thank you very much for spending a lot of time on me.
I will read good papers on the best template and algorithm and then get back to you.
Thanks
Manivasakan
Hi,
Have you seen my images? I want to create the class for those images. Can you give me some ideas for template creation? I am going to create two classes, one for the Type III L pit pattern and another for non Type III L.
Is there any tool for template creation?
Thanks
Manivasakan B.
Hi!
Can I use OpenCV in recognizing the font used in a document image? I am a noob in OpenCV. Can you give me some tips to make my project possible?
Thank you in advance!