Basic OCR in OpenCV
Update! The demo now builds with CMake, the cross-platform, open-source build system.
In this tutorial we are going to create a basic digit OCR, which classifies a handwritten digit into its class.
To do this, we will use what we learned in previous tutorials: the simple basic painter and the basic pattern recognition and classification with OpenCV tutorial.
A typical pattern recognition classifier consists of three modules:
Preprocessing: this module processes the input image, for example normalizing its size, converting color to black and white (B/W)…
Feature extraction: this module converts the processed image into a feature vector to classify; the vector can be the pixel matrix reshaped into a vector, or a contour chain-code representation of the data.
Classification: this module takes the feature vectors and either trains the system or classifies an input feature vector with a classification method such as kNN.
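As a small illustration of the preprocessing module's "convert color to B/W" step, here is a plain C++ sketch that does not use OpenCV. The RGB struct and the fixed threshold of 128 are assumptions made for this example; the gray conversion uses the standard ITU-R BT.601 weights (0.299, 0.587, 0.114), and the output keeps the black-ink-on-white-paper convention the bounding-box code below relies on.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGB pixel type used only for this sketch.
struct RGB { std::uint8_t r, g, b; };

// Convert one RGB pixel to gray with the BT.601 luma weights, rounded.
inline std::uint8_t toGray(const RGB& p) {
    return static_cast<std::uint8_t>(0.299 * p.r + 0.587 * p.g + 0.114 * p.b + 0.5);
}

// Gray conversion followed by a fixed (assumed) threshold:
// 0 = black ink, 255 = white paper.
std::vector<std::uint8_t> toBlackWhite(const std::vector<RGB>& pixels,
                                       std::uint8_t threshold = 128) {
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (const RGB& p : pixels)
        out.push_back(toGray(p) < threshold ? 0 : 255);
    return out;
}
```

A fixed threshold is the simplest choice; scanned images with uneven lighting would likely need an adaptive one.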
In this basic OCR we use this diagram:
We take a training set and a test set of images to train and test our classifier (kNN).
We have 1000 handwritten images, 100 images of each digit. We use 50 images of each digit (class) to train and the other 50 to test our system.
The first step is to preprocess all the training images, so we create a preprocessing function. This function takes an image plus the width and height we want as output, and returns the image cropped to its bounding box and normalized in size. You can see the process more clearly in this diagram:
Pre-processing code:
#include <cv.h>
#include <math.h>

//Find the first (min) and last (max) columns that contain ink.
//A column of white pixels sums to height*255, so any smaller sum means ink.
void findX(IplImage* imgSrc,int* min, int* max){
int i;
int minFound=0;
CvMat data;
CvScalar maxVal=cvRealScalar(imgSrc->height * 255);
CvScalar val=cvRealScalar(0);
//The first column with sum < height*255 is the min;
//every later column with sum < height*255 becomes the new max
for (i=0; i< imgSrc->width; i++){
cvGetCol(imgSrc, &data, i);
val= cvSum(&data);
if(val.val[0] < maxVal.val[0]){
*max= i;
if(!minFound){
*min= i;
minFound= 1;
}
}
}
}
//Find the first (min) and last (max) rows that contain ink.
//A row of white pixels sums to width*255.
void findY(IplImage* imgSrc,int* min, int* max){
int i;
int minFound=0;
CvMat data;
CvScalar maxVal=cvRealScalar(imgSrc->width * 255);
CvScalar val=cvRealScalar(0);
//The first row with sum < width*255 is the min;
//every later row with sum < width*255 becomes the new max
for (i=0; i< imgSrc->height; i++){
cvGetRow(imgSrc, &data, i);
val= cvSum(&data);
if(val.val[0] < maxVal.val[0]){
*max=i;
if(!minFound){
*min= i;
minFound= 1;
}
}
}
}
CvRect findBB(IplImage* imgSrc){
CvRect aux;
int xmin, xmax, ymin, ymax;
xmin=xmax=ymin=ymax=0;
findX(imgSrc, &xmin, &xmax);
findY(imgSrc, &ymin, &ymax);
//+1 so that the last ink column and row are inside the box
aux=cvRect(xmin, ymin, xmax-xmin+1, ymax-ymin+1);
//printf("BB: %d,%d - %d,%d\n", aux.x, aux.y, aux.width, aux.height);
return aux;
}
IplImage preprocessing(IplImage* imgSrc,int new_width, int new_height){
IplImage* result;
IplImage* scaledResult;
CvMat data;
CvMat dataA;
CvRect bb;//bounding box
//Find bounding box
bb=findBB(imgSrc);
//Get the bounding box data (its aspect ratio is not preserved yet)
cvGetSubRect(imgSrc, &data, cvRect(bb.x, bb.y, bb.width, bb.height));
//Create a white square image (aspect ratio 1) whose side is the
//larger of the bounding box width and height
int size=(bb.width>bb.height)?bb.width:bb.height;
result=cvCreateImage( cvSize( size, size ), 8, 1 );
cvSet(result,CV_RGB(255,255,255),NULL);
//Copy the data into the center of the image
int x=(int)floor((float)(size-bb.width)/2.0f);
int y=(int)floor((float)(size-bb.height)/2.0f);
cvGetSubRect(result, &dataA, cvRect(x,y,bb.width, bb.height));
cvCopy(&data, &dataA, NULL);
//Scale result to the requested size
scaledResult=cvCreateImage( cvSize( new_width, new_height ), 8, 1 );
cvResize(result, scaledResult, CV_INTER_NN);
//The intermediate square image is no longer needed
cvReleaseImage(&result);
//Return processed data. The returned header copy shares scaledResult's
//pixel buffer, so that buffer must stay allocated here
return *scaledResult;
}
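For readers without OpenCV at hand, the same preprocessing idea can be sketched in plain C++. This is a hypothetical re-implementation, not the tutorial's code: the Gray and Box types are made up here, the bounding box is found by scanning for non-white pixels (equivalent to checking whether a column or row sum falls below its all-white total), and the final scaling is a simple nearest-neighbour lookup that may not match cvResize pixel for pixel.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Minimal grayscale image for this sketch (row-major, 0 = ink, 255 = paper).
struct Gray {
    int w, h;
    std::vector<std::uint8_t> px;
    std::uint8_t at(int x, int y) const { return px[y * w + x]; }
};

struct Box { int x, y, w, h; };

// Bounding box of all non-white pixels; the whole image if it is blank.
Box boundingBox(const Gray& img) {
    int xmin = img.w, xmax = -1, ymin = img.h, ymax = -1;
    for (int y = 0; y < img.h; ++y)
        for (int x = 0; x < img.w; ++x)
            if (img.at(x, y) < 255) {
                xmin = std::min(xmin, x); xmax = std::max(xmax, x);
                ymin = std::min(ymin, y); ymax = std::max(ymax, y);
            }
    if (xmax < 0) return {0, 0, img.w, img.h};
    return {xmin, ymin, xmax - xmin + 1, ymax - ymin + 1};
}

// Crop to the bounding box, centre it on a white square canvas (keeping the
// aspect ratio), then scale with nearest-neighbour sampling to outW x outH.
Gray normalize(const Gray& img, int outW, int outH) {
    Box bb = boundingBox(img);
    int size = std::max(bb.w, bb.h);
    Gray sq{size, size, std::vector<std::uint8_t>(size * size, 255)};
    int ox = (size - bb.w) / 2, oy = (size - bb.h) / 2;
    for (int y = 0; y < bb.h; ++y)
        for (int x = 0; x < bb.w; ++x)
            sq.px[(oy + y) * size + (ox + x)] = img.at(bb.x + x, bb.y + y);

    Gray out{outW, outH, std::vector<std::uint8_t>(outW * outH)};
    for (int y = 0; y < outH; ++y)
        for (int x = 0; x < outW; ++x)
            out.px[y * outW + x] = sq.at(x * size / outW, y * size / outH);
    return out;
}
```

Centring on a square canvas before scaling is what keeps a thin digit such as "1" from being stretched into a block.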
We use the getData function of the basicOCR class to create the training data and training classes. This function reads all the images under the OCR folder, which contains one subfolder per class; each file is a PBM image named cnn.pbm, where c is the class {0..9} and nn is the image number {00..99}.
Each image is preprocessed and its data is then converted into the feature vector we use.
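The file naming scheme can be sketched as a small helper. The function names samplePath and isTrainSample are hypothetical; root stands in for basicOCR's file_path and, judging by the "%s%d/..." format string used in getData, is expected to end with '/'. The %02d conversion zero-pads nn to two digits, giving names such as 307.pbm for sample 07 of class 3.

```cpp
#include <cstdio>
#include <string>

// Path of sample nn of class c: <root><c>/<c><nn>.pbm, nn zero-padded.
std::string samplePath(const std::string& root, int c, int nn) {
    char buf[512];
    std::snprintf(buf, sizeof buf, "%s%d/%d%02d.pbm", root.c_str(), c, c, nn);
    return buf;
}

// Split used in the tutorial: samples 00-49 train, 50-99 test.
inline bool isTrainSample(int nn) { return nn < 50; }
```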
basicOCR.cpp getData code:
void basicOCR::getData()
{
IplImage* src_image;
IplImage prs_image;
CvMat row,data;
char file[255];
int i,j;
for(i =0; i<classes; i++){
for( j = 0; j< train_samples; j++){
//Load file: <file_path><class>/<class><nn>.pbm, nn zero-padded to 2 digits
sprintf(file,"%s%d/%d%02d.pbm",file_path, i, i , j);
src_image = cvLoadImage(file,0);
if(!src_image){
printf("Error: Can't load image %s\n", file);
//Every row of trainData must be filled, so stop here
exit(-1);
}
//process file
prs_image = preprocessing(src_image, size, size);
//Set class label
cvGetRow(trainClasses, &row, i*train_samples + j);
cvSet(&row, cvRealScalar(i));
//Set data
cvGetRow(trainData, &row, i*train_samples + j);
IplImage* img = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
//convert the 8-bit image to a 32-bit float image, scaling 0..255 to 0..1
//(0.0039215 ~ 1/255)
cvConvertScale(&prs_image, img, 0.0039215, 0);
cvGetSubRect(img, &data, cvRect(0,0, size,size));
CvMat row_header, *row1;
//convert the size x size data matrix to a 1 x (size*size) row vector
row1 = cvReshape( &data, &row_header, 0, 1 );
cvCopy(row1, &row, NULL);
//Release the images created for this sample
cvReleaseImage(&img);
cvReleaseImage(&src_image);
cvReleaseData(&prs_image);
}
}
}
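The feature-vector step in getData (convert to float, scale by 0.0039215, reshape to one row, copy into trainData) can be sketched without OpenCV as follows. The flat row-major float buffer standing in for trainData and both helper names are assumptions for this example.

```cpp
#include <cstdint>
#include <vector>

// Turns a size x size 8-bit image (row-major) into one feature row,
// scaling each pixel from 0..255 to 0..1.
std::vector<float> toFeatureRow(const std::vector<std::uint8_t>& pixels) {
    std::vector<float> row;
    row.reserve(pixels.size());
    for (std::uint8_t p : pixels)
        row.push_back(p / 255.0f);
    return row;
}

// Copies a feature row into row `index` of a flat trainData buffer with
// `cols` columns (the role cvGetRow + cvCopy play in getData).
void setTrainRow(std::vector<float>& trainData, int cols, int index,
                 const std::vector<float>& row) {
    for (int k = 0; k < cols; ++k)
        trainData[index * cols + k] = row[k];
}
```

With sample index i*train_samples + j, every image of class i lands in a contiguous block of rows, matching the label rows written to trainClasses.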
After preprocessing and collecting the training data and classes, we train our model with this data; in our sample we use the kNN method:
knn=new CvKNearest( trainData, trainClasses, 0, false, K );
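CvKNearest stores the training samples and, at query time, lets the k closest samples vote. The sketch below shows that idea in plain C++ with brute-force squared Euclidean distance; it is a simplified stand-in, not OpenCV's implementation, and its tie-breaking rule (smallest label wins) is a choice made here.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Brute-force k-nearest-neighbour classifier with majority vote.
struct KNN {
    std::vector<std::vector<float>> samples;
    std::vector<int> labels;
    int k;

    int classify(const std::vector<float>& x) const {
        std::vector<std::pair<float, int>> dist;  // (squared distance, label)
        for (std::size_t i = 0; i < samples.size(); ++i) {
            float d = 0;
            for (std::size_t j = 0; j < x.size(); ++j) {
                float diff = samples[i][j] - x[j];
                d += diff * diff;
            }
            dist.emplace_back(d, labels[i]);
        }
        // If k exceeds the training set size, every sample votes.
        std::size_t kk = std::min<std::size_t>(k, dist.size());
        std::partial_sort(dist.begin(), dist.begin() + kk, dist.end());
        std::map<int, int> votes;  // label -> number of votes
        for (std::size_t i = 0; i < kk; ++i) ++votes[dist[i].second];
        int best = -1, bestCount = 0;
        for (const auto& v : votes)  // ascending labels: ties keep the smaller
            if (v.second > bestCount) { best = v.first; bestCount = v.second; }
        return best;
    }
};
```

Note what happens when k is larger than the number of training samples: all samples vote, so the answer degenerates to the majority class of the whole training set.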
Now we can test our model. The test result can be compared against other methods, or used to measure the effect of changes such as reducing the image scale. The basicOCR class has a function for this, test().
This function takes the other 500 samples, classifies them with the selected method and checks the results.
void basicOCR::test(){
IplImage* src_image;
char file[255];
int i,j;
int error=0;
int testCount=0;
for(i =0; i<classes; i++){
//Test samples are numbered 50..99, right after the training samples
for( j = 50; j< 50+train_samples; j++){
sprintf(file,"%s%d/%d%d.pbm",file_path, i, i , j);
src_image = cvLoadImage(file,0);
if(!src_image){
printf("Error: Can't load image %s\n", file);
continue;
}
//classify() already preprocesses its input, so pass the raw image
//(preprocessing it here as well would process test images twice)
float r=classify(src_image,0);
if((int)r!=i)
error++;
testCount++;
cvReleaseImage(&src_image);
}
}
float totalerror=100*(float)error/(float)testCount;
printf("System Error: %.2f%%\n", totalerror);
}
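The error figure printed by test() is 100 × errors / testCount. A plain C++ sketch of that computation, plus a hypothetical per-class breakdown that can show which digits are misclassified most often, might look like this:

```cpp
#include <cstddef>
#include <vector>

// Percentage of misclassified samples; 0 for an empty test set.
float errorPercent(const std::vector<int>& predicted,
                   const std::vector<int>& expected) {
    if (predicted.empty()) return 0.0f;
    int errors = 0;
    for (std::size_t i = 0; i < predicted.size(); ++i)
        if (predicted[i] != expected[i]) ++errors;
    return 100.0f * errors / predicted.size();
}

// Number of errors for each true class 0..numClasses-1.
std::vector<int> errorsPerClass(const std::vector<int>& predicted,
                                const std::vector<int>& expected,
                                int numClasses) {
    std::vector<int> errors(numClasses, 0);
    for (std::size_t i = 0; i < predicted.size(); ++i)
        if (predicted[i] != expected[i]) ++errors[expected[i]];
    return errors;
}
```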
test() uses the classify function, which takes an image, preprocesses it, extracts the feature vector and classifies it with the find_nearest method of the kNN class. We also use this function to classify the user's input images:
float basicOCR::classify(IplImage* img, int showResult)
{
IplImage prs_image;
CvMat data;
CvMat* nearest=cvCreateMat(1,K,CV_32FC1);
float result;
//process file
prs_image = preprocessing(img, size, size);
//Set data: same 8-bit to 0..1 float conversion and reshape as in getData
IplImage* img32 = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
cvConvertScale(&prs_image, img32, 0.0039215, 0);
cvGetSubRect(img32, &data, cvRect(0,0, size,size));
CvMat row_header, *row1;
row1 = cvReshape( &data, &row_header, 0, 1 );
//result is the winning class; nearest receives the labels of the K neighbours
result=knn->find_nearest(row1,K,0,0,nearest,0);
//Count how many of the K neighbours voted for the winning class
int accuracy=0;
for(int i=0;i<K;i++){
if( nearest->data.fl[i] == result)
accuracy++;
}
float pre=100*((float)accuracy/(float)K);
if(showResult==1){
printf("|\t%.0f\t| \t%.2f%% \t| \t%d of %d \t| \n",result,pre,accuracy,K);
printf(" ---------------------------------------------------------------\n");
}
//Release the temporary data
cvReleaseMat(&nearest);
cvReleaseImage(&img32);
cvReleaseData(&prs_image);
return result;
}
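The percentage classify() prints next to the result is the share of the K neighbours whose label matches the winning class, so it measures neighbour agreement rather than true accuracy. A minimal sketch of that calculation, with a hypothetical function name:

```cpp
#include <vector>

// Percentage of neighbour labels equal to the winning label.
float neighbourAgreement(const std::vector<float>& neighbourLabels, float result) {
    if (neighbourLabels.empty()) return 0.0f;
    int agree = 0;
    for (float label : neighbourLabels)
        if (label == result) ++agree;
    return 100.0f * agree / neighbourLabels.size();
}
```

For example, with K = 10 and eight neighbours labelled 3, a result of 3 is reported with 80% agreement.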
All the training and test work is in the basicOCR class: once we create a basicOCR instance, we only need to call the classify function to classify an input image. Finally, we use the basic painter created in a previous tutorial for user interaction, so the user can draw an image and have it classified.
Demo Source
Demo Source with CMake build
164 Comments to “Basic OCR in OpenCV”
Hi,
I'm having difficulties compiling/binding your package on Mac OS. I have OpenCV set up; any suggestions?
Thanks,
Hi sergio, what is the problem? What error do you get? It compiles perfectly for me on Mac OS. I can create a CMake build for cross-platform compiling if that is better for you.
This might be a stupid question, but I’m trying to understand how and where in your code you extract features from the images in your samples. More specifically how are you extracting the features and what features are you looking at for the images of the numbers? How are you evaluating these features when comparing it to the image file you are testing the accuracy of the classifier on?
There are no stupid questions.
The features are the pixels of the image.
This is done in the "basicOCR::getData()" function.
I use the trainData variable to store the features,
and trainClasses to store the class identifiers.
First you get the row of trainData we are going to use:
cvGetRow(trainData, &row, i*train_samples + j);
Then create a 32-bit float image of the same size:
IplImage* img = cvCreateImage( cvSize( size, size ), IPL_DEPTH_32F, 1 );
And convert the preprocessed image into it:
cvConvertScale(&prs_image, img, 0.0039215, 0);
I convert the image matrix into a data vector:
cvGetSubRect(img, &data, cvRect(0,0, size,size));
CvMat row_header, *row1;
row1 = cvReshape( &data, &row_header, 0, 1 );
We store the data into row, which points to the selected row of trainData:
cvCopy(row1, &row, NULL);
Hi damiles,
I really appreciate your example code.
I am a newbie to OpenCV. I was assigned a project to separate the text from an image, then recognize the text and also analyse the image structure (with the text removed). But I really have no idea how to separate the text from the image. Any suggestions or example code?
Thank you in advance!
hello,
can you help me?
I'm still studying OpenCV and I would like to compile your application with VC++ (MS Visual Studio C++). Do you have any example?
Hi Togan, I don't have any example that separates text from an image, sorry.
Has, it's not difficult to compile with VC++; you only need to set up the libraries correctly. I will create a CMake build for this example, for correct cross-platform compiling.
hello,
I just downloaded OpenCV and was trying to get the hang of it. I was going through the OCR code because I want to work on OCR too... When I checked the findX and findY functions, it seemed to me that this will only work for text on white backgrounds. Am I right? How do I work around that? Also, how do I get the color of a pixel, e.g. I want to test if (20,30) is black?
Yes, you're right.
To test the color of the pixel at position x,y, you can read the FAQ documentation that comes with the OpenCV source (doc/faq.htm).
The FAQ says:
How to access image pixels
(The coordinates are 0-based and counted from image origin, either top-left (img->origin=IPL_ORIGIN_TL) or bottom-left (img->origin=IPL_ORIGIN_BL)
* Suppose, we have 8-bit 1-channel image I (IplImage* img):
I(x,y) ~ ((uchar*)(img->imageData + img->widthStep*y))[x]
* Suppose, we have 8-bit 3-channel image I (IplImage* img):
I(x,y)blue ~ ((uchar*)(img->imageData + img->widthStep*y))[x*3]
I(x,y)green ~ ((uchar*)(img->imageData + img->widthStep*y))[x*3+1]
I(x,y)red ~ ((uchar*)(img->imageData + img->widthStep*y))[x*3+2]
e.g. increasing brightness of point (100,100) by 30 can be done this way:
CvPoint pt = {100,100};
((uchar*)(img->imageData + img->widthStep*pt.y))[pt.x*3] += 30;
((uchar*)(img->imageData + img->widthStep*pt.y))[pt.x*3+1] += 30;
((uchar*)(img->imageData + img->widthStep*pt.y))[pt.x*3+2] += 30;
or more efficiently
CvPoint pt = {100,100};
uchar* temp_ptr = &((uchar*)(img->imageData + img->widthStep*pt.y))[pt.x*3];
temp_ptr[0] += 30;
temp_ptr[1] += 30;
temp_ptr[2] += 30;
* Suppose, we have 32-bit floating point, 1-channel image I (IplImage* img):
I(x,y) ~ ((float*)(img->imageData + img->widthStep*y))[x]
* Now, the general case: suppose, we have N-channel image of type T:
I(x,y)c ~ ((T*)(img->imageData + img->widthStep*y))[x*N + c]
or you may use macro CV_IMAGE_ELEM( image_header, elemtype, y, x_Nc )
I(x,y)c ~ CV_IMAGE_ELEM( img, T, y, x*N + c )
There are functions that work with arbitrary (up to 4-channel) images and matrices (cvGet2D, cvSet2D), but they are pretty slow.
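To make the FAQ's address arithmetic concrete, here is a plain C++ sketch that emulates an 8-bit image buffer, including the general N-channel form. The Img8 struct is an assumption standing in for IplImage; the key point is that rows are widthStep bytes apart, which can be more than width × channels because rows may be padded.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for an 8-bit IplImage: `width` is in pixels, `widthStep` is the
// number of bytes from one row to the next (rows may be padded, typically
// to a 4-byte boundary).
struct Img8 {
    int width, height, widthStep;
    std::vector<std::uint8_t> data;
};

// 1-channel access, same arithmetic as the FAQ:
// ((uchar*)(img->imageData + img->widthStep*y))[x]
inline std::uint8_t& pixel(Img8& img, int x, int y) {
    return img.data[img.widthStep * y + x];
}

// N-channel access: channel c of pixel (x, y), i.e. [x*N + c]
// (for a 3-channel BGR image: c = 0 blue, 1 green, 2 red).
inline std::uint8_t& channel(Img8& img, int x, int y, int c, int n) {
    return img.data[img.widthStep * y + x * n + c];
}

// "Is (x, y) black?" for a 1-channel binarized image.
inline bool isBlack(Img8& img, int x, int y) { return pixel(img, x, y) == 0; }
```

Indexing with width instead of widthStep only works when there is no padding, which is a common source of skewed-looking images.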
Hi,
Could you please explain your getData function to me... I don't understand the functions you've used and why you've used them.
1. How exactly are you setting trainData? Are trainClasses and trainData initialized to something, or do I just declare them as CvMat* variables?
2. I know cvGetRow gets a row from a matrix, but what is this line doing exactly:
cvGetRow(trainData, &row, i*train_samples + j);
Hi again,
I got it going... it took me time to understand but I finally did. Thanks for the earlier reply, btw. Just a lingering doubt: every time I run the program, do I need to train it?
Yes, you need to do this if you don't save the training data, because you need this information to classify.
You can save this data with OpenCV's data-saving functions, so you don't have to process the training images again.
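OpenCV's cvSave and cvLoad functions can write a CvMat such as trainData or trainClasses to an XML/YAML file and read it back. As a library-free alternative, the sketch below uses a hypothetical plain-text format (a "rows cols" header, then one "label f0 f1 ..." line per sample); the function names and the layout are assumptions, not something basicOCR provides.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream for in-memory round trips
#include <vector>

// Writes the training set; 9 significant digits let floats round-trip.
void saveTraining(std::ostream& out,
                  const std::vector<std::vector<float>>& data,
                  const std::vector<int>& labels) {
    std::size_t cols = data.empty() ? 0 : data[0].size();
    out.precision(9);
    out << data.size() << ' ' << cols << '\n';
    for (std::size_t i = 0; i < data.size(); ++i) {
        out << labels[i];
        for (float v : data[i]) out << ' ' << v;
        out << '\n';
    }
}

// Reads a training set written by saveTraining; false on malformed input.
bool loadTraining(std::istream& in,
                  std::vector<std::vector<float>>& data,
                  std::vector<int>& labels) {
    std::size_t rows = 0, cols = 0;
    if (!(in >> rows >> cols)) return false;
    data.assign(rows, std::vector<float>(cols));
    labels.assign(rows, 0);
    for (std::size_t i = 0; i < rows; ++i) {
        if (!(in >> labels[i])) return false;
        for (std::size_t j = 0; j < cols; ++j)
            if (!(in >> data[i][j])) return false;
    }
    return true;
}
```

The same functions work with std::ofstream and std::ifstream to cache the data on disk between runs.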
Sir,
I have to train on a set of retinal images to classify another test set. Can you explain how the kNN classifier and the CvKNearest method work?
Thanking You
Hi Rahul, see the previous post, where I explain a simple kNN demo; here is the link http://blog.damiles.com/?p=84 (The basic pattern recognition and classification with openCV)
Dear damiles
do you have a C# version of this code??
thanks
No, I don't have a C# version of this.
Hello,
Thanks for a good post! I've downloaded basicOCR.tar.gz, however the makefile is empty. Is this a mistake?
/Karl
KarKrukow, there is an OCRbuild.sh script for compiling.
Ok, thanks for helping out.
It is probably just me not understanding what it needs to compile. When I run OCRbuild.sh I get:
krukow:~/libraries/ocr/basicOCR/basicOCR$ ./OCRbuild.sh
i686-apple-darwin9-g++-4.0.1: -lcxcore: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: -lcv: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: -lhighgui: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: -lcvaux: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: -lml: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: -lcxcore: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: -lcv: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: -lhighgui: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: -lcvaux: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: -lml: linker input file unused because linking not done
i686-apple-darwin9-g++-4.0.1: preprocessing.o: linker input file unused because linking not done
ld: can’t open output file for writing: OCR, errno=21
collect2: ld returned 1 exit status
krukow:~/libraries/ocr/basicOCR/basicOCR$
Any chance you can see the problem?
Thanks,
- Karl
I’m using opencv1.1.pre1, btw.
/K
hi damiles, how are you? I'm studying computer science and I need some help with a program that is like yours, but I have some difficulties. Could you help me, please??
How can I help you, angela?
Before that, could you give me your email so I can contact you? It will be difficult to talk here.
Sir,
May I know how to compile using OCRbuild.sh? I have never used this kind of compilation method before.
regards,
wei
This is a simple shell script for Unix systems; it only runs these commands:
g++ -ggdb `pkg-config opencv --cflags --libs` preprocessing.c -c
g++ -ggdb `pkg-config opencv --cflags --libs` preprocessing.o basicOCR.cpp -c
g++ -ggdb `pkg-config opencv --cflags --libs` preprocessing.o basicOCR.o main.c -o OCR
Sir,
I have figured out the compilation method.
May I know, if I want to do text recognition in real time, how can I extract the text features from a camera-captured image?
regards,
wei
Very good info. Thank you Damiles.
hi, do you know how I can connect a scanner to OpenCV, so that scanned images can be processed in a program? thanks
christian, you must save the scanner image data and then read it in OpenCV.
Otherwise, you must access the scanner through the scanner's library to get the image data and pass it to OpenCV as an IplImage.
Regards
hello sir, I'm a friend of christian's. Did you mean the TWAIN library?
Thank you for the reply.
But do you have an idea how I can process multiple images one after another without saving the scanned images?
troi, TWAIN for example is good.
christian, if you use TWAIN, you get a TWAIN image structure in memory that you can save or convert to another structure, such as an IplImage for OpenCV. When you finish using this image memory, you can save the image, or release it and capture another...
Hi Damiles;
first I want to thank you for this code and also the idea.
I read the thread from beginning to end. You answered a question about how to compile this code and said to use VC++; do you mean Visual C++?
Can you give me a link to download the compiler for this language?
I'd appreciate your reply.
thanks.
Are you familiar with computer vision? What library would you suggest? Thank you for the info
christian, on this page you can find more links (many are broken, but you can look them up with a web search): http://www.cs.cmu.edu/~cil/v-source.html
I like OpenCV as a library, but at that link you can find more libraries.
Ali, search Google for VC++, go to the MSDN page for VC++ and download the free Visual Studio Express.
Regards. David
Thanks for the reply, Damiles.
I downloaded VC++ Visual Studio Express, but the debug button (F5) is disabled and I couldn't compile the files. Any idea?
note that I am using windows vista SP1, and visual studio 2008.
hi damiles
thanks for your demo source, it really helped me understand OpenCV better,
but I have some doubts...
What is the significance of using 0.0039215 in the cvConvertScale function?
I understand that the scale parameter of the function multiplies the value of each pixel and the shift parameter is added to every pixel,
but how does multiplying by 0.0039215 turn the image into a 32-bit floating-point image?? I want to know how to get this number.
And if I use JPEG images instead of PBM files for the sample data, do I have to use the cvConvertScale function?
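Yes, 0.0039215 is for the conversion to the 32-bit floating-point image: it is approximately 1/255, so it scales pixel values from 0..255 to 0..1.
If you use JPEG, you are also loading the image with values up to 255, so you need the same conversion.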
Regards.
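The number itself is 1/255 truncated to seven decimal places. The 32-bit float type comes from the destination image (IPL_DEPTH_32F); the scale only changes the range, since cvConvertScale computes dst = src × scale + shift, mapping the 8-bit range 0..255 onto 0..1:

```latex
\text{scale} = \frac{1}{255} = 0.0039215686\ldots \approx 0.0039215,
\qquad 255 \times 0.0039215 = 0.9999825 \approx 1
```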
hi Damiles
thanks for the immediate reply...
but I tried it with the following code:
//for accessing file paths
sprintf(file,"%s%d/%d0%d.jpg",file_path, i, i , j);
and
//loading image as grayscale (CV_LOAD_IMAGE_GRAYSCALE)
src_image = cvLoadImage(file,0);
and I didn't change the cvConvertScale function:
cvConvertScale(&prs_image, img, 0.0039215, 0);
but I am getting the number 0 for every input in the load window from main. Any ideas?? Can you give me an exact idea of how to use JPEG files for the sample data?
thanks
nivasan
nivasan, you must debug your application and make sure you are loading the images correctly and not overwriting pointers. Often the image is not loaded correctly and then we have an array filled with 0's...
Oh, it was my mistake.
I only gave 4 training samples and two classes, and my k was 10, so obviously it will always return the first value as the nearest match (at least I hope this is the explanation). I reduced k to 5 and it gave me some OK results.
Sorry for my bad English.
thanks for your support and guidance,
Nivasan Sharma
Hi Damiles,
I tried to run your OCR program but got errors such as:
error C2258: illegal pure syntax, must be '= 0'
error C2252: 'K' : pure specifier can only be specified for functions
error C2065: 'K' : undeclared identifier
basicOCR.obj - 3 error(s), 0 warning(s)
Please help me.
Amin
Hi again.
I tried running it again and the errors went down from 3 to 2,
which are:
error C2252: 'K' : pure specifier can only be specified for functions
error C2065: 'K' : undeclared identifier
basicOCR.obj - 2 error(s), 0 warning(s)
Please help me.
Thanks again.
Amin
Hi damiles
in the preprocessing file you could use cvFindContours and get the bounding box easily, rather than using the findX, findY and findBB functions.
Please correct me if I am wrong.
Regards
Nivasan Sharma
nivasan is correct, you can use findContours... and you can also use these contours to classify, but then you must train with contours too.
Hello!
Have you tried statistical moments (cvMoments) as features?
By using moments you don't have to find the smallest bounding box, because they are scale and position invariant.
It would be interesting to see how the performance of your pretty straightforward approach differs from the more complicated one.
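As a sketch of the moments idea, the function below computes a normalized central moment η_pq of a binarized image in plain C++. Central moments do not change when the digit is translated, and dividing by μ00^(1+(p+q)/2) makes them scale invariant in the continuous case (only approximately on a pixel grid). OpenCV provides cvMoments and cvGetNormalizedCentralMoment for the same quantities; the function name and the ink = 0 convention here are assumptions.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// eta_pq of a w x h binary image (row-major, ink = 0, paper = 255),
// treating every ink pixel as unit mass. Returns 0 for a blank image.
double normalizedCentralMoment(const std::vector<std::uint8_t>& px,
                               int w, int h, int p, int q) {
    double m00 = 0, m10 = 0, m01 = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (px[y * w + x] == 0) { m00 += 1; m10 += x; m01 += y; }
    if (m00 == 0) return 0.0;
    double cx = m10 / m00, cy = m01 / m00;  // centroid
    double mu = 0;                          // central moment mu_pq
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (px[y * w + x] == 0)
                mu += std::pow(x - cx, p) * std::pow(y - cy, q);
    return mu / std::pow(m00, 1.0 + (p + q) / 2.0);
}
```

A feature vector built from a handful of such moments (or from Hu's invariants derived from them) is much shorter than the size × size pixel vector, though whether it classifies digits better would have to be measured.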
hi phen, thanks for your comment.
thanks for this great blog!
keep us updated if you continue to work on it
Hi! I was surfing and found your blog post… nice! I love your blog.
Cheers! Sandra. R.
Hi
I am using Dev-C++ and I am trying to compile the program. I receive the following error:
File format not recognized
ld returned 1 exit status
D:\Panto\OCR\Makefile.win [Build Error] [OCR.exe] Error 1
Does anybody have an idea?