Detection of Mathematical Equations using Optical Character Recognition
1. Motivation
(i) To detect mathematical equations in images of textbooks or papers
(ii) To check the validity of a mathematical equation using a computational engine such as Wolfram Alpha
2. Strategy
Step 1 - Processing of Input Image
Step 2 - Division into rows and columns
Step 3 - Classify every extracted symbol image using an SVM (Support Vector Machine) classifier
Step 4 - Result Processing
Figure: Framework of the proposed method.
2.1. Processing of Input Image
(i) The input image is converted to a grayscale image.
(ii) The grayscale image is converted to a black-and-white (binary) image by thresholding pixel intensities, so that only the equation remains and redundant background information is removed.
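A minimal sketch of Step 1 in plain Python, assuming the image is already loaded as rows of (R, G, B) tuples with 0-255 channels and shows dark ink on a light background. The BT.601 luma weights and the fixed threshold of 128 are illustrative choices, not values taken from this project; an adaptive method such as Otsu's threshold is a common alternative.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels (0-255) to gray levels using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def to_binary(gray_image, threshold=128):
    """Return a black-and-white image: 1 for ink (darker than threshold), 0 for background.

    threshold=128 is an assumed default; real scans may need a tuned or adaptive value.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray_image]
```

For example, `to_binary(to_grayscale(image))` maps a white pixel to 0 and a black pixel to 1, so the later steps only need to look for 1s.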
2.2. Division into rows and columns
The processed input image is divided into rows and columns to isolate each individual mathematical symbol.
Figure: Division of the processed input image into rows and columns.
2.3. Classify every extracted symbol image using an SVM (Support Vector Machine) classifier
An SVM classifier is used to classify each individual symbol image extracted in Step 2.
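The source does not specify which features are fed to the SVM. A simple assumption, sketched below, is to resample each symbol's bounding box to a fixed grid (16 x 16 here, a hypothetical size) by nearest-neighbour sampling and flatten it, so that every symbol yields a feature vector of the same length. The box format `(top, bottom, left, right)` with exclusive bottom and right edges is likewise an assumption.

```python
def symbol_features(binary, box, size=16):
    """Resample the symbol inside `box` to a size x size grid and flatten it.

    binary: list of rows of 0/1 pixels (1 = ink).
    box: (top, bottom, left, right), bottom and right exclusive.
    Returns a list of size * size floats (nearest-neighbour sampling).
    """
    top, bottom, left, right = box
    height, width = bottom - top, right - left
    features = []
    for i in range(size):
        src_row = top + (i * height) // size  # always < bottom because i < size
        for j in range(size):
            src_col = left + (j * width) // size
            features.append(float(binary[src_row][src_col]))
    return features
```

Stretching every box to a square distorts the aspect ratio, so thin symbols such as "1" and "-" may end up looking alike; padding the crop to a square before resampling is one way to avoid that.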
To train the SVM classifier, a database containing 50 training images for each mathematical symbol is built and used.
The kernel used for the SVM is RBF (Radial Basis Function).
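For reference, the RBF kernel measures the similarity of two feature vectors x and x', and a trained binary SVM classifies a new vector with the decision function f below, where x_i are the n support vectors with labels y_i in {-1, +1}, alpha_i >= 0 are the learned dual coefficients and b is the bias. The value of gamma used in this project is not stated; it is normally tuned, for example by cross-validation.

```latex
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \qquad \gamma > 0

f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{n} \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\right)
```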
Each extracted symbol image is then classified by the trained SVM model as one of the predefined mathematical symbols, digits or letters.
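A single SVM separates only two classes, so a multi-class symbol set needs a combination scheme. The sketch below assumes one-vs-one voting (the scheme used by LIBSVM and by scikit-learn's `SVC`): one RBF-kernel SVM per pair of classes, each voting for one of its two classes, with the most-voted class returned. The dictionary layout of each model is hypothetical; in practice the support vectors, dual coefficients and bias would come from training on the symbol database.

```python
import math
from collections import Counter
from itertools import combinations


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_value(model, x, gamma):
    """Binary SVM decision value: sum_i coef_i * K(sv_i, x) + bias.

    Each coef_i stands for alpha_i * y_i, so a positive value favours the
    first class of the pair and a non-positive value the second.
    """
    total = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(model["support_vectors"], model["dual_coefs"])
    )
    return total + model["bias"]


def predict_ovo(models, classes, x, gamma):
    """Classify x by one-vs-one voting over every pair (a, b) of classes.

    models[(a, b)] is the binary SVM for that pair (hypothetical layout).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = a if decision_value(models[(a, b)], x, gamma) > 0 else b
        votes[winner] += 1
    # max() keeps the first of equally voted classes, so ties go to the
    # class listed earliest in `classes`.
    return max(classes, key=lambda c: votes[c])
```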
The predefined mathematical symbols, digits and letters are as follows:
2.4. Result Processing
The result of Steps 1-3 is only an initial form of the output; it must be processed further to form a complete representation of the equation.
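The source does not detail how the initial output is processed. As an illustration only, the sketch below assumes each classified symbol comes with its bounding box `(top, bottom, left, right)` in image rows and columns (row indices growing downwards, bottom exclusive), orders the symbols left to right, and applies a simple hypothetical rule for exponents: a symbol whose bottom edge lies at or above the vertical midpoint of the preceding baseline symbol is treated as a superscript. Equations with fractions, roots or subscripts would need more elaborate layout analysis.

```python
def assemble_equation(symbols):
    """Join classified symbols into a linear equation string.

    symbols: list of (label, (top, bottom, left, right)) tuples.
    Exponents are written as ^x for one symbol or ^(xy) for several.
    """
    out = []
    base = None       # box of the most recent baseline (non-superscript) symbol
    superscript = []  # labels of the superscript run being collected

    def flush_superscript():
        # Emit the pending superscript run, if any, and reset it.
        if superscript:
            text = "".join(superscript)
            out.append("^" + text if len(superscript) == 1 else "^(" + text + ")")
            superscript.clear()

    for label, box in sorted(symbols, key=lambda s: s[1][2]):  # sort by left edge
        bottom = box[1]
        if base is not None and bottom <= (base[0] + base[1]) / 2:
            superscript.append(label)
        else:
            flush_superscript()
            out.append(label)
            base = box
    flush_superscript()
    return "".join(out)
```

A string in this linear form is the kind of input that could then be handed to a computational engine such as Wolfram Alpha for the validity check described in the Motivation.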
Example Image: