Detection of Mathematical Equations using Optical Character Recognition

    1. Motivation

    (i) To detect mathematical equations in images taken from textbooks or papers
    (ii) To check the validity of a detected equation using a computational engine such as Wolfram Alpha

    2. Strategy

    Step 1 - Processing of Input Image
    Step 2 - Division into rows and columns
    Step 3 - Classify every extracted symbol image using an SVM (Support Vector Machine) classifier
    Step 4 - Result Processing


    Figure: Framework of the proposed method.

    2.1. Processing of Input Image

    (i) The input image is converted to a grayscale image
    (ii) The grayscale image is converted to a black-and-white (BW) image to derive an intensity-based image
    (only the equation is kept in the image; redundant background information is removed)
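
    The two conversions above can be sketched in plain Python as follows. This is only a minimal illustration, not the project's actual code: the helper names `to_grayscale` and `to_binary`, the BT.601 luma weights, the fixed threshold of 128, and the convention that ink is 1 and background is 0 are all assumptions, since the source does not state them.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to grayscale with BT.601 luma weights (assumed)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def to_binary(gray_image, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (background) as 0.

    The fixed threshold is an assumption; an adaptive method could be used instead.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray_image]


# Tiny 2x3 test image: a dark vertical stroke on a light background.
rgb = [
    [(255, 255, 255), (0, 0, 0), (255, 255, 255)],
    [(250, 250, 250), (10, 10, 10), (240, 240, 240)],
]
gray = to_grayscale(rgb)
binary = to_binary(gray)
print(binary)
```

    In the binary image only the stroke pixels remain set, which is the "equation only" image that the later steps operate on.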

    2.2. Division into rows and columns

    The processed input image is divided into rows and columns to isolate every single mathematical symbol.


    Figure: Division of the processed input image into rows and columns.
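
    The source does not say how the division is computed. One common approach, shown here purely as an assumed sketch, is to use projection profiles: sum the ink pixels per image row to find horizontal bands, then sum per column inside each band to find individual symbols. The function names `find_runs` and `segment_symbols` are hypothetical, and the input is assumed to be a binary image with 1 for ink.

```python
def find_runs(profile):
    """Return (start, end) pairs, end exclusive, for each run of non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the end of the profile
    return runs


def segment_symbols(binary):
    """Split a binary image into row bands, then each band into symbol boxes.

    Returns (top, bottom, left, right) boxes with exclusive bottom/right.
    """
    boxes = []
    row_profile = [sum(row) for row in binary]
    for top, bottom in find_runs(row_profile):
        band = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*band)]
        for left, right in find_runs(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
print(segment_symbols(image))
```

    A single pass like this does not separate stacked structures such as fractions or superscripts; handling those would need a further split inside each box, which this sketch leaves out.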

    2.3. Classify every extracted symbol image using an SVM (Support Vector Machine) classifier

    An SVM classifier is used to classify each individual symbol image extracted in Step 2.
    To train the SVM classifier, a database with 50 training images for every mathematical symbol was developed.
    The kernel used for the SVM is RBF (Radial Basis Function).
    Every extracted symbol image is then classified by the trained SVM model as one of the predefined mathematical symbols, digits or letters.
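
    For reference, the RBF kernel measures the similarity of two feature vectors x and y as exp(-gamma * ||x - y||^2). The snippet below is a minimal sketch of that formula only; the function name `rbf_kernel`, the value gamma = 0.5, and the four-element feature vectors are assumptions made for illustration, as the source gives neither the kernel parameters nor the features extracted from each symbol image.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel value exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two hypothetical flattened symbol images that differ in one pixel.
a = [0.0, 1.0, 1.0, 0.0]
b = [0.0, 1.0, 0.0, 0.0]
print(rbf_kernel(a, a))  # identical vectors give the maximum similarity
print(rbf_kernel(a, b))  # similarity decreases as the vectors move apart
```

    Libraries such as scikit-learn provide this kernel through `SVC(kernel="rbf")`, so in practice it rarely needs to be written by hand.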
    The defined mathematical symbols, digits and letters are as follows:


    2.4. Result Processing

    The result of Steps 1-3 is only the initial form of the output.
    It has to be processed further to form a complete representation of the equation.
    Example Image:

    Initial Output:
    d:-:dx:(:x^(2)+3x^(3:)-:x^(2)e^(x2)+5x^(3)e^(x:):):
    Processed Output:
    ((d)/(dx) )(((x^(2)+3x^(3))/(x^(2)e^(x^2)+5x^(3)e^(x))) )
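
    The exact post-processing rules are not given in the source. As a rough sketch, and reading the example above as if ":" separates recognised groups and a lone "-" group stands for a fraction bar (an inference from the example, not a documented format), the simplest case could be handled as below. The function name `join_simple_fractions` is hypothetical, and the sketch covers only flat "numerator:-:denominator" triples, not the nested parentheses in the full example.

```python
def join_simple_fractions(raw):
    """Rewrite 'a:-:b' group triples as '((a)/(b) )' and concatenate all other groups."""
    tokens = [t for t in raw.split(":") if t]  # drop empty groups from a trailing ':'
    pieces = []
    i = 0
    while i < len(tokens):
        # Assumption: a lone '-' between two groups is a fraction bar.
        if i + 2 < len(tokens) and tokens[i + 1] == "-":
            pieces.append(f"(({tokens[i]})/({tokens[i + 2]}) )")
            i += 3
        else:
            pieces.append(tokens[i])
            i += 1
    return "".join(pieces)


fragment = "d:-:dx:"
print(join_simple_fractions(fragment))
```

    On the fragment "d:-:dx:" this yields "((d)/(dx) )", the first factor of the processed output shown above. A real minus sign standing alone between two groups would be misread by this rule, so a full implementation would need layout information from Step 2 to tell the two apart.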

    3. Experiments