Energy-Based Learning Models for Image Analysis and Recognition

Yann LeCun
Courant Institute of Mathematical Sciences, NYU


Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy with each configuration of those variables. Given a set of observed variables X (e.g., the pixels from a robot camera) and a set of variables to be predicted Y (e.g., the steering control of the robot), making a decision consists of finding a value of Y that minimizes the energy function E(Y,X). A properly trained model assigns low energies to configurations of X and Y that are compatible (turning left when an obstacle is present on the right side of the visual field) and high energies to incompatible configurations (turning toward the obstacle). Training an EBM consists of finding an energy function that minimizes a loss functional averaged over a training set. We discuss conditions that the loss functional must satisfy so that its minimization causes the machine to approach the desired behavior.

The main advantage of EBMs over traditional probabilistic approaches is that there is no need to compute normalization terms that may be intractable. Additionally, the absence of normalization gives complete freedom in the parameterization of the energy. In particular, we will combine EBM training with convolutional networks, a biologically inspired trainable architecture designed to process images with invariance to geometric distortions.

We will describe several trainable vision systems based on these concepts, including:
- A system that recognizes handwritten digit strings.
- A real-time system for simultaneously detecting human faces in images and estimating their pose.
- A face verification system based on a trainable similarity metric.
- A mobile robot trained to emulate a human driver so as to avoid obstacles in natural environments by relying solely on camera input.
- A real-time system for detecting and recognizing generic objects such as vehicles, people, airplanes, and animals, with full invariance to pose, illumination, and clutter.
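As a rough illustration of the decision rule described in this abstract (choose the Y that minimizes E(Y,X)), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `energy` function, the encoding of X as a single obstacle position, and the three-action set `ACTIONS` are hypothetical stand-ins, not the trained models presented in the talk.

```python
# Toy sketch of EBM inference: pick the Y with the lowest energy E(Y, X).
# The energy function, the obstacle encoding, and the action set below are
# hypothetical; a real system would use a trained energy function.

# Candidate values of Y: steering commands in [-1, 1] (negative = left).
ACTIONS = {"left": -1.0, "straight": 0.0, "right": 1.0}


def energy(y, x):
    """Hypothetical energy E(Y, X).

    x: horizontal obstacle position in [-1, 1] (negative = left, 0 = none).
    y: steering value in [-1, 1].
    Steering toward the obstacle (y and x of the same sign) raises the
    energy; a small quadratic term makes unnecessary turning slightly costly.
    """
    return y * x + 0.1 * y * y


def infer(x, actions=ACTIONS):
    """Return the name of the action whose steering value minimizes the energy."""
    return min(actions, key=lambda name: energy(actions[name], x))


if __name__ == "__main__":
    for obstacle in (0.8, 0.0, -0.5):
        print(obstacle, infer(obstacle))
```

Note that inference only compares energies across candidate values of Y, so no normalization term is ever computed, which is the property the abstract highlights.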
Live, real-time demonstrations and videos of these systems will be shown. This seminar is also part of the RUMBA series at