Adaptive Deep Representations for Text and Video Understanding

Dr. Renqiang Min
NEC Laboratories America


Abstract

Deep learning algorithms have achieved revolutionary successes in many applications, including computer vision, natural language processing, and biomedical informatics, leveraging huge amounts of labeled training data to generate powerful distributed representations. Rather than training standard end-to-end supervised models, learning adaptive models capable of understanding different input contexts is one step closer to building self-aware machine reasoning systems. In this talk, I will discuss learning adaptive deep representations of texts, videos, and their interactions. First, I will describe how context-dependent representation learning helps a question answering system become self-aware in order to handle ambiguous situations. Then, I will show how to generate adaptive spatiotemporal representations for translating videos into natural language descriptions. Finally, I will present our recent work on generating videos from text by capturing semantically meaningful, context-dependent representations.