Learning Local Representations of Images and Text

Fuwen Tan
Images and text inherently exhibit hierarchical structures: scenes are built from objects, and sentences are built from words. In many computer vision and natural language processing tasks, learning accurate prediction models requires analyzing the correlations among the local primitives of both the input and output data. In this thesis, we develop techniques for learning local representations of images and text and demonstrate their effectiveness on visual recognition, retrieval, and synthesis. In particular, the thesis includes three primary...