Abstract
Vision and language are two of the most critical human faculties. If we are to develop more useful Artificial Intelligence (AI) systems, these modalities will need to work in tandem. Although we are still far from the ultimate goal of synergistic integration of vision and language, several practical applications at the intersection of computer vision (CV) and natural language processing (NLP) have experienced a huge upsurge in recent years, accelerated by advances in deep learning and the ready availability of both benchmark and real-world datasets. In this dissertation, we address several interesting and important applications at this intersection, such as automated image captioning and the classification of objects and actions in images, with significant potential impact in problem domains such as information retrieval and product marketing. First, we propose an approach to speed up image caption retrieval, guided by the top object detected in an image. Second, we propose an approach to classify the action in an image without executing explicit action classifiers on the image: we first detect objects in the image and then, with the aid of the top objects and their associated word embeddings (obtained by training on a natural language corpus), infer the most probable action. Next, we propose a model to guess objects in an image in situations where datasets for training classifiers for such objects are unavailable. Finally, we conduct a similarity study on consumer products using both visual and textual features. We believe that these studies and the proposed models will provide practitioners with insights they can apply in designing AI systems for specific applications.
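The embedding-based action inference described above can be illustrated with a minimal sketch. The toy embeddings, vocabulary, and scoring rule below are illustrative assumptions, not the dissertation's actual model: here each candidate action is scored by its average cosine similarity to the embeddings of the top detected objects, whereas the real approach would use embeddings trained on a large natural language corpus (e.g., word2vec or GloVe) and its own scoring scheme.

```python
import numpy as np

# Hypothetical toy word embeddings; in practice these would come from a
# model trained on a natural language corpus (e.g., word2vec or GloVe).
EMB = {
    "dog":      np.array([0.9, 0.1, 0.0]),
    "frisbee":  np.array([0.7, 0.3, 0.1]),
    "running":  np.array([0.8, 0.2, 0.1]),
    "sleeping": np.array([0.1, 0.9, 0.0]),
    "cooking":  np.array([0.0, 0.2, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def infer_action(detected_objects, candidate_actions, emb=EMB):
    """Return the candidate action whose embedding is, on average,
    most similar to the embeddings of the top detected objects."""
    def score(action):
        return np.mean([cosine(emb[action], emb[obj]) for obj in detected_objects])
    return max(candidate_actions, key=score)

# With "dog" and "frisbee" detected, "running" scores highest.
print(infer_action(["dog", "frisbee"], ["running", "sleeping", "cooking"]))
```

This sketch shows why no explicit action classifier runs on the image itself: once objects are detected, the action is inferred purely in embedding space.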