Abstract

Advances in artificial intelligence and multimodal sensing technologies have transformed activity understanding and behavior monitoring in complex, real-world environments. This dissertation presents novel frameworks for spatiotemporal multimodal representation learning, focusing on precision livestock monitoring by integrating deep learning with multimodal data fusion to analyze behaviors from audio, RGB, depth, and kinematic data. It introduces DeepMSRF, a multimodal speaker recognition framework that fuses audio and visual streams via advanced feature selection, achieving superior accuracy over unimodal baselines. A low-cost 3D monitoring system for broiler chickens, built from affordable RGB-D sensors and custom hardware, enables continuous, non-invasive tracking on commercial farms. A dedicated pipeline for automated 3D gait scoring employs synchronized RGB-D video, pose estimation, and segmentation to provide accurate, high-throughput welfare assessments that outperform manual methods. Separately, a zero-shot, transformer-based approach to 3D footpad and gait scoring recognizes unseen behaviors and conditions, enhancing adaptability across diverse farm settings. Additionally, mask-based multimodal action recognition pipelines leverage RGB-D data, zero-shot segmentation, and spatiotemporal models to classify fine-grained behaviors such as feeding and drinking, offering insights for farm management. Comprehensive experiments validate these approaches, while new datasets, open-source tools, and evaluation protocols advance activity recognition, multimodal fusion, and precision animal welfare monitoring, supporting intelligent, ethical, and sustainable livestock production.
