Abstract

Advances in artificial intelligence and multimodal sensing technologies have transformed activity understanding and behavior monitoring in complex, real-world environments. This dissertation presents novel frameworks for spatiotemporal multimodal representation learning, focusing on precision livestock monitoring by integrating deep learning with multimodal data fusion to analyze behaviors from audio, RGB, depth, and kinematic data. It introduces DeepMSRF, a multimodal speaker recognition framework that fuses audio and visual streams via advanced feature selection, achieving superior accuracy over unimodal baselines. A low-cost 3D monitoring system for broiler chickens, built from affordable RGB-D sensors and custom hardware, enables continuous, non-invasive tracking on commercial farms. A dedicated pipeline for automated 3D gait scoring employs synchronized RGB-D video, pose estimation, and segmentation to provide accurate, high-throughput welfare assessments that outperform manual methods. Separately, a zero-shot, transformer-based approach to 3D footpad and gait scoring recognizes unseen behaviors and conditions, enhancing adaptability across diverse farm settings. Additionally, mask-based multimodal action recognition pipelines leverage RGB-D data, zero-shot segmentation, and spatiotemporal models to classify fine-grained behaviors such as feeding and drinking, offering insights for farm management. Comprehensive experiments validate these approaches, while new datasets, open-source tools, and evaluation protocols advance activity recognition, multimodal fusion, and precision animal welfare monitoring, supporting intelligent, ethical, and sustainable livestock production.
