Abstract
While datasets are foundational to the development of action recognition systems, data work is often marginalized in favor of model innovation. In this thesis, I studied how action recognition researchers portray data work through the ways they introduce and discuss datasets containing videos of humans in their research publications. I analyzed the use of 962 unique action recognition datasets across 1,309 publications collected from three top computer vision conferences: CVPR, ICCV, and ECCV. The analysis shows that human subjects are frequently abstracted through metadata and discussed with minimal contextual detail. Emphasis is typically placed on attributes such as scale, novelty, and technical difficulty, while demographic diversity and cultural specificity are overlooked. Additionally, the absence of a standardized definition of “action” limits the ability to design and evaluate these datasets consistently. These patterns reflect broader cultural values within computer vision that privilege technical innovation over human-centered considerations.