Abstract

Most work on temporal action detection is formulated as an offline problem, in which the start and end times of actions are determined after the entire video is fully observed. However, important real-time applications including surveillance and driver assistance systems require identifying actions as soon as each video frame arrives, based only on current and historical observations. In this paper, we propose a novel framework, the Temporal Recurrent Network (TRN), to model greater temporal context of each frame by simultaneously performing online action detection and anticipation of the immediate future. At each moment in time, our approach makes use of both accumulated historical evidence and predicted future information to better recognize the action that is currently occurring, and integrates both of these into a unified end-to-end architecture. We evaluate our approach on two popular online action detection datasets, HDD and TVSeries, as well as another widely used dataset, THUMOS'14. The results show that TRN significantly outperforms the state-of-the-art.
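The per-frame loop described in the abstract (anticipate the near future from the accumulated state, then fold that prediction back in with the current frame before classifying) can be illustrated with a toy sketch. This is not the authors' architecture: the plain tanh recurrences, the averaging of predicted states, the random weights, and all names (`init_params`, `online_step`, `run_online`, `steps_ahead`) are assumptions made for illustration only. It shows just the data flow and the online constraint that the output at time t depends on frames up to t and nothing later.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def tanh_vec(v):
    return [math.tanh(x) for x in v]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(feat_dim, hid_dim, num_classes, seed=0):
    """Random, untrained weights (hypothetical layout, for illustration)."""
    rng = random.Random(seed)
    return {
        "dec": rand_matrix(hid_dim, hid_dim, rng),               # future roll-out
        "w_in": rand_matrix(hid_dim, feat_dim + hid_dim, rng),   # [frame; future]
        "w_h": rand_matrix(hid_dim, hid_dim, rng),               # history
        "w_out": rand_matrix(num_classes, hid_dim, rng),         # classifier
    }


def online_step(x_t, h_prev, p, steps_ahead=3):
    """Process one frame using only the past state and the current frame."""
    # 1) Anticipate: roll a simple decoder forward from the historical state.
    d = h_prev
    future = []
    for _ in range(steps_ahead):
        d = tanh_vec(matvec(p["dec"], d))
        future.append(d)
    # 2) Summarize the predicted future states into one vector (plain mean).
    f = [sum(col) / steps_ahead for col in zip(*future)]
    # 3) Update the state from the current frame, predicted future, and history.
    from_input = matvec(p["w_in"], x_t + f)  # list concatenation [x_t; f]
    from_hist = matvec(p["w_h"], h_prev)
    h_t = tanh_vec([a + b for a, b in zip(from_input, from_hist)])
    # 4) Classify the action occurring at the current frame.
    return softmax(matvec(p["w_out"], h_t)), h_t


def run_online(frames, p, hid_dim):
    """Emit one class distribution per frame, as each frame arrives."""
    h = [0.0] * hid_dim
    outputs = []
    for x_t in frames:
        probs, h = online_step(x_t, h, p)
        outputs.append(probs)
    return outputs
```

Calling `run_online(frames, init_params(4, 5, 3), 5)` on a list of 4-dimensional frame feature vectors returns one 3-class probability vector per frame. Because each step reads only `h_prev` and `x_t`, running the same function on a prefix of the video yields the same outputs for those frames, which is the property that separates online from offline detection.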


Original document

The different versions of the original document can be found in:

http://dx.doi.org/10.1109/iccv.2019.00563
https://openaccess.thecvf.com/content_ICCV_2019/papers/Xu_Temporal_Recurrent_Networks_for_Online_Action_Detection_ICCV_2019_paper.pdf
https://arxiv.org/abs/1811.07391
https://arxiv.org/pdf/1811.07391.pdf
http://openaccess.thecvf.com/content_ICCV_2019/html/Xu_Temporal_Recurrent_Networks_for_Online_Action_Detection_ICCV_2019_paper.html
https://ui.adsabs.harvard.edu/abs/2018arXiv181107391X/abstract
https://academic.microsoft.com/#/detail/2989506443

Document information

Published on 01/01/2018

Volume 2018, 2018
DOI: 10.1109/iccv.2019.00563
Licence: CC BY-NC-SA license
