Abstract. A tree-based approach to integrated action segmentation, localization
and recognition is proposed. An action is represented as a sequence of joint
HOG-flow descriptors extracted independently from each
frame. During training, a set of action prototypes is first learned via k-means
clustering, and then a binary tree model is constructed over the prototypes by
hierarchical k-means clustering. Each tree node is characterized by a
shape-motion descriptor and a
rejection threshold, and an action segmentation mask is defined for each leaf
node (corresponding to a prototype). During testing, an action is localized by
mapping each test frame to its nearest-neighbor prototype using a fast matching
method that searches the learned tree, followed by global filtering refinement.
An action is recognized by maximizing the sum of the
joint probabilities of the action category and action prototype over test
frames. Our approach does not explicitly rely on human tracking or background
subtraction, and enables action localization and recognition
in realistic and challenging conditions (such as crowded backgrounds).
Experimental results show that our approach achieves recognition rates of 100%
on both the CMU action dataset and the Weizmann dataset.
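To make the training and matching pipeline above concrete, the following is a minimal sketch (in Python, not the implementation used in the paper) of the prototype-tree idea: k-means learns a set of action prototypes from per-frame descriptors, hierarchical binary k-means organizes the prototypes into a tree whose nodes store a center and a rejection threshold, and a test-frame descriptor is mapped to a nearest prototype by descending the tree. The descriptor extraction (the joint HOG-flow features) is mocked with random vectors, and the cluster count, the threshold rule (maximum member distance to the node center), and the helper names (kmeans, build_tree, match) are illustrative assumptions rather than the paper's exact choices.

import numpy as np
from dataclasses import dataclass, field

rng = np.random.default_rng(0)

@dataclass
class Node:
    center: np.ndarray          # mean descriptor of the prototypes under this node
    threshold: float            # rejection threshold: max member distance to the center (an assumption)
    prototype_id: int = -1      # set only at leaves (one prototype per leaf)
    children: list = field(default_factory=list)

def kmeans(X, k, iters=50):
    # Plain Lloyd's k-means; returns (centers, labels).
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def build_tree(prototypes, ids):
    # Recursive binary k-means over the learned prototypes.
    center = prototypes.mean(axis=0)
    threshold = float(np.max(np.linalg.norm(prototypes - center, axis=1)))
    node = Node(center=center, threshold=threshold)
    if len(prototypes) == 1:
        node.prototype_id = int(ids[0])
        return node
    _, labels = kmeans(prototypes, k=2)
    if labels.min() == labels.max():            # degenerate split: fall back to a halving
        labels = np.arange(len(prototypes)) % 2
    for j in range(2):
        mask = labels == j
        node.children.append(build_tree(prototypes[mask], ids[mask]))
    return node

def match(node, d):
    # Map a test-frame descriptor d to a nearest prototype by descending the tree,
    # rejecting the frame early if it falls outside the node's threshold.
    if np.linalg.norm(d - node.center) > node.threshold:
        return None
    if not node.children:
        return node.prototype_id
    child = min(node.children, key=lambda c: np.linalg.norm(d - c.center))
    return match(child, d)

# Mock per-frame descriptors (stand-ins for joint HOG-flow vectors) and 16 prototypes.
frames = rng.normal(size=(500, 64))
prototypes, _ = kmeans(frames, k=16)
tree = build_tree(prototypes, np.arange(16))
print(match(tree, frames[0]))   # prints the matched prototype id, or None if the frame is rejected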