%0 Conference Proceedings
%A Martin, Pierre-Etienne
%A Benois-Pineau, Jenny
%A Péteri, Renaud
%A Morlier, Julien
%+ Department of Comparative Cultural Psychology, Max Planck Institute for Evolutionary Anthropology, Max Planck Society
%T Three-stream 3D/1D CNN for fine-grained action classification and segmentation in table tennis
%G eng
%U https://hdl.handle.net/21.11116/0000-000A-4B31-B
%R 10.1145/3475722.3482793
%F OTHER: hal-03353945
%D 2021
%B MMSports '21
%Z date of event: 2021-10-20 - 2021-10-24
%C Chengdu, China (Online)
%X This paper proposes a fusion method of modalities extracted from video
through a three-stream network with spatio-temporal and temporal convolutions
for fine-grained action classification in sport. It is applied to the TTStroke-21
dataset, which consists of untrimmed videos of table tennis games. The goal is
to detect and classify table tennis strokes in the videos, the first step of a
larger scheme aimed at giving players feedback to improve their
performance. The three modalities are raw RGB data, the computed optical flow
and the estimated pose of the player. The network consists of three branches
with attention blocks. Features are fused at the last stage of the network
using bilinear layers. Compared to previous approaches, the use of three
modalities allows faster convergence and better performance on both tasks:
classification of strokes with known temporal boundaries and joint segmentation
and classification. The pose is also further investigated in order to offer
richer feedback to the athletes.
%K Computer Science, Computer Vision and Pattern Recognition, cs.CV, Computer Science, Artificial Intelligence, cs.AI, Computer Science, Human-Computer Interaction, cs.HC, Computer Science, Learning, cs.LG, Computer Science, Multimedia, cs.MM
%B MMSports'21: Proceedings of the 4th International Workshop on Multimedia Content Analysis in Sports
%P 35 - 41