Recurrent bag-of-features for visual information analysis

Publication: Contribution to journal/Conference contribution in journal/Contribution to newspaper, Journal article, Research, peer-reviewed

  • Marios Krestenitis, Aristotle University of Thessaloniki
  • Nikolaos Passalis, Aristotle University of Thessaloniki
  • Alexandros Iosifidis
  • Moncef Gabbouj, Tampereen Yliopisto
  • Anastasios Tefas, Aristotle University of Thessaloniki

Deep Learning (DL) has provided powerful tools for visual information analysis. For example, Convolutional Neural Networks (CNNs) excel in complex and challenging image analysis tasks by extracting meaningful feature vectors with high discriminative power. However, these powerful feature vectors are compressed by the pooling layers of the network, which usually implement the pooling operation in a less sophisticated manner. This can lead to significant information loss, especially when the informative content of the data is sequentially distributed over the spatial or temporal dimension, e.g., in videos, which often require extracting fine-grained temporal information. This paper proposes a novel stateful recurrent pooling approach that can overcome the aforementioned limitations. The proposed method is inspired by the well-known Bag-of-Features (BoF) model, but employs a stateful trainable recurrent quantizer instead of plain static quantization, allowing it to process sequential data efficiently and to encode both their temporal and their spatial aspects. The effectiveness of the proposed Recurrent BoF model in capturing spatio-temporal information, compared to other competitive methods, is demonstrated using six different datasets and two different tasks.
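
The contrast between static and stateful quantization can be illustrated with a small NumPy sketch. The abstract does not give the model's equations, so everything below is an assumption made for illustration, not the authors' formulation: `static_bof` is a common soft-assignment BoF (softmax over negative squared distances to a codebook, averaged over time), and `recurrent_bof` is a hypothetical recurrent quantizer whose memberships at each step also depend on the previous memberships. All function names, parameters, and weights are invented for this example, and the weights are random rather than trained.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def static_bof(features, codebook, sharpness=1.0):
    # features: (T, D) sequence of feature vectors; codebook: (K, D) codewords.
    # Soft-assign each vector to the codewords, then average over time.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    memberships = softmax(-sharpness * d2)      # (T, K)
    return memberships.mean(axis=0)             # (K,) histogram

def recurrent_bof(features, w_in, w_rec):
    # Hypothetical stateful quantizer (assumption, not the paper's model):
    # the membership vector at step t depends on the current input and on
    # the membership vector from step t-1.
    k = w_in.shape[0]
    state = np.full(k, 1.0 / k)                 # uniform initial memberships
    total = np.zeros(k)
    for x in features:
        state = softmax(w_in @ x + w_rec @ state)
        total += state
    return total / len(features)                # (K,) histogram

rng = np.random.default_rng(0)
T, D, K = 12, 8, 4                              # illustrative sizes
features = rng.normal(size=(T, D))
codebook = rng.normal(size=(K, D))
w_in = 0.5 * rng.normal(size=(K, D))            # untrained, random weights
w_rec = rng.normal(size=(K, K))

h_static = static_bof(features, codebook)
h_static_rev = static_bof(features[::-1], codebook)
h_rec = recurrent_bof(features, w_in, w_rec)
h_rec_rev = recurrent_bof(features[::-1], w_in, w_rec)

print("static BoF ignores order:   ", np.allclose(h_static, h_static_rev))
print("recurrent BoF ignores order:", np.allclose(h_rec, h_rec_rev))
```

Reversing the sequence leaves the static histogram unchanged, because averaging discards order, whereas the recurrent variant generally produces a different histogram. This is the kind of temporal sensitivity the abstract attributes to a stateful quantizer. In a real model the weights would be learned end to end with the rest of the network.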

Original language: English
Article number: 107380
Journal: Pattern Recognition
Volume: 106
ISSN: 0031-3203
DOI
Status: Published - Oct. 2020


ID: 187614882