ACAN - Video-based Unsupervised Domain Adaptation

Adversarial Correlation Adaptation Network (ACAN)

A Novel Method and A New Benchmark Dataset for Video-based Unsupervised Domain Adaptation (VUDA)

Download

Abstract

Domain adaptation (DA) approaches address domain shift and enable networks to be applied to different scenarios. Although various image-based DA approaches have been proposed in recent years, there is limited research on Video-based Unsupervised Domain Adaptation (VUDA). This is partly due to the complexity of adapting the different modalities of features in videos, which include correlation features extracted as long-term dependencies of pixels across spatiotemporal dimensions. Correlation features are strongly associated with action classes and have proven effective for accurate video feature extraction in supervised action recognition. Yet the correlation features of the same action differ across domains due to domain shift. We therefore propose a novel Adversarial Correlation Adaptation Network (ACAN) that aligns action videos by aligning pixel correlations. ACAN aims to minimize the discrepancy between the distributions of correlation information, termed the Pixel Correlation Discrepancy (PCD). Additionally, VUDA research is limited by the lack of cross-domain video datasets with large domain shifts. We therefore introduce a new HMDB-ARID dataset with a larger domain shift caused by a larger statistical difference between domains, built in an effort to leverage current datasets for dark video classification. Empirical results demonstrate the state-of-the-art performance of our proposed ACAN on both existing and the new VUDA datasets.
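
To make the idea of minimizing a correlation-distribution discrepancy concrete, below is a minimal sketch that measures such a discrepancy with a generic kernel MMD between source and target correlation features. The RBF kernel, the feature shapes, and the `rbf_mmd2` helper are illustrative assumptions for the example; they are not the exact definition of PCD.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Squared MMD with an RBF kernel between two sets of feature vectors.

    x: (n, d) source correlation features, y: (m, d) target correlation features.
    This is a generic kernel MMD, used here only to illustrate how a
    discrepancy between correlation distributions (as in PCD) can be measured.
    """
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2              # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Illustrative usage: corr_src / corr_tgt stand for flattened pixel-correlation
# features from source and target videos (hypothetical shapes).
corr_src = torch.randn(32, 256)
corr_tgt = torch.randn(32, 256)
pcd_like_loss = rbf_mmd2(corr_src, corr_tgt)
```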

Structure of ACAN

The structure of ACAN is as follows:

[Figure: overall structure of ACAN]
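
For readers unfamiliar with the adversarial part of the design, the sketch below shows a gradient-reversal-based domain classifier of the kind used in adversarial domain adaptation networks. The module names, feature dimensions, and the two-layer classifier are illustrative assumptions, not the exact ACAN implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated (scaled) gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainClassifier(nn.Module):
    """Predicts source vs. target domain from (gradient-reversed) video features."""
    def __init__(self, in_dim=2048, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, feat, lambd=1.0):
        return self.net(GradReverse.apply(feat, lambd))

# Illustrative usage: video-level features from source and target clips (hypothetical batch).
feats = torch.randn(8, 2048)
domain_labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
logits = DomainClassifier()(feats, lambd=1.0)
adv_loss = nn.CrossEntropyLoss()(logits, domain_labels)
```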

The correlation features are extracted with the following structure:

[Figure: structure of the correlation extraction module]
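
As a rough illustration of what "pixel correlation" means here, the sketch below computes pairwise similarities between all spatiotemporal positions of a clip-level feature map (a non-local-style self-similarity). The 1x1x1 projections, channel sizes, and softmax normalization are assumptions made for the example, not a reproduction of the module shown in the figure.

```python
import torch
import torch.nn as nn

class PixelCorrelation(nn.Module):
    """Computes pairwise correlations between all spatiotemporal positions of a
    video feature map, serving as long-range correlation features."""
    def __init__(self, in_channels, key_channels=64):
        super().__init__()
        # 1x1x1 projections before computing similarities (assumed design choice)
        self.query = nn.Conv3d(in_channels, key_channels, kernel_size=1)
        self.key = nn.Conv3d(in_channels, key_channels, kernel_size=1)

    def forward(self, x):
        # x: (B, C, T, H, W) clip-level feature map
        b, c, t, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, THW, key_channels)
        k = self.key(x).flatten(2)                     # (B, key_channels, THW)
        corr = torch.bmm(q, k)                         # (B, THW, THW) pixel correlations
        return torch.softmax(corr, dim=-1)             # normalized correlation map

# Illustrative usage on a hypothetical feature map
feat = torch.randn(2, 256, 4, 7, 7)
corr_map = PixelCorrelation(256)(feat)                 # shape (2, 196, 196)
```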

The HMDB-ARID dataset for VUDA

Cross-domain benchmark datasets for VUDA tasks are rather limited, which hinders VUDA research. More recently, larger cross-domain video datasets with larger domain discrepancies, such as UCF-HMDBfull, have been introduced. However, both domains in these datasets are still drawn from well-established action recognition datasets. Although these datasets may cover different classes and different videos, most of them are collected from public video platforms, leading to similar video statistics across datasets, which suggests a high probability that similar scenarios exist among current action recognition datasets. The domain shift between these datasets may therefore not be significant, and adapting the same model across domains with similar video statistics or similar scenarios may be relatively trivial. VUDA approaches that perform well on these cross-domain video datasets may not transfer well to real-world applications, where the gap between domains can be much larger than in current cross-domain datasets. We argue that VUDA approaches would be more useful for bridging to video domains with large distribution shifts, such as dark videos (adverse illumination) or hazy videos (adverse contrast).

We compare our HMDB-ARID dataset statistically with other commonly used VUDA datasets:

[Table: statistical comparison of HMDB-ARID with other commonly used VUDA datasets]

Sampled frames from HMDB-ARID:

[Figure: sampled frames from HMDB-ARID]

Benchmark Results

We evaluate our proposed ACAN on both UCF-HMDBfull and HMDB-ARID and compare it with previous domain adaptation methods. The results are as follows:

[Table: benchmark results on UCF-HMDBfull and HMDB-ARID]

Papers and Download

CC BY 4.0

Back to Project Page