Multilevel Deformable Attention-Aggregated Networks for Change Detection in Bitemporal Remote Sensing Imagery



Deep learning (DL) approaches based on convolutional encoder–decoder networks have shown promising results in bitemporal change detection. However, their performance is limited by insufficient contextual information aggregation because they cannot fully capture the implicit contextual dependency relationships among feature maps at different levels. Moreover, harvesting long-range contextual information typically incurs high computational complexity. To circumvent these challenges, we propose multilevel deformable attention-aggregated networks (MLDANets) to effectively learn long-range dependencies across multiple levels of bitemporal convolutional features for multiscale context aggregation. Specifically, a multilevel change-aware deformable attention (MCDA) module consisting of linear projections with learnable parameters is built based on multihead self-attention (SA) with a deformable sampling strategy. It is applied in the skip connections of an encoder–decoder network taking a bitemporal deep feature hypersequence (BDFH) as input. MCDA can progressively address a set of informative sampling locations in multilevel feature maps for each query element in the BDFH. Simultaneously, MCDA learns to characterize beneficial information from different spatial and feature subspaces of BDFH using multiple attention heads for change perception. As a result, contextual dependencies across multiple levels of bitemporal feature maps can be adaptively aggregated via attention weights to generate multilevel discriminative change-aware representations. Experiments on very-high-resolution (VHR) datasets verify that MLDANets outperform state-of-the-art change detection approaches with dramatically faster training convergence and high computational efficiency.
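To make the sampling idea in the abstract concrete, the following is a minimal sketch of multilevel deformable attention for a single query and a single head. It is not the authors' MCDA implementation: the function names (`bilinear_sample`, `deformable_attention`), the single-channel feature maps, and the normalized coordinate convention are all assumptions made for illustration. In a real module the sampling offsets and attention logits would come from learnable linear projections of the query; here they are passed in directly.

```python
import math


def bilinear_sample(fmap, x, y):
    """Bilinearly sample a 2D map (list of rows) at normalized (x, y) in [0, 1]."""
    h, w = len(fmap), len(fmap[0])
    # Convert normalized coordinates to pixel coordinates, clamped to the map.
    px = min(max(x * (w - 1), 0.0), w - 1.0)
    py = min(max(y * (h - 1), 0.0), h - 1.0)
    x0, y0 = int(math.floor(px)), int(math.floor(py))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = px - x0, py - y0
    top = fmap[y0][x0] * (1 - fx) + fmap[y0][x1] * fx
    bottom = fmap[y1][x0] * (1 - fx) + fmap[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(levels, ref_point, offsets, logits):
    """Aggregate a few sampled values from every feature level for one query.

    levels:    list of 2D feature maps, one per level (hypothetical layout)
    ref_point: normalized (x, y) reference location of the query
    offsets:   offsets[l][k] = (dx, dy) for sampling point k at level l
    logits:    logits[l][k] = unnormalized attention score for that point
    """
    # Attention weights are normalized jointly over all levels and points.
    weights = softmax([v for level_logits in logits for v in level_logits])
    out, i = 0.0, 0
    for fmap, level_offsets in zip(levels, offsets):
        for dx, dy in level_offsets:
            value = bilinear_sample(fmap, ref_point[0] + dx, ref_point[1] + dy)
            out += weights[i] * value
            i += 1
    return out


# Two levels at different resolutions, two sampling points per level.
levels = [
    [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]
offsets = [[(0.0, 0.0), (0.25, 0.0)], [(0.0, 0.0), (0.0, -0.5)]]
logits = [[0.0, 0.0], [0.0, 0.0]]
result = deformable_attention(levels, (0.5, 0.5), offsets, logits)
```

Because each query touches only a handful of sampled locations per level rather than every position, the cost grows with the number of sampling points instead of the full spatial size of the feature maps, which is the efficiency argument the abstract makes.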

IEEE Transactions on Geoscience and Remote Sensing, 60:1-18
Xiaokang Zhang

My research interests include remote sensing, computer vision and deep learning.