Abstract:
Unsupervised person re-identification (UPR) is widely used in security engineering and smart city applications. However, many existing UPR algorithms neglect local feature matching and spatial location information during feature extraction, and may discard a large number of unclustered samples during pseudo-label clustering. To overcome these drawbacks, an unsupervised person re-identification method based on local feature matching and hybrid contrastive learning (LFHC) is proposed. First, because the network fails to extract feature information from different spatial locations, a self-similar non-local attention mechanism (Non-local) is introduced into the ResNet50 backbone for feature extraction. Second, to solve the problem of local feature mismatch, a local feature matching module (Aligned) is designed, which accounts for the matching of human body structures while learning image similarity. Finally, to address the insufficient feature extraction caused by discarding unclustered samples during training, a hybrid memory of cluster-level and instance-level features (HCL) is proposed to store cluster-level identity features and outlier instance features. To verify the effectiveness of the model, LFHC is compared with 12 existing unsupervised algorithms on two public datasets (Market-1501 and DukeMTMC-ReID), and ablation experiments are conducted to explore the impact of Non-local, Aligned, and HCL on performance. The comparative results show that LFHC achieves mAP scores of 84.4% and 71.5% on Market-1501 and DukeMTMC-ReID, respectively. Compared with CACL, the best performing of the 12 methods, the mAP scores of LFHC are higher by 3.5% and 1.9%, respectively.
The ablation results indicate that Non-local, Aligned, and HCL all improve the accuracy metrics: introducing Non-local into ResNet50 helps extract more useful person features and capture the spatial relationships between local features; the Aligned module effectively integrates the corresponding human body structure information; and HCL reduces the errors caused by pseudo-labels in the later stages of training.
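
The hybrid memory idea can be pictured roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: it assumes that the clustering step marks unclustered samples with the label -1 (as DBSCAN does), that each cluster-level entry is the normalized mean of its members, and that the loss is a softmax contrastive loss with a temperature of 0.05. The names `build_hybrid_memory` and `hybrid_contrastive_loss` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_hybrid_memory(features, pseudo_labels):
    """Build a memory with one centroid per cluster plus one entry per outlier.

    Assumption: pseudo_labels uses -1 for unclustered (outlier) samples.
    Returns (memory, targets), where targets[i] is the memory row that
    sample i should be pulled towards.
    """
    features = l2_normalize(np.asarray(features, dtype=float))
    pseudo_labels = np.asarray(pseudo_labels)
    entries = []
    targets = np.empty(len(features), dtype=int)

    # Cluster-level entries: normalized mean feature of each cluster.
    for c in sorted(set(pseudo_labels.tolist()) - {-1}):
        mask = pseudo_labels == c
        entries.append(l2_normalize(features[mask].mean(axis=0)))
        targets[mask] = len(entries) - 1

    # Instance-level entries: every outlier is kept as its own class
    # instead of being discarded.
    for i in np.flatnonzero(pseudo_labels == -1):
        entries.append(features[i])
        targets[i] = len(entries) - 1

    return np.stack(entries), targets


def hybrid_contrastive_loss(queries, memory, targets, temperature=0.05):
    """Softmax contrastive loss of each query against all memory entries."""
    queries = l2_normalize(np.asarray(queries, dtype=float))
    logits = queries @ memory.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(queries)), targets].mean())
```

In this sketch, outliers contribute to the loss as single-instance classes, so no sample is dropped from training. How the memory is refreshed between iterations (for example, by a momentum update) is left out because the abstract does not specify it.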