Feature Fusion and Enhancement for Lightweight Visible-Thermal Infrared Tracking via Multiple Adapters

Published in IEEE Transactions on Circuits and Systems for Video Technology, 2026

This paper proposes MFJA, a lightweight tracker for visible and thermal infrared (RGB-T) object tracking. MFJA freezes a pre-trained dual-stream transformer backbone and trains only multiple lightweight adapters, which makes adaptation to multimodal tracking parameter-efficient.

The framework includes a feature fusion adapter for cross-modal interaction and a joint enhancement adapter for unimodal feature refinement. It introduces only 0.23M trainable parameters while improving robustness under challenging conditions such as occlusion, deformation, and modality quality variation.
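To give a feel for why adapter-only training keeps the trainable parameter count small, the sketch below counts the parameters of a generic bottleneck adapter (a down-projection followed by an up-projection) and compares the total against a frozen backbone. This summary does not describe the internal design of MFJA's adapters, so the bottleneck form, the function names, and every number used here (embedding width, bottleneck rank, adapter count, backbone size) are illustrative assumptions. The example is not intended to reproduce the 0.23M figure.

```python
def linear_params(in_dim: int, out_dim: int, bias: bool = True) -> int:
    """Parameter count of a dense layer mapping in_dim -> out_dim."""
    return in_dim * out_dim + (out_dim if bias else 0)


def bottleneck_adapter_params(dim: int, rank: int, bias: bool = True) -> int:
    """Generic bottleneck adapter: down-projection (dim -> rank)
    followed by up-projection (rank -> dim)."""
    return linear_params(dim, rank, bias) + linear_params(rank, dim, bias)


def trainable_summary(dim: int, rank: int, num_adapters: int,
                      frozen_backbone_params: int) -> tuple[int, float]:
    """Return (trainable parameter count, trainable share of all parameters)
    when only the adapters are trained and the backbone stays frozen."""
    trainable = num_adapters * bottleneck_adapter_params(dim, rank)
    total = trainable + frozen_backbone_params
    return trainable, trainable / total


if __name__ == "__main__":
    # Hypothetical settings chosen for illustration, not taken from the paper.
    trainable, share = trainable_summary(
        dim=768, rank=8, num_adapters=12, frozen_backbone_params=86_000_000
    )
    print(f"{trainable / 1e6:.3f}M trainable parameters "
          f"({share:.2%} of all parameters)")
```

Under these assumed settings the count scales linearly with the bottleneck rank and with the number of adapters, so both values act as direct controls on the trainable budget.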

Experiments on LasHeR and RGBT234 show that MFJA achieves competitive tracking accuracy with real-time inference. On LasHeR, it improves over prompt-learning and adapter-based baselines while maintaining an inference speed of 28.60 FPS.


Recommended citation: H. Xue, H. Zhu, Z. Ran, X. Tang, G. Qi, Z. Zhu, S.-C. Kuok, and H. Leung. (2026). "Feature Fusion and Enhancement for Lightweight Visible-Thermal Infrared Tracking via Multiple Adapters." IEEE Transactions on Circuits and Systems for Video Technology, 36(1), 959-970. doi:10.1109/TCSVT.2025.3595632.