Feature Fusion and Enhancement for Lightweight Visible-Thermal Infrared Tracking via Multiple Adapters
Published in IEEE Transactions on Circuits and Systems for Video Technology, 2026
This paper proposes MFJA, a lightweight RGB-T tracker for visible and thermal infrared object tracking. MFJA freezes a pre-trained dual-stream transformer backbone and trains only a small set of lightweight adapters, which makes adaptation to multimodal tracking parameter-efficient.
The framework includes a feature fusion adapter for cross-modal interaction and a joint enhancement adapter for unimodal feature refinement. It introduces only 0.23M trainable parameters while improving robustness under challenging conditions such as occlusion, deformation, and modality quality variation.
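To give a feel for why adapter-only training stays so small, the sketch below counts the parameters of a generic bottleneck adapter (a down-projection followed by an up-projection). This is an illustration only, not the MFJA architecture: the bottleneck layout, the embedding width `DIM`, the bottleneck width `RANK`, and the helper names are all assumptions chosen for the example. Only the 0.23M budget comes from the summary above.

```python
def bottleneck_adapter_params(dim: int, rank: int, bias: bool = True) -> int:
    """Count parameters of a generic bottleneck adapter.

    Assumed layout (hypothetical, not taken from the paper):
    a linear down-projection dim -> rank, then a linear up-projection rank -> dim.
    """
    down = dim * rank + (rank if bias else 0)  # weights + optional bias
    up = rank * dim + (dim if bias else 0)     # weights + optional bias
    return down + up


def adapters_within_budget(budget: int, dim: int, rank: int) -> int:
    """How many such adapters fit inside a trainable-parameter budget."""
    return budget // bottleneck_adapter_params(dim, rank)


# Hypothetical sizes, chosen only for illustration.
DIM = 768    # assumed token embedding width
RANK = 8     # assumed bottleneck width

# Trainable-parameter budget reported in the summary (0.23M).
BUDGET = 230_000

per_adapter = bottleneck_adapter_params(DIM, RANK)
count = adapters_within_budget(BUDGET, DIM, RANK)
print(f"parameters per adapter: {per_adapter}")
print(f"adapters that fit in the budget: {count}")
```

Under these assumed sizes, each adapter costs roughly 2 x `DIM` x `RANK` parameters, so the cost grows linearly with the bottleneck width rather than quadratically with the embedding width. That is the general reason a frozen backbone plus a few narrow adapters can be trained with a small fraction of the full model's parameters.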
Experiments on LasHeR and RGBT234 show that MFJA achieves competitive tracking accuracy with real-time inference. On LasHeR, it improves over prompt-learning and adapter-based baselines while maintaining an inference speed of 28.60 FPS.
| DOI | IEEE Xplore | Code |
Recommended citation: H. Xue, H. Zhu, Z. Ran, X. Tang, G. Qi, Z. Zhu, S.-C. Kuok, and H. Leung. (2026). "Feature Fusion and Enhancement for Lightweight Visible-Thermal Infrared Tracking via Multiple Adapters." IEEE Transactions on Circuits and Systems for Video Technology, 36(1), 959-970. doi:10.1109/TCSVT.2025.3595632.
