TY - GEN
T1 - Multi-Modal Integration of 2D and 3D Attributes for Multi-Vehicles Tracking
AU - Altaf, Muhammad Adeel
AU - Kim, Min Young
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Tracking multiple objects is crucial in autonomous vehicles, but relying on a single sensor is unreliable due to potential failures in challenging scenarios. 2D cameras provide texture information, whereas LiDAR offers 3D structural data, each excelling under different conditions. Therefore, combining the features of these two sensors is essential for learning distinct characteristics. Effective fusion is challenging because the modalities contain fundamentally different data. In this study, we introduce multi-modal integration of point-level and pixel-level features to enhance feature distinctiveness. We utilize VoxelNet for obtaining multi-scale point cloud representations, and ResNet-50 for 2D image-based feature extraction. Additionally, we assess the benefits of pre-training individual modalities followed by fine-tuning the multi-modal model. Our technique achieves 91.28% MOTA and 73.53% HOTA on the KITTI dataset, surpassing many methods without multi-modal integration.
AB - Tracking multiple objects is crucial in autonomous vehicles, but relying on a single sensor is unreliable due to potential failures in challenging scenarios. 2D cameras provide texture information, whereas LiDAR offers 3D structural data, each excelling under different conditions. Therefore, combining the features of these two sensors is essential for learning distinct characteristics. Effective fusion is challenging because the modalities contain fundamentally different data. In this study, we introduce multi-modal integration of point-level and pixel-level features to enhance feature distinctiveness. We utilize VoxelNet for obtaining multi-scale point cloud representations, and ResNet-50 for 2D image-based feature extraction. Additionally, we assess the benefits of pre-training individual modalities followed by fine-tuning the multi-modal model. Our technique achieves 91.28% MOTA and 73.53% HOTA on the KITTI dataset, surpassing many methods without multi-modal integration.
KW - autonomous vehicles
KW - deep learning
KW - merge features
KW - multiple object tracking
KW - neural networks
UR - https://www.scopus.com/pages/publications/85217650370
U2 - 10.1109/ICTC62082.2024.10827704
DO - 10.1109/ICTC62082.2024.10827704
M3 - Conference contribution
AN - SCOPUS:85217650370
T3 - International Conference on ICT Convergence
SP - 364
EP - 369
BT - ICTC 2024 - 15th International Conference on ICT Convergence
PB - IEEE Computer Society
T2 - 15th International Conference on Information and Communication Technology Convergence, ICTC 2024
Y2 - 16 October 2024 through 18 October 2024
ER -