12-17, 11:15–11:30 (Asia/Bangkok), Auditorium Hall 1
The rapid development of autonomous vehicles (AVs) demands robust perception systems capable of reliably detecting and classifying objects in complex urban environments. Fusing camera and LiDAR sensors has emerged as a promising way to improve the reliability and precision of 3D object detection. This work describes a multimodal fusion framework that combines camera images and LiDAR point clouds to achieve high-performance 3D object detection in urban scenes. Cameras provide the rich color and texture information needed to identify traffic signs, pedestrians, and vehicles, whereas LiDAR supplies the precise depth measurements required to recover object geometry and spatial relationships. The proposed fusion technique exploits the complementary strengths of the two sensors to improve detection accuracy, particularly in difficult settings such as occlusions and varying illumination, and is evaluated on the open KITTI dataset. The implementation builds on open-source libraries: OpenPCDet for 3D object detection and MMDetection for 2D detection. Detectron2, Facebook AI Research's flexible framework for 2D and 3D detection tasks, is another widely used option.
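As a minimal sketch of the 2D branch, the snippet below runs single-image inference through MMDetection's public `init_detector`/`inference_detector` API (MMDetection 2.x). The config, checkpoint, and image paths are hypothetical placeholders, not artifacts of this work.

```python
# Minimal 2D detection sketch with MMDetection (2.x API).
# All file paths below are assumed placeholders for illustration.
from mmdet.apis import init_detector, inference_detector

CONFIG = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'  # example config shipped with MMDetection
CHECKPOINT = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'     # assumed local checkpoint path

model = init_detector(CONFIG, CHECKPOINT, device='cuda:0')      # build the detector and load weights
result = inference_detector(model, 'kitti/image_2/000123.png')  # per-class arrays of [x1, y1, x2, y2, score]
```

The 2D boxes and scores produced here are the kind of camera-side evidence that a fusion pipeline would later associate with LiDAR-derived 3D proposals.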
Fusion is accomplished through a carefully designed architecture that aligns and combines data from both modalities at multiple stages of the detection pipeline, including feature extraction, region proposal, and classification. Deep learning techniques, in particular convolutional neural networks, are used to process and integrate the multimodal input. Experimental results show that the fused 3D object detector outperforms single-modality baselines in both robustness and precision, especially when recognizing small or partially occluded objects.
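To make the alignment step concrete, the sketch below shows one common way to register the two modalities under KITTI-style calibration: LiDAR points are projected into the image plane via the extrinsics, rectification, and camera projection matrices, and image features are then sampled at the projected pixels and concatenated with per-point features. Function and variable names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def project_lidar_to_image(pts_velo, Tr_velo_to_cam, R0_rect, P2):
    """Project LiDAR points (N, 3) into KITTI image pixel coordinates (N, 2).

    Tr_velo_to_cam: (3, 4) LiDAR-to-camera extrinsic transform
    R0_rect:        (3, 3) rectification rotation
    P2:             (3, 4) left color camera projection matrix
    """
    n = pts_velo.shape[0]
    pts_h = np.hstack([pts_velo, np.ones((n, 1))])   # homogeneous coordinates (N, 4)
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)       # rectified camera frame (3, N)
    cam_h = np.vstack([cam, np.ones((1, n))])        # (4, N)
    img = P2 @ cam_h                                 # (3, N)
    uv = img[:2] / img[2:3]                          # perspective divide -> pixels
    in_front = cam[2] > 0                            # keep only points in front of the camera
    return uv.T, in_front

def fuse_point_image_features(point_feats, image_feats, uv, valid):
    """Concatenate per-point LiDAR features with image features sampled at the
    projected pixel locations (nearest-neighbor sampling, a simple fusion scheme)."""
    h, w, _ = image_feats.shape
    u = np.clip(uv[:, 0].astype(int), 0, w - 1)
    v = np.clip(uv[:, 1].astype(int), 0, h - 1)
    sampled = image_feats[v, u]                      # (N, C_img) image features per point
    sampled[~valid] = 0.0                            # zero out points behind the camera
    return np.concatenate([point_feats, sampled], axis=1)  # (N, C_pts + C_img)
```

Sampling image features at projected point locations is a simple early/mid-level fusion choice; richer variants instead fuse at the region-proposal or classification stage, as the pipeline description above suggests.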