TS-LCD: Two-Stage Loop-Closure Detection Based on Heterogeneous Data Fusion (2024)


Sensors (Basel). 2024 Jun; 24(12): 3702.

Published online 2024 Jun 7. doi:10.3390/s24123702

PMCID: PMC11207695

PMID: 38931487

Fangdi Jiang 1, Wanqiu Wang 1, Hongru You 1, Shuhang Jiang 1, Xin Meng 1, Jonghyuk Kim 2 and Shifeng Wang 1,3,*

Academic Editors: Thierry Badard and Stephane Guinard


Abstract

Loop-closure detection plays a pivotal role in simultaneous localization and mapping (SLAM). It serves to minimize cumulative errors and ensure the overall consistency of the generated map. This paper introduces a multi-sensor fusion-based loop-closure detection scheme (TS-LCD) to address the challenges of low robustness and inaccurate loop-closure detection encountered in single-sensor systems under varying lighting conditions and structurally similar environments. Our method comprises two innovative components: a timestamp synchronization method based on data processing and interpolation, and a two-stage loop-closure detection scheme based on the fusion validation of visual and laser loops. Experimental results on the publicly available KITTI dataset reveal that the proposed method outperforms baseline algorithms, achieving a significant average reduction of 2.76% in the trajectory error (TE) and a notable decrease of 1.381 m per 100 m in the relative error (RE). Furthermore, it boosts loop-closure detection efficiency by an average of 15.5%, thereby effectively enhancing the positioning accuracy of odometry.

Keywords: loop-closure detection, multi-sensor fusion, timestamp synchronization, feature extraction

1. Introduction

Loop-closure detection has emerged as a promising approach to addressing the challenges encountered in simultaneous localization and mapping (SLAM) [1] technology. This technique, which holds a pivotal position in autonomous driving, augmented reality, and virtual reality, effectively mitigates the accumulation of errors in localization and map construction. By establishing robust constraints between the current frame and historical frames, loop-closure detection significantly enhances the practical utility of SLAM in autonomous navigation [2] and robotics applications [3]. Consequently, it facilitates more precise and reliable spatial perception and navigation, thereby helping to ensure the accuracy and efficiency of SLAM systems. Depending on the sensor employed, this methodology can be divided into two primary categories: vision-based and laser-based loop-closure detection [4].

Vision-based loop-closure detection techniques hinge on visual features extracted from the surrounding environment. These features predominantly originate from scenes replete with textures, such as the facades of buildings or road signage [5,6,7]. Through the utilization of methodologies like bag-of-words modeling, the system can swiftly identify candidate loop-closure frames bearing similarities to the present scene amidst an extensive corpus of imagery [8,9]. Nevertheless, vision-based loop-closure detection exhibits notable sensitivity to environmental alterations. For instance, when a robot revisits a locale under varying lighting conditions or from altered perspectives, the system might fail to precisely recognize the loop closure due to substantial shifts in visual features, ultimately yielding erroneous detection outcomes [10]. Laser-based loop-closure detection methodologies exhibit enhanced robustness. These approaches generally involve the extraction of local or global descriptors from point clouds acquired through LiDAR scans. These descriptors remain unaffected by variations in illumination and viewing angles, thereby rendering them resilient to environmental changes. Nonetheless, laser-based loop-closure detection faces challenges in structurally homologous environments [11], such as elongated corridors or repetitive architectural layouts. In such scenarios, the system might erroneously identify distinct positions as loop closures due to descriptor similarities, ultimately causing disarray within the navigation system and compromising localization accuracy.

To address the prevalent issues and challenges associated with vision- and laser-based loop-closure detection, this paper presents an innovative loop-closure detection algorithm based on the principle of multivariate heterogeneous data fusion. Its algorithmic framework is shown in Figure 1. The present study overcomes the inherent performance limitations of single-sensor data in specific environments. Our approach harnesses the complementary strengths of multiple sensor data to enhance the accuracy and robustness of loop-closure detection. The primary contributions of this study can be summarized as follows:

  • (1)

    This paper proposes an adaptive tightly coupled framework for loop-closure detection, named two-stage loop-closure detection (TS-LCD), achieving improved robustness and accuracy in different environments.

  • (2)

    An innovative method based on the interpolation technique is proposed in this paper, which optimizes the data processing flow and achieves timestamp synchronization.

  • (3)

    The effectiveness of the algorithm is validated through the integration of mainstream laser odometry frameworks and rigorous evaluation utilizing the KITTI dataset.

Figure 1


TS-LCD processing pipeline showing the LiDAR, camera, and IMU data input, preprocessing, and feature extraction. The loop-closure detection module utilizes both visual and LiDAR information for two-stage detection.

2. Related Works

Loop-closure detection holds significant importance in the domain of SLAM and has garnered considerable research attention in recent years. References [12,13,14] offer diverse algorithms for loop detection and location identification, each boasting unique strengths in efficiency, precision, and versatility. Two loop-closure detection approaches that are pertinent to our study are those reliant on vision and laser.

2.1. LiDAR-Based Loop-Closure Detection

Loop-closure detection methods for laser sensors can be categorized into local feature-based methods and global feature-based methods. For methods that fuse local features to obtain global features, Bosse et al. [15] divided the point cloud data into cylindrical regions and fused the average height, variance, overall roundness, and cylindricality of the point cloud in each region to obtain the global descriptor of the point cloud. Steder et al. [16] used NARF features [17] as the local features and generated a bag-of-words model through the bag-of-words vectors for keyframe retrieval. Zaganidis et al. [18] used semantic segmentation to assist loop-closure detection for NDT statistical maps. For the global descriptor approach, Granström et al. [19] exploited the rotational invariance of point clouds, using statistical features of the point cloud, such as its range and volume, as the parameters of the global descriptor, and AdaBoost as the feature classifier to complete loop-closure detection. LeGO-LOAM added a loop-closure detection module to the LOAM system and built a KD tree over the keyframe positions to search for the spatially closest keyframe. Kim et al. [20] used Scan Context as the point cloud descriptor for loop-closure detection based on LeGO-LOAM. Lin et al. [21] proposed a new SLAM system that calculated the feature distribution of keyframes using a 2D statistical histogram for loop-closure detection. CNNs (convolutional neural networks) have shown advantages in feature extraction, so many extraction methods have adopted CNN features. Yang et al. [22] used the PointNetVLAD network as a local feature extraction network for loop-closure detection. They improved the classification part of PointNetVLAD, trained the classification model on the extracted descriptors, and used cross-entropy with stochastic gradient descent as the loss function to improve the classification results of PointNetVLAD. Yin et al. [23] utilized the neural network LocNet to extract the global descriptors of point cloud frames and added loop-closure detection to a SLAM system based on the Monte Carlo localization algorithm. Zhu et al. [24] proposed the GOSMatch method, which employs semantic hierarchy descriptors and geometric constraints for loop-closure detection. They utilized RangeNet++ to detect the semantic information of the current frame of point cloud data, employed a statistical histogram of semantic object connectivity relations as the global descriptor of the semantic hierarchy of point cloud frames, and finally utilized the RANSAC algorithm for geometric validation. Vidanapathirana et al. [25] proposed Locus, a network that fuses point cloud features with spatio-temporal features of point cloud frames to extract global descriptors. The OverlapNet method [26] used pairs of depth maps, normal vector maps, intensity maps, and semantic-type maps of the point cloud to extract the global descriptors.

To address the lack of color and textural information in point cloud data, Zhu et al. [27] embedded a visual sensor-based loop-closure detection method into a laser SLAM system and proposed using an ORB-based bag-of-words model and the RANSAC algorithm to accomplish loop-closure detection. Krispel et al. [28] proposed a global feature extraction method that utilized fused image and point cloud features. Xie et al. [29] fused point cloud and image global descriptor extraction methods based on PointNetVLAD, utilizing PointNetVLAD and ResNet50 as the feature extraction methods for point cloud features and image features, respectively, and finally obtained global descriptors by fusing these features.

2.2. Vision-Based Loop-Closure Detection

The traditional VSLAM (visual simultaneous localization and mapping) system constructs the global descriptor of the current frame by extracting handcrafted local features to complete the retrieval of loop-closure candidate frames. The loop-closure detection module of the ORB-SLAM system [30] adopts ORB feature points as the handcrafted features and uses the bag-of-words model to construct the bag-of-words vector and complete the matching of candidate frames. Because handcrafted features are not robust and are easily affected by lighting, Zhang et al. [31] used a CNN to extract local features instead of handcrafted features. Yue et al. [32] proposed adding the spatial structure information of the feature points and using triangular segmentation and graph validation as geometric constraints based on the extraction of local features using a CNN. However, the above methods cannot obtain the semantic information or dynamic and static attributes of the feature points.

For visual loop-closure detection in dynamic environments, Wang et al. [33] constructed a SURF feature database of dynamic objects offline and judged the motion attributes of the feature points based on the database. Migliore et al. [34] filtered static feature points through triangulation, and Mousavian et al. [35] used semantic segmentation to eliminate dynamic feature points, improving dynamic feature point recognition accuracy. Similarly, in DynaSLAM [36], a dynamic feature point rejection part was added to the ORB-SLAM2 system, rejecting dynamic object feature points through semantic segmentation based on Mask R-CNN [37] and feature point geometric constraint relations. The authors also discussed the effects of adding image restoration to the SLAM system based on the rejection of dynamic feature points. In a related study on scene recognition, NetVLAD [38] improved VLAD by integrating local features to obtain global descriptors. It was successfully introduced into deep learning models, which could be trained to obtain global descriptors through deep learning networks. In addition, the researchers of CALC2.0 [39] designed a CNN-based approach that integrated appearance, semantic, and geometric information, categorizing all dynamic objects under an “other” semantic label. Although dynamic objects were unified into the “other” category in CALC2.0, the method of generating global descriptors through deep learning networks was still affected by dynamic region pixels to varying degrees due to the lack of image preprocessing. To avoid the impact of dynamic scenes on the construction of global descriptors, Naseer et al. [40] used a CNN to segment the image and then extracted the global descriptors from the segmented image. Munoz et al. [41] used an object recognition network instead of a segmentation network to modify the global descriptors of an image.

2.3. Deep Learning-Based Loop-Closure Detection

With the development of deep learning technology, learned local features have been employed for geometrical verification in LCD (loop-closure detection). Noh et al. [42] introduced the DEep Local Feature (DELF) approach, focusing on extracting and selectively utilizing local features via an attention mechanism tailored for geometrical verification. An et al. [43] presented FILD++, an innovative LCD system that leverages a two-pass CNN model to extract both global and local representations from input images. The geometrical verification between query and candidate pairs is subsequently conducted based on the locally learned features extracted through this process. Hausler et al. [44] proposed a patch-level feature approach (Patch-NetVLAD), which further optimizes pixel-level local features to cover a larger spatial range. Jin et al. [45] extended this patch-level method with a newly designed facet descriptor loss, enabling it to extract more discriminative facet-level local features. Jin et al. [46] proposed a generalized framework called LRN-LCD, a lightweight relational network for LCD, which integrates the feature extraction module and the similarity measure module into a simple lightweight network.

Deep learning-based loop-closure detection possesses powerful feature extraction capabilities compared to traditional methods. Deep learning models are able to extract high-level features from complex images or point cloud data. These features are then compared to determine whether a loop closure has occurred. However, it still encounters some challenges in practical applications. The training and inference of deep learning models necessitate a significant amount of computational resources, posing a challenge for resource-limited devices. In this paper, we propose a two-stage loop-closure detection mechanism. The accuracy and robustness of loop-closure detection are improved by utilizing the complementary advantages of multi-sensor data.

3. Method

The proposed TS-LCD algorithm framework is illustrated in Figure 2. The input consists of LiDAR point cloud and camera image data, which undergo parallel preprocessing to extract real-time LiDAR and visual features. The extracted features are stored in a local map. Subsequently, by preprocessing the IMU data, the pose information of keyframes is obtained. Further filtering and calibration of the IMU data can reduce the attitude error of the keyframes and improve the accuracy of visual loop-closure detection. In addition, IMU data can be used to address point cloud distortion arising from LiDAR movement. Finally, SC (Scan Context) is employed for loop-closure frame detection and matching to minimize false detection rates. The details of each module within the algorithm framework are introduced in order below.


Figure 2

First, timestamp synchronization is performed. Then, feature extraction is performed to compute the similarity to obtain the selected frame after loop-closure detection, which is then further confirmed by SC.

3.1. Preprocessing of Input Data

Since the sampling frequencies of the LiDAR, camera, and IMU are not the same, this paper synchronizes the timestamps of the LiDAR and camera data by finding the nearest-neighbor frames based on data processing and interpolation. The position states of the LiDAR and camera keyframes are obtained by pre-integrating the IMU data.

3.1.1. Timestamp Synchronization

For each frame of the point cloud data, its timestamp TL is registered, and for each frame of the image, its timestamp TC is registered. The timestamps of the LiDAR and camera are sorted separately. For each LiDAR timestamp TLi, the closest neighboring timestamps TCj and TCj+1 in the camera timestamp sequence are found. If the neighboring camera timestamps TCj and TCj+1 are located on either side of TLi, the camera timestamp corresponding to TLi can be estimated using linear interpolation or other interpolation methods. The linear interpolation formula is:

T_{C}^{est} = \frac{(T_{C_{j+1}} - T_{L_i})\,T_{C_j} + (T_{L_i} - T_{C_j})\,T_{C_{j+1}}}{T_{C_{j+1}} - T_{C_j}}

(1)

where TCest is the estimated camera timestamp corresponding to TLi. The synchronization process is shown in Figure 3.
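To make this step concrete, the following minimal sketch (an illustration under the assumptions above, not the authors' implementation; function and variable names are hypothetical) pairs each LiDAR timestamp with its two nearest camera timestamps and applies linear interpolation between them:

```python
import bisect

def synchronize_timestamps(lidar_ts, camera_ts):
    """Estimate, for each LiDAR timestamp T_Li, the corresponding camera
    timestamp by linearly interpolating between its camera neighbors
    T_Cj and T_Cj+1. Both input lists must be sorted (seconds).
    Returns a list of (T_Li, T_C_est) pairs; LiDAR frames without a
    camera neighbor on both sides are skipped."""
    pairs = []
    for t_l in lidar_ts:
        j = bisect.bisect_right(camera_ts, t_l)
        if j == 0 or j == len(camera_ts):
            continue  # no camera frame on one side of T_Li
        t_cj, t_cj1 = camera_ts[j - 1], camera_ts[j]
        w = (t_l - t_cj) / (t_cj1 - t_cj)       # interpolation weight
        t_c_est = (1.0 - w) * t_cj + w * t_cj1
        pairs.append((t_l, t_c_est))
    return pairs

# e.g., synchronize_timestamps([0.00, 0.10], [0.00, 0.03, 0.07, 0.10, 0.13])
```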


Figure 3

Timestamp synchronization. For data without timestamp synchronization, the camera sampling frequency is much larger than the LiDAR sampling frequency. After performing timestamp synchronization based on the sampling frequency of the LiDAR, the data stream of the camera is interpolated and synchronized, and finally, the synchronized timestamp is obtained at a frequency of 10 frames per second.

3.1.2. IMU Pre-Integration

The IMU acquires the acceleration and angular velocity, and the position information of the keyframes can be obtained through the integration operation of the IMU measurements. The sampling frequency of the IMU is much higher than the keyframe release frequency of the image and the laser, corresponding to the red line and the green line in Figure 4, respectively. Assuming that the two neighboring red lines correspond to moments k and k+1, the average acceleration and the average angular velocity of the IMU during this time period are, respectively,

\bar{\hat{a}}_{k} = \frac{1}{2}\left[ q_{k}\left( \hat{a}_{k} - b_{a_{k}} \right) + q_{k+1}\left( \hat{a}_{k+1} - b_{a_{k}} \right) \right]

(2)

\bar{\hat{\omega}}_{k} = \frac{1}{2}\left( \hat{\omega}_{k} + \hat{\omega}_{k+1} \right) - b_{\omega_{k}}

(3)

where \hat{a}_{k} and \hat{a}_{k+1} are the accelerations at moments k and k+1, respectively; b_{a_{k}} and b_{\omega_{k}} are the accelerometer and gyroscope zero biases; \hat{\omega}_{k} and \hat{\omega}_{k+1} are the angular velocities at k and k+1, respectively; and q_{k} and q_{k+1} are the orientation state quantities at k and k+1, respectively.


Figure 4

Pre-integration of IMU data for obtaining the pose of keyframes. The red arrow indicates IMU observation, and the green arrow indicates camera and LiDAR observation.

At moment k+1, the position \hat{\alpha}_{k+1}^{b_k}, velocity \hat{\beta}_{k+1}^{b_k}, and attitude \hat{\gamma}_{k+1}^{b_k} of a keyframe can be expressed as follows:

\hat{\alpha}_{k+1}^{b_k} = \hat{\alpha}_{k}^{b_k} + \hat{\beta}_{k}^{b_k}\,\delta t + \frac{1}{2}\,\bar{\hat{a}}_{k}\,\delta t^{2}

(4)

\hat{\beta}_{k+1}^{b_k} = \hat{\beta}_{k}^{b_k} + \bar{\hat{a}}_{k}\,\delta t

(5)

\hat{\gamma}_{k+1}^{b_k} = \hat{\gamma}_{k}^{b_k} \otimes \hat{\gamma}_{k+1}^{k} = \hat{\gamma}_{k}^{b_k} \otimes \begin{bmatrix} 1 \\ \frac{1}{2}\,\bar{\hat{\omega}}_{k}\,\delta t \end{bmatrix}

(6)

where \delta t is the time interval from frame k to frame k+1.
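A minimal numerical sketch of one propagation step of Equations (2)–(6) is given below; it is illustrative only (variable names are hypothetical), and it uses SciPy's exact rotation-vector update in place of the first-order quaternion approximation in Equation (6):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def preintegrate_step(alpha, beta, gamma, a_k, a_k1, w_k, w_k1, b_a, b_w, dt):
    """One IMU pre-integration step between samples k and k+1.
    alpha, beta : position / velocity pre-integration terms (3-vectors)
    gamma       : orientation pre-integration term (scipy Rotation)
    a_k, a_k1   : accelerometer samples at k and k+1
    w_k, w_k1   : gyroscope samples at k and k+1
    b_a, b_w    : accelerometer / gyroscope zero biases
    dt          : time interval between the two samples"""
    w_mean = 0.5 * (w_k + w_k1) - b_w                      # Eq. (3)
    gamma_next = gamma * R.from_rotvec(w_mean * dt)        # Eq. (6), exact form
    a_mean = 0.5 * (gamma.apply(a_k - b_a)
                    + gamma_next.apply(a_k1 - b_a))        # Eq. (2)
    alpha_next = alpha + beta * dt + 0.5 * a_mean * dt**2  # Eq. (4)
    beta_next = beta + a_mean * dt                         # Eq. (5)
    return alpha_next, beta_next, gamma_next
```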

3.2. Feature Extraction

Assuming that the internal and external parameters of the LiDAR and camera are known and fixed and that their distortions have been corrected, in this paper, we adopt the curvature extraction method defined in LOAM to obtain LiDAR features. LiDAR features with higher curvatures are defined as LiDAR edge features Pedge, while those with lower curvatures are defined as LiDAR planar features Psurf. The LiDAR feature extraction method is shown in Figure 5. The curvature calculation formula is as follows:

c = \frac{1}{|S| \cdot \left\| X_{(k,i)}^{L} \right\|} \left\| \sum_{j \in S,\, j \neq i} \left( X_{(k,i)}^{L} - X_{(k,j)}^{L} \right) \right\|

(7)

where S is the set of consecutive points returned by the laser in the same frame, and X_{(k,i)}^{L} and X_{(k,j)}^{L} denote the i-th and j-th points of the k-th scan in the LiDAR coordinate system L.
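The sketch below illustrates Equation (7) on a single organized scan line and the split into edge and planar features; the neighborhood size and the curvature thresholds are placeholders rather than the values used in LOAM:

```python
import numpy as np

def extract_lidar_features(scan, window=5, edge_thresh=1.0, surf_thresh=0.1):
    """Compute the Eq. (7) curvature for each point of one scan line and
    split the line into edge features (high curvature) and planar
    features (low curvature). scan: (N, 3) array of ordered points."""
    edge_idx, surf_idx = [], []
    for i in range(window, len(scan) - window):
        neighbors = np.vstack((scan[i - window:i], scan[i + 1:i + 1 + window]))
        diff_sum = np.sum(scan[i] - neighbors, axis=0)
        c = np.linalg.norm(diff_sum) / (len(neighbors) * np.linalg.norm(scan[i]))
        if c > edge_thresh:
            edge_idx.append(i)    # candidate edge feature (P_edge)
        elif c < surf_thresh:
            surf_idx.append(i)    # candidate planar feature (P_surf)
    return edge_idx, surf_idx
```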


Figure 5

Each frame of the point cloud is subjected to feature extraction by calculating the curvature of each point; lower curvatures are defined as LiDAR planar features Psurf, and higher curvatures are defined as LiDAR edge features Pedge.

For the selection of visual features, this paper calculates the autocorrelation matrix of each pixel point in the image and then determines whether the point is a corner point based on the eigenvalues of this matrix. Corner points are locations where the image signal changes significantly in both directions of the two-dimensional image plane, such as corners, intersections, and textured regions. The visual feature extraction method is shown in Figure 6. The expression of the autocorrelation matrix M is

M = \begin{bmatrix} I_{x}^{2} & I_{x} I_{y} \\ I_{x} I_{y} & I_{y}^{2} \end{bmatrix}

(8)

where Ix and Iy are the gradients of the image in the x and y directions, respectively.
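A compact sketch of corner detection built on the autocorrelation matrix M of Equation (8) is shown below; it uses the standard Harris response rather than explicit eigenvalue computation, and the response constant and threshold are illustrative:

```python
import cv2
import numpy as np

def detect_corners(gray, k=0.04, thresh_ratio=0.01):
    """Detect corner points from the per-pixel autocorrelation matrix M."""
    gray = np.float32(gray)
    Ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)   # gradient in x
    Iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)   # gradient in y
    # entries of M, accumulated over a small local window
    Ixx = cv2.boxFilter(Ix * Ix, -1, (5, 5))
    Iyy = cv2.boxFilter(Iy * Iy, -1, (5, 5))
    Ixy = cv2.boxFilter(Ix * Iy, -1, (5, 5))
    # Harris response det(M) - k * trace(M)^2: large at corner points
    response = (Ixx * Iyy - Ixy * Ixy) - k * (Ixx + Iyy) ** 2
    corners = np.argwhere(response > thresh_ratio * response.max())
    return corners  # (row, col) pixel coordinates of candidate corners
```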


Figure 6

The image grayscale-based method detects visual features by calculating the curvature and gradient of the points.

3.3. Depth Information Correlation

Since a single camera cannot accurately measure depth information, only a scaled estimate of the feature point depth can be made, and the depth estimate will be highly noisy if the number of observations of the feature point is low or the parallax is insufficient. Within a multimodal sensor fusion framework, the depth information of visual feature points can be optimized using the LiDAR point cloud to improve the robustness and accuracy of the visual-inertial odometry. The depth information correlation module is designed to more accurately assign depths to visual features. As shown in Figure 7, visual features are projected into the LiDAR coordinate system through an external parameter matrix.


Figure 7

Depth information association process. The x, y, and z axes denote the LiDAR coordinate system.

For each visual feature, the three closest LiDAR points can be selected via a KD-tree. Depending on the depth of these points, a validation process is performed to improve the accuracy of subsequent matching. The specific validation process involves calculating the Euclidean spatial distances of the three nearest LiDAR points to the current visual feature. In the experiment, if the farthest distance between the three points is less than 0.5 m, the validation is successful, and the depth can be calculated by bilinear interpolation. Otherwise, triangulation is applied to assign depths to visual features. Figure 7 shows the exact process of validation. The purpose of the validation process is to check whether the points are in the same plane; if the depth difference is too large, the points may be in different planes and therefore need to be excluded, thus reducing the possibility of false matches.
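The sketch below illustrates this association step under simplifying assumptions (names are hypothetical): visual features and LiDAR points are expressed in the same frame via the extrinsic matrix, the three nearest LiDAR directions are found with a KD-tree, the 0.5 m consistency check follows the text, and a ray–plane intersection stands in for the bilinear interpolation used in the paper:

```python
import numpy as np
from itertools import combinations
from scipy.spatial import cKDTree

def associate_depth(feature_dir, lidar_points, max_spread=0.5):
    """Estimate the depth of one visual feature from nearby LiDAR points.
    feature_dir : unit 3-vector, the feature's viewing ray (LiDAR points are
                  assumed already transformed into the same frame).
    lidar_points: (N, 3) LiDAR points of the current frame.
    Returns the depth along the ray, or None if validation fails and the
    caller should fall back to triangulation."""
    # project LiDAR points onto the unit sphere and find the 3 points whose
    # directions are closest to the feature ray
    dirs = lidar_points / np.linalg.norm(lidar_points, axis=1, keepdims=True)
    _, idx = cKDTree(dirs).query(feature_dir, k=3)
    nearest = lidar_points[idx]

    # validation: the three points must lie on (roughly) the same surface
    spread = max(np.linalg.norm(p - q) for p, q in combinations(nearest, 2))
    if spread >= max_spread:
        return None

    # intersect the feature ray with the plane through the three points
    normal = np.cross(nearest[1] - nearest[0], nearest[2] - nearest[0])
    denom = normal @ feature_dir
    if abs(denom) < 1e-6:
        return None
    depth = (normal @ nearest[0]) / denom
    return float(depth) if depth > 0 else None
```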

3.4. Loop-Closure Detection

The DBoW2 bag-of-words model is used as the basis of visual loop-closure detection in this paper. First, the feature points in the latest keyframes tracked by optical flow are used to calculate the corresponding descriptors, which are then matched with the history frames to search for the most compatible frames, eliminating the history frames that are outside the time or distance thresholds with respect to the current frame. Then, the attitude data of the remaining historical frames are obtained, and the position relationship between the current frame and the historical frames is optimized by the PnP (perspective-n-point) algorithm. In order to improve the accuracy of the loop-closure constraint, the attitude of the current frame relative to the historical frames in the visual loop closure is sent to the LiDAR odometry as an initial value. The LiDAR odometry searches for the optimal match among the stored historical keyframes according to the minimum distance principle and then matches them using the initial value provided by the visual loop closure to further optimize the attitude constraints, ultimately achieving high-precision loop-closure detection results.
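As a concrete illustration of the PnP step, the sketch below estimates the pose of the current frame relative to a loop candidate with OpenCV's RANSAC-based solver; feature matching and bag-of-words retrieval are assumed to have been done already, and the thresholds are illustrative:

```python
import cv2
import numpy as np

def loop_relative_pose(points_3d, points_2d, K, min_inliers=25):
    """Estimate the pose of the current frame relative to a loop candidate.
    points_3d : (N, 3) 3D positions of matched features in the candidate
                keyframe's coordinate system.
    points_2d : (N, 2) corresponding pixel observations in the current frame.
    K         : 3x3 camera intrinsic matrix.
    Returns (R, t), or None if the geometric check fails."""
    if len(points_3d) < 6:
        return None
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, None, reprojectionError=3.0, iterationsCount=100)
    if not ok or inliers is None or len(inliers) < min_inliers:
        return None              # reject the loop candidate
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> rotation matrix
    return R, tvec               # passed to the LiDAR stage as an initial guess
```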

In addition to the coarse-to-fine optimization of visual loop-closure detection, to solve the limited-view-angle problem of visual loop-closure detection, as shown in Figure 8, a low-cost LiDAR loop-closure detection mechanism is added on top of visual loop-closure detection. The LiDAR odometry stores the positions of historical keyframes in real time, and when it detects that the position of the point cloud in the latest frame is close to the historical trajectory, the loop-closure detection mechanism searches the historical information and performs a match. Usually, when there is no visual loop-closure detection, this kind of LiDAR loop-closure detection method may fail in large scenes. However, in this paper, we couple the two types of loop-closure methods to address loop-closure detection in ordinary scenes. Through the visual loop closure, we aim to reduce the trajectory error under prolonged system operation. Meanwhile, the accuracy of LiDAR loop-closure detection based on spatial distance can be improved significantly to compensate for loop-closure failures caused by visual perspective problems. Additionally, we introduce the SC (Scan Context) algorithm to enhance the reliability and accuracy of loop-closure detection by precisely computing the degree of closed-loop recognition. The algorithmic framework is outlined below.

Algorithm 1 Procedure of the proposed online solution

1:  Input: the data sequence U¯ after timestamp synchronization, divided into consecutive keyframes (U1, U2, …, UK−1); the paired observations T = TC^ex · TL · TC, obtained by linear interpolation.
2:  Third-party libraries: Kalman filter (KF); random sample consensus (RANSAC); Scan Context (SC) loop-closure detection.
3:  Output: the set of features for each frame of data: Pedge, Psurf, and Harris feature points.
4:  Initialize: keyframe database; OpenCV and PCL.
5:  /* Visual loop-closure detection. */
6:  for each new keyframe TC in U¯ do
7:      if rotationAngle > 60 then
8:          compute the similarity S(A, B) = (VA · VB) / (|VA| · |VB|);
9:          if S(A, B) > 0.75 then
10:             /* output visual loop-closure candidate frames TCLoop = TLLoop */
11:         else
12:             remove TC;
13:         end if
14:     else if movingDistance > 10.0 then
15:         run the similarity calculation module;
16:         if S(A, B) > 0.75 then
17:             /* output visual loop-closure candidate frames TCLoop = TLLoop */
18:         else
19:             remove TC;
20:         end if
21:     else
22:         remove TC;
23:     end if
24: end for
25: /* Laser loop-closure detection. */
26: /* The laser loop-closure frame TLLoop extraction module is similar to the visual one. */
27: compute the similarity dist(D1, D2) = sqrt( Σ_{i=1}^{n} (D1[i] − D2[i])^2 );
28: /* Loop-closure detection confirmation. */
29: for each pair of candidate loop-closure frames TLLoop and TCorrespond do
30:     /* 3D matrix generation module for the spatial structure information SC. */
31:     for each SC do
32:         /* Feature extraction module. */
33:         /* Similarity calculation module. */
34:         if the similarity meets the requirements then
35:             /* output loop-closure frames for optimization */
36:         else
37:             remove TLLoop;
38:         end if
39:     end for
40: end for
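The keyframe gating and similarity checks of Algorithm 1 can be summarized in a few lines of Python (a sketch only; the bag-of-words vectors and Scan Context descriptors are assumed to be provided by the corresponding modules, and the names are illustrative):

```python
import numpy as np

def cosine_similarity(v_a, v_b):
    """S(A, B) = (V_A . V_B) / (|V_A| |V_B|), as used in Algorithm 1."""
    return float(np.dot(v_a, v_b) / (np.linalg.norm(v_a) * np.linalg.norm(v_b)))

def passes_first_stage(rotation_angle_deg, moving_distance_m, bow_current,
                       bow_history, sim_thresh=0.75, rot_thresh=60.0,
                       dist_thresh=10.0):
    """First (visual) stage: a keyframe is only compared against the history
    if it has rotated or moved enough, and it survives as a loop candidate
    only when the bag-of-words similarity exceeds the threshold."""
    if rotation_angle_deg > rot_thresh or moving_distance_m > dist_thresh:
        return cosine_similarity(bow_current, bow_history) > sim_thresh
    return False

def scan_context_distance(d1, d2):
    """Second-stage confirmation metric:
    dist(D1, D2) = sqrt(sum_i (D1[i] - D2[i])^2)."""
    return float(np.sqrt(np.sum((np.asarray(d1) - np.asarray(d2)) ** 2)))
```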


Figure 8

Failure of visual loop-closure detection. The arrow indicates the direction of travel of the vehicle.

4. Experiments

The proposed TS-LCD framework was validated on the KITTI dataset, one of the largest publicly available datasets in the field of autonomous driving. The KITTI dataset comprises 11 sequences with ground-truth data (sequences 00–10), spanning a total length of 22 km and featuring a rich and diverse range of environments, including rural, urban, highway, and other mixed scenes. In this paper, we used sequences 00, 05, 06, and 07, which contain loop-closure data, for experimentation. To ensure a fair assessment of robustness and accuracy, the proposed framework was validated across all KITTI sequences with ground-truth data. The input data for these sequences consisted solely of LiDAR and image data, and the frequencies of the LiDAR and camera were synchronized to 10 Hz using the timestamp-synchronization method described in Section 3.1.1. The root mean square error (RMSE), computed with the EVO evaluation tool, was used as the evaluation metric.

4.1. Evaluation of Odometry Positioning Accuracy

The experimental inputs consisted of binocular camera images, LiDAR point clouds, and IMU data from the KITTI dataset. The primary evaluation metric of the KITTI dataset was the average translational error (ATE), measured in terms of the drift per hundred meters and typically expressed as a percentage. The secondary metric was the average rotational error (ARE), measured in deg/100 m. LOAM is one of the best-performing LiDAR odometry algorithms on the KITTI benchmark, and SC is one of the most widely used loop-closure algorithms in laser SLAM applications. Therefore, in our experiments, we selected LOAM and SC-LOAM as baselines to validate the effectiveness of the proposed TS-LCD loop-closure framework. TS-LOAM denotes our algorithm combining TS-LCD and LOAM.

Odometry is an essential component of SLAM, and its accuracy is primarily reflected in the precision of trajectories without loops. Here, we compare the odometry performance of TS-LOAM with that of LOAM and SC-LOAM. The quantitative comparison results are shown in Table 1 and Table 2. Compared to the baseline algorithms, TS-LOAM achieved the best performance, with an ATE of 1.87% and an ARE of 1.13 deg/100 m. The ATE was reduced by an average of 2.66% and the ARE by an average of 1.44 deg/100 m.

Table 1

Translational error index of odometry localization accuracy. Unit: %.

Algorithm   Seq. No.   Max.        Mean       Median     Min.       RMSE       Std.
LOAM        00         16.638040   6.509289   5.385290   1.038895   7.757144   4.219293
LOAM        05         12.416641   3.532057   3.166770   1.007233   4.095342   2.072775
LOAM        06         17.947513   7.720748   6.196751   0.000000   8.941244   4.509534
LOAM        07          1.511218   0.673462   0.676061   0.210234   0.708818   0.221072
LOAM        Avg        12.158353   4.608889   3.856218   0.564091   5.375637   2.755669
SC-LOAM     00         13.155136   3.725957   3.023545   0.752738   4.460252   2.451753
SC-LOAM     05          3.872862   1.741009   1.572590   0.778893   1.891268   0.738771
SC-LOAM     06         13.888364   6.792742   5.920660   0.000000   7.505593   3.192583
SC-LOAM     07          1.639146   0.657002   0.678615   0.208712   0.700873   0.24407
SC-LOAM     Avg         8.138877   3.229178   2.798853   0.435086   3.639497   1.656794
TS-LOAM     00          3.209416   1.317889   1.246500   0.296522   1.437708   0.574607
TS-LOAM     05          2.788561   1.039795   0.937344   0.222877   1.153234   0.498775
TS-LOAM     06          7.316305   3.818756   3.747921   0.000000   4.198256   1.744263
TS-LOAM     07          1.182524   0.658270   0.638349   0.149834   0.683565   0.184231
TS-LOAM     Avg         3.624202   1.708678   1.642529   0.167308   1.868191   0.750469


Table 2

Rotational error index of odometry localization accuracy. Unit: deg/100 m.

Algorithm   Seq. No.   Max.       Mean       Median     Min.       RMSE       Std.
LOAM        00         7.572816   2.861219   3.293702   0.052028   3.371129   1.782677
LOAM        05         5.962009   2.606564   3.083750   0.012628   3.052875   1.589298
LOAM        06         5.957210   2.001717   0.118190   0.015152   3.184903   2.477243
LOAM        07         6.893125   2.783081   3.128018   0.036888   3.337638   1.842359
LOAM        Avg        6.596290   2.563145   2.405915   0.029174   3.236637   1.922894
SC-LOAM     00         6.944201   1.683658   1.900435   0.026070   2.014907   1.106863
SC-LOAM     05         4.717202   1.533127   1.627286   0.010259   1.824622   0.989327
SC-LOAM     06         5.536863   1.086076   0.095882   0.014956   1.696411   1.303168
SC-LOAM     07         5.404435   1.694663   1.642660   0.024214   2.054873   1.162161
SC-LOAM     Avg        5.650675   1.499381   1.316566   0.018875   1.897703   1.140380
TS-LOAM     00         4.492446   1.024355   1.004043   0.004296   1.272371   0.754736
TS-LOAM     05         3.267100   0.914836   1.012487   0.001205   1.110065   0.628745
TS-LOAM     06         2.954673   0.913674   0.142899   0.007247   1.460142   1.138953
TS-LOAM     07         1.182524   0.658270   0.638349   0.149834   0.683565   0.184231
TS-LOAM     Avg        2.974185   0.877784   0.699445   0.040646   1.131535   0.676667


4.2. Comparison of Odometry Trajectory and Ground Truth

To further analyze the advantages of the proposed framework, a qualitative analysis was conducted using sequences 00, 05, and 07. The results are shown in Figure 9. In these scenarios, TS-LOAM demonstrated superior performance compared to both LOAM and SC-LOAM. When compared to the ground-truth trajectory, the improved accuracy of TS-LOAM was evident in its tighter end-to-end drift constraints, primarily due to the two-stage loop-closure detection strategy. The green rectangular boxes mark the areas where TS-LOAM showed significant improvements. Both LOAM and SC-LOAM exhibited noticeable drift, while TS-LOAM consistently maintained alignment with the ground-truth trajectory, demonstrating the high robustness of the proposed algorithm.


Figure 9

Plot of odometer trajectories versus ground truth using KITTI 00, 05, and 06 datasets. As seen in the figure, our proposed algorithm outperformed the two baseline algorithms in every position due to our second-order loop-closure matching mechanism.

4.3. Experimental Validation Using an Unmanned Vehicle

The unmanned vehicle experiment used a vehicle equipped with a 16-line LiDAR (RS-Helios 16), a six-axis IMU (HFI-B6), and a binocular camera (Astra Pro). The experiment was conducted in a campus setting, with the experimental equipment shown in Figure 10a. We selected a circular route around the parking lot in the campus environment for experimental analysis. Figure 10b shows the projection of the traveling trajectory of the unmanned vehicle on the satellite map.


Figure 10

Unmanned vehicle experimental platform and experimental site. (a) Experimental platform; (b) experimental site. The experimental site is located at the Digital Building in Zhongshan City, China. The green line marks the driving trajectory used in the experiment.

Figure 11a shows the trajectory produced by SC-LOAM in the experiments. Failure to detect the loop closure while traveling the closed-loop section resulted in a shifted trajectory. The trajectory produced by the algorithm proposed in this paper (TS-LOAM), which integrates visual loop closure, is shown in Figure 11b; it accurately detects the loop closure and optimizes the trajectory. By comparing the performance of the traditional SC-LOAM algorithm with that of the proposed TS-LOAM algorithm, the effectiveness of the two-stage loop-closure detection system in correcting unmanned vehicle trajectory offsets is verified.


Figure 11

Comparison of trajectory information. (a) SC-LOAM trajectory; (b) TS-LOAM trajectory.

4.4. Loop-Closure Detection Performance

The performance of SLAM loop-closure detection is conventionally appraised by two primary metrics: precision and recall. Precision pertains to the proportion of genuinely detected loops among all the loops identified by the system, as exhibited in Table 3. In contrast, recall denotes the likelihood of a genuine loop being accurately detected within the system. The calculation formulas are as follows:

Precision = \frac{TP}{TP + FP}

(9)

Recall = \frac{TP}{TP + FN}

(10)

Table 3

Evaluation of loop-closure detection parameters.

Algorithm Judgment \ Factual Truth Value   Loop Present     No Loop
Loop detected                              True Positive    False Positive
No loop detected                           False Negative   True Negative


The loop-closure detection experiments used the publicly available KITTI dataset to evaluate the performance of loop-closure detection and compare it with the loop-closure results of SC-LOAM. The loop-closure detection scheme based on multi-sensor fusion exhibits higher robustness and can more accurately select candidate loop-closure frames. As shown in Table 4, the algorithm improves the accuracy of loop-closure detection by 16.7% and the recall by 14.3% relative to the SC-LOAM algorithm.

Table 4

Comparison of loop-closure detection results.

Dataset    SC-LOAM Accuracy Rate   SC-LOAM Recall Rate   TS-LOAM Accuracy Rate   TS-LOAM Recall Rate
00         5/5                     5/7                   6/6                     6/7
Our data   0/1                     0/1                   1/1                     1/1
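As a quick numerical check of Equations (9) and (10), the helper below reproduces the sequence-00 TS-LOAM entry of Table 4 when it is read as 6 correctly detected loops, 0 false detections, and 1 missed loop (this reading is an assumption for illustration):

```python
def precision_recall(tp, fp, fn):
    """Eq. (9): precision = TP / (TP + FP); Eq. (10): recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# KITTI sequence 00 with TS-LOAM (Table 4): 6/6 precision, 6/7 recall
print(precision_recall(tp=6, fp=0, fn=1))   # -> (1.0, 0.8571...)
```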


5. Conclusions

This paper proposes a loop-closure detection algorithm framework based on multi-sensor adaptive tight coupling, aiming to achieve accurate and effective loop-closure detection. The proposed framework addresses the issue of mismatched visual loop frames and laser loop frames caused by the different sampling frequencies of LiDAR and cameras by utilizing data processing and interpolation techniques. Additionally, to enhance the accuracy of loop-closure detection, a two-stage loop-closure detection scheme is introduced. To validate the robustness and accuracy of the proposed framework, extensive experiments are conducted on the KITTI dataset. The results demonstrate that, compared to existing methods, the proposed TS-LCD and TS-LOAM framework significantly reduces the absolute translational error (ATE) by an average of 2.76% and the absolute rotational error (ARE) by 1.381 deg/100 m. In addition, it improves loop-closure detection efficiency by an average of 15.5%. In future work, we plan to incorporate deep neural networks for semantic segmentation of point cloud data within the existing algorithm framework and construct point cloud semantic graph descriptors using graph models.

Funding Statement

This work was supported by the International Cooperation Foundation of Jilin Province (Grant No. 20210402074GH).

Author Contributions

Methodology, F.J.; Software, F.J. and H.Y.; Validation, F.J.; Formal analysis, F.J. and X.M.; Data curation, F.J. and S.J.; Writing – original draft, W.W.; Supervision, J.K. and S.W. All authors have read and agreed to the published version of this manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. Taheri H., Xia Z.C. SLAM; definition and evolution. Eng. Appl. Artif. Intell. 2021;97:104032. doi:10.1016/j.engappai.2020.104032. [CrossRef] [Google Scholar]

2. Ok K., Liu K., Frey K., How J.P., Roy N. Robust object-based slam for high-speed autonomous navigation; Proceedings of the 2019 International Conference on Robotics and Automation (ICRA); Montreal, QC, Canada. 20–24 May 2019; pp. 669–675. [Google Scholar]

3. Yarovoi A., Cho Y.K. Review of simultaneous localization and mapping (SLAM) for construction robotics applications. Autom. Constr. 2024;162:105344. doi:10.1016/j.autcon.2024.105344. [CrossRef] [Google Scholar]

4. Chen S., Zhou B., Jiang C., Xue W., Li Q. A LiDAR/visual slam backend with loop closure detection and graph optimization. Remote Sens. 2021;13:2720. doi:10.3390/rs13142720. [CrossRef] [Google Scholar]

5. Wang C., Wu Z., Chen Y., Zhang W., Ke W., Xiong Z. Improving 3D Zebrafish Tracking with Multi-View Data Fusion and Global Association. IEEE Sens. J. 2023;23:17245–17259. doi:10.1109/JSEN.2023.3288729. [CrossRef] [Google Scholar]

6. Wang C., Wu Z., Ke W., Xiong Z. A simple transformer-based baseline for crowd tracking with Sequential Feature Aggregation and Hybrid Group Training. J. Vis. Commun. Image Represent. 2024;100:104144. doi:10.1016/j.jvcir.2024.104144. [CrossRef] [Google Scholar]

7. Wu Z., Wang C., Zhang W., Sun G., Ke W., Xiong Z. Online 3D behavioral tracking of aquatic model organism with a dual-camera system. Adv. Eng. Inform. 2024;61:102481. doi:10.1016/j.aei.2024.102481. [CrossRef] [Google Scholar]

8. Wang Y., Qiu Y., Cheng P., Duan X. Robust loop closure detection integrating visual–spatial–semantic information via topological graphs and CNN features. Remote Sens. 2020;12:3890. doi:10.3390/rs12233890. [CrossRef] [Google Scholar]

9. Wang W., Liu J., Wang C., Luo B., Zhang C. DV-LOAM: Direct visual LiDAR odometry and mapping. Remote Sens. 2021;13:3340. doi:10.3390/rs13163340. [CrossRef] [Google Scholar]

10. Mur-Artal R. Real-Time Accurate Visual SLAM with Place Recognition. Ph.D. Thesis. Universidad de Zaragoza; Zaragoza, Spain: 2017. [(accessed on 24 August 2022)]. Available online: https://zaguan.unizar.es/record/60871/files/TESIS-2017-027.pdf [Google Scholar]

11. Huang S., Dissanayake G. Convergence analysis for extended Kalman filter based SLAM; Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA 2006); Orlando, FL, USA. 15–19 May 2006; pp. 412–417. [Google Scholar]

12. Lowry S., Sünderhauf N., Newman P., Leonard J.J., Cox D., Corke P., Milford M.J. Visual place recognition: A survey. IEEE Trans. Robot. 2015;32:1–19. doi:10.1109/TRO.2015.2496823. [CrossRef] [Google Scholar]

13. Masone C., Caputo B. A survey on deep visual place recognition. IEEE Access. 2021;9:19516–19547. doi:10.1109/ACCESS.2021.3054937. [CrossRef] [Google Scholar]

14. Chen Y., Gan W., Zhang L., Liu C., Wang X. A survey on visual place recognition for mobile robots localization; Proceedings of the 2017 14th Web Information Systems and Applications Conference (WISA); Liuzhou, China. 11–12 November 2017; pp. 187–192. [Google Scholar]

15. Bosse M., Zlot R. Place recognition using keypoint voting in large 3D LiDAR datasets; Proceedings of the 2013 IEEE International Conference on Robotics and Automation; Karlsruhe, Germany. 6–10 May 2013; pp. 2677–2684. [Google Scholar]

16. Steder B., Ruhnke M., Grzonka S., Burgard W. Place recognition in 3D scans using a combination of bag of words and point feature based relative pose estimation; Proceedings of the 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems; San Francisco, CA, USA. 25–30 September 2011; pp. 1249–1255. [Google Scholar]

17. Steder B., Rusu R.B., Konolige K., Burgard W. NARF: 3D range image features for object recognition; Proceedings of the Workshop on Defining and Solving Realistic Perception Problems in Personal Robotics at the IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2020; Las Vegas, NV, USA. 24 October 2020–24 January 2021; p. 2. [Google Scholar]

18. Zaganidis A., Zerntev A., Duckett T., Cielniak G. Semantically Assisted Loop Closure in SLAM Using NDT Histograms; Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Macau, China. 3–8 November 2019; pp. 4562–4568. [Google Scholar]

19. Granström K., Schön T.B., Nieto J.I., Ramos F.T. Learning to close loops from range data. Int. J. Robot. Res. 2011;30:1728–1754. doi:10.1177/0278364911405086. [CrossRef] [Google Scholar]

20. Kim G., Kim A. Scan context: Egocentric spatial descriptor for place recognition within 3D point cloud map; Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Madrid, Spain. 1–5 October 2018; pp. 4802–4809. [Google Scholar]

21. Lin J., Zhang F. A fast, complete, point cloud based loop closure for LiDAR odometry and mapping. arXiv. 2019. arXiv:1909.11811 [Google Scholar]

22. Yang Y., Song S., Toth C. CNN-based place recognition technique for LIDAR SLAM. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020;44:117–122. doi:10.5194/isprs-archives-XLIV-M-2-2020-117-2020. [CrossRef] [Google Scholar]

23. Yin H., Wang Y., Ding X., Tang L., Huang S., Xiong R. 3D LiDAR-based global localization using siamese neural network. IEEE Trans. Intell. Transp. Syst. 2019;21:1380–1392. doi:10.1109/TITS.2019.2905046. [CrossRef] [Google Scholar]

24. Zhu Y., Ma Y., Chen L., Liu C., Ye M., Li L. GOSMatch: Graph-of-Semantics Matching for Detecting Loop Closures in 3D LiDAR data; Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Las Vegas, NV, USA. 24 October 2020–24 January 2021; pp. 5151–5157. [Google Scholar]

25. Vidanapathirana K., Moghadam P., Harwood B., Zhao M., Sridharan S., Fookes C. Locus: LiDAR-based place recognition using spatiotemporal higher-order pooling; Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA); Xi’an, China. 30 May–5 June 2021; pp. 5075–5081. [Google Scholar]

26. Chen X., Läbe T., Milioto A., Röhling T., Vysotska O., Haag A., Behley J., Stachniss C. OverlapNet: Loop closing for LiDAR-based SLAM. arXiv. 2021. arXiv:2105.11344 [Google Scholar]

27. Zhu Z., Yang S., Dai H., Li F. Loop Detection and Correction of 3D Laser-Based SLAM with Visual Information; Proceedings of the 31st International Conference on Computer Animation and Social Agents; Beijing, China. 21–23 May 2018; pp. 53–58. [Google Scholar]

28. Krispel G., Opitz M., Waltner G., Possegger H., Bischof H. Fuseseg: LiDAR point cloud segmentation fusing multi-modal data; Proceedings of the 2020 IEEE/CVF Winter Conference on Applications of Computer Vision; Snowmass, CO, USA. 1–5 March 2020; pp. 1874–1883. [Google Scholar]

29. Xie S., Pan C., Peng Y., Liu K., Ying S. Large-scale place recognition based on camera-LiDAR fused descriptor. Sensors. 2020;20:2870. doi:10.3390/s20102870. [PMC free article] [PubMed] [CrossRef] [Google Scholar]

30. Mur-Artal R., Montiel J.M.M., Tardos J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015;31:1147–1163. doi:10.1109/TRO.2015.2463671. [CrossRef] [Google Scholar]

31. Zhang X., Su Y., Zhu X. Loop closure detection for visual SLAM systems using convolutional neural network; Proceedings of the 2017 23rd International Conference on Automation and Computing (ICAC); Huddersfield, UK. 7–8 September 2017; pp. 1–6. [Google Scholar]

32. Yue H., Miao J., Yu Y., Chen W., Wen C. Robust Loop Closure Detection based on Bag of SuperPoints and Graph Verification; Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Macau, China. 3–8 November 2019; pp. 3787–3793. [Google Scholar]

33. Wang Y.T., Lin M.C., Ju R.C. Visual SLAM and moving-object detection for a small-size humanoid robot. Int. J. Adv. Robot. Syst. 2010;7:13. doi:10.5772/9700. [CrossRef] [Google Scholar]

34. Migliore D., Rigamonti R., Marzorati D., Matteucci M., Sorrenti D.G. Use a single camera for simultaneous localization and mapping with mobile object tracking in dynamic environments; Proceedings of the ICRA Workshop on Safe Navigation in Open and Dynamic Environments: Application to Autonomous Vehicles; Kobe, Japan. 12–17 May 2009; pp. 12–17. [Google Scholar]

35. Mousavian A., Košecká J., Lien J.M. Semantically guided location recognition for outdoors scenes; Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA); Seattle, WA, USA. 26–30 May 2015; pp. 4882–4889. [Google Scholar]

36. Bescos B., Fácil J.M., Civera J., Neira J. DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes. IEEE Robot. Autom. Lett. 2018;3:4076–4083. doi:10.1109/LRA.2018.2860039. [CrossRef] [Google Scholar]

37. He K., Gkioxari G., Dollár P., Girshick R. Mask R-CNN; Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV); Venice, Italy. 22–29 October 2017; pp. 2961–2969. [Google Scholar]

38. Arandjelovic R., Gronat P., Torii A., Pajdla T., Sivic J. NetVLAD: CNN architecture for weakly supervised place recognition; Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Las Vegas, NV, USA. 27–30 June 2016; pp. 5297–5307. [Google Scholar]

39. Merrill N., Huang G. CALC2.0: Combining appearance, semantic and geometric information for robust and efficient visual loop closure; Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Macau, China. 3–8 November 2019; pp. 4554–4561. [Google Scholar]

40. Naseer T., Oliveira G.L., Brox T., Burgard W. Semantics-aware visual localization under challenging perceptual conditions; Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA); Singapore. 29 May–3 June 2017; pp. 2614–2620. [Google Scholar]

41. Munoz J.P., Dexter S. Improving Place Recognition Using Dynamic Object Detection. arXiv. 2020. arXiv:2002.04698 [Google Scholar]

42. Noh H., Araujo A., Sim J., Weyand T., Han B. Large-scale image retrieval with attentive deep local features; Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV); Venice, Italy. 22–29 October 2017; pp. 3456–3465. [Google Scholar]

43. An S., Zhu H., Wei D., Tsintotas K.A., Gasteratos A. Fast and incremental loop closure detection with deep features and proximity graphs. J. Field Robot. 2022;39:473–493. doi:10.1002/rob.22060. [CrossRef] [Google Scholar]

44. Hausler S., Garg S., Xu M., Milford M., Fischer T. Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition; Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Nashville, TN, USA. 20–25 June 2021; pp. 14141–14152. [Google Scholar]

45. Jin S., Dai X., Meng Q. Loop closure detection with patch-level local features and visual saliency prediction. Eng. Appl. Artif. Intell. 2023;120:105902. doi:10.1016/j.engappai.2023.105902. [CrossRef] [Google Scholar]

46. Jin S., Chen L., Gao Y., Shen C., Sun R. Learning a deep metric: A lightweight relation network for loop closure in complex industrial scenarios. Chin. J. Electron. 2021;30:45–54. [Google Scholar]
