1. Introduction
The ability to navigate and map unknown environments in real time is a crucial capability for autonomous systems [1,2]. Visual Simultaneous Localization and Mapping (VSLAM) enables devices such as robots, autonomous vehicles, and augmented reality (AR) platforms to achieve this by utilizing visual information from cameras to simultaneously construct a map of the environment while tracking their position within it [3,4]. As VSLAM technology has progressed, it has become increasingly important for dynamic, real-time applications, where systems must overcome challenges such as moving objects, fluctuating lighting conditions, and limited computational resources. For instance, in AR applications, SLAM enables the accurate overlay of virtual objects onto physical spaces, which requires precise localization and mapping under potentially difficult lighting or environmental conditions. Similarly, autonomous vehicles rely on SLAM to generate maps on-the-fly while adjusting to changes in the environment to ensure safe navigation [5]. SLAM methods have also advanced with improvements in sensor technology, processing power, and algorithmic techniques, which allow for higher accuracy and adaptability. However, real-world environments are rarely static and often demand sophisticated, adaptive SLAM solutions capable of handling dynamic conditions and external disturbances. These challenges underscore the need for innovative methods that can enhance SLAM’s robustness, accuracy, and efficiency in dynamic scenarios. In this work, we propose to perform simultaneous localization and mapping in a VIsual Localization Domain (VILD), i.e., a domain where visually relevant features are suitably represented for SLAM. To this aim, we consider a stereo camera acquisition system as illustrated in Figure 1, and we leverage the known properties of Fisher information to detect and recognize specific image patterns. Specifically, in [6], the authors demonstrate that transforming images into a domain defined by a basis of orthogonal Circular Harmonic Function (CHF) filters with specific radial profiles enables straightforward maximum likelihood localization of 2D patterns. In this domain, the maximum likelihood estimation of visual pattern translation and rotation is achieved using a quadratic loss function. Therefore, the output of Circular Harmonic Filters can be used as a meaningful domain for signal representation. The outputs from filters of different orders highlight visually relevant features. Furthermore, they appear directly in the maximum likelihood estimation of image transformation parameters, such as scale factors, rotation, or translation. Building on this, the VILD-SLAM method adopts filtering based on two-dimensional CHFs, leveraging both magnitude and phase information to refine feature localization and reduce key errors such as mean squared error and scale drift. The VILD-SLAM process consists of two primary stages:
Computation of VILD: VILD highlights visually relevant regions. Specifically, after applying the CHF to detect high-intensity interest points corresponding to prominent structural edges in the environment, we compare the output magnitude against a threshold for feature localization. Then, we refine the output phase by selecting only the most relevant points, thus identifying the directions of visual structures.
VILD feature extraction and tracking: This stage adopts VILD to identify abrupt changes in the local structure direction and uses this information to extract keypoints for tracking and localization.
This domain allows us to improve feature matching and tracking accuracy.
The incorporation of these filtering stages improves accuracy even with lower resolution images, such as those captured at larger distances. This has led to notable improvements in trajectory accuracy by aligning the estimated SLAM trajectory more closely with GPS data. The experimental results indicate that the proposed CHF-based method effectively reduces key errors, thereby providing a more accurate trajectory estimation and improved performance in dynamic environments.
2. Related Works
Traditional SLAM methods [7] generally rely on the assumption of static environments and utilize geometric techniques for localization and mapping, possibly exploiting application-specific constraints [8] or memory-efficient data representation [9]. While effective in controlled settings, these approaches often struggle with dynamic elements commonly encountered in real-world environments, such as moving objects or sudden lighting changes, as discussed in [10,11]. To overcome these limitations, recent research has focused on more adaptive SLAM methods that integrate artificial intelligence (AI), deep learning, and advanced hardware optimizations, enhancing SLAM’s robustness and accuracy in dynamic settings [12,13]. Systems like ORB-SLAM2 [14] and DFT-VSLAM [7] utilize advanced tracking techniques and dynamic feature extraction to improve performance in dynamic environments. Additionally, deep learning-based frameworks, such as AnyFeature-VSLAM [15], adaptively manage visual features across different scenarios, maintaining high accuracy and reliability (see [16] for a comprehensive survey).
VILD-SLAM advances the literature by leveraging Circular Harmonic Filters (CHFs) to improve feature detection and tracking, offering a robust alternative to existing methods. Unlike [17], which integrates points and lines in dynamic environments, the VILD-SLAM approach focuses on CHF coefficients for noise-robust edge and orientation detection, optimizing trajectory accuracy. Similarly, while [18] introduces planar constraints for road-based SLAM, VILD-SLAM excels in extracting image transformations under varying conditions, enhancing adaptability. By refining stereo feature alignment compared to standard methods, VILD-SLAM complements and extends insights from [19] and trajectory evaluations in [20].
4. Visual Features in the CHF Domain: A Review
The literature has shown that the output of Circular Harmonic Filters can be used as a meaningful domain for signal representation. The outputs from filters of different orders highlight visually relevant features. Furthermore, they appear directly in the maximum likelihood estimation of image transformation parameters, such as scale factors, rotation, or translation. In this section, we review the theory, while in the next, we explain how to apply it to the matching and tracking of points between the right and left sequences of a stereo video sequence.
The extraction of visual features has been widely applied in image processing, because it can detect representative image features, such as edges, lines, and intersections.
This procedure provides valuable information about the structures of the output image; it highlights edges while simultaneously measuring their intensity (magnitude) and direction (orientation). Among others, Circular Harmonic Filters (CHFs), originally introduced in a previous study, exhibit theoretical properties related to how they characterize the information associated with visually relevant structures.
The mathematical formulation of CHFs and their impact on visual feature extraction and tracking are presented below.
Let us recall the definition of the CHF in a 2D continuous domain. In polar coordinates $(r, \theta)$, representing the distance from the origin and the angle with respect to the $x$-axis, respectively, the CHF of order $k$ is the complex filter defined by Formula (2), i.e., a polar-separable kernel whose angular dependence is the complex harmonic $e^{jk\theta}$. The radial functions appearing in Equation (2) are isotropic Gaussian-weighted kernels, known as Laguerre–Gauss functions, which satisfy isomorphism with the frequency space. The variable $k$ defines the angular structure of the model: for $k=0$, the CHF output is a low-pass version of the input image. As the order $k$ increases, CHFs highlight increasingly complex directional structures in the visual data, such as edges (for $k=1$), lines (for $k=2$), bifurcations (for $k=3$), and intersections (for $k=4$). In the following, we refer to the first-order CHF ($k=1$), which has a band-pass behaviour. Its frequency response, given in Formula (3), has a magnitude resulting from the product of two factors, namely a radial factor corresponding to a derivative action and a Gaussian low-pass factor, while its phase carries the orientation information.
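To make the structure of the filter concrete, the following Python sketch builds a discrete first-order CHF-like kernel as a Gaussian-weighted radial profile multiplied by the angular harmonic $e^{jk\theta}$; the radial profile, scale parameter, and kernel size are illustrative assumptions rather than the exact Laguerre–Gauss normalization adopted in the paper.

import numpy as np

def chf_kernel(order=1, size=15, sigma=2.0):
    """Discrete approximation of a CHF of the given order.

    The radial profile is a Gaussian-weighted monomial r**order (an
    illustrative stand-in for the exact Laguerre-Gauss profile); the
    angular factor is the complex harmonic exp(1j*order*theta).
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    radial = (r ** order) * np.exp(-(r ** 2) / (2.0 * sigma ** 2))
    kernel = radial * np.exp(1j * order * theta)
    return kernel / np.sqrt(np.sum(np.abs(kernel) ** 2))   # unit-energy normalization

h1 = chf_kernel(order=1)   # first-order CHF: band-pass, edge-sensitive
h0 = chf_kernel(order=0)   # zero-order CHF: low-pass behaviour

Convolving an image with h1 yields a complex output whose magnitude and phase are the quantities used in the following sections.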
The CHF filter output is meaningful in revealing the visually relevant structure of an image. A visual example of the first-order CHF ($k=1$) applied to a real frame is shown in Figure 3: the top panel reports the original frame before the application of the CHF filter, while the bottom panel reports the magnitude and the phase of the CHF output. Notably, the magnitude highlights the strength of the edges, and the phase identifies their orientation. Recently, the CHF filter has been extended to the non-Euclidean domain, over manifolds such as those underlying point-cloud data [21].
The CHF filters also play a relevant role in the maximum likelihood (ML) estimation of translation, scaling, and rotation parameters for natural images, as demonstrated in [6]. The rationale behind this is as follows. Let a reference image be observed after a scaling by a given factor, a rotation by a given angle, and a translation by a given displacement, and let the observation of this transformed image be corrupted by additive white Gaussian noise independent of the image itself. The ML estimate of the transformation parameters is obtained by maximizing the log-likelihood function of the observed image with respect to the unknown parameters. In white Gaussian noise, the ML estimate is directly obtained by minimizing the Euclidean distance between the observation and the transformed template.
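For completeness, this equivalence can be sketched as follows under the stated AWGN assumption, using generic symbols (not the paper’s notation): $g$ for the observation, $f$ for the reference image, and $\mathcal{T}_{\boldsymbol{\vartheta}}$ for the geometric transformation with parameters $\boldsymbol{\vartheta}$ (scale, rotation, translation):
\[
\hat{\boldsymbol{\vartheta}}_{\mathrm{ML}}
= \arg\max_{\boldsymbol{\vartheta}} \, \log p(g \mid \boldsymbol{\vartheta})
= \arg\max_{\boldsymbol{\vartheta}} \left( -\frac{1}{2\sigma_n^{2}} \, \big\| g - \mathcal{T}_{\boldsymbol{\vartheta}}\{f\} \big\|^{2} + \mathrm{const} \right)
= \arg\min_{\boldsymbol{\vartheta}} \big\| g - \mathcal{T}_{\boldsymbol{\vartheta}}\{f\} \big\|^{2},
\]
where $\sigma_n^{2}$ denotes the noise variance.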
The minimization of the Euclidean distance can also be carried out in a transformed orthonormal space. Let us recall the following mathematical results:
- (i)
the development of a generic function in a series of orthonormal polar basis functions [6,22], whose expansion coefficients are obtained from its Fourier transform, and
- (ii)
the development of such functions in a series of Laguerre–Gauss functions [6,23].
The application of these results in the discrete domain allows us to compute the ML cost function in a transformed space. As shown in [6], the ML estimate of the transformation parameters is found by minimizing the distance between the transformed coefficients of the observed image and those of the transformed version of the template. These coefficients can in turn be obtained at the output of CHF filters of different orders.
5. Visual Localization Domain
Expanding on the above-described properties, to build the Visual Localization Domain, we resort to a first-order approximation of the image-series representation, computing the first-order coefficients directly through the application of the first-order CHF. This corresponds to representing the image in a theoretically grounded, visually relevant domain. We leverage this domain for point matching and tracking over the stereo video sequence. We have seen that Circular Harmonic Filters (CHFs) enable a visually relevant representation of signals while simultaneously providing a domain where the maximum likelihood estimation of signal parameters can be directly achieved by minimizing a quadratic distance in the coefficient domain. Here, we focus solely on the coefficients output by the first-order filter. For the point matching and tracking problem under consideration, we can assume that one of the stereo views serves as the reference for matching, while the goal is to identify the most similar version in the other view. This search is conducted not in the original domain but in the coefficient domain of the filter output. These coefficients allow for the identification of parameters such as rotation, as well as translation and scale, in terms of minimal mean squared error. As a result, they facilitate the search for similarities under such local image transformations. Furthermore, they are inherently robust to noise due to the low-pass effect typical of the Gaussian profile at high frequencies. The procedure resulting from these considerations is described below.
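As an illustration of matching in the coefficient domain, the following Python sketch compares candidate patches from one view against a reference patch from the other view using the quadratic distance between first-order CHF outputs; the patch geometry, the search strategy, and the reuse of the chf_kernel helper sketched above are illustrative assumptions, not the paper’s actual implementation.

import numpy as np
from scipy.signal import fftconvolve

def match_in_chf_domain(ref_patch, search_strip, kernel):
    """Find the horizontal offset in `search_strip` whose first-order CHF
    coefficients are closest (in squared distance) to those of `ref_patch`.

    ref_patch:    2D float array (patch from the reference view).
    search_strip: 2D float array with the same height, wider than ref_patch.
    kernel:       complex CHF kernel, e.g., chf_kernel(order=1).
    """
    w = ref_patch.shape[1]
    ref_coeff = fftconvolve(ref_patch, kernel, mode='same')
    best_col, best_cost = 0, np.inf
    for col in range(search_strip.shape[1] - w + 1):
        cand_coeff = fftconvolve(search_strip[:, col:col + w], kernel, mode='same')
        cost = np.sum(np.abs(ref_coeff - cand_coeff) ** 2)   # quadratic cost in the coefficient domain
        if cost < best_cost:
            best_cost, best_col = cost, col
    return best_col, best_cost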
Applying the CHF filtering to the input sequences generates two complex sequences, obtained by convolving the luminance of the left and right original images with the first-order CHF impulse response. Each filtered image is therefore characterized in terms of its magnitude and phase. The filtering [6,24] returns the magnitude and phase of the filtered image, which are useful for extracting the edges of objects present in the reference scene.
The outcome is a complex image in which each edge is associated with a high magnitude value, while the phase provides useful information on the spatial directions of the visually relevant image components. In contrast, uniform areas correspond to low-intensity and pseudo-random phase values [6,24]. Therefore, it can be stated that the CHF filter emphasises the presence of edges and measures their strength and orientation.
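A minimal sketch of this step, assuming grayscale (luminance) left and right frames stored as NumPy arrays and reusing the chf_kernel helper above, could look as follows.

import numpy as np
from scipy.signal import fftconvolve

def chf_magnitude_phase(luminance, kernel):
    """Convolve a luminance image with a complex CHF kernel and return
    the magnitude (edge strength) and phase (edge orientation) maps."""
    response = fftconvolve(luminance, kernel, mode='same')
    return np.abs(response), np.angle(response)

# Hypothetical usage on a stereo pair (dataset-specific I/O not shown):
# h1 = chf_kernel(order=1)
# mag_L, phase_L = chf_magnitude_phase(left_luminance, h1)
# mag_R, phase_R = chf_magnitude_phase(right_luminance, h1)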
The next phase involves the following procedure. First, the histogram of the normalised magnitude is computed. Since the magnitude highlights selected regions (edges), the histogram is typically multimodal, with one peak representing real edges and a second peak, near zero, representing high-frequency noise components. Hence, the areas relevant for point extraction and tracking can be highlighted by suitably selecting a threshold value on the normalised magnitude: the relevant areas are obtained as the set of points whose normalised magnitude exceeds the threshold, so that the resulting map is nonzero only where the magnitude is above the threshold. We then improve the estimate of the orientation information by updating the phase based on the magnitude; in particular, we compute the stereo visual phase sequences (the VILD maps) by retaining the CHF phase only at the locations selected by the magnitude threshold.
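A compact sketch of this thresholding step is given below; the threshold value is treated here as a free parameter in [0, 1] applied to the normalised magnitude (its histogram-based selection is discussed in Section 6), and setting the discarded locations to zero is a simplifying convention of the sketch.

import numpy as np

def vild_map(magnitude, phase, gamma):
    """Build a VILD map: retain the CHF phase only where the normalized
    magnitude exceeds the threshold gamma, and zero it elsewhere."""
    norm_mag = magnitude / (magnitude.max() + 1e-12)   # normalize to [0, 1]
    mask = norm_mag > gamma
    return np.where(mask, phase, 0.0), mask

# vild_L, mask_L = vild_map(mag_L, phase_L, gamma=0.2)   # gamma value is illustrative
# vild_R, mask_R = vild_map(mag_R, phase_R, gamma=0.2)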
An example of the features extracted at the output of the CHF filter appears in Figure 4, which shows the matched features on the magnitude maps and on the phase maps (bottom) for frame 100. Although meaningful, the magnitude map conveys only edge-intensity information, while the phase map is rather noisy. These limitations are overcome by the VILD maps.
An interpretation of the role of the VILD map is provided in Figure 5, where we recognize that the map differs from zero only in correspondence with structured areas, and that the value at each coordinate pair represents the direction of the edge at the corresponding pixel of the original frame.
6. Experimental Results
The effectiveness of the VILD-SLAM algorithm was evaluated using real-world stereo camera datasets. We present the results of VSLAM in the VILD domain, i.e., with tracking performed on the left and right VILD map sequences. For comparison, we also report the results of the state-of-the-art method in [14], in the implementation available at [28], as well as the results obtained when tracking is performed on alternative feature representations.
The experiments use a stereo video sequence from the dataset in [29]. This dataset consists of 1073 stereo image pairs, captured in July under sunny conditions. The selected frames, relative to the fifth run, are used to construct and compare trajectories. Following a training phase, a robot equipped with a stereo camera autonomously traversed a 160 m route, in a natural landscape with occasional artificial structures, over repeated runs (see Figure 6). Stereo keyframes, defined as the frames containing a sufficient number of valid keypoints for mapping purposes, were recorded approximately every 0.2 m. The true trajectory follows a 3D path, while the GPS trajectory and the estimated trajectory refer to a 2D projection. It is important to note that the GPS trajectory is known to be affected by estimation errors, resulting in random fluctuations. However, since these errors are typically smaller than those affecting the V-SLAM algorithm, we will consider the 2D GPS trajectory (disregarding changes in altitude) as the ground truth for validating the V-SLAM algorithm. Validation is performed by comparing the GPS locations with the locations estimated by V-SLAM across a set of keypoints, which are assumed to be reference points.
The stereo camera configuration features a 0.24 m baseline, with images captured at 16 Hz at the resolution specified in [29]; the full stereo stream is subsequently downsampled to extract stereo keyframes at intervals of approximately 0.2 m traveled [29]. The simulation parameters are as follows: (i) the maximum horizontal displacement between corresponding keypoints was limited to 48 pixels; (ii) the image pyramid employed a scale factor of 1.1 for size reduction; (iii) the pyramid included 10 levels.
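As a rough illustration of how these parameters might be wired into a matching configuration (names and structure are purely illustrative, not the actual implementation of [28]):

from dataclasses import dataclass

@dataclass
class MatchingConfig:
    max_horizontal_disp: int = 48   # max displacement between corresponding keypoints [pixels]
    pyramid_scale: float = 1.1      # size-reduction factor between consecutive pyramid levels
    pyramid_levels: int = 10        # number of pyramid levels

def pyramid_widths(base_width, cfg):
    """Image width at each pyramid level, shrinking by cfg.pyramid_scale per level."""
    return [round(base_width / cfg.pyramid_scale ** level) for level in range(cfg.pyramid_levels)]

# Example: pyramid_widths(512, MatchingConfig()) lists the widths of the 10 levels.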
Building on the framework outlined above, we assess the accuracy of the trajectory estimation compared to GPS benchmarks using VILD-based VSLAM. Applying the CHF filter to the luminance channel of the original images yields the magnitude and phase components, which enhance edges and their directions, and from which we compute the VILD maps, where keypoints are well concentrated in visually relevant regions. In Figure 7, we show the left and right VILD maps and the keypoint matches for a sample frame.
Stemming from these calculations, the tracking is then performed in the VILD domain. To evaluate the error between the estimated and GPS trajectories, two performance metrics are used: the mean square error (MSE) and the scale drift (SD).
The MSE quantifies the average squared Euclidean distance, on the 2D plane, between the matrix collecting the $N$ reference key points $\mathbf{p}_n$ obtained from the GPS trajectory and the matrix of their estimated counterparts $\hat{\mathbf{p}}_n$, and it is computed as $\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N} \lVert \mathbf{p}_n - \hat{\mathbf{p}}_n \rVert^{2}$. The scale drift (SD), a secondary metric, measures systematic scale deviations between the estimated and GPS trajectories. It is defined as a function of the scale factor estimated through the Helmert transformation, i.e., the scale factor that best aligns the estimated trajectory to the GPS one. SD quantifies the systematic deviation of the estimated trajectory from the true GPS trajectory: negative or positive SD values indicate the need for compression or expansion, respectively, to align the estimated trajectory with the GPS path.
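A sketch of how these two metrics could be computed from matched 2D point sets is given below; the scale factor is taken here as the ratio of RMS spreads about the trajectory centroids, a simplified stand-in for the full Helmert fit, and the convention SD = s - 1 is an assumption consistent with the sign behaviour described above.

import numpy as np

def trajectory_metrics(gps_xy, est_xy):
    """MSE and scale drift between GPS reference points and their estimated
    counterparts; both inputs are arrays of shape (N, 2), already matched."""
    gps_xy = np.asarray(gps_xy, dtype=float)
    est_xy = np.asarray(est_xy, dtype=float)

    # Mean square error: average squared Euclidean distance per point pair.
    mse = np.mean(np.sum((gps_xy - est_xy) ** 2, axis=1))

    # Simplified scale estimate: ratio of RMS spreads about the centroids
    # (a proxy for the Helmert-transformation scale factor).
    g = gps_xy - gps_xy.mean(axis=0)
    e = est_xy - est_xy.mean(axis=0)
    s = np.sqrt(np.sum(g ** 2) / np.sum(e ** 2))

    sd = s - 1.0   # assumed convention: s < 1 (SD < 0) calls for compression of the estimate
    return mse, sd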
Table 2 reports the performance of VILD-SLAM based on the VILD maps. The metrics shown are the mean square error (MSE [m²]), the root mean square error (RMSE [m]), and the scale drift (SD, dimensionless). In addition to the results for the complete path, metrics are provided for the first segment (from the beginning of the path to the end of the curve) and for the second segment (from the end of the curve to the closure of the path). This segmentation enables a more precise assessment of VILD-SLAM performance. Transitory keypoints, i.e., points for which a match is found on fewer than a minimum number of consecutive frames, are discarded [14]. We report results for two different values of the magnitude threshold. For the sake of comparison, Table 3 reports the same metrics for the estimates obtained by the stereo ORB-SLAM2 algorithm in [14]. VILD-SLAM shows improved performance with respect to the literature in both conditions.
Figure 8 illustrates the performance achieved by VILD-SLAM operating on the VILD maps, showing the ground-truth GPS trajectory (green) together with the estimated and optimized trajectories (red and pink, respectively). The estimated 3D keypoint locations are also represented by colored dots [30,31]. The figure refers to the two considered values of the magnitude threshold. For the sake of comparison, the trajectories obtained by the state-of-the-art algorithm in [14] are also reported.
Implementing the CHF filter yielded both the magnitude and phase of the filtered image, followed by the magnitude thresholding and the computation of the VILD maps. The threshold can be selected by analysis of the magnitude histogram, illustrated in Figure 9. We recognize the typical bimodal structure, with small values corresponding to noisy components in flat image areas and large values corresponding to sparse image structures.
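One simple way to pick such a threshold automatically, given the bimodal behaviour just described, is an Otsu-style rule that maximizes the between-class variance of the magnitude histogram; this is offered as a plausible stand-in, not necessarily the selection rule used in the experiments.

import numpy as np

def threshold_from_histogram(norm_mag, bins=256):
    """Otsu-style threshold on the normalized magnitude: pick the bin edge
    that best separates the near-zero noise mode from the edge mode."""
    hist, edges = np.histogram(norm_mag.ravel(), bins=bins, range=(0.0, 1.0))
    prob = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])

    best_t, best_var = 0.0, -1.0
    for i in range(1, bins):
        w0, w1 = prob[:i].sum(), prob[i:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (prob[:i] * centers[:i]).sum() / w0
        mu1 = (prob[i:] * centers[i:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if between > best_var:
            best_var, best_t = between, edges[i]
    return best_t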
A further analysis is conducted by degrading the stereo video sequences using a moving average filter defined over a circular support of a given radius. This condition is taken as a proxy of a reduced spatial resolution, such as that encountered when the video sequences are acquired at a larger distance from the scene, or in harsh acquisition conditions, e.g., rain. The accuracy of VILD-SLAM is evaluated in terms of the MSE and SD performance metrics.
Figure 10 presents the MSE and SD as a function of the low-pass filter radius. The blue bars represent the MSE, while the orange bars correspond to the absolute value of the scale drift (|SD|). The algorithm in [14] could not complete the analysis, even for different parameter settings, due to the increased difficulty of valid keypoint identification in the presence of image blur. For the sake of comparison, we report the MSE and |SD| values obtained by the algorithms operating on the original, unblurred sequence, indicated by the horizontal lines. Specifically, we report the results obtained by using the ORB-SLAM2 method with BRISK features and with SIFT features [32,33]. This figure demonstrates that the adoption of VILD for tracking also enables VSLAM on blurred images, maintaining meaningful trajectory estimations at different radii.
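The degradation used in this test can be reproduced, under the stated interpretation of a uniform average over a circular (pillbox) support, with a sketch such as:

import numpy as np
from scipy.signal import fftconvolve

def circular_moving_average(image, radius):
    """Blur an image with a uniform kernel supported on a disc of the given
    radius, used here as a proxy for reduced spatial resolution."""
    r = int(np.ceil(radius))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    disc = (x ** 2 + y ** 2 <= radius ** 2).astype(float)
    disc /= disc.sum()
    return fftconvolve(image, disc, mode='same')

# blurred_left = circular_moving_average(left_luminance, radius=3)   # radius in pixels, illustrative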
We now assess the performance of VILD-SLAM in noisy conditions, specifically when the images are acquired under additive white Gaussian noise.
Figure 11 shows the mean squared error (MSE, left axis) and the scale drift (SD, right axis) as a function of the SNR (dB). To better frame the performance of VILD-SLAM, we also report the results obtained by using the ORB-SLAM2 method with BRISK features and with SIFT features [32,33]. We recognize that VILD-SLAM outperforms the state-of-the-art competitors. Altogether, the results show the improvement brought by VILD in terms of accuracy and resilience.
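For reference, white Gaussian noise at a prescribed SNR can be simulated with a short helper like the one below; defining the SNR with respect to the mean image power is an assumption on the exact convention used in the experiments.

import numpy as np

def add_awgn(image, snr_db, rng=None):
    """Corrupt an image with white Gaussian noise at the requested SNR,
    with SNR(dB) = 10*log10(signal_power / noise_power)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(np.asarray(image, dtype=float) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=image.shape)
    return image + noise

# noisy_left = add_awgn(left_luminance, snr_db=20)   # SNR value is illustrative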
An interpretation of these results is given in Figure 12, where the first row shows an original image, the magnitude of its CHF output, and the thresholded phase, while the second row highlights some details of the captured image as they appear in the original domain and in the VILD domain. We recognize that the VILD domain retains the contribution of structured areas only, implicitly performing a kind of background subtraction. Therefore, using the VILD-SLAM approach, variations occurring within non-structured areas are inherently rejected, making the approach noise resilient. This behavior is also beneficial in dynamic environments, where a number of non-stationary features, e.g., illumination, change throughout the VSLAM process.
A few remarks are in order. VILD-SLAM is dynamically aware, in the sense that it represents the scene with noise rejected; still, it does not explicitly account for moving objects, and this is left for further study. To sum up, VILD-SLAM shows the potential to improve real-time state-of-the-art solutions. Its ability to reject noise components suggests its integration within a deep learning-based system; this relevant point is beyond the scope of this paper and is left for future studies.