1. Introduction
Lithium batteries are a relatively mature energy storage technology, yet their performance and lifespan are influenced by many factors. Managing these factors requires a comprehensive battery health management system. One feature that distinguishes such a system from a traditional battery management system (BMS) is a dimension–preload force–expansion force model. This model uses the stress and volumetric energy density indicators of individual battery cells to optimize the arrangement of cells within the battery pack, thereby enhancing its durability. A critical issue within the dimension–preload force–expansion force model is the precise control of individual cell displacement and accurate positioning during the preloading phase of a stacked assembly. In addition, during battery pack assembly, individual cells are connected using techniques such as resistance welding, laser welding, ultrasonic welding, and mechanical connections, which integrate the components of the battery pack [1]. Among these, ultrasonic metal welding, resistance spot welding, and pulse tungsten inert gas (TIG) spot welding are the three preferred methods for connecting batteries through tabs or busbars, owing to the high efficiency of spot welding in the small-scale connection of battery stack components [2]. Because the stack consists of multiple layers of battery cells separated by foam pads of varying specifications, applying preload can displace and deform the different layers of cells and foam pads to varying degrees. These slight displacements may create uneven gaps between the battery cells, which complicates the precise positioning required for subsequent tab welding. In some cases, they may even lead to poor contact between layers or localized internal stress concentration, ultimately degrading battery performance and introducing safety risks [3,4,5]. Therefore, accurately detecting and measuring the displacement of individual battery cells during assembly is crucial. Real-time monitoring and control of cell displacement makes it possible to adjust assembly parameters and processes dynamically, ensuring the stability and uniformity of the battery cells. This approach not only enhances the performance and lifespan of lithium battery packs but also reduces the overall costs and safety risks associated with energy storage systems.
Existing displacement detection methods fall into two main categories: contact and non-contact measurement [6]. Contact measurement techniques have been widely used in industrial inspection, such as structural displacement monitoring in buildings [7]. However, these methods face significant limitations. First, installing contact sensors on flexible surfaces is often challenging. Second, contact measurement requires direct physical interaction with the measured object, which can introduce additional deformation and therefore inaccuracy. As a result, contact measurement is difficult to apply effectively in the cell assembly process. Non-contact measurement methods are better suited to the conditions of cell assembly. Common non-contact approaches include laser measurement [8], ultrasonic measurement [9], and visual measurement [10]. These methods are mature and are used across many industries and scenarios. In particular, industrial laser displacement sensors are widely used for displacement measurement, offering accuracy that can reach 0.01 mm depending on the model, along with a range of measurement distances. However, a significant drawback of these instruments is their high cost: monitoring a multi-layer structure requires multiple units, each of which needs space for installation and calibration. As a result, these instruments are not well suited to the scenario explored in this study.
Battery pack assembly involves complex multi-layer deformations and various structural changes, so visual measurement is a more practical alternative to sensors that entail high costs and cumbersome installation. Visual measurement requires only a stationary image acquisition device; displacement is determined by analyzing captured images that show the relative positions of the individual cells in different layers. Moreover, visual measurement offers a considerable advantage over contact displacement sensors: each pixel in the captured image can be treated as a sensor, which facilitates high-resolution analysis. Based on this principle, this paper proposes a method for multi-layer displacement detection of individual battery cells using a standard camera, which is both cost-effective and suitable for engineering applications.
Non-contact measurement based on machine vision has been widely applied in engineering, particularly in the study of displacement in bridges and buildings. Olaszek [11] proposed an edge crossing detection algorithm to identify special markers placed on the target structure, enabling displacement measurement for various types of bridges under different conditions. The feasibility of this approach was validated through experiments and comparisons with displacement meters. White et al. [12] from New Mexico State University used digital close-range terrestrial photogrammetry (DCRTP) to perform both laboratory and real-world bridge experiments. By applying elastic beam theory, they verified the accuracy of the measurement data, and they further demonstrated the practicality of the method through bridge static load experiments. Yu et al. [13] developed fast feature tracking software for multi-objective deformation measurement, which significantly improved feature extraction speed by relaxing the requirements for scale and rotation invariance. In addition, they optimized the traditional RANSAC algorithm by pre-filtering matched point pairs and quickly discarding unreasonable parameter models, improving both the accuracy and efficiency of the SURF-BRISK-based feature tracking method. Xu et al. [14] employed a feature matching method for random five-marker detection on the external features of a pedestrian bridge, which enabled the measurement of dynamic displacement and vibration under different loading conditions. Feng et al. [15] used contour detection and feature matching techniques to measure the displacement of a cable-stayed bridge. Their method also allowed the measurement of cable vibrations and bridge deck displacements. In subsequent research, Feng [16] developed a new visual sensor system for remote displacement measurement of structures. This system integrated template matching with the orientation code matching (OCM) algorithm and the upsampled cross-correlation (UCC) algorithm to detect and track feature points on structural surfaces. The improved upsampling factor achieved better sub-pixel resolution, enhancing the system's robustness even in harsh environments.
Based on the work reviewed above, research on vision-based displacement measurement has become widespread. However, several challenges persist in practical applications. First, although many visual measurement methods are available, ranging from traditional recognition algorithms based on image feature point matching to emerging machine learning techniques, their application across different fields remains limited. Existing review articles also often lack practical implementations and quantitative analyses, which restricts their value as guidance for actual engineering production. Second, machine vision-based displacement measurement is predominantly used for vibration and displacement monitoring of large structures, such as bridges and buildings, with a strong emphasis on real-time and long-term monitoring. During the pre-tensioning of battery stacks, by contrast, what matters most is the accuracy of displacement detection at specific moments. Compared with the long-distance, macro-structural monitoring of bridges and buildings, the use of machine vision to monitor displacement changes in multi-cell configurations during pre-tensioning, a scenario that involves close-range, multi-layer structural displacement detection, has received relatively little attention. Furthermore, most industry research focuses on cell expansion displacement during the charge and discharge cycles that follow pre-tensioning [17,18], while insufficient emphasis is placed on the cell displacement that occurs during the pre-tensioning phase, before the cells are mechanically connected to other components. In automated production, accurate positioning of cells and measurement of their displacement are crucial for the overall quality of the final battery pack.
Therefore, analogous to the deformation monitoring of floor structures and bridge structures, a similar approach can be adopted for the process of multi-cell stacking pre-tensioning. To address the challenges of detecting the displacement of multi-layer cells during the assembly of lithium batteries under complex working conditions, this paper proposes a novel method for cell displacement detection based on the MicKey method, which eliminates the need for special markers. First, the region of interest (ROI) of the tab is identified using an adaptive cell boundary segmentation approach based on the HSV color space. Next, the MicKey method’s neural network keypoint matching process is employed to predict the coordinates of corresponding keypoints in the 3D camera space, reconstruct the 3D target object from 2D images, and extract and match feature point pairs. Finally, the Z-Score method is applied to filter out outlier mismatched points, enabling the calculation of accurate pixel displacements. These pixel displacements are then used to determine the precise displacement of tabs at each cell level during the assembly process. This method is integrated into a comprehensive health management system for lithium battery production, and its performance has been evaluated through a series of laboratory and field tests. The results demonstrate its effectiveness and practicality in accurately detecting multi-layer cell displacements during assembly.
The structure of this paper is as follows: Section 2 provides a detailed description of the overall process of the proposed method. It introduces the HSV image segmentation technique for extracting the target region (ROI), along with the MicKey feature matching process and the scaling factor estimation method. In Section 3, the proposed method is validated through its application in the actual stack assembly process, and its advantages over other feature matching methods are analyzed. Finally, Section 4 presents the conclusions and discusses potential directions for future research.
2. Materials and Methods
To address the challenge of conveniently measuring lithium battery stacks during the pre-tensioning process, this paper proposes a method for detecting cell displacement during lithium battery assembly based on the MicKey method. First, an HSV-based approach is applied to automatically extract the ROI of the cell tabs, which reduces computational complexity. Next, the neural network keypoint matching process of the MicKey method is employed to track the target feature points, and the Z-Score method is used to filter out outlier feature points, ensuring accurate pixel displacement of the feature points. Finally, the actual displacement is calculated by converting the pixel displacement using the scaling factor formula, enabling precise measurement of the cell tab displacement. The flowchart of the proposed method is shown in Figure 1.
It is important to note that under actual operating conditions, the battery stack experiences deformations beyond just those aligned with the direction of pre-tensioning force. For instance, the tabs (or terminals) may exhibit small displacements in the z-direction, as illustrated in Figure 2. These displacements arise partly from the compressive forces exerted by the pre-tensioning, leading to deformation of the cell casing, and partly because the cell casing is not an ideally flat surface. Additionally, the pre-tensioning direction is not entirely perpendicular to the stacking direction of the cells, which can result in lateral slipping of certain cells within the battery stack in the z-direction. However, given that the predominant deformation still occurs along the direction of the pre-tensioning force, this paper focuses primarily on the displacement of cells in the pre-tensioning direction during the battery pack assembly phase, specifically the displacements along the y-direction, as depicted in the subsequent figure.
2.1. ROI Selection Based on HSV
The assembly environment of the battery cell includes various elements beyond the battery cell itself, such as fixtures, foam cushions, presses, and other background components, as illustrated in Figure 3.
These elements can interfere with the displacement measurement of the battery cell and reduce the accuracy of the results. Power batteries are typically covered with a blue film, and the blue color serves as the contour boundary of a single battery cell. This color is clearly distinct from the background, which makes the cell easier to identify. In this paper, the HSV color space is used to separate the body of the battery cell, covered with a blue film, from its background. Compared to the RGB color space, the HSV color space is less sensitive to lighting variations, as changes in saturation and brightness do not affect the hue component [19]. Thus, the HSV color space is well suited for separating the blue outline of the battery cell from the background. HSV histograms of the blue film covering the lithium battery pack were extracted under various lighting conditions, the interval distribution of the target pixels in the histograms was analyzed, and the values of H, S, and V were determined, as shown in Equation (1). Specifically, segmentation of the target is most precise when the hue (H) is within the range [100, 180], the saturation (S) is within [180, 255], and the brightness (V) remains at its default value.
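The thresholding step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the quoted ranges follow the common 8-bit convention (H scaled to 0–179, S and V to 0–255) and interprets the "default" V as the full range. The names `rgb_to_hsv_8bit`, `is_blue_film`, and `blue_film_mask` are hypothetical.

```python
import colorsys

# Thresholds quoted in the text. The 8-bit scaling (H in 0-179, S and V in
# 0-255) and the full-range V interval are assumptions of this sketch.
H_RANGE = (100, 180)
S_RANGE = (180, 255)
V_RANGE = (0, 255)


def rgb_to_hsv_8bit(r, g, b):
    """Convert an 8-bit RGB pixel to (H, S, V) with H in 0-179, S and V in 0-255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255))


def is_blue_film(r, g, b):
    """Return True if the pixel falls inside all three threshold intervals."""
    h, s, v = rgb_to_hsv_8bit(r, g, b)
    return (H_RANGE[0] <= h <= H_RANGE[1]
            and S_RANGE[0] <= s <= S_RANGE[1]
            and V_RANGE[0] <= v <= V_RANGE[1])


def blue_film_mask(image):
    """Map rows of (r, g, b) tuples to a binary mask (1 = blue film, 0 = background)."""
    return [[1 if is_blue_film(*px) else 0 for px in row] for row in image]
```

Saturated blue pixels pass the test, while gray fixture pixels and pale, low-saturation blues are rejected, which is the behavior the saturation threshold is meant to provide.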
The contour extraction map resulting from the segmentation of the main body of the battery cell using the HSV color space is shown in Figure 4a. This map is further processed through binarization, and the contours are made continuous to a certain extent using morphological processing. The final contour extraction map of the battery stack is obtained, as shown in Figure 4b below.
At this stage, although the contour has been processed, it remains intermittent, and traditional edge detection methods, such as Canny, fail to produce satisfactory results. To address this, the method illustrated in Figure 4b is applied. The x-value range of the contour [$x_{\min}$, $x_{\max}$] is obtained by scanning each column to determine the left and right boundaries of the rectangular cell. Next, the contour of the battery cell is sampled at intervals by defining detection lines $X_n$, as shown in Equation (2). The value of i can be adjusted based on specific requirements; in this study, i is set to 5. The contour points captured along each detection line $X_n$ are recorded as sampling points. The sampled data are then classified based on the ordinate values to generate a scatter plot of the upper and lower boundaries of the battery cell. The scatter plot is grouped according to the distribution of the ordinate values, and the mean value of each group is calculated to determine the dividing lines $L_i$ between each cell level, as shown in Equation (3).
where $X_n$ is the current rectangular contour detection line, n is the number of gaps, i is a preset value used to adjust the density of the detection lines, and N is the total number of detected points.
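The column scan, interval sampling, and group averaging described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact procedure: the contour is taken as a binary grid (rows of 0/1), detection lines are placed every `step` columns starting at the left boundary, and ordinate values are grouped whenever consecutive sorted values lie within a hypothetical tolerance `gap_tol`, since the text does not specify the grouping rule.

```python
def column_range(contour):
    """Return (x_min, x_max): first and last columns containing a contour pixel."""
    cols = [x for x in range(len(contour[0])) if any(row[x] for row in contour)]
    return cols[0], cols[-1]


def sample_contour(contour, step=5):
    """Collect the y-coordinates of contour pixels on detection lines spaced `step` columns apart."""
    x_min, x_max = column_range(contour)
    ys = []
    for x in range(x_min, x_max + 1, step):
        ys.extend(y for y, row in enumerate(contour) if row[x])
    return sorted(ys)


def dividing_lines(ys, gap_tol=3):
    """Group sorted ordinates that lie within gap_tol of each other and return each group's mean."""
    groups = []
    for y in ys:
        if groups and y - groups[-1][-1] <= gap_tol:
            groups[-1].append(y)  # same boundary as the previous sample
        else:
            groups.append([y])    # a new boundary starts here
    return [sum(g) / len(g) for g in groups]
```

Averaging within each group smooths out the pixel-level jitter of an intermittent contour, so each boundary between cell layers is represented by a single ordinate.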
Using $x_{\min}$, $x_{\max}$, and $L_i$, the four boundaries of the simplified rectangular outline corresponding to each cell layer, or box, can be determined. The values of $L_{tab}$ and $L_{cell}$ are fixed for each batch of cells, allowing the position of the tab within the simplified outline to be calculated based on these two parameters. Consequently, the area of the cell tab can be identified and defined as the ROI area, as illustrated in Figure 5.
2.2. Feature Matching Process
This study employs the MicKey algorithm to establish precise 3D-3D correspondences in camera space from 2D images through descriptor matching, addressing two critical challenges in feature detection and matching. Metric Keypoints (MicKey) [20] directly predict keypoint locations in camera space, forming metric correspondences without requiring explicit depth measurements or global structural information. These correspondences enable the recovery of relative pose between two views through differentiable pose optimization, allowing end-to-end training of MicKey using only image pairs and their ground truth relative poses for supervision.
Unlike traditional methods that rely on depth measurements or structure-from-motion (SfM) reconstruction, MicKey focuses on regions with reliable features, inherently learning depth information only where it is meaningful. This weakly supervised approach eliminates the need for additional overlap information or global structure reconstruction, significantly enhancing accessibility and practicality. For new domains, MicKey requires only pose data for training, bypassing the need for extensive domain-specific preprocessing or calibration.
MicKey adopts a multi-headed network architecture with a shared encoder [21,22,23], as illustrated in Figure 6. The shared encoder uses a pre-trained DINOv2 [24] network to extract features. The image is divided into $14 \times 14$ pixel blocks, each represented by a feature vector, forming a feature map $F$ of resolution $w \times h$, where $w = W/14$ and $h = H/14$ for an input image of width $W$ and height $H$. The multi-head design facilitates parallel prediction of the following:
$U$ (2D offset): Calculates the 2D location of keypoints relative to block centers.
$C$ (Confidence): Indicates the reliability of detected keypoints.
$Z$ (Depth): Estimates depth for each keypoint.
$D$ (Descriptor vector): Encodes unique feature descriptors for matching.
This configuration directly outputs 3D Metric Keypoints, enabling descriptor-based matching between images without requiring explicit depth supervision.
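To make the step from per-block predictions to a 3D keypoint concrete, the sketch below combines a block index, a 2D offset, and a depth value using the standard pinhole back-projection. It is an illustration only, not the MicKey reference code: the offset is assumed to be expressed in units of the block size and measured from the block center, the block size of 14 pixels is the DINOv2 patch size, and the intrinsics `fx`, `fy`, `cx`, `cy` and both function names are hypothetical.

```python
PATCH = 14  # DINOv2 patch (block) size in pixels


def keypoint_pixel(row, col, offset, patch=PATCH):
    """Pixel coordinates (u, v) of a keypoint from its block index and 2D offset.

    Assumption: `offset` = (du, dv) is measured from the block center in units
    of the block size, so (0, 0) is the block center.
    """
    du, dv = offset
    u = (col + 0.5 + du) * patch
    v = (row + 0.5 + dv) * patch
    return u, v


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at the given depth into camera space."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth
```

For example, `keypoint_pixel(0, 0, (0.0, 0.0))` returns the center of the first block, and passing that pixel and a predicted depth to `backproject` yields the metric camera-space coordinates used for matching.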
Once the keypoint matching data is obtained from an image pair, the Z-Score method is used to eliminate outliers. The Z-Score of each matched point pair is calculated from its displacement value $d_j$, using the mean $\mu$ and standard deviation $\sigma$ of all displacement values, following Equation (4):
$$Z_j = \frac{d_j - \mu}{\sigma} \qquad (4)$$
Point pairs whose $|Z_j|$ exceeds a predefined threshold are excluded as outliers, ensuring robust matching. This statistical filtering enhances the reliability of correspondences, especially under varying conditions or noise.
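The Z-Score filtering described above can be sketched with the Python standard library. The threshold value is not stated in the text, so the default of 2.0 below is a placeholder assumption, and the function name `zscore_filter` is hypothetical.

```python
from statistics import mean, pstdev


def zscore_filter(displacements, threshold=2.0):
    """Keep displacement values whose absolute Z-Score does not exceed the threshold.

    The threshold default is a placeholder; the source does not state its value.
    """
    mu = mean(displacements)
    sigma = pstdev(displacements)  # population standard deviation
    if sigma == 0:
        return list(displacements)  # all values identical: nothing to reject
    return [d for d in displacements if abs((d - mu) / sigma) <= threshold]
```

One design point worth noting: with the population standard deviation, the largest possible absolute Z-Score in a sample of n values is the square root of n − 1, so a threshold of 3 can never reject anything when only ten matches are available. The threshold should therefore be chosen with the typical number of matched points in mind.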
To further validate the reliability of the proposed method, a manual evaluation was conducted on a subset of 100 randomly selected image pairs from the dataset. Three independent experts were tasked with manually annotating the keypoint correspondences and the resulting displacements. The manually annotated displacements were averaged to serve as a reference standard. Comparing the results of the MicKey algorithm with these manually annotated ground truths, the method achieved a mean absolute error of and an RMSE of , consistent with the laser displacement sensor measurements. These findings demonstrate that the MicKey algorithm aligns closely with human judgment and is robust against variations in conditions, further reinforcing its reliability.
In addition, the proposed method was validated using a proprietary dataset to measure displacements during lithium battery cell assembly. Keypoint matches obtained by MicKey were compared against reference measurements from a laser displacement sensor, regarded as ground truth. For each image pair, the pixel displacement
calculated by MicKey was converted to real-world displacement using a calibration factor in Equation (5):
where
is the actual physical dimension,
is the sensor’s pixel size, and
is the focal length.
MicKey demonstrated a mean error of
and an RMSE of
across 625 image pairs, outperforming traditional algorithms such as SURF-FLANN and ORB. Compared to traditional methods, MicKey reduced the measurement error by 52%, achieving superior accuracy and stability. Detailed results are presented in
Section 3.
The MicKey algorithm introduces a weakly supervised framework capable of achieving high precision in feature detection and matching, making it highly suitable for complex industrial applications such as multi-layer displacement detection in lithium battery cell assembly. By combining advanced feature extraction via pre-trained DINOv2, robust outlier filtering using Z-Score, and differentiable pose optimization, MicKey delivers reliable results under challenging conditions. This innovative approach addresses key limitations of existing methods, particularly the reliance on explicit depth measurements or global structure reconstruction, providing a scalable solution for industrial and research applications.
2.3. Determination of the Conversion Factor
In single-camera measurement, displacement information of the structure in both the horizontal and vertical directions can typically be obtained. Since the stack structure primarily experiences in-plane (horizontal and vertical) displacement during compression of the cell structure, this study focuses on measuring the vertical pixel displacement ($Y_i$) in the direction of the applied pressure on the cell, as given in Equation (6):
$$Y_i = \frac{1}{n}\sum_{j=1}^{n}\left(y_j^{i} - y_j^{1}\right) \qquad (6)$$
where $y_j^{i}$ and $y_j^{1}$ are the y-coordinates of the jth matched point pair in the ith and first images of the sequence, respectively, and n is the number of correctly matched points in the two images.
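As a small worked illustration of the vertical pixel displacement, the sketch below averages the y-coordinate differences over the matched point pairs. It assumes that Equation (6) is a plain mean over the n correctly matched pairs; the function name and the data layout (a list of coordinate pairs) are hypothetical.

```python
def mean_vertical_displacement(matches):
    """Average y-displacement in pixels over matched keypoint pairs.

    `matches` is a list of ((x_first, y_first), (x_i, y_i)) tuples pairing a
    keypoint in the first image with its match in the i-th image.
    """
    if not matches:
        raise ValueError("at least one matched point pair is required")
    dys = [y_i - y_first for (_, y_first), (_, y_i) in matches]
    return sum(dys) / len(dys)


# Three matched pairs that moved 4, 3, and 5 pixels along y.
pairs = [((10, 100), (10, 104)), ((20, 150), (21, 153)), ((30, 200), (30, 205))]
print(mean_vertical_displacement(pairs))
```

Horizontal coordinates are carried along but ignored, which matches the focus on displacement in the direction of the applied pressure.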
In structural displacement measurement, it is essential to establish a conversion ratio between pixel displacement and real-world coordinate displacement, as illustrated in Figure 7. The red lines represent the displacement of the object as well as the corresponding displacement on the image plane, while the blue line denotes the optical axis of the lens. After the field equipment is calibrated using a spirit level, the camera's imaging plane is aligned to be parallel to the surface of the object being measured, ensuring that the angle between the camera plane and the target surface is approximately 0°. The conversion factor (CF) is then determined by calculating the ratio of the known physical dimensions of the target surface to its corresponding pixel dimensions, as expressed in Equation (5).
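The ratio-based conversion described above can be illustrated with a short sketch. It assumes the camera plane is parallel to the target surface, as stated, so that a single scalar factor applies across the ROI; the function names and the numbers in the example are hypothetical and are not measurements from this study.

```python
def conversion_factor(known_length_mm, known_length_px):
    """Millimetres per pixel from a feature of known physical size."""
    if known_length_px <= 0:
        raise ValueError("pixel length must be positive")
    return known_length_mm / known_length_px


def to_millimetres(pixel_displacement, cf):
    """Convert a pixel displacement to a physical displacement in millimetres."""
    return pixel_displacement * cf


# Hypothetical example: a 100 mm feature spans 800 pixels in the image,
# and the measured tab displacement is 4 pixels.
cf = conversion_factor(100.0, 800.0)
print(cf, to_millimetres(4.0, cf))
```

Because the factor is a simple ratio, any tilt between the camera plane and the target surface would bias every converted displacement, which is why the alignment step precedes the calibration.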
4. Discussion and Outlook
This paper presents a displacement measurement scheme for lithium battery cells based on machine vision. The region of interest (ROI) of the cell tab is extracted using a method based on the HSV color model, enabling pixel-level displacement detection through the MicKey feature point matching process combined with a mismatching elimination approach based on the Z-Score. To determine the real-world displacement of the multi-layer cells during the assembly process, the proposed method employs a conversion factor (CF) to relate physical dimensions to pixel measurements.
To evaluate the performance of the proposed scheme, experiments were conducted to comprehensively analyze the displacement of multi-layer cells during assembly. These experiments included an accuracy test of the algorithm for a four-cell stack assembly and a stability test of multi-tab displacement detection in an eight-cell stack assembly. Based on the experimental results, the following conclusions can be drawn:
Image segmentation of the stack body using the HSV color model effectively produces a binary image of the cell outline with minimal noise, thereby avoiding the higher noise levels typically associated with image segmentation based on the RGB color model [27].
A comparison between the proposed method and laser sensor measurements indicates that the maximum absolute error is 0.08 mm, while the root mean square error (RMSE) is 0.039 mm. Compared to other existing methods, the approach proposed in this study demonstrates superior performance in feature point detection, feature matching accuracy, and the mitigation of mismatching issues. These results confirm that the proposed method satisfies the engineering requirements for multi-layer cell displacement measurement.
In summary, the displacement measurement method proposed in this paper reliably detects and quantifies the displacement of battery cells during assembly, providing valuable support for the advancement of full health monitoring systems for lithium batteries. Moreover, this method offers a novel solution for detecting the dynamic displacement of multiple battery cells during assembly.
However, future research still faces several challenges, particularly concerning the effects of fixture vibrations and variations in ambient lighting on the imaging process. In practical assembly processes, the vibrations of the fixture can interfere with displacement measurements. Consequently, future studies should focus on enhancing the system’s resilience to these interferences, potentially through the design of more stable fixtures and the optimization of imaging algorithms to minimize errors induced by vibrations. Additionally, variations in lighting conditions can significantly impact the accuracy of vision-based measurements. Future research could explore adaptive lighting adjustment technologies or conduct data training under different lighting conditions to improve the versatility of the algorithms. To ensure the reliability of measurement results, future investigations should also aim to establish standardized detection processes and methodologies to facilitate widespread adoption in various industrial environments. This includes the development of relevant testing standards and protocols to provide consistent references for the industry.