A Registration Method for Historical Maps Based on Self-Supervised Feature Matching
Abstract
1. Introduction
- (1) We construct a dedicated dataset for historical map registration research. In collaboration with museums and archaeological departments, we collected historical maps spanning various dynasties from Chinese cities such as Beijing, Tianjin, and Shenyang, and organized the scanned images into city-based sub-datasets for registration studies. Inspired by the processing of cellular images in medical imaging [14], we designed an improved U-Net-based segmentation model [15] to extract the main contours of the maps from the original images, effectively removing creases, stains, and other artifacts (a minimal sketch of such a model appears in Section 3.2).
- (2) We propose a new computer vision task focused on historical map registration and outline a corresponding research framework. Drawing on SAR image registration, which shares similarities with map registration, and on the particular characteristics of historical map data, we propose a novel registration architecture; a minimal sketch of its final transformation-estimation step follows this list. First, we preprocess both the reference image and the moving image to remove noise from the original data. Next, we apply our self-supervised feature extraction method with non-maximum suppression (NMS) [16] to obtain feature points and descriptors. Then, a graph attention mechanism enriches these descriptors with contextual information. Subsequently, we establish feature point correspondences between the two images using the iterative Sinkhorn algorithm. Finally, we filter and refine the matched point pairs to estimate the transformation model, completing the historical map registration task.
- (3) We propose a self-supervised feature extraction method. Historical maps are limited in quantity and difficult to annotate manually; however, the preprocessed map data essentially consist of collections of lines. Leveraging this characteristic, we generated fundamental geometric shapes (such as cubes, intersecting lines, and letters) and applied geometric transformations to produce derivative images; the original and transformed shapes served as our shape dataset (see the generation sketch in Section 3.3). We then trained a convolutional encoder–decoder as a baseline feature extraction model, applied this baseline to historical map images, and fine-tuned it to obtain the final feature extraction model.
- (4) We propose a feature update module based on a graph attention mechanism. Map images typically exhibit pronounced local correlations while spanning a wide range of scales, so learning positional information is crucial for matching accuracy. We therefore treat feature points as graph nodes: similarities among feature points within the same image form intra-graph edges, and similarities among feature points across images form inter-graph edges. A graph attention architecture then propagates positional information across the feature maps, learning both the local positional context around feature points and similarity relationships across scales, which improves subsequent feature matching.
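The sketch below illustrates the final stage of the pipeline in (2): given matched point pairs (here synthetic, with a tenth of them corrupted to play the role of wrong matches), RANSAC filters the outliers and estimates the transformation model. The projective (homography) model and the OpenCV routines are illustrative assumptions, not our released implementation.

```python
import cv2
import numpy as np

# Synthetic matched keypoints: warp reference points by a known homography,
# then corrupt every tenth pair so that RANSAC has outliers to reject.
rng = np.random.default_rng(0)
pts_ref = rng.uniform(0, 1000, size=(200, 2)).astype(np.float32)
H_true = np.array([[1.02, 0.01, 5.0],
                   [-0.01, 0.99, -3.0],
                   [0.0, 0.0, 1.0]], dtype=np.float32)
pts_mov = cv2.perspectiveTransform(pts_ref.reshape(-1, 1, 2), H_true).reshape(-1, 2)
pts_mov[::10] += rng.normal(0, 50, size=pts_mov[::10].shape).astype(np.float32)

# RANSAC discards the corrupted pairs and recovers the transformation;
# cv2.warpPerspective(moving, H_est, (w, h)) would then resample the moving map.
H_est, inliers = cv2.findHomography(pts_mov, pts_ref, cv2.RANSAC,
                                    ransacReprojThreshold=3.0)
print(f"inliers: {int(inliers.sum())}/200")
```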
2. Related Work
2.1. Traditional Methods
2.2. Learning-Based Methods
3. Materials and Methods
3.1. Historical Map Dataset
3.2. Map Preprocessing
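For reference, a minimal U-Net-style encoder–decoder of the kind adapted here for contour segmentation might look as follows. This toy two-level sketch is illustrative only; it omits the improvements of our MapSegment model and uses generic layer choices.

```python
import torch
import torch.nn as nn

def block(c_in: int, c_out: int) -> nn.Sequential:
    """Two 3x3 conv + ReLU layers, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """Two-level encoder-decoder with one skip connection; outputs a
    one-channel contour mask for a grayscale map scan."""
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = block(1, 32), block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = block(64, 32)   # 64 = 32 upsampled + 32 skipped
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))
        return torch.sigmoid(self.head(d))

mask = TinyUNet()(torch.randn(1, 1, 256, 256))  # (1, 1, 256, 256), values in [0, 1]
```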
3.3. Self-Supervised Map Feature Extraction
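The self-supervised training pairs described in contribution (3) of the Introduction can be illustrated as follows. This is a minimal sketch assuming OpenCV drawing primitives and corner-jitter homographies; it is not the exact generation code used for our shape dataset.

```python
import cv2
import numpy as np

rng = np.random.default_rng(42)

def random_shape_image(size: int = 256) -> np.ndarray:
    """Blank canvas with random line segments and one polygon outline."""
    img = np.zeros((size, size), dtype=np.uint8)
    for _ in range(6):
        p1 = tuple(int(v) for v in rng.integers(0, size, 2))
        p2 = tuple(int(v) for v in rng.integers(0, size, 2))
        cv2.line(img, p1, p2, color=255, thickness=1)
    poly = rng.integers(40, size - 40, size=(4, 2)).astype(np.int32)
    cv2.polylines(img, [poly.reshape(-1, 1, 2)], isClosed=True, color=255, thickness=1)
    return img

def random_homography(size: int = 256, jitter: float = 24.0) -> np.ndarray:
    """Random perspective warp obtained by jittering the four corners."""
    src = np.float32([[0, 0], [size, 0], [size, size], [0, size]])
    dst = (src + rng.uniform(-jitter, jitter, size=(4, 2))).astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)

img = random_shape_image()
H = random_homography()
warped = cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))
# (img, warped, H) is one training pair: any point detected in img maps to
# warped through the known H, providing supervision without manual labels.
```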
Algorithm 1: Non-Maximum Suppression
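Algorithm 1's listing is not reproduced here; the following greedy radius-based suppression is a minimal sketch in the spirit of NMS for keypoint detections [16], not a transcription of the exact procedure.

```python
import numpy as np

def nms_keypoints(xy: np.ndarray, scores: np.ndarray, radius: float) -> np.ndarray:
    """Greedy radius NMS: visit detections strongest-first and keep a
    keypoint only if no already-kept keypoint lies within `radius` pixels."""
    kept = []
    for i in np.argsort(-scores):
        if all(np.hypot(*(xy[i] - xy[j])) > radius for j in kept):
            kept.append(i)
    return np.asarray(kept)

rng = np.random.default_rng(0)
pts = rng.uniform(0, 512, size=(500, 2))   # candidate keypoint coordinates
conf = rng.uniform(size=500)               # detector confidence scores
keep = nms_keypoints(pts, conf, radius=8.0)
print(f"{len(keep)} of {len(pts)} keypoints survive suppression")
```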
3.4. Map Feature Matching Based on Graph Neural Networks
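To make the intra-/inter-graph attention of contribution (4) concrete, the following PyTorch sketch performs one round of self- and cross-attention over keypoint descriptors, in the spirit of SuperGlue (Sarlin et al.). The module name and the use of nn.MultiheadAttention are illustrative assumptions; positional encodings of the keypoint coordinates are assumed to have been added to the descriptors beforehand.

```python
import torch
import torch.nn as nn

class AttentionalUpdate(nn.Module):
    """One round of self-attention (intra-graph edges, within one image)
    followed by cross-attention (inter-graph edges, across images)."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Weights are shared across the two images, as in SuperGlue-style matchers.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, desc_a: torch.Tensor, desc_b: torch.Tensor):
        # Intra-graph: each keypoint attends to keypoints of its own image.
        a, _ = self.self_attn(desc_a, desc_a, desc_a)
        b, _ = self.self_attn(desc_b, desc_b, desc_b)
        # Inter-graph: keypoints attend to the other image's keypoints.
        a2, _ = self.cross_attn(a, b, b)
        b2, _ = self.cross_attn(b, a, a)
        return desc_a + a2, desc_b + b2   # residual update of the descriptors

# desc tensors have shape (batch, num_keypoints, dim).
layer = AttentionalUpdate()
da, db = layer(torch.randn(1, 100, 256), torch.randn(1, 120, 256))
```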
3.5. Optimal Transport
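The iterative Sinkhorn normalization that turns descriptor similarities into a soft assignment (Cuturi, 2013) can be sketched in a few lines. The dustbin row and column that SuperGlue-style matchers use for unmatched keypoints are omitted here for brevity.

```python
import torch

def sinkhorn(scores: torch.Tensor, iters: int = 50, eps: float = 0.1) -> torch.Tensor:
    """Entropic optimal-transport normalization of a score matrix:
    alternate row and column normalizations in log-space for stability."""
    log_p = scores / eps
    for _ in range(iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # rows sum to 1
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)  # cols sum to 1
    return log_p.exp()

sim = torch.randn(100, 120)    # descriptor similarity between two images
P = sinkhorn(sim)              # (approximately) doubly-normalized soft assignment
matches = P.argmax(dim=1)      # hard matches; mutual checks/thresholds follow
```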
4. Results
4.1. Metrics
- Number of matches [32]. This is the total number of keypoint matching pairs obtained in the feature matching module. A higher count means that more correspondences have been found between keypoints in the two images.
- NOCC (Number of Correct Correspondences [32]). This metric measures the number of correct correspondences between keypoints. A higher number of correct matches leads to a more accurate estimation of the geometric transformation model.
- ROCC (Ratio of Correct Correspondences [33]). This metric evaluates the proportion of accurate matches, reflecting how many outlier matches remain among the key points during registration. A higher ROCC indicates that a greater share of the matched key points are correct, so the registration is more resilient to incorrect matches and outliers. The ROCC is calculated as the number of correct correspondences divided by the total number of matches: $\mathrm{ROCC} = \mathrm{NOCC}/N_{\text{matches}}$.
- RMSE (Root Mean Squared Error [33]). RMSE gauges the reliability and precision of the registration procedure. It quantifies how accurately two images are aligned by inspecting both the forward and inverse geometric transformations: $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert p_i - g(f(p_i))\rVert^{2}}$, where $p_i$ denotes a key point in the sensed image, $N$ is the total number of key points in that image, and $f$ and $g$ are the forward and inverse transformation models, respectively. A lower RMSE indicates greater registration accuracy, reflecting fewer alignment errors (see the computation sketch after this list).
- RT (Runtime [32]). This measures the running time of each algorithm. Registration efficiency is also a key focus of our study; by comparing the runtime of the various methods across different modules, we can clearly illustrate the computational cost of each approach.
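As a worked illustration of the metrics above, the following sketch computes the round-trip RMSE with toy forward and inverse transforms and the ROCC from placeholder counts; none of the numbers are experimental values.

```python
import numpy as np

def rmse(points: np.ndarray, f, g) -> float:
    """Round-trip RMSE: map sensed keypoints forward with f, back with g,
    and measure the mean squared displacement from where they started."""
    round_trip = np.array([g(f(p)) for p in points])
    return float(np.sqrt(np.mean(np.sum((round_trip - points) ** 2, axis=1))))

# Toy transformation pair: f shifts by (2, -1); g is its exact inverse,
# so the RMSE is ~0. Real f and g come from the estimated models.
f = lambda p: p + np.array([2.0, -1.0])
g = lambda p: p - np.array([2.0, -1.0])
pts = np.random.default_rng(1).uniform(0, 100, size=(50, 2))
print(rmse(pts, f, g))

# ROCC is simply the share of matches that are correct (placeholder counts).
n_matches, n_correct = 1000, 310
rocc = n_correct / n_matches
```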
4.2. MapSegment
4.3. MapExtraction
4.4. MapMatcher
4.5. MapRegistration
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Orabi, R. Aleppo Pixelated: An Urban Reading through Digitized Historical Maps and High-Resolution Orthomosaics Case Study of al-Aqaba and al-Jallūm Quarters. Digital 2024, 4, 152–168.
- Xia, X.; Zhang, T.; Heitzler, M.; Hurni, L. Vectorizing historical maps with topological consistency: A hybrid approach using transformers and contour-based instance segmentation. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103837.
- Smith, E.S.; Fleet, C.; King, S.; Mackaness, W.; Walker, H.; Scott, C.E. Estimating the density of urban trees in 1890s Leeds and Edinburgh using object detection on historical maps. Comput. Environ. Urban Syst. 2025, 115, 102219.
- Ju, F.; Li, Y.; Zhao, J.; Dong, M. 2D/3D fast fine registration in minimally invasive pelvic surgery. Biomed. Signal Process. Control 2025, 100, 107145.
- Hui, N.; Jiang, Z.; Cai, Z.; Ying, S. Vision-HD: Road change detection and registration using images and high-definition maps. Int. J. Geogr. Inf. Sci. 2024, 38, 454–477.
- Darzi, F.; Bocklitz, T. A Review of Medical Image Registration for Different Modalities. Bioengineering 2024, 11, 786.
- Xie, Z.; Zhang, W.; Wang, L.; Zhou, J.; Li, Z. Optical and SAR Image Registration Based on the Phase Congruency Framework. Appl. Sci. 2023, 13, 5887.
- Hou, Z.; Liu, Y.; Zhang, L. POS-GIFT: A geometric and intensity-invariant feature transformation for multimodal images. Inf. Fusion 2024, 102, 102027.
- Pallotta, L.; Giunta, G.; Clemente, C. Subpixel SAR Image Registration Through Parabolic Interpolation of the 2-D Cross Correlation. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4132–4144.
- Sengupta, D.; Gupta, P.; Biswas, A. A survey on mutual information based medical image registration algorithms. Neurocomputing 2022, 486, 174–188.
- Edstedt, J.; Sun, Q.; Bökman, G.; Wadenbäck, M.; Felsberg, M. RoMa: Robust dense feature matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 19790–19800.
- Sarlin, P.E.; DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperGlue: Learning Feature Matching with Graph Neural Networks. arXiv 2020, arXiv:1911.11763.
- Liaghat, A.; Helfroush, M.S.; Norouzi, J.; Danyali, H. Airborne SAR to Optical Image Registration Based on SAR Georeferencing and Deep Learning Approach. IEEE Sens. J. 2023, 23, 26446–26458.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
- Song, Y.; Pan, Q.K.; Gao, L.; Zhang, B. Improved non-maximum suppression for object detection using harmony search algorithm. Appl. Soft Comput. 2019, 81, 105478.
- Boroujeni, S.P.H.; Razi, A. IC-GAN: An Improved Conditional Generative Adversarial Network for RGB-to-IR image translation with applications to forest fire monitoring. Expert Syst. Appl. 2024, 238, 121962.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Arandjelovic, R.; Zisserman, A. Three things everyone should know to improve object retrieval. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2911–2918.
- Morel, J.M.; Yu, G. ASIFT: A New Framework for Fully Affine Invariant Image Comparison. SIAM J. Imaging Sci. 2009, 2, 438–469.
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359.
- Alcantarilla, P.F.; Bartoli, A.; Davison, A.J. KAZE Features. In Proceedings of the Computer Vision—ECCV 2012, Florence, Italy, 7–13 October 2012; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 214–227.
- Revaud, J.; De Souza, C.; Humenberger, M.; Weinzaepfel, P. R2D2: Reliable and Repeatable Detector and Descriptor. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32.
- Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-Net: A Trainable CNN for Joint Detection and Description of Local Features. arXiv 2019, arXiv:1905.03561.
- Edstedt, J.; Bökman, G.; Wadenbäck, M.; Felsberg, M. DeDoDe: Detect, Don’t Describe—Describe, Don’t Detect for Local Feature Matching. In Proceedings of the 2024 International Conference on 3D Vision (3DV), Davos, Switzerland, 18–21 March 2024; pp. 148–157.
- Pan, X.; Luo, P.; Shi, J.; Tang, X. Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net. arXiv 2020, arXiv:1807.09441.
- Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial Transformer Networks. arXiv 2016, arXiv:1506.02025.
- Balestriero, R.; Ibrahim, M.; Sobal, V.; Morcos, A.; Shekhar, S.; Goldstein, T.; Bordes, F.; Bardes, A.; Mialon, G.; Tian, Y.; et al. A Cookbook of Self-Supervised Learning. arXiv 2023, arXiv:2304.12210.
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. arXiv 2016, arXiv:1609.05158.
- Hosang, J.; Benenson, R.; Schiele, B. Learning non-maximum suppression. arXiv 2017, arXiv:1705.02950.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762.
- Cuturi, M. Sinkhorn Distances: Lightspeed Computation of Optimal Transport. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–8 December 2013; Curran Associates, Inc.: Red Hook, NY, USA, 2013; Volume 26.
- Norouzi, J.; Helfroush, M.S.; Liaghat, A.; Danyali, H. A Deep-Based Approach for Multi-Descriptor Feature Extraction: Applications on SAR Image Registration. Expert Syst. Appl. 2024, 254, 124291.
- Ma, W.; Zhang, J.; Wu, Y.; Jiao, L.; Zhu, H.; Zhao, W. A Novel Two-Step Registration Method for Remote Sensing Images Based on Deep and Local Features. IEEE Trans. Geosci. Remote Sens. 2019, 57, 4834–4843.
| Parameter | Value |
|---|---|
| Input Image Size | 1440 × 1920 |
| Learning Rate | |
| Epoch | 50 |
| Batch Size | 4 |
| Process | Metrics | a–b | a–c | a–d | a–e | a–f | a–g | a–h |
|---|---|---|---|---|---|---|---|---|
| - | RT | 1.104 s | 1.271 s | 1.086 s | 1.080 s | 1.106 s | 1.091 s | 1.089 s |
| | Ref | 3622 | 3622 | 3622 | 3622 | 3622 | 3622 | 3622 |
| | Sen | 3186 | 4001 | 2819 | 2623 | 3626 | 2921 | 2806 |
| | Matches | 1980 | 66 | 30 | 185 | 53 | 1974 | 888 |
| MapSegment | RT | 1.079 s | 1.094 s | 1.094 s | 1.063 s | 1.092 s | 1.062 s | 1.067 s |
| | Ref | 3559 | 3559 | 3559 | 3559 | 3559 | 3559 | 3559 |
| | Sen | 2425 | 4144 | 3876 | 2959 | 4034 | 2893 | 3136 |
| | Matches | 2418 | 501 | 303 | 630 | 252 | 2400 | 1829 |
| Models | Metrics | a–b | a–c | a–d | a–e | a–f | a–g | a–h |
|---|---|---|---|---|---|---|---|---|
| RIFT2 | RT | 2.067 s | 2.730 s | 2.321 s | 2.032 s | 3.145 s | 2.659 s | 2.675 s |
| | Ref | 6099 | 6099 | 6099 | 6099 | 6099 | 6099 | 6099 |
| | Sen | 6373 | 6659 | 4999 | 4792 | 5855 | 4765 | 6127 |
| | Matches | 2059 | 986 | 1090 | 1451 | 2237 | 1530 | 1439 |
| D2Net | RT | 4.320 s | 4.651 s | 4.376 s | 5.011 s | 5.234 s | 4.012 s | 4.368 s |
| | Ref | 4316 | 4316 | 4316 | 4316 | 4316 | 4316 | 4316 |
| | Sen | 3975 | 4399 | 4361 | 3667 | 3872 | 4785 | 4800 |
| | Matches | 293 | 144 | 161 | 314 | 330 | 138 | 499 |
| SuperGlue | RT | 1.786 s | 1.329 s | 1.345 s | 1.377 s | 2.153 s | 2.356 s | 1.038 s |
| | Ref | 6223 | 6223 | 6223 | 6223 | 6223 | 6223 | 6223 |
| | Sen | 5620 | 5727 | 6971 | 6615 | 5047 | 5713 | 5759 |
| | Matches | 241 | 1796 | 133 | 1762 | 523 | 375 | 357 |
| DeDoDe | RT | 2.435 s | 3.025 s | 3.152 s | 2.125 s | 2.158 s | 2.568 s | 2.325 s |
| | Ref | 1620 | 1620 | 1620 | 1620 | 1620 | 1620 | 1620 |
| | Sen | 1271 | 2320 | 1197 | 1629 | 1950 | 2739 | 2847 |
| | Matches | 57 | 208 | 76 | 285 | 79 | 231 | 274 |
| R2D2 | RT | 0.894 s | 0.561 s | 1.357 s | 1.639 s | 0.864 s | 1.173 s | 1.269 s |
| | Ref | 1518 | 1518 | 1518 | 1518 | 1518 | 1518 | 1518 |
| | Sen | 1445 | 2658 | 1871 | 1916 | 2779 | 1141 | 2159 |
| | Matches | 91 | 169 | 50 | 114 | 157 | 138 | 145 |
| RoMa | RT | 1.939 s | 1.704 s | 1.548 s | 2.492 s | 1.875 s | 1.640 s | 1.801 s |
| | Ref | 5796 | 5796 | 5796 | 5796 | 5796 | 5796 | 5796 |
| | Sen | 6847 | 5369 | 5083 | 5341 | 6422 | 5967 | 5718 |
| | Matches | 1817 | 1394 | 1023 | 1115 | 1317 | 531 | 641 |
| MapMatcher | RT | 1.079 s | 1.094 s | 1.094 s | 1.063 s | 1.092 s | 1.062 s | 1.067 s |
| | Ref | 3559 | 3559 | 3559 | 3559 | 3559 | 3559 | 3559 |
| | Sen | 2425 | 4144 | 3876 | 2959 | 4034 | 2893 | 3136 |
| | Matches | 2418 | 501 | 303 | 630 | 252 | 2400 | 1829 |
| Extractor | Matcher | Matches | ROCC |
|---|---|---|---|
| SIFT2 | NN + ratio | 1532 | 0.103 |
| | MapMatcher | 1231 | 0.132 |
| SuperPoint | NN + mutual | 2431 | 0.079 |
| | SuperGlue | 3101 | 0.197 |
| | MapMatcher | 2873 | 0.213 |
| MapExtraction | SuperGlue | 2731 | 0.208 |
| | MapMatcher | 2313 | 0.310 |
| Models | Metrics | a–b | a–c | a–d | a–e | a–f | a–g | a–h |
|---|---|---|---|---|---|---|---|---|
| SIFT2 | RT | 2.067 s | 2.108 s | 2.183 s | 2.019 s | 2.181 s | 2.053 s | 2.047 s |
| | MI | 0.0366 | 0.0433 | 0.0364 | 0.0345 | 0.0391 | 0.0359 | 0.0337 |
| | ROCC | 0.029 | 0.160 | 0.147 | 0.114 | 0.096 | 0.036 | 0.175 |
| | RMSE | 1.369 | 0.120 | 0.951 | 2.756 | 1.376 | 1.765 | 0.753 |
| SuperGlue | RT | 5.765 s | 4.122 s | 3.199 s | 3.150 s | 3.453 s | 4.017 s | 3.240 s |
| | MI | 0.0764 | 0.0401 | 0.0339 | 0.0472 | 0.0467 | 0.0798 | 0.0614 |
| | ROCC | 0.079 | 0.325 | 0.230 | 0.112 | 0.384 | 0.197 | 0.155 |
| | RMSE | 0.030 | 0.105 | 2.151 | 3.154 | 2.025 | 0.372 | 0.236 |
| DeDoDe | RT | 3.157 s | 3.544 s | 3.828 s | 2.691 s | 2.898 s | 3.606 s | 3.070 s |
| | MI | 0.0270 | 0.0206 | 0.0273 | 0.0253 | 0.0210 | 0.0414 | 0.0372 |
| | ROCC | 0.286 | 0.426 | 0.295 | 0.294 | 0.387 | 0.207 | 0.396 |
| | RMSE | 1.126 | 1.185 | 0.850 | 0.718 | 1.110 | 1.134 | 1.837 |
| RoMa | RT | 4.411 s | 4.399 s | 2.715 s | 4.276 s | 3.639 s | 3.507 s | 4.135 s |
| | MI | 0.0286 | 0.0256 | 0.0261 | 0.0276 | 0.0270 | 0.0490 | 0.0350 |
| | ROCC | 0.089 | 0.035 | 0.078 | 0.079 | 0.060 | 0.030 | 0.088 |
| | RMSE | 0.772 | 0.174 | 1.644 | 1.911 | 0.441 | 0.137 | 0.831 |
| Ours | RT | 2.025 s | 1.892 s | 1.442 s | 1.640 s | 1.325 s | 1.530 s | 1.363 s |
| | MI | 0.1030 | 0.0503 | 0.0571 | 0.0498 | 0.0618 | 0.0912 | 0.0720 |
| | ROCC | 0.487 | 0.244 | 0.222 | 0.372 | 0.515 | 0.278 | 0.392 |
| | RMSE | 0.043 | 0.024 | 0.032 | 0.156 | 0.080 | 0.200 | 0.089 |