Fast and Accurate Background Reconstruction Using Background Bootstrapping
Abstract
1. Introduction
- We introduce a new consistency criterion for background estimation: the background estimate produced by a background estimation method should not change if the estimation is repeated using only the pixels classified as background with respect to that estimate.
- We then show that this consistency criterion can be expressed as an optimization criterion, and that the associated optimization problem can be solved efficiently using stochastic gradient descent.
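The bootstrapping idea behind this criterion can be illustrated with a minimal sketch: re-estimate the background using only pixels judged to be background under the current estimate, and iterate toward a consistent fixed point. The closed-form weighted-mean update below is a simplified stand-in for the paper's SGD optimization, not the authors' exact algorithm; the threshold value is illustrative.

```python
import numpy as np

def bootstrap_background(frames, soft_threshold=20.0, n_iters=20):
    """Illustrative sketch of background bootstrapping: pixels close to
    the current background estimate get weight ~1, pixels far from it
    (likely foreground) get weight ~0, and the background is
    re-estimated as the weighted mean over frames."""
    frames = np.asarray(frames, dtype=np.float64)   # shape (T, H, W) or (T, H, W, C)
    background = frames.mean(axis=0)                # initial estimate
    for _ in range(n_iters):
        residual = np.abs(frames - background)
        weights = np.exp(-residual / soft_threshold)  # soft background mask
        # consistency update: background re-estimated from background-like pixels only
        background = (weights * frames).sum(axis=0) / np.maximum(weights.sum(axis=0), 1e-8)
    return background
```

On a synthetic sequence with a static background and a moving occluder, the iteration converges to the occluder-free background even though the plain temporal mean does not.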
2. Related Work
3. Proposed Algorithm for Background Reconstruction
3.1. Motivation
3.2. Bootstrap Weights
3.3. Optical Flow Weights
3.4. Abnormal Image Weights
3.5. Management of Intermittent Motion
3.6. Statement of the Optimization Problem
4. Evaluation of the Proposed Model
4.1. Implementation Details
4.2. Evaluation on SBMnet Dataset
- Average Gray-level Error (AGE);
- Percentage of Error Pixels (pEPs);
- Percentage of Clustered Error Pixels (pCEPs);
- Multi-Scale Structural Similarity Index (MS-SSIM);
- Peak-Signal-to-Noise-Ratio (PSNR);
- Color image Quality Measure (CQM).
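For the grayscale metrics, the sketch below follows the standard SBMnet definitions (AGE as mean absolute gray-level difference, pEPs as the fraction of pixels whose error exceeds a threshold, pCEPs as error pixels whose 4-connected neighbors are also error pixels, and PSNR over 8-bit intensities). The benchmark's official evaluation code should be used for reported numbers; this is only a reference sketch.

```python
import numpy as np

def sbmnet_metrics(estimate, ground_truth, tau=20):
    """Grayscale background-quality metrics (sketch of the SBMnet definitions)."""
    est = np.asarray(estimate, dtype=np.float64)
    gt = np.asarray(ground_truth, dtype=np.float64)
    err = np.abs(est - gt)
    age = err.mean()                     # Average Gray-level Error
    error_pixels = err > tau
    peps = error_pixels.mean()           # Percentage of Error Pixels
    # clustered error pixel: an error pixel whose 4-connected
    # neighbours are all error pixels as well
    e = np.pad(error_pixels, 1, constant_values=False)
    clustered = (error_pixels
                 & e[:-2, 1:-1] & e[2:, 1:-1]
                 & e[1:-1, :-2] & e[1:-1, 2:])
    pceps = clustered.mean()             # Percentage of Clustered Error Pixels
    mse = ((est - gt) ** 2).mean()
    psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
    return {"AGE": age, "pEPs": peps, "pCEPs": pceps, "PSNR": psnr}
```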
4.3. Evaluation on SBI Dataset
4.4. Ablation Study
4.5. Computation Time
4.6. Image Samples
4.7. Hyperparameter Tuning
- The soft threshold used for computing the soft foreground mask should be decreased for frame sequences with very low average illumination.
- The soft threshold used for computing the optical flow masks should be decreased for video sequences with high frame rates and increased for sequences with low frame rates, since optical flow magnitudes are lower for high frame rate sequences and higher for low frame rate sequences.
- Optical flow weight: as shown in the ablation study, optical flow weights are only necessary for highly occluded scenes. More precise results may be obtained by setting this parameter to a lower value when a high level of occlusion is not expected.
- r: the value of r is tied to the expected size of the foreground objects: if the scenes are expected to contain only small foreground objects, this value may be increased on high-definition images for faster training.
- Bootstrap coefficient: a lower value leads to faster training but reduces the ability to handle occlusions; a higher value may lead to slower or unstable training and to artifacts in the final image.
- Global weight: increasing this value may help handle low-intensity illumination changes.
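These rules of thumb could be grouped into a single configuration object. The field names and default values below are hypothetical (the paper denotes these parameters by symbols), and `tune_for` is only a toy example of applying the heuristics above.

```python
from dataclasses import dataclass

@dataclass
class BackgroundConfig:
    """Hypothetical grouping of the tuning knobs discussed above;
    names and defaults are illustrative, not the paper's."""
    foreground_soft_threshold: float = 20.0  # lower for very dark sequences
    flow_soft_threshold: float = 1.0         # lower for high-frame-rate video
    flow_weight: float = 1.0                 # reduce when heavy occlusion is unlikely
    patch_radius: int = 2                    # r: raise on HD video with small objects
    bootstrap_coeff: float = 4.0             # higher handles occlusion, risks instability
    global_weight: float = 0.1               # raise for mild illumination changes

def tune_for(frame_rate_hz: float, mean_intensity: float) -> BackgroundConfig:
    """Toy application of the rules of thumb above."""
    cfg = BackgroundConfig()
    if frame_rate_hz >= 25:      # high frame rate -> smaller flow magnitudes
        cfg.flow_soft_threshold *= 0.5
    if mean_intensity < 40:      # very dark scene
        cfg.foreground_soft_threshold *= 0.5
    return cfg
```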
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Piccardi, M. Background subtraction techniques: A review. In Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics, The Hague, The Netherlands, 10–13 October 2004; Volume 4, pp. 3099–3104.
2. Liu, W.; Cai, Y.; Zhang, M.; Li, H.; Gu, H. Scene background estimation based on temporal median filter with Gaussian filtering. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 132–136.
3. Xu, Z.; Min, B.; Cheung, R.C. A robust background initialization algorithm with superpixel motion detection. Signal Process. Image Commun. 2019, 71, 1–12.
4. Laugraud, B.; Van Droogenbroeck, M. Is a memoryless motion detection truly relevant for background generation with LaBGen? In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2017; pp. 443–454.
5. Djerida, A.; Zhao, Z.; Zhao, J. Robust background generation based on an effective frames selection method and an efficient background estimation procedure (FSBE). Signal Process. Image Commun. 2019, 78, 21–31.
6. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2281.
7. Zhou, W.; Deng, Y.; Peng, B.; Liang, D.; Kaneko, S. Co-occurrence background model with superpixels for robust background initialization. arXiv 2020, arXiv:2003.12931.
8. Laugraud, B.; Piérard, S.; Van Droogenbroeck, M. LaBGen: A method based on motion detection for generating the background of a scene. Pattern Recognit. Lett. 2017, 96, 12–21.
9. Laugraud, B.; Piérard, S.; Van Droogenbroeck, M. LaBGen-P: A pixel-level stationary background generation method based on LaBGen. In Proceedings of the International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 107–113.
10. Laugraud, B.; Piérard, S.; Van Droogenbroeck, M. LaBGen-P-Semantic: A first step for leveraging semantic segmentation in background generation. J. Imaging 2018, 4, 86.
11. Yu, L.; Guo, W. A robust background initialization method based on stable image patches. In Proceedings of the 2018 Chinese Automation Congress (CAC 2018), Xi'an, China, 30 November–2 December 2018; pp. 980–984.
12. Cohen, S. Background estimation as a labeling problem. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05), Beijing, China, 17–21 October 2005; Volume 1, pp. 1034–1041.
13. Xu, X.; Huang, T.S. A loopy belief propagation approach for robust background estimation. In Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA, 23–28 June 2008.
14. Agarwala, A.; Dontcheva, M.; Agrawala, M.; Drucker, S.; Colburn, A.; Curless, B.; Salesin, D.; Cohen, M. Interactive digital photomontage. In ACM SIGGRAPH 2004 Papers; Association for Computing Machinery: New York, NY, USA, 2004; pp. 294–302.
15. Mseddi, W.S.; Jmal, M.; Attia, R. Real-time scene background initialization based on spatio-temporal neighborhood exploration. Multimed. Tools Appl. 2019, 78, 7289–7319.
16. Baltieri, D.; Vezzani, R.; Cucchiara, R. Fast background initialization with recursive Hadamard transform. In Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2010), Boston, MA, USA, 29 August–1 September 2010; pp. 165–171.
17. Colombari, A.; Fusiello, A. Patch-based background initialization in heavily cluttered video. IEEE Trans. Image Process. 2010, 19, 926–933.
18. Hsiao, H.H.; Leou, J.J. Background initialization and foreground segmentation for bootstrapping video sequences. EURASIP J. Image Video Process. 2013, 2013, 12.
19. Lin, H.H.; Liu, T.L.; Chuang, J.H. Learning a scene background model via classification. IEEE Trans. Signal Process. 2009, 57, 1641–1654.
20. Ortego, D.; SanMiguel, J.C.; Martínez, J.M. Rejection based multipath reconstruction for background estimation in video sequences with stationary objects. Comput. Vis. Image Underst. 2016, 147, 23–37.
21. Sanderson, C.; Reddy, V.; Lovell, B.C. A low-complexity algorithm for static background estimation from cluttered image sequences in surveillance contexts. EURASIP J. Image Video Process. 2011.
22. Javed, S.; Mahmood, A.; Bouwmans, T.; Jung, S.K. Background-foreground modeling based on spatiotemporal sparse subspace clustering. IEEE Trans. Image Process. 2017, 26, 5840–5854.
23. Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust principal component analysis? J. ACM 2011, 58, 1–37.
24. Kajo, I.; Kamel, N.; Ruichek, Y.; Malik, A.S. SVD-based tensor-completion technique for background initialization. IEEE Trans. Image Process. 2018, 27, 3114–3126.
25. Kajo, I.; Kamel, N.; Ruichek, Y. Self-motion-assisted tensor completion method for background initialization in complex video sequences. IEEE Trans. Image Process. 2020, 29, 1915–1928.
26. De Gregorio, M.; Giordano, M. Background modeling by weightless neural networks. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9281, pp. 493–501.
27. Maddalena, L.; Petrosino, A. A self-organizing approach to background subtraction for visual surveillance applications. IEEE Trans. Image Process. 2008, 17, 1168–1177.
28. Maddalena, L.; Petrosino, A. Extracting a background image by a multi-modal scene background model. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 143–148.
29. Maddalena, L.; Petrosino, A. The SOBS algorithm: What are the limits? In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 21–26.
30. Halfaoui, I.; Bouzaraa, F.; Urfalioglu, O. CNN-based initial background estimation. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 101–106.
31. Tao, Y.; Palasek, P.; Ling, Z.; Patras, I. Background modelling based on generative UNet. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2017), Lecce, Italy, 29 August–1 September 2017.
32. Sultana, M.; Mahmood, A.; Javed, S.; Jung, S.K. Unsupervised deep context prediction for background estimation and foreground segmentation. Mach. Vis. Appl. 2019, 30, 375–395.
33. Yang, C.; Lu, X.; Lin, Z.; Shechtman, E.; Wang, O.; Li, H. High-resolution image inpainting using multi-scale neural patch synthesis. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; pp. 4076–4084.
34. Colombari, A.; Cristani, M.; Murino, V.; Fusiello, A. Exemplar-based background model initialization. In Proceedings of the Third ACM International Workshop on Video Surveillance & Sensor Networks, Singapore, 11 November 2005.
35. Sobral, A.; Bouwmans, T.; Zahzah, E.H. Comparison of matrix completion algorithms for background initialization in videos. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9281, pp. 510–518.
36. Sobral, A.; Zahzah, E.H. Matrix and tensor completion algorithms for background model initialization: A comparative evaluation. Pattern Recognit. Lett. 2017, 96, 22–33.
37. Bouwmans, T.; Maddalena, L.; Petrosino, A. Scene background initialization: A taxonomy. Pattern Recognit. Lett. 2017, 96, 3–11.
38. Bouwmans, T.; Javed, S.; Sultana, M.; Jung, S.K. Deep neural network concepts for background subtraction: A systematic review and comparative evaluation. arXiv 2019, arXiv:1811.05255.
39. Kroeger, T.; Timofte, R.; Dai, D.; Van Gool, L. Fast optical flow using dense inverse search. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; pp. 471–488.
40. Javed, S.; Jung, S.K.; Mahmood, A.; Bouwmans, T. Motion-aware graph regularized RPCA for background modeling of complex scenes. In Proceedings of the International Conference on Pattern Recognition, Cancun, Mexico, 4–8 December 2016; pp. 120–125.
41. Jodoin, P.M.; Maddalena, L.; Petrosino, A.; Wang, Y. Extensive benchmark and survey of modeling methods for scene background initialization. IEEE Trans. Image Process. 2017, 26, 5244–5256.
42. Maddalena, L.; Petrosino, A. Towards benchmarking scene background initialization. In Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9281, pp. 469–476.
43. Kingma, D.P.; Ba, J.L. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; pp. 1–15.
Method | Average AGE ↓ | Average pEPs ↓ | Average pCEPs ↓ | Average MS-SSIM ↑ | Average PSNR ↑ | Average CQM ↑
---|---|---|---|---|---|---
BB-SGD (ours) | 5.6266 | 0.0447 | 0.0147 | 0.9478 | 30.4016 | 31.2420 |
SPMD [3] | 6.0985 | 0.0487 | 0.0154 | 0.9412 | 29.8439 | 30.6499 |
LabGen-OF [4] | 6.1897 | 0.0566 | 0.0232 | 0.9412 | 29.8957 | 30.7006 |
FSBE [5] | 6.6204 | 0.0605 | 0.0217 | 0.9373 | 29.3378 | 30.1777 |
BEWIS [26] | 6.7094 | 0.0592 | 0.0266 | 0.9282 | 28.7728 | 29.6342 |
NExBI [15] | 6.7778 | 0.0671 | 0.0227 | 0.9196 | 27.9944 | 28.8810 |
Photomontage [14] | 7.1950 | 0.0686 | 0.0257 | 0.9189 | 28.0113 | 28.8719 |
SOBS [28] | 7.5183 | 0.0711 | 0.0242 | 0.9160 | 27.6533 | 28.5601 |
Temporal Median Filter [1] | 8.2761 | 0.0984 | 0.0546 | 0.9130 | 27.5364 | 28.4434 |
Method | Basic | Interm. Motion | Clutter | Jitter | Illumin. Changes | Backgr. Motion | Very Long | Very Short
---|---|---|---|---|---|---|---|---
BB-SGD (ours) | 3.7881 | 4.8898 | 3.8776 | 9.5374 | 4.5227 | 8.5607 | 5.6494 | 4.1872 |
SPMD [3] | 3.8141 | 4.1840 | 4.5998 | 9.8095 | 4.4750 | 9.9115 | 6.0926 | 5.9017 |
LabGen-OF [4] | 3.8421 | 4.6433 | 4.1821 | 9.2410 | 8.2200 | 10.0698 | 4.2856 | 5.0338 |
FSBE [5] | 3.8960 | 5.3438 | 4.7660 | 10.3878 | 5.5089 | 10.5862 | 6.9832 | 5.4912 |
BEWIS [26] | 4.0673 | 4.7798 | 10.6714 | 9.4156 | 5.9048 | 9.6776 | 3.9652 | 5.1937 |
Photomontage [14] | 4.4856 | 7.1460 | 6.8195 | 10.1272 | 5.2668 | 12.0930 | 6.6446 | 4.9770 |
SOBS [28] | 4.3598 | 6.2583 | 7.0590 | 10.0232 | 10.3591 | 10.7280 | 6.0638 | 5.2953 |
Temporal Median Filter [1] | 3.8269 | 6.8003 | 12.5316 | 9.0892 | 12.2205 | 9.6479 | 6.9588 | 5.1336 |
Method | Average AGE ↓ | Average pEPs ↓ | Average pCEPs ↓ | Average MS-SSIM ↑ | Average PSNR ↑
---|---|---|---|---|---
BB-SGD (ours) | 2.4644 | 0.0083 | 0.0058 | 0.9896 | 37.6227 |
LabGen-OF [4] | 2.7191 | 0.0145 | 0.0106 | 0.9824 | 35.9758 |
SS-SVD [24] | 2.7479 | 0.0345 | 0.0907 | 0.9464 | 31.8116 |
LabGen [8] | 2.9945 | 0.0139 | 0.0092 | 0.9764 | 35.2028 |
NExBI [15] | 3.0547 | 0.0077 | 0.0027 | 0.9835 | 35.3078 |
BEWIS [26] | 3.8665 | 0.0242 | 0.0142 | 0.9675 | 32.0143 |
Photomontage [14] | 5.8238 | 0.0469 | 0.0372 | 0.9334 | 31.8573 |
SOBS [28] | 3.5023 | 0.0415 | 0.0222 | 0.9765 | 35.2723 |
Temporal Median Filter [1] | 10.3744 | 0.1340 | 0.1055 | 0.8533 | 28.0044 |
Category | Video | Truncated v0 | Truncated v1 | Truncated v2 | Truncated v3 | Full Model
---|---|---|---|---|---|---
background motion | advertisementBoard | 1.61 | 1.62 | 1.60 | 1.34 | 1.71
basic | 511 | 3.42 | 3.44 | 3.43 | 3.44 | 3.43
basic | Blurred | 1.80 | 1.69 | 1.68 | 1.68 | 1.61
clutter | Foliage | 32.87 | 5.86 | 3.62 | 3.41 | 3.37
clutter | Board | 21.37 | 6.78 | 7.84 | 7.37 | 7.39
clutter | People and Foliage | 31.36 | 9.66 | 3.75 | 2.54 | 2.60
clutter | boulevardJam | 21.37 | 15.89 | 19.5 | 11.0 | 2.03
illumination change | CameraParameter | 11.49 | 22.19 | 2.16 | 2.81 | 2.95
intermittent motion | busStation | 5.31 | 5.40 | 5.47 | 5.67 | 5.32
intermittent motion | Candela_m1.10 | 4.93 | 5.09 | 5.18 | 5.21 | 2.81
intermittent motion | CaVignal | 12.57 | 12.61 | 13.58 | 14.04 | 2.05
intermittent motion | AVSS2007 | 10.98 | 10.32 | 10.25 | 10.01 | 8.73
jitter | badminton | 2.62 | 2.00 | 1.93 | 1.74 | 1.84
jitter | boulevard | 9.61 | 10.09 | 10.29 | 10.51 | 9.71
very long | BusStopMorning | 3.68 | 3.66 | 3.64 | 3.62 | 3.61
very short | Toscana | 8.79 | 8.80 | 8.79 | 3.30 | 3.30
very short | DynamicBackground | 6.96 | 6.96 | 6.96 | 8.20 | 8.18
very short | CUHK_Square | 2.77 | 2.77 | 2.77 | 2.99 | 2.98
Average AGE by category | | 8.06 | 7.53 | 4.94 | 4.51 | 3.75
Number of Iterations | 100 | 250 | 500 | 1000 | 3000 |
---|---|---|---|---|---|
Learning Rate | 0.06 | 0.03 | 0.03 | 0.03 | 0.03 |
Computation time for 79 videos of the SBMnet dataset (seconds) | 337 | 391 | 482 | 666 | 1409 |
Average AGE by category on 18 videos of the SBMnet dataset listed in Table 4 | 4.07 | 3.83 | 3.80 | 3.76 | 3.75 |
Average AGE on SBI dataset | 2.78 | 2.56 | 2.53 | 2.49 | 2.46 |
Sauvalle, B.; de La Fortelle, A. Fast and Accurate Background Reconstruction Using Background Bootstrapping. J. Imaging 2022, 8, 9. https://doi.org/10.3390/jimaging8010009