A New Loss Function for Simultaneous Object Localization and Classification
Abstract
1. Introduction
- Selectively reuse the set of the most important features from preceding layers;
- Actively update the set of preceding features to increase their utility for later layers, achieving promising performance in image classification (ImageNet) and object detection (MS COCO) in terms of both theoretical efficiency and practical speed.
2. Materials and Methods
2.1. Convolutional Neural Network
2.2. Learning Process
3. Results
Property | Value |
---|---|
Epoch | 200 |
Iteration | 5400 |
Training time | 52 min 17 sec |
Loss | 4.497 |
Classification loss | 0 |
Regression loss | 8.431 |
Validation loss | 6.461 |
Validation classification loss | 0 |
Validation regression loss | 12.922 |
4. Discussion
5. Conclusions
1. The first network always computes the loss by adding the classification task loss and the localization task loss (Equation (6)), whereas the second takes the localization task loss into account only when there is a pin in the image (Equation (8)).
2. In the first network, the localization task loss is multiplied by a weighting factor (Equation (6)), reducing the weight that this loss carries in the overall loss of the network. A code sketch of both formulations follows this list.
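For concreteness, here is a minimal sketch of the two formulations, written in PyTorch as an assumption (the authors' own implementation and framework are not reproduced in this excerpt). The half-sum-of-squared-errors regression term and the 0.1 weighting factor are inferred from the per-image results tables; cross-entropy is only a placeholder for the classification term, since Equations (6) and (8) are not shown here.

```python
import torch
import torch.nn.functional as F

def half_sse(coords_pred, coords_true):
    # Half sum of squared errors per image; this reproduces the per-image
    # "Regression" values in the results tables.
    return 0.5 * ((coords_pred - coords_true) ** 2).sum(dim=1)

def loss_first_network(class_logits, class_target, coords_pred, coords_true,
                       reg_weight=0.1):
    # First formulation (Equation (6)): classification loss plus the
    # localization loss scaled down by a constant weighting factor.
    cls = F.cross_entropy(class_logits, class_target)
    return cls + reg_weight * half_sse(coords_pred, coords_true).mean()

def loss_second_network(class_logits, class_target, coords_pred, coords_true,
                        pin_present):
    # Second formulation (Equation (8)): the localization loss contributes
    # only for images that actually contain a pin.
    cls = F.cross_entropy(class_logits, class_target)
    reg = pin_present.float() * half_sse(coords_pred, coords_true)
    return cls + reg.mean()
```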
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
Layer Name | Previous Layer | Function | Weight Filter Size/Kernels | Padding | Stride | Output Tensor Size | Learnable Parameters
---|---|---|---|---|---|---|---
in | - | - | - | - | - | 100 × 100 × 1 | -
conv1 | in | conv2d | 5 × 5 × 1/16 | same | 1 | 100 × 100 × 16 | 416
bn1 | conv1 | batchnorm | - | - | - | 100 × 100 × 16 | 32
relu1 | bn1 | ReLU | - | - | - | 100 × 100 × 16 | -
conv2 | relu1 | conv2d | 3 × 3 × 16/32 | same | 2 | 50 × 50 × 32 | 4608
bn2 | conv2 | batchnorm | - | - | - | 50 × 50 × 32 | 64
relu2 | bn2 | ReLU | - | - | - | 50 × 50 × 32 | -
conv3 | relu2 | conv2d | 3 × 3 × 32/32 | same | 1 | 50 × 50 × 32 | 9216
bn3 | conv3 | batchnorm | - | - | - | 50 × 50 × 32 | 64
relu3 | bn3 | ReLU | - | - | - | 50 × 50 × 32 | -
fc1 | relu3 | fully connected | - | - | - | 1 × 1 × 2 | 160 k
softmax | fc1 | softmax | - | - | - | 1 × 1 × 2 | -
fc2 | relu3 | fully connected | - | - | - | 1 × 1 × 2 | 160 k
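As a reading aid, the architecture in the table can be sketched as follows. PyTorch is our choice here, not necessarily the authors' framework, and explicit padding values stand in for the table's "same" padding so that the listed output sizes are reproduced.

```python
import torch
import torch.nn as nn

class PinNet(nn.Module):
    """Sketch of the network in the table: a shared convolutional trunk with
    a classification head (fc1 + softmax) and a regression head (fc2)."""

    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),   # conv1 -> 100x100x16
            nn.BatchNorm2d(16),                                     # bn1
            nn.ReLU(),                                              # relu1
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # conv2 -> 50x50x32
            nn.BatchNorm2d(32),                                     # bn2
            nn.ReLU(),                                              # relu2
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),  # conv3 -> 50x50x32
            nn.BatchNorm2d(32),                                     # bn3
            nn.ReLU(),                                              # relu3
        )
        self.fc1 = nn.Linear(50 * 50 * 32, 2)  # classification logits (no pin / pin)
        self.fc2 = nn.Linear(50 * 50 * 32, 2)  # regression output (x, y)

    def forward(self, x):
        h = self.trunk(x).flatten(start_dim=1)
        # Apply softmax to the first output to obtain class probabilities.
        return self.fc1(h), self.fc2(h)
```

The two fully connected heads share the convolutional trunk, which is precisely why a single loss must combine the classification and localization tasks.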
Parameter | Value
---|---
Learn rate | 0.001
Gradient decay factor | 0.9
Squared gradient decay factor | 0.999
Epsilon | 10⁻⁸
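These values correspond one-to-one to Adam's standard hyperparameters. A minimal configuration sketch, again assuming PyTorch:

```python
import torch

model = PinNet()  # the architecture sketch shown earlier
# Learn rate -> lr, gradient decay factors -> betas, epsilon -> eps.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             betas=(0.9, 0.999), eps=1e-8)
```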
Property | Value |
---|---|
Epoch | 200 |
Iteration | 5400 |
Training time | 54 min 37 sec |
Loss | 2.11 |
Classification loss | 0 |
Regression loss | 21.087 |
Validation loss | 1.87 |
Validation classification loss | 2 × 10⁻⁴
Validation regression loss | 18.699 |
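Note that the reported losses here equal the regression losses scaled by 0.1 (0.1 × 21.087 ≈ 2.11 for training, 0.1 × 18.699 ≈ 1.87 for validation), consistent with a localization weighting factor of 0.1 in Equation (6).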
Image Index | T1 (No Pin) | T1 (Pin) | Y1 (No Pin) | Y1 (Pin) | T2 (x) | T2 (y) | Y2 (x) | Y2 (y) | Loss | Classification | Regression
---|---|---|---|---|---|---|---|---|---|---|---
774 | 0 | 1 | 0 | 1 | 47 | 25 | 44.19 | 26.49 | 0.505 | 0 | 5.053
34 | 1 | 0 | 1 | 0 | 0 | 0 | −0.40 | −2.47 | 0.314 | 0 | 3.142
816 | 1 | 0 | 1 | 0 | 0 | 0 | 29.24 | 33.41 | 98.58 | 0 | 985.8
869 | 0 | 1 | 0 | 1 | 43 | 55 | 37.63 | 47.26 | 4.438 | 0 | 44.384
11 | 0 | 1 | 0 | 1 | 60 | 66 | 55.52 | 64.16 | 1.173 | 0 | 11.728
165 | 1 | 0 | 1 | 0 | 0 | 0 | 10.94 | 15.25 | 17.624 | 0 | 176.24
836 | 1 | 0 | 1 | 0 | 0 | 0 | 10.53 | 7.06 | 8.036 | 0 | 80.364
357 | 0 | 1 | 1 | 0 | 31 | 25 | 21.04 | 16.74 | 44.416 | 36.044 | 83.724
697 | 0 | 1 | 0 | 1 | 28 | 31 | 33.59 | 40.84 | 6.401 | 0 | 64.014
827 | 0 | 1 | 0 | 1 | 28 | 29 | 21.69 | 22.37 | 4.193 | 1.59 × 10⁻⁴ | 41.93
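Each row of this table satisfies Loss = Classification + 0.1 × Regression; for example, image 774 gives 0 + 0.1 × 5.053 ≈ 0.505, and the misclassified image 357 gives 36.044 + 0.1 × 83.724 ≈ 44.416.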
Image Index | T1 (No Pin) | T1 (Pin) | Y1 (No Pin) | Y1 (Pin) | T2 (x) | T2 (y) | Y2 (x) | Y2 (y) | Loss | Classification | Regression
---|---|---|---|---|---|---|---|---|---|---|---
774 | 0 | 1 | 0 | 1 | 47 | 25 | 41.62 | 23.61 | 15.414 | 0 | 15.414
34 | 1 | 0 | 1 | 0 | 0 | 0 | 3.35 | 1.84 | 0 | 0 | 7.295
816 | 1 | 0 | 1 | 0 | 0 | 0 | 17.07 | 21.18 | 0 | 0 | 370.07
869 | 0 | 1 | 0 | 1 | 43 | 55 | 40.28 | 45.80 | 46.019 | 0 | 46.019
11 | 0 | 1 | 0 | 1 | 60 | 66 | 59.09 | 65.53 | 0.521 | 0 | 0.521
165 | 1 | 0 | 1 | 0 | 0 | 0 | −2.39 | 3.31 | 0 | 0 | 8.315
836 | 1 | 0 | 1 | 0 | 0 | 0 | 10.77 | 4.42 | 0 | 0 | 67.727
357 | 0 | 1 | 0.991 | 0.009 | 31 | 25 | 17.13 | 13.61 | 196.74 | 35.736 | 161
697 | 0 | 1 | 0 | 1 | 28 | 31 | 25.48 | 28.55 | 6.170 | 0 | 6.170
827 | 0 | 1 | 0 | 1 | 28 | 29 | 25.91 | 29.84 | 2.533 | 0 | 2.533
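In this second table the regression term is gated by pin presence, as described in the conclusions: images without a pin (e.g., image 34) contribute Loss = 0 even though a nonzero Regression value of 7.295 is still reported for diagnostics, while image 357 (pin present but classified with low confidence) combines both terms, 35.736 + 161 ≈ 196.74.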