POST-TRAIN ADAPTIVE U-NET FOR IMAGE SEGMENTATION

DOI:

https://doi.org/10.32782/IT/2022-2-8

Keywords:

Adaptive convolutional neural networks, image segmentation, inference speed, mobile computing, edge computing, computer vision

Abstract

Many fields benefit from fast and accurate image segmentation; applications include medical and satellite imaging, autonomous driving, and more, and convolutional neural networks achieve the best accuracy on this task. Typical neural network architectures used for image segmentation are expected to be fully configured before training starts; changing the architecture afterwards requires additional training steps. This is quite limiting, as the network might be executed not only on a powerful server but also on a mobile or edge device. Adaptive neural networks offer a solution by allowing a degree of reconfiguration after training is complete. In this work, for the first time, we apply the Post-Train Adaptive (PTA) approach to the task of image segmentation. We introduce the U-Net+PTA neural network, which can be trained once and then adapted to different device performance categories. The two key components of the approach are PTA blocks and the PTA-sampling training strategy. The PTA blocks are added to a U-Net neural network with a MobileNetV2 backbone. Post-train configuration can be performed at runtime on any inference device, including, but not limited to, mobile devices. In addition to enabling post-train configuration, the PTA approach improves image segmentation quality (Dice score) on the CamVid dataset. The final trained model can be switched at runtime between 6 PTA configurations, which differ in inference time and quality; importantly, all of them achieve better quality than the original U-Net (no PTA) model. A possible future research direction is to widen the inference-time gap between the heaviest and lightest PTA configurations, so that a single trained PTA-based network can target even more device performance categories.
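The paper defines the actual PTA blocks and the PTA-sampling training strategy; as a rough illustration only (the class name, branch structure, and merge rule below are our assumptions, not the authors' code), a post-train switchable block can be sketched as a module whose parallel branches are enabled or disabled at inference time without retraining:

```python
# Hypothetical sketch of a post-train adaptive (PTA) block: several parallel
# branches are trained together, and at inference time a subset of them is
# enabled to trade segmentation quality for speed -- no retraining needed.

class PTABlock:
    def __init__(self, branches):
        self.branches = branches                   # list of callables (e.g. conv paths)
        self.active = list(range(len(branches)))   # all branches enabled by default

    def configure(self, active_indices):
        """Post-train configuration: choose which branches run at inference."""
        self.active = list(active_indices)

    def __call__(self, x):
        # Merge the outputs of the active branches (here a simple sum,
        # residual-style; the real merge rule is defined in the paper).
        return sum(self.branches[i](x) for i in self.active)

# A "heavy" configuration runs all branches; a "light" one runs fewer.
block = PTABlock([lambda x: 2 * x, lambda x: x + 1, lambda x: -x])
heavy = block(10)        # 20 + 11 + (-10) = 21
block.configure([0])     # switch to the lightest configuration at runtime
light = block(10)        # 20
```

In this sketch, switching between the 6 PTA configurations mentioned in the abstract would amount to calling `configure` on each PTA block with a precomputed set of branch indices for the target device category.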

References

Brostow, G. J., Fauqueur, J., & Cipolla, R. (2009). Semantic object classes in video: A high-definition ground truth database. Pattern Recognition Letters, 30(2), 88–97. https://doi.org/10.1016/j.patrec.2008.04.005

Figurnov, M., Collins, M. D., Zhu, Y., Zhang, L., Huang, J., Vetrov, D. P., & Salakhutdinov, R. (2017). Spatially adaptive computation time for residual networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017, 1790–1799. https://doi.org/10.1109/CVPR.2017.194

Graves, A. (2016). Adaptive computation time for recurrent neural networks. CoRR, abs/1603.08983. http://arxiv.org/abs/1603.08983

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016, 770–778. https://doi.org/10.1109/CVPR.2016.90

Hnatushenko, V. V., Zhernovyi, V., Udovyk, I., & Shevtsova, O. (2021). Intelligent system for building separation on a semantically segmented map. Proceedings of the 2nd International Workshop on Intelligent Information Technologies & Systems of Information Security with CEUR-WS, Khmelnytskyi, Ukraine, March 24–26, 2021, 2853, 1–11. http://ceur-ws.org/Vol-2853/keynote1.pdf

Howard, A., Pang, R., Adam, H., Le, Q. V., Sandler, M., Chen, B., Wang, W., Chen, L.-C., Tan, M., Chu, G., Vasudevan, V., & Zhu, Y. (2019). Searching for MobileNetV3. 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 – November 2, 2019, 1314–1324. https://doi.org/10.1109/ICCV.2019.00140

Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, 7132–7141. https://doi.org/10.1109/CVPR.2018.00745

Khabarlak, K. (2022a). Face detection on mobile: Five implementations and analysis. CoRR, abs/2205.05572. https://doi.org/10.48550/arXiv.2205.05572

Khabarlak, K. (2022b). Post-train adaptive MobileNet for fast anti-spoofing. CEUR Workshop Proceedings, 3156, 44–53. http://ceur-ws.org/Vol-3156/keynote5.pdf

Khabarlak, K. (2022c). Faster optimization-based meta-learning adaptation phase. Radio Electronics, Computer Science, Control, 1, 82–92. https://doi.org/10.15588/1607-3274-2022-1-10

Khabarlak, K., & Koriashkina, L. (2022). Fast facial landmark detection and applications: A survey. Journal of Computer Science and Technology, 22(1), 12–41. https://doi.org/10.24215/16666038.22.e02

Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Y. Bengio & Y. LeCun (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings. http://arxiv.org/abs/1412.6980

Lin, T.-Y., Dollár, P., Girshick, R. B., He, K., Hariharan, B., & Belongie, S. J. (2017). Feature pyramid networks for object detection. 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, 2017, 936–944. https://doi.org/10.1109/CVPR.2017.106

Milletari, F., Navab, N., & Ahmadi, S.-A. (2016). V-Net: Fully convolutional neural networks for volumetric medical image segmentation. Fourth International Conference on 3D Vision, 3DV 2016, Stanford, CA, USA, October 25–28, 2016, 565–571. https://doi.org/10.1109/3DV.2016.79

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 – 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III, 9351, 234–241. https://doi.org/10.1007/978-3-319-24574-4_28

Sandler, M., Howard, A. G., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, 2018, 4510–4520. https://doi.org/10.1109/CVPR.2018.00474

Sun, K., Xiao, B., Liu, D., & Wang, J. (2019). Deep high-resolution representation learning for human pose estimation. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, 5693–5703. https://doi.org/10.1109/CVPR.2019.00584

Published

2022-12-29