An Analysis of Solutions to the Frame Consistency Problem in Video Style Transfer Based on GAN
DOI:
https://doi.org/10.54097/21v5a750
Keywords:
Video style transfer, Generative Adversarial Networks (GANs), temporal consistency.
Abstract
Video style transfer has emerged as a significant area of neural style transfer, with promise for diverse applications. Generative Adversarial Networks (GANs) have gained traction here because of their advantages in addressing temporal consistency. This paper surveys strategies for maintaining frame consistency in GAN-based video style transfer: Recurrent Neural Networks (RNNs), 3D convolutions, inter-frame continuity constraints in the discriminator, and temporal loss functions. It highlights key studies that pair RNNs with GAN-based frameworks to improve temporal consistency, and it evaluates approaches that enforce inter-frame continuity in the discriminator or apply temporal loss functions to minimize visual discrepancies between frames. Through this analysis, the paper offers insight into the evolving landscape of video style transfer techniques and guides researchers toward effective strategies for achieving frame consistency.
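To make the idea of a temporal (time) loss function concrete, the following is a minimal sketch and not the formulation of any particular study covered by the paper. It assumes grayscale frames stored as nested lists and assumes the previous stylized frame has already been aligned (warped) to the current one, for example with optical flow. The function name `temporal_loss`, the frame format, and the optional `mask` of per-pixel weights are all hypothetical choices made for illustration.

```python
def temporal_loss(curr, warped_prev, mask=None):
    """Mean masked squared difference between two frames.

    curr, warped_prev: 2-D lists of floats (grayscale frames, same shape).
    mask: optional 2-D list of weights in [0, 1]; 0 marks pixels to ignore
          (for example occluded regions). Defaults to all ones.
    Assumption: warped_prev is already aligned to curr; no warping is done here.
    """
    height, width = len(curr), len(curr[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            weight = 1.0 if mask is None else mask[y][x]
            diff = curr[y][x] - warped_prev[y][x]
            total += weight * diff * diff
    # Normalize by the number of pixels so the value is resolution independent.
    return total / (height * width)


# Toy 2x2 frames: one pixel flickers by 0.5 between consecutive frames.
prev_frame = [[0.0, 0.0], [1.0, 1.0]]
curr_frame = [[0.0, 0.5], [1.0, 1.0]]

flicker = temporal_loss(curr_frame, prev_frame)
# Masking out the flickering pixel (weight 0) removes its contribution.
masked = temporal_loss(curr_frame, prev_frame, mask=[[1.0, 0.0], [1.0, 1.0]])
```

In a GAN training setup, a term of this kind would typically be added to the generator's objective with a weighting factor, so that the generator is penalized when stylized frames change where the aligned input does not.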
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







