Mathematical Models for Audio Generation and Processing Effects in Interactive Media

Junhao Sun

doi:10.54097/1yxzgp35

Keywords:

Interactive Audio Generation, Neural Audio Processing, Real-Time Diffusion Models.

Abstract

This paper systematically reviews mathematical models for audio generation and audio effects processing in interactive media. It first analyzes the core challenges of interactive audio: real-time performance, controllability, and adaptability. It then examines three major technical approaches. Traditional methods, represented by physical modeling and procedural audio, offer computational efficiency and intuitive interactivity. Sample-based synthesis techniques, such as wavetable and granular synthesis, enable rich real-time variation while maintaining sound quality. Deep generative models and neural audio processing models deliver high generative diversity and audio fidelity, and achieve high-level semantic control through conditional generation and latent space manipulation. Current research is pushing interactive audio toward real-time, conditional, and semantically controllable generation. Further progress depends on balancing model performance against stringent real-time and low-resource constraints.
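As a small illustration of the sample-based family mentioned above, the following is a minimal sketch of a wavetable oscillator. It is not taken from the paper: the function names, the table size of 2048, the 48 kHz sample rate, and the choice of linear interpolation are all assumptions made for the example.

```python
import math

def make_sine_table(size=2048):
    """Store one cycle of a sine wave as a lookup table (size is an assumed value)."""
    return [math.sin(2.0 * math.pi * i / size) for i in range(size)]

def wavetable_osc(table, freq_hz, sample_rate, num_samples):
    """Hypothetical helper: read the table at a rate set by freq_hz.

    Linear interpolation between neighbouring entries is one common choice;
    other interpolation schemes are equally possible.
    """
    size = len(table)
    phase = 0.0
    # Number of table positions to advance per output sample.
    step = freq_hz * size / sample_rate
    out = []
    for _ in range(num_samples):
        i = int(phase)
        frac = phase - i
        a = table[i]
        b = table[(i + 1) % size]  # wrap around at the end of the cycle
        out.append(a + frac * (b - a))
        phase = (phase + step) % size
    return out

table = make_sine_table()
# 10 ms of a 440 Hz tone at an assumed 48 kHz sample rate.
samples = wavetable_osc(table, freq_hz=440.0, sample_rate=48000, num_samples=480)
```

Because pitch depends only on the read step, `freq_hz` can be changed from one call to the next without touching the stored table, which is one reason this kind of synthesis suits real-time interactive control.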
