Let’s start with the principle. You might have heard about generative art or generative design. The word "generative" literally means capable of production or reproduction.
Generative AI refers to programs that can use existing content like text, audio files, or images to create new plausible content.
Generative audio refers to creating new audio from datasets of audio clips, often very large ones. For example, it can produce phrases and sentences that may never have actually been spoken.
So can all synthetic voices be considered generative audio? No. Neural text-to-speech voices are generative audio, but some voices use a different technology in which a collection of recorded fragments is stitched together on demand.
This is generally considered an older technology.
Generative audio works differently, using neural networks to learn the statistical properties of speech and audio in general, then reproducing those properties directly in any context, modeling how speech changes over time. Not just second-by-second, but millisecond-by-millisecond.
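To make the "learn the statistics, then reproduce them" idea a little more concrete, here is a toy sketch. It is not a neural network and not how any real voice product works: a simple count-based model stands in for the network, and a 440 Hz tone stands in for recorded speech. The sample rate, the number of amplitude levels, and every function name below are assumptions made up for illustration.

```python
import math
import random
from collections import Counter, defaultdict

SAMPLE_RATE = 16_000                    # assumed: samples per second
SAMPLES_PER_MS = SAMPLE_RATE // 1000    # so each millisecond holds several samples
LEVELS = 16                             # assumed: number of quantized amplitude levels


def quantize(x):
    """Map an amplitude in [-1, 1] to an integer level in [0, LEVELS - 1]."""
    return min(LEVELS - 1, int((x + 1.0) / 2.0 * LEVELS))


def training_signal(n):
    """A 440 Hz tone, quantized, standing in for a recorded audio clip."""
    return [quantize(math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)) for t in range(n)]


def fit(samples, order=2):
    """Count which level follows each context of `order` previous levels."""
    counts = defaultdict(Counter)
    for i in range(order, len(samples)):
        counts[tuple(samples[i - order:i])][samples[i]] += 1
    return counts


def generate(model, seed, n, rng):
    """Produce n new samples, one at a time, each drawn from the learned counts."""
    out = list(seed)
    order = len(seed)
    for _ in range(n):
        options = model.get(tuple(out[-order:]))
        if not options:
            # Context never seen with a successor in training: hold the last level.
            out.append(out[-1])
            continue
        levels, weights = zip(*options.items())
        out.append(rng.choices(levels, weights=weights)[0])
    return out


# "Learn" from one second of audio, then generate 10 milliseconds of new samples.
model = fit(training_signal(SAMPLE_RATE))
seed = training_signal(2)
new_audio = generate(model, seed, 10 * SAMPLES_PER_MS, random.Random(0))
print(len(new_audio), "samples, including the 2-sample seed")
```

Nothing in `generate` copies a recorded fragment. Every sample is chosen from what the model learned about which value tends to follow which, and that is the contrast with the stitched-together approach. A neural model plays the same role with a far richer notion of context.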