
Adobe's "photoshop for audio"

Status
Not open for further replies.
One day we will be able to perfectly replicate people's voices. After an actor's death, their voice will continue to be heard in full films through the power of audio photoshopping.
 
That audio sample they're using is waaaay longer than the demonstrated sentence. I'm wondering if it's just yanking the typed word from the rest of the sample that was not shown.

EDIT: Ok, "three times" was synthesized.

"Jordan" was not. It does sound sort of unnatural, and a longer sentence produced from scratch may sound very evidently fake. I would love to tinker with it.
 
That audio sample they're using is waaaay longer than the demonstrated sentence. I'm wondering if it's just yanking the typed word from the rest of the sample that was not shown.

EDIT: Ok, "three times" was synthesized.

"Jordan" was not. It does sound sort of unnatural, and a longer sentence produced from scratch may sound very evidently fake. I would love to tinker with it.

Imagine combining this with some form of AI, though, that can sort out the "unnatural" portions and make it very Turing-capable. That's some scary shit.
 
Imagine combining this with some form of AI, though, that can sort out the "unnatural" portions and make it very Turing-capable. That's some scary shit.

We already have pretty advanced synthesized voices. Just talk to Siri or something. I'm sure they create that voice in a similar fashion.

I do believe synthesized voices will be indistinguishable from a real voice very soon, if they aren't already somewhere, and attaching that to a Turing-passable AI would be pretty frightening to watch. I agree lol
 
I was at Adobe Max while they were demoing this Thursday. Hilarious and awesome all at the same time.
 
That is pretty neat. I wonder how good the security will be at telling fake audio from source audio.

Someone could essentially lay some VO (VoCo) over B-roll footage and make it believable to the public.
 
That is pretty neat. I wonder how good the security will be at telling fake audio from source audio.

Someone could essentially lay some VO (VoCo) over B-roll footage and make it believable to the public.

You can mix in an inaudible layer of audio waves that would ID it as fake instantly. But someone could also figure out the ID waves, recreate them, and use an inverted copy to cancel them out of the fake audio.

Unless it's some type of encryption-style waveform in there, I think it will be impossible to fully prevent people from creating fakes that are mistaken for real.

Even if you can't do all that, playing the fake sample through a speaker and re-recording it with a microphone/phone will jumble it up enough that it can no longer be ID'd as fake.
 
Video's very cool, but they took 20 mins of voice data beforehand. Interested to see more nonetheless.

You need something to train the algo on to construct the phonemes. It used to take hours and hours of input data to get a passable voice, like they did for Roger Ebert, so this is just the evolution of that tech.
 
Oooh, damn, that looks very cool. Obviously it still has a while to go before it's totally seamless and you can't tell, but the seeds are totally there. It's just a matter of time.
 
Even if you can't do all that, playing the fake sample through a speaker and re-recording it with a microphone/phone will jumble it up enough that it can no longer be ID'd as fake.

Depends. Digital watermarks can be designed to survive degraded copies. Cinavia, for example, is specifically designed to survive microphone recording, digital compression, downmixing, etc.
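
To make that concrete, here is a rough sketch of one textbook approach, a keyed spread-spectrum watermark. To be clear, this is not how Cinavia or anything from Adobe actually works (I have no idea what they do internally); every name, the key values, and the `degrade` stand-in for "speaker into a phone mic" are assumptions for illustration only. The idea: add a very quiet pseudo-random ±1 sequence generated from a secret key, then detect it later by correlating the audio against the same sequence.

```python
import math
import random

def keyed_sequence(key, n):
    """Pseudo-random +1/-1 sequence; only someone with the key can regenerate it."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(samples, key, strength=0.02):
    """Add the keyed sequence to the audio at a low amplitude."""
    mark = keyed_sequence(key, len(samples))
    return [s + strength * m for s, m in zip(samples, mark)]

def detect_score(samples, key):
    """Correlate audio with the keyed sequence; near 0 means no mark found."""
    mark = keyed_sequence(key, len(samples))
    return sum(s * m for s, m in zip(samples, mark)) / len(samples)

def degrade(samples, gain=0.8, noise=0.05, seed=1):
    """Crude stand-in for re-recording: volume loss plus added noise.
    Assumption: no time shift or filtering, which a real re-recording adds."""
    rng = random.Random(seed)
    return [gain * s + rng.gauss(0.0, noise) for s in samples]

# Five seconds of a 220 Hz tone at 16 kHz stands in for the voice clip.
n = 80000
voice = [0.5 * math.sin(2 * math.pi * 220 * t / 16000) for t in range(n)]

marked = embed(voice, key=1234)
rerecorded = degrade(marked)

THRESHOLD = 0.008  # about half of the expected score for a marked clip (0.8 * 0.02)
print("marked, right key:", detect_score(rerecorded, 1234) > THRESHOLD)
print("marked, wrong key:", abs(detect_score(rerecorded, 9999)) > THRESHOLD)
print("unmarked clip:    ", abs(detect_score(degrade(voice), 1234)) > THRESHOLD)
```

The noise added by `degrade` is louder than the mark itself, yet the correlation still picks the mark up because the noise averages out over thousands of samples. Without the key you also can't easily regenerate the sequence to invert it. A real system would additionally have to cope with time shifts, filtering, and compression, which this sketch ignores.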
 
This is the ultimate Dr. Sbaitso, but Adobe should keep it coming if they still want my increasingly skeptical $50.00/mo.
 
Depends. Digital watermarks can be designed to survive degraded copies. Cinavia, for example, is specifically designed to survive microphone recording, digital compression, downmixing, etc.

That's for copyrighted materials. There's a difference between a program scanning an audio clip and being able to identify it (Shazam), and creating a new audio clip with an ID wave in it and making that wave able to survive degradation.
 
I love this. Sooner or later, I can just read about 10-60 minutes' worth of written works one time in my life to set up the algorithm.

Then anytime I want an audio book for some obscure novel, I can just copy and paste an entire novel's text from an ebook into the Adobe software, save it as an audio file, and boom, a nice little .MP3 audio book of the book in my own voice with my own vocal nuances.

I could create an entire digital library of ebooks in my own voice without ever reading more than one or two books out loud to set up the algorithm.
 
Imagine the idea of using this as therapy for people with Alzheimer's or dementia who've had their spouses or loved ones pass away.

Or at least imagine the Black Mirror episode about basically this.
 
This is seriously incredible. I mean, if you could just sample a small line and then write the whole speech in the same dialect, that would be magical.
 
I'm wondering how clear the sample audio has to be as well. If you have a bunch of phone recordings, I wonder how it would sound. Probably like the person is perpetually on the phone, I'd assume.
 
We already have pretty advanced synthesized voices. Just talk to Siri or something. I'm sure they create that voice in a similar fashion.

I do believe synthesized voices will be indistinguishable from a real voice very soon, if they aren't already somewhere, and attaching that to a Turing-passable AI would be pretty frightening to watch. I agree lol

Check out WaveNet, from Google's DeepMind, which is doing crazy synthesizing shit. Way above this Adobe demo:

https://deepmind.com/blog/wavenet-generative-model-raw-audio/
 
I was just thinking about something like this after playing with Vocaloid lol!! That's probably a real generalization though.
 
I'm wondering how clear the sample audio has to be as well. If you have a bunch of phone recordings, I wonder how it would sound. Probably like the person is perpetually on the phone, I'd assume.

The creative mangling potential of this software is much more interesting to me than the assumed practical implications.
 
This is going to result in a few very public controversies before the public gets used to the concept of doctored audio.
 
Oh boy, the stuff you could do with this. Re-editing some movies/TV shows with different lines is one example I'd try.
 
As someone who works in game dialogue production, a lot of actors (in all elements of media) will probably start having contract stipulations that ban using this sort of technology for post-production editing. The last thing they want is for their voice to be manipulated in a way that could damage their image, and they would lose out on recording pickup takes.

Also, it will be inevitable that some people will feed audio books through this thing to create virtual readings of erotic fan fiction by actors related to the involved roles.
 