Support NeoGAF

Cute Dragon Here · Nov 7, 2016

I... uh... well, this is very interesting, and scary. Editing audio as if it were text...

Got to say, whoever the programmers are at Adobe, sometimes they're frighteningly good...

Onaco · Nov 7, 2016

I'd like to toy around with this one...

PorygonVandal · Nov 7, 2016

This is amazing and creepy at the same time....

plainr_ · Nov 7, 2016

This is some next level shit.

CarpeDeezNutz · Nov 7, 2016

Whoa, that's pretty cool.

Funyarinpa · Nov 7, 2016

I for one welcome our-our-our-our new Adobe overlords

BadHand · Nov 7, 2016

"It wasn't me, I was hacked!"

Rad- · Nov 7, 2016

Damn just imagine prank calls with this.

DiipuSurotu · Nov 7, 2016

One day we will be able to perfectly replicate people's voices. After an actor's death, their voice will continue to be heard in full films through the power of audio photoshopping.

LumpOfCole · Nov 7, 2016

Wasn't this exact thing a major plot device in one of the seasons of 24?

Ray Wonder · Nov 7, 2016

That audio sample they're using is waaaay longer than the demonstrated sentence. I'm wondering if it's just yanking the typed word from the rest of the sample that was not shown.

EDIT: Ok, "three times" was synthesized.

"Jordan" was not. It does sound sort of unnatural and producing a longer sentence from scratch may be very evidently fake. I would love to tinker with it.

Crystalkoen · Nov 7, 2016

Ray Wonder said:
That audio sample they're using is waaaay longer than the demonstrated sentence. I'm wondering if it's just yanking the typed word from the rest of the sample that was not shown.

EDIT: Ok, "three times" was synthesized.

"Jordan" was not. It does sound sort of unnatural and producing a longer sentence from scratch may be very evidently fake. I would love to tinker with it.

Imagine combining this with some form of AI, though, that can sort out the "unnatural" portions and make it very Turing-capable. That's some scary shit.

the better twin · Nov 7, 2016

Video's very cool, but they took 20 mins of voice data beforehand. Interested to see more nonetheless.

Master Yoshi · Nov 7, 2016

The original audio said "Grab em by the hand!" Believe me! That tape was rigged by the cyber terrorists!

Ray Wonder · Nov 7, 2016

Crystalkoen said:
Imagine combining this with some form of AI, though, that can sort out the "unnatural" portions and make it very Turing-capable. That's some scary shit.

We already have pretty advanced synthesized voices. Just talk to Siri or something. I'm sure they create that voice in a similar fashion.

I do believe synthesized voice will be indistinguishable from a real voice very soon, if it's not already somewhere, and attaching that to a Turing-passable AI would be pretty frightening to watch. I agree lol

Pagusas · Nov 7, 2016

I was at Adobe Max while they were demoing this Thursday. Hilarious and awesome all at the same time.

Samurai G0SU · Nov 7, 2016

that is pretty neat. I wonder how good the security will be in order to reveal Fake vs Source audio.

someone could essentially have some VO (voco) over b footage material and could make it believable to the public.

Trojan X · Nov 7, 2016

Wow. This is big! Love to hear it string off long sentences. Crossing fingers the sound won't become robotic.

Ray Wonder · Nov 7, 2016

Samurai G0SU said:
that is pretty neat. I wonder how good the security will be in order to reveal Fake vs Source audio.

someone could essentially have some VO (voco) over b footage material and could make it believable to the public.

You can layer in an inaudible layer of audio waves that could ID it as fake instantly. But you could also figure out the ID waves, recreate them, and use them to invert the ID waves in the fake audio.

Unless it's some type of encryption style wave form in there, I think it will be impossible to fully prevent people from creating fakes that are mistaken as real.

Even if you can't do all that, playing the fake sample through a speaker, and re-recording it with a microphone/phone will jumble it up enough to where it will not be able to be ID'd as fake.

sixteen-bit · Nov 7, 2016

wonder what this would mean for comedians who live off impersonations

FyreWulff · Nov 7, 2016

the better twin said:
Video's very cool, but they took 20 mins of voice data beforehand. Interested to see more nonetheless.

You need something to train the algo on to construct the phonemes. It used to take hours and hours of input data to get a passable voice, like they did for Roger Ebert, so this is just the evolution of that tech

GurgleBot20 · Nov 7, 2016

So glad the election is done on Wednesday.

kai3345 · Nov 7, 2016

damn this shit is wild

Viewt · Nov 7, 2016

Oooh, damn, that looks very cool. Obviously it still has a while to go before it's totally seamless and you can't tell, but the seeds are totally there. It's just a matter of time.

NekoFever · Nov 7, 2016

Ray Wonder said:
Even if you can't do all that, playing the fake sample through a speaker, and re-recording it with a microphone/phone will jumble it up enough to where it will not be able to be ID'd as fake.

Depends. Digital watermarks can be designed to survive degraded copies. Cinavia, for example, is specifically designed to survive microphone recording, digital compression, downmixing, etc.

fixedpoint · Nov 7, 2016

This is the ultimate Dr. Sbaitso, but Adobe should keep it coming if they still want my increasingly skeptical $50.00/mo.

Ray Wonder · Nov 7, 2016

NekoFever said:
Depends. Digital watermarks can be designed to survive degraded copies. Cinavia, for example, is specifically designed to survive microphone recording, digital compression, downmixing, etc.

That's for copyrighted materials. There's a difference between a program scanning an audio clip and being able to identify it (Shazam), and creating a new audio clip with an ID wave in it and make it able to survive degradation.

Kreed · Nov 7, 2016

We had two threads about this:

http://www.neogaf.com/forum/showthread.php?p=223252891

http://www.neogaf.com/forum/showthread.php?t=1307695

RumblingRosco · Nov 7, 2016

I love this. Sooner or later, I can just read about 10-60 minutes worth of written works one time in my life to set-up the algorithm.

Then anytime I want an audio book for some obscure novel, I can just copy and paste an entire novel's text from an ebook into the Adobe software, save it as an audio file, and boom, a nice little .MP3 audio book using my own voice with my own vocal nuances.

I could create an entire digital library of ebooks using my own voice without ever even reading more than one or two books out loud to setup the algorithm.

BGBW · Nov 7, 2016

Chad Warden videos have reached the next level.

Horsemama1956 · Nov 7, 2016

DiipuSurotu said:
One day we will be able to perfectly replicate people's voices. After an actor's death, their voice will continue to be heard in full films through the power of audio photoshopping.

You mean animation? otherwise who cares about their voice.

chrisPjelly · Nov 7, 2016

BGBW said:
Chad Warden videos have reached the next level.

Silvergunner's gonna love this

meltingparappa · Nov 7, 2016

Imagine the idea of using this as therapy for people with alzheimers or dementia, who've had their spouses or loved ones pass away.

Or at least imagine the Black Mirror episode about basically this.

vivekTO · Nov 7, 2016

This is seriously incredible. I mean, if you could just sample a small line and then write the whole speech in the same dialect, that would be magical.

Window · Nov 7, 2016

Difficult to determine how natural and effective the voice recreation is from that clip but the tools certainly seem cool in how easy they seem to make the process of editing speech.

Horsemama1956 said:
You mean animation? otherwise who cares about their voice.

This kind of already exists

Ray Wonder · Nov 7, 2016

I'm wondering how clear the sample audio has to be as well. If you have a bunch of phone recordings, I wonder how it would sound. Probably like the person is perpetually in a phone I'd assume.

Diancecht · Nov 7, 2016

This is black magic.

Calidor · Nov 7, 2016

Ray Wonder said:
We already have pretty advanced synthesized voices. Just talk to Siri or something. I'm sure they create that voice in a similar fashion.

I do believe synthesized voice will be indistinguishable from a real voice very soon, if it's not already somewhere, and attaching that to a Turing-passable AI would be pretty frightening to watch. I agree lol

Check out WaveNet which is Google's DeepMind AI doing crazy synthesizing shit. Way above this Adobe demo:

https://deepmind.com/blog/wavenet-generative-model-raw-audio/

Regginator · Nov 7, 2016

DwwwD 2 days ago
Suddenly courts are gonna be filled with recorded proof of shit you didn't say at all

My first reaction to this...

TissueBox · Nov 7, 2016

I was just thinking about something like this after playing with Vocaloid lol!! That's probably a real generalization though.

fixedpoint · Nov 7, 2016

Ray Wonder said:
I'm wondering how clear the sample audio has to be as well. If you have a bunch of phone recordings, I wonder how it would sound. Probably like the person is perpetually in a phone I'd assume.

The creative mangling potential of this software is much more interesting to me than the assumed practical implications.

Gattsu25 · Nov 7, 2016

This is going to result in a few very public controversies before the public gets used to the concept of doctored audio.

Always-honest · Nov 7, 2016

Wasn't this already posted?

Ray Wonder · Nov 7, 2016

fixedpoint said:
The creative mangling potential of this software is much more interesting to me than the assumed practical implications.

I would love to see what would come out if I threw this in as the sample audio.

https://www.youtube.com/watch?v=p8rTlVjjYxA

Xun · Nov 7, 2016

Frighteningly impressive.

Fancolors · Nov 7, 2016

Can't wait till we get fake audio recordings announcing stuff that doesn't exist.

NobleGundam · Nov 7, 2016

The cake is a lie

Ray Wonder · Nov 7, 2016

Could probably grab like 10 hours of Barack Obama samples, and cause a worldwide ruckus about an alien invasion.

Calidor said:
Check out WaveNet which is Google's DeepMind AI doing crazy synthesizing shit. Way above this Adobe demo:

https://deepmind.com/blog/wavenet-generative-model-raw-audio/

See now this is really fuckin cool

Blade30 · Nov 7, 2016

Oh boy, the stuff you could do with this, editing some movies/tv shows with different texts is one example I'd do.

RetroDLC · Nov 7, 2016

As someone who works in game dialogue production, a lot of actors (in all elements of media) will probably start having contract stipulations that ban using this sort of technology for post-production editing. The last thing they want is for their voice to be manipulated in a way that could damage their image, and they would lose out on recording pickup takes.

Also, it will be inevitable that some people will feed audio books through this thing to create virtual readings of erotic fan fiction by actors related to the involved roles.

Adobe's "photoshop for audio"
