talkies

Technology changes. And culture changes, sometimes because of tech, sometimes despite it (but usually the former). Films moved from silent – actually a combination of intertitles and live music accompaniment, often organ – to “talkies” in the late 1920s, with The Jazz Singer in 1927 being the canonical example. The tech changed both the user experience and the film industry, rendering old models mostly obsolete while creating entirely new opportunities (e.g., sound design).

Fast forward to 2023 and generative AI is all the rage. DALL·E got the initial hoopla, with a sea of images generated from text prompts. Spin-offs of similar tech spawned a nascent cottage app industry and the inevitable accompanying ethical and equity issues, such as Lensa tending to create sexualized female images “just because…” Discussions within the art world quickly ramped up – should AI-created images be “allowed” to enter the art ecosystem (galleries, auctions, collections, etc.)? Graphic designers and others who create visuals wondered about their future – would this become a plug-in for Photoshop, or a tool that automates them out of a job?

Then ChatGPT hit the mainstream press, and soon there were myriad examples of AI-generated text: essays, news articles, scripts, song lyrics, computer code – most anything was fair game, and the results ranged from amusing to compelling to terrifying. While some maintained that AI-generated text was not a threat and would never replace “creative” human output, others were not so sure. The debate continues, and the technology will keep advancing – with or without regulation (another topic that needs attention).

The most recent addition is Microsoft’s VALL-E – an algorithm that creates a text-to-speech model of anyone’s voice from just a short recording of the person talking. It wasn’t so long ago that computer-generated speech was the punchline of jokes on TV and in film. More recently, synthetic voices (e.g., Siri) have improved markedly, driven largely by various ML techniques. The ability to mimic a “real” human is the next logical step, and apparently we’re pretty much there.

Deepfakes have been discussed for years, but the tools to create them used to be beyond the reach of most people. That has started to change, and the ability to create them is now fairly commoditized. And while humans still think they are good at detecting them, the results are mixed at best (and mixed for machine detection as well).

So DALL·E, ChatGPT, and VALL-E – the trifecta of generative AI, or just the tip of the iceberg? I’m going with the latter, and we’re not ready as a society or a government to deal with the implications. Consider that much of this furor has hit over just the last year, and that these capabilities, while in development for years (decades, even), are arriving just as the elements of a perfect storm fall into place: powerful central and edge computing, massive networks (both technical and social), and a complex global economy and society that is vulnerable on multiple levels. The old saying is that those who ignore history are doomed to repeat it. Perhaps the 21st-century analogy is that those who ignore generative AI are doomed to be replaced (or governed) by it.

Share your thoughts