Thoughts on Personalized Podcasts
Podcasts have exploded in popularity: about half of Americans have listened to a podcast in the past year. What started as a niche medium has become a platform for everyone from celebrities to politicians to business leaders. We're even seeing podcasts dig into deep technical content, including scientific papers. Recently, tools such as NotebookLM and ElevenLabs Reader have appeared that use AI to turn pieces of text into podcast episodes.
Despite this innovation, the core experience of listening to a podcast episode, regardless of how it was generated, remains static. Once the hosts finish recording, editing, and publishing, you get the same show as every other listener. The episode does not adapt to your interests. For example, if you are a longtime listener of a podcast, say ATP FM, you get to know the hosts and their personalities over time. After a certain point, some of the content in a weekly episode might be repetitive to you. The host might be sharing their opinions about a new product that just came out or discussing some news. But because they are also catering to new listeners, they understandably have to preface the opinion with some amount of context. With podcasts today, you have two choices: either listen to the whole segment, or skip the entire chapter about the product (assuming the hosts have enabled chapters). Neither is desirable.
Now, imagine an advanced AI system that reconfigures each podcast on the fly, sculpting, editing, and remixing the actual audio in real time to suit your preferences. This goes beyond asking your voice assistant to play, pause, or skip a chapter: the podcast itself becomes an interactive experience, shaped by your prompts. This post lays out my thoughts on how this deep integration could work and what benefits it could provide.
Current Use of AI in Podcasts
Two big AI use-cases have gained attention so far:
- Automatic Generation: Tools like NotebookLM generate a fresh podcast-style dialogue from a document you provide. Simon Willison has a few examples in this post.
- Transcript Services: Apple Podcasts recently introduced interactive transcripts, letting you scroll through text and tap any sentence to jump right to that point in the audio. To be fair, this is a great feature. If you remember a specific moment in a show, you can skim the transcript and tap to play from there rather than manually hunting with the scrub bar. The downside is that it still requires you to look at your phone and interact with it.
An AI-integrated Podcast Experience
Imagine a podcast player that is always listening for a wake word or phrase, similar to how your phone listens for "(Hey) Siri". In this deeply integrated system, the AI could remix the content itself on your command in a few different ways. For now, I'm assuming the hosts give consent for this player to modify their content in a few limited ways. This could allow:
- Semantically Skipping and Compressing Segments: When a host repeats information that you already know because you're a longtime listener, the AI could seamlessly edit that section out, stitching the audio back together (using its own voice) so it sounds natural.
- On-the-Fly Summaries: I imagine most people do not consume podcasts that have long episodes (such as Acquired by Ben Gilbert and David Rosenthal, with episodes that are a whopping 5 hours long) in one session. When you come back to an episode and you're 3 hours in, you may have forgotten what happened up to that point. The AI could generate a summary when prompted, letting you jump back into the episode right where you left off. Moreover, if the next part of the podcast relies on information that was mentioned during the first hour, the AI would know this and could make sure to include it in the summary. This would not be possible if you manually rewound the podcast by, say, 30 minutes and started listening from there. I can think of a few genres where this could apply, such as podcasts covering elaborate historical events (The Rest Is History often does multi-part episodes).
- Cut to the Chase: The flip side of on-the-fly summaries is when you have only, say, 10 minutes to listen, but there are 45 minutes left in the podcast. Maybe you got bored of this particular story, or maybe the podcast has a bit too much filler and you just want to know how the story ends. The podcast player could gather the important points from the remaining episode(s) and piece together a clip that concludes in the desired amount of time.
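To make the "Cut to the Chase" idea a little more concrete, here is a rough sketch of how a player might pick which parts of the remaining audio to keep. Everything in it is an assumption for illustration: the `Segment` type, the `importance` score (which I'm imagining would come from some model reading the transcript), and the simple greedy strategy are all hypothetical, not a description of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A slice of the remaining episode (hypothetical structure)."""
    start: float       # seconds from the start of the episode
    end: float         # seconds from the start of the episode
    importance: float  # assumed score in [0, 1], e.g. from a model over the transcript

    @property
    def duration(self) -> float:
        return self.end - self.start


def cut_to_the_chase(segments: list[Segment], budget_seconds: float) -> list[Segment]:
    """Greedily keep the segments with the most importance per second
    that still fit in the listener's time budget, then return them in
    their original order so the story still plays chronologically."""
    playable = [s for s in segments if s.duration > 0]
    ranked = sorted(playable, key=lambda s: s.importance / s.duration, reverse=True)

    chosen: list[Segment] = []
    remaining = budget_seconds
    for seg in ranked:
        if seg.duration <= remaining:
            chosen.append(seg)
            remaining -= seg.duration

    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    # 30 minutes of remaining audio, but only 10 minutes to listen.
    remaining_audio = [
        Segment(0, 300, 0.2),     # recap for new listeners
        Segment(300, 600, 0.9),   # key plot point
        Segment(600, 1500, 0.5),  # long tangent
        Segment(1500, 1800, 0.8), # the ending
    ]
    for seg in cut_to_the_chase(remaining_audio, budget_seconds=600):
        print(f"keep {seg.start:.0f}s to {seg.end:.0f}s")
```

A greedy pick like this is not guaranteed to find the best possible combination, and a real player would still need to generate short bridging narration between the kept segments so the jumps don't feel abrupt.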
Concerns
The obvious one is that creators may not be okay with AI rearranging or splicing their content in ways they did not intend. Even if they do consent to this tool, I feel the AI-generated bits would need to be marked clearly, either by using a distinctive voice instead of mimicking the hosts' voices, or by announcing before or after a clip that it was AI-generated.
Assuming this is carefully implemented, I do believe it could add value to the listening experience by giving each listener deep personalization, without taking away the creative voice of the host(s).