ChatCut: Leveraging Large Language Models for Transcript-Driven Documentary Editing and Content Creation

Modern media content is increasingly driven by spoken communication. From documentaries and journalism to marketing videos and podcasts, dialogue often forms the backbone of a compelling narrative.

The Problem

Despite the central role of spoken content, creators still struggle to edit hours of dialogue-heavy footage efficiently.

Current Solutions

Recent innovations have introduced transcript-based video editing tools that represent spoken content as searchable text.
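The core idea behind transcript-based editing can be sketched in a few lines. The sketch below is a minimal illustration under stated assumptions, not ChatCut's implementation. It assumes a word-level transcript with per-word start and end times in milliseconds. The `Word` record and the `find_phrase` and `build_timeline` helpers are hypothetical names chosen for illustration. Searching the text yields source time ranges, and concatenating those ranges yields a simple sequential timeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word (hypothetical record) with source times in milliseconds."""
    text: str
    start: int
    end: int


def _norm(token: str) -> str:
    # Case-fold and drop trailing/leading punctuation so "prototype." matches "prototype".
    return token.lower().strip(".,?!;:")


def find_phrase(words: list[Word], phrase: str) -> list[tuple[int, int]]:
    """Return the (start, end) source range of every occurrence of `phrase`."""
    target = [_norm(t) for t in phrase.split()]
    n = len(target)
    if n == 0:
        return []
    hits = []
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [_norm(w.text) for w in window] == target:
            # The clip spans from the first word's start to the last word's end.
            hits.append((window[0].start, window[-1].end))
    return hits


def build_timeline(ranges: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Place source ranges back to back: (src_start, src_end, timeline_start)."""
    timeline = []
    cursor = 0
    for start, end in ranges:
        timeline.append((start, end, cursor))
        cursor += end - start
    return timeline


if __name__ == "__main__":
    words = [
        Word("We", 0, 200), Word("built", 200, 500), Word("the", 500, 600),
        Word("prototype.", 600, 1100), Word("Then", 1500, 1700), Word("we", 1700, 1800),
        Word("tested", 1800, 2200), Word("the", 2200, 2300), Word("prototype", 2300, 2900),
    ]
    clips = find_phrase(words, "the prototype")
    print(clips)                  # [(500, 1100), (2200, 2900)]
    print(build_timeline(clips))  # [(500, 1100, 0), (2200, 2900, 600)]
```

Integer milliseconds are used here to keep timeline arithmetic exact. A real editor would also need to handle speaker turns, overlapping selections, and re-timing after each cut.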

Evolution

To truly transform this workflow, editing systems must move beyond transcript indexing toward intelligent, interactive tools that collaborate with users to construct narratives.

Research Question

How can we design an editing tool that comprehensively supports creators in rapidly identifying, selecting, and assembling soundbites into cohesive narratives while preserving their creative agency?

Our Solution

Building on recent advances in natural language processing and large language models (LLMs), we introduce ChatCut, a transcript-driven video editing system that lets creators perform complex editing tasks through conversational, natural-language commands.

Conclusion

We introduced ChatCut, a documentary video editing tool that enables a novel transcript-driven editing paradigm through dual-mode AI assistance and transcript-synchronized timeline integration.