From Backlog to Accessible: AI Transcription for a Regional Film Archive

Protagona partnered with a nonprofit regional film archive to prove that a serverless AWS pipeline could meet production-grade transcription accuracy on decades-old archival footage, validating a path to accessibility at scale without a matching increase in manual labor.

Industry

Nonprofit

Teams & Services

Cloud Architecture, Back-End Engineering, DevOps, Solution Design

Tech & Tools

Amazon S3, AWS Lambda, Amazon EventBridge, AWS Elemental MediaConvert, Amazon Transcribe, Amazon DynamoDB, Amazon CloudFront, Amazon Cognito, Amazon SQS, Amazon SES, Amazon API Gateway, Amazon Bedrock, Next.js, Terraform

Key Data Points

Transcription accuracy of 93–98% confirmed across archival footage from the 1970s and 1980s, validated against manually prepared ground-truth transcripts and deemed production-ready with light human review.
Named entity recognition accuracy of 90–99%, preserving the people, places, and organizations central to regional film history across both test films.
All five proof-of-concept (POC) milestones completed on time within a focused two-week engagement, producing a prioritized production roadmap covering four distinct operational use cases.

The Vision

A nonprofit regional film archive holds decades of documentary, cultural, and historical footage representing communities across generations. For an organization where discovery and access matter as much as preservation, a film that cannot be searched or shared is only partially preserved. A growing transcription backlog had become a real constraint on that mission, making archival films harder to find, harder to share, and harder to return to the communities they represent.

The Goal

The POC had a clear mandate: prove that automated video transcription could meet production-grade accuracy standards on archival content, at a cost sustainable for a nonprofit. Success meant validated accuracy metrics, a documented cost-per-minute rate, and a concrete recommendation on whether and how to move forward with a full production build.

The Challenge

Archival video presents a harder transcription problem than modern broadcast content. Footage from the 1970s and 1980s carries period-typical audio characteristics: background noise, analog recording artifacts, and speech patterns that differ from the clean studio audio most automated speech recognition systems are optimized for. Achieving production-acceptable accuracy required careful service selection and rigorous evaluation against manually prepared ground-truth transcripts.

Beyond accuracy, the scale of the backlog added a second layer of complexity. A solution that worked on a handful of test files needed a credible path to 400 TB of legacy content, ongoing daily accessions, and eventual streaming publication. The POC had to validate not just whether transcription worked, but whether the architecture could support four distinct operational use cases without locking the organization into an inflexible or costly infrastructure model.

The Solution

Protagona designed a fully serverless, event-driven pipeline that automates the journey from raw video upload to time-synchronized transcript with no manual intervention. When a file arrives in cloud storage, the pipeline triggers automatically: the file is transcoded into a streaming-ready format, an audio track is extracted for speech processing, and the transcript is post-processed into a subtitle file ready for synchronized playback or download. A configurable budget gate keeps processing costs within a threshold the organization sets, a critical feature for a nonprofit managing finite operational budgets.
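
To make the post-processing step concrete, here is a minimal sketch of how word-level speech recognition output might be turned into a subtitle file. It is illustrative only and not the archive's actual code. It assumes the input follows the item layout of Amazon Transcribe's JSON output (each item carries a `type`, an `alternatives` list, and, for spoken words, `start_time` and `end_time` as strings), and it assumes SRT as the subtitle format, which the case study does not specify. The function names and the `max_words` grouping rule are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def items_to_srt(items, max_words: int = 8) -> str:
    """Group word-level items into numbered SRT cues.

    Assumption: a cue is closed once it holds `max_words` spoken words.
    A production pipeline would likely also split on pauses and line length.
    """
    cues = []            # list of (start_seconds, end_seconds, text)
    words = []           # words collected for the cue being built
    start = end = 0.0

    for item in items:
        text = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation items carry no timing; attach them to the previous word.
            if words:
                words[-1] += text
            continue
        if len(words) >= max_words:
            # The current cue is full: close it before starting the next one.
            cues.append((start, end, " ".join(words)))
            words = []
        if not words:
            start = float(item["start_time"])
        end = float(item["end_time"])
        words.append(text)

    if words:
        cues.append((start, end, " ".join(words)))
    if not cues:
        return ""

    blocks = [
        f"{index}\n{format_timestamp(s)} --> {format_timestamp(e)}\n{text}"
        for index, (s, e, text) in enumerate(cues, start=1)
    ]
    return "\n\n".join(blocks) + "\n"
```

In a pipeline like the one described, a function of this kind would run after the transcription job completes, reading the items from the job's JSON result and writing the returned string next to the transcoded video.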

Architecture decisions were made with the full production vision in mind, not just POC convenience. With no fixed server infrastructure, costs scale directly with usage and there is nothing to maintain between processing runs. Storage policies automatically move content into deep archive as access patterns change, a pattern that becomes essential at 400 TB scale. Optional capabilities, including AI-assisted transcript correction and bulk legacy ingest, can be activated incrementally as needs evolve. Accuracy testing against two archival films validated the approach: a 1981 film achieved approximately 98% accuracy, and a noisier 1978 film came in at approximately 94%, both confirmed production-ready with light human review.
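
For readers curious how an accuracy figure of this kind can be computed, the sketch below scores a machine transcript against a ground-truth transcript. The case study does not state which metric was used, so this assumes the common definition of word accuracy as one minus the word error rate, with the error count taken from a word-level edit distance. The normalization rule (lowercasing and dropping punctuation) and all function names are likewise assumptions.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn `hyp` into `ref` (Levenshtein distance over words)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (edit distance / number of reference words)."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_edit_distance(ref, hyp) / len(ref)
```

For example, a seven-word reference with one misrecognized word scores 6/7, or roughly 86%. Because inserted words count as errors, a hypothesis much longer than the reference can push this measure below zero, which is one reason evaluations usually report it alongside a manual review.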
