Organizing a large media library into meaningful collections is harder than it sounds. With 1000+ movies, you need something smarter than keyword matching. "Star Wars: A New Hope" and "Star Wars: The Empire Strikes Back" share a franchise, but a text search for "Star Wars" won't catch "Rogue One: A Star Wars Story" if the title varies. Movies about heists aren't all titled "heist." Thematic grouping requires understanding what movies are about, not just what they're called.
The solution is semantic embeddings—converting movie metadata into vectors that capture meaning, then clustering similar vectors together. This post documents how to run sentence embeddings locally in Node.js using Transformers.js, eliminating per-request API costs while maintaining good clustering quality. This is what we are using in the plex-collection-creator