The Traditional AMV Workflow
Every AMV editor knows this process intimately:
- Source your anime. Rip episodes, download clips, or use screen recordings. For a multi-series AMV, you might have 50–100 source episodes — that's 20–40 hours of raw footage.
- Catalog your clips. Scrub through every episode, mentally noting (or manually tagging) the key moments: fight scenes, emotional close-ups, transformation sequences, landscape pans.
- Choose your track. The song defines the AMV's entire structure. Import it into Premiere or Vegas Pro.
- Place beat markers. Listen to the full track, pressing M on every beat, accent, and section change. For a 4-minute track, that's 400+ markers.
- Match clips to beats. This is the creative core — but also the most time-consuming step. For each beat, you find the right clip, trim it to the right in/out point, and nudge it to land precisely on the marker.
- Add internal sync. The best AMVs sync movement within the frame to the music — a sword swing on a snare, an explosion on a drop. This requires frame-by-frame scrubbing.
A competition-grade AMV takes 40–100 hours of editing for 3–4 minutes of output. Even a quick "fun" AMV takes 8–15 hours.
Why AMV Editing Is Uniquely Painful
AMV editing has two problems that other video editing doesn't:
Problem 1: Source volume. Unlike wedding or gym footage where you shot 1–3 hours, AMV creators work from 20+ hours of anime across multiple series. Finding "the right 3-second clip" means searching a haystack the size of a full season.
Problem 2: Semantic matching. A good AMV doesn't just cut to the beat — it matches the meaning of the visual to the energy of the music. A melancholic verse needs a character looking out a window, not an explosion. This semantic matching is what separates great AMVs from random clip compilations.
The Shortcut: AI-Powered Clip Selection + Beat Sync
Onset Engine attacks both problems simultaneously:
- CLIP semantic analysis: During ingest, every 3-second clip gets a 768-dimensional vector that encodes what's visually happening. The AI knows the difference between "character crying in rain" and "character punching through a wall" — semantically, not just by motion.
- Driver-based tier mapping: Write CLIP text queries in your driver JSON: "a character looking at the sky, melancholic" for quiet sections, "explosive battle scene, sword clash" for drops. The AI matches clips to musical energy by meaning.
- Subject tags: Tag 5 clips as "Goku" and the engine finds every other Goku clip across all ingested episodes via cosine similarity. Use @Goku in your driver to force specific characters into specific tiers.
- Beat-perfect timing: librosa maps every beat, accent, and energy curve. Cuts land within ±200ms of musical onsets — automatically.
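The matching step behind both tier mapping and subject tags boils down to cosine similarity between CLIP vectors. A minimal sketch, assuming the 768-dimensional embeddings have already been produced by a CLIP model (random vectors stand in for real embeddings here; names like `clip_embeddings` are illustrative, not Onset Engine's API):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
# Stand-ins for real CLIP vectors: one 768-dim embedding per 3-second clip,
# plus one embedding for an encoded text query like "explosive battle scene".
clip_embeddings = rng.normal(size=(1000, 768))
query_embedding = rng.normal(size=(1, 768))

scores = cosine_sim(clip_embeddings, query_embedding).ravel()
best = np.argsort(scores)[::-1][:5]  # top-5 candidate clips for this tier
```

The same machinery serves subject tags: average the embeddings of the 5 clips you tagged "Goku" and rank every other clip against that centroid.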
Ingest a full season of anime. Drop your track. Write a 5-tier driver with CLIP queries describing the visual mood for each energy level. Hit render. A semantically-matched, beat-synced AMV rough cut in under 2 minutes.
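A 5-tier driver along these lines might look like the following. The exact schema is Onset Engine's own, so treat the field names here as illustrative assumptions rather than the real format:

```json
{
  "tiers": [
    { "energy": "calm",      "query": "a character looking at the sky, melancholic" },
    { "energy": "low",       "query": "quiet conversation, soft lighting" },
    { "energy": "build",     "query": "characters running, rising tension" },
    { "energy": "high",      "query": "fast fight choreography, motion blur" },
    { "energy": "explosive", "query": "explosive battle scene, sword clash", "subject": "@Goku" }
  ]
}
```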
The AMV Workflow in Onset Engine
- Ingest your anime library — pointer mode indexes episodes without copying files. A full season processes in minutes.
- Load your track — the engine maps every beat and energy contour.
- Write your driver — 5 tiers from calm to explosive, each with CLIP text queries describing the visuals you want.
- Generate + curate — preview the AI's selection. Lock the clips you love. Re-roll the rest with a different random seed.
- Export — render directly, or export .otio to your NLE for color grading and final internal sync adjustments.
The AI handles the 6-hour clip search. You handle the creative direction. The result is a rough cut that's 80% of the way there in 2 minutes, not 40 hours.