> Canonical: https://www.cliphi.com/tools/active-speaker-detection

# Active Speaker Detection

Cliphi tracks whoever is talking and keeps them in frame, so multi-person videos reframe to vertical cleanly.

## How it works

1. **Paste a link or upload your video.** Drop in a supported video link or upload a file. Cliphi reads the frame and the audio.
2. **AI reframes to vertical.** Cliphi finds the subject and keeps them centered, tracking the speaker and the action as they move.
3. **Post or keep editing.** Get a vertical clip ready to post, or adjust the framing and add captions and a music bed first.

## Why Cliphi

- **No crop keyframing.** The old way to follow a moving talker is to keyframe the crop across the whole clip. Cliphi does that automatically, so you're not editing frame by frame for every clip you cut.
- **Comes with the clips.** Speaker tracking isn't a separate tool you run first. It's built into the vertical clips Cliphi makes, alongside the captions and a music bed.
- **The subject stays centered.** Cliphi tracks the speaker and the action and keeps them in the vertical frame, so you never lose the important part to a fixed crop.

## The frame follows the speaker

In a conversation the person talking moves, leans, gestures, and trades off with the next speaker. A fixed crop cannot keep up, so someone always ends up half out of frame. Cliphi's active speaker detection follows whoever is talking and keeps them centered, and when the speaker changes, the frame changes with them. The vertical clip stays on the right person without you touching it.

## Built for interviews, podcasts, and panels

Multi-person content is where automatic cropping usually falls apart, and it is exactly what speaker detection is for. Cliphi can cut between speakers as they talk or lay two or three people out in a grid, so an interview or panel reads clearly in vertical instead of cramming everyone into a strip.

It works across the aspect ratios you need (9:16, 1:1, and 16:9) and handles accents and crosstalk in the [audio it transcribes](/tools/transcribe-youtube-video). And since Cliphi is a clip tool, speaker tracking comes built into [the clips it makes](/tools/split-screen-video), alongside [captions](/tools/add-captions-to-video) and music, rather than being a separate tool you run first.

Without speaker tracking, the only ways to handle a moving talker are to crop wide and lose the close-up, or to keyframe the crop by hand, which is the kind of editing nobody wants to do per clip. Cliphi does it automatically for every clip it makes, so a two-person podcast or a four-person panel comes out reframed cleanly without an editor babysitting the crop.

## FAQ

### How does it know who is speaking?

Cliphi detects the active speaker from the video and audio and keeps that person in frame, switching as the conversation moves between people.

### Does it handle a four-person panel?

Yes. Cliphi follows the active speaker as the conversation moves, or lays two or three people out in a grid, so panels and interviews reframe cleanly.

### What aspect ratios can it output?

9:16 for TikTok, Reels, and Shorts, plus 1:1 for the feed and 16:9 for YouTube.

## Related

- [Auto Reframe Video](https://www.cliphi.com/tools/auto-reframe-video.md)
- [Split-Screen Video](https://www.cliphi.com/tools/split-screen-video.md)
- [Podcast Transcription](https://www.cliphi.com/tools/podcast-transcription.md)

## About Cliphi

Track speakers in your video. Paste a link or upload a file and get a tracked vertical clip.

[Get clips](https://www.cliphi.com/)
