Veritone speaker separation technology identifies, classifies, and tracks individual speakers in multi-person conversations. Speaker separation works with transcribed content: the system takes voice data from an uploaded audio or video file and transcribes it into text. The resulting transcript is then used to create a person-by-person exchange that’s broken into timestamped paragraphs with speaker labels. A new paragraph begins each time the speaker changes, so you can see who said what, and exactly when. Once a file has been transcribed, you can edit the text, search it by keyword, and export it in a variety of formats.
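To make the paragraph rule concrete, here is a minimal Python sketch of how speaker-labeled words could be grouped into timestamped paragraphs, with a new paragraph starting at each change in speaker. It is an illustration only, not Veritone's implementation or API: the `Word` and `Paragraph` types, the function names, and the `[MM:SS] Speaker: text` layout are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with a speaker label (hypothetical structure)."""
    speaker: str
    start: float  # seconds from the start of the file
    end: float
    text: str


@dataclass
class Paragraph:
    """A run of consecutive words from the same speaker."""
    speaker: str
    start: float
    end: float
    text: str


def group_by_speaker(words):
    """Start a new paragraph each time the speaker label changes."""
    paragraphs = []
    for w in words:
        if paragraphs and paragraphs[-1].speaker == w.speaker:
            # Same speaker as the previous word: extend the current paragraph.
            last = paragraphs[-1]
            last.end = w.end
            last.text += " " + w.text
        else:
            # Speaker changed (or first word): open a new paragraph.
            paragraphs.append(Paragraph(w.speaker, w.start, w.end, w.text))
    return paragraphs


def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render(paragraphs):
    """Produce one labeled, timestamped line per paragraph."""
    return "\n".join(
        f"[{format_timestamp(p.start)}] {p.speaker}: {p.text}" for p in paragraphs
    )
```

Given words labeled "Speaker 1", "Speaker 1", "Speaker 2", `group_by_speaker` returns two paragraphs, and `render` prints each one on its own line with its start time and speaker label.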

The workflow to transcribe a file with speaker separation follows the steps described below. Each of these processes is covered in more detail in other sections of our help documentation.

Upload & Transcribe

Drag-and-drop or upload files into Veritone from your local computer and transcribe the speech in your file into text.


Review & Edit

Review and edit your speaker separation and transcript using Veritone’s built-in features and in-app text editor.


Export & Download

Download the transcript in a variety of formats, such as .txt or .ttml.

Get Started

Each of the workflow steps is covered in more detail throughout the Speaker Separation section of our help center. Click on a link below to learn more.

Upload and Transcribe a File

Check the Status of a Transcription Job

Viewing your Output

How to Edit Output

Revert an Edited Transcript to the Original Version

Export and Download a Transcript
