Functional Requirements - Transcription
Feature description
Transcription is a new feature that converts voice audio conversations into electronic text transcripts (providing a structured data set of the captured voice audio conversation).
- Transcription is performed automatically when the voice conversation ends, and the transcript of the conversation is made available via the API in JSON format.
- Transcription will be provided in multiple languages (72 different languages and scripts, depending on the set-up within the configuration).
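As an illustration of the JSON delivery described above, the sketch below assembles one transcript per configured language and serialises it. The field names (callId, transcripts, segments, speaker, start, end, text) and the helper build_transcript_payload are assumptions made for this example only; they are not the agreed API schema.

```python
import json


def build_transcript_payload(call_id, transcripts_by_language):
    """Assemble a hypothetical JSON-ready payload with one transcript per language."""
    return {
        "callId": call_id,
        "transcripts": [
            {"language": language, "segments": segments}
            for language, segments in sorted(transcripts_by_language.items())
        ],
    }


# Illustrative data only: the segment layout is an assumption, not the real schema.
payload = build_transcript_payload(
    "call-0001",
    {
        "en": [{"speaker": "agent", "start": 0.0, "end": 1.4, "text": "Good morning."}],
        "fr": [{"speaker": "customer", "start": 1.6, "end": 3.0, "text": "Bonjour, ça va ?"}],
    },
)

# ensure_ascii=False keeps non-ASCII characters readable in the serialised JSON.
serialised = json.dumps(payload, ensure_ascii=False, indent=2)
```

Sorting the languages keeps the output order stable between runs, which makes transcripts easier to compare during testing.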
Business Case summary
The transcription feature is essential for data analysis and management: with an accurate transcription, audio conversation data becomes easily searchable for auditing and compliance purposes. We need a 3rd party vendor whose transcription service can integrate with our voice recording systems.
- Speechmatics - 3rd party transcription vendor
Personas/Market segment
Applicable to all market segments of Red Box.
System Prerequisites (Functional & Non-functional requirements)
- Transcriptions MUST not be lost or erased if a recoverable error is encountered.
- System MUST not fail or return an error for silent calls (e.g. instances where no audio is recorded). In such instances a blank transcript will be generated by the transcription engine.
- System MUST provide audit logs for failed transcriptions.
- An administrator MUST be able to enable or disable transcription.
- Transcription service MUST support multiple Speechmatics instances
- Max number of concurrent jobs MUST be configurable per Speechmatics server
- System MUST send each participant's split audio separately to the Speechmatics service.
- System MUST provide diarisation capabilities, as supported by Speechmatics. Diarisation can be enabled within the configuration set-up; if enabled, the system will perform the diarisation process for each call.
- Diarisation MUST be performed based on the configuration within the transcription service, irrespective of whether the audio stream is mono or stereo.
- The output of the transcription service should be multiple transcriptions, based on the languages set up within the configuration of the transcription feature.
- The transcription feature SHOULD support 72 different languages and scripts.
- The agent MUST be able to export the transcriptions via email and HTTP export.
- Transcription of the call MUST be available via the metadata API.
- Users who have access to a call recording will also have access to the transcript of that call.
- System MUST support non-ASCII characters.
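The configuration-related prerequisites above (enable/disable, diarisation, languages, multiple Speechmatics instances, and a per-server limit on concurrent jobs) can be sketched as follows. This is a minimal sketch under assumptions: the class names SpeechmaticsServer, TranscriptionConfig and JobScheduler, and all of their fields, are hypothetical and do not describe the actual implementation or any Speechmatics API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpeechmaticsServer:
    name: str                 # hypothetical identifier for one Speechmatics instance
    max_concurrent_jobs: int  # limit configurable per server


@dataclass
class TranscriptionConfig:
    enabled: bool = False     # switched on or off by an administrator
    diarisation: bool = False
    languages: List[str] = field(default_factory=lambda: ["en"])
    servers: List[SpeechmaticsServer] = field(default_factory=list)


class JobScheduler:
    """Tracks active jobs per server and never exceeds a server's configured limit."""

    def __init__(self, config: TranscriptionConfig) -> None:
        self.config = config
        self.active: Dict[str, int] = {s.name: 0 for s in config.servers}

    def acquire(self) -> Optional[str]:
        """Return the name of a server with free capacity, or None if none is available."""
        if not self.config.enabled:
            return None
        for server in self.config.servers:
            if self.active[server.name] < server.max_concurrent_jobs:
                self.active[server.name] += 1
                return server.name
        return None

    def release(self, server_name: str) -> None:
        """Mark one job on the given server as finished."""
        if self.active[server_name] > 0:
            self.active[server_name] -= 1


config = TranscriptionConfig(
    enabled=True,
    diarisation=True,
    languages=["en", "fr"],
    servers=[SpeechmaticsServer("sm-a", 1), SpeechmaticsServer("sm-b", 2)],
)
scheduler = JobScheduler(config)
assigned = [scheduler.acquire() for _ in range(4)]
```

With the limits above, the fourth request finds no free capacity and receives None; in such a case a job would wait in a queue rather than be dropped, in line with the requirement that transcriptions are not lost on recoverable errors.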
Use Case title - Transcribe a call (TAC)
Description - A voice audio conversation between two participants in a call centre has ended and has been captured/recorded by the Red Box system. The transcription feature should provide a transcript of the call.
Actors
1st participant (Call Agent and transcription user): Any individual speaking to a customer who has access to the transcription service.
2nd participant (Customer): Any individual speaking to a business.
System: Triggers transcription after the audio conversation has ended.
Administrator: Any individual with permission to configure transcription.
Expected Behaviour
If no audio is recorded
- System MUST not fail or return an error for silent calls (instances where no audio is recorded). In such instances a blank transcript MUST be generated.
- If the transcription engine does not spot any words, it will return a blank transcription. In that case the system should not fail or return any errors.
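The silent-call behaviour can be illustrated with a short sketch. The function transcribe_call, the engine callable and the result fields are hypothetical names chosen for this example; the real transcription engine interface may differ.

```python
from typing import Callable, Dict, List


def transcribe_call(audio: bytes, engine: Callable[[bytes], List[str]]) -> Dict[str, object]:
    """Return a transcript; a silent call produces a blank transcript, not an error.

    `engine` stands in for the transcription engine and is assumed to return
    the list of recognised words (possibly empty).
    """
    # No recorded audio: skip the engine and fall through to a blank transcript.
    words = engine(audio) if audio else []
    return {"text": " ".join(words), "words": list(words), "blank": not words}


# Stand-in engines used only for this illustration.
silent_engine = lambda audio: []
talking_engine = lambda audio: ["hello", "there"]

no_audio = transcribe_call(b"", talking_engine)
no_words = transcribe_call(b"\x00\x00", silent_engine)
spoken = transcribe_call(b"\x01\x02", talking_engine)
```

Both the no-audio case and the no-words case yield the same blank result, so downstream consumers only need to handle one shape.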
Glossary
Diarisation is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity. It enhances the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity. It is used to answer the question "who spoke when?".
Speechmatics - 3rd party transcription vendor
- Simon Jolly to review initial requirements
- Henry (PO) to review and sign off on requirements.
- QA (Simon Parr) to review and sign off on requirements.
- Team Lead (Jo) to review and sign off on requirements.
Signed off during review session 27/03 by Simon Jolly, Henry, Simon Parr and Jodie Brunson.