Functional Requirements - Transcription

Feature description 

Transcription is a new feature that converts voice audio conversations into electronic text transcripts (providing a structured data set of the captured voice audio conversation).

  • Transcription is performed automatically once the voice conversation has ended, and transcripts of the conversation are made available via the API in JSON format.
  • Transcription will be provided in multiple languages (72 different languages and scripts, depending on the set-up within the configuration).
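As an illustration only, a transcript delivered in JSON might be shaped as in the sketch below. The field names (`call_id`, `language`, `segments`, `speaker`, `start`, `end`, `text`) are assumptions made for this example; the actual schema is not defined in this document.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Segment:
    speaker: str   # hypothetical participant label, e.g. "agent"
    start: float   # seconds from the start of the call
    end: float
    text: str


@dataclass
class Transcript:
    call_id: str   # hypothetical identifier of the recorded call
    language: str  # language code the transcript was produced in
    segments: List[Segment] = field(default_factory=list)

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-ASCII characters as-is in the output
        return json.dumps(asdict(self), ensure_ascii=False)


transcript = Transcript(
    call_id="call-001",
    language="fr",
    segments=[Segment("agent", 0.0, 1.4, "Bonjour, ça va ?")],
)
payload = transcript.to_json()
```

Serialising with `ensure_ascii=False` is one way to keep accented and non-Latin text readable in the JSON output; escaping it would also be valid JSON.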


Business Case summary

The transcription feature is essential for data analysis and management: with an accurate transcription, audio conversation data becomes easily searchable for auditing and compliance purposes. We need a 3rd-party vendor that offers transcription services and can integrate with our voice recording systems.

  • Speechmatics: 3rd-party transcription vendor


Personas/Market segment

Applicable to all Market segments of Red Box.



System Prerequisites (Functional & Non-functional requirements)

  • Transcriptions MUST NOT be lost or erased if a recoverable error is encountered.
  • System MUST NOT fail or return an error for silent calls (e.g. instances where no audio is recorded). In such instances a blank transcript will be generated by the transcription engine.
  • System MUST provide audit logs for failed transcriptions.
  • An administrator MUST be able to enable or disable transcription.
  • Transcription service MUST support multiple Speechmatics instances.
  • Max number of concurrent jobs MUST be configurable per Speechmatics server.
  • System MUST send the split audio of each participant in the conversation to the Speechmatics service separately.
  • System MUST have diarisation capabilities, as supported by Speechmatics. Diarisation can be enabled within the configuration setup; if enabled, the system will perform the diarisation process for each call.
  • Diarisation MUST be performed based on the configuration within the transcription service, irrespective of whether the audio stream is mono or stereo.
  • The output of the transcription service should be multiple transcriptions, based on the languages set up within the configuration of the transcription feature.
  • The transcription feature SHOULD support 72 different languages and scripts.
  • The agent MUST be able to export the transcriptions via email and HTTP export.
  • Transcription of the call MUST be available via the metadata API.
  • Users who have access to a call recording will also have access to the transcript of that call.
  • System MUST support non-ASCII characters.
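The configuration-related prerequisites above (enable/disable, multiple engine instances, a per-server concurrency limit, per-participant audio and multiple languages) can be sketched as follows. This is a minimal sketch under stated assumptions: all class, field and function names are hypothetical, no real Speechmatics API is used, and "one job per participant stream per configured language" is an interpretation of the requirements rather than a confirmed design.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EngineInstance:
    name: str                 # hypothetical label for one transcription server
    max_concurrent_jobs: int  # configurable limit per server
    active_jobs: int = 0

    def has_capacity(self) -> bool:
        return self.active_jobs < self.max_concurrent_jobs


@dataclass
class TranscriptionConfig:
    enabled: bool             # toggled by an administrator
    diarisation: bool
    languages: List[str]
    instances: List[EngineInstance]


def pick_instance(config: TranscriptionConfig) -> Optional[EngineInstance]:
    """Return the least-loaded instance with spare capacity, or None."""
    if not config.enabled:
        return None
    candidates = [i for i in config.instances if i.has_capacity()]
    if not candidates:
        return None  # caller should queue the job rather than drop it
    return min(candidates, key=lambda i: i.active_jobs)


def plan_jobs(config: TranscriptionConfig,
              participant_audio: List[str]) -> List[Tuple[str, str]]:
    """Assumed policy: one job per participant stream per configured language."""
    return [(audio, lang)
            for audio in participant_audio
            for lang in config.languages]


config = TranscriptionConfig(
    enabled=True,
    diarisation=False,
    languages=["en", "fr"],
    instances=[EngineInstance("server-a", 1, active_jobs=1),
               EngineInstance("server-b", 2)],
)
jobs = plan_jobs(config, ["agent.wav", "customer.wav"])
target = pick_instance(config)
```

Returning `None` when every server is at its limit leaves the decision to queue or retry with the caller, which fits the requirement that transcriptions are not lost on recoverable errors.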

Use Case title: Transcribe a call (TAC)

Description: A voice audio conversation between two participants in a call centre has ended and has been captured/recorded by the Red Box system. The transcription feature should provide a transcript of the call.

Actors

1st participant (Call Agent and transcription user): Any individual speaking to a customer, with access to the transcription service.

2nd participant (Customer): Any individual speaking to a business.

System: Triggers transcription after the audio conversation has ended.

Administrator: Any individual with permission to configure transcription.


Expected Behaviour 

If no audio is recorded 

System MUST NOT fail or return an error for silent calls (instances where no audio is recorded). In such instances a blank transcript MUST be generated.

If the transcription engine does not spot any words, it will return a blank transcription; in that case the system should not fail or return any errors.
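The silent-call behaviour described above can be illustrated with a small sketch in which an empty engine result is treated as a successful, blank transcript and not as an error. The function name and the returned fields (`call_id`, `text`, `is_blank`) are hypothetical.

```python
from typing import Dict, List, Optional


def build_transcript(call_id: str,
                     words: Optional[List[str]]) -> Dict[str, object]:
    """Build a transcript record from the words the engine recognised."""
    # None or an empty list means "no words spotted": a valid outcome,
    # so no exception is raised and a blank transcript is produced.
    spotted = list(words or [])
    return {
        "call_id": call_id,
        "text": " ".join(spotted),
        "is_blank": not spotted,
    }


silent = build_transcript("call-002", [])
spoken = build_transcript("call-003", ["hello", "there"])
```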











Glossary

Diarisation is the process of partitioning an input audio stream into homogeneous segments according to speaker identity. It enhances the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity. It is used to answer the question "who spoke when?".

Speechmatics: 3rd-party transcription vendor
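To make "speaker turns" concrete, the sketch below collapses a sequence of speaker-labelled words into turns. It assumes the engine has already attached a speaker label to each word; the labels `S1` and `S2` and the function name are hypothetical, and this shows only the grouping step, not how speakers are detected.

```python
from itertools import groupby
from typing import List, Tuple


def speaker_turns(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse consecutive (speaker, word) pairs into (speaker, utterance) turns."""
    return [
        (speaker, " ".join(word for _, word in group))
        for speaker, group in groupby(words, key=lambda pair: pair[0])
    ]


labelled = [
    ("S1", "Hello"), ("S1", "there"),
    ("S2", "Hi"),
    ("S1", "How"), ("S1", "are"), ("S1", "you"),
]
turns = speaker_turns(labelled)
# turns == [("S1", "Hello there"), ("S2", "Hi"), ("S1", "How are you")]
```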


  • Simon Jolly to review initial requirements
  •  (PO) to review and sign-off on requirements.
  • QA (Simon Parr) to review and sign-off on requirements.
  • Team Lead (Jo) to review and sign-off on requirements


Signed off during review session 27/03 by Simon Jolly, henry, Simon Parr and Jodie Brunson.


