Description:

Extended Architecture’s (EA) Process Manager relies on Redis to process workflows. As a result, when Redis fails or becomes temporarily unavailable, workflow processing is disrupted.

The purpose of this feature is to improve EA’s resilience by introducing a failover mechanism that ensures workflows continue to be processed in the event of a Redis failure.

Business Case

This feature will improve the reliability of EA by ensuring that a Redis failure no longer disrupts workflow processing.

Personas affected

Hannah James - Senior Operations Analyst
John Williams - Head of Digital Transformation / CTO / Strategy

User Stories

User Story 1: As a voice recording system administrator, I would like call workflows to continue to be processed if Redis fails, so that the reliability of the system is unaffected by a Redis failure.

Acceptance Criteria:

Given that EA is running, when there is a Redis failure, then the system should write an error to the logs.
Given that EA is running, when there is a Redis failure, then the system should switch to Event Store as the source for processing call workflows.
Given that EA is running, when there is a Redis failure and the system switches to Event Store as the source for processing call workflows, then all call workflows should be processed without interruption.

User Story 2: As a voice recording system administrator, if call workflows are being processed from Event Store due to a Redis failure, I would like the system to revert to Redis once it is available again, so that the system is restored to its default state for call workflow processing.

Acceptance Criteria:

Given that EA is currently processing call workflows from Event Store, when Redis becomes available, then the system should revert to using Redis to process all call workflows.

Functionality

Use Case Title:	Failover to Event Store
Description (GOAL)	If Redis fails, the system should fail over to Event Store
Trigger	Whichever happens sooner: Redis is unavailable for 10 seconds, or 3 successive queries to Redis are unsuccessful
Primary Actors (Personas)	System
Secondary Actors	NA
Stakeholders	Hannah James, John Williams
Preconditions	EA system is running; Event Store is available
Flow (Main success Scenario)	1. System receives notification of Redis failure 2. System writes error to logs 3. System fails over to Event Store 4. System processes workflows from Event Store
Alternative flows	None
Post-conditions	*Success End condition:* Workflows are processed from Event Store. *Failure End condition:* Failure condition is logged.
Frequency	NA
Priority	Must
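
The trigger for this use case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EA's implementation: the class name `FailoverTrigger` and its methods are hypothetical, and "unavailable for 10 seconds" is interpreted here as 10 seconds having elapsed since the first unsuccessful query in the current run of failures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverTrigger:
    """Hypothetical helper: decides when the failover trigger fires."""

    max_failures: int = 3              # successive unsuccessful Redis queries
    max_outage_seconds: float = 10.0   # continuous Redis unavailability
    consecutive_failures: int = 0
    outage_started_at: Optional[float] = None

    def record_success(self) -> None:
        # Any successful query ends the current run of failures.
        self.consecutive_failures = 0
        self.outage_started_at = None

    def record_failure(self, now: float) -> None:
        # Assumption: the outage clock starts at the first failed query.
        self.consecutive_failures += 1
        if self.outage_started_at is None:
            self.outage_started_at = now

    def should_fail_over(self, now: float) -> bool:
        # Whichever condition is met first triggers the failover.
        if self.consecutive_failures >= self.max_failures:
            return True
        if self.outage_started_at is None:
            return False
        return now - self.outage_started_at >= self.max_outage_seconds
```

Passing the current time in as `now` (for example from `time.monotonic()`) keeps the rule easy to exercise in automated tests without waiting for real time to pass.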

Use Case Title:	Revert to Redis
Description (GOAL)	Revert to Redis as the primary data source for workflow processing
Trigger	System notification that Redis is available
Primary Actors (Personas)	System
Secondary Actors	NA
Stakeholders	Hannah James, John Williams
Preconditions	EA system is running; Redis failure has occurred
Flow (Main success Scenario)	1. System receives notification that Redis is available 2. System switches to Redis
Alternative flows	None
Post-conditions	*Success End condition:* Workflows are processed from Redis. *Failure End condition:* Error condition is logged; system continues processing workflows from Event Store.
Frequency	NA
Priority	Must
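
The two use cases together describe a small state machine with Redis as the default source and Event Store as the fallback. The sketch below is illustrative only: `WorkflowSourceSelector`, `Source`, the logger name, and the `redis_is_healthy` callback are all hypothetical names, and confirming Redis health before switching back is an assumption that goes beyond the flow stated above.

```python
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("ea.process_manager")  # hypothetical logger name


class Source(Enum):
    REDIS = "redis"
    EVENT_STORE = "event_store"


class WorkflowSourceSelector:
    """Hypothetical helper: tracks which source workflows are processed from."""

    def __init__(self, redis_is_healthy: Callable[[], bool]) -> None:
        self._redis_is_healthy = redis_is_healthy
        self.active = Source.REDIS  # Redis is the default source

    def on_redis_failure(self) -> None:
        # Failover use case: log the error, then switch to Event Store.
        logger.error("Redis failure detected; failing over to Event Store")
        self.active = Source.EVENT_STORE

    def on_redis_available(self) -> Source:
        # Revert use case. Assumption: re-check Redis before switching back.
        try:
            confirmed = self._redis_is_healthy()
        except Exception:
            confirmed = False
        if confirmed:
            self.active = Source.REDIS
        else:
            # Failure end condition: log and stay on Event Store.
            logger.error("Revert to Redis failed; continuing on Event Store")
        return self.active
```

If the revert cannot be confirmed, the selector stays on Event Store, which matches the failure end condition of this use case.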

Non-Functional Requirements

Ref	Area	MoSCoW	Requirement	Comments
1	Error-handling	M	*Ease with which the system can degrade gracefully if errors occur, e.g. does the entire system go down and lose data if the internet goes down?*	Capture errors in logs
2	Legal and Regulatory		*specific legal and regulatory requirements associated with the feature*	NA
3	Licensing		*new/amended licensing requirements associated with the feature or with introduced 3rd-party components*	NA
4	Localizability		*need to include localised features, e.g. currency; date formats*	NA
5	Performance	M	*ability to meet specific performance standards/requirements*	Failover timeout for Redis should be 10 seconds or 3 query retries (whichever happens first)
6	Concurrency		*Specific concurrency requirements*	NA
7	Resilience	M	*ability to handle failure of an individual component within the system*	Failure of Redis should not affect the operation of the system
8	Scalability		*requirements to support increasing numbers of users/concurrency without incurring significant cost*	NA
9	Security	M	*adherence to defined/specified customer/industry security standards*	All connections to Redis and Event Store must remain protected
10	Storage		*Specific storage requirements/considerations*	NA
11	Supportability	S	*ease with which Support could/need to access logs etc. to diagnose a problem*	Service configuration should support setting the number of retry attempts and the time interval for status checks
12	Test requirements		*ease with which the functionality could/should be supported by automated testing*	NA
13	Training		*specific training/installation/configuration documentation associated with this feature that needs to be created/updated*	NA
14	User Experience		*specific user experience requirements that would ensure the functionality is acceptable to customers, e.g. can complete action within x clicks*	NA
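
The Supportability requirement (configurable retry attempts and status-check interval) could be met with a small settings object such as the one sketched below. Everything here is an assumption made for illustration: the class name `FailoverSettings`, the configuration key names, and the 1-second default status-check interval do not come from this specification. Only the defaults of 3 retries and 10 seconds are taken from the Performance requirement.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FailoverSettings:
    """Hypothetical service configuration for the Redis failover feature."""

    retry_attempts: int = 3                     # from the Performance requirement
    failover_timeout_seconds: float = 10.0      # from the Performance requirement
    status_check_interval_seconds: float = 1.0  # assumed default, not in the spec

    def __post_init__(self) -> None:
        # Reject values that would disable or break the failover rule.
        if self.retry_attempts < 1:
            raise ValueError("retry_attempts must be at least 1")
        if self.failover_timeout_seconds <= 0:
            raise ValueError("failover_timeout_seconds must be positive")
        if self.status_check_interval_seconds <= 0:
            raise ValueError("status_check_interval_seconds must be positive")

    @classmethod
    def from_mapping(cls, raw: Mapping[str, str]) -> "FailoverSettings":
        # Key names are illustrative; missing keys fall back to the defaults.
        defaults = cls()
        return cls(
            retry_attempts=int(
                raw.get("REDIS_RETRY_ATTEMPTS", defaults.retry_attempts)),
            failover_timeout_seconds=float(
                raw.get("REDIS_FAILOVER_TIMEOUT_SECONDS",
                        defaults.failover_timeout_seconds)),
            status_check_interval_seconds=float(
                raw.get("REDIS_STATUS_CHECK_INTERVAL_SECONDS",
                        defaults.status_check_interval_seconds)),
        )
```

For example, `FailoverSettings.from_mapping(os.environ)` would read overrides from environment variables, while `FailoverSettings.from_mapping({})` yields the defaults.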

Simon Jolly (Technical Architect): reviewed and signed off
Sergey Shafiev (Team Lead): to review and sign off (signed off by Kirill Zotkin on behalf of Sergey)
Vikash Mahabir (QA Manager): to review and sign off