Platform Integrity
This page lists functionality that is at risk of slipping through the net, or behaviour that has no impact on the actual user experience but addresses other concerns. The items are in no order of priority.
- Media file chunk cleanup - (I was apprised of the situation: the filecomposer does try to delete the chunks, but this should be removed)
- Identity Services Token Revocation.
- User is disabled
- User attribute is changed (e.g. external group membership)
- Long running call management
- Deal with calls with inconsistencies between end-of-call media and callend metadata (e.g. there's one but not the other after a certain amount of time)
- Detecting dead calls but ensuring valid ones are left alone - dependent on integration behaviour for long running silent calls e.g. police radio.
- Retry mechanism for eventstore events that can't be processed, plus a dead event stream to send those events to, then plug this in to health monitoring - user story https://redboxdev.visualstudio.com/Nubis/_workitems/edit/2292
- Example: Deal with files that have failed import. (Configurable Import Engine)
- Making the fields being output by the collector more type specific (with a descriptor field in the message) - https://redboxdev.visualstudio.com/Nubis/_workitems/edit/2295
- Recovery for elasticsearch - dependent on the scenario being realistic - if we lost elasticsearch, allow for recovery by rebuilding the indices from the eventstore (this may also be relevant to upgrades).
- Using the token manager so testing can go via the API Gateway (this would also support the CLI, and using the API or CLI to seed the datastores).
- Identity Services: doing a UNION of roles mapped from external group membership AND explicitly defined roles for that user - this is subject to "do we need it", or can we stick with one or the other.
- Configuration Bootstrapping Service
- It appears some API microservices might be missing an OpenAPI (e.g. Swagger v2) endpoint - this also needs to be surfaced via the API Gateway, with work done to aggregate and then cache the swagger definitions for all the APIs
- Move InProgress Call Store to Redis
- Microservice resilience - what if a dependency goes down? e.g. try to restore the connection to the Event Store if it goes down, and if that fails after a certain amount of time or number of retries, stop the microservice
- The UI app needs to be split into unprotected parts to handle authentication and then the rest of the content should be protected (e.g. token based authentication OR cookie based authentication derived from a token)
- Proper timer microservice implementation that fires a specified event payload and event type to a specified stream after the specified delay - this to support polling mechanisms, delayed processing, retries
- How to cope with replacing an in-progress call from one collector with the duplicate being collected by another during the call e.g. CallId1234124142 is now CallId2362612613246
- "Catch-up" of call messages after an outage - how to maintain the stream of current points in the live calls, and keep new calls from being queued/backed up, whilst at the same time "catching up" on the backlog of unsent messages
- True idempotency - most of the message/event flows are not idempotent, but some should be assessed for the chance of multiple occurrences of the same message/event/command (this is a strongly recommended facet of a distributed system that exhibits eventual consistency, but the requirement has not been enforced, to avoid overloading the delivery process as it takes shape).
- Need to think about "Protect the Product", esp. in relation to licensing (e.g. preventing tampering with key microservices and events).
- Versioning of services and the underlying impact/changes e.g. API signatures, domain data, event payloads, message payloads
- ... more to follow
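As a rough illustration of the retry and dead event stream item above, here is a minimal Python sketch. It assumes an in-memory list standing in for the dead event stream. `Event`, `DeadEvent`, `RetryingProcessor`, `max_attempts` and `health()` are hypothetical names used for illustration only and are not part of the eventstore client or any existing service. Backoff between attempts is left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    event_id: str
    event_type: str
    payload: dict


@dataclass
class DeadEvent:
    # An event that exhausted its retries, kept with the reason it failed.
    event: Event
    error: str
    attempts: int


@dataclass
class RetryingProcessor:
    # handler is whatever processes a single event; it signals failure by raising.
    handler: Callable[[Event], None]
    max_attempts: int = 3
    # Stand-in for the dead event stream (assumption: a real one would be a stream).
    dead_events: List[DeadEvent] = field(default_factory=list)

    def process(self, event: Event) -> bool:
        """Try the handler up to max_attempts times; dead-letter the event on failure."""
        last_error = ""
        for _ in range(self.max_attempts):
            try:
                self.handler(event)
                return True
            except Exception as exc:  # broad on purpose: any failure counts as a retry
                last_error = repr(exc)
        self.dead_events.append(DeadEvent(event, last_error, self.max_attempts))
        return False

    def health(self) -> dict:
        """Summary that a health monitoring endpoint could expose."""
        count = len(self.dead_events)
        return {"dead_event_count": count, "healthy": count == 0}
```

A handler that fails transiently is retried in place, while one that keeps failing ends up in `dead_events`, so `health()` reports unhealthy until someone deals with the dead events. Whether any dead event should mark the service unhealthy, or only a count above some threshold, is a design choice left open here.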
References to Platform Integrity Issues