Platform Integrity

This page lists functionality that is at risk of slipping through the net, or behaviour that has no direct impact on the user experience but addresses other concerns. The items are in no particular order of priority.

  1. Media file chunk cleanup. (I have since been apprised of the situation: the filecomposer does try to delete the chunks, but this should be removed.)
  2. Identity Services token revocation when:
    1. the user is disabled
    2. a user attribute is changed (e.g. external group membership)
  3. Long-running call management
  4. Deal with calls that have inconsistencies between the end-of-call media and the callend metadata (e.g. one has arrived but not the other after a certain amount of time).
  5. Detecting dead calls while ensuring valid ones are left alone. This depends on integration behaviour for long-running silent calls, e.g. police radio.
  6. Retry mechanism for EventStore events that can't be processed, plus a dead event stream to send them to, which is then plugged into health monitoring. User story: https://redboxdev.visualstudio.com/Nubis/_workitems/edit/2292
    1. Example: dealing with files that have failed import (Configurable Import Engine).
  7. Making the fields output by the collector more type-specific (with a descriptor field in the message): https://redboxdev.visualstudio.com/Nubis/_workitems/edit/2295
  8. Recovery for Elasticsearch, dependent on the scenario being realistic: if we lost Elasticsearch, we would need to provide for recovery by rebuilding the indices from the EventStore (this may also be relevant to upgrades).
  9. Using the token manager so that testing can go via the API Gateway (this would also support the CLI, and using the API or CLI to seed the datastores).
  10. Identity Services: taking the UNION of roles mapped from external group membership AND roles explicitly defined for that user. This is subject to "do we need it?", or whether we can stick with one or the other.
  11. Configuration Bootstrapping Service
  12. Some API microservices appear to be missing an OpenAPI (e.g. Swagger v2) endpoint. These endpoints also need to be surfaced via the API Gateway, with work done to aggregate and then cache the Swagger definitions for all the APIs.
  13. Move InProgress Call Store to Redis
  14. Microservice resilience: what if a dependency goes down? E.g. if EventStore goes down, try to restore the connection, and if that still fails after a certain amount of time or number of retries, stop the microservice.
  15. The UI app needs to be split into an unprotected part that handles authentication, with the rest of the content protected (e.g. token-based authentication OR cookie-based authentication derived from a token).
  16. A proper timer microservice implementation that fires a specified event payload and event type to a specified stream after a specified delay, to support polling mechanisms, delayed processing and retries.
  17. How to cope with replacing an in-progress call from one collector with the duplicate being collected by another collector during the call, e.g. CallId1234124142 is now CallId2362612613246.
  18. "Catch-up" of call messages after an outage: how to maintain the stream of current points in the live calls, and keep new calls from being queued/backed up, whilst at the same time "catching up" on the backlog of unsent messages.
  19. True idempotency: most of the message/event flows are not idempotent, but some should be assessed for the chance of multiple occurrences of the same message/event/command. (Idempotency is a strongly recommended facet of a distributed system that exhibits eventual consistency, but the requirement has not been enforced, to avoid overloading the delivery process while it is being formulated.)
  20. Need to think about Protect the Product, especially in relation to licensing (e.g. preventing tampering with key microservices and events).
  21. Versioning of services and the underlying impact/changes, e.g. API signatures, domain data, event payloads, message payloads.
  22. ... more to follow
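For item 6 (retry mechanism plus dead event stream), a minimal Python sketch of the retry-then-park idea might look like the following. It assumes an in-memory list standing in for a real dead event stream, and the names Event, RetryingProcessor and health() are hypothetical, not part of the platform. Delays between attempts are deliberately left out; they could be supplied by something like the timer microservice in item 16.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    event_id: str
    event_type: str
    data: dict


@dataclass
class RetryingProcessor:
    # Hypothetical wrapper; dead_events stands in for a real dead event stream.
    handler: Callable[[Event], None]
    max_attempts: int = 3
    dead_events: list = field(default_factory=list)

    def process(self, event: Event) -> bool:
        """Try the handler up to max_attempts times; park the event if every attempt fails."""
        last_error = None
        for _ in range(self.max_attempts):
            try:
                self.handler(event)
                return True
            except Exception as exc:  # a real service would separate transient from permanent errors
                last_error = exc
        self.dead_events.append(
            {"event": event, "attempts": self.max_attempts, "error": repr(last_error)}
        )
        return False

    def health(self) -> dict:
        """Summary that a health-monitoring endpoint could expose."""
        return {"healthy": not self.dead_events, "dead_event_count": len(self.dead_events)}
```

A file that fails import on every attempt (the Configurable Import Engine example) would end up in dead_events, and a non-zero dead_event_count is what health monitoring would alert on.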
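Item 10 (a UNION of group-mapped and explicitly defined roles) comes down to a set union. The sketch below assumes that group-to-role mappings and a user's explicit roles are available as plain collections; the function name effective_roles and its parameters are hypothetical.

```python
from typing import Iterable, Mapping


def effective_roles(
    external_groups: Iterable[str],
    group_to_roles: Mapping[str, Iterable[str]],
    explicit_roles: Iterable[str],
) -> frozenset:
    """Union of roles mapped from external group membership and roles assigned directly to the user."""
    # Groups with no mapping contribute no roles.
    mapped = {role for group in external_groups for role in group_to_roles.get(group, ())}
    return frozenset(mapped) | frozenset(explicit_roles)
```

Because the result depends on external group membership, a change to that membership is exactly the kind of attribute change in item 2 that would call for token revocation.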
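Item 14 (restore the EventStore connection, and stop the microservice if that keeps failing) could be approached with a bounded retry loop like the Python sketch below. It assumes the client's connect call raises the built-in ConnectionError on failure, which a real EventStore client may not do. connect_with_retries, DependencyUnavailable and the default limits are all hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DependencyUnavailable(Exception):
    """Raised when a dependency is still unreachable once the retry budget is spent."""


def connect_with_retries(
    connect: Callable[[], T],
    max_retries: int = 5,
    max_elapsed_seconds: float = 30.0,
    base_delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call connect() until it succeeds, backing off exponentially between attempts.

    Gives up (raising DependencyUnavailable) once max_retries retries have failed
    or max_elapsed_seconds have passed, whichever comes first.
    """
    start = clock()
    retries = 0
    while True:
        try:
            return connect()
        except ConnectionError as exc:
            elapsed = clock() - start
            if retries >= max_retries or elapsed >= max_elapsed_seconds:
                raise DependencyUnavailable(
                    f"gave up after {retries + 1} attempts and {elapsed:.1f}s"
                ) from exc
            # Exponential backoff, capped so we never sleep past the overall deadline.
            sleep(min(base_delay_seconds * 2 ** retries, max_elapsed_seconds - elapsed))
            retries += 1
```

The microservice's startup code would then catch DependencyUnavailable and exit with a non-zero status (e.g. sys.exit(1)), leaving whatever supervises the service to restart or flag it. That supervising layer is an assumption; this page does not specify one.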
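The core of item 16 (a timer microservice that fires a given event type and payload to a given stream after a delay) can be shown as a priority queue ordered by due time. This Python sketch is in-memory only, so pending timers would be lost on restart. The append_to_stream callback stands in for a real EventStore write, and TimerService, schedule() and fire_due() are hypothetical names.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class _PendingTimer:
    due_at: float
    seq: int  # tie-breaker so timers due at the same instant fire in scheduling order
    stream: str = field(compare=False)
    event_type: str = field(compare=False)
    payload: dict = field(compare=False)


class TimerService:
    """In-memory sketch: appends an event to a stream once its delay has elapsed."""

    def __init__(self, append_to_stream: Callable[[str, str, dict], None], clock: Callable[[], float]):
        self._append = append_to_stream  # hypothetical writer, e.g. wrapping an EventStore client
        self._clock = clock
        self._pending: List[_PendingTimer] = []
        self._seq = itertools.count()

    def schedule(self, stream: str, event_type: str, payload: dict, delay_seconds: float) -> None:
        due_at = self._clock() + delay_seconds
        heapq.heappush(self._pending, _PendingTimer(due_at, next(self._seq), stream, event_type, payload))

    def fire_due(self) -> int:
        """Append every timer whose due time has passed; return how many fired."""
        now = self._clock()
        fired = 0
        while self._pending and self._pending[0].due_at <= now:
            timer = heapq.heappop(self._pending)
            self._append(timer.stream, timer.event_type, timer.payload)
            fired += 1
        return fired
```

A running service would call fire_due() on a regular tick. To be a "proper" implementation, it would also need to persist pending timers, for example in the event store itself, so that they survive a restart.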
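For the flows assessed under item 19, one common approach is an idempotent receiver that deduplicates by message ID. This Python sketch assumes each message/event/command carries a unique ID under a key named message_id, which is a hypothetical field. It keeps processed IDs in memory, whereas a real store would need to be durable and shared between instances.

```python
from typing import Callable, Hashable, Set


class IdempotentHandler:
    """Wraps a handler so a message with an already-seen ID is skipped rather than reprocessed."""

    def __init__(self, handle: Callable[[dict], None], id_key: str = "message_id"):
        self._handle = handle
        self._id_key = id_key  # hypothetical field carrying a unique message/event/command ID
        self._processed: Set[Hashable] = set()  # in-memory only; not durable across restarts

    def __call__(self, message: dict) -> bool:
        message_id = message[self._id_key]
        if message_id in self._processed:
            return False  # duplicate delivery: already handled, so do nothing
        self._handle(message)
        # Recorded only after success, so a failed attempt can be retried. A crash between
        # handling and recording still allows one duplicate, which is why the handler's own
        # writes should ideally be idempotent too.
        self._processed.add(message_id)
        return True
```

Deduplication by ID is only one option. Some handlers can instead be made naturally idempotent, for example by setting state rather than incrementing it.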


References to Platform Integrity Issues
