Reducing the operating cost of applications has long been a goal of the IT industry. Much of the time, developers are fixing bugs in enterprise systems that were not caught during the early phases of development, and maintaining a system this way costs considerable time and effort. Self Healing Service Management can put an end to this unnecessary consumption of effort and money. It enables systems or environments to detect problems and resolve them automatically. The need for human intervention can be eliminated, with responsibility vested entirely in the systems themselves. Self Healing is better than depending on humans for several reasons: humans can make errors while resolving issues, they may miss issues by mistake, and they take more time to find bugs.
Self Healing technologies are less likely to miss issues or make errors while resolving them. They are faster than humans because they react based on rules and machine learning, and they can scale well beyond what a human team can handle. Most organizations do not use fully automated services. The tools they use can only detect problems and send notifications about them. Humans have to analyze and interpret the information these tools provide, and an issue cannot be solved until a human detects it and fixes it manually. This is the major hurdle in achieving fully continuous delivery. DevOps teams already take advantage of various automated tools, and Self Healing technologies can make these processes more efficient.
After detecting a failure, a Self Healing system can react and restore itself to its designed state. If a process dies, the system itself takes corrective action and returns the process to its operational state. Yet there is no industry-defined way of achieving this state. The only direction explored so far is automation: the more processes are automated, the more work hours and resources can be invested in improving performance rather than in manually solving issues. Self Healing technologies can enable zero or near-zero downtime by providing a general solution for issues that may arise. One of the hardest tasks a developer faces is provisioning servers, which is highly error-prone when done manually. DevOps teams have to configure servers through multiple dashboards before they can provide the development, testing and production environments.
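As a minimal sketch of the detect-and-restore loop described above, the Python watchdog below polls a child process and relaunches it whenever it exits. It assumes the service can be started as a plain command; the function name `supervise` and its `max_restarts` and `poll_interval` parameters are illustrative and not part of any particular tool.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, poll_interval=0.1):
    """Run `cmd` as a child process and relaunch it whenever it exits.

    Returns the number of restarts performed. `max_restarts` bounds the
    loop so the sketch terminates; once it is reached, the watchdog stops
    and leaves the problem for a human to handle.
    """
    restarts = 0
    proc = subprocess.Popen(cmd)
    while True:
        time.sleep(poll_interval)
        exit_code = proc.poll()  # None while the child is still running
        if exit_code is None:
            continue  # healthy: nothing to do
        # Detection: the process has exited, so record the failure.
        print(f"watchdog: process exited with code {exit_code}", file=sys.stderr)
        if restarts >= max_restarts:
            return restarts  # give up and escalate
        # Correction: start the process again to restore the designed state.
        proc = subprocess.Popen(cmd)
        restarts += 1
```

For instance, supervising a command that always fails, `supervise([sys.executable, "-c", "import sys; sys.exit(1)"], max_restarts=2)`, relaunches it twice and then returns `2`. Real supervisors such as systemd or supervisord build backoff and alerting on top of this basic loop.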
Programmatically deploying and configuring the needed servers, using Kubernetes to manage the Docker containers that hold the applications, simplifies this error-prone process. Keeping backup Docker instances of a system deployed in a Kubernetes environment can be regarded as a Self Healing environment: once the live instance stops functioning properly, Kubernetes can start another instance and the system is restored. But some systems, such as API gateways, have complicated processes, and bringing up a Docker instance for them is a complex task in itself. Rather than depending on a third-party system like Kubernetes, the best possible scenario is for the system to heal itself and rejoin operations. This removes the dependency on a third-party system, and the system returns to operation in minimal time because no new instance has to be started.
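The in-place alternative can be illustrated with a small Python class that checks its own health and rebuilds its internal state when the check fails, instead of waiting for an orchestrator to start a replacement instance. This is a sketch under stated assumptions: the class name, the `state` fields, and the idea that re-running setup is enough to recover are hypothetical stand-ins for whatever a real API gateway would need to restore.

```python
class SelfHealingComponent:
    """A component that repairs its own state instead of being replaced.

    The fields in `state` are hypothetical; a real system would restore
    things like routing configuration and upstream connections.
    """

    def __init__(self):
        self.heal_count = 0
        self._initialize()

    def _initialize(self):
        # Hypothetical setup step, re-run both at start-up and when healing.
        self.state = {"config_loaded": True, "open_connections": 4}

    def is_healthy(self):
        return self.state["config_loaded"] and self.state["open_connections"] > 0

    def check_and_heal(self):
        """Run one health check; repair in place if it fails.

        Returns True if a repair was performed.
        """
        if self.is_healthy():
            return False
        self._initialize()  # restore the designed state without a new instance
        self.heal_count += 1
        return True

    def handle(self, request):
        # Check health before serving so a broken state is repaired first.
        self.check_and_heal()
        return f"handled {request}"
```

For example, setting `state["open_connections"] = 0` to simulate a fault and then calling `handle("GET /status")` performs one in-place repair before serving the request, leaving `heal_count` at 1. In practice `check_and_heal` could also be called periodically from a background monitor.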
Logging and monitoring can be considered the keys to Self Healing. Logging and monitoring tools should be chosen at the architecture design stage and integrated with the solution components. Detailed logs make it possible to find the root causes of issues and simplify the creation of response manuals. Once a solution to a specific issue has been produced, it can be reused in the future. Therefore, after a large database of logs has been gathered, machine learning algorithms can be applied so that issues are solved automatically. More sophisticated logging and monitoring tools can include more information in error reports than simply whether the system is up or down, which simplifies problem solving. As errors are detected and solved, appropriate triggers and responses can be created for each situation, so that problems are identified and solved efficiently without human intervention. Setting such triggers on the root causes of issues makes it possible to prevent problems rather than deal with their wide-reaching consequences. Constant training is the last step in Self Healing Service Management: the deployed machine learning algorithms should be retrained continuously on the logs. Over time, this leads to fewer errors that require human supervision.
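The triggers and responses described above can be sketched as a rule table that maps log patterns to remediation actions. The Python below is a simplified illustration: the log formats, rule names and responses are hypothetical, each response only returns a description of the fix rather than performing it, and fixed regular expressions stand in for the machine-learning step. Lines that contain `ERROR` but match no rule are escalated to a human, whose fix can later be added as a new rule and reused automatically.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    name: str
    pattern: re.Pattern  # trigger: what to look for in a log line
    response: Callable[[re.Match], str]  # hypothetical remediation


@dataclass
class Healer:
    rules: list
    actions_taken: list = field(default_factory=list)
    escalated: list = field(default_factory=list)

    def process(self, line):
        # Apply the first rule whose trigger matches this log line.
        for rule in self.rules:
            match = rule.pattern.search(line)
            if match:
                self.actions_taken.append((rule.name, rule.response(match)))
                return
        if "ERROR" in line:
            # No known fix yet: hand the error to a human. Their solution
            # can then be encoded as a new Rule and reused automatically.
            self.escalated.append(line)


# Hypothetical "response manual" built from previously solved issues.
RULES = [
    Rule("disk_full",
         re.compile(r"No space left on device: (\S+)"),
         lambda m: f"rotate and compress logs on {m.group(1)}"),
    Rule("conn_pool_exhausted",
         re.compile(r"connection pool exhausted \(service=(\w+)\)"),
         lambda m: f"recycle connection pool for {m.group(1)}"),
]
```

Feeding the log line `ERROR No space left on device: /var/log` to `Healer(RULES).process` records the `disk_full` action, while an unrecognised error such as a segmentation fault ends up in `escalated`. Keeping responses keyed to root causes (a full disk) rather than symptoms (failed requests) follows the article's point about prevention.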
This results in stable performance, and developers can focus more on improving the system rather than solving issues daily. With better use of Self Healing Service Management, developers can spend less time monitoring systems. They can switch freely between development environments, and the Self Healing tools can be adapted to each environment. Buffer times will decrease because the systems can fix problems before they lead to failure, and infrastructure utilization can increase because problems are fixed faster. Eventually, human intervention could be eliminated from performance management and monitoring. Making better use of Self Healing technologies is similar to developing a bulletproof system. Although the resources required to develop such a system are beyond the reach of average organizations, there is hope if the industry creates platforms similar to Kubernetes and open-sources them. The ultimate goal is to develop a system that can handle any load without failure. Even though there is no such thing as perfection, with a stronger focus on Self Healing technologies, developers can strive toward it.
Exposition Magazine Issue 15
Department of Industrial Management
University of Kelaniya