IaC and DRP Development for Legacy Architecture

Project Description

A client approached us whose key business processes depended on databases, software modules, and other core architecture components that had been out of support for several years. All servers had been configured manually, and dozens of different teams had made changes without maintaining detailed project documentation. An internal audit concluded that this technical debt had to be eliminated: a failure of these components would mean a complete shutdown of the client's business for 3 weeks or more, and the outlook for a full recovery was poor.

The client's primary need was threefold: a complete, documented description of every component of the architecture, code for deploying all nodes, and a Disaster Recovery Plan (DRP).

Key Metrics

70+ different software components, including custom developments
Less than 20% of components covered by documentation
27 code repositories, with missing source code for some critical components
3+ weeks of complete business downtime in case of failure
200+ pages of detailed project documentation
Zero cost for backup infrastructure for DRP

Project Goals

Document all components, their interconnections, and their troubleshooting procedures, covering all expected operating scenarios, including the planned future reorganization of the architecture.
Cover the entire infrastructure with code in accordance with Infrastructure-as-Code (IaC) principles.
Develop and test a DRP for the entire infrastructure, ensuring near-zero downtime in the event of a total core failure.

Key Challenges and Results

The most challenging part of this project was gathering information. To prepare, we interviewed each employee to collect as much data as possible about the systems they had personally worked with or had heard about from colleagues who had since left. We processed and systematized this data and described it in a standard format.

Another problem was that the code, configuration files, and other data for the software components were scattered across 27 different repositories. We made all of the accumulated data searchable by building an index on Sourcegraph, an open, on-premises solution. The source code was a mix of Go, Python, bash, PHP, and numerous plain-text configuration files.

Next, we began the painstaking work of separating individual components out of a highly interconnected architecture. For each component, the work was carried out in several stages:

Describing the architectural component in Ansible and Terraform.
Deploying the node in a test environment and running integration tests and equivalence tests against the manually configured original.
Fully redeploying the component using IaC.
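
The equivalence tests in the second stage can be illustrated with a minimal sketch. It assumes that facts about each node (installed packages, configuration file checksums, listening ports) have already been collected into a simple structure. The `NodeSnapshot` type, its fields, and the sample values are hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSnapshot:
    """Facts gathered from one node. The fields are illustrative assumptions."""
    packages: dict[str, str] = field(default_factory=dict)       # name -> version
    config_hashes: dict[str, str] = field(default_factory=dict)  # path -> checksum
    listening_ports: set[int] = field(default_factory=set)


def diff_snapshots(manual: NodeSnapshot, iac: NodeSnapshot) -> list[str]:
    """Return human-readable differences; an empty list means equivalent."""
    diffs: list[str] = []
    keyed = (
        ("package", manual.packages, iac.packages),
        ("config", manual.config_hashes, iac.config_hashes),
    )
    for label, old, new in keyed:
        # Walk the union of keys so items missing on either side are reported.
        for key in sorted(old.keys() | new.keys()):
            if old.get(key) != new.get(key):
                diffs.append(
                    f"{label} {key}: manual={old.get(key)!r} iac={new.get(key)!r}"
                )
    # Ports present on exactly one of the two nodes.
    for port in sorted(manual.listening_ports ^ iac.listening_ports):
        side = "manual" if port in manual.listening_ports else "iac"
        diffs.append(f"port {port}: only on {side} node")
    return diffs


# Hypothetical sample data for a manually configured node and its IaC twin.
manual_node = NodeSnapshot(
    packages={"nginx": "1.18.0", "php": "7.2"},
    config_hashes={"/etc/nginx/nginx.conf": "a1b2"},
    listening_ports={80, 8080},
)
iac_node = NodeSnapshot(
    packages={"nginx": "1.18.0", "php": "7.4"},
    config_hashes={"/etc/nginx/nginx.conf": "a1b2"},
    listening_ports={80},
)
report = diff_snapshots(manual_node, iac_node)
```

In a check like this, a component would move on to the final stage only once the report is empty or every remaining difference has been reviewed and accepted.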

Components that had compatibility issues or lacked source code were, in agreement with the client, rewritten on a more modern technology stack. Some critical components were left to operate as a "black box," but we still provided backups, automated deployment, and documentation for them.

In the final iteration of our work, we successfully tested the deployment of the client's core architecture on two independent sites. During integration testing, we successfully switched the production load to the backup, both partially and in full. The result was a DRP with simple, clear documentation, and the entire infrastructure can now be deployed in just a few steps using Terraform and Ansible.

We also modified the client's architecture to achieve graceful degradation in the event of a complete core failure. In this situation, clients experience a partial reduction in available service functionality, but overall operability is maintained. Meanwhile, the backup infrastructure is deployed on the backup sites within 20 hours. Because this infrastructure is deployed from scratch only in the event of a total failure, it costs nothing to maintain in normal operation.
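
The source does not describe how graceful degradation was implemented. As a rough illustration of the general idea only, the sketch below falls back to a reduced, read-only response when a core-backed call fails. All names here (`CoreUnavailable`, `degrade_gracefully`, `read_from_core`, `read_from_cache`) and the sample data are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CoreUnavailable(Exception):
    """Raised by a core-backed call when the core cannot be reached."""


def degrade_gracefully(
    primary: Callable[[], T], fallback: Callable[[], T]
) -> tuple[T, bool]:
    """Try the core-backed call; on core failure, serve the reduced fallback.

    Returns the result and a flag telling the caller whether it is degraded.
    """
    try:
        return primary(), False
    except CoreUnavailable:
        return fallback(), True


def read_from_core() -> dict:
    # Simulates a total core failure for the purposes of this example.
    raise CoreUnavailable("core database is down")


def read_from_cache() -> dict:
    # Reduced functionality: the catalog is still readable, ordering is off.
    return {"catalog": ["item-1", "item-2"], "ordering_enabled": False}


response, degraded = degrade_gracefully(read_from_core, read_from_cache)
```

The `degraded` flag lets the caller tell users that some functionality is temporarily unavailable while the backup infrastructure is being deployed.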
