IaC Development for Video Hosting

Project Description

The client provides high-load video hosting services. The entire infrastructure runs on bare-metal servers in Europe and Canada, with UCDN as a backup CDN that absorbs load spikes in emergencies. The content is divided into several independent blocks, each served by its own group of servers. Most nodes had been configured manually and were covered only by basic monitoring.

The main request from the client was to increase manageability, accelerate the addition of new nodes, enhance monitoring transparency, and reduce the frequency of emergencies.

Key Metrics

60+ manually configured servers
3+ petabytes of monthly traffic for each server group
400+ domain names associated with various projects
97.2% availability, below the client's target service level

Project Goals

To bring the entire infrastructure under code, following Infrastructure-as-Code principles.
To reduce the deployment time of a new node from 5 working days to 1 hour of working time.
To increase infrastructure availability from 97.2% to 99.9% and above.
To develop a methodology for predicting the dynamics of reserve capacity and the need for storage expansion.
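
To put the availability goal in concrete terms, the short Python sketch below converts an availability percentage into a monthly downtime budget. It is an illustration only and assumes a 30-day month; the function name is hypothetical.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month (720 hours)


def downtime_hours(availability_pct: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Return the hours of unavailability implied by an availability percentage."""
    return period_hours * (1 - availability_pct / 100)


before = downtime_hours(97.2)
after = downtime_hours(99.9)

print(f"97.2%: {before:.2f} h/month")                        # 97.2%: 20.16 h/month
print(f"99.9%: {after:.2f} h/month ({after * 60:.1f} min)")  # 99.9%: 0.72 h/month (43.2 min)
```

Under that assumption, moving from 97.2% to 99.9% shrinks the downtime budget from roughly 20 hours to about 43 minutes per month.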

Key Challenges and Results

We brought the entire infrastructure under code:

All server configuration is now deployed automatically using the Ansible roles we developed.
Monitoring is built on Datadog. All necessary checks, including availability checks for each of the hundreds of domains, are deployed with Ansible. The Terragrunt/Terraform templates we developed automatically update all diagnostic dashboards whenever the server composition changes, using the Ansible configuration as the single source of truth.
We developed Terraform modules that manage the client's CDN and switch automatically to the backup CDN in emergencies.
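
The single-source-of-truth idea can be sketched in Python as follows. This is not the project's actual tooling. It assumes a hypothetical inventory structure and shows how one description of server groups could yield both the per-domain availability checks and the dashboard inputs. All group, host, and domain names, as well as the function names, are made up for illustration.

```python
import json

# Hypothetical inventory: group, host, and domain names are illustrative only.
INVENTORY = {
    "group_a": {
        "hosts": ["node-01", "node-02"],
        "domains": ["a.example.com", "b.example.com"],
    },
    "group_b": {
        "hosts": ["node-03"],
        "domains": ["c.example.com"],
    },
}


def build_domain_checks(inventory: dict) -> list:
    """Produce one HTTPS availability check definition per domain."""
    checks = []
    for group, data in sorted(inventory.items()):
        for domain in sorted(data["domains"]):
            checks.append(
                {
                    "name": f"https-{domain}",
                    "url": f"https://{domain}/",
                    "tags": [f"server_group:{group}"],
                }
            )
    return checks


def build_dashboard_vars(inventory: dict) -> dict:
    """Map each server group to its hosts, as input for dashboard templates."""
    return {group: sorted(data["hosts"]) for group, data in sorted(inventory.items())}


if __name__ == "__main__":
    payload = {
        "domain_checks": build_domain_checks(INVENTORY),
        "dashboard_groups": build_dashboard_vars(INVENTORY),
    }
    # The JSON could be saved as a variables file for Terraform to read.
    print(json.dumps(payload, indent=2))
```

Because both outputs derive from the same inventory, adding a node or a domain in one place would update the checks and the dashboards together.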

As a result, adding a new node to a server group now takes 45-70 minutes on average, which meets the client's requirements.

By changing the infrastructure and diversifying across providers and regions, we raised availability to the required 99.9% without increasing the unit cost of data storage and delivery.

We also conducted a large-scale study on test nodes, simulating emergencies such as bot attacks and DDoS, along with other failure types. Based on it, we built normalized dashboards that show a single integral measure of each group's reserve resources. Our methodology made it possible to plan cluster expansion precisely, with minimal overspending on reserve capacity. This reduced spending on reserves by 38.4% while increasing the overall reliability of the system.
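
The text does not detail how the integral reserve measure is calculated, so the Python sketch below shows only one plausible approach, not the methodology used in the project. It assumes that a group's reserve is set by its scarcest resource and that storage grows linearly. All names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    used: float      # current usage, in the resource's own unit
    capacity: float  # maximum usable amount, in the same unit


def headroom(resource: Resource) -> float:
    """Normalized free share of a resource, from 0.0 (exhausted) to 1.0 (idle)."""
    return max(0.0, 1.0 - resource.used / resource.capacity)


def bottleneck(resources: list) -> Resource:
    """Assumption: the group's reserve is limited by its scarcest resource."""
    return min(resources, key=headroom)


def days_until_full(resource: Resource, growth_per_day: float) -> float:
    """Linear forecast of the days left before the resource is exhausted."""
    if growth_per_day <= 0:
        return float("inf")
    return (resource.capacity - resource.used) / growth_per_day


# Hypothetical figures for one server group.
group = [
    Resource("bandwidth_gbps", used=6.0, capacity=10.0),
    Resource("storage_tb", used=70.0, capacity=100.0),
    Resource("cpu_share", used=0.5, capacity=1.0),
]

limit = bottleneck(group)
print(f"Group reserve: {headroom(limit):.0%} (limited by {limit.name})")
print(f"Storage full in {days_until_full(group[1], growth_per_day=0.5):.0f} days")
```

Normalizing every resource to the same 0-1 scale is what lets groups with different hardware be compared on one dashboard.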
