Reducing Operational Costs for Video Hosting
Project Description
The client provides high-load video hosting services. The entire infrastructure runs on bare-metal servers in Europe and Canada, with UCDN as a backup CDN to absorb load spikes in emergencies. The content is divided into several independent blocks, each served by its own server group. Our team had previously brought the entire infrastructure under code-based management (Ansible, Terraform).

The client's main request was to reduce infrastructure operating expenses while maintaining data storage reliability and availability.

Key Metrics
  • 54.7% reduction in operational expenses
  • 72% reduction in server fleet with increased reliability
  • 83% of requests served from the SSD cache
  • 3+ petabytes of monthly traffic for each server group
  • 400+ domain names associated with various projects
Project Goals
To reduce operational expenses by 30% while maintaining a 99.9% availability level
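To make the availability target concrete, it can be converted into a downtime budget. The sketch below is only illustrative arithmetic: the function name and the 30-day period are assumptions, not part of the project tooling.

```python
def downtime_budget_minutes(availability: float, period_days: float = 30.0) -> float:
    """Return the maximum downtime, in minutes, allowed over a period
    at the given availability (expressed as a fraction, e.g. 0.999)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability)


if __name__ == "__main__":
    # Assumed period: a 30-day month.
    budget = downtime_budget_minutes(0.999)
    print(f"99.9% availability over 30 days allows about {budget:.1f} minutes of downtime")
```

At 99.9%, this works out to roughly 43 minutes of downtime per 30-day month.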
Key Challenges and Results
We analyzed how efficiently the nodes delivered content and identified IOPS as the bottleneck. Further analysis showed that during peak user activity, most requests targeted a small portion of the total content volume. We ran an experiment and proposed a new node deployment variant:
  • Increased the storage density of the disk subsystem by replacing the HDDs with higher-capacity models.
  • Increased the capacity of the SSD on which the OS is installed, so that the free space could be used for caching.
  • Configured a custom version of bcache, a technology that creates hybrid storage from slow but high-capacity HDDs and fast SSDs.
  • Optimized file system parameters after analyzing the storage structure, which made access to a random file on the HDD backend 12% faster.
These changes to the typical server configuration increased IOPS performance per node by 20x to 30x. On average, 83% of all requests were served from the fast SSD.
We also audited and financially optimized spending on the guaranteed outbound bandwidth for the content distribution servers. After analyzing the databases, we developed a plan to defragment the storage systems by removing unused content and migrating to newer, more efficient servers. The migration was seamless: users experienced no failures, and load for more than 400 domains was switched over smoothly.
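An SSD cache pays off because most peak-time requests target a small share of the content. A rough way to check this on a request log is to estimate what fraction of requests would be served if the most requested files that fit in the cache were kept on SSD. The sketch below is a simplified model, not the analysis used in the project: bcache caches blocks rather than whole files, and the function name, file names, and sizes are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def estimated_hit_ratio(requests: Iterable[str],
                        sizes: Dict[str, int],
                        cache_bytes: int) -> float:
    """Estimate the share of requests served from cache if the most
    requested files that fit into `cache_bytes` are cached.

    Simplification: files are picked greedily by request count, and each
    file is either fully cached or not cached at all.
    """
    counts = Counter(requests)
    total = sum(counts.values())
    if total == 0:
        return 0.0

    used = 0
    hits = 0
    # Walk files from most to least requested and cache whatever still fits.
    for name, request_count in counts.most_common():
        size = sizes[name]
        if used + size <= cache_bytes:
            used += size
            hits += request_count
    return hits / total


if __name__ == "__main__":
    # Hypothetical log: one hot file and two cold files of equal size.
    log = ["a.mp4"] * 8 + ["b.mp4"] + ["c.mp4"]
    file_sizes = {"a.mp4": 10, "b.mp4": 10, "c.mp4": 10}
    ratio = estimated_hit_ratio(log, file_sizes, cache_bytes=10)
    print(f"Estimated hit ratio: {ratio:.0%}")
```

In this toy log, a cache that holds only one of the three files still covers 8 of the 10 requests, which is the same kind of skew the analysis found.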
As a result, we reduced the number of rented servers by 72% while increasing storage reliability by moving from RAID-5 to RAID-6. The final reduction in operational expenses was 54.7%, well above the planned 30%.
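The RAID-5 to RAID-6 trade-off can be illustrated with simple arithmetic: RAID-5 spends one disk's worth of capacity on parity and survives one disk failure, while RAID-6 spends two and survives two. The sketch below assumes equal-size disks; the array size of 12 disks at 16 TB is a hypothetical example, not the client's actual configuration.

```python
def raid_summary(level: int, disks: int, disk_tb: float) -> dict:
    """Return usable capacity and tolerated disk failures for a
    RAID-5 or RAID-6 array built from equal-size disks."""
    parity_disks = {5: 1, 6: 2}
    min_disks = {5: 3, 6: 4}
    if level not in parity_disks:
        raise ValueError("only RAID levels 5 and 6 are supported")
    if disks < min_disks[level]:
        raise ValueError(f"RAID-{level} needs at least {min_disks[level]} disks")
    parity = parity_disks[level]
    return {
        "usable_tb": (disks - parity) * disk_tb,
        "tolerated_failures": parity,
    }


if __name__ == "__main__":
    # Hypothetical array: 12 disks of 16 TB each.
    for level in (5, 6):
        info = raid_summary(level, disks=12, disk_tb=16.0)
        print(f"RAID-{level}: {info['usable_tb']:.0f} TB usable, "
              f"survives {info['tolerated_failures']} disk failure(s)")
```

For this example array, RAID-6 gives up one more disk of usable capacity than RAID-5 in exchange for surviving a second disk failure, which matters more as disk capacity and rebuild times grow.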
