We analyzed how efficiently the nodes delivered content and identified IOPS as the bottleneck. Further analysis showed that during peak user activity, most requests targeted a small portion of the total content volume. We ran an experiment and proposed a new node deployment variant:
- Increased the storage density of the disk subsystem by replacing the HDDs with high-capacity models.
- Increased the capacity of the SSD on which the OS is installed, so that its free space can be used for caching.
- Configured a custom version of bcache, a technology for building hybrid storage from slow but high-capacity HDDs and fast SSDs.
- Optimized file system parameters after analyzing the storage structure, which made access to a random file on the HDD backend 12% faster.

As a result of these changes to the typical server configuration, we achieved a 20x to 30x increase in IOPS per node. On average, 83% of all requests were served from the fast SSD.

We also audited and financially optimized the expenses for the guaranteed outbound channel of the content distribution servers. We analyzed the databases and developed a plan to defragment the storage systems by removing unused content and migrating to newer, more efficient servers. The migration was completely seamless: there were no user-facing failures, and the load for more than 400 domains was switched over smoothly.
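
The observation that a small portion of the content receives most requests, and the share of requests an SSD cache can serve, can both be estimated from a request log. The following is a minimal sketch, not the tooling used in the project: the function names, the log format (one object identifier per request), and the toy data are assumptions made for illustration. It ignores object sizes and cache eviction, so the hit ratio it reports is an optimistic upper bound.

```python
from collections import Counter


def hot_set_fraction(request_log, target_share):
    """Smallest fraction of distinct objects that covers at least
    `target_share` of all requests (hypothetical helper)."""
    counts = Counter(request_log)
    total = sum(counts.values())
    covered = 0
    # Walk objects from most to least requested, accumulating requests.
    for rank, (_, hits) in enumerate(counts.most_common(), start=1):
        covered += hits
        if covered / total >= target_share:
            return rank / len(counts)
    return 1.0  # empty log, or target_share above 1


def ideal_hit_ratio(request_log, cache_objects):
    """Share of requests served from cache if it held the
    `cache_objects` most requested objects (upper bound)."""
    counts = Counter(request_log)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    cached = sum(hits for _, hits in counts.most_common(cache_objects))
    return cached / total


# Hypothetical toy log: 100 requests over 10 files, one of them very popular.
log = ["a"] * 82 + ["b"] * 10 + list("cdefghij")

print(hot_set_fraction(log, 0.8))   # fraction of files covering 80% of requests
print(ideal_hit_ratio(log, 2))      # hit ratio with the 2 hottest files cached
```

Run against a real access log, the first function shows how concentrated the demand is, and the second gives a rough ceiling for the share of requests a cache of a given size could absorb.
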
As a result, we reduced the number of rented servers by 72% while increasing storage reliability by moving from RAID-5 to RAID-6. The final reduction in operating expenses was 54.7%, significantly exceeding the plan.
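
The reliability gain from RAID-6 comes at the cost of one more disk of parity per array: RAID-5 survives one disk failure, RAID-6 survives two. The sketch below shows the capacity trade-off. The 12-disk array size is a hypothetical value, since the actual array layout is not stated here, and the helper name is illustrative.

```python
RAID5_PARITY_DISKS = 1  # RAID-5 tolerates one failed disk
RAID6_PARITY_DISKS = 2  # RAID-6 tolerates two failed disks


def raid_usable_fraction(disks, parity_disks):
    """Fraction of raw capacity available for data in a parity RAID array."""
    if disks <= parity_disks:
        raise ValueError("array needs more disks than parity disks")
    return (disks - parity_disks) / disks


disks = 12  # hypothetical array size, not taken from the project
raid5 = raid_usable_fraction(disks, RAID5_PARITY_DISKS)
raid6 = raid_usable_fraction(disks, RAID6_PARITY_DISKS)
print(f"RAID-5 usable: {raid5:.1%}, RAID-6 usable: {raid6:.1%}")
```

With larger arrays of high-capacity drives, the extra parity disk costs a smaller share of raw capacity, which is one reason RAID-6 pairs well with denser storage nodes.
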