How to Reduce Cloud Costs Without Reducing Performance

Introduction: Moving Beyond the Utilization Mindset

In cloud financial operations (FinOps), the dominant narrative revolves around utilization. Idle CPU? Downsize. Low disk throughput? Cut capacity or switch disk type. Most vendors build their tooling and recommendations on this principle. But this view misses a critical element: resource health. At Serra Labs, we argue that healthy infrastructure—not just busy infrastructure—is foundational to cost-effective and performant cloud operations.

Imagine a VM whose utilization looks normal at around 50% across CPU, memory, and network. Traditional tooling might recommend downsizing it. But what if that instance is simultaneously battling stolen CPU cycles, memory contention, or disk I/O bottlenecks? If the VM delivers a user-critical application, the priority should be improving its health.

The Multi-Tenancy Compromise Underlying Cloud

Health can be poor even when utilization is low because cloud resources are fundamentally shared. A VM may be provisioned with a high-powered CPU, ample memory, and a high-performance disk, yet the underlying physical CPU, memory, and disk are still shared with other VMs. Poor health despite low utilization is therefore not a surprise; it is to be expected.

Cloud providers oversubscribe CPU and network on the assumption that not everyone peaks simultaneously; that’s how on-demand pricing stays low. The hypervisor throttles noisy VMs so that they do not starve their neighbors, but short bursts of contention still leak through as stolen CPU cycles. A VM’s “8 vCPUs” might hop across physical cores or share an SMT sibling with a crypto-mining neighbor. Provisioned IOPS (io1) EBS volumes and Azure Premium SSD disks ride shared fabric; congestion elsewhere shows up as unexplained latency in your VM.

What Is Resource Health?

Resource health is the ability of a compute, storage, or network resource to deliver expected performance. Unlike utilization, which measures used capacity, health is reflected in delivered performance.

Key health indicators might include:

  • CPU run-queue depth

  • Disk I/O wait time

  • Memory stall cycles

  • GPU core saturation

  • Packet loss or retransmit rates

These metrics go beyond how "busy" a resource is. They tell us how well it is coping with demand—and whether it is on the verge of failure or in degradation.
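
As one illustration of a health indicator that utilization does not capture, the sketch below estimates the share of CPU time stolen by the hypervisor between two snapshots of Linux's /proc/stat. It assumes a Linux guest and the standard field order of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). The function names and the sample counter values are hypothetical and are not part of any Serra Labs product.

```python
def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before_text, after_text):
    """Percentage of CPU time stolen between two /proc/stat snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    # Only the first eight counters are summed: guest and guest_nice
    # are already included in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total  # index 7 is the steal counter


# Made-up snapshots for illustration; on a real host, read /proc/stat twice.
BEFORE = "cpu  1000 0 500 8000 100 0 50 50 0 0"
AFTER = "cpu  1400 0 700 8500 150 0 70 180 0 0"
print(f"steal: {steal_percent(BEFORE, AFTER):.1f}%")
```

In these made-up snapshots the CPU is idle for a large part of the interval, yet a tenth of the elapsed CPU time was taken by the hypervisor, which is the kind of signal a utilization-only view would miss.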

The Pitfall of Inverse Utilization

Still, many FinOps platforms treat health as inversely proportional to utilization, assuming that high utilization means poor health and vice versa. That assumption is far more likely to hold in truly dedicated environments; in practice, on shared infrastructure, it breaks down:

  • A disk running at 85% can be completely healthy under sequential loads.

  • A CPU at 35% might be experiencing severe context-switch thrashing.

  • A memory subsystem with low utilization could still suffer from page faults or swap pressure.

Performance bottlenecks often emerge before utilization crosses a threshold. Health reveals these issues. Utilization by itself can obscure them.

Serra Labs CPO: Balancing Cloud Cost and Performance

Serra Labs' Cloud Performance Optimizer (CPO) is designed with a simple philosophy: optimize for cost while allowing lower performance, preserving current performance, or enhancing performance, using health as a guidepost.

CPO ingests historical telemetry from across your cloud stack. It examines time-series data at one-minute resolution, extracting indicators of resource strain, congestion, and headroom. Then, it runs multi-objective optimization scenarios aligned with your strategic intent:

1. Economical Goal

Minimize cost while guaranteeing a user-defined minimum performance threshold.

Ideal for batch jobs or background workloads. CPO identifies the lowest-cost configuration that avoids breaching mandated minimum performance levels.

2. Balanced Goal

Reduce costs as long as performance is not expected to be degraded.

Perfect for production services where uptime and latency matter. CPO cuts costs cautiously—so that performance levels are not adversely affected.

3. Enhanced Goal

Improve performance as much as possible within a defined cost ceiling.

Use this for critical paths where responsiveness or throughput translates directly to revenue. CPO explores enhanced configurations like bigger instances or faster storage—but only within your cost guardrails.

These are not mere toggles. They represent distinct constraint models used by our optimizer, tuned for different business needs.
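
To make the difference between the three goals concrete, here is a minimal sketch that treats each one as a different constraint over the same list of candidate configurations. It is not CPO's optimizer: the Config class, the single perf_score number, the candidate names, and all costs are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    monthly_cost: float  # hypothetical cost in dollars
    perf_score: float    # hypothetical score, higher is better


def economical(candidates, min_perf):
    """Cheapest configuration that meets a user-defined performance floor."""
    feasible = [c for c in candidates if c.perf_score >= min_perf]
    return min(feasible, key=lambda c: c.monthly_cost, default=None)


def balanced(candidates, current):
    """Cheapest configuration that performs at least as well as the current one."""
    feasible = [c for c in candidates if c.perf_score >= current.perf_score]
    return min(feasible, key=lambda c: c.monthly_cost, default=None)


def enhanced(candidates, cost_ceiling):
    """Best-performing configuration that stays within the cost ceiling."""
    feasible = [c for c in candidates if c.monthly_cost <= cost_ceiling]
    return max(feasible, key=lambda c: c.perf_score, default=None)


current = Config("medium", 180.0, 80.0)
candidates = [
    Config("small", 100.0, 60.0),
    current,
    Config("medium-alt", 160.0, 82.0),
    Config("large-lite", 240.0, 90.0),
    Config("large", 300.0, 95.0),
]

print(economical(candidates, min_perf=50.0).name)
print(balanced(candidates, current).name)
print(enhanced(candidates, cost_ceiling=250.0).name)
```

Each function returns None when no candidate satisfies its constraint, so a caller can fall back to keeping the current configuration.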

Why Health Must Be First-Class, Just Like Utilization

Health is not a function of utilization—it is effectively orthogonal to utilization. Here is why you cannot "fake" health awareness:

  • An under-utilized VM may still exhibit considerable CPU steal, often driven by the chosen VM SKU, such as the burstable B series in Azure or T series in AWS. Reducing its cost further is precisely the wrong thing to do. Taking its poor health into account avoids worsening performance in the name of saving cost and, depending on the VM's criticality, can justify spending more on superior CPUs.

  • A highly utilized disk may show no sign of poor health, such as high latency or a long queue, making it a candidate for cost cutting by using a less expensive disk or a less powerful VM to drive disk traffic.

  • Many VMs exhibit normal health except during peak periods, when utilization surges and health deteriorates. Depending on the criticality of the VM, it can be a candidate for cost cutting or for performance enhancement, options that CPO provides as alternative recommendations.

Health-conscious optimization prevents false economies—situations where you save on cloud bills but pay more in support tickets, customer churn, or degraded user experience.
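
The cases above can be sketched as a comparison between a utilization-only rule and a rule that also checks health. This is a deliberately simplified illustration, not CPO's logic: the Sample fields, the thresholds (40% utilization, 5% steal, 20 ms latency), and the recommendation labels are all assumptions, and the peak-only case from the third bullet is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    cpu_util_pct: float
    cpu_steal_pct: float
    disk_latency_ms: float


def utilization_only(sample, low_util_pct=40.0):
    """Traditional rule: downsize anything that looks idle."""
    return "downsize" if sample.cpu_util_pct < low_util_pct else "keep"


def is_healthy(sample, max_steal_pct=5.0, max_latency_ms=20.0):
    """Hypothetical health check on CPU steal and disk latency."""
    return (sample.cpu_steal_pct <= max_steal_pct
            and sample.disk_latency_ms <= max_latency_ms)


def health_aware(sample, critical):
    """Healthy resources are cost-cut candidates; unhealthy ones are not."""
    if is_healthy(sample):
        return "cost_cut_candidate"
    # Poor health: never cut cost; spend more only if the workload is critical.
    return "enhance" if critical else "hold"


noisy_vm = Sample(cpu_util_pct=35.0, cpu_steal_pct=12.0, disk_latency_ms=4.0)
busy_but_fine = Sample(cpu_util_pct=85.0, cpu_steal_pct=0.5, disk_latency_ms=3.0)

print(utilization_only(noisy_vm), health_aware(noisy_vm, critical=True))
print(utilization_only(busy_but_fine), health_aware(busy_but_fine, critical=True))
```

The two rules disagree on both samples: the lightly used VM with heavy steal is flagged for downsizing by the utilization-only rule, while the busy but healthy resource is left alone even though it may tolerate a cheaper configuration.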

Real-World Impact: Flink Streaming Case Study

A media company running Apache Flink on AWS r6g.4xlarge instances was flagged by its FinOps tool for underutilization. It downsized to r6g.2xlarge, saving 38%. But at various times, including peak viewing hours, viewers noticed playback stuttering.

Serra Labs CPO analysis found the root cause: during peak usage, utilization metrics were unaffected, but poor health, in the form of stolen CPU cycles and high disk latencies, degraded the viewing experience. Under the enhanced goal, CPO recommended a superior EC2 instance and a larger gp3 volume with more IOPS and throughput capacity to deal with unexpected surges.

This case highlights how utilization by itself can lead to exactly the wrong resizing decision. Looking at both utilization and health, as CPO does, is critical to arriving at the right decision, which in this case meant reverting to r6g.4xlarge instances with superior disks.

Redefining FinOps Success

In the race to reduce cloud spend, we must not forget why we spend in the first place: to deliver performant, reliable experiences.

Serra Labs CPO embraces the notion that cost cutting must be informed by a combination of utilization and health, not by utilization alone, as has been the case with existing FinOps solutions.

By balancing utilization and health within business-specific constraints, organizations can:

  • Reduce waste

  • Avoid invisible slowdowns

  • Future-proof their infrastructure

Utilization-aware, health-conscious optimization is not just a better tactic; it is a necessary shift in cloud FinOps.

Do not just pursue lower bills. Pursue optimal performance per dollar.

© Serra Labs Inc. 2019-2024