Cost Optimization Is Easy — Until Performance Matters

Cost-only optimization is nearly trivial. Add performance to the mix and finding the best configuration becomes a multidimensional search across millions of candidates — and that changes everything about how it must be solved.

Published June 2026

~

“It only becomes hard when performance has to survive the move.”

Cost optimization, on its own, is not a hard problem. If all you want is a smaller bill and you do not care what happens to the workload, the answer is nearly trivial: find the cheapest resource the workload nominally fits on, and move it there. No performance guardrail, no constraint to honor, nothing to trade off. Pick the cheapest thing that runs. Done.

The moment performance becomes a guardrail

The instant you say “reduce cost, but performance must not drop below this floor,” you stop solving one problem and start solving two at once. You are no longer minimizing a single number. You are navigating a trade-off between cost and performance — and a trade-off between two quantities is, by definition, a multidimensional problem.

And the dimensions are not abstract. A VM is not a single “size” you slide up and down. It is a vector:

  • CPU — count, generation, architecture (x86 vs ARM), clock

  • Memory — capacity and bandwidth

  • GPU — present or not, type, count, on-board memory

  • Network — bandwidth and throughput ceilings, and their cost

  • Disk — size, type, and IOPS, for one or more disks in combination

Each of these is an axis. A configuration is a point in the space they span. Change CPU and the right amount of memory changes with it. Add a second disk and the network and IOPS picture shifts. The axes interact, which means you cannot optimize them one at a time and staple the answers together.
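
Here is a minimal sketch of that vector in Python. It assumes a hypothetical per-instance ceiling on aggregate disk IOPS (`iops_ceiling`) to show one way the axes interact. The field names and numbers are illustrative and are not any provider's actual limits.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Disk:
    size_gib: int
    disk_type: str          # label only, e.g. "ssd" or "hdd"
    iops: int               # provisioned IOPS for this disk

@dataclass(frozen=True)
class Config:
    """One point in the configuration space: a vector, not a single 'size'."""
    vcpus: int
    arch: str               # "x86" or "arm"
    memory_gib: float
    gpus: int
    network_gbps: float
    iops_ceiling: int       # hypothetical instance-level cap on aggregate disk IOPS
    disks: tuple[Disk, ...]

    def effective_iops(self) -> int:
        # Disk IOPS add up, but the instance caps what it can actually drive,
        # so the disk axis and the instance axis cannot be sized independently.
        return min(sum(d.iops for d in self.disks), self.iops_ceiling)

ssd = Disk(size_gib=500, disk_type="ssd", iops=16_000)
one_disk = Config(8, "x86", 32.0, 0, 10.0, 20_000, (ssd,))
two_disks = Config(8, "x86", 32.0, 0, 10.0, 20_000, (ssd, ssd))

print(one_disk.effective_iops())   # 16000
print(two_disks.effective_iops())  # 20000, not 32000: the instance ceiling binds
```

The second disk adds capacity, but it delivers only part of its nominal IOPS. Whether it is worth paying for depends on which instance it is attached to.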

Why this becomes millions of choices

Multiply the options out. Dozens of instance families, several sizes within each, multiple disk types and counts, network tiers, GPU options. For a single workload the candidate set already runs into the thousands. Account for multi-disk combinations and the interaction between resources and it climbs into the millions. Across a fleet it is larger still.
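
The following back-of-envelope arithmetic makes the multiplication concrete. It uses hypothetical option counts: 40 families, 6 sizes, 5 disk types in 4 size tiers, 3 network tiers, 4 GPU options, and up to 3 disks. None of these figures comes from a real catalog. A disk set is treated as a multiset (repeats allowed, order ignored), which gives C(n + k - 1, k) ways to pick k disks from n variants.

```python
from math import comb, prod

# Hypothetical option counts for a single workload (illustrative, not a real catalog).
instance_types = 40 * 6       # 40 families x 6 sizes each
disk_variants = 5 * 4         # 5 disk types x 4 size tiers
network_tiers = 3
gpu_options = 4               # no GPU, or one of three GPU setups
max_disks = 3

# Exactly one disk per configuration.
single_disk = prod([instance_types, disk_variants, network_tiers, gpu_options])

# 1..max_disks disks, repeats allowed, order ignored:
# the number of size-k multisets from n variants is C(n + k - 1, k).
disk_sets = sum(comb(disk_variants + k - 1, k) for k in range(1, max_disks + 1))
multi_disk = prod([instance_types, disk_sets, network_tiers, gpu_options])

print(f"{single_disk:,} single-disk candidates")   # 57,600
print(f"{multi_disk:,} multi-disk candidates")     # 5,097,600
```

Allowing a fourth disk would raise the multi-disk total to 30,600,000 under the same assumptions.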

You cannot eyeball millions of points and pick the best one. You have to search.

Exhaustive when you can, near-optimal when you must

How you search depends on how big the space is.

When the space is small enough, you enumerate it. Exhaustive search evaluates every candidate and returns the provably optimal one — no approximation, no guesswork, because you checked everything. For a single workload with a bounded set of options, this is entirely feasible, and it is the right thing to do.
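
A toy version of exhaustive search might look like the sketch below. It uses hypothetical prices (in cents per hour) and a made-up performance model in which the disk scales the instance's score. The point is the shape of the loop: every candidate is evaluated, so the answer it returns is the true optimum for that model.

```python
from itertools import product

# Hypothetical prices (cents/hour) and performance scores; illustrative only.
INSTANCES = {"m-large": (10, 50), "m-xlarge": (20, 95), "c-xlarge": (17, 90)}
DISKS = {"hdd": (1, 0.6), "ssd": (3, 1.0)}   # (price, performance multiplier)

def evaluate(instance, disk):
    """Toy model: prices add; the disk scales the instance's performance score."""
    i_price, i_perf = INSTANCES[instance]
    d_price, d_factor = DISKS[disk]
    return i_price + d_price, i_perf * d_factor

def exhaustive_min_cost(perf_floor):
    """Check every candidate; return the cheapest one that meets the floor, or None."""
    best = None
    for instance, disk in product(INSTANCES, DISKS):
        price, perf = evaluate(instance, disk)
        if perf >= perf_floor and (best is None or price < best[0]):
            best = (price, perf, instance, disk)
    return best

print(exhaustive_min_cost(0))    # (11, 30.0, 'm-large', 'hdd'): cost-only, the trivial case
print(exhaustive_min_cost(80))   # (20, 90.0, 'c-xlarge', 'ssd')
print(exhaustive_min_cost(100))  # None: no candidate meets the floor
```

With the floor at zero, this reduces to "pick the cheapest thing that runs". Raising the floor changes the answer, and only a full sweep proves that the answer is the best one.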

But the space grows the wrong way. Every dimension you add (another disk option, another instance family, another workload with its own interactions) multiplies the candidate set rather than adding to it. The number of candidates, and with it the cost of exhaustive search, therefore grows exponentially with the number of dimensions, not linearly. For an arbitrarily large space, guaranteeing the true optimum demands exponentially more compute and time, quickly outrunning any practical budget. That is the real reason exhaustive search has a hard ceiling: not that it is slow, but that its cost explodes faster than resources can keep up.
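
The growth can be stated in one line. Take $d$ independent axes with $n_i$ options on axis $i$. The candidate set $S$ is their Cartesian product, so its size is the product of the option counts. This is an illustration of the counting, not a measured catalog:

```latex
|S_d| = \prod_{i=1}^{d} n_i,
\qquad
n_i \ge 2 \ \text{for all } i \;\Longrightarrow\; |S_d| \ge 2^{d},
\qquad
|S_{d+1}| = n_{d+1}\,|S_d|
```

Ten axes with ten options each already give $10^{10}$ candidates. An eleventh axis multiplies that figure by its own option count rather than adding to it.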

Picking the best configuration means choosing among discrete, countable options (which family, which size, how many disks of which type) while balancing several objectives at once. Formally, that makes it a multidimensional discrete optimization problem. That class is NP-hard in general, and the multidimensional knapsack problem is a classic member of it. No known algorithm is guaranteed to find the best answer in polynomial time, and exact methods can need exponential time in the worst case as the problem scales.
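
Written out, the "lowest cost at a performance floor" form of the problem looks like the statement below. The notation is introduced here for illustration only: $\mathcal{I}$ is the set of instance types, $\mathcal{D}$ the allowed disk sets, $\mathcal{N}$ the network tiers, $\mathcal{G}$ the GPU options, $c$ cost, $p$ a performance measure, and $p_{\min}$ the floor.

```latex
\min_{x \in \mathcal{X}} \; c(x)
\quad \text{subject to} \quad p(x) \ge p_{\min},
\qquad
\mathcal{X} = \mathcal{I} \times \mathcal{D} \times \mathcal{N} \times \mathcal{G}
```

Every component of $x$ is a discrete choice, so the feasible set is a finite collection of points with nothing in between. Even when $c$ is just a sum of prices, $p$ generally couples the components, which is how the interacting axes show up formally.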

The discreteness is what hurts. There are no gradients to follow toward the optimum the way a smooth, continuous problem allows; the conflicting objectives produce a rugged, spiky landscape with no reliable directional clues; and the integer restrictions — whole disks, whole instances — are exactly what turn an otherwise easy continuous problem into a hard one.

Past that ceiling you move to near-optimal methods from multidimensional optimization. These are search strategies that find a solution close to the optimum without evaluating every point, and some of them carry a provable bound on how far from optimal the answer can be. This is not a cop-out or a lazy shortcut. It is the correct and standard response to a search space that exceeds what brute force can cover. The skill is in choosing methods that stay near the true optimum rather than drifting away from it.
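
Simulated annealing is one widely used near-optimal method for discrete spaces like this. The sketch below is a toy built on stated assumptions: hypothetical axes, a made-up bottleneck performance model, and a linear cooling schedule. It does not describe how any particular product searches. Plain simulated annealing has no optimality bound. For that reason the sketch also computes the exact optimum on a space small enough to enumerate, a comparison that is only possible on small instances.

```python
import math
import random
from itertools import product

# Hypothetical axes: each option is (price, performance contribution).
AXES = [
    [(k, 1.1 * k) for k in range(1, 21)],   # compute tiers
    [(k, 0.9 * k) for k in range(1, 21)],   # memory tiers
    [(k, 0.5 * k) for k in range(0, 11)],   # storage tiers
]

def cost(x):
    return sum(AXES[axis][i][0] for axis, i in enumerate(x))

def perf(x):
    compute, memory, storage = (AXES[axis][i][1] for axis, i in enumerate(x))
    return min(compute, memory) + storage   # compute and memory bottleneck each other

def anneal(floor, steps=5_000, temp0=5.0, seed=0):
    """Simulated annealing over index vectors; only feasible moves are accepted."""
    rng = random.Random(seed)
    x = tuple(len(options) - 1 for options in AXES)  # start from the largest config
    if perf(x) < floor:
        return None                                  # even the largest config misses the floor
    best = x
    for step in range(steps):
        temp = temp0 * (1 - step / steps) + 1e-9     # linear cooling schedule
        axis = rng.randrange(len(AXES))
        i = min(max(x[axis] + rng.choice((-1, 1)), 0), len(AXES[axis]) - 1)
        y = x[:axis] + (i,) + x[axis + 1:]           # neighbor: one axis moved one step
        if perf(y) < floor:
            continue                                 # never leave the feasible region
        delta = cost(y) - cost(x)
        # Always accept cheaper moves; accept pricier ones with shrinking probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x = y
            if cost(x) < cost(best):
                best = x
    return best

def exhaustive(floor):
    """Baseline for small spaces: the true optimum, found by checking all 4,400 points."""
    grid = product(*(range(len(options)) for options in AXES))
    return min((x for x in grid if perf(x) >= floor), key=cost)

near = anneal(floor=15)
exact = exhaustive(floor=15)
print(cost(near), cost(exact))
```

The annealer evaluates at most 5,000 neighbors no matter how many axes there are, while the exhaustive baseline grows with the product of the option counts. That difference is the reason to switch methods once the space gets large.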

That is the distinction that matters. The question is never “heuristics good” or “heuristics bad.” It is: use exact search where the space allows it, use principled near-optimal search where it does not — and never confuse either one with a one-dimensional shortcut that ignores most of the space entirely.

What you are actually optimizing

There is no single objective function, because “optimal” depends on what the business is asking for. In practice it takes one of three forms:

  1. Lowest cost at a performance floor. Minimize spend subject to performance staying at or above a defined minimum. The classic “save money without breaking anything.”

  2. A strict improvement on one axis without sacrificing the other. Lower cost at the same performance floor, or better performance at the same spend. These are Pareto improvements — you move in one direction while giving up nothing in the other.

  3. Highest performance under a cost ceiling. Maximize performance subject to spend staying at or below a budget. The “we have this much to spend, now get me the most out of it” case.
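
The three forms can be sketched as three selectors over the same candidate list. The candidates, costs, and scores below are hypothetical, and the function names are illustrative rather than any product's API.

```python
from typing import NamedTuple, Optional

class Candidate(NamedTuple):
    name: str
    cost: float   # e.g. dollars per hour (hypothetical)
    perf: float   # performance score, higher is better (hypothetical)

def lowest_cost_at_floor(cands, perf_floor) -> Optional[Candidate]:
    """Form 1: minimize cost subject to perf >= floor."""
    return min((c for c in cands if c.perf >= perf_floor), key=lambda c: c.cost, default=None)

def pareto_improvements(cands, current):
    """Form 2: no worse on either axis, strictly better on at least one."""
    return [c for c in cands
            if c.cost <= current.cost and c.perf >= current.perf
            and (c.cost < current.cost or c.perf > current.perf)]

def highest_perf_under_ceiling(cands, cost_ceiling) -> Optional[Candidate]:
    """Form 3: maximize perf subject to cost <= ceiling."""
    return max((c for c in cands if c.cost <= cost_ceiling), key=lambda c: c.perf, default=None)

current = Candidate("current", 1.00, 100)
cands = [current,
         Candidate("a", 0.70, 100),
         Candidate("b", 1.00, 120),
         Candidate("c", 0.60, 90),
         Candidate("d", 1.30, 150)]

print(lowest_cost_at_floor(cands, current.perf).name)          # a
print([c.name for c in pareto_improvements(cands, current)])   # ['a', 'b']
print(highest_perf_under_ceiling(cands, current.cost).name)    # b
```

Each selector picks a different winner from the same five points, which is why the choice of objective has to come from the business rather than be hard-coded.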

Same search space, three different objectives. Which one applies is a business decision, not a technical one — and a real optimizer has to support all three, rather than assuming everyone wants the first.

Where peak-utilization sizing fails

Most cost tooling does none of this. It takes peak CPU and memory, finds the cheapest instance that exceeds them, and stops. That is not multidimensional search. It is a one-dimensional shortcut: it looks at two resource numbers, ignores GPU, network, and disk combinations and their interactions, scores on resource fit rather than on performance, and runs once against a snapshot.
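
A minimal sketch of that shortcut follows. It assumes a hypothetical two-entry catalog and a made-up usage snapshot, and it is meant to show the failure mode described above rather than reproduce any specific tool. Network never enters the filter, so the pick can be one that throttles the workload.

```python
# Hypothetical catalog; names, sizes, and prices are illustrative only.
CATALOG = [
    {"name": "small",  "vcpus": 4, "mem_gib": 16, "net_gbps": 5.0,  "price": 0.17},
    {"name": "medium", "vcpus": 8, "mem_gib": 32, "net_gbps": 12.5, "price": 0.34},
]

def peak_fit(catalog, cpu_samples, mem_samples):
    """The shortcut: size to peak vCPUs and memory, pick the cheapest fit, stop."""
    peak_cpu, peak_mem = max(cpu_samples), max(mem_samples)
    fits = [i for i in catalog if i["vcpus"] >= peak_cpu and i["mem_gib"] >= peak_mem]
    return min(fits, key=lambda i: i["price"], default=None)

# One snapshot of a workload's usage (vCPUs busy, GiB used, Gbps sent).
cpu, mem, net = [1.5, 3.2, 2.8], [9.0, 12.0, 11.5], [4.0, 9.0, 7.5]

choice = peak_fit(CATALOG, cpu, mem)
print(choice["name"])                     # small
print(choice["net_gbps"] >= max(net))     # False: network was never checked
```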

“It is not searching the space. It is picking the first candidate a narrow filter does not happen to reject.”

That is the entire gap between cost cutting and cost-and-performance optimization.

The Serra Labs Platform

Cost Performance Optimizer

Cost Performance Optimizer (CPO) was built around this framing. The objective function incorporates performance health, computed from the workload itself rather than inferred from resource utilization. Peak utilization is calculated automatically rather than entered by hand. The search treats the full configuration vector (compute, memory, GPU, network, and disk combinations), not a single resource slice. It runs exhaustively where the space permits and shifts to near-optimal multidimensional methods where the space is too large to enumerate. And it supports all three objective formulations, because which one you want is a decision the operator gets to make.

That is the actual problem to solve. Anything less is a one-dimensional shortcut wearing the costume of a search.



Right Mode. Right Time. Every Workload.

The Serra Labs Platform searches the full configuration space — compute, memory, GPU, network, and disk — and applies the right optimization mode for what each workload needs. Try it on your AWS environment today.

© Serra Labs Inc. 2019-2026