NOS: Cut K8s Costs in Half and Rebuilt Trust in Optimization

Meet NOS

NOS is one of Portugal’s leading telecom providers, serving millions with its network, internet, and entertainment services. The company made cost optimization a central part of its infrastructure strategy almost a decade ago. At the time, systems ran on virtual machines, which were stable but inefficient and expensive. The engineering team began experimenting with Docker and Rancher to reduce overhead and move faster.

That journey led to a company-wide adoption of Kubernetes, rolled out across a hybrid setup: telco-specific systems remain on-prem, while scalable workloads run on Google Cloud. The mission has always been the same: deliver agility and performance while keeping infrastructure spend under control. As the architecture matured, so did its complexity, and optimization became one of the most difficult problems to solve.

The Brief

Safely optimizing Kubernetes at scale quickly became a major pain point. NOS teams used observability tools like Prometheus and Grafana, but these provided only raw metrics. The tools did not offer the insights and recommendations teams needed to make confident, evidence-backed decisions, especially when performance or SLAs were at risk.

Without clear guidance from tools, engineers were left guessing. After several failed attempts to rightsize manually, teams pulled back. Manual optimization had become a liability, eroding trust and consuming engineering hours with a high cost of mistakes. “Developers tried to make changes based on what they thought made sense, and it backfired,” said Joao Soares, Platform Engineering Lead at NOS.

The key challenges were clear. Manual and time-consuming optimization created errors because observability tools lacked actionable recommendations. Limited cost visibility meant NOS could not easily identify overprovisioning or pinpoint waste without risking performance. Fragmented data forced FinOps teams to spend two to three days each month stitching together cost reports, leaving them poorly prepared for budget reviews. And most importantly, earlier failures broke trust in optimization, making engineers hesitant to attempt changes for fear of crashes or SLA breaches.

By the time of KubeCon Europe 2024 in Paris, Soares was looking for one thing above all: a safer way to optimize Kubernetes resources without risking SLAs.

The Solution

NOS turned to PerfectScale by DoiT, a platform that enables safe, intelligent, and automated optimization. It was first deployed in development clusters, where automation quietly ran for four months. During this period, resource efficiency improved, and no incidents or complaints arose, building operational confidence.

With that success, NOS expanded its use of PerfectScale to automate platform-level components like ingress controllers, cert managers, and the observability stack. In production, SREs still apply recommendations manually, but with PerfectScale providing contextual, risk-aware evidence, trust in those changes is steadily growing.

“I believed in the product the first time I saw it. I still show it to everyone,” said Soares. “It was the only solution that combined smart automation with real cost savings, without putting performance at risk.”

PerfectScale stood out because it offered intelligent automation with safety controls, enabling teams to optimize confidently at scale without fear of downtime or SLA breaches.

The Results

The impact was significant across both cost and performance. NOS reduced overprovisioning and spend by over 50% on its largest and most critical cluster, the main API cluster, while meeting performance SLOs. This cluster had been heavily overprovisioned to avoid risk, but with precise recommendations and automated rightsizing, the team could scale back safely without compromising performance.

PerfectScale also helped uncover overlooked performance issues, including out-of-memory kills and CPU throttling. With contextual recommendations, teams addressed these issues directly, rightsizing resources where needed without reverting to overprovisioning.
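To make the idea of rightsizing concrete, here is a minimal sketch of one common heuristic: take a high percentile of observed usage and add a safety margin. It is not PerfectScale's algorithm, which the case study does not describe. The function names, the 95th percentile, and the 20% headroom are all assumptions chosen for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer from 1 to 100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(samples, pct=95, headroom_pct=20):
    """Hypothetical recommendation: observed percentile plus a safety margin."""
    return percentile(samples, pct) * (100 + headroom_pct) / 100


def classify(current_request, samples, pct=95, headroom_pct=20):
    """Label a workload by comparing its current request with observed usage."""
    observed = percentile(samples, pct)
    recommended = recommend_request(samples, pct, headroom_pct)
    if current_request < observed:
        # Requests below real usage are where throttling and OOM kills tend to appear.
        return "under-provisioned"
    if current_request > recommended:
        # Requests well above usage plus margin are paid for but unused.
        return "over-provisioned"
    return "ok"
```

For example, with usage samples in CPU millicores, `classify(500, samples)` flags a workload whose request sits far above what it uses, while a request below the observed percentile is flagged as under-provisioned rather than cut further.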

InfraFit provided smarter node selection, allowing Cluster Autoscaler to work more efficiently and in sync with user traffic. “We’re seeing the sine wave we wanted, scaling up and down perfectly with user traffic,” said Soares. Several main node pools now operate with zero percent idle resources, a benchmark that had not been possible before.
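The case study does not define how idle resources are measured. As a rough sketch, one plausible reading is the share of a node pool's allocatable capacity that no workload has requested. The function name and the `(allocatable, requested)` tuple layout below are hypothetical.

```python
def pool_idle_percent(nodes):
    """Percent of allocatable capacity in a node pool that is not requested.

    `nodes` is a list of (allocatable, requested) pairs in the same unit,
    for example CPU millicores. This definition of "idle" is an assumption.
    """
    total_allocatable = sum(allocatable for allocatable, _ in nodes)
    if total_allocatable <= 0:
        raise ValueError("node pool has no allocatable capacity")
    # Clamp per node so an overcommitted node cannot hide idle capacity elsewhere.
    total_idle = sum(max(allocatable - requested, 0) for allocatable, requested in nodes)
    return 100 * total_idle / total_allocatable
```

Under this definition, a pool of two 4000-millicore nodes with 3000 and 4000 millicores requested is 12.5% idle, and a pool where every node is fully requested is 0% idle.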

Beyond cost and performance gains, FinOps efficiency improved dramatically. By centralizing cost data in one place, PerfectScale eliminated manual data stitching, saving two to three engineering days per month and improving budget readiness. “Now I walk into meetings knowing everything’s fine,” said Soares.

What's Next?

PerfectScale is now part of NOS’s operating model, used across teams to standardize safe optimization from development to production. Developers use its recommendations to provision resources more appropriately. SREs rely on it for business-critical services. Platform engineers automate cluster and resource management. Financial controllers monitor budgets and track spend with greater accuracy. Management benefits from shared visibility, improved collaboration across teams, and more efficient staffing decisions.

For Soares, PerfectScale delivered not just results but renewed confidence. “I was specifically searching for a tool to help us try and reduce spend and help developers and SREs identify what applications were doing and what resources they effectively needed,” he said. “PerfectScale was the right mix of doing what I wanted, with risk and resiliency safeguards, and the pricing was right.”

NOS has rebuilt trust in optimization. What was once risky and error-prone is now safe, effective, and sustainable, enabling confident resource decisions. With PerfectScale, NOS continues to optimize safely, improve performance, and expand intelligent automation across the organization.



50% cost reduction on NOS’s largest cluster

0% idle resources using InfraFit and rightsizing automation

2-3 full days/month of FinOps cost reporting work saved
