NOS Cut K8s Costs in Half and Rebuilt Trust in Optimization

Meet NOS
NOS is one of Portugal’s leading telecom providers, serving millions with its network, internet, and entertainment services. The company made cost optimization a central part of its infrastructure strategy almost a decade ago. At the time, systems ran on virtual machines, which were stable but inefficient and expensive. The engineering team began experimenting with Docker and Rancher to reduce overhead and move faster.
That journey led to a company-wide adoption of Kubernetes, rolled out across a hybrid setup: telco-specific systems remain on-prem, while scalable workloads run on Google Cloud. The mission has always been the same: deliver agility and performance while keeping infrastructure spend under control. As the architecture matured, so did its complexity, and optimization became one of the most difficult problems to solve.
The Brief
Optimizing Kubernetes at scale quickly became a major pain point. NOS teams used observability tools like Prometheus and Grafana, but these only provided raw metrics. They lacked the insights and recommendations to make safe, confident decisions, especially when performance or SLAs were at risk.
Without clear guidance from tools, engineers were left guessing. After several failed attempts to rightsize manually, teams pulled back. Manual optimization had become a liability, and the cost of mistakes was high. “Developers tried to make changes based on what they thought made sense, and it backfired,” said Joao Soares, Platform Engineering Lead at NOS.
The key challenges were clear. Manual and time-consuming optimization created errors because observability tools lacked actionable recommendations. Limited cost visibility meant NOS could not easily identify overprovisioning or pinpoint waste without risking performance. Fragmented data forced FinOps teams to spend two to three days each month stitching together cost reports, leaving them poorly prepared for budget reviews. And most importantly, earlier failures broke trust in optimization, making engineers hesitant to attempt changes for fear of crashes or SLA breaches.
By the time of KubeCon Europe 2024 in Paris, Soares was looking for one thing above all: a safer way to cut Kubernetes costs.
The Solution
NOS turned to PerfectScale by DoiT, a platform that enables safe and intelligent optimization. It was first deployed in development clusters, where automation quietly ran for four months. During this period, costs consistently decreased, and not a single issue or complaint arose.
With that success, NOS expanded its use of PerfectScale to automate platform-level components like ingress controllers, cert managers, and the observability stack. In production, SREs still apply recommendations manually, but with PerfectScale providing contextual evidence, trust in those changes steadily grows.
“I believed in the product the first time I saw it. I still show it to everyone,” said Soares. “It was the only solution that combined smart automation with real cost savings, without putting performance at risk.”
PerfectScale stood out because it offered exactly what NOS needed: intelligent automation combined with safety controls, enabling teams to optimize without fear of downtime or breaches.
The Results
The impact was significant across both cost and performance. NOS reduced costs by over 50% on its largest and most critical cluster, the main API cluster. This cluster had been heavily overprovisioned to avoid risk, but with precise recommendations and automated rightsizing, the team could scale back safely without compromising performance.
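To make the idea of rightsizing an overprovisioned workload concrete, here is a minimal sketch. It is not PerfectScale's algorithm, which the case study does not describe. It assumes a hypothetical rule (the 95th percentile of observed CPU usage plus a fixed headroom), and the function names and sample numbers are illustrative only, not NOS data.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_cpu_request(usage_millicores, headroom_percent=15, pct=95):
    """Hypothetical rule: percentile of observed usage plus a safety headroom."""
    peak = percentile(usage_millicores, pct)
    return math.ceil(peak * (100 + headroom_percent) / 100)


def reduction_percent(current_request, recommended_request):
    """Share of the current request that the recommendation would free up."""
    return round(100 * (current_request - recommended_request) / current_request, 1)


# Illustrative CPU usage samples in millicores (assumed values).
usage = [180, 210, 250, 230, 400, 220, 260, 240, 300, 270]
recommended = recommend_cpu_request(usage)
print(recommended, reduction_percent(1000, recommended))
```

The headroom is the safety margin: a larger value trades some savings for more protection against spikes, which is the balance a risk-averse team would tune first.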
PerfectScale also helped uncover overlooked performance issues, including out-of-memory kills and CPU throttling. With clear recommendations, teams could address these issues directly, increasing resources where necessary without reverting to overprovisioning.
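As a rough illustration of how such issues can be surfaced from raw metrics, the sketch below flags CPU throttling and out-of-memory kills for a single container. It assumes the inputs are increases over some time window of the cAdvisor counters container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total, plus an OOM kill count collected by whatever means is available. The 25% threshold, the dictionary layout, and the function names are assumptions for illustration, not how PerfectScale works.

```python
def throttle_ratio(throttled_periods, total_periods):
    """Fraction of CFS enforcement periods in which the container was throttled."""
    if total_periods == 0:
        return 0.0
    return throttled_periods / total_periods


def diagnose(container, throttle_threshold=0.25):
    """Return human-readable findings for one container's metric window."""
    findings = []
    ratio = throttle_ratio(container["throttled_periods"], container["total_periods"])
    if ratio > throttle_threshold:
        findings.append(
            f"CPU throttled in {ratio:.0%} of periods: consider raising the CPU limit"
        )
    if container["oom_kills"] > 0:
        findings.append(
            f"{container['oom_kills']} OOM kill(s): consider raising the memory limit"
        )
    return findings


# Assumed sample window for one container (not NOS data).
sample = {"throttled_periods": 300, "total_periods": 1000, "oom_kills": 2}
for finding in diagnose(sample):
    print(finding)
```

Note that both findings point toward adding resources, not removing them: rightsizing in this sense means matching resources to need in either direction.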
InfraFit provided smarter node selection, allowing Cluster Autoscaler to work more efficiently and in sync with user traffic. “We’re seeing the sine wave we wanted, scaling up and down perfectly with user traffic,” said Soares. Several main node pools now operate with zero percent idle resources, a benchmark that had not been possible before.
Beyond cost and performance gains, FinOps efficiency improved dramatically. By centralizing cost data in one place, PerfectScale eliminated the need for manual data stitching, saving two to three full days each month. “Now I walk into meetings knowing everything’s fine,” said Soares.
What's Next?
PerfectScale has moved beyond a platform engineering initiative to become part of NOS’s company-wide operations. Developers now use its recommendations to provision resources more appropriately from development to production. SREs rely on it for business-critical services. Platform engineers automate cluster and resource management. Financial controllers monitor budgets and track spend with greater accuracy. Management benefits from shared visibility, improved collaboration across teams, and more efficient staffing decisions.
For Soares, PerfectScale delivered not just results but renewed confidence. “I was specifically searching for a tool to help us try and reduce spend and help developers and SREs identify what applications were doing and what resources they effectively needed,” he said. “PerfectScale was the right mix of doing what I wanted, with risk and resiliency safeguards, and the pricing was right.”
NOS has rebuilt trust in optimization. What was once risky and error-prone is now safe, effective, and sustainable. With PerfectScale, NOS continues to cut costs, improve performance, and lay the foundation for even broader adoption of intelligent automation across the organization.