€500K

in annual verified savings

>30%

Kubernetes workloads onboarded

100%

of shared cloud-native stack running under automation

Industry

Travel & Hospitality

Region

EMEA

Country

France

How SNCF used PerfectScale by DoiT to cut Kubernetes waste and increase reliability at scale

Meet SNCF

SNCF is one of Europe’s largest transportation groups, operating France’s national rail network and global mobility services through brands such as TGV, OUIGO, Eurostar, TER, Transilien, and Keolis. With over 270,000 employees and €40B+ in annual revenue, SNCF relies on high-availability digital systems to power ticketing, timetables, onboard services, and real-time operational logistics. Kubernetes underpins many of these services across hundreds of clusters running in mission-critical environments.

As part of a company-wide push toward digital modernization and efficiency, SNCF needed to control the escalating cost of running these clusters without sacrificing resilience. Rightsizing had been attempted manually through traditional observability tools (e.g., Datadog and Prometheus) and FinOps workshops; however, the approach could not scale across 200+ projects and up to 250 clusters, some of which hosted more than 1,000 workloads. Over-provisioning was widespread, as engineers understandably defaulted to safety in the face of uncertainty, particularly in production environments, where enforcing optimization carried a real risk of service disruption.

“Our challenge wasn’t just to reduce spend. We needed to reduce waste in a way that was safe for production services and sustainable at our scale.” Thomas Comtet, Senior Staff Engineer, SNCF

The Challenge

The platform team was caught between Kubernetes’ FinOps promises (strategic for adoption), developers’ limited risk tolerance (reliability, uptime, etc.), and increasing financial pressure from leadership. Manual rightsizing required tribal knowledge, lengthy meetings, and workload restarts, and every optimization effort had to start over each time ownership changed or engineers rotated. Datadog-based analysis often overestimated usage due to aggregation effects, leading to mistrust in the recommendations. The work was inconsistent, slow, and could never be completed across the full footprint.
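The case study does not specify which aggregation effects were involved. As one common illustration, the minimal sketch below (with entirely hypothetical sample data and function names) shows how rolling each pod up to its own peak and then adding the peaks overstates what the workload ever used at one time. The sum of per-pod peaks can never be lower than the peak of the summed usage, and it is much higher when pods spike at different moments.

```python
# Hypothetical CPU samples (millicores) for three replicas of one workload,
# taken at the same four timestamps. Each pod spikes at a different time.
samples = {
    "pod-a": [100, 900, 100, 100],
    "pod-b": [100, 100, 900, 100],
    "pod-c": [100, 100, 100, 900],
}


def sum_of_peaks(series_by_pod):
    """Roll each pod up to its own peak first, then add the peaks together."""
    return sum(max(series) for series in series_by_pod.values())


def peak_of_sum(series_by_pod):
    """Add the pods together per timestamp first, then take the peak."""
    totals = [sum(step) for step in zip(*series_by_pod.values())]
    return max(totals)


rolled_up = sum_of_peaks(samples)    # 900 + 900 + 900
concurrent = peak_of_sum(samples)    # highest total seen at any one timestamp

print(f"sum of per-pod peaks: {rolled_up}m")                   # 2700m
print(f"peak of summed usage: {concurrent}m")                  # 1100m
print(f"overestimate factor: {rolled_up / concurrent:.2f}x")   # 2.45x
```

In this toy data set, a recommendation built on the rolled-up figure would reserve roughly 2.45 times the capacity the workload ever consumed concurrently, which is the kind of gap that can erode trust in dashboard-derived numbers.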

SNCF needed a system that would:

  • Provide trustworthy, behavior-based rightsizing intelligence
  • Work safely in production environments
  • Reduce cost without compromising reliability
  • Operate continuously, not episodically
  • Scale across clusters and teams without friction

 

Following the 2023 Rugby World Cup and the 2024 Olympic Games, both held in France, SNCF was mandated to refocus on cost efficiency in digital operations without slowing modernization of its core rail platforms.

The Solution

Adopting PerfectScale by DoiT as a production-grade optimization control plane
SNCF discovered PerfectScale, now “PerfectScale by DoiT”, at KubeCon and began an engagement with the PerfectScale team. The key differentiator wasn’t another dashboard; it was the ability to generate risk-aware, in-place rightsizing recommendations that could be safely applied in live production environments. “What convinced us was that PerfectScale did not ask us to trust theory. It showed us exactly what could change without hurting stability.” Thomas Comtet

Moving from recommendations to automation with ArgoCD and CR-based control
PerfectScale by DoiT integrates via Custom Resources and ArgoCD so that Autopilot can be activated at the namespace level as a feature flag. SNCF established a standard Autopilot configuration and deployed it across non-production environments automatically, while allowing fine-grained overrides in edge cases. For production, SNCF introduced automation gradually, starting with the entire cloud-native stack (Datadog, Kyverno, KEDA, AWS Load Balancer Controller, Karpenter), and validated reliability before extending to application namespaces.
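The pattern described above (one standard configuration, a namespace-level on/off flag, and fine-grained overrides for edge cases) can be sketched in a few lines. This is an illustration of the layering idea only, not PerfectScale's Custom Resource schema or API: the `AutopilotPolicy` class, its headroom fields, the default values, and the namespace names are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AutopilotPolicy:
    """Hypothetical per-namespace policy; field names are illustrative only."""
    enabled: bool = False            # the namespace-level "feature flag"
    cpu_headroom_pct: int = 20       # assumed safety margin above observed CPU
    memory_headroom_pct: int = 20    # assumed safety margin above observed memory


# Assumed standard configuration rolled out to every non-production namespace.
STANDARD_NON_PROD = AutopilotPolicy(
    enabled=True, cpu_headroom_pct=10, memory_headroom_pct=15
)


def resolve_policy(namespace, environment, overrides):
    """Start from the environment default, then apply any per-namespace override."""
    if environment == "non-production":
        base = STANDARD_NON_PROD
    else:
        base = AutopilotPolicy()     # production stays off unless opted in
    return replace(base, **overrides.get(namespace, {}))


# Hypothetical edge cases: one memory-hungry namespace, one production opt-in.
overrides = {
    "batch-jobs": {"memory_headroom_pct": 40},
    "shared-stack-prod": {"enabled": True},
}

print(resolve_policy("web", "non-production", overrides))
print(resolve_policy("batch-jobs", "non-production", overrides))
print(resolve_policy("shared-stack-prod", "production", overrides))
```

Keeping the default in one place and expressing exceptions as small overrides is what lets a GitOps tool such as ArgoCD roll the same configuration out everywhere while still leaving room for gradual, opt-in adoption in production.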

Embedding optimization as governance, not a one-time project
Rather than running a finite rightsizing initiative, SNCF turned optimization into a continuous operating behavior, enforced automatically through governance with PerfectScale by DoiT. Because recommendations are grounded in observed workload behavior and enforced through automation, they are no longer debated but executed as policy.

Anticipating in-place resizing and future gains
The PerfectScale team anticipated that in-place resizing would remove the hidden cost of restart-based optimizations. Without PerfectScale, SNCF would have needed to custom-engineer this capability or train dozens of teams to apply it safely. By standardizing with PerfectScale by DoiT, SNCF avoided the engineering burden and accelerated its roadmap toward safer optimization in production.

The Results

Sustained cost savings while scaling 30% more workloads
Since adoption, SNCF’s Kubernetes usage increased by roughly 30% without increasing cloud cost. In other words, without PerfectScale by DoiT, the bill would have risen significantly to support those workloads. Instead, actual cloud billing in September 2025 was lower than in January 2025, despite the higher volume. “PerfectScale allowed us to grow capacity without growing cost. We effectively absorbed 30% more usage for free.” Thomas Comtet
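To put the 30% figure in per-unit terms, the short sketch below works through the arithmetic. It assumes, conservatively, that the bill stayed exactly flat (the case study reports it was actually lower), and the function name is illustrative. The counterfactual line additionally assumes cost would have scaled linearly with usage, which is a simplification.

```python
def unit_cost_change(usage_growth, cost_growth=0.0):
    """Relative change in cost per unit of usage, given growth in both."""
    return (1 + cost_growth) / (1 + usage_growth) - 1


# ~30% more usage on a flat bill (assumption: cost_growth = 0).
change = unit_cost_change(0.30)
print(f"cost per unit of usage: {change:.1%}")   # -23.1%

# Assumed counterfactual: the bill grows in step with usage.
counterfactual_bill = 1 + 0.30
print(f"bill if cost had tracked usage: {counterfactual_bill:.2f}x")   # 1.30x
```

Under those assumptions, absorbing 30% more usage on the same bill corresponds to a cost per unit of usage roughly 23% lower than before.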

Annualized savings are estimated at ~€500K, with the majority (~€350K) coming from non-production environments via automation. In production, SNCF chose to maintain conservative headroom policies, so automation still delivers a net benefit to resilience and optimizes for cost only when it is safe to do so.

Automation adoption across critical estate
SNCF currently activates automation on 45% of non-production namespaces, 1% of production namespaces, and 100% of the cloud-native stack in both environments. This means the most critical shared infrastructure across clusters is governed automatically. “The real impact is cultural. Engineers stopped guessing. Optimization is no longer a negotiation, it’s a governed, automated behavior embedded into how teams work.” Thomas Comtet

Governance and stability, not just savings
Cost reduction alone would not have been acceptable if it degraded operations. Instead, cluster stability improved as PerfectScale by DoiT redistributed resources based on demand curves and failure risk, thereby reducing the probability of CPU starvation while eliminating excess capacity. “The savings were real, but the stability gain is what got internal teams to trust it. We could optimize without fear.” Thomas Comtet

What's Next?

SNCF plans to continue expanding automation across additional non-production namespaces and gradually into selected production environments for early adopters. The team is also evaluating in-place pod rightsizing to further minimize restart-based disruptions and improve workload stability.

Beyond its own adoption roadmap, SNCF has become an active contributor to PerfectScale’s product evolution, regularly sharing feature requests, many of which have already been implemented. Recent enhancements, such as Java workload support and in-place pod rightsizing, were directly influenced by SNCF’s feedback and quickly adopted by their engineering teams.

This ongoing collaboration highlights not only the strength of the solution itself but also the responsiveness and partnership-driven approach of the PerfectScale team. “We proved it in production,” said Thomas Comtet. “Now we’re scaling what works and helping make it even better.”

Thomas Comtet, Senior Staff Engineer, SNCF
“PerfectScale by DoiT gave us what dashboards and meetings never could: a safe, automated way to reduce Kubernetes waste without risking uptime. It enables us to scale 30% more projects without increasing spend, while making optimization a built-in behavior rather than a manual effort. We now treat optimization as governance, not guesswork.”
