How Rapyd solved observability gaps to cut Kubernetes costs by 40%
Meet Rapyd
Rapyd is a leading fintech company that unifies global payment technologies and networks on a single platform. The company enables businesses to accept, process, and disburse payments in more than 100 countries and to send payouts in over 190 countries.
By integrating a wide variety of payment methods, card issuance (both virtual and physical), foreign exchange, and money management capabilities, the Rapyd platform enables fast, secure, and seamless global commerce and provides a foundation for innovation and scale in fintech.
The Challenge
Rapyd set out to migrate from AWS EC2 to EKS to optimize performance and efficiency—without compromising the resilience essential to fintech operations. Kubernetes was already central to Rapyd’s infrastructure, supporting more than 15 clusters and enabling the speed and agility needed to innovate at scale.
To ensure a smooth migration, the CI/CD, Infrastructure, and SRE teams initially over-provisioned resources across nodes, pods, and clusters. While this safeguarded stability, it also limited the elasticity and operational efficiency Kubernetes could offer. Existing observability tools lacked the actionable depth to support precise optimization, leaving potential efficiency gains on the table.
To fully realize their performance and efficiency goals, Rapyd needed a solution that could proactively detect and safely right-size Kubernetes resource consumption with actionable, granular insights.
The Solution
Under the leadership of DevOps Team Leader Boris Isakov, Rapyd turned to PerfectScale by DoiT for a data-driven, scalable approach to Kubernetes optimization.
Smarter Optimization, Built-In
PerfectScale was introduced early in the service lifecycle, allowing new services to baseline for a few days before initiating optimization. This workflow, combined with PerfectScale’s tools like PodFit for workload rightsizing and InfraFit for node optimization, enabled Rapyd to quickly achieve safer, data-backed cloud resource utilization.
“The initial implementation of PerfectScale to our global environments was a revelation. It exposed the vast extent of our resource waste [and] significant deviations from Kubernetes best practices,” said Boris Isakov. “With PerfectScale, we are now on a path to efficient scaling and optimization.”
AI-Driven Performance Gains
Beyond spend reduction, PerfectScale enhanced the performance and resilience of Rapyd’s platform. Data-driven and infrastructure-focused recommendations empowered the team to safely fine-tune resource requests and limits to improve performance while reducing spend.
“We understood that it’s not all about the cost. It’s about the performance of the cluster,” Boris shared. “The cost is going down, and performance is getting better.”
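To make the idea of fine-tuning requests and limits concrete, here is a minimal sketch of percentile-based rightsizing from baseline usage samples. It is not PerfectScale's algorithm or Rapyd's configuration: the function names, the 90th-percentile choice, the 20% headroom, and the sample values are all assumptions chosen for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of integers (pct is 1..100)."""
    ordered = sorted(samples)
    rank = max(1, ceil_div(pct * len(ordered), 100))
    return ordered[rank - 1]


def recommend_resources(cpu_millicores, memory_mib, pct=90, headroom_pct=20):
    """Suggest Kubernetes requests/limits from baseline usage samples.

    Hypothetical policy: requests cover the pct-th percentile plus headroom,
    and the memory limit covers the observed peak plus headroom.
    """
    scale = 100 + headroom_pct
    cpu_request = ceil_div(percentile(cpu_millicores, pct) * scale, 100)
    mem_request = ceil_div(percentile(memory_mib, pct) * scale, 100)
    mem_limit = ceil_div(max(memory_mib) * scale, 100)
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": f"{mem_request}Mi"},
        "limits": {"memory": f"{mem_limit}Mi"},
    }


# Illustrative baseline samples (made up, not Rapyd data).
cpu = [110, 120, 130, 140, 150, 160, 170, 180, 190, 400]
mem = [300, 310, 320, 330, 340, 350, 360, 370, 380, 500]
print(recommend_resources(cpu, mem))
```

The sketch sets only a memory limit; whether to also cap CPU is a separate policy decision that depends on the workload.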
PerfectScale’s advanced alerting system proactively identified and prioritized issues before they impacted end users. Integrated with Slack, severity-based alerts highlighted critical problems, like out-of-memory errors, that were previously missed by other observability tools.
“We got alerts from PerfectScale that we did not get from our other solutions. These allowed us to address issues proactively across our entire environment – averting issues before they affected our customers.”
This proactive approach ensured business continuity in a high-stakes industry where milliseconds matter.
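Severity-based alerting of the kind described above can be sketched roughly as follows. This is an illustrative assumption, not PerfectScale's alerting logic: the severity levels, alert fields, and workload names are hypothetical, and the only external convention relied on is that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
import json

# Hypothetical severity scale; lower number means more urgent.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def prioritize(alerts, min_severity="high"):
    """Keep alerts at or above min_severity, most urgent first."""
    threshold = SEVERITY_ORDER[min_severity]
    selected = [a for a in alerts if SEVERITY_ORDER[a["severity"]] <= threshold]
    return sorted(selected, key=lambda a: SEVERITY_ORDER[a["severity"]])


def slack_payload(alert):
    """Build the JSON body for a Slack incoming-webhook message."""
    text = f"[{alert['severity'].upper()}] {alert['workload']}: {alert['issue']}"
    return json.dumps({"text": text})


# Made-up alerts for illustration only.
alerts = [
    {"severity": "low", "workload": "batch-report", "issue": "CPU over-provisioned"},
    {"severity": "critical", "workload": "payments-api", "issue": "OOMKilled"},
    {"severity": "high", "workload": "fx-rates", "issue": "CPU throttling"},
]

for alert in prioritize(alerts):
    print(slack_payload(alert))
```

The payload would then be sent with an HTTP POST to the team's webhook URL, which is omitted here.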
A Trusted Partnership
Throughout the process, the PerfectScale team delivered hands-on guidance and deep Kubernetes expertise, which was key in helping Rapyd adopt and scale best practices across teams.
“The PerfectScale team was very professional, helped explain the features, and helped guide us through the optimization process.”
The Results
Rapyd’s collaboration with PerfectScale is driving a smarter, more resilient Kubernetes strategy that balances performance, efficiency, and scalability.
Key Outcomes:
- 35–40% reduction in EKS spend
Optimization led to significant savings through data-backed resource allocation.
- Workload rightsizing & node optimization
PodFit and InfraFit helped Rapyd fine-tune clusters with precision and confidence.
- Improved platform performance
Adjustments based on PerfectScale’s insights enhanced platform performance and service reliability.
- Advanced alerting system
Critical issues were resolved faster with severity-based Slack alerts, often before service impact.