Case Study

Solidus Labs reduces Kubernetes resilience issues by 90%

Client: Solidus Labs
Industry: Financial Services & Insurance
Region: North America
Country: USA
Features: Kubernetes
Technologies: Amazon Elastic Kubernetes Service

90% reduction in issues raised

Meet Solidus Labs

Solidus Labs aims to enable safer crypto trading throughout the investment journey across all centralized and DeFi markets. As the founder of industry-leading initiatives, Solidus is deeply committed to ushering in tomorrow’s financial markets.

To support the rapid growth in the crypto markets and meet the ever-increasing demand from its clients, Solidus leverages Amazon Elastic Kubernetes Service as the foundation of its application infrastructure. To ensure its environment will scale as the company grows, Solidus utilizes the expertise and services of Develeap.

Develeap, one of the largest DevOps consultancies in Israel, was responsible for building the initial architecture and providing ongoing support and maintenance of the Solidus environment. This included setting up monitoring, observability, and alerting, as well as optimizing the environment’s costs.

Partnering with Develeap has allowed Solidus to scale its environment to a dozen multi-regional clusters, extending its services to clients worldwide.

The Challenge

Despite its robust DevOps foundation, Solidus Labs faced a recurring challenge: the inability to right-size pod resources. While tools like KEDA managed horizontal pod autoscaling, Solidus still encountered frequent CPU throttling and out-of-memory (OOM) issues that disrupted performance.

Its infrastructure had to adapt to constant change as releases happened hourly. Some clients sent massive data batches, while others required real-time processing, making predicting and meeting performance demands difficult.

Solidus Labs, in partnership with Develeap, spent countless hours fine-tuning resources manually. While this temporarily stabilized the environment, the gains were short-lived, and resource waste increased across smaller clusters.

The Solution

Sustaining resilience during rapid growth

Solidus Labs had already implemented several capabilities to keep its Kubernetes environment running smoothly and efficiently. But it wasn’t until it introduced PerfectScale by DoiT that it could comprehensively right-size its pod resources and address the root cause of recurring CPU throttling and out-of-memory (OOM) issues.

PerfectScale by DoiT helped Solidus navigate an infrastructure that’s in a constant state of change. “R&D is releasing changes on an hourly basis due to the nature of our business,” said Ben Hoffman, R&D Director at Solidus Labs. “Some of our clients send data in large batches, while others use us as a real-time service, making it hard to predict the load fluctuations on our services.”

By automating resource recommendations and scaling decisions, PerfectScale enabled the team to shift away from reactive, manual interventions. Previously, they spent hours stabilizing their largest cluster and replicating configurations across others—only to see results quickly degrade. With PerfectScale, that effort became unnecessary, and resource waste across smaller clusters was eliminated.

“I would jump into a Grafana and pull in metrics from Prometheus and logs from Logz.io, and make adjustments to the requests based on the different peaks of our environment,” said Shemtov Fisher, DevOps Engineer at Solidus Labs/Develeap. “Then a few weeks would pass, and we’d start seeing throttling and memory issues resurface, leading to a second round of adjustments. When I jumped in a third time, I knew we needed a solution in place to help automate this process. PerfectScale by DoiT is the exact solution we needed to fill this gap.”
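To make the manual process concrete, here is a minimal sketch of the kind of calculation an engineer might repeat by hand: take observed usage samples for a workload, pick a high percentile, and add headroom to arrive at a resource request. This is an illustration only and not PerfectScale's method. The function names, the 95th-percentile default, the 20% headroom, and the sample values are all assumptions made for the example.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_request(samples, pct=95, headroom=0.2):
    """Suggest a request: the pct-th percentile of observed usage plus headroom.

    pct and headroom are illustrative defaults, not recommended values.
    """
    return percentile(samples, pct) * (1 + headroom)


# Hypothetical CPU usage samples in millicores, including one batch-driven peak.
cpu_millicores = [120, 150, 180, 200, 900, 210, 190, 170, 160, 140]

median_based = recommend_request(cpu_millicores, pct=50, headroom=0.5)
peak_based = recommend_request(cpu_millicores, pct=100, headroom=0.0)
print(median_based, peak_based)
```

The gap between the median-based and peak-based values shows why a single manual adjustment tends to go stale: a request sized for typical load is throttled when a large batch arrives, while one sized for the peak wastes capacity the rest of the time.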

Improving Kubernetes stability by reducing CPU throttling and OOM issues by 90%

Shortly after implementing PerfectScale by DoiT, Solidus could proactively “right-scale” its pod resources, significantly reducing CPU throttling and OOM issues.

“We went from multiple issues a day, to maybe one or two issues in the last month,” said Hoffman. “With PerfectScale, we have seen over a 90% reduction helping us ensure our applications have the capacity to meet our customer demand.”

Additionally, PerfectScale has drastically reduced the mean time to resolution (MTTR) for capacity-related issues.

“Before PerfectScale, the DevOps team would get an alert when an issue occurred, then we would triage the issue to the proper service owner to resolve,” said Barak Arzuan, DevOps Engineer at Solidus Labs/Develeap. “Depending on the criticality, it could take hours or even more for the service owners to evaluate the issue and provide us with the proper resource requirements. With PerfectScale, we can immediately provide the service providers with evidence on why the issue is happening along with precise recommendations on how to resolve it. This has helped a lot with our day-to-day operations.”

No more continuous manual work for system health and cost-efficiency

Adding capacity to improve system resilience and stability comes with a price. To offset the extra cost, the team used PerfectScale’s cost-optimization capabilities to move unused resources to areas that needed additional capacity.

“In some of our clusters, we found significant cost savings opportunities,” explained Arzuan. “We were able to reinvest these savings into our clusters that were lacking resources. This resulted in a fully stable, resilient, and cost-effective environment with no impacts on our budget.”
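The reinvestment idea can be sketched as simple bookkeeping: compare what each cluster requests with what it needs, then check whether the surplus in overprovisioned clusters covers the shortfall elsewhere. The cluster names, core counts, and the `rebalance` helper below are hypothetical and chosen only to illustrate the arithmetic; they are not Solidus Labs data.

```python
def rebalance(clusters):
    """Split clusters into surplus and deficit maps.

    clusters maps a cluster name to (requested_cores, needed_cores).
    Returns two dicts: cores that could be freed, and cores still missing.
    """
    surplus = {}
    deficit = {}
    for name, (requested, needed) in clusters.items():
        diff = requested - needed
        if diff > 0:
            surplus[name] = diff
        elif diff < 0:
            deficit[name] = -diff
    return surplus, deficit


# Hypothetical figures: (requested CPU cores, CPU cores actually needed).
clusters = {
    "cluster-a": (40, 25),  # overprovisioned
    "cluster-b": (16, 22),  # resource-constrained
    "cluster-c": (8, 8),    # already right-sized
}

surplus, deficit = rebalance(clusters)
# A non-negative net means the freed capacity covers the shortfall.
net = sum(surplus.values()) - sum(deficit.values())
print(surplus, deficit, net)
```

When the net figure is zero or positive, the constrained clusters can be brought up to size without increasing the overall footprint, which is the budget-neutral outcome the quote describes.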

“We have a large number of clients, each using our application slightly differently. Keeping our Kubernetes environment optimized is essential for Solidus Labs to ensure our applications have the resources they need to support our customers today, and as our company continues to grow in the future,” said Hoffman. “PerfectScale is removing time-consuming manual tasks we have faced in the past, making it easy to continuously maintain our system’s health and cost-effectiveness.”

The Results

With PerfectScale, Solidus Labs transformed its Kubernetes environment from reactive troubleshooting to proactive optimization. By intelligently right-sizing resources, Solidus eliminated the majority of performance bottlenecks and reclaimed valuable engineering time.

  • 90% reduction in SLA-impacting issues, virtually eliminating CPU throttling and OOM errors
  • A significant drop in MTTR for capacity-related incidents, allowing engineers to resolve issues faster with actionable insights
  • Efficient reallocation of resources, with cost savings from overprovisioned clusters being reinvested into resource-constrained ones
  • Confident scalability, as Solidus is now equipped to support real-time and batch clients alike, even as demand grows

PerfectScale by DoiT provided the stability and flexibility Solidus Labs needed to scale confidently without sacrificing cost efficiency or developer velocity.

