5 Key Ingredients for a Successful Cloud Journey

I spent the majority of my career in startup companies, where you are often running on a thin budget, understaffed, and facing a looming deadline. It’s very easy to succumb to the pressure to cut corners in order to deliver something today, but the price is that in a couple of years, you may end up with such a technological mess that it will make you hate what you do and burn you out.

Unfortunately, I speak from experience here. I once did the job of three people, and after three years at that pace, I got to the point where I cared little about anything and just wanted to quietly stare at the ceiling the whole day.

Don’t worry, that was a while ago, and I have fully recovered since. Moreover, with diligence I managed to avoid the same thing happening to me again in my next career iterations. Over time, I distilled several key ingredients that keep my drive to innovate revving high in the long run.

Now, I’m fortunate to have a role where I can share my experience and help others bring their dreams into action.

This is my story and my experience. You surely have your own, and I’m keen to hear it.

The dream team

Imagine a software development team of 5–10 people that

  • Keeps releasing new features in a timely manner
  • Knows about downtimes before its customers do
  • Fixes bugs quickly, where “fixed” means fixed in production, not in the git main branch
  • Has the capacity to research and branch into new technologies

And manages to do this for years without running out of steam!

Sounds too good to be true? — Let’s find out. Here is how I would start a fresh project today. It will be a cloud-based project, GCP in my case.

1. Know your premises

Security is often overlooked. Developers consider it hard and study it just enough to get past minimal cloud defaults to start building stuff. After all, your customers do not buy security — they buy features; and in their eyes, the security of the product is a “given”.

Unfortunately, such an approach quickly leads to bad habits — “security keys in public git,” anyone? : )

Key points:

  • Maintain a security baseline within the team with regard to the technology you use. For example, if starting on a new cloud, make sure that everyone (and not just the DevOps folk) knows the basics. With GCP, for example, an average developer can get accustomed to the GCP security model in several hours.
  • Have a go-to person to mentor the team on security best practices. If you don’t have such a person within your team, Google, in our example, has partners who are paid to make sure you always have a sounding board on the subject.

Remember the clouds’ shared responsibility model — they provide secure building blocks, but it’s up to you to compose them in a secure way.

Once you understand your security premises, you can start architecting.

2. Architect but be real

We all dream of our companies becoming the next Twitter, Uber, or Google. But those didn’t become global-scale companies from the get-go.

The challenge is to start small but always have space to grow. To achieve that, I suggest having a small task force within your team that is in charge of making sure the team doesn’t succumb to tactical thinking, which happens naturally when you get into a “churning out features” cadence. This task force will be on guard that the right trade-offs are made, for example:

  • I once developed a batch-job manager for the Elasticsearch database. Implementing it in an active/active, scale-out manner would have taken 10x the effort. A singleton version, on the other hand, could handle 100x load growth, and it was fine for it to go offline for 10–15 minutes due to a bug or maintenance.
    In this case, we consciously chose to have a single point of failure in our setup because it was good enough for our business needs.
  • As a strategy, you may want to develop a product that is cloud-agnostic. You can choose Kubernetes (K8s) as your application platform, and that alone achieves ~80% of your goal. However, you still interface with your cloud provider’s services outside K8s, e.g. load balancers and message queues, and it will be up to your architecture task force to make sure that no provider-unique service is chosen in the heat of a deadline. And when it does happen, it’s a conscious, well-documented choice.
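One way to keep provider-specific services behind a seam is to have application code depend on a small interface of your own rather than on a cloud SDK. The following is a minimal sketch, not a prescribed design: `MessageQueue`, `InMemoryQueue`, and `notify_device_offline` are hypothetical names invented for illustration, and a real adapter would wrap whichever queue service you actually pick.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class MessageQueue(ABC):
    """Hypothetical provider-neutral interface the application codes against."""

    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> None:
        """Append a message to the given topic."""

    @abstractmethod
    def pull(self, topic: str) -> Optional[bytes]:
        """Return the oldest message on the topic, or None if it is empty."""


class InMemoryQueue(MessageQueue):
    """Stand-in backend for tests; a cloud adapter would implement the same API."""

    def __init__(self) -> None:
        self._topics = {}  # topic name -> deque of payloads

    def publish(self, topic: str, payload: bytes) -> None:
        self._topics.setdefault(topic, deque()).append(payload)

    def pull(self, topic: str) -> Optional[bytes]:
        queue = self._topics.get(topic)
        return queue.popleft() if queue else None


def notify_device_offline(queue: MessageQueue, device_id: str) -> None:
    # Application logic only sees the interface, never a provider SDK.
    queue.publish("alerts", f"device {device_id} offline".encode())
```

With a seam like this, choosing a provider-unique service becomes a visible decision: it shows up as a new adapter class that the task force can review and document.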

Your more senior tech leads will naturally assume this role, but it’s best to facilitate it more formally and make sure their voices are being heard. Post-mortem discussions starting with “I told you half a year ago it will shatter” are a sign that your architecture task force needs fine tuning. “Trusting teams” is the theme.

For the architectural task force to succeed, and for the whole team to benefit, it’s crucial that Product Owners and management trust that developers are acting with the best intentions. Without that trust, even accountable developers tend to succumb to feature pressure quite easily, which quickly deteriorates into “I told you…” mode.

3. Start your coding with the CI/CD part

The above may sound a bit radical but I hope I captured your attention. The classic pitfall is “features now, their tests later”. Under deadline pressure, tests are usually the first item to get slashed.

Let me share a personal story. I had a project where we had a great architecture with only one piece missing: we invested zero time in thinking about how we would test our product. By our fifth scrum sprint, we realized that we were constantly spending 2–3 days before each sprint review in an “integration frenzy”, and that it was getting worse from sprint to sprint. Our second mistake was that we could not pull out of the feature frenzy and clearly communicate to product management that we needed to go back to the drawing board and adjust the design with regard to CI (see the “trusting teams” notes in the previous section). A year into the project, we ended up with a bolt-on CI that broke more often than it worked and was a constant source of frustration for the developers. This big ball of mud got so big that even given carte blanche on investment to fix it, we couldn’t agree within the team on the best way to rearchitect it.

Two things happen when you incorporate both CI and CD into your core architecture:

  • You will start asking yourself, “How do I know that the system is working?” This will naturally lead you to start instrumenting your code to obtain operational metrics.
  • When you get to debug nightly CI failures, you will develop both the skills and the tools (e.g. proper logs and tracing data collection) to analyze and fix problems that have already happened. These are exactly the kind of skills and techniques your team will need to debug production outages.
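To make “instrumenting your code” concrete, here is a minimal sketch using only the Python standard library. The `instrumented` decorator, the `COUNTERS` and `DURATIONS` stores, and the `parse_reading` function are all hypothetical names for illustration; in a real project you would export these numbers to the monitoring backend you have chosen rather than keep them in process memory.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-process metric stores (assumption: a real setup would
# ship these values to an external monitoring system).
COUNTERS = defaultdict(int)
DURATIONS = defaultdict(list)


def instrumented(name):
    """Count successes and errors, and record the duration of each call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                COUNTERS[f"{name}.errors"] += 1
                raise  # never swallow the error, only count it
            else:
                COUNTERS[f"{name}.ok"] += 1
                return result
            finally:
                # Runs for both outcomes, so every call gets a duration.
                DURATIONS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("parse_reading")
def parse_reading(raw: str) -> float:
    # Illustrative business function: parse one sensor reading.
    return float(raw)
```

The same counters answer both questions from the list above: CI can assert on them after a test run, and production dashboards can plot them.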

As a side effect, you will have logging and monitoring systems in place even before going to production. One warning about monitoring and logging frameworks: their operating cost, be it SaaS or roll-your-own, can become significant very quickly. I strongly recommend estimating potential costs and factoring them into your decision.

Only once you have your CI practice established to your satisfaction can you occasionally indulge in “features now, their tests later”. And by “later” I mean “next sprint”, unless this is disposable, one-off demo functionality that you are going to disable with feature flags.

Each cloud has its own native monitoring and logging services that may or may not be the best choice for you. With K8s, the CD part is mostly streamlined; the CI part, however, will require good research on your side. Each of the big 3 cloud hyperscalers provides CI tools, but frankly, it’s not their bread and butter.

4. You build it — you run it

“You build it — you run it” is a given in small companies since there is simply no SRE fence to throw the stuff over. So the question is not who owns the production but rather what’s the experience of owning the production workload.

The beauty is that if you have catered for the previous steps, production ownership will be very natural for your team. Let’s see how:

  • You started with security, so you must already have perimeter separation in place. You know where to look for issues, and situations where development code accidentally connects to the production database and wreaks havoc are simply off the table.
  • Your architecture is real and fits your team. It’s clear and easy to understand. E.g. you don’t maintain complex distributed systems unless you have to and can give them a run for their money. Armed with your observability tools, you can find problem sources quickly.
  • Your CI is stable and trustworthy, and your CD is smooth. Hence, when you have a bug to fix, the test and deploy processes are quick, easy, and free of cognitive load and nerve-racking surprises. This minimizes the drudgery caused by bugs and prevents the situation of Continuous Maintenance.

Your applications already have observability built in, so all you need to do is configure monitoring dashboards and set up alerts.

A less obvious aspect of production ownership, particularly for developers, is the cost of running your product. I personally believe developers should be knowledgeable about cloud billing models and about how much money their software consumes, for the following reasons:

Avoid bill surprises

You need to understand the cloud billing model to properly design your system (bill shock, anyone?) — both the application architecture and CI/CD architecture. To close the loop, set up billing budgets and alerts.
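The logic behind a budget alert is simple enough to sketch. The snippet below is an illustration of the idea only and does not call any cloud billing API: `crossed_thresholds` is a hypothetical helper, and the 50%, 90%, and 100% thresholds are assumed values you would tune to your own budget.

```python
def crossed_thresholds(spend, budget, thresholds=(0.5, 0.9, 1.0)):
    """Return the budget fractions that the current spend has reached.

    spend and budget are in the same currency; thresholds are fractions
    of the budget (assumed defaults, not taken from any provider).
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend / budget
    return [t for t in sorted(thresholds) if used >= t]
```

For example, with a monthly budget of 500, a spend of 460 has reached the 0.5 and 0.9 thresholds, so a notification would fire for each threshold not yet reported.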

Know your Unit Cost

It’s beneficial for product and sales people to know your “unit cost”. For example, if your product monitors edge devices, then the “unit cost” is how much each monitored device contributes to your cloud bill. The unit cost figure is very useful in developing business models and taking guesswork out of capacity planning.

With your application metrics in place, it’s usually easy to attribute your app logic to costs through a deeper look at your billing data.
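As a worked example of the unit cost idea, here is a small sketch. It assumes you can split the bill into a fixed part (costs that do not change with the number of units) and a variable part; that split, the function names, and the figures in the usage note are all hypothetical.

```python
def unit_cost(monthly_bill: float, fixed_cost: float, units: int) -> float:
    """Variable cloud cost attributable to one unit, e.g. one monitored device."""
    if units <= 0:
        raise ValueError("units must be positive")
    return (monthly_bill - fixed_cost) / units


def projected_bill(fixed_cost: float, cost_per_unit: float, units: int) -> float:
    """Estimate the monthly bill for a planned number of units (linear model)."""
    return fixed_cost + cost_per_unit * units
```

With an illustrative bill of 1200, of which 200 is fixed, and 500 monitored devices, `unit_cost(1200, 200, 500)` gives 2.0 per device, and `projected_bill(200, 2.0, 5000)` estimates 10200.0 for ten times as many devices. The linear model is a simplification; tiered pricing and volume discounts would bend the curve.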

5. People work for people

We love our bits, bytes, and formal logic, but in the end, we are a bunch of humans working together.

We join startups because we believe in something and are enthusiastic about it.

Managers, trust that your developers are acting with the best intentions. Challenge their ideas but leave them space to innovate. Watch carefully for signs that your devs are going into defensive mode — from there, every feature suddenly becomes complex to implement and your project starts to stagnate.

We can talk about clouds and technology all day, but if work is not enjoyable, innovation won’t last.
