
Vertex AI Vizier for fewer repetitions of costly ML training

  • Joshua Fox
  • Date: July 31, 2023

Part 1

You have an optimization process where each trial is costly in time or money. It might be Machine Learning (ML) training, where every run takes hours and tens of dollars, or A/B testing where each iteration could take a day and potentially thousands of dollars of lost revenue. Or it might be supply chain management for a factory, where choosing the day’s inputs for the lowest cost and greatest output is a gamble: A wrong guess can cost a day and a lot of money.

If a trial doesn’t come out well enough, you try again. But you don’t want to guess at parameters when each trial is costly; you want to converge on the best feasible parameters in as few trials as possible.

This series of three blogs will explain a new approach to reducing cost for ML training and other optimization processes: the Black Box optimization workflow with Google Vertex AI Vizier.

It’s essential to remember that Vizier is not just for Machine Learning. It does not even know you are doing ML; it doesn’t get involved in the trials themselves. Vizier only knows what you tell it: The parameters and the objective measurement (how well the trials came out). You are the expert, and you run the ML training or other process under optimization, while Vizier is an advisor that sits on the outside and gives you suggestions for the next round.

For ML training, you can use Vertex AI Vizier to help choose the best possible hyperparameters — those that configure the training run as a whole — such as regularization, learning rates, or the number or size of layers. Moreover, the type of model to be used in the next training run is itself a parameter to be explored: For example, whether a binary classification problem should be solved with Naive Bayes, Logistic Regression, or XGBoost.

Because Vertex AI Vizier is just advising on training from the outside, you can use it with any kind of ML. You could be using Google Vertex SDK to launch custom training jobs, or you could be training your own TensorFlow or PyTorch model inside a custom container.

How to use Vizier

You may be familiar with hyperparameter tuner libraries like those of Scikit and Hyperopt. These run repeated ML trials, tweaking the hyperparameters each time, trying to converge to the best result.
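The loop these libraries automate can be sketched in a few lines. The snippet below is a toy stand-in, not Scikit’s or Hyperopt’s actual API: it tries every combination in a grid and keeps the best, which is exactly why the number of costly trials grows so quickly.

```python
import itertools

def train_and_score(lr, layers):
    """Stand-in for a real training run; in practice this is the costly step."""
    # Toy objective that peaks at lr=0.01, layers=3 (purely illustrative).
    return -((lr - 0.01) ** 2) - ((layers - 3) ** 2)

# What tuner libraries automate: try every combination, keep the best.
# "Every combination" means many costly trials.
grid = {"lr": [0.001, 0.01, 0.1], "layers": [2, 3, 4]}

best_params, best_score = None, float("-inf")
for lr, layers in itertools.product(grid["lr"], grid["layers"]):
    score = train_and_score(lr, layers)
    if score > best_score:
        best_params, best_score = (lr, layers), score

print(best_params)  # (0.01, 3)
```

Nine trials for a tiny two-parameter grid; a real search space is far larger, which is what motivates smarter suggestion strategies.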

However, a hyperparameter tuner service like those of Google and AWS radically simplifies the process, as it eliminates the need to set up the libraries and the compute infrastructure.

Vertex AI Vizier takes this one step further with the Black Box process that I will describe in this series. The basic difference is the usage pattern: With Vizier you run interactive trials, in which you ask Vertex AI Vizier for advice on each round, and it sends back suggestions, based only on how good the results were on previous trials.

The process

These are the high-level steps for implementing the workflow with Vertex AI Vizier:

  1. First, create a Vertex AI Vizier study, in which there will be a number of trials. The study is defined with a study configuration, which states the goals (the measurements/metrics) and input values (hyperparameters) of your experiments, called trials.
  2. Run a trial: training a Machine Learning model or running another process you need to optimize.
  3. Invoke Vertex AI Vizier, by sending the parameters along with the optimization measurement from that trial. In the case of training an ML model, that metric might be, for example, cross-entropy loss or balanced accuracy on your validation data, after the usual train/validation/test split.
  4. The response to that request is suggestions for the next trial. This includes one or more sets of parameters, which you can use in defining an upcoming trial.
  5. Run the next trial, typically based on the suggested parameters. But they are just suggestions, and you can use your own choice of parameters. Either way, Vertex AI Vizier will keep learning.
  6. Keep going with the next trial.
  7. Stop iterating after you have done enough trials (details on that to follow).
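The steps above amount to a loop. The sketch below shows its shape in self-contained Python; the hypothetical `suggest` function is a stand-in for the Vizier service (here just random search, whereas Vizier would apply Bayesian optimization to the reported history).

```python
import random

random.seed(0)  # make the sketch deterministic

def run_trial(params):
    """Step 2/5: your costly step. Train a model (or bake cookies), measure it."""
    # Toy objective standing in for, say, validation accuracy.
    return 1.0 - abs(params["learning_rate"] - 0.05)

def suggest(history):
    """Stand-in for Vizier's suggestion call (steps 3-4). Vizier would use
    Bayesian optimization over `history`; here we just sample randomly."""
    return {"learning_rate": random.uniform(0.001, 0.1)}

history = []  # (params, measurement) pairs reported back after each trial
for _ in range(10):  # step 7: stop after enough trials
    params = suggest(history)        # ask for the next suggestion
    measurement = run_trial(params)  # run the trial
    history.append((params, measurement))

best = max(history, key=lambda pm: pm[1])
print(best)
```

With the real service, `suggest` and the reporting of `history` become API calls to Vertex AI Vizier; the loop structure stays the same.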

[Figure: Flow chart of interaction with Vertex AI Vizier]

Vertex AI Vizier is software-as-a-service. You just call a REST service through a convenient Python API, which I will discuss later.

What an API Call Looks Like

Here is an example of the input and output. It’s a bit simplified, but it presents the essence of the REST invocation. (In practice, you would use a Python client, which we discuss later.)

Input:

{
  "params": {"flour": 0.21, "recipe": "Pam's chocolate chip", "eggs": 2},
  "rating": 8.1
}

Output:

{"flour": 0.32, "recipe": "Chocoflake", "eggs": 1}

This JSON is an example of a cookie-recipe optimization, which was actually carried out by the Vizier research team at Google: Cookie baking is so expensive and time-consuming that you don’t want to do it hundreds of times!

The parameters include the amounts per ingredient: We have a float and an integer parameter. Another parameter is the choice of recipe, which is a categorical parameter (one that has a few options, with no ordering between them).

The input also includes the success measurement, which is the taste-testers’ rating of the cookies.

The output, the response from the REST call, is a suggestion (in practice, potentially multiple suggestions) for the parameters to use on the next round of baking.
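These three parameter types map directly onto Vizier’s study configuration. Below is a sketch of what a study spec for the cookie example could look like; the field names follow the Vertex AI Vizier REST API’s StudySpec, but the ranges and values are illustrative rather than a verbatim request body.

```python
# Illustrative study configuration for the cookie example.
study_spec = {
    "metrics": [
        # The taste-testers' rating, which we want to maximize.
        {"metric_id": "rating", "goal": "MAXIMIZE"},
    ],
    "parameters": [
        # A float parameter: amount of flour.
        {"parameter_id": "flour",
         "double_value_spec": {"min_value": 0.1, "max_value": 0.5}},
        # An integer parameter: number of eggs.
        {"parameter_id": "eggs",
         "integer_value_spec": {"min_value": 1, "max_value": 3}},
        # A categorical parameter: which recipe, with no ordering between options.
        {"parameter_id": "recipe",
         "categorical_value_spec": {
             "values": ["Pam's chocolate chip", "Chocoflake"]}},
    ],
}

print(sorted(p["parameter_id"] for p in study_spec["parameters"]))
# ['eggs', 'flour', 'recipe']
```

You create the study once with a spec like this, and every subsequent suggestion request refers back to it.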

Running the trials in a study

You can do the trials iteratively, but you can also run multiple trials in parallel. You ask Vertex AI Vizier for multiple suggestions, say five, and then simultaneously run five trials. Parallelism has a trade-off: On the one hand, it speeds things up, but on the other hand, there’s an advantage to doing trials iteratively, because then Vizier can improve its recommendations for each trial based on all previous ones.

The number of trials can be predetermined, say twenty: The recommendation is ten times the number of parameters. Alternatively, Vizier can suggest when it is time to stop, based on whether results are converging. This frugality in the number of trials is important, since each is costly.

Where we go from here

Recently, in February 2023, Vizier became available in an open-source variant, and I recommend that you explore that implementation. However, the open-source version does not contain the core Bayesian optimization algorithm, and the Vertex AI Vizier service provides ease of use, scalability, and robustness. In this blog post series, we will focus only on Vertex AI Vizier.

In this first blog post, we provided a summary of how to use Vertex AI Vizier. The second blog post will explain the advantages of this “Black Box” approach: By leaving responsibility for the heavyweight optimization processes with you, you do what you do best, such as structuring your ML training, or other costly processes. Meanwhile, Vertex AI Vizier does what it does best: Guiding you to get the best outcome in the smallest number of trials.

Next: When should you choose Vertex AI Vizier rather than a hyperparameter tuner?

See Part 2: “The advantages of the Black Box approach when you’re optimizing slow, costly processes.”
