Cloud Masters Episode #105
Five data pipeline transgressions costing you money
Covering five costly mistakes data engineers make when building their data pipelines, and what you should be doing instead.



Episode notes

Jon Osborn, Field CTO at Ascend.io, joined us to share some of the costliest mistakes he sees data engineers making when building their cloud data pipelines.

Key Moments

3:46: [Transgression #1] Overpaying for data ingestion
11:13: [Transgression #2] Using Spark with Snowflake, rather than Snowpark
15:10: [Transgression #3] Not using partitioning
24:34: [Transgression #4] Re-running the whole pipeline every time
28:13: [Transgression #5] Using the same-sized warehouse for every workload
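Transgressions #3 and #4 above (skipping partitioning, and re-running the whole pipeline every time) can be sketched in plain Python. This is an illustrative sketch, not code from the episode: the function names and record shape are assumptions, but the idea matches the discussion — group data into partitions and reprocess only the partitions you haven't already handled.

```python
# Illustrative sketch (not from the episode): partition records by date and
# reprocess only new partitions, instead of re-running the whole pipeline.
from collections import defaultdict


def partition_by_date(records):
    """Group records into partitions keyed by their date field."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[rec["date"]].append(rec)
    return dict(partitions)


def incremental_run(partitions, processed_dates, transform):
    """Process only partitions not seen in a previous run."""
    results = {}
    for date, recs in partitions.items():
        if date in processed_dates:
            continue  # skip work already done -- no full-pipeline re-run
        results[date] = [transform(r) for r in recs]
        processed_dates.add(date)
    return results


records = [
    {"date": "2024-01-01", "amount": 10},
    {"date": "2024-01-01", "amount": 5},
    {"date": "2024-01-02", "amount": 7},
]
partitions = partition_by_date(records)
processed = {"2024-01-01"}  # partition already handled in a previous run
new_results = incremental_run(partitions, processed, lambda r: r["amount"] * 2)
print(new_results)  # only the 2024-01-02 partition is processed
```

In a real warehouse the same pattern shows up as partition pruning plus incremental models, where the "processed" set is tracked by the orchestration layer rather than an in-memory set.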

About the guests

Jon Osborn
Jon brings over 20 years of experience in data and technology to his role as Field CTO at Ascend.io. Drawing on deep executive and enterprise architecture experience, he works with customers across the healthcare, insurance, and retail sectors to automate data processing in new ways. He collaborates with the engineering team to build a data platform that takes on the hard infrastructure and process work so engineering teams can focus on the code that matters most, and he contributes to the product vision, roadmap, and backlog, ensuring that customer feedback becomes actual features.
Matthew Richardson
Matthew Richardson is a Senior Cloud Architect at DoiT International specializing in Data & Analytics, with certifications across all major cloud providers (Google Cloud, AWS, Azure) as well as Snowflake, dbt, Python, Tableau, and SAS. He has deep experience with a wide range of data engineering, ETL, BI reporting, and programming tool sets, including Matillion, Talend, Cognos, Teradata, and Business Objects. Matthew currently helps customers optimize their data modelling strategies, particularly in BigQuery, Redshift, and Snowflake, ensuring they get the most out of their data on a consistent basis.

Related content

Cloud Data Pipeline Bake-Off: Ascend.io versus dbt
Evaluating two data transformation tools used to build cloud data pipelines, head to head.
Leverage Malloy and Looker for a Unified, Future-Proof Data Warehouse
SQL has downsides that limit collaboration around analyzing complex datasets. Here’s how Malloy addresses SQL’s faults to help you operate at scale.
BigQuery — keep data fresh while avoiding large-scale mutations
We demonstrate how to keep your data fresh and updated while avoiding large mutations.