Modern Analytics — A Data Stack For Successful Reporting

Photo Credit: Suzanne D Williams

What exactly is a data stack? A data stack is the set of tools and processes used to collect, store, and analyze data in order to make better decisions. It is made up of many components that work together to build an end-to-end solution for reporting on your company’s most critical data.

These components traditionally include ETL (Extract, Transform, Load), which extracts raw data from multiple sources such as social media or portals, transforms it into useful formats such as tables with columns, and loads it into a data warehouse for analysis by business users using BI tools such as Tableau or Power BI. ELT (Extract, Load, Transform) uses the same three steps in a different order: raw data is loaded into the warehouse first and transformed there, usually with SQL. Because the warehouse does the heavy lifting and managed tools handle extraction and loading, businesses save time by not having to do all of the data preparation up front.
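To make the difference in ordering concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for a real warehouse, and the order records, table names, and helper functions are all made up for illustration. The ETL path cleans rows in application code before loading; the ELT path loads the raw rows as-is and cleans them with SQL afterwards.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", " Alice ", "19.99"),
    ("1002", "bob", "5.00"),
    ("1003", "ALICE", "30.01"),
]


def clean(row):
    """Transform step: normalise the customer name and cast the types."""
    order_id, customer, amount = row
    return int(order_id), customer.strip().lower(), float(amount)


def run_etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders_etl VALUES (?, ?, ?)", [clean(r) for r in RAW_ORDERS]
    )


def run_elt(conn):
    """ELT: load the raw rows untouched, then transform with SQL in the warehouse."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        """
    )


def totals(conn, table):
    """Total spend per customer, used to compare the two pipelines."""
    query = (
        f"SELECT customer, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse
run_etl(conn)
run_elt(conn)
print(totals(conn, "orders_etl"))
print(totals(conn, "orders_elt"))
```

Both pipelines should report the same per-customer totals. The practical difference is that the ELT path keeps `raw_orders` around, so a transformation can be changed and re-run later without going back to the source.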

“If a human operator needs to touch your system during normal operations, you have a bug. The definition of normal changes as your systems grow.” (Carla Geisser, Google SRE)

The Modern Data Stack

Data Ingestion

A data warehouse is only as good as the data it holds, and it is only valuable if you can actually get useful data into it. To make this happen, companies employ data ingestion tools. These extract the raw datasets from different sources and load them into a central location where they can be transformed by ELT or ETL processes. Popular data ingestion tools include Fivetran, Stitch, and Airbyte.

Fivetran: A managed pipeline service that connects your existing databases and applications to leading data warehouses, where visual analytics tools and other applications can use the data.

Stitch: Connects directly with more than 100 sources in the cloud or on-premise for fast, easy, secure access to all of your critical business data.

Airbyte: An open-source data integration tool, developed by the company of the same name, that uses prebuilt connectors to sync data from applications, APIs, and databases into one central location.

Pro Tip: You may need to use more than one data ingestion tool at the same time in order to get all your data sets into a single place — so don’t put too much effort into just choosing one when there’s no clear winner yet!
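As a rough illustration of what an ingestion tool does under the hood, the sketch below lands records from two sources in a single raw table. Everything here is an assumption made for the example: the two `extract_*` functions stand in for real connectors, the `raw_landing` table is a made-up name, and SQLite stands in for the warehouse. It is not how Fivetran, Stitch, or Airbyte are implemented.

```python
import json
import sqlite3
from datetime import datetime, timezone


def extract_crm():
    """Hypothetical source 1: a CRM export."""
    return [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]


def extract_social():
    """Hypothetical source 2: a social media API."""
    return [{"post_id": "p9", "likes": 12}]


# One entry per connector; a second ingestion tool would simply add more entries.
SOURCES = {"crm": extract_crm, "social": extract_social}


def ingest(conn, sources):
    """Land every record as raw JSON, tagged with its source and load time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_landing (source TEXT, loaded_at TEXT, payload TEXT)"
    )
    loaded_at = datetime.now(timezone.utc).isoformat()
    counts = {}
    for name, extract in sources.items():
        rows = [(name, loaded_at, json.dumps(record)) for record in extract()]
        conn.executemany("INSERT INTO raw_landing VALUES (?, ?, ?)", rows)
        counts[name] = len(rows)
    conn.commit()
    return counts


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse
print(ingest(conn, SOURCES))
```

Storing the payload as untouched JSON keeps the ingestion step simple, since sources with different shapes can share one landing table. Giving the data proper columns is left to the transformation step.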

Transformation

No self-respecting data engineer can describe the modern data stack without discussing data build tool, abbreviated as dbt. dbt is a tool that helps you transform your data into the exact format you need for reporting or analysis. It takes raw data that has already been loaded into the warehouse, transforms it, and generates curated datasets for use by analytics tools or machine learning models. You can then schedule these transformations to run monthly, weekly, daily, or even hourly, depending on your use case. What makes dbt so impressive is that the transformations are written as SQL statements. It’s used by companies like Spotify, Lyft, and Airbnb to clean up and prepare the data in their data warehouses.

Spotify: Uses dbt to transform its music data into a format it can use for reporting and analysis.

Lyft: Uses dbt to transform its ride data into a format it can use for reporting and analysis.

Airbnb: Uses dbt to transform its guest data into a format it can use for reporting and analysis.

The extracted data is saved centrally in a data warehouse (Redshift, Snowflake, BigQuery), where it is then transformed with the data build tool, dbt.
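The idea of building curated datasets purely from SQL statements can be sketched in a few lines of Python. This is not dbt itself, only an imitation of the pattern: each "model" is a named SELECT that gets rebuilt as a table. The model names, the ride data, and the `run_models` helper are hypothetical, SQLite stands in for the warehouse, and the models are listed in dependency order by hand.

```python
import sqlite3

# Hypothetical models: each name maps to a SELECT, listed in dependency order.
MODELS = {
    "stg_rides": """
        SELECT ride_id, LOWER(city) AS city, fare_cents / 100.0 AS fare
        FROM raw_rides
        WHERE fare_cents IS NOT NULL
    """,
    "rides_by_city": """
        SELECT city, COUNT(*) AS rides, ROUND(SUM(fare), 2) AS revenue
        FROM stg_rides
        GROUP BY city
    """,
}


def run_models(conn, models):
    """Rebuild each model as a table from its SELECT statement."""
    for name, select_sql in models.items():
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {select_sql}")
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse
conn.execute("CREATE TABLE raw_rides (ride_id INTEGER, city TEXT, fare_cents INTEGER)")
conn.executemany(
    "INSERT INTO raw_rides VALUES (?, ?, ?)",
    [(1, "Kampala", 1250), (2, "kampala", 800), (3, "Nairobi", None), (4, "Nairobi", 2000)],
)
run_models(conn, MODELS)
print(conn.execute("SELECT * FROM rides_by_city ORDER BY city").fetchall())
```

Because every model is dropped and rebuilt from the raw table, running the whole set on a schedule is safe to repeat. That is the property that makes hourly or daily runs practical.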

Operational Analytics

Once you have your data in a data warehouse, it’s ready for use by BI tools or machine learning models. But some companies also use their data warehouses for operational analytics, which is the process of using real-time data to make decisions about things like product pricing or inventory levels.

Netflix: Uses their data warehouse for operational analytics to make decisions about their content library.

Pro Tip: Beyond just storing the data, one of the biggest benefits of a data warehouse is that it makes querying your data extremely easy — which means you shouldn’t be afraid to ask any question that could possibly come up!
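Here is a small sketch of what an operational question about inventory levels might look like as a query. The tables, SKUs, and the seven-day threshold are invented for the example, and SQLite again stands in for the warehouse. The query flags items whose stock on hand would not cover a given number of days at the average daily sales rate.

```python
import sqlite3


def items_to_reorder(conn, days_of_cover=7):
    """Return SKUs whose stock will not cover `days_of_cover` days of average sales."""
    query = """
        SELECT i.sku, i.on_hand, s.avg_daily_units
        FROM inventory AS i
        JOIN (
            SELECT sku, AVG(units) AS avg_daily_units
            FROM daily_sales
            GROUP BY sku
        ) AS s ON s.sku = i.sku
        WHERE i.on_hand < s.avg_daily_units * ?
        ORDER BY i.sku
    """
    return conn.execute(query, (days_of_cover,)).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real warehouse
conn.execute("CREATE TABLE inventory (sku TEXT, on_hand INTEGER)")
conn.execute("CREATE TABLE daily_sales (sku TEXT, day TEXT, units INTEGER)")
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("mug", 20), ("tee", 200)])
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("mug", "2024-01-01", 4),
        ("mug", "2024-01-02", 6),
        ("tee", "2024-01-01", 10),
        ("tee", "2024-01-02", 10),
    ],
)
print(items_to_reorder(conn))
```

With this sample data only the mug is flagged: it averages 5 units a day, so 20 on hand covers fewer than 7 days. The answer is only as fresh as the last load into the warehouse, which is why operational analytics depends on ingestion running frequently.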

The modern data stack continues with how data is accessed and used. The shift from ETL to ELT means raw data is loaded first and transformed inside the warehouse, so much of the pipeline can be automated and teams save time on manual data prep. Data ingestion tools like Fivetran or Airbyte get the raw datasets into a central location, where a transformation tool like dbt turns them into curated datasets. Companies then use the data in their data warehouses for operational analytics (making decisions about product pricing or inventory levels) and feed it to BI tools and machine learning models to extract insights.

Key Takeaways: The modern data stack extracts data from multiple sources with ingestion tools, stores it centrally in a data warehouse, and transforms it there into curated datasets. Choosing ELT over ETL lets most of this pipeline be automated. The warehouse then serves operational analytics as well as BI tools and machine learning models.

I'd like to think of myself as someone who analyzes data, deduces meaning, and then threads it all together to create coherent visual narrative.

cengkuru michael
