Storytelling with Data using Databricks


Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You'll discover the power of storytelling and the way to make data a pivotal point in your story. The lessons in this illuminative text are grounded in theory but made accessible through numerous real-world examples - ready for immediate application to your next graph or presentation.


This book provides a comprehensive introduction to data visualization and is particularly oriented toward professionals in business and related fields who must present quantitative information in an accessible, insightful manner. Here's a broad overview of its content:

  1. Understanding the Importance of Context: The book starts with a discussion on understanding the context before diving into data. It emphasizes the need to know your audience and tailor your message accordingly.
  2. Choosing an Effective Visual: The author, Cole Nussbaumer Knaflic, covers common types of data visualizations and when to use each type.
  3. Eliminating Clutter: The book discusses the concept of decluttering your visualizations to make them more effective. This includes removing unnecessary decorations and simplifying color schemes.
  4. Focusing Attention Where You Want It: This section talks about using pre-attentive attributes, like color and position, to guide your audience's attention to key parts of your data.
  5. Thinking Like a Designer: Knaflic applies principles from graphic design to data visualization, including the use of whitespace and typography.
  6. Telling a Story: The final part of the book talks about how to weave your data into a compelling narrative.


Exploratory vs. explanatory analysis

Exploratory analysis delves into data to uncover patterns and form hypotheses, primarily for the analyst's understanding, while explanatory analysis communicates specific insights to others, often weaving data into a compelling narrative to guide decision-making.

Comparing the two side-by-side

[Image: side-by-side comparison of exploratory and explanatory analysis]
Apologies for using an image; LinkedIn Articles have some limitations.




Applying best practices from the book using Databricks

With Databricks, you can get a long way, but you will always remain in the realm of exploratory analysis. Here’s an example that shows sales numbers by month, with historical data from the two previous years and a forecast.

The result

[Image: line chart of monthly sales numbers for 2021, 2022 and 2023, including the latest estimate]

The settings

  • A line chart with the months on the x-axis, the metrics on the y-axis, and the year as the group by
  • Two metrics: sales_numbers and sales_numbers_latest_estimate
  • Legend placed at the bottom
  • Missing and null values are not displayed

This results in 6 series: one for each combination of year (2021, 2022 and 2023) and metric. We correct this later on.
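To see where the six series come from, here is a minimal Python sketch. The years and metric names are taken from the settings above; the variable names and the "year - metric" label format are assumptions for illustration and not what Databricks generates internally.

```python
from itertools import product

# Years used as the group by, and the two metrics plotted on the y-axis.
years = [2021, 2022, 2023]
metrics = ["sales_numbers", "sales_numbers_latest_estimate"]

# Grouping by year with two metrics yields one series per (year, metric) pair.
# The label format below is hypothetical.
series = [f"{year} - {metric}" for year, metric in product(years, metrics)]

for name in series:
    print(name)
```

Three years times two metrics gives the six series that are cleaned up in the next steps.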

[Image: chart settings]

Series

Give each series a proper name. In this case, we don’t want to show the latest estimate for 2021 and 2022, as these series are noise. Instead of a name, enter a single space as the label for these series so that no name is shown for them.

[Image: series settings]

Colors

Set the right colors. We want to draw attention to 2023 and the latest estimate. 2021 and 2022 merely support the visual, so we make them gray. Set the colors for the latest estimate in 2021 and 2022 to white (#FFFFFF) to make them disappear from the chart: a white line on a white background is invisible to the reader.
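The naming and color choices described so far can be summarized as a small rule. The following Python sketch is only an illustration of that rule, not a Databricks API: the function name, the label text and every hex value except #FFFFFF are assumptions.

```python
WHITE = "#FFFFFF"      # from the article: hides a series on a white background
GRAY = "#B0B0B0"       # assumed shade for the supporting years
HIGHLIGHT = "#1F77B4"  # assumed color for the focus year
ESTIMATE = "#FF7F0E"   # assumed color for the latest estimate


def style_series(year, metric, focus_year=2023):
    """Return the label and color for one (year, metric) series."""
    is_estimate = metric == "sales_numbers_latest_estimate"
    if year == focus_year:
        # The focus year and its latest estimate get the attention.
        label = "Latest estimate" if is_estimate else str(year)
        color = ESTIMATE if is_estimate else HIGHLIGHT
    elif is_estimate:
        # Estimates for past years are noise: blank label, white line.
        label, color = " ", WHITE
    else:
        # Past years only support the visual.
        label, color = str(year), GRAY
    return {"label": label, "color": color}


print(style_series(2021, "sales_numbers_latest_estimate"))
print(style_series(2023, "sales_numbers"))
```

Writing the rule down this way makes it easy to check that exactly two series are hidden and that only the focus year uses a non-gray color.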

[Image: color settings]

Limitations for Data Labels

  • You can only enable or disable data labels for the chart as a whole, which applies to all data points.
  • You cannot enable data labels for a single series, for example only 2023 and/or the latest estimate.
  • You cannot enable data labels for specific combinations of series and x-axis values, such as labeling 2023 only for the past 3 months (March, April and May) and the latest estimate only for the current and upcoming month (May and June).
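The selective labeling that is missing here can be written down as a simple rule. The Python sketch below only expresses that rule, using the months from the example above; the series names "2023" and "Latest estimate" and the function name are hypothetical, and this is not something the Databricks chart editor offers.

```python
# Months to label per series, taken from the example in the list above.
# The series names are hypothetical.
LABELED_POINTS = {
    "2023": {"March", "April", "May"},
    "Latest estimate": {"May", "June"},
}


def show_label(series, month):
    """Return True if the data point for this series and month gets a label."""
    return month in LABELED_POINTS.get(series, set())


print(show_label("2023", "April"))
print(show_label("2021", "May"))
```

In a plotting library that supports per-point annotations, a predicate like this could decide which points to annotate.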

[Image: data labels]

In summary

Despite some limitations, I’m happy with the overall result, and the data is now easier to consume.

 

Gabriel Quinche

Associate Data Scientist

11mo

The application part is not clear to me. I'm not experienced with Databricks, and I don't really see anything in the visuals, even though I deal with this kind of chart daily. You should highlight in the images what you want the reader to pay attention to, precisely as the book advocates; otherwise users new to Databricks won't understand what you are trying to show in the graphs.

Rawan Alhassan

Statistician | Data Analytics | Data Quality | Market Research | Survey Research

1y

A valuable post! Thank you.

