Optimizing the Performance of LookML Queries

Ideally, you want to avoid re-running massive queries when nothing has changed and instead append new data to existing results, reducing repetitive requests.

The most commonly used methods to optimize query performance in Looker are:

  1. persistent derived tables,
  2. aggregate awareness, and
  3. performant view joins.

Persistent derived tables (PDTs)

Looker allows you to write SQL and LookML queries to your database as temporary tables. When such a table is cached, or persisted, it's called a persistent derived table (PDT).

By saving these queries as tables, you control when and how they are built. Tables can rebuild every morning, once a month, or only when new data arrives. Ideally, you configure your derived tables' persistence to reflect the nature of your data.
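As a minimal sketch of this idea (the datagroup name, view name, and `orders` table below are hypothetical, not from the original article), a PDT's rebuild schedule can be tied to a datagroup so the table persists until the trigger fires:

```
# Hypothetical datagroup: rebuild once per day
datagroup: daily_refresh {
  sql_trigger: SELECT CURRENT_DATE ;;
  max_cache_age: "24 hours"
}

view: customer_order_facts {
  derived_table: {
    # Persist this table until the datagroup triggers a rebuild
    datagroup_trigger: daily_refresh
    sql:
      SELECT customer_id, COUNT(*) AS lifetime_orders
      FROM orders
      GROUP BY 1 ;;
  }

  dimension: customer_id {
    type: number
    primary_key: yes
    sql: ${TABLE}.customer_id ;;
  }

  dimension: lifetime_orders {
    type: number
    sql: ${TABLE}.lifetime_orders ;;
  }
}
```

Swapping `datagroup_trigger` for `sql_trigger_value` or `persist_for` gives the other rebuild strategies mentioned above.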

You can also create incremental PDTs to append new data without rebuilding the entire table. Applying incremental changes works well for large tables in which existing (older) data is not frequently updated because the primary update to the table is new records.
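A hedged example of the incremental pattern (the `events` table and field names are placeholders): adding `increment_key` tells Looker to append only fresh rows, and `increment_offset` re-processes a few recent periods to catch late-arriving data.

```
view: daily_events {
  derived_table: {
    datagroup_trigger: daily_refresh
    # Append rows by event_date instead of rebuilding the whole table
    increment_key: "event_date"
    # Also rebuild the 3 most recent days to pick up late-arriving rows
    increment_offset: 3
    sql:
      SELECT event_date, event_type, COUNT(*) AS event_count
      FROM events
      GROUP BY 1, 2 ;;
  }
}
```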

Aggregate awareness

For very large tables in your database, Looker's aggregate awareness can create smaller aggregate tables of data grouped by various combinations of attributes. These aggregate tables act as "rollups" or summary tables that Looker can use instead of the original large table whenever possible. Aggregate awareness lets Looker find the smallest, most efficient table available in your database to answer a query while still maintaining accuracy.
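A rough sketch of how this is declared (the explore, fields, and datagroup below are illustrative assumptions): an `aggregate_table` is defined inside the explore, and Looker transparently routes matching queries to the rollup.

```
explore: orders {
  # Hypothetical rollup: revenue by month and state
  aggregate_table: monthly_revenue_by_state {
    query: {
      dimensions: [created_month, customers.state]
      measures: [total_revenue]
    }
    materialization: {
      datagroup_trigger: daily_refresh
    }
  }
}
```

Any user query whose fields are covered by this rollup can be answered from the small aggregate table instead of scanning the raw `orders` table.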


Join views in a performant manner

  1. Join only the views you need when defining a new Explore.
  2. Use base fields, rather than concatenated fields, as primary keys.
  3. Prefer many_to_one joins: joining from the most granular view up to the least granular level of detail (many_to_one) typically provides the best query performance in Looker.
  4. Include Explore filters (always_filter, conditionally_filter).
  5. Implement caching policies (persist_with, datagroups).
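The points above can be sketched in one explore (all names here are hypothetical, assuming an `order_items` line-item table joined up to an `orders` header table):

```
explore: order_items {
  # 5. Caching policy: cache results until the datagroup triggers
  persist_with: daily_refresh

  # 4. Explore filter: force users to filter before running
  always_filter: {
    filters: [order_items.created_date: "30 days"]
  }

  # 3. Many line items roll up to one order (many_to_one),
  # 2. joined on a base ID field rather than a concatenated key
  join: orders {
    type: left_outer
    relationship: many_to_one
    sql_on: ${order_items.order_id} = ${orders.id} ;;
  }
}
```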




Prashant Khadatkar


Well said. I have a few points to add: 1. Avoid incremental PDTs if your source table gets updated over a certain period of time; incremental PDTs work best with insert-only tables, otherwise you will end up with duplicate records in the destination. 2. A model-level filter is a good option when filtering data.


