Cybertec Schönig & Schönig GmbH
Hans-Jürgen Schönig, www.postgresql-support.de
PostgreSQL Indexing
Dublin, 2013
Hans-Jürgen Schönig
Scope of this session:
- What a basic index does
- The PostgreSQL optimizer (cost model)
- Classical B-tree Indexes
- Partial / functional indexes
- Different types of indexes
- Full-Text-Search
- Fuzzy matching
- Writing your own indexing strategy
1. Basic indexing:
- Generating test data:
- for the purpose of this session we need a
table consisting of two columns:
test=# CREATE TABLE t_test (id serial, name text);
CREATE TABLE
test=# INSERT INTO t_test (name) VALUES ('hans');
INSERT 0 1
test=# INSERT INTO t_test (name) VALUES ('paul');
INSERT 0 1
1. Basic indexing:
- A lot more test data ...
- Let us create some more test data
by repeating the process
test=# INSERT INTO t_test (name) SELECT name FROM t_test;
INSERT 0 2
...
test=# INSERT INTO t_test (name) SELECT name FROM t_test;
INSERT 0 2097152
1. Basic indexing:
- Reading some data:
- Let us see how PostgreSQL executes a simple query:
test=# SELECT count(*) FROM t_test;
count
---------
4194304
(1 row)
Time: 431.192 ms
test=# explain analyze SELECT count(*) FROM t_test;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=977.865..977.865 rows=1 loops=1)
-> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0)
(actual time=0.013..531.448 rows=4194304 loops=1)
Total runtime: 977.917 ms
(3 rows)
Time: 1045.065 ms
1. Basic indexing:
- Reading some data:
- Let us add a filter:
test=# SELECT count(*) FROM t_test WHERE id = 421234;
count
-------
1
(1 row)
Time: 476.965 ms
test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=495.134..495.135 rows=1 loops=1)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=1 width=0)
(actual time=53.405..495.126 rows=1 loops=1)
Filter: (id = 421234)
Rows Removed by Filter: 4194303
Total runtime: 495.175 ms
(5 rows)
Time: 520.659 ms
1. Basic indexing:
- Sequentially reading data:
- In case you like reading the phone book sequentially
we are basically done.
- Sequentially reading the phone book is technically ok
=> but socially not accepted
- Defining an index is the desired solution
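The phone-book analogy can be sketched in a few lines of Python. This is only an illustration and not how PostgreSQL is implemented: the helper names `seq_scan` and `index_lookup` are hypothetical, the table is a small in-memory list, and a sorted list searched with `bisect` stands in for a B-tree.

```python
import bisect

# Hypothetical stand-in for t_test: (id, name) rows in insertion order.
rows = [(i, "hans" if i % 2 else "paul") for i in range(1, 100001)]

def seq_scan(rows, wanted_id):
    """Check every row, like reading the phone book page by page."""
    return [row for row in rows if row[0] == wanted_id]

# A sorted list of (key, row position) pairs plays the role of the index.
index = sorted((row[0], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]

def index_lookup(rows, wanted_id):
    """Binary search over the sorted keys, then fetch the matching rows."""
    lo = bisect.bisect_left(keys, wanted_id)
    hi = bisect.bisect_right(keys, wanted_id)
    return [rows[index[i][1]] for i in range(lo, hi)]

print(seq_scan(rows, 42123) == index_lookup(rows, 42123))
```

Both functions return the same rows; the difference is that the sequential version touches every row, while the sorted structure needs only a handful of comparisons.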
1. Basic indexing:
- Creating an index
test=# \h CREATE INDEX
Command: CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ]
ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ]
[ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
- At the end of the day all clauses will be
covered by this training
1. Basic indexing:
- A typical index:
test=# CREATE INDEX idx_id ON t_test (id);
CREATE INDEX
Time: 7357.663 ms
- This gives us a standard btree index
- PostgreSQL provides “High-Concurrency B-Trees”
(Lehman-Yao, 1981)
- Many people can modify the index at the same time
- Highly efficient B+ tree
1. Basic indexing:
- How a btree works:
[Figure: B-tree layout. Labels: index (8k pages, root node, sorted entries, forward chaining) and table (8k pages, line pointer "linp", row).]
1. Basic indexing:
- Indexing is beneficial
test=# explain analyze SELECT count(*)
FROM t_test
WHERE id = 421234;
QUERY PLAN
------------------------------------------------------------------------------
Aggregate (cost=8.73..8.74 rows=1 width=0)
(actual time=0.024..0.024 rows=1 loops=1)
-> Index Only Scan using idx_id on t_test (cost=0.00..8.73 rows=1 width=0)
(actual time=0.019..0.020 rows=1 loops=1)
Index Cond: (id = 421234)
Heap Fetches: 1
Total runtime: 0.057 ms
(5 rows)
Time: 0.395 ms
- A lot faster :).
1. Basic indexing:
- Still slow ...
test=# SELECT count(*) FROM t_test WHERE name = 'hans';
count
---------
2097152
(1 row)
Time: 787.407 ms
- This is still slow. Let us create an index ...
test=# CREATE INDEX idx_name ON t_test (name);
CREATE INDEX
1. Basic indexing:
- The benefit is exactly zero:
test=# SELECT count(*) FROM t_test WHERE name = 'hans';
count
---------
2097152
(1 row)
Time: 782.443 ms
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
- The index won't be used
- Too many identical values (“not selective”)
1. Basic indexing:
- The cost is far from zero:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
177 MB
(1 row)
test=# SELECT pg_size_pretty(pg_relation_size('idx_id'));
pg_size_pretty
----------------
90 MB
(1 row)
test=# SELECT pg_size_pretty(pg_relation_size('idx_name'));
pg_size_pretty
----------------
90 MB
(1 row)
- Indexes need a fair amount of space
1. Basic indexing:
- Input values DO make a difference:
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2';
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=7.74..7.75 rows=1 width=0)
-> Index Only Scan using idx_name on t_test (cost=0.00..7.74 rows=1 width=0)
Index Cond: (name = 'hans2'::text)
(3 rows)
- PostgreSQL will decide depending on the input value
=> cost based optimization
1. Basic indexing:
- Partial indexes:
- In our example the index is only used in case
of rare or non-existing values
- What is the point of an index when its entire
content is totally useless?
=> a more selective strategy is needed
1. Basic indexing:
- Partial indexes:
test=# DROP INDEX idx_name;
DROP INDEX
test=# CREATE INDEX idx_name ON t_test (name)
WHERE name NOT IN ('hans', 'paul');
CREATE INDEX
test=# SELECT pg_size_pretty(pg_relation_size('idx_name'));
pg_size_pretty
----------------
8192 bytes
(1 row)
- A partial index reduces space consumption
- Benefit is still the same
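Why the partial index is so much smaller can be illustrated with a toy model in Python. Everything here is an assumption made for the sketch: the row counts are invented, a dict of row positions stands in for the index, and `entries` is a hypothetical helper.

```python
# Hypothetical table contents: mostly 'hans' and 'paul', plus one rare value.
rows = [(i, "hans" if i % 2 else "paul") for i in range(1, 1001)]
rows.append((1001, "hans2"))

EXCLUDED = {"hans", "paul"}   # mirrors WHERE name NOT IN ('hans', 'paul')

full_index = {}
partial_index = {}
for pos, (_, name) in enumerate(rows):
    full_index.setdefault(name, []).append(pos)
    if name not in EXCLUDED:
        partial_index.setdefault(name, []).append(pos)

def entries(index):
    """Number of row pointers stored in an index."""
    return sum(len(positions) for positions in index.values())

print(entries(full_index), entries(partial_index))
print([rows[pos] for pos in partial_index.get("hans2", [])])
```

The partial structure stores a single pointer instead of one per row, yet a lookup for the rare value still finds its row.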
1. Basic indexing:
- Equal benefit – lower cost:
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2';
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=7.28..7.29 rows=1 width=0)
-> Index Only Scan using idx_name on t_test (cost=0.00..7.28 rows=1 width=0)
Index Cond: (name = 'hans2'::text)
(3 rows)
- This is exactly the same plan as before!
1. Basic indexing:
- What about functions?
test=# CREATE INDEX idx_cos ON t_test ( cos(id) );
CREATE INDEX
Time: 16867.228 ms
test=# explain SELECT count(*) FROM t_test WHERE cos(id) = 17;
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=23960.99..23961.00 rows=1 width=0)
-> Bitmap Heap Scan on t_test (cost=395.25..23908.56 rows=20972 width=0)
Recheck Cond: (cos((id)::double precision) = 17::double precision)
-> Bitmap Index Scan on idx_cos (cost=0.00..390.01 rows=20972 width=0)
Index Cond: (cos((id)::double precision) = 17::double precision)
(5 rows)
- PostgreSQL provides functional indexes
- VERY nice to avoid additional columns
- Gives a lot of extra flexibility
1. Basic indexing:
- Types of functions allowed:
- Functions must be deterministic
=> “immutable”
=> Functions can be written in almost any language
=> This is highly performance sensitive
2. The PostgreSQL cost model
- How does PostgreSQL decide on
index vs. no index?
- PostgreSQL uses statistics to estimate the number of
rows coming back
- Each operation is assigned a cost
=> costs are just numbers used to compare
different options inside the planner
- Cost parameters can be changed at runtime
or globally
=> be careful, it can work against you
2. The PostgreSQL cost model
- pg_stats is your friend:
test=# \d pg_stats
View "pg_catalog.pg_stats"
Column | Type | Modifiers
-------------------------------+-----------+-----------
schemaname | name |
tablename | name |
attname | name |
inherited | boolean |
null_frac | real |
avg_width | integer |
n_distinct | real |
most_common_vals | anyarray |
most_common_freqs | real[] |
histogram_bounds | anyarray |
correlation | real |
most_common_elems | anyarray |
most_common_elem_freqs | real[] |
elem_count_histogram | real[] |
2. The PostgreSQL cost model
- Updating statistics
- System statistics are updated by ANALYZE:
test=# \h ANALYZE
Command: ANALYZE
Description: collect statistics about a database
Syntax:
ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ]
- In most setups autovacuum is in charge
of updating pg_statistic
- In most cases statistics are not an issue
2. The PostgreSQL cost model
- How does PostgreSQL estimate costs?
- seq_page_cost = 1
- random_page_cost = 4
- cpu_tuple_cost = 0.01
- cpu_operator_cost = 0.0025
- cpu_index_tuple_cost = 0.005
2. The PostgreSQL cost model
- Let us do the math (1):
test=# explain SELECT count(*) FROM t_test;
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0)
(2 rows)
- total costs are at 75100.81
- costs are composed of I/O and CPU costs
2. The PostgreSQL cost model
- Let us do the math (2):
test=# SELECT pg_relation_size('t_test') / 8192;
?column?
----------
22672
(1 row)
- our table consists of 22672 blocks
- each block is 8 kB in size
2. The PostgreSQL cost model
- Let us do the math (3):
The seq scan:
I/O cost = 22672 * seq_page_cost = 22672
CPU cost = 4194304 * cpu_tuple_cost = 41943.04
=> 64615.04 for the seq scan
The aggregate:
4194304 * cpu_operator_cost = 10485.76
Total cost => 64615.04 + 10485.76 = 75100.80, plus one cpu_tuple_cost (0.01)
(we have to return the result tuple) = 75100.81
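The arithmetic can be replayed as a small Python sketch. The function name `count_star_cost` is hypothetical, and the formula covers only this specific plan (a sequential scan feeding a plain count(*)). The final 0.01 is read off the 75100.80..75100.81 range in the plan output, which equals the default cpu_tuple_cost.

```python
# Planner cost constants as listed on the earlier slide (PostgreSQL defaults).
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025

def count_star_cost(blocks, tuples, seq_page_cost=SEQ_PAGE_COST):
    """Return (seq_scan_total, aggregate_startup, aggregate_total)."""
    seq_scan = blocks * seq_page_cost + tuples * CPU_TUPLE_COST
    agg_startup = seq_scan + tuples * CPU_OPERATOR_COST
    agg_total = agg_startup + CPU_TUPLE_COST  # emitting the single result row
    return tuple(round(c, 2) for c in (seq_scan, agg_startup, agg_total))

print(count_star_cost(22672, 4194304))
print(count_star_cost(22672, 4194304, seq_page_cost=10))
```

With the default settings the three numbers match the EXPLAIN output for `SELECT count(*) FROM t_test`; passing `seq_page_cost=10` shows how raising a single parameter inflates the whole estimate.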
2. The PostgreSQL cost model
- Inflation at work:
test=# SET seq_page_cost TO 10;
SET
test=# explain SELECT count(*) FROM t_test;
QUERY PLAN
-----------------------------------------------------------------------
Aggregate (cost=279148.80..279148.81 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..268663.04 rows=4194304 width=0)
(2 rows)
- Costs can be changed at runtime to fine tune
index usage
=> only do this if you are fully aware of what
you are doing. It can have unintended side
effects
2. The PostgreSQL cost model
- Spinning disks vs. SSDs
- Traditional disks are fast at sequential I/O
and pretty bad at random
I/O
- SSDs largely fixed the problem.
=> consider lowering random_page_cost
2. The PostgreSQL cost model
- Abusing tablespaces:
test=# ALTER TABLESPACE pg_default
SET (random_page_cost = 1);
ALTER TABLESPACE
- Allows different cost settings for various
disk subsystems
- It also allows you to split “cached” and “uncached”
data -> ugly but useful
2. The PostgreSQL cost model
- Correlation and disk layout
test=# CREATE TABLE t_random AS SELECT *
FROM t_test
ORDER BY random();
SELECT 4194304
test=# CREATE INDEX idx_random ON t_random(id);
CREATE INDEX
test=# ANALYZE t_random;
ANALYZE
- The PostgreSQL optimizer considers the
physical order of rows on disk
- High correlation makes index scans far
more likely, as the optimizer reduces its
estimates for I/O costs.
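What "correlation" means here can be made concrete with a short Python sketch. It computes a plain Pearson coefficient between a row's physical position and its column value, which is an assumption made for illustration and not necessarily the exact formula ANALYZE uses. The two lists are small hypothetical stand-ins for t_test and t_random.

```python
import random

def correlation(values):
    """Pearson correlation between physical row position and column value."""
    n = len(values)
    positions = range(n)
    mean_p = (n - 1) / 2
    mean_v = sum(values) / n
    cov = sum((p - mean_p) * (v - mean_v) for p, v in zip(positions, values))
    var_p = sum((p - mean_p) ** 2 for p in positions)
    var_v = sum((v - mean_v) ** 2 for v in values)
    return cov / (var_p * var_v) ** 0.5

ordered = list(range(1, 10001))        # like t_test: ids in insertion order
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)     # like t_random: ORDER BY random()

print(round(correlation(ordered), 3), round(correlation(shuffled), 3))
```

The ordered list yields a value of 1, the shuffled one a value close to 0. The corresponding per-column figure in PostgreSQL is the `correlation` column of pg_stats.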
2. The PostgreSQL cost model
- Correlation and disk layout
test=# explain SELECT count(*) FROM t_test WHERE id < 1000;
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=75.35..75.36 rows=1 width=0)
-> Index Only Scan using idx_id on t_test
(cost=0.00..72.72 rows=1049 width=0)
Index Cond: (id < 1000)
(3 rows)
test=# explain SELECT count(*) FROM t_random WHERE id < 1000;
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=950.31..950.32 rows=1 width=0)
-> Index Only Scan using idx_random on t_random
(cost=0.00..947.94 rows=947 width=0)
Index Cond: (id < 1000)
(3 rows)
2. The PostgreSQL cost model
- Implications:
- This is why different plans can pop up
EVEN if the data is the same
- There is no fixed amount of data making
PostgreSQL switch from index to
sequential scan
- High correlation can improve performance
=> consider clustering the table
test=# \h CLUSTER
Command: CLUSTER
Description: cluster a table according to an index
Syntax:
CLUSTER [VERBOSE] table_name [ USING index_name ]
CLUSTER [VERBOSE]
3. Indexing many columns
- Using OR / AND:
- PostgreSQL can use more than one index per
table per query
- PostgreSQL provides multi-column indexes
- What you might see is a so-called “Bitmap Scan”
=> don't mix it up with Oracle Bitmap Indexes
3. Indexing many columns
- Bitmap scans:
test=# explain SELECT * FROM t_test WHERE id = 2343 OR id = 423423;
QUERY PLAN
---------------------------------------------------------------------------
Bitmap Heap Scan on t_test (cost=9.44..17.41 rows=2 width=9)
Recheck Cond: ((id = 2343) OR (id = 423423))
-> BitmapOr (cost=9.44..9.44 rows=2 width=0)
-> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0)
Index Cond: (id = 2343)
-> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0)
Index Cond: (id = 423423)
(7 rows)
- PostgreSQL will scan the index twice
- PostgreSQL will look for blocks in the underlying table
- The condition has to be re-evaluated
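The three steps of the plan can be mimicked in Python. This is a deliberately simplified sketch: the heap layout, `ROWS_PER_BLOCK`, and the function names are hypothetical, a Python set stands in for the bitmap, and the condition is rechecked for every row in a visited block.

```python
ROWS_PER_BLOCK = 4  # hypothetical; real 8k blocks hold many more rows

# Hypothetical heap: block number -> list of (id, name) rows.
heap = {}
for row_id in range(1, 41):
    heap.setdefault((row_id - 1) // ROWS_PER_BLOCK, []).append((row_id, "hans"))

# Hypothetical index on id: id -> block number holding that row.
idx_id = {row[0]: block for block, rows in heap.items() for row in rows}

def bitmap_index_scan(index, wanted_id):
    """Return the set of blocks that may contain a matching row."""
    return {index[wanted_id]} if wanted_id in index else set()

def bitmap_or_scan(id_a, id_b):
    # BitmapOr: union of the block sets from two scans of the same index.
    blocks = bitmap_index_scan(idx_id, id_a) | bitmap_index_scan(idx_id, id_b)
    # Bitmap Heap Scan: visit each block once, in physical order,
    # and recheck the condition against every row in the block.
    return [row for block in sorted(blocks) for row in heap[block]
            if row[0] == id_a or row[0] == id_b]

print(bitmap_or_scan(7, 23))
```

If both ids live in the same block, the union contains that block only once, so it is read a single time.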
3. Indexing many columns
- Bitmap scans:
test=# explain SELECT * FROM t_test WHERE id = 2343 AND name = 'josef';
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using idx_name on t_test (cost=0.00..8.27 rows=1 width=9)
Index Cond: (name = 'josef'::text)
Filter: (id = 2343)
(3 rows)
- PostgreSQL does not always use two indexes
when you have 2 quals
- The more selective index might be enough
3. Indexing many columns
- Multicolumn indexes:
test=# DROP INDEX idx_id;
DROP INDEX
test=# CREATE INDEX idx_combined ON t_test (id, name);
CREATE INDEX
test=# explain SELECT * FROM t_test WHERE id = 10;
QUERY PLAN
--------------------------------------------------------------------------------
Index Only Scan using idx_combined on t_test (cost=0.00..8.91 rows=1 width=9)
Index Cond: (id = 10)
(2 rows)
- PostgreSQL can use a subset of those columns IF they are
the leading column(s) of the index
- Imagine a phone book; it is just like a combined index
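The phone-book picture translates directly into a sorted list of tuples. The following Python sketch uses invented names and data; a sorted list with `bisect` stands in for the combined index.

```python
import bisect

# Hypothetical phone book: a combined index on (last_name, first_name).
index = sorted([
    ("huber", "anna"), ("huber", "paul"), ("maier", "hans"),
    ("maier", "josef"), ("schmidt", "hans"),
])

def lookup_by_leading_column(last_name):
    """Range scan: all entries sharing the leading column are adjacent."""
    lo = bisect.bisect_left(index, (last_name,))
    hi = lo
    while hi < len(index) and index[hi][0] == last_name:
        hi += 1
    return index[lo:hi]

def lookup_by_second_column(first_name):
    """No shortcut: matching entries are scattered, so every entry is read."""
    return [entry for entry in index if entry[1] == first_name]

print(lookup_by_leading_column("maier"))
print(lookup_by_second_column("hans"))
```

Searching by last name jumps straight to a contiguous range, whereas searching by first name alone has to walk the whole list.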
3. Indexing many columns
- Many indexes or combined indexes?
- It depends on what you want to query
- If you always use the first conditions in the index
a combined index might be a good idea
- Many indexes are more flexible but maybe not perfect
- Sometimes a mixed-strategy can be useful
4. Indexes to provide order
- B-trees can be used for more than searching
- B-trees keep their entries in sorted order.
- Order helps to avoid repeated sorting.
test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10;
QUERY PLAN
--------------------------------------------------------------------------------------
Limit (cost=0.00..0.31 rows=10 width=9)
-> Index Scan using idx_id on t_test (cost=0.00..131602.27 rows=4194304 width=9)
(2 rows)
5. Dealing with upper / lowercase
- Upper and lower case searches are common:
- If you want case-insensitive searches, don't use
a functional index
- Consider using “citext”
test=# CREATE EXTENSION citext;
CREATE EXTENSION
test=# SELECT 'ABC'::citext = 'abc'::citext;
?column?
----------
t
(1 row)
6. Different types of indexes
- PostgreSQL supports more than just btrees
- B-Trees are fine if you are interested in things
which can be sorted
- Try to sort polygons => sorting won't help you find them
- Geometric data and Full-Text-Search need
different algorithms
NOTE: This is not about which index is faster.
This is about the correct ALGORITHM
6. Different types of indexes
- Index types provided by PostgreSQL
- B-Trees
- Gist: Generalized Search Tree
- Gin: Generalized Inverted Index
- Sp-Gist: Space Partitioned Gist
- Hash
6. Different types of indexes
- Indexes and algorithms
- B-Trees: numbers, text, dates, etc.
- Gist: Generalized Search Tree
- Gin: Generalized Inverted Index
- Sp-Gist: Space Partitioned Gist
- Hash
7. Gist indexes
- Gist operates on different principles
than btree
- it supports “contains”, “left of”, “overlaps”, etc.
- “contains”, etc. are good for
=> Full Text Search
=> Geometric operations (PostGIS, etc.)
=> Finding genome sequences
=> Handling ranges (time, etc.)
=> Fuzzy search
- Gist allows KNN-search
7. Gist indexes
- How it works internally ...
7. GIN indexes
- GIN is a so-called inverted index
- Used for Full Text Search
- If you have 1 million documents containing the word
“house”, do you really want to have “house” inside
the index 1 million times?
=> B-tree for words
=> A document list for each word
=> Classical approach to text search
- FTS is not about “=”, it is about “contains”
=> forget btree
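The idea of storing each word once, together with a list of documents, can be sketched in Python. This is a toy model under stated assumptions: the documents are invented, whitespace splitting stands in for a real parser and stemmer, and a dict replaces the tree of words.

```python
from collections import defaultdict

# Hypothetical documents; splitting on whitespace stands in for a real parser.
documents = {
    1: "my house is old",
    2: "a house with a garden",
    3: "the garden is green",
}

def build_inverted_index(docs):
    """Map each distinct word to the sorted list of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}

inverted = build_inverted_index(documents)

def contains(word):
    """'Contains' lookup: one key probe, then read the document list."""
    return inverted.get(word, [])

print(contains("house"), contains("garden"))
```

However many documents mention "house", the word itself appears only once as a key; only its document list grows.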
7. GIN indexes
- GIN internal workings:
7. SP-Gist indexes
- SP-Gist is a space partitioned index
- Can be used for a variety of algorithms, which use
space partitioning
=> quad trees
=> suffix trees
=> k-d trees
7. SP-Gist indexes
- Quad trees: A prototype example ...
- We want to insert ... (6, 4) and (2, 8)
8. Full Text Search
- Stemming:
- Before searching, it makes sense to perform
“stemming”
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car');
to_tsvector
-----------------------------------------
'better':5 'car':3,11 'mani':2 'one':10
(1 row)
8. Full Text Search
- Stemming is language dependent:
- Stemming works nicely for “roman” languages
=> it is hard to do this for Chinese and so on
test=# SELECT to_tsvector('english', 'i am'),
to_tsvector('german', 'i am'),
to_tsvector('dutch', 'i am');
to_tsvector | to_tsvector | to_tsvector
-------------+-------------+--------------
| 'i':1 | 'am':2 'i':1
(1 row)
8. Full Text Search
- “contains” is your friend:
- The @@ operator matches a tsquery (the search string) against a
so-called tsvector:
test=# SELECT to_tsvector('english', 'having many cars is better
than to have just one car')
@@ to_tsquery('english', 'car');
?column?
----------
t
(1 row)
8. Full Text Search
- Indexing is easy:
- All you need is a functional index
- Alternatively the stemmed content can be
“materialized” in a separate column
CREATE INDEX idx_fti ON t_test
USING gist (to_tsvector('german', name));
8. Full Text Search
- tsvector and tsquery magic
- PostgreSQL allows you to use “and” (&)
and “or” (|)
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car')
@@ to_tsquery('english', 'car & truck');
?column?
----------
f
(1 row)
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car')
@@ to_tsquery('english', '(car | truck) & many');
?column?
----------
t
(1 row)
8. Full Text Search
- A stupid question: What is a “word”?
- PostgreSQL is NOT limited to textual search
- Remember, it is all about “contains” ...
- Create your own parser:
test=# \h CREATE TEXT SEARCH PARSER
Command: CREATE TEXT SEARCH PARSER
Description: define a new text search parser
Syntax:
CREATE TEXT SEARCH PARSER name (
START = start_function ,
GETTOKEN = gettoken_function ,
END = end_function ,
LEXTYPES = lextypes_function
[, HEADLINE = headline_function ]
)
8. Full Text Search
- Even more flexibility (2):
test=# \h CREATE TEXT SEARCH CONFIGURATION
Command: CREATE TEXT SEARCH CONFIGURATION
Description: define a new text search configuration
Syntax:
CREATE TEXT SEARCH CONFIGURATION name (
PARSER = parser_name |
COPY = source_config
)
8. Full Text Search
- Even more flexibility:
test=# \h CREATE TEXT SEARCH DICTIONARY
Command: CREATE TEXT SEARCH DICTIONARY
Description: define a new text search dictionary
Syntax:
CREATE TEXT SEARCH DICTIONARY name (
TEMPLATE = template
[, option = value [, ... ]]
)
test=# \h CREATE TEXT SEARCH TEMPLATE
Command: CREATE TEXT SEARCH TEMPLATE
Description: define a new text search template
Syntax:
CREATE TEXT SEARCH TEMPLATE name (
[ INIT = init_function , ]
LEXIZE = lexize_function
)
9. Operator classes
- What does it take to organize a btree?
Operator | Strategy number
---------+-----------------
<        | 1
<=       | 2
=        | 3
>=       | 4
>        | 5
9. Operator classes
- Why care?
- The way numbers are treated is pretty “common”
- How about sorting this one?
“2305 09 04 78”
“4353 07 06 77”
=> it seems the sort order is correct as shown
=> it isn't – it is an Austrian social security number
=> 1977 was before 1978 and not the other way round
9. Operator classes
- Defining indexing strategies
- We can write our own operators
- Those operators can be assigned to an operator
class, which will tell the index how to “behave”
9. Operator classes
- Writing an operator (1):
test=# CREATE OR REPLACE FUNCTION normalize_si(text)
RETURNS text AS $$
BEGIN
RETURN substring($1, 9, 2) ||
substring($1, 7, 2) ||
substring($1, 5, 2) ||
substring($1, 1, 4);
END; $$
LANGUAGE 'plpgsql' IMMUTABLE;
CREATE FUNCTION
test=# SELECT normalize_si('2305090478');
normalize_si
--------------
7804092305
(1 row)
9. Operator classes
- Writing an operator (2):
test=# CREATE OR REPLACE FUNCTION si_lt(text, text)
RETURNS boolean AS
$$
BEGIN
RETURN normalize_si($1) < normalize_si($2);
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
CREATE FUNCTION
test=# CREATE OPERATOR <# (
PROCEDURE=si_lt,
LEFTARG=text,
RIGHTARG=text);
CREATE OPERATOR
test=# SELECT '2305090478'::text <# '4353070677'::text;
?column?
----------
f
(1 row)
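The comparison this operator performs can be mirrored in Python to see the resulting order. The sketch below ports the two functions from the slides as literally as possible; it assumes 10-character numbers without blanks, as in the SQL examples.

```python
def normalize_si(si):
    """Reorder 'NNNNDDMMYY' into 'YYMMDDNNNN' so string order follows birth date."""
    return si[8:10] + si[6:8] + si[4:6] + si[0:4]

def si_lt(left, right):
    """Counterpart of the <# operator: compare the normalized forms."""
    return normalize_si(left) < normalize_si(right)

numbers = ["2305090478", "4353070677"]
print(normalize_si("2305090478"))
print(si_lt("2305090478", "4353070677"))
print(sorted(numbers, key=normalize_si))
```

Sorting with the normalized key puts the 1977 number first, which is the order an index built on such an operator class would maintain.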
9. Operator classes
- Creating the operator class:
- write operators for all operations needed
- write “support functions” (= “same”, etc.)
- make sure that the most important strategies
have proper operators
test=# \h CREATE OPERATOR CLASS
Command: CREATE OPERATOR CLASS
Description: define a new operator class
Syntax:
CREATE OPERATOR CLASS name [ DEFAULT ] FOR TYPE data_type
USING index_method [ FAMILY family_name ] AS
{ OPERATOR strategy_number operator_name [ ( op_type, op_type ) ]
[ FOR SEARCH | FOR ORDER BY sort_family_name ]
| FUNCTION support_number [ ( op_type [ , op_type ] ) ]
function_name ( argument_type [, ...] )
| STORAGE storage_type
} [, ... ]
10. Available operator classes
- pg_trgm
- Trigrams are perfect for fuzzy matching
- Trigrams can be used nicely along with KNN-search
- pg_trgm is available as an extension to PostgreSQL
test=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION
- Problem: “What is the proper way to spell the name of this
village?”
“gramatneusiedl” vs. “grammatneusiedel”?
10. Available operator classes
- Testing pg_trgm
test=# CREATE TABLE t_search AS
SELECT relname::text
FROM pg_class;
SELECT 303
test=# CREATE INDEX idx_trgm
ON t_search USING gist(relname gist_trgm_ops);
CREATE INDEX
10. Available operator classes
- Testing pg_trgm (2):
test=# SELECT *, 'pgclass' <-> relname
FROM t_search
ORDER BY 'pgclass' <-> relname
LIMIT 10;
relname | ?column?
--------------------------------+----------
pg_class | 0.454545
pg_opclass | 0.538462
pg_class_oid_index | 0.714286
pg_opclass_oid_index | 0.727273
pg_class_relname_nsp_index | 0.793103
pg_opclass_am_name_nsp_index | 0.8
pg_seclabel | 0.823529
pg_am | 0.833333
pg_seclabels | 0.833333
pg_shseclabel | 0.842105
(10 rows)
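Where these distances come from can be shown with a simplified Python re-implementation of trigram matching. It is a sketch under assumptions and not the pg_trgm source: it handles only ASCII letters and digits, treats everything else as a word separator, pads each word with two leading blanks and one trailing blank, and defines distance as 1 minus the share of common trigrams.

```python
import re

def trigrams(text):
    """Simplified pg_trgm-style trigram set (ASCII letters and digits only)."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "   # two leading blanks, one trailing blank
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def distance(a, b):
    """1 - similarity, where similarity = shared trigrams / all trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    return 1 - len(ta & tb) / len(ta | tb)

for name in ("pg_class", "pg_opclass", "pg_am"):
    print(name, round(distance("pgclass", name), 6))
```

For "pgclass" against "pg_class" the two sets share 6 of 11 distinct trigrams, so the distance is 1 - 6/11, in line with the first row of the result.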
10. Available operator classes
- KNN in action:
test=# explain SELECT *, 'pgclass' <-> relname
FROM t_search
ORDER BY 'pgclass' <-> relname
LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.14..1.40 rows=10 width=19)
-> Index Scan using idx_trgm on t_search (cost=0.14..38.20 rows=303 width=19)
Order By: (relname <-> 'pgclass'::text)
(3 rows)
11. Traditional LIKE
- LIKE can be indexed in some cases:
- The PostgreSQL optimizer can rewrite queries featuring LIKE
in a fancy and efficient way
=> The goal is to find the “next character” in line
and query for a range
- This kind of rewrite only works when the next character
is actually known to PostgreSQL
- Special operator classes might be needed
=> varchar_pattern_ops, text_pattern_ops
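The "next character" trick can be sketched in Python. The function name `like_prefix_range` is hypothetical, and the sketch assumes plain byte-wise ordering (as in the C locale) and patterns without escape characters; it illustrates the idea only and is not the planner's code.

```python
def like_prefix_range(pattern):
    """Return (lower, upper) bounds for a 'prefix%' pattern, or None.

    Assumes plain byte-wise (C-locale style) ordering and no escape characters.
    """
    # The fixed prefix ends at the first wildcard.
    cut = min((pattern.find(ch) for ch in "%_" if ch in pattern),
              default=len(pattern))
    prefix = pattern[:cut]
    if not prefix:
        return None          # e.g. '%abc': no known leading character
    # "Next character in line": bump the last character of the prefix.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper

print(like_prefix_range("abc%"))
print(like_prefix_range("%abc"))
```

A pattern with a fixed prefix turns into a range condition of the form `col >= lower AND col < upper`, which a B-tree can serve; a pattern starting with a wildcard yields no range at all.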
11. Traditional LIKE
- An example:
test=# CREATE INDEX idx_relname
ON t_search (relname);
CREATE INDEX
test=# SET enable_seqscan TO off;
SET
test=# explain SELECT relname
FROM t_search
WHERE relname LIKE 'abc%';
QUERY PLAN
----------------------------------------------------------------------------------
Index Only Scan using idx_relname on t_search (cost=0.27..8.29 rows=1 width=19)
Index Cond: ((relname >= 'abc'::text) AND (relname < 'abd'::text))
Filter: (relname ~~ 'abc%'::text)
(3 rows)
12. Indexing MIN / MAX
- An example:
- MIN / MAX works by reading the index from left
and right (backward scan)
test=# explain SELECT min(relname), max(relname) FROM t_search;
QUERY PLAN
----------------------------------------------------------------------------------
Result (cost=0.74..0.75 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.27..0.37 rows=1 width=19)
-> Index Only Scan using idx_relname on t_search
(cost=0.27..29.57 rows=303 width=19)
Index Cond: (relname IS NOT NULL)
InitPlan 2 (returns $1)
-> Limit (cost=0.27..0.37 rows=1 width=19)
-> Index Only Scan Backward using idx_relname on
t_search t_search_1 (cost=0.27..29.57 rows=303 width=19)
Index Cond: (relname IS NOT NULL)
(9 rows)
Any questions?
Thank you for your attention
PostgreSQL: Advanced indexing

  • 1. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de PostgreSQL Indexing Dublin, 2013 Hans-Jürgen Schönig
  • 2. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de Scope of this session: - What a basic index does - The PostgreSQL optimizer (cost model) - Classical B-tree Indexes - Partial / functional indexes - Different types of indexes - Full-Text-Search - Fuzzy matching - Writing your own indexing strategy
  • 3. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Generating test data: - for the purpose of this session we need a table consisting of two columns: test=# CREATE TABLE t_test (id serial, name text); CREATE TABLE test=# INSERT INTO t_test (name) VALUES ('hans'); INSERT 0 1 test=# INSERT INTO t_test (name) VALUES ('paul'); INSERT 0 1
  • 4. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A lot more test data ... - Let us create some more test data by repeating the process test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2 ... test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2097152
  • 5. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A lot more test data ... - Let us create some more test data by repeating the process test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2 ... test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2097152
  • 6. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Reading some data: - Let us see, how PostgreSQL executes a simple query: test=# SELECT count(*) FROM t_test; count --------- 4194304 (1 row) Time: 431.192 ms test=# explain analyze SELECT count(*) FROM t_test; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=977.865..977.865 rows=1 loops=1) -> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0) (actual time=0.013..531.448 rows=4194304 loops=1) Total runtime: 977.917 ms (3 rows) Time: 1045.065 ms
  • 7. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Reading some data: - Let us add a filter: test=# SELECT count(*) FROM t_test WHERE id = 421234; count ------- 1 (1 row) Time: 476.965 ms test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=495.134..495.135 rows=1 loops=1) -> Seq Scan on t_test (cost=0.00..75100.80 rows=1 width=0) (actual time=53.405..495.126 rows=1 loops=1) Filter: (id = 421234) Rows Removed by Filter: 4194303 Total runtime: 495.175 ms (5 rows) Time: 520.659 ms
  • 8. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Sequentially reading data: - In case you like reading the phone book sequentially we are basically done. - Sequentially reading the phone book is technically ok => but socially not accepted - Defining an index is the desired solution
  • 9. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Creating an index test=# \h CREATE INDEX Command: CREATE INDEX Description: define a new index Syntax: CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ] ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ] [ TABLESPACE tablespace_name ] [ WHERE predicate ] - By the end of the day all clauses will have been covered by this training
  • 10. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A typical index: test=# CREATE INDEX idx_id ON t_test (id); CREATE INDEX Time: 7357.663 ms - This gives us a standard btree index - PostgreSQL provides “High-Concurrency B-Trees” (Lehman-Yao, 1981) - Many people can modify the index at the same time - Highly efficient B+ tree
  • 11. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - How a btree works: [Figure labels: root node (8k); sorted pages with forward chaining; index and table (8k blocks); row, linp]
  • 12. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Indexing is beneficial test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234; QUERY PLAN ------------------------------------------------------------------------------ Aggregate (cost=8.73..8.74 rows=1 width=0) (actual time=0.024..0.024 rows=1 loops=1) -> Index Only Scan using idx_id on t_test (cost=0.00..8.73 rows=1 width=0) (actual time=0.019..0.020 rows=1 loops=1) Index Cond: (id = 421234) Heap Fetches: 1 Total runtime: 0.057 ms (5 rows) Time: 0.395 ms - A lot faster :).
  • 13. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Still slow ... test=# SELECT count(*) FROM t_test WHERE name = 'hans'; count --------- 2097152 (1 row) Time: 787.407 ms - This is still slow. Let us create an index ... test=# CREATE INDEX idx_name ON t_test (name); CREATE INDEX
  • 14. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - The benefit is exactly zero: test=# SELECT count(*) FROM t_test WHERE name = 'hans'; count --------- 2097152 (1 row) Time: 782.443 ms test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) - The index won't be used - Too many identical values (“not selective”)
  • 15. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - The cost is far from zero: test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 177 MB (1 row) test=# SELECT pg_size_pretty(pg_relation_size('idx_id')); pg_size_pretty ---------------- 90 MB (1 row) test=# SELECT pg_size_pretty(pg_relation_size('idx_name')); pg_size_pretty ---------------- 90 MB (1 row) - Indexes need a fair amount of space
  • 16. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Input values DO make a difference: test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2'; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=7.74..7.75 rows=1 width=0) -> Index Only Scan using idx_name on t_test (cost=0.00..7.74 rows=1 width=0) Index Cond: (name = 'hans2'::text) (3 rows) - PostgreSQL will decide depending on the input value => cost based optimization
  • 17. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Partial indexes: - In our example the index is only used in case of rare or non-existing values - What is the point of an index when its entire content is totally useless? => a more selective strategy is needed
  • 18. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Partial indexes: test=# DROP INDEX idx_name; DROP INDEX test=# CREATE INDEX idx_name ON t_test (name) WHERE name NOT IN ('hans', 'paul'); CREATE INDEX test=# SELECT pg_size_pretty(pg_relation_size('idx_name')); pg_size_pretty ---------------- 8192 bytes (1 row) - A partial index reduces space consumption - Benefit is still the same
  • 19. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Equal benefit – lower cost: test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2'; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=7.28..7.29 rows=1 width=0) -> Index Only Scan using idx_name on t_test (cost=0.00..7.28 rows=1 width=0) Index Cond: (name = 'hans2'::text) (3 rows) - This is exactly the same as before !
  • 20. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - What about functions? test=# CREATE INDEX idx_cos ON t_test ( cos(id) ); CREATE INDEX Time: 16867.228 ms test=# explain SELECT count(*) FROM t_test WHERE cos(id) = 17; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=23960.99..23961.00 rows=1 width=0) -> Bitmap Heap Scan on t_test (cost=395.25..23908.56 rows=20972 width=0) Recheck Cond: (cos((id)::double precision) = 17::double precision) -> Bitmap Index Scan on idx_cos (cost=0.00..390.01 rows=20972 width=0) Index Cond: (cos((id)::double precision) = 17::double precision) (5 rows) - PostgreSQL provides functional indexes - VERY nice to avoid additional columns - Gives a lot of extra flexibility
  • 21. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Type of functions allowed - Functions must be deterministic => “immutable” => Functions can be written in almost any language => This is highly performance sensitive
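As a minimal sketch of the rule above: a function has to be declared IMMUTABLE before it can appear in an index expression. The function name_prefix and the index idx_prefix are hypothetical names introduced only for illustration; t_test is the table from the earlier slides.

```sql
-- Hypothetical helper: the first three characters of a name, lowercased.
-- IMMUTABLE promises that the result depends only on the input argument.
CREATE FUNCTION name_prefix(text) RETURNS text AS $$
    SELECT lower(substr($1, 1, 3));
$$ LANGUAGE sql IMMUTABLE;

-- Functional index on the result of the function.
CREATE INDEX idx_prefix ON t_test (name_prefix(name));

-- The query has to use the same expression for the index to be considered.
EXPLAIN SELECT count(*) FROM t_test WHERE name_prefix(name) = 'xyz';
```

Without the IMMUTABLE marker the function defaults to VOLATILE and CREATE INDEX rejects it. Since the function runs once per indexed row and again on every insert or update, a cheap function is a sensible choice here.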
  • 22. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - How does PostgreSQL decide on index vs. no index? - PostgreSQL uses statistics to estimate the number of rows coming back - Each operation is assigned a cost => costs are just numbers used to compare different options inside the planner - Cost parameters can be changed at runtime or globally => be careful, it can work against you
  • 23. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - pg_stats is your friend: test=# \d pg_stats View "pg_catalog.pg_stats" Column | Type | Modifiers -------------------------------+-----------+----------- schemaname | name | tablename | name | attname | name | inherited | boolean | null_frac | real | avg_width | integer | n_distinct | real | most_common_vals | anyarray | most_common_freqs | real[] | histogram_bounds | anyarray | correlation | real | most_common_elems | anyarray | most_common_elem_freqs | real[] | elem_count_histogram | real[] |
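To see what the planner actually knows about the test table, the view can be queried directly. This is a small sketch that assumes t_test from the earlier slides exists and has been analyzed; the selection of columns is just one reasonable choice.

```sql
-- One row per column of t_test; most_common_vals / most_common_freqs
-- are what the planner uses to judge how selective name = 'hans' is.
SELECT attname,
       null_frac,
       n_distinct,
       most_common_vals,
       most_common_freqs,
       correlation
FROM   pg_stats
WHERE  tablename = 't_test';
```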
  • 24. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Updating statistics - System statistics are updated by ANALYZE: test=# \h ANALYZE Command: ANALYZE Description: collect statistics about a database Syntax: ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ] - In most setups autovacuum is in charge of updating pg_statistic - In most cases statistics are not an issue
  • 25. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - How does PostgreSQL estimate costs? - seq_page_cost = 1 - random_page_cost = 4 - cpu_tuple_cost = 0.01 - cpu_operator_cost = 0.0025 - cpu_index_tuple_cost = 0.005
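The values listed above are the defaults; the settings active in the current session can be checked as follows (a sketch using the standard pg_settings view):

```sql
-- Show the planner cost constants as seen by the current session.
SELECT name, setting
FROM   pg_settings
WHERE  name IN ('seq_page_cost',
                'random_page_cost',
                'cpu_tuple_cost',
                'cpu_operator_cost',
                'cpu_index_tuple_cost')
ORDER  BY name;
```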
  • 26. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Let us do the math (1): test=# explain SELECT count(*) FROM t_test; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0) (2 rows) - total costs are at 75100.81 - costs are composed of I/O and CPU costs
  • 27. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Let us do the math (2): test=# SELECT pg_relation_size('t_test') / 8192; ?column? ---------- 22672 (1 row) - our table consists of 22672 blocks - each block is 8kb in size
  • 28. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Let us do the math (3): The seq scan: I/O cost = 22672 * seq_page_cost = 22672; CPU cost = 4,194,304 * cpu_tuple_cost = 41943.04; together = 64615.04 for the seq scan. The aggregate: 4,194,304 * cpu_operator_cost = 10485.76. Total costs => 64615.04 + 10485.76 = 75100.80, plus cpu_tuple_cost (0.01) for the one row the aggregate returns = 75100.81
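The arithmetic can be replayed in SQL. This is a sketch that hard-codes the default cost parameters from the earlier slide together with the block count (22672) and row count (4194304) of t_test; on another system the numbers will differ.

```sql
SELECT
    -- sequential I/O plus per-row CPU cost:
    -- compare with 64615.04 on the Seq Scan node
    22672 * 1.0 + 4194304 * 0.01                          AS seq_scan_total,
    -- plus one operator call per row for count(*):
    -- compare with the Aggregate startup cost 75100.80
    22672 * 1.0 + 4194304 * 0.01 + 4194304 * 0.0025       AS aggregate_startup,
    -- plus 0.01 (cpu_tuple_cost) for the single result row:
    -- compare with the Aggregate total cost 75100.81
    22672 * 1.0 + 4194304 * 0.01 + 4194304 * 0.0025 + 0.01 AS aggregate_total;
```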
  • 29. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Inflation at work: test=# SET seq_page_cost TO 10; SET test=# explain SELECT count(*) FROM t_test; QUERY PLAN ----------------------------------------------------------------------- Aggregate (cost=279148.80..279148.81 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..268663.04 rows=4194304 width=0) (2 rows) - Costs can be changed at runtime to fine tune index usage => only do this if you are fully aware of what you are doing. It can have unintended side effects
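One way to limit the side effects mentioned above is to change a cost parameter only inside a transaction. The sketch below uses SET LOCAL; the value 1.5 is an arbitrary assumption chosen for illustration, not a recommendation.

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL random_page_cost = 1.5;
EXPLAIN SELECT count(*) FROM t_test WHERE id < 1000;
ROLLBACK;

-- A plain SET lasts for the session; RESET returns to the configured value.
RESET seq_page_cost;
```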
  • 30. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Spinning disks vs. SSDs - Traditional disks are fast sequentially and pretty bad when doing random I/O - SSDs fixed the problem. => consider changing random_page_cost
  • 31. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Abusing tablespaces: test=# ALTER TABLESPACE pg_default SET (random_page_cost = 1); ALTER TABLESPACE - Allows different cost settings for various disk subsystems - It also allows splitting “cached” and “uncached” data -> ugly but useful
  • 32. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Correlation and disk layout test=# CREATE TABLE t_random AS SELECT * FROM t_test ORDER BY random(); SELECT 4194304 test=# CREATE INDEX idx_random ON t_random(id); CREATE INDEX test=# ANALYZE t_random; ANALYZE - The PostgreSQL optimizer considers the physical order of rows on disk - High correlation makes index scans far more likely, as the optimizer reduces its estimates for I/O costs.
  • 33. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Correlation and disk layout test=# explain SELECT count(*) FROM t_test WHERE id < 1000; QUERY PLAN ------------------------------------------------------------------------------- Aggregate (cost=75.35..75.36 rows=1 width=0) -> Index Only Scan using idx_id on t_test (cost=0.00..72.72 rows=1049 width=0) Index Cond: (id < 1000) (3 rows) test=# explain SELECT count(*) FROM t_random WHERE id < 1000; QUERY PLAN ------------------------------------------------------------------------------- Aggregate (cost=950.31..950.32 rows=1 width=0) -> Index Only Scan using idx_random on t_random (cost=0.00..947.94 rows=947 width=0) Index Cond: (id < 1000) (3 rows)
  • 34. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Implications: - This is why different plans can pop up EVEN if the data is the same - There is no fixed amount of data making PostgreSQL switch from index to sequential scan - High correlation can improve performance => consider clustering the table test=# \h CLUSTER Command: CLUSTER Description: cluster a table according to an index Syntax: CLUSTER [VERBOSE] table_name [ USING index_name ] CLUSTER [VERBOSE]
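The difference between the two tables shows up in the correlation column of pg_stats, and CLUSTER can be used to restore physical order. A sketch, assuming t_test, t_random and idx_random from the previous slides:

```sql
-- Correlation close to 1 means the physical row order follows the
-- column order; values near 0 mean the rows are scattered on disk.
SELECT tablename, attname, correlation
FROM   pg_stats
WHERE  tablename IN ('t_test', 't_random')
AND    attname = 'id';

-- Rewrite t_random in index order, then refresh the statistics.
CLUSTER t_random USING idx_random;
ANALYZE t_random;
```

CLUSTER rewrites the whole table and holds an exclusive lock while doing so. The order is not maintained for later writes, so the command may have to be repeated from time to time.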
  • 35. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Using OR / AND: - PostgreSQL can use more than one index per table per query - PostgreSQL provides multi-column indexes - What you might see is a so-called “Bitmap Scan” => don't mix it up with Oracle Bitmap Indexes
  • 36. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Bitmap scans: test=# explain SELECT * FROM t_test WHERE id = 2343 OR id = 423423; QUERY PLAN --------------------------------------------------------------------------- Bitmap Heap Scan on t_test (cost=9.44..17.41 rows=2 width=9) Recheck Cond: ((id = 2343) OR (id = 423423)) -> BitmapOr (cost=9.44..9.44 rows=2 width=0) -> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0) Index Cond: (id = 2343) -> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0) Index Cond: (id = 423423) (7 rows) - PostgreSQL will scan the index twice - PostgreSQL will look for blocks in the underlying table - The condition has to be re-evaluated
  • 37. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Bitmap scans: test=# explain SELECT * FROM t_test WHERE id = 2343 AND name = 'josef'; QUERY PLAN ----------------------------------------------------------------------- Index Scan using idx_name on t_test (cost=0.00..8.27 rows=1 width=9) Index Cond: (name = 'josef'::text) Filter: (id = 2343) (3 rows) - PostgreSQL does not always use two indexes when you have 2 quals - The more selective index might be enough
  • 38. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Multicolumn indexes: test=# DROP INDEX idx_id; DROP INDEX test=# CREATE INDEX idx_combined ON t_test (id, name); CREATE INDEX test=# explain SELECT * FROM t_test WHERE id = 10; QUERY PLAN -------------------------------------------------------------------------------- Index Only Scan using idx_combined on t_test (cost=0.00..8.91 rows=1 width=9) Index Cond: (id = 10) (2 rows) - PostgreSQL can use a subset of those columns IF they form the leading part of the index - Imagine a phone book; it is just like a combined index
  • 39. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Many indexes or combined indexes? - It depends on what you want to query - If you always use the first conditions in the index a combined index might be a good idea - Many indexes are more flexible but maybe not perfect - Sometimes a mixed-strategy can be useful
  • 40. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 4. Indexes to provide order - b-trees can be used for more than searching - B-trees provide you with order. - Order helps to avoid repeated sorting. test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------------- Limit (cost=0.00..0.31 rows=10 width=9) -> Index Scan using idx_id on t_test (cost=0.00..131602.27 rows=4194304 width=9) (2 rows)
  • 41. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 5. Dealing with upper / lowercase - Upper and lower case searches are common: - If you want case-insensitive searches, don't use a functional index - Consider using “citext” test=# CREATE EXTENSION citext; CREATE EXTENSION test=# SELECT 'ABC'::citext = 'abc'::citext; ?column? ---------- t (1 row)
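A citext column can be indexed with an ordinary btree, and comparisons against it ignore case without any function call in the query. The table t_user and the index idx_login below are hypothetical names used only for this sketch.

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- Hypothetical table with a case-insensitive login column.
CREATE TABLE t_user (login citext);
INSERT INTO t_user VALUES ('Hans'), ('PAUL');

-- A plain btree index; the citext comparison operators ignore case.
CREATE INDEX idx_login ON t_user (login);

-- Matches the row stored as 'Hans'.
SELECT * FROM t_user WHERE login = 'hans';
```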
  • 42. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 6. Different types of indexes - PostgreSQL supports more than just btrees - B-Trees are fine if you are interested in things which can be sorted - Try to sort polygons => there is no natural order to sort them by - Geometric data and Full-Text-Search need different algorithms NOTE: This is not about which index is faster. This is about the correct ALGORITHM
  • 43. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 6. Different types of indexes - Index types provided by PostgreSQL - B-Trees - Gist: Generalized Search Tree - Gin: Generalized Inverted Index - Sp-Gist: Space Partitioned Gist - Hash
  • 44. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 6. Different types of indexes - Indexes and algorithms - B-Trees: numbers, text, dates, etc. - Gist: Generalized Search Tree - Gin: Generalized Inverted Index - Sp-Gist: Space Partitioned Gist - Hash
  • 45. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. Gist indexes - Gist operates on different principles than btree - it supports “contains”, “left of”, “overlaps”, etc. - “contains”, etc. are good for => Full Text Search => Geometric operations (PostGIS, etc.) => Finding genome sequences => Handling ranges (time, etc.) => Fuzzy search - Gist allows KNN-search
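A small KNN sketch with the built-in point type: the GiST index answers "the five nearest points" directly through the distance operator. The table t_point, its random test data and the index name are assumptions made up for this example.

```sql
-- Hypothetical table filled with 10000 random points.
CREATE TABLE t_point (p point);
INSERT INTO t_point
    SELECT point(random() * 100, random() * 100)
    FROM   generate_series(1, 10000);

CREATE INDEX idx_point ON t_point USING gist (p);

-- <-> is the distance operator; ordering by it together with LIMIT
-- lets the planner use the GiST index for a nearest-neighbour search.
EXPLAIN SELECT *
FROM   t_point
ORDER  BY p <-> point '(50,50)'
LIMIT  5;
```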
  • 46. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. Gist indexes - How it works internally ...
  • 47. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. GIN indexes - GIN is a so-called inverted index - Used for Full Text Search - If you have 1 million documents containing the word “house”, do you really want to have “house” inside the index 1 million times? => B-tree for words => A document list for each word => Classical approach to text search - FTS is not about “=”, it is about “contains” => forget btree
  • 48. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. GIN indexes - GIN internal workings:
  • 49. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. SP-Gist indexes - SP-Gist is a space partitioned index - Can be used for a variety of algorithms, which use space partitioning => quad trees => suffix trees => k-d trees
  • 50. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. SP-Gist indexes - Quad trees: A prototype example ... - We want to insert ... (6, 4) and (2, 8)
  • 51. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Stemming: - Before searching, it makes sense to perform “stemming” test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car'); to_tsvector ----------------------------------------- 'better':5 'car':3,11 'mani':2 'one':10 (1 row)
  • 52. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Stemming is language dependent: - Stemming works nicely for “roman” languages => it is much harder for Chinese and similar languages test=# SELECT to_tsvector('english', 'i am'), to_tsvector('german', 'i am'), to_tsvector('dutch', 'i am'); to_tsvector | to_tsvector | to_tsvector -------------+-------------+-------------- | 'i':1 | 'am':2 'i':1 (1 row)
  • 53. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - “contains” is your friend: - The @@ operator matches a tsquery (the search string) against a so-called tsvector: test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', 'car'); ?column? ---------- t (1 row)
  • 55. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Indexing is easy: - All you need is a functional index - Alternatively the stemmed content can be “materialized” in a separate column CREATE INDEX idx_fti ON t_test USING gist (to_tsvector('german', name));
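Both variants mentioned on this slide can be sketched as follows. The column name_tsv and the index idx_fti_col are hypothetical names, and GIN is swapped in for the materialized variant purely as an illustration; GiST, as used on the slide, works for tsvector as well.

```sql
-- Variant 1: functional index (as on the slide). The query has to
-- repeat the indexed expression, including the configuration name.
SELECT count(*)
FROM   t_test
WHERE  to_tsvector('german', name) @@ to_tsquery('german', 'hans');

-- Variant 2: materialize the stemmed content in a separate column.
ALTER TABLE t_test ADD COLUMN name_tsv tsvector;
UPDATE t_test SET name_tsv = to_tsvector('german', name);
CREATE INDEX idx_fti_col ON t_test USING gin (name_tsv);

SELECT count(*)
FROM   t_test
WHERE  name_tsv @@ to_tsquery('german', 'hans');
```

The materialized column is not kept up to date automatically; a trigger (not shown here) is the usual way to maintain it when rows change.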
  • 56. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - ts_vector and ts_query magic - PostgreSQL allows you to use “and” (&) and “or” (|) test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', 'car & truck'); ?column? ---------- f (1 row) test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', '(car | truck) & many'); ?column? ---------- t (1 row)
  • 57. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - A stupid question: What is a “word”? - PostgreSQL is NOT limited to textual search - Remember, it is all about “contains” ... - Create your own parser: test=# \h CREATE TEXT SEARCH PARSER Command: CREATE TEXT SEARCH PARSER Description: define a new text search parser Syntax: CREATE TEXT SEARCH PARSER name ( START = start_function , GETTOKEN = gettoken_function , END = end_function , LEXTYPES = lextypes_function [, HEADLINE = headline_function ] )
  • 58. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Even more flexibility (2): test=# \h CREATE TEXT SEARCH CONFIGURATION Command: CREATE TEXT SEARCH CONFIGURATION Description: define a new text search configuration Syntax: CREATE TEXT SEARCH CONFIGURATION name ( PARSER = parser_name | COPY = source_config )
  • 59. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Even more flexibility: test=# \h CREATE TEXT SEARCH DICTIONARY Command: CREATE TEXT SEARCH DICTIONARY Description: define a new text search dictionary Syntax: CREATE TEXT SEARCH DICTIONARY name ( TEMPLATE = template [, option = value [, ... ]] ) test=# \h CREATE TEXT SEARCH TEMPLATE Command: CREATE TEXT SEARCH TEMPLATE Description: define a new text search template Syntax: CREATE TEXT SEARCH TEMPLATE name ( [ INIT = init_function , ] LEXIZE = lexize_function )
  • 60. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - What does it take to organize a btree? Operator (strategy number): < (1), <= (2), = (3), >= (4), > (5)
  • 61. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Why care? - The way numbers are treated is pretty “common” - How about sorting this one? “2305 09 04 78” “4353 07 06 77” => it seems the sort order is correct as shown => it isn't – it is an Austrian social security number => 1977 was before 1978 and not the other way round
  • 62. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Defining indexing strategies - We can write our own operators - Those operators can be assigned to an operator class, which will tell the index how to “behave” “2305 09 04 78” “4353 07 06 77” => it seems the sort order is correct as shown => it isn't – it is an Austrian social security number => 1977 was before 1978 and not the other way round
  • 63. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Writing an operator (1): test=# CREATE OR REPLACE FUNCTION normalize_si(text) RETURNS text AS $$ BEGIN RETURN substring($1, 9, 2) || substring($1, 7, 2) || substring($1, 5, 2) || substring($1, 1, 4); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; CREATE FUNCTION test=# SELECT normalize_si('2305090478'); normalize_si -------------- 7804092305 (1 row)
  • 64. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Writing an operator (2): test=# CREATE OR REPLACE FUNCTION si_lt(text, text) RETURNS boolean AS $$ BEGIN RETURN normalize_si($1) < normalize_si($2); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; CREATE FUNCTION test=# CREATE OPERATOR <# ( PROCEDURE=si_lt, LEFTARG=text, RIGHTARG=text); CREATE OPERATOR test=# SELECT '2305090478'::text <# '4353070677'::text; ?column? ---------- f (1 row)
  • 65. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Creating the operator class: - write operators for all operations needed - write “support functions” (= “same”, etc.) - make sure that the most important strategies have proper operators test=# \h CREATE OPERATOR CLASS Command: CREATE OPERATOR CLASS Description: define a new operator class Syntax: CREATE OPERATOR CLASS name [ DEFAULT ] FOR TYPE data_type USING index_method [ FAMILY family_name ] AS { OPERATOR strategy_number operator_name [ ( op_type, op_type ) ] [ FOR SEARCH | FOR ORDER BY sort_family_name ] | FUNCTION support_number [ ( op_type [ , op_type ] ) ] function_name ( argument_type [, ...] ) | STORAGE storage_type } [, ... ]
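Putting the pieces together, a complete btree operator class for the social security example might look like the sketch below. It builds on normalize_si, si_lt and the <# operator from the previous slides; all other names (si_le, si_eq, si_ge, si_gt, si_cmp, si_ops, t_si, idx_si and the remaining operators) are assumptions introduced for illustration. Creating an operator class requires superuser rights.

```sql
-- Remaining comparison functions, all based on the normalized value.
CREATE FUNCTION si_le(text, text) RETURNS boolean AS
    $$ SELECT normalize_si($1) <= normalize_si($2) $$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION si_eq(text, text) RETURNS boolean AS
    $$ SELECT normalize_si($1) =  normalize_si($2) $$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION si_ge(text, text) RETURNS boolean AS
    $$ SELECT normalize_si($1) >= normalize_si($2) $$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION si_gt(text, text) RETURNS boolean AS
    $$ SELECT normalize_si($1) >  normalize_si($2) $$ LANGUAGE sql IMMUTABLE;

-- btree support function 1: returns <0, 0 or >0 like a C comparator.
CREATE FUNCTION si_cmp(text, text) RETURNS integer AS $$
    SELECT CASE WHEN normalize_si($1) < normalize_si($2) THEN -1
                WHEN normalize_si($1) > normalize_si($2) THEN  1
                ELSE 0 END
$$ LANGUAGE sql IMMUTABLE;

CREATE OPERATOR <=# (PROCEDURE = si_le, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR =#  (PROCEDURE = si_eq, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR >=# (PROCEDURE = si_ge, LEFTARG = text, RIGHTARG = text);
CREATE OPERATOR >#  (PROCEDURE = si_gt, LEFTARG = text, RIGHTARG = text);

-- One operator per btree strategy number, plus the comparison function.
CREATE OPERATOR CLASS si_ops FOR TYPE text USING btree AS
    OPERATOR 1 <#,
    OPERATOR 2 <=#,
    OPERATOR 3 =#,
    OPERATOR 4 >=#,
    OPERATOR 5 >#,
    FUNCTION 1 si_cmp(text, text);

-- Hypothetical table whose index is ordered by birth date, not by text.
CREATE TABLE t_si (si text);
CREATE INDEX idx_si ON t_si (si si_ops);
```

Queries have to use the custom operators (for example `WHERE si =# '2305090478'`) for this index to be considered; the regular `=` on text belongs to a different operator class.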
  • 66. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 10. Available operator classes - pg_trgm - Trigrams are perfect to perform fuzzy matching - Trigrams can be used nicely along with KNN-search - pg_trgm is available as an extension to PostgreSQL test=# CREATE EXTENSION pg_trgm; CREATE EXTENSION - Problem: What is the proper way to spell the name of this village: “gramatneusiedl” or “grammatneusiedel”?
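To get a feeling for what a trigram is, pg_trgm ships helper functions that can be applied to the two spellings above. A brief sketch, assuming the extension has been installed as shown; no output is reproduced here because it was not part of the original slides.

```sql
-- The set of trigrams pg_trgm extracts from a string.
SELECT show_trgm('gramatneusiedl');

-- Similarity between 0 (nothing in common) and 1 (identical trigram sets).
SELECT similarity('gramatneusiedl', 'grammatneusiedel');

-- <-> is the corresponding distance, i.e. 1 - similarity.
SELECT 'gramatneusiedl' <-> 'grammatneusiedel';
```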
  • 67. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 10. Available operator classes - Testing pg_trgm test=# CREATE TABLE t_search AS SELECT relname::text FROM pg_class; SELECT 303 test=# CREATE INDEX idx_trgm ON t_search USING gist(relname gist_trgm_ops); CREATE INDEX
  • 68. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 10. Available operator classes - Testing pg_trgm (2): test=# SELECT *, 'pgclass' <-> relname FROM t_search ORDER BY 'pgclass' <-> relname LIMIT 10; relname | ?column? --------------------------------+---------- pg_class | 0.454545 pg_opclass | 0.538462 pg_class_oid_index | 0.714286 pg_opclass_oid_index | 0.727273 pg_class_relname_nsp_index | 0.793103 pg_opclass_am_name_nsp_index | 0.8 pg_seclabel | 0.823529 pg_am | 0.833333 pg_seclabels | 0.833333 pg_shseclabel | 0.842105 (10 rows)
  • 69. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 10. Available operator classes - KNN in action: test=# explain SELECT *, 'pgclass' <-> relname FROM t_search ORDER BY 'pgclass' <-> relname LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------- Limit (cost=0.14..1.40 rows=10 width=19) -> Index Scan using idx_trgm on t_search (cost=0.14..38.20 rows=303 width=19) Order By: (relname <-> 'pgclass'::text) (3 rows)
  • 70. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 11. Traditional LIKE - LIKE can be indexed in some cases: - The PostgreSQL optimizer can rewrite queries featuring LIKE in a fancy and efficient way => The goal is to find the “next character” in line and query for a range - This kind of rewrite only works when the next character is actually known to PostgreSQL - Special operator classes might be needed => varchar_pattern_ops, text_pattern_ops
  • 71. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 11. Traditional LIKE - An example: test=# CREATE INDEX idx_relname ON t_search (relname); CREATE INDEX test=# SET enable_seqscan TO off; SET test=# explain SELECT relname FROM t_search WHERE relname LIKE 'abc%'; QUERY PLAN ---------------------------------------------------------------------------------- Index Only Scan using idx_relname on t_search (cost=0.27..8.29 rows=1 width=19) Index Cond: ((relname >= 'abc'::text) AND (relname < 'abd'::text)) Filter: (relname ~~ 'abc%'::text) (3 rows)
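If the database does not use the C locale, the range rewrite shown above is not available for a default btree index, and one of the pattern operator classes is needed instead. A sketch with text_pattern_ops; the index name idx_relname_pattern is a hypothetical choice.

```sql
-- Index that compares strings character by character, independent of
-- the locale, so that LIKE 'abc%' can be turned into a range scan.
CREATE INDEX idx_relname_pattern ON t_search (relname text_pattern_ops);

-- t_search is tiny, so a sequential scan may still win unless it is
-- disabled for the demonstration, as on the slide above.
SET enable_seqscan TO off;
EXPLAIN SELECT relname FROM t_search WHERE relname LIKE 'abc%';
RESET enable_seqscan;
```

Only patterns anchored at the start of the string can benefit from this; a pattern such as '%abc' cannot be rewritten into a range.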
  • 72. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 12. Indexing MIN / MAX - An example: - MIN / MAX works by reading the index from left and right (backward scan) test=# explain SELECT min(relname), max(relname) FROM t_search; QUERY PLAN ---------------------------------------------------------------------------------- Result (cost=0.74..0.75 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.27..0.37 rows=1 width=19) -> Index Only Scan using idx_relname on t_search (cost=0.27..29.57 rows=303 width=19) Index Cond: (relname IS NOT NULL) InitPlan 2 (returns $1) -> Limit (cost=0.27..0.37 rows=1 width=19) -> Index Only Scan Backward using idx_relname on t_search t_search_1 (cost=0.27..29.57 rows=303 width=19) Index Cond: (relname IS NOT NULL) (9 rows)
  • 73. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de Any questions? Thank you for your attention