Messing with PostgreSQL: b-tree indexes, likes and operator classes
I can do a lot with JPA. Working with database is not a hard thing in Java environment. Java EE has a pretty easy standard when it comes to working with data, Spring makes configuration even easier and Hibernate takes care of almost every problem with already created and tested solutions. Although lately I started to peek at the possible optimization. Since I have came to as little Entity Manager calls as possible, there is not much to do in the code. So I decided it is, finally, time to dive deeper into databases and their magic world of indexes, analyzing, likes, trigrams, etc.
I am used to work with PostgreSQL RDMS, so this post will be mostly in its dialect.
If you want your database to work faster - index its appopriate fields. Basic knowledge. But... why? How does the indexing work? Those are my new questions. And to get some answers, since lately I started to work with Spring Boot (literally just searched to see, how correctly write it :o ) and Spring framework in general, I decided to create new database called opttest with user opttest and small java project.
It is small project hosted on my Github account. It is not clean or anything I might be proud of, but it serves one easy purpose - fill database with semi-random data. Based on two .csv files I found on the internate, it parses them and creates whatever amount of customers I want, by simply adding randomly chosen name from list of names (first_name) and randomly chosen word from dictionary (last_name). With that I was able to quickly fill up my database opttest with a lot of data.
With such a preparation, let's see about those indexes.
So this is the database I will work with. Also some sample data. I decided to work on 10000 records at first.
Let's do some fetching, shall we?
To "debug" my queries I am using EXPLAIN ANALYZE. It shows me planning time, which is the time PostgreSQL takes to choose strategy of obtaining data and execution time, which is actual time of retrieving data. So in both cases the time it takes is high. More then 2500 ms? No way. So what to do in this case? Yep, let's check out an index. In this case, B-tree might be suitable.
B-tree is common index and is used to execute searching of equal value or left-anchored like/alike faster (the beginning of the word known, like 'word%'). Indeed, good for auto-completion mixed with 3 character limit. In general, indexes work like tables of contents in books. It keeps data ordered, allowing for much faster retrieving but... every update will be doubled, since it will update row and also an index. So it's doubled edged sword:
Advantages:
I am used to work with PostgreSQL RDMS, so this post will be mostly in its dialect.
What I already knew?
If you want your database to work faster - index its appopriate fields. Basic knowledge. But... why? How does the indexing work? Those are my new questions. And to get some answers, since lately I started to work with Spring Boot (literally just searched to see, how correctly write it :o ) and Spring framework in general, I decided to create new database called opttest with user opttest and small java project.
Sandbox_db
It is small project hosted on my Github account. It is not clean or anything I might be proud of, but it serves one easy purpose - fill database with semi-random data. Based on two .csv files I found on the internate, it parses them and creates whatever amount of customers I want, by simply adding randomly chosen name from list of names (first_name) and randomly chosen word from dictionary (last_name). With that I was able to quickly fill up my database opttest with a lot of data.
With such a preparation, let's see about those indexes.
Messing with PostgreSQL
So this is the database I will work with. Also some sample data. I decided to work on 10000 records at first.
Let's do some fetching, shall we?
To "debug" my queries I am using EXPLAIN ANALYZE. It shows me planning time, which is the time PostgreSQL takes to choose strategy of obtaining data and execution time, which is actual time of retrieving data. So in both cases the time it takes is high. More then 2500 ms? No way. So what to do in this case? Yep, let's check out an index. In this case, B-tree might be suitable.
B-tree is common index and is used to execute searching of equal value or left-anchored like/alike faster (the beginning of the word known, like 'word%'). Indeed, good for auto-completion mixed with 3 character limit. In general, indexes work like tables of contents in books. It keeps data ordered, allowing for much faster retrieving but... every update will be doubled, since it will update row and also an index. So it's doubled edged sword:
Advantages:
- much faster data retrieving
- slower updates
- takes disk space
So what I did is creating an index on column first_name, then searching for 'Christopher'. And the result is... awesome! Time of 0.7 ms compared to more than 2.5 ms? I'll take it. And as I said earlier it will also improve time of left-anchored like queries as seen... Ops. Nope. I was really surprised when I saw the result. I googled quite a bit to see, why the documentation is lying and... it seems it is not. It's the locale problem. To make B-tree index correctly I have to do it the other way.
So what I did here? Specified the index operator class. With varchar column 'first_name' I set index to with operator class varchar_pattern_ops. It is clearly visible, now both searching are much, much faster!
Database after creating index:
Just for fun, I decided to see the impact on the database consisting of 100000 rows.
And after creating an index:
Yeah, 18 s... Differences is huge. And it clearly shows, how indexes are important.
Summary
So in this post I have proved the obvious - indexes are great at making performance of a database much smoother. But I consider it a good experience with debugging queries, getting to known operator classes and that like/alike does not always work with b-tree indexing.
Komentarze
Prześlij komentarz