An atypical approach to SQL query optimization targeting an Oracle database
The first in a series under the banner "Feedback on database administration", this article, already published in French on 25 December 2024, aims to raise awareness of the importance of the SQL query optimization step in the process of implementing a relational database.
It is naturally aimed at database administrators and developers, but it is also instructive for project managers and IT decision-makers, in their role as guarantors of rigorous process compliance.
Finally, it helps sharpen the perception of the nature of software-related problems for anyone with a general background in computer systems.
When implementing a relational database, the SQL query optimization step, which should be part of the development phase, is too often neglected, for a variety of reasons.
This observation goes a long way toward explaining the disastrous performance of certain applications that draw their data from a relational database.
During a recent Oracle database optimization campaign, an AWR report highlighted SQL queries that were visibly "toxic" in a transactional environment, as shown below:
It should be noted that this report, which covers a one-hour monitoring period, reflects exclusively transactional activity.
In this first installment, we focus on the first query, whose main characteristics are as follows:
In a transactional environment, these characteristics alone justify an attempt to improve the processing of our query, whose original text is as follows:
A sketch of the logical sub-model hosting all the tables referenced in this query is as follows:
The urgency of optimization becomes evident when we look at the execution plan proposed by Oracle's optimizer:
The cost of 1080711 displayed by the RDBMS reflects a very high level of consumption of the resources required to process the query.
This cost results from the combination of two full scans of the T_IMPOT table, whose cardinality here is 30609112 rows, with a sort induced by the UNION set operator and performed on 14645501 rows.
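For illustration only, a query producing such a plan (two full scans of T_IMPOT combined by a UNION) would have roughly the following shape. All column and join names in this sketch are assumptions; only the table names are taken from the article:

```sql
-- Hypothetical sketch only: ID_TYPE_IMPOT, ID_IMPOT and LB_TYPE_IMPOT
-- are assumed identifiers, not the article's actual column names.
SELECT tt.ID_TYPE_IMPOT, tt.LB_TYPE_IMPOT
  FROM T_IMPOT_TYPE tt
  JOIN T_IMPOT i            ON i.ID_TYPE_IMPOT = tt.ID_TYPE_IMPOT  -- full scan no. 1
  JOIN T_IMPOT_DOC_DEPOT dd ON dd.ID_IMPOT     = i.ID_IMPOT
UNION                                                              -- sort + deduplication
SELECT tt.ID_TYPE_IMPOT, tt.LB_TYPE_IMPOT
  FROM T_IMPOT_TYPE tt
  JOIN T_IMPOT i ON i.ID_TYPE_IMPOT = tt.ID_TYPE_IMPOT             -- full scan no. 2
  JOIN T_DOC d   ON d.ID_IMPOT      = i.ID_IMPOT;
```

Each branch must traverse the 30-million-row T_IMPOT table in full, and UNION (as opposed to UNION ALL) then forces a sort to eliminate duplicates.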
If we consider that these figures will continue to grow over time, the hypothesis of a gradual deterioration in the performance of a system already on the verge of asphyxiation becomes a certainty.
Our challenge: how can we reduce and stabilize this clearly prohibitive processing cost, so as to significantly lighten the load on the database server's resources?
The small disk footprint of the newly-created index actually helped us to achieve a slightly less costly execution plan. However, the proportions of the gain obtained did not live up to our expectations.
This reformulation inspired an optimization strategy aimed squarely at avoiding the Gordian knot of the two full scans of the voluminous T_IMPOT table.
The idea is to pre-qualify tax types in real time. In more explicit terms, whenever a new tax identifier enters either of the two tables T_IMPOT_DOC_DEPOT or T_DOC through an insertion, the corresponding tax type is flagged, in real time, in one of the two new columns provided for this purpose in the T_IMPOT_TYPE table.
Having added the two required pseudo-calculated columns, IN_T_IMPOT_DOC_DEPOT and IN_T_DOC, to the T_IMPOT_TYPE table, we rewrote the text of the SQL query as follows:
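As a hedged sketch of what such a rewrite looks like (all identifiers other than T_IMPOT_TYPE and its two flag columns are assumptions), the query now only needs to probe the small T_IMPOT_TYPE table, without ever touching T_IMPOT:

```sql
-- Hypothetical sketch: only T_IMPOT_TYPE, IN_T_IMPOT_DOC_DEPOT and
-- IN_T_DOC come from the article; the other identifiers are assumed.
SELECT tt.ID_TYPE_IMPOT, tt.LB_TYPE_IMPOT
  FROM T_IMPOT_TYPE tt
 WHERE tt.IN_T_IMPOT_DOC_DEPOT = 'Y'
    OR tt.IN_T_DOC = 'Y';
```

The set operation over millions of rows is replaced by a simple filter on a table of tax types, which explains the collapse of the optimizer's cost estimate.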
And hold on tight!
As a result of this rewrite, the cost of processing the query has dropped from 1080711 to 5, as illustrated by the following new execution plan:
In our test environment, each of the two versions of the query returned the same result set of 178 rows.
The spectacular difference in cost was reflected in execution times: 17.134 seconds for the original query versus 17 milliseconds for the rewritten one, a ratio of roughly 1:1000.
It should be noted, however, that the impact will be significantly magnified in a production environment, where several queries run in parallel.
The new version of the query should now replace the original version in the application code.
1. Add two additional columns to the T_IMPOT_TYPE table, namely: IN_T_IMPOT_DOC_DEPOT and IN_T_DOC,
The "INVISIBLE" column option, introduced in Oracle Database 12c, keeps the application running smoothly despite the structural change to the T_IMPOT_TYPE table.
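A minimal sketch of step 1, assuming CHAR(1) Y/N flags (the data type and default values are assumptions):

```sql
-- Sketch: data type and defaults are assumptions. INVISIBLE (Oracle 12c+)
-- hides the columns from SELECT * and positional INSERTs, so the
-- existing application code keeps working unchanged.
ALTER TABLE T_IMPOT_TYPE ADD (
  IN_T_IMPOT_DOC_DEPOT CHAR(1) INVISIBLE DEFAULT 'N',
  IN_T_DOC             CHAR(1) INVISIBLE DEFAULT 'N'
);
```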
2. Manually populate the two added columns, using the following update queries:
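These one-off back-fill statements might take the following shape; the join columns (ID_TYPE_IMPOT, ID_IMPOT) are assumptions, since the actual key names are not given in the article:

```sql
-- Hypothetical back-fill of the two flag columns; join columns are assumed.
UPDATE T_IMPOT_TYPE tt
   SET tt.IN_T_IMPOT_DOC_DEPOT = 'Y'
 WHERE EXISTS (SELECT 1
                 FROM T_IMPOT i
                 JOIN T_IMPOT_DOC_DEPOT dd ON dd.ID_IMPOT = i.ID_IMPOT
                WHERE i.ID_TYPE_IMPOT = tt.ID_TYPE_IMPOT);

UPDATE T_IMPOT_TYPE tt
   SET tt.IN_T_DOC = 'Y'
 WHERE EXISTS (SELECT 1
                 FROM T_IMPOT i
                 JOIN T_DOC d ON d.ID_IMPOT = i.ID_IMPOT
                WHERE i.ID_TYPE_IMPOT = tt.ID_TYPE_IMPOT);

COMMIT;
```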
3. Create an insert/update trigger on the T_IMPOT_DOC_DEPOT table to automatically feed the IN_T_IMPOT_DOC_DEPOT column of the T_IMPOT_TYPE table,
4. Create a delete/update trigger on the T_IMPOT_DOC_DEPOT table to automatically update the IN_T_IMPOT_DOC_DEPOT column of the T_IMPOT_TYPE table,
5. Create an insert/update trigger on the T_DOC table to automatically feed the IN_T_DOC column of the T_IMPOT_TYPE table,
6. Create a delete/update trigger on the T_DOC table to automatically update the IN_T_DOC column of the T_IMPOT_TYPE table,
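As one hedged example of the trigger steps, the insert trigger on T_IMPOT_DOC_DEPOT might look like this; the trigger name and join columns are assumptions, and the three companion triggers (on delete/update, and on T_DOC) follow the same pattern:

```sql
-- Hypothetical sketch: trigger name and join columns are assumptions.
CREATE OR REPLACE TRIGGER TRG_AI_T_IMPOT_DOC_DEPOT
AFTER INSERT ON T_IMPOT_DOC_DEPOT
FOR EACH ROW
BEGIN
  -- Flag, in real time, the tax type of the tax just referenced.
  UPDATE T_IMPOT_TYPE tt
     SET tt.IN_T_IMPOT_DOC_DEPOT = 'Y'
   WHERE tt.ID_TYPE_IMPOT = (SELECT i.ID_TYPE_IMPOT
                               FROM T_IMPOT i
                              WHERE i.ID_IMPOT = :NEW.ID_IMPOT);
END;
/
```

Because the trigger writes to T_IMPOT_TYPE rather than to its own table, it does not run into Oracle's mutating-table restriction.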
7. Replace the original query in the application code with the rewritten one.
8. Schedule a daily readjustment task to make up for any lapses in trigger operation, caused by maintenance or other specific operations/constraints likely to short-circuit the trigger mechanism.
The task could be in the form of a shell script, as shown below:
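The original script is not reproduced above; the following is a minimal sketch of what such a job could look like. The connect string, credentials, and join columns are all assumptions, and the SQL simply re-derives both flags in bulk so that any rows the triggers missed (direct-path loads, temporarily disabled triggers) are corrected:

```shell
#!/bin/sh
# Hypothetical daily re-synchronisation job. Connection details and
# column names are assumptions, not taken from the article.

RESYNC_SQL=$(cat <<'EOF'
UPDATE T_IMPOT_TYPE tt
   SET tt.IN_T_IMPOT_DOC_DEPOT =
         CASE WHEN EXISTS (SELECT 1
                             FROM T_IMPOT i
                             JOIN T_IMPOT_DOC_DEPOT dd ON dd.ID_IMPOT = i.ID_IMPOT
                            WHERE i.ID_TYPE_IMPOT = tt.ID_TYPE_IMPOT)
              THEN 'Y' ELSE 'N' END,
       tt.IN_T_DOC =
         CASE WHEN EXISTS (SELECT 1
                             FROM T_IMPOT i
                             JOIN T_DOC d ON d.ID_IMPOT = i.ID_IMPOT
                            WHERE i.ID_TYPE_IMPOT = tt.ID_TYPE_IMPOT)
              THEN 'Y' ELSE 'N' END;
COMMIT;
EOF
)

# Pipe the statement into SQL*Plus when it is available; otherwise just
# print it, which also makes the script easy to dry-run.
if command -v sqlplus >/dev/null 2>&1; then
    printf '%s\n' "$RESYNC_SQL" | sqlplus -s dba_user/dba_password@PRODDB
else
    printf '%s\n' "$RESYNC_SQL"
fi
```

Scheduling it via cron outside business hours keeps the bulk correction off the transactional peak.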
Note that all steps are ideally handled by a DBA, except step 7, which requires the collaboration of the development team or, failing that, the software publisher.
The proposed solution is obviously spectacular, but its implementation remains highly intrusive: it requires a modification of the application code that the software publisher is generally reluctant to make. It is certain that a rigorous development phase, with an uncompromising query optimization stage, would have led to the scrapping of a non-scalable execution plan likely to deal a fatal blow to the system's ability to sustain a growing load. It is therefore very clear that developers should not underestimate the SQL query optimization step in their line of business, at the risk of prematurely compromising the quality of the service their software provides.