In this article, we will explore how cost evaluation plays an important role in SQL query optimization. We will also discuss some of the cost components that are used by the optimizer to evaluate queries and find the best query plan.
Table of Contents
The main goal of query optimization is to choose the most efficient & effective plan at the lowest possible cost to implement the relational algebra operations.
In other words, optimizers have a goal to choose the most optimal execution plan for a SQL statement (query) at the lowest cost among all considered plans.
Points to be remembered:
- The main aim of query optimization is to choose the most efficient way of implementing the relational algebra operations at the lowest possible cost.
- The query optimizer should not depend solely on heuristic rules, but, it should also estimate the cost of executing the different strategies and find out the strategy with the minimum cost estimate.
- The cost functions used in query optimization are estimates and not exact cost functions.
- The cost of an operation is heavily dependent on its selectivity, that is the proportion of select operation(s) that forms the output.
- In general, the different algorithms are suitable for low or high-selectivity queries.
- In order for the query optimizer to choose a suitable algorithm for an operation an estimate of the cost of executing that algorithm must be provided.
- The cost of an algorithm depends on the cardinality of its input.
- To estimate the cost of different query execution strategies, the query tree is viewed as containing a series of basic operations which are linked in order to perform the query.
- It is also important to know the expected cardinality of an operation’s output because this forms the input to the next operation.
Cost Component of Query Evaluation
The following are the cost components of the execution of a query-
- Access cost to secondary storage
It could be the price of reading, searching, or writing a block of information initially on secondary storage, specifically on the disk. The cost of looking at records within the file also depends on the kind of access structure the file is equipped with.
- Storage cost
It is also the price of the storage of any intermediate files(files that are the result of processing the input but aren’t exactly the outcome) created by the execution strategy used for the query.
- Computation cost
This is the cost of performing the memory operations that are available on the record within the data buffers. Operations like searching for records, merging records, or sorting records. This can also be called the CPU cost.
- Memory usage cost
The cost of memory usage can be calculated simply by using the number of memory buffers needed to execute the query.
- Communication cost
This is the expense of transmitting or communicating the query results from one location to the next. This also covers the cost of transferring the results and table to different sites in evaluating the query.
Important of Access Cost
- Out of the above five cost components, the most important is the secondary storage access cost.
- The focus on cost minimization depends on the size and type of database applications.
- For example in a smaller database, the focus is on minimizing computing costs because most of the data in the files involved in the query can be completely stored in the main memory.
- For a large database, the main emphasis is on minimizing the access cost to a secondary device.
- For distributed databases, the communication cost is minimized because many sites are involved in the data transfer.
- To estimate the cost of various execution strategies, we must keep track of any information that is needed for the cost function.
- This information may be stored in the database catalog, where it is accessed by the query optimizer.
Measures of Query Cost in DBMS
The Query Cost is a cost that the enhancer considers for how long your query will take (comparative with the absolute clump time). The analyzer then tries to select the most effective query strategy by looking at your query and your data, testing various execution plans, and deciding on the least expensive of them.
The measurement of cost for queries in DBMS can be achieved through the creation of a framework that allows for a variety of designs to be made to answer an inquiry. It’s usually done by contrasting each possible arrangement with the estimated cost. To calculate how much the deal will cost net an account, each of the activities within an agreement must be calculated in a predetermined and consolidated amount to calculate the net estimated cost of the question assessment plan.
Examples: We utilize the square exchanges, the block that comes from the disk, and the amount of disk searches to evaluate the cost of a query assessment program. If the disk subsystem has the standard of tT second to transfer an information square and has a standard block-access time (disk lookup time, in addition to the idleness of rotation) that is tS second, then, at this point, any activity that can move, b blocks and executes S seeks would take the sum of b* tT + *tS seconds. The benefits of tT and S should be considered for the utilization of the disk framework. However, the typical characteristics of the top-end disk of today are the tS value of 4 milliseconds and tT equal to 0.1 milliseconds. This would be based on a 4-kilobyte block size and a speed of exchange of 40 megabytes every second.
tT – time to transfer one block
tS – time for one to seek
Cost for b block transfers plus S seeks
b * tT + S * tS
The expense of a query assessment plan is determined by different assets that follow as:
- The number of disk accesses.
- Time of Execution taken by the CPU to execute a query.
- The involved Communication costs in either distributed or parallel database systems.
To gauge the expense of a query assessment plan, utilizing the number of blocks moved from the disk and the quantity of disk seeks for. For the most part, for assessing the expense, we must consider the most pessimistic scenario that could occur, which is the worst-case scenario. The clients accept that, at first, the information is perused from the disk, as it were. However, there should be an opportunity for the data is now present in the main memory, also called the main memory. Notwithstanding, the clients generally disregard this impact, and because of this, the actual expense of execution comes out lesser than the assessed esteem.
The reaction time, i.e., the expected time to execute the arrangement could be utilized to assess the expense of the query assessment plan. Yet, because of the accompanying reasons, it becomes easier to work out the reaction time after executing the query assessment plan. As soon as the query starts the execution process, the reaction time becomes reliant upon the substance put away in support. In any case, this data is hard to recover when the query is in its upgraded mode, or it isn’t accessible.
Whenever there is a framework with various disks available, the time of reaction relies upon a cross-examination that is ready “what is the way that accesses gets to be circulated among the disk available?”. It is hard to gauge without itemized information on the format presented over the disk. Thus, rather than limiting the reaction time for any query assessment plan, the analyzers observe that it is better to lessen the wholesome asset utilization of the query plan. Along with these lines, to appraise the expense of a query assessment plan, limiting the assets utilized for getting to the disk or utilization of the additional assets.
If you find anything incorrect in the above-discussed topic and have further questions, please comment below.