When developers start using a new programming language they must not only learn the syntax, but also become skilled at the performance best practices for that language. Usage of SQL requires the same knowledge and skill in order to be effective.
Unfortunately, too many IBM i developers tend to stop their learning after gaining a knowledge of the SQL syntax. This first level of knowledge enables them to create SQL requests that sort, aggregate, or join data as the application requires; however, many developers become frustrated when the SQL statement or application doesn't meet their performance expectations. In this situation, the developers revert to DB2's native record-level access interface because they know how to apply the performance best practices for native access. In doing so, these developers overlook the fact that SQL performance best practices demand the same time and study they once invested in traditional record-level access for their RPG and COBOL programs.
If you count yourself among this group, don't fear. This article will deepen your SQL performance knowledge by focusing on SQL Open Data Paths (ODPs). An ODP is an internal structure used by the DB2 for i engine during the runtime execution of a SQL statement. DB2 for i only uses the ODP structure for the execution of SQL Data Manipulation Language (DML) statements: Select, Update, Delete, Merge, and Insert.
ODP Performance Magic
The runtime behavior of SQL ODPs can have a huge impact on the performance of transactional applications. The impact is large because transactional applications typically execute the same set of SQL statements hundreds of times within a job. In this article, the term "job" is synonymous with a database connection or thread. The repeated executions of the same SQL statement within a job magnify the effect of ODPs, either positively or negatively.
To understand the magical performance attributes of an ODP, let's first step back to learn about the ODP creation process. As shown in Figure 1, the runtime execution of a SQL statement consists of three phases: Optimize, Open, Run.
As you might expect, the Optimize phase is when the DB2 Query Optimizer takes action. The Query Optimizer analyzes the SQL statement and determines the set of data access methods and algorithms that can produce the best performance. The Query Optimizer focuses a lot of its analysis on finding the best index to use for access and determining the ideal join order and technique. The output of the DB2 for i Query Optimizer is stored in a structure called an Access Plan (or Query Plan). Contrary to popular belief, the Query Optimizer does not actually run the query; the Optimizer just produces the plan which contains the set of instructions (e.g., join order) for the DB2 engine to follow. The Optimizer believes this set of instructions can deliver the best performance.
With the Access Plan in place, the SQL request proceeds to the Open phase. The Open phase is when the creation of a SQL ODP can occur. As shown in Figure 2, the creation of a SQL ODP is synonymous with the term "Full Open" because DB2 has to execute the full code path of the Open phase in order for a SQL request to proceed to the Run phase. Running through the full code path results in suboptimal performance for the SQL statement. Thus, one of the keys to good SQL performance is minimizing ODP creation (i.e., Full Opens) when you have SQL statements running multiple times in a job.
If you're well versed in traditional record-level access programming, this reduction of open operations should not be a new concept. Sharing an ODP is considered one of the performance best practices with traditional record-level access. With record-level access, the programmer is responsible for controlling the sharing of the ODP. In contrast, the sharing or reuse of SQL ODPs is controlled by the DB2 engine; programmers can only influence the sharing of SQL ODPs. Later in this article, you'll learn about the programmer actions that lead to good and bad ODP behavior.
The ODP structure is built based on the instructions contained in the Access Plan. You can think of a SQL ODP as a pipe for data access that DB2 builds from the blueprint stored in the Access Plan. This ODP data access pipe is then used to perform the I/O specified in your SQL request.
Third Time's the Charm
A SQL ODP is always created the first time a SQL statement is run in a job. No reuse of the ODP can occur before the ODP is created. As a result, the first execution of a SQL statement is always going to be slower than later executions of that same SQL statement. DB2 deletes the ODP (i.e., performs a Hard Close) after the first execution of the SQL statement completes. The ODP is deleted after the first execution of the statement because DB2 is not sure that SQL request will be executed again in the job. An ODP consumes memory within the job that uses it, so DB2 doesn't want to unnecessarily consume memory by leaving around ODPs for SQL statements that are never executed again.
The second execution of the SQL statement in a job now has to create an ODP because the ODP was deleted after the first execution of the statement. The conclusion of the second execution is when the ODP magic occurs to improve performance on subsequent runs of the SQL request. With the second execution complete, the DB2 engine now has an indication that this SQL statement is going to be run multiple times in the job. As a result, DB2 decides not to delete the ODP so that it can be reused in later executions of that SQL statement. Leaving an ODP around for reuse is also known as a Pseudo-Close. The SQL statement is logically closed from an application perspective, but the associated ODP structure is left in a state that allows DB2 to reuse it. The reuse of the ODP structure delivers faster performance on the third execution and all future executions of the SQL request.
With the ODP available for the third execution of that SQL statement, DB2 can now dramatically improve performance by reusing the ODP instead of creating one. Reusing an existing ODP consumes 10 to 20 times less CPU than creating a new ODP. With this savings in CPU, you should now have a clear understanding of the performance benefits of good ODP reuse behavior! Eliminating the Full Open processing can deliver dramatic performance gains.
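You can observe this effect from any SQL interface simply by timing repeated executions of the same statement. The following sketch uses JDBC with the IBM Toolbox for Java driver; the server name, credentials, and the APPLIB.ORDERS table are placeholders for illustration only.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class OdpTiming {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details for a DB2 for i server.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:as400://myibmi", "myuser", "mypassword");
                 PreparedStatement ps = conn.prepareStatement(
                     "SELECT ORDER_ID, TOTAL FROM APPLIB.ORDERS WHERE CUST_ID = ?")) {
                for (int run = 1; run <= 5; run++) {
                    long start = System.nanoTime();
                    ps.setInt(1, 12345);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) { /* consume the rows */ }
                    }
                    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                    // Expect runs 1 and 2 to pay the Full Open (ODP creation) cost;
                    // from run 3 onward the ODP is reused and the time should drop.
                    System.out.println("Execution " + run + ": " + elapsedMs + " ms");
                }
            }
        }
    }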
Figure 3 shows an example of the SQL ODP reuse algorithm in action. A database connection (or job) is initiated to run an application that contains SQL statements. One minute after Connection #1 starts, the specified Select statement is run. Because it's the first execution of the Select statement in Connection #1, the ODP must be created.
DB2 uses the instructions stored in the statement's access plan to build the ODP and then uses the ODP to run the Select statement. After the first execution completes, DB2 deletes the ODP because it's not known at this point if the Select statement will be run again in Connection #1.
A minute later, the user launches the same application function, which results in the same Select statement being executed. There's no ODP around for the Select statement to use, so an ODP is created as part of the second execution of the Select statement. Note that after the second execution of the Select statement, the ODP isn't deleted. DB2 performs a Pseudo-Close of the statement's ODP instead of a Hard Close.
Next, a second database connection is started (Connection #2), and the user selects the same application function. Even though the same Select statement is run in Connection #2 and an ODP exists for that statement, DB2 for i has to create an ODP because it's the first execution of the Select statement in Connection #2. A SQL ODP can only be reused within a database connection or job. But the Access Plan can be shared across jobs, as shown in Figure 3. The Access Plan structure is only the blueprint of methods to use, so it's safe to reuse that set of instructions across multiple jobs.
A couple of minutes later, the same Select statement is run in Connection #1. As you might expect, DB2 reuses the ODP for the third execution of the statement instead of creating a new ODP. Future executions of this same Select statement in Connection #1 result in DB2 reusing the same ODP to deliver fast performance. The ODP for this SQL request remains in memory until the database connection or job ends. If the same Select statement is executed multiple times in Connection #2, then ODP reuse occurs concurrently within the second connection.
Learning from the Past
You might be wondering how DB2 for i knows if it's the first, second, or third time a SQL statement has been run in a job. Each time a SQL statement is run in a connection, the DB2 engine keeps track of the number of executions of that SQL statement. For dynamic SQL requests (e.g., CLI or ODBC), these counters are stored in the working memory for your job, meaning the counters disappear when the job ends. In contrast, the counters for static SQL and extended dynamic SQL requests are stored in permanent objects: a program, module, or package object.
The counters stored in the permanent objects allow DB2 to learn how frequently a SQL statement has been run in the past. When the execution history for a SQL statement shows that it's a frequently run request, the DB2 engine overrides the default ODP algorithm and leaves the SQL ODP around for reuse after the first execution of the SQL statement. Thus, if you analyze the ODP behavior for the static SQL statements you have embedded within high-level language applications, you will most likely find that the ODPs for those embedded SQL requests are being reused after the first execution instead of the second execution.
Unveiling the ODP Magic
The best tool for analyzing the runtime behavior of SQL ODPs is the SQL Performance Monitor (also known as the Database Monitor), which is shipped with the IBM i OS. The IBM System i Navigator client provides a graphical interface for collecting and analyzing database monitor data. Database monitor data can also be collected using the Start Database Monitor (STRDBMON) and End Database Monitor (ENDDBMON) system commands. The SQL Performance Monitor can be used to collect detailed or summary-level monitor data. Detailed monitor data is required for the analysis of SQL ODPs. You should also be aware that IBM considers the summary-level database monitor to be non-strategic and is no longer enhancing this tool.
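If you'd rather script the collection than click through a GUI, the monitor commands can be issued from within a SQL session by calling the classic QSYS.QCMDEXC program. Here's a hedged JDBC sketch; the monitor output file DBMONLIB/MYMON and the connection details are illustrative.

    import java.math.BigDecimal;
    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class MonitorControl {
        // Runs a CL command through the classic QSYS.QCMDEXC program, which
        // takes the command string and its length (a DECIMAL(15,5) value).
        static void runCommand(Connection conn, String cmd) throws Exception {
            try (CallableStatement cs = conn.prepareCall("CALL QSYS.QCMDEXC(?, ?)")) {
                cs.setString(1, cmd);
                cs.setBigDecimal(2, new BigDecimal(cmd.length()));
                cs.execute();
            }
        }

        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:as400://myibmi", "myuser", "mypassword")) {
                // Collect detailed monitor data for this job only (JOB(*)),
                // writing the records to an illustrative output file.
                runCommand(conn, "STRDBMON OUTFILE(DBMONLIB/MYMON) JOB(*) TYPE(*DETAIL)");

                // ... run the SQL workload you want to analyze here ...

                runCommand(conn, "ENDDBMON JOB(*)");
            }
        }
    }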
While the traditional debug messages generated in the job by DB2 include feedback on the behavior of SQL ODPs, that tool is not a wise choice. First, it's very difficult to associate the ODP messages with the SQL statements they describe. Second, the debug message support is also considered non-strategic and is no longer being enhanced.
Once a detailed SQL Performance Monitor collection is in place, the System i Navigator interface can verify what happened in terms of SQL ODP reuse. The first step is looking at the ODP behavior at a summary level. Summarizing the monitor data is done by right-clicking the monitor collection and selecting the Analyze task, as shown in Figure 4. After selecting the Analyze task, the status window in the bottom right-hand corner lets you know that it could take some time to summarize the monitor data. The length of time depends on how long the SQL Performance Monitor was running and how many jobs were monitored. Best practice is to run the detailed monitor for short periods of time on a single job. Following this advice ensures that the analysis process runs quickly and that you don't consume a large amount of disk space on your system with large monitor collection files.
Once the monitor data is summarized, DB2 displays the dashboard summary window, shown in Figure 5, and stores the summarized data into a permanent object. The storing of the data into a permanent object enables you to perform your analysis over a number of days without waiting for the data to be summarized each time you want to analyze the monitor data.
Note that the dashboard summary view of the monitor data includes a count of the number of Full Opens (ODP creation) and Pseudo-Opens (ODP reuse). Ideally, you want to see a Pseudo-Open count that's substantially higher than the Full Open count. This type of ratio shows that most executions of SQL statements are reusing ODPs instead of creating them. If the SQL Monitor collected data from several jobs, then the Full Open count can be slightly misleading. As shown in Figure 3, it's normal behavior for DB2 to create an ODP the first time a SQL statement is run in a job even when that same SQL statement has had an ODP already created in another job. For that reason, you can simplify your ODP analysis by collecting monitor data for a single job or by manually subsetting the monitor data.
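For manual subsetting, you can query the monitor file directly with SQL. The sketch below assumes the documented database monitor file format (QQRID record type, QQC21 operation code, QQJNUM job number, QQ1000 statement text, QQSTIM start timestamp); verify these column names against the documentation for your IBM i release before relying on them.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class MonitorQuery {
        public static void main(String[] args) throws Exception {
            String sql =
                "SELECT QQJOB, QQJNUM, QQC21, QQ1000 " +
                "FROM DBMONLIB.MYMON " +          // illustrative monitor file
                "WHERE QQRID = 1000 " +           // main SQL statement records
                "  AND QQC21 IN ('OP', 'CL') " +  // open and close operations
                "  AND QQJNUM = ? " +             // subset to a single job number
                "ORDER BY QQSTIM";                // order by statement start time
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:as400://myibmi", "myuser", "mypassword");
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "123456");        // hypothetical job number
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("%s/%s %s %s%n",
                            rs.getString("QQJOB"), rs.getString("QQJNUM"),
                            rs.getString("QQC21"), rs.getString("QQ1000"));
                    }
                }
            }
        }
    }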
You can identify those SQL statements that caused a Full Open by selecting the Full Opens category in the dashboard window and then clicking the Statements button on the right. This generates a report window similar to the one shown in Figure 6. This Full Open report contains a line for each individual execution of a SQL statement that ran during the monitor collection period. On the left of this window, there are classification columns that highlight the ODP behavior for that execution of the SQL statement: Full Open, Pseudo-Open, Full Close or Pseudo-Close. A value of 1 is placed in the appropriate column to document the ODP action that took place.
Examination of the sample report shows a value of 1 in the Full Opens column for the first two executions of the Update statement, indicating that both executions required the creation of a SQL ODP. The Hard Close reason of "First full close" is just another way of saying the ODP is being deleted because it's the first execution of the statement in the job (i.e., normal behavior). The Pseudo-Closes column is set to 1 for the Close statement that follows the second execution of the Update statement. This is your indicator that the ODP for the Update statement has not been deleted and is available for reuse if the Update statement is executed again within the job.
Some Hard Close reason values returned in the Full Opens report are easier to understand than others. You may need to enlist the services of IBM Support or the DB2 Center of Excellence team within IBM Lab Services (see the resources sidebar) to figure out the root cause of some Hard Closes. This Full Opens report also includes a Cursor Hard Close Subcode column that's not shown; sometimes this column value is required by IBM to determine the exact cause of the more obscure Hard Close events.
Dispelling Some Myths
Now that you understand the performance benefits of SQL ODPs, let's examine some of the myths that have been perpetuated by IBM i developers without a deep knowledge of the DB2 for i SQL engine.
Myth: SQL has a file/memory leak issue. After calling a program that runs SQL statements, some developers use the Display Job (DSPJOB) command to look at the list of open files for that job. When presented with a long list of open files (a.k.a., tables), developers assume they did something wrong in their program or that the IBM i SQL support has a leak that's causing files to remain open after the program call completes. In reality, a long list of open files indicates that the SQL ODPs associated with your application have been left around so that they can be reused.
Myth: Reusable SQL ODPs cause lock conflicts. Once a developer understands that the DB2 for i reuse of SQL ODPs is good for performance, he or she wonders if the SQL ODPs will cause lock conflicts with other jobs running on the system. For example, a reorganize file operation is scheduled to run on the Orders table at midnight, and there are several QZDASOINIT host server jobs that have reusable SQL ODPs that reference the Orders table. The assumption is that the Reorganize Physical File Member (RGZPFM) command won't be able to obtain the exclusive lock it requires while there are SQL ODPs around.
While SQL ODPs do hold a lock on the table(s) that they reference, this is a "soft" lock that shouldn't conflict with other work on the system. Furthermore, all system commands that require an exclusive lock, such as RGZPFM, have been enhanced to automatically release the locks associated with SQL ODPs so the commands can successfully complete their work.
If you have a situation where a reusable SQL ODP appears to be preventing other work from completing, then you should report that issue to IBM Support. Before contacting IBM Support, you should verify that your system has the latest Database Group PTF installed.
Myth: SQL performance is slow. You sometimes hear developers say that SQL is slower than traditional record-level access. Some developers even state their claim is based on performance tests they've run. A good number of these performance tests turn out this way because the developer is measuring the performance of the first two executions of a SQL statement, which include the ODP creation overhead. Instead, developers should measure the steady-state execution time of the SQL statement when it's reusing the ODP.
Best Practices for Maximizing ODP Reuse
Now that you have a detailed understanding of SQL ODPs, let's review some of the application design and programming best practices that maximize SQL ODP reuse. And just so you know, the DB2 for i SQL Performance Workshop (see link below) is the best and only place that provides a complete examination of the SQL ODP reuse best practices.
Keep jobs and connections active. As mentioned above, SQL ODPs reside in the working memory for a job. This means that when the job or database connection ends, all of the ODPs are deleted. Thus, a poor application design would be a program that, each time it's run, starts a connection, executes a set of SQL statements, and then ends the connection. This type of design won't deliver good SQL performance if the same set of SQL statements is executed each time the program is run, because the associated SQL ODPs are deleted each time the connection ends. Reusing the same job or connection provides more opportunities for the associated SQL ODPs to be reused as well. Connection pooling is one technique to accomplish this.
IBM encourages this type of application design for programs using embedded SQL by providing default values of *ENDJOB and *ENDACTGRP for the Close SQL Cursor (CLSSQLCSR) parameter on the SQL precompiler commands (e.g., CRTSQLRPGI). These default values cause reusable SQL ODPs to remain until the job ends instead of being deleted after the program call ends.
As previously mentioned, using connection pools is another way to keep jobs around longer on IBM i database servers. SQL interfaces such as ADO.NET and JDBC require a database connection object in order to run SQL statements against database servers such as DB2 for i. The connection pooling capability was provided to enable the reuse of the connection objects in the application. When an application is done with the connection, it's returned to the pool of connections instead of the connection object being deleted. The associated IBM i job remains on the server while the connection object remains in the pool. This behavior enables the reusable SQL ODPs to remain active within the associated IBM i job while the pooled connection waits for its next usage.
When a pooled connection is reused by the application, DB2 for i does have to verify that the new user of the pooled connection matches the user profile that originally executed the SQL statements associated with the ODPs. If these user profile values do not match, then DB2 for i must delete the existing ODPs because the new user may not have the necessary authority to the objects referenced by the SQL ODP. Due to this requirement, most applications that utilize connection pooling use the same user profile for all the database connections. This technique does maximize SQL ODP reuse, but you need to verify the impact of using a shared user profile against your security policy.
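Here's a minimal sketch of the borrow-and-return pattern, assuming a pooling DataSource implementation configured elsewhere (an application server pool or a library such as the IBM Toolbox's AS400JDBCConnectionPoolDataSource); the table and column names are illustrative.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import javax.sql.DataSource;

    public class PooledWorker {
        // Assumed to be a pooling DataSource configured elsewhere, sharing one
        // user profile across all connections so that DB2 for i never has to
        // discard ODPs because of a user-profile mismatch.
        private final DataSource pool;

        public PooledWorker(DataSource pool) {
            this.pool = pool;
        }

        public void recordPayment(int orderId, double amount) throws Exception {
            // Borrow a connection from the pool; close() returns it to the pool
            // instead of ending the IBM i job, so the job's reusable ODPs survive.
            try (Connection conn = pool.getConnection();
                 PreparedStatement ps = conn.prepareStatement(
                     "UPDATE APPLIB.ORDERS SET PAID_AMT = ? WHERE ORDER_ID = ?")) {
                ps.setDouble(1, amount);
                ps.setInt(2, orderId);
                ps.executeUpdate();
            }
        }
    }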
Dynamic SQL: Prepare once, execute many. The execution of Dynamic SQL requests involves the dynamic construction of a SQL statement within a character string. The SQL statement contained in the character string is then transformed into an executable format with a Prepare statement. The actual running of the Dynamic SQL request is done using the Execute statement.
After a Dynamic SQL statement has been prepared, an application can run the statement over and over again using the Execute statement. There's no requirement to prepare a statement each time it's executed; repeatedly preparing the same statement within a job is really a wasted call to the database engine. This wasteful behavior can also have a negative impact on the reuse of SQL ODPs.
Logically, you can equate the Prepare statement with the creation of an ODP and the Execute statement with the reuse of an ODP. Thus, an application using dynamic SQL should Prepare a SQL statement once and Execute the SQL statement many times to guarantee the best reuse of SQL ODPs. DB2 for i does perform some automatic caching of SQL ODPs to try and help poorly designed applications that perform repeated prepare operations for the same SQL request; however, these caching algorithms can change at any time and are less efficient than a properly designed application. That's why a "Prepare once, execute many" design is the best solution for maximizing ODP reuse with Dynamic SQL applications.
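A minimal JDBC sketch of the "Prepare once, execute many" pattern follows; the APPLIB.EMPLOYEE table and its columns are illustrative. The statement is prepared a single time, and only the parameter values change across executions.

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class PrepareOnce {
        // Prepare the statement one time and execute it repeatedly with new
        // parameter values; this is the design that lets DB2 for i reuse the
        // statement's ODP on every execution after the ODP is created.
        static void applyRaises(Connection conn, int[] empIds, double pct)
                throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE APPLIB.EMPLOYEE SET SALARY = SALARY * ? WHERE EMP_ID = ?")) {
                for (int empId : empIds) {
                    ps.setDouble(1, 1.0 + pct);  // only the parameter values change
                    ps.setInt(2, empId);
                    ps.executeUpdate();          // execute many; prepare never repeated
                }
            }
        }
    }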
There are some Dynamic SQL interfaces—such as the Execute Immediate statement, the SQLExecDirect function, and the JDBC Statement object—that perform both the Prepare and Execute operations on a single request. These interfaces don't allow a SQL statement to be prepared once and executed many times. Based on this, the DB2 for i ODP reuse algorithm assumes that these types of interfaces are only being used for SQL requests executed once within a connection. If you're using these interfaces for SQL requests executed multiple times within a job, there's a high probability your ODP reuse and performance are suboptimal.
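To make the contrast concrete, here's the same update written the one-shot way; this is a sketch of the anti-pattern, not a recommendation.

    import java.sql.Connection;
    import java.sql.Statement;

    public class PrepareEveryTime {
        // Anti-pattern: java.sql.Statement combines the prepare and execute on
        // every call, so DB2 for i treats each request as a one-time statement
        // and ODP reuse is unlikely. Compare with the PreparedStatement version.
        static void applyRaises(Connection conn, int[] empIds, double pct)
                throws Exception {
            for (int empId : empIds) {
                String sql = "UPDATE APPLIB.EMPLOYEE SET SALARY = SALARY * "
                        + (1.0 + pct) + " WHERE EMP_ID = " + empId;
                try (Statement st = conn.createStatement()) {
                    st.executeUpdate(sql);  // full prepare + execute every iteration
                }
            }
        }
    }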
Minimize environment changes between SQL executions. A change to the runtime application environment is one of the more common causes of ODP deletions. Say a job has executed the same SQL statement multiple times, and it's perfectly reusing the associated ODP. Next thing you know, an application or environment setting such as isolation level (a.k.a., commitment control level) changes and DB2 can no longer reuse the existing ODP.
If there are environment or application settings that are changing between executions of a SQL statement, those changes should be reviewed to determine if the changes are intended and really necessary. My experience is that many times the changes are accidental instead of intentional. The Hard Close Reason column in the SQL Performance Monitor report in Figure 6 does a pretty good job of highlighting these change events.
If you're using unqualified table references in your SQL statements (e.g., UPDATE Tab2 SET c3=100 WHERE c1=5) along with the system naming convention (i.e., *SYS), then changes to a job's library list between executions will cause the associated SQL ODP to be deleted. A SQL ODP is built to perform I/O operations on a specific instance of a table. For example, if the first execution of the SQL statement updated the Tab2 table in library L1, then DB2 cannot reuse that same SQL ODP to update a different version of Tab2 in library L2. The Override Database File (OVRDBF) command is another way to change which version of an object DB2 operates on, and it also forces SQL ODPs to be deleted.
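The difference is easy to see in the statement text itself; the snippet below reuses the Tab2 example from above (the slash is the *SYS naming qualifier).

    public class NamingExample {
        // Unqualified reference: with *SYS naming, the Tab2 this ODP operates on
        // is resolved through the job's library list, so a library list change
        // between executions forces DB2 to hard close the ODP.
        static final String UNQUALIFIED = "UPDATE Tab2 SET c3=100 WHERE c1=5";

        // Qualified reference: the ODP is always built over Tab2 in library L1,
        // so library list changes between executions no longer invalidate it.
        static final String QUALIFIED = "UPDATE L1/Tab2 SET c3=100 WHERE c1=5";
    }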
A change in isolation level can also cause the ODP for a SQL statement to be deleted between executions. This deletion most frequently happens when there's a dramatic change in the isolation level, such as running with No Commit on one execution of the statement and using a high isolation level such as Read Stability on the next execution, or vice versa.
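A sketch of this pitfall from JDBC follows; note that the JDBC isolation constants map only approximately onto the DB2 for i isolation names (REPEATABLE_READ roughly corresponds to Read Stability), and the table name is illustrative.

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class IsolationToggle {
        static void runTwice(Connection conn) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE APPLIB.ORDERS SET PAID_AMT = 0 WHERE ORDER_ID = 5")) {
                conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
                ps.executeUpdate();
                // A dramatic isolation change between executions of the same
                // statement: DB2 cannot reuse the existing ODP and must delete
                // and rebuild it. Keeping the level constant preserves reuse.
                conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
                ps.executeUpdate();
            }
        }
    }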
Taking SQL Performance to the Next Realm
Hopefully, this article has cleared away the fog and mistruths that were hampering your ability to create high-performing SQL applications. With this new knowledge of SQL ODP best practices, you should have no problem maximizing SQL performance in your transactional applications.
Kent Milligan is a senior DB2 for i consultant on IBM’s ISV Enablement team for IBM i. Kent spent the first eight years of his IBM career as a member of the DB2 development group in Rochester. He speaks and writes regularly about relational database topics.
Find Out More
For more information, visit these websites:
DB2 for i SQL Performance Workshop:
IBM DB2 Center of Excellence Lab Services team:
IBM Redbook "On Demand SQL Performance Analysis Simplified for DB2 for i":