1	Planning and Tuning Queries for Legacy Data Presented by: Howard F. Arner, Jr. Client Server Development, Inc hfarner@SQLThing.com
2	Overview SQL is the language of the web Most AS/400 system are developed around record level access Making reliable and scalable SQL statements against these systems requires a consistent design methodology This presentation will illuminate a development methodology that delivers consistent performance and scalable queries in your distributed applications
3	Methodology Overview Most programs…. We used to say 95% of our programs were getting data out of a file and putting it on the screen or vice versa With the web, this is still true Just more garbage in the middle! Data access plays a central role to this Presentation formatting is a processor intensive operation on the web server Data access is an expensive operation because the web server must communicate with a separate system
4	Methodology Overview By ensuring consistent data access performance, we eliminate the data sources as a bottleneck and ensure scalability If the performance of the data access is known, performance problems must be in another portion of the application By taking a data centric approach to development, we cure performance problems before they can appear thus easing the QA process
5	Steps Required (Simplified) Identify data inputs to the target program Gather information about the target physical files Construct SQL statement to gather data Verify statement performance/access plan Benchmark statement Save/Paste to development environment
6	Identifying Data Inputs Ensure that you understand which elements need to come from the AS/400 Get a list of physical files where each element appears Generate a columns report for the physical files so that you understand the data types for the target elements Identify transformations of data
7	A Columns Query The following query can be used to retrieve the columns in a table. Note that it uses parameter markers SELECT TRIM(COLUMN_NAME) AS COLNAME, ordinal_position AS OFF, TRIM(data_Type) AS DATA_TYPE, LENGTH AS LEN, numeric_scale AS SCLE, numeric_precision AS PREC, TRIM(ifnull(column_heading,'')) AS COL_HEADING, TRIM(ifnull(column_text,'')) AS COL_TEXT, is_nullable AS NULLS, has_default AS HD, system_column_name AS SYSCOL FROM qsys2.syscolumns WHERE table_schema = ? AND TABLE_NAME=? ORDER BY OFF
8	Output Sample
9	Transformations Sometimes, the data on the AS/400 is not the data you want on the client Example, Java does not like Decimal as they are an immutable type. If the target needs to manipulate the data, plan on doing a cast in the select statement to a better numeric type Decimal(3,0) might map to smallint Decimal(12,2) might map to double Decimal(6,0) might map to integer
10	Transformations Chars waste space Example, you have a CHAR(40) on the AS/400 but most of the entries are 20 characters or less The AS/400 sends CHAR(40) as 40 characters, padded at the end with blanks Minimize the amount of transmitted data by trimming character fields before transmission
11	Transformations Dates may be in a legacy format Example: a system stores an invoice date as DECIMAL(2,0) Century, DECIMAL(2,0) Year, DECIMAL(2,0) Month and DECIMAL(2,0) Day. If the client intends to manipulate the value as a date, do the transformation on the 400 prior to sending the data to the client Some clients can not understand the full range of an AS/400 date, fix up these dates before sending to the client
12	Transformations Identify any user defined functions you may wish to create to transform data If you constantly need to send legacy dates back as SQL Dates, consider writing a function to do the transformation on AS/400 so that you have re-usable code If you need to do special calculations, like percent to total or such, write a UDF to do the transformation with full error handling to ease the code on the client
13	Example UDF to convert dates CREATE FUNCTION SQLTHING.MINCRON_DATE (CEN DECIMAL(2,0), YR DECIMAL(2,0), MO DECIMAL(2,0), DY DECIMAL(2,0)) RETURNS DATE LANGUAGE SQL BEGIN DECLARE F_OUTPUT DATE; DECLARE F_TEST INTEGER; DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SET F_TEST=1; SET F_TEST=0; SET F_OUTPUT = DATE( TRIM(CHAR(MO))\|\|'/'\|\| TRIM(CHAR(DY))\|\|'/'\|\| TRIM(CHAR((CEN*100)+YR))); IF F_TEST=0 THEN RETURN F_OUTPUT; ELSE RETURN NULL; END IF; END
14	Gathering Data For each physical file, generate a list of logicals that can be used in the resolution of the query Look at the selectivity of the logical file to determine it's suitability for the query at hand Understand the implications of join order and the types of joins being used Understand the null status of the fields in any grouping or join statements
15	Generating a List of Logicals SELECT trim(DBFLDP) AS LIBRARY, trim(DBFFDP) AS FILENAME, trim(DBXTXT) AS SHORT_DESC, DBKPOS AS OFF, trim(DBKFLD) AS KEY_FIELD, DBITYP AS DATATYPE, DBITXT AS FIELD_DESC, DBICLN AS CLEN, DBINLN AS NLEN, DBINSC AS NSCLE, DBXREL AS REL, DBkORD AS DIR, dbxatr AS TYPE, (CASE dbxunq WHEN 'U' THEN 'Y' ELSE 'N' END) AS UNIQUE From QSYS.QADBLDEP LEFT JOIN QSYS.QADBXLFI ON (dbffdp=dbxfil AND dbfldp=dbxlib) LEFT JOIN QSYS.QADBKATR ON (DBFFDP=DBKFIL AND DBFLDP = DBKLIB AND DBFRDP = DBKFMT) LEFT JOIN QSYS.QADBIATR ON (DBFFDP=DBIFIL AND DBFLDP=DBILIB AND DBFRDP=DBIFMT AND DBKFLD=DBIFLD) WHERE dbflib=? AND dbffil=? AND DBXATR IN ('LF','IX') AND dbxrel='Y' ORDER BY FILENAME, OFF
16	Sample Output
17	Generating a list of logicals On my web site, download the index listing spreadsheet This Excel spreadsheet has a macro that you can use to gather a list of logicals that can be used in the resolution of queries URL: www.sqlthing.com/SQLFAQ.htm
18	How the optimizer uses logicals Optimizer looks at statement to determine columns accessed, conditions for retrieval, ordering and joins The more tables accessed in a statement, the more time this takes The more logical files over each physical file adds time to this operation
19	Statement Optimization For each logical file Optimizer looks at index selectivity Determines cost of access path usage If cost < cost of previous path then pop path to top of stack See if time is expired or check next logical
20	Optimization Problems Lots of logicals means lots of decisions Decisions take time Indexes typically evaluated in creation descending order Tables with a large number of indexes can cause optimizer to time out, this can cause the AS/400 to not see the most appropriate access path
21	What Logicals can be Used? SQL can use most logicals where the file is marked as relational However, SQL will rarely use logicals with select omit criteria Join logicals are almost always a bad idea Always initially select against the physical, do not name the logical in the from clause
22	Index Selectivity The selectivity of an index controls whether the optimizer will think using that index is a good idea Selectivity can be found by looking at the key loading information from the DSPFD command Use my selectivity report, SQLThing Enterprise Edition, to understand how selective an index is
23	Selectivity Report
24	Getting Selectivity Manually You can also get selectivity information with DSPFD command against a logical file or SQL Index Selectivity information comes from the Number of unique partial key values field(s)
25	Example DSPFD Info
26	What selectivity means Files have records. Indexes have Keys. The selectivity of a key indicates how many records will be returned, (on average), for one instance of a key Selectivity can be skewed, for example a status field might have only four entries, but a I status only occurs in one out of 10,000 records Understanding the selectivity of an index helps in planning queries
27	Real World Trucking company with 60,000,000 records in a physical file. 60 Logical files All logicals begin with company number 1 Company number All queries did not include where company = clause No logicals chosen to implement queries because left most key of logicals was not used in restriction
28	Query Development Create your query based on the fields and transformations required Use OPS Navigator or my freeware SQLThing tool. Create the query as it will be used in your application. If you are using replacement parms, then you must do the replacement parms in the query
29	Test for Performance Run statement with debug on to see optimizer messages Check prepare, execute and fetch times Verify that AS/400 is picking correct path Check for data type differences/access path rejection messages
30	Example Query Given, a file called WEBTEMP2 which contains hits against my web site SELECT * FROM SQLBOOK.WEBTEMP2 WHERE REQTS BETWEEN '1999-10-01-00.00.00.000000' AND '1999-10-01-23.59.59.999999'
31	Example Job Log
32	Example Details
33	Create suggested index The job log indicates an access path would help the query CREATE INDEX SQLBOOK.WTREQTS ON SQLBOOK.WEBTEMP2 (REQTS ASC) Next, we run the query again and look at the log
34	After Creation of Index
35	What to look for The important thing during validation is to assure that the query is using a good access path Look at the throughput of the query and ensure that the Prepare, Execute and Fetch phases are not taking a large period of time Look at the access paths considered, CPI432C message, to determine why an access path was selected/rejected The following messages bear investigation
36	Bad Messages SQL7919 - Data conversion required on FETCH or embedded SELECT SQL7939 - Data conversion required on INSERT or UPDATE CPI432E - Selection columns mapped to different attributes These messages indicate that you are comparing apples to oranges. This will preclude the use of an index
37	More Bad Messages CPI432A Query Optimizer Timed Out This is bad, as all access paths were not considered. The optimizer might have been one path away from query nirvana. In this case, is the preferred path is not considered, try naming the DDS logical in the FROM clause of the query. This causes the AS/400 to consider it first.
38	Effects of Packages on Clients Running your statements using parameter markers allows you to implement packages Packages are where the AS/400 stores the query information and the access plan developed by the optimizer Optimization is expensive, packages cause static statements to be compiled only once
39	Packages solve timeouts When packaging is on, optimizer will evaluate ALL access paths available to a statement thus ensuring the “best” path is chosen. Access path is then added to package file Next time statement is executed, access path is retrieved from the package and the expensive optimization process is avoided.
40	Problems with packages If your application is using a lot of dynamic SQL, packaging will not help Search is done by string length, so AS/400 can miss statements that are the same though they have different lengths First optimization can be quite long due to exhaustive search To effectively use packages you must use parameterized statements
41	Other was to influence the optimizer Try using the OPTIMIZE for N rows Investigate using QAQQINI to set options for how queries are optimized Turn OFF treat constants as parameters Use OPTIMIZE_STATISTIC_LIMITATION to allow optimizer to "see" more values in the logical files Use FORCE_JOIN_ORDER to force optimizer to link files in order named in query
42	QAQQINI in QTEMP You can copy the QAQQINI from QSYS into QTEMP using CRTDUPOBJ This way, settings will only effect your job Set options on QTEMP copy CHGQRYA QYROPTLIB(QTEMP) to tell job where to look for QAQQINI
43	Once your query is happy When you have the desired level of performance, you should save the query Paste the query into the development environment Run the code and ensure performance is adequate Before code goes to production, run application with Database Performance Monitor on and examine output
44	Methodology Summary The key to consistent performance is understanding how SQL will interpret your queries and implement them on the AS/400 The more information you have, the easier optimization of a query becomes By taking a data centric approach, you eliminate data performance problems in the beginning
45	Server Side Tuning The remaining slides will deal with settings you can make on the AS/400 server that will effect client server processing. These settings can help JDBC, ODBC and OLEDB applications
46	Understanding Communication ADO uses the Connection Object to communicate with the AS/400. Like ODBC and JDBC, OLEDB talks to the QSERVER subsystem, (QUSRWRK on V5) A connection causes a pre-start job, QZDASOINIT, to be used. The QZDASOINIT job will stay alive until the connection is closed The QZDASOINIT job is returned to a pool, and then reused by a subsequent connection.
47	Adjusting Prestart Jobs You can change how jobs are started and reused by using the Change Prestart Job Entry, CHGPJE, command. On systems serving a lot of JDBC,ODBC and OLEDB activity, increase the number of pre-start jobs available and manipulate the thresholds. Create a separate memory pool and change subsystem to use this pool. Size the pool to eliminate/reduce page faults.
48	Turn on Expert Cache The AS/400 has the ability to cache data. Using WRKSHRPOOL, look at the pools on your AS/400 system. Make sure the base pool is set to *CALC. This turns on object expert cache. Create a separate memory pool and load frequently used objects into the pool using the SETOBJACC command. This allows the object to be maintained in RAM.
49	Set up a separate pool for Servers By default, QSERVER subsystem (QUSRWRK on V5), runs in the base memory pool. Create a separate pool and adjust the subsystem description to run in this new pool. Adjust the prestart job entries so that they run in the new pool. This segregates the memory used by ODBC, JDBC and OLEDB into a separate space and reduces contention and fragmentation. Make sure you set the pool to *CALC with the WRKSRHPOOLS command so that it uses expert cache.
50	Contact Information More information about the query optimizer can be found in DB2 Universal Database for iSeries -Database Performance and Query Optimization (Redbook from IBM) Download the free version of SQLThing to assist with your queries. www.midrangeserver.com Feel free to contact me, hfarner@SQLThing.com, I would appreciate feedback on this presentation.