Notes
Slide Show
Outline
1
Planning and Tuning Queries
for Legacy Data
  • Presented by: Howard F. Arner, Jr.
    Client Server Development, Inc
  • hfarner@SQLThing.com
2
Overview
  • SQL is the language of the web
  • Most AS/400 system are developed around record level access
  • Making reliable and scalable SQL statements against these systems requires a consistent design methodology
  • This presentation will illuminate a development methodology that delivers consistent performance and scalable queries in your distributed applications
3
Methodology Overview
  • Most programs….
    • We used to say 95% of our programs were getting data out of a file and putting it on the screen or vice versa
    • With the web, this is still true
      • Just more garbage in the middle!
    • Data access plays a central role to this
      • Presentation formatting is a processor intensive operation on the web server
      • Data access is an expensive operation because the web server must communicate with a separate system
4
Methodology Overview
  • By ensuring consistent data access performance, we eliminate the data sources as a bottleneck and ensure scalability
  • If the performance of the data access is known, performance problems must be in another portion of the application
  • By taking a data centric approach to development, we cure performance problems before they can appear thus easing the QA process
5
Steps Required (Simplified)
  • Identify data inputs to the target program
  • Gather information about the target physical files
  • Construct SQL statement to gather data
  • Verify statement performance/access plan
  • Benchmark statement
  • Save/Paste to development environment
6
Identifying Data Inputs
  • Ensure that you understand which elements need to come from the AS/400
  • Get a list of physical files where each element appears
  • Generate a columns report for the physical files so that you understand the data types for the target elements
  • Identify transformations of data
7
A Columns Query
  • The following query can be used to retrieve the columns in a table.  Note that it uses parameter markers


  • SELECT TRIM(COLUMN_NAME) AS COLNAME,
  • ordinal_position AS OFF,
  • TRIM(data_Type) AS DATA_TYPE,
  • LENGTH AS LEN, numeric_scale AS SCLE,
  • numeric_precision AS PREC,
  • TRIM(ifnull(column_heading,'')) AS COL_HEADING,
  • TRIM(ifnull(column_text,'')) AS COL_TEXT,
  • is_nullable AS NULLS,
  • has_default AS HD,
  • system_column_name AS SYSCOL
  • FROM qsys2.syscolumns
  • WHERE table_schema = ? AND
  • TABLE_NAME=? ORDER BY OFF
8
Output Sample
9
Transformations
  • Sometimes, the data on the AS/400 is not the data you want on the client
    • Example, Java does not like Decimal as they are an immutable type. If the target needs to manipulate the data, plan on doing a cast in the select statement to a better numeric type
    • Decimal(3,0) might map to smallint
    • Decimal(12,2) might map to double
    • Decimal(6,0) might map to integer
10
Transformations
  • Chars waste space
    • Example, you have a CHAR(40) on the AS/400 but most of the entries are 20 characters or less
    • The AS/400 sends CHAR(40) as 40 characters, padded at the end with blanks
    • Minimize the amount of transmitted data by trimming character fields before transmission
11
Transformations
  • Dates may be in a legacy format
    • Example: a system stores an invoice date as DECIMAL(2,0) Century, DECIMAL(2,0) Year, DECIMAL(2,0) Month and DECIMAL(2,0) Day.
    • If the client intends to manipulate the value as a date, do the transformation on the 400 prior to sending the data to the client
    • Some clients can not understand the full range of an AS/400 date, fix up these dates before sending to the client
12
Transformations
  • Identify any user defined functions you may wish to create to transform data
    • If you constantly need to send legacy dates back as SQL Dates, consider writing a function to do the transformation on AS/400 so that you have re-usable code
    • If you need to do special calculations, like percent to total or such, write a UDF to do the transformation with full error handling to ease the code on the client
13
Example UDF to convert dates
  • CREATE FUNCTION SQLTHING.MINCRON_DATE
  • (CEN DECIMAL(2,0), YR DECIMAL(2,0),
  • MO DECIMAL(2,0), DY DECIMAL(2,0))
  • RETURNS DATE
  • LANGUAGE SQL
  • BEGIN
  • DECLARE F_OUTPUT DATE;
  • DECLARE F_TEST INTEGER;
  • DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
  • SET F_TEST=1;
  • SET F_TEST=0;
  • SET F_OUTPUT = DATE( TRIM(CHAR(MO))||'/'|| TRIM(CHAR(DY))||'/'|| TRIM(CHAR((CEN*100)+YR)));
  • IF F_TEST=0 THEN
  • RETURN F_OUTPUT;
  • ELSE
  • RETURN NULL;
  • END IF;
  • END
14
Gathering Data
  • For each physical file, generate a list of logicals that can be used in the resolution of the query
  • Look at the selectivity of the logical file to determine it's suitability for the query at hand
  • Understand the implications of join order and the types of joins being used
  • Understand the null status of the fields in any grouping or join statements
15
Generating a List of Logicals
  • SELECT  trim(DBFLDP) AS LIBRARY, trim(DBFFDP) AS FILENAME,
  •    trim(DBXTXT) AS SHORT_DESC,    DBKPOS AS OFF,
  •    trim(DBKFLD) AS KEY_FIELD,  DBITYP AS DATATYPE,
  •     DBITXT AS FIELD_DESC, DBICLN AS CLEN,
  •     DBINLN AS NLEN,   DBINSC AS NSCLE,   DBXREL AS REL,
  •     DBkORD AS DIR, dbxatr AS TYPE,
  •      (CASE dbxunq WHEN 'U' THEN 'Y' ELSE 'N' END) AS UNIQUE
  •  From QSYS.QADBLDEP
  •     LEFT JOIN QSYS.QADBXLFI
  •          ON (dbffdp=dbxfil AND dbfldp=dbxlib)
  •     LEFT JOIN QSYS.QADBKATR
  •          ON (DBFFDP=DBKFIL AND
  •              DBFLDP = DBKLIB AND
  •              DBFRDP = DBKFMT)
  •     LEFT JOIN QSYS.QADBIATR
  •          ON (DBFFDP=DBIFIL AND DBFLDP=DBILIB AND
  •                   DBFRDP=DBIFMT AND DBKFLD=DBIFLD)
  •  WHERE dbflib=? AND dbffil=? AND DBXATR IN ('LF','IX') AND dbxrel='Y'
  •  ORDER BY FILENAME, OFF
16
Sample Output
17
Generating a list of logicals
  • On my web site, download the index listing spreadsheet
  • This Excel spreadsheet has a macro that you can use to gather a list of logicals that can be used in the resolution of queries


  • URL: www.sqlthing.com/SQLFAQ.htm
18
How the optimizer uses logicals
  • Optimizer looks at statement to determine columns accessed, conditions for retrieval, ordering and joins
  • The more tables accessed in a statement, the more time this takes
  • The more logical files over each physical file adds time to this operation
19
Statement Optimization
  • For each logical file
    • Optimizer looks at index selectivity
    • Determines cost of access path usage
    • If cost < cost of previous path then
      pop path to top of stack
    • See if time is expired or check next logical
20
Optimization Problems
  • Lots of logicals means lots of decisions
  • Decisions take time
  • Indexes typically evaluated in creation descending order
  • Tables with a large number of indexes can cause optimizer to time out, this can cause the AS/400 to not see the most appropriate access path
21
What Logicals can be Used?
  • SQL can use most logicals where the file is marked as relational
  • However, SQL will rarely use logicals with select omit criteria**
  • Join logicals are almost always a bad idea**
  • Always initially select against the physical, do not name the logical in the from clause
22
Index Selectivity
  • The selectivity of an index controls whether the optimizer will think using that index is a good idea
  • Selectivity can be found by looking at the key loading information from the DSPFD command
  • Use my selectivity report, SQLThing Enterprise Edition, to understand how selective an index is
23
Selectivity Report
24
Getting Selectivity Manually
  • You can also get selectivity information with DSPFD command against a logical file or SQL Index
  • Selectivity information comes from the Number of unique partial key values field(s)
25
Example DSPFD Info
26
What selectivity means
  • Files have records.  Indexes have Keys.  The selectivity of a key indicates how many records will be returned, (on average), for one instance of a key
  • Selectivity can be skewed, for example a status field might have only four entries, but a I status only occurs in one out of 10,000 records
  • Understanding the selectivity of an index helps in planning queries
27
Real World
  • Trucking company with 60,000,000 records in a physical file.
  • 60 Logical files
  • All logicals begin with company number
  • 1 Company number
  • All queries did not include where company = clause
  • No logicals chosen to implement queries because left most key of logicals was not used in restriction
28
Query Development
  • Create your query based on the fields and transformations required
  • Use OPS Navigator or my freeware SQLThing tool.
  • Create the query as it will be used in your application.
    • If you are using replacement parms, then you must do the replacement parms in the query
29
Test for Performance
  • Run statement with debug on to see optimizer messages
  • Check prepare, execute and fetch times
  • Verify that AS/400 is picking correct path
  • Check for data type differences/access path rejection messages
30
Example Query
  • Given, a file called WEBTEMP2 which contains hits against my web site


  • SELECT * FROM SQLBOOK.WEBTEMP2 WHERE 
    REQTS BETWEEN '1999-10-01-00.00.00.000000'
     AND '1999-10-01-23.59.59.999999'
31
Example Job Log
32
Example Details
33
Create suggested index
  • The job log indicates an access path would help the query


  • CREATE INDEX SQLBOOK.WTREQTS
    ON SQLBOOK.WEBTEMP2 (REQTS ASC)


  • Next, we run the query again and look at the log
34
After Creation of Index
35
What to look for
  • The important thing during validation is to assure that the query is using a good access path
  • Look at the throughput of the query and ensure that the Prepare, Execute and Fetch phases are not taking a large period of time
  • Look at the access paths considered, CPI432C message, to determine why an access path was selected/rejected
  • The following messages bear investigation
36
Bad Messages
  • SQL7919 - Data conversion required on FETCH or embedded SELECT
    SQL7939 - Data conversion required on INSERT or UPDATE
    CPI432E - Selection columns mapped to different attributes
  • These messages indicate that you are comparing apples to oranges.  This will preclude the use of an index
37
More Bad Messages
  • CPI432A Query Optimizer Timed Out
    • This is bad, as all access paths were not considered.  The optimizer might have been one path away from query nirvana.
    • In this case, is the preferred path is not considered, try naming the DDS logical in the FROM clause of the query.  This causes the AS/400 to consider it first.
38
Effects of Packages on Clients
  • Running your statements using parameter markers allows you to implement packages
  • Packages are where the AS/400 stores the query information and the access plan developed by the optimizer
  • Optimization is expensive, packages cause static statements to be compiled only once
39
Packages solve timeouts
  • When packaging is on, optimizer will evaluate ALL access paths available to a statement thus ensuring the “best” path is chosen.
  • Access path is then added to package file
  • Next time statement is executed, access path is retrieved from the package and the expensive optimization process is avoided.
40
Problems with packages
  • If your application is using a lot of dynamic SQL, packaging will not help
  • Search is done by string length, so AS/400 can miss statements that are the same though they have different lengths
  • First optimization can be quite long due to exhaustive search
  • To effectively use packages you must use parameterized statements
41
Other was to influence the optimizer
  • Try using the OPTIMIZE for N rows
  • Investigate using QAQQINI to set options for how queries are optimized
    • Turn OFF treat constants as parameters
    • Use OPTIMIZE_STATISTIC_LIMITATION to allow optimizer to "see" more values in the logical files
    • Use FORCE_JOIN_ORDER to force optimizer to link files in order named in query
42
QAQQINI in QTEMP
  • You can copy the QAQQINI from QSYS into QTEMP using CRTDUPOBJ
  • This way, settings will only effect your job
  • Set options on QTEMP copy
  • CHGQRYA QYROPTLIB(QTEMP) to tell job where to look for QAQQINI
43
Once your query is happy
  • When you have the desired level of performance, you should save the query
  • Paste the query into the development environment
  • Run the code and ensure performance is adequate
  • Before code goes to production, run application with Database Performance Monitor on and examine output
44
Methodology Summary
  • The key to consistent performance is understanding how SQL will interpret your queries and implement them on the AS/400
  • The more information you have, the easier optimization of a query becomes
  • By taking a data centric approach, you eliminate data performance problems in the beginning
45
Server Side Tuning
  • The remaining slides will deal with settings you can make on the AS/400 server that will effect client server processing.
  • These settings can help JDBC, ODBC and OLEDB applications
46
Understanding Communication
  • ADO uses the Connection Object to communicate with the AS/400.
  • Like ODBC and JDBC, OLEDB talks to the QSERVER subsystem, (QUSRWRK on V5)
  • A connection causes a pre-start job, QZDASOINIT, to be used. The QZDASOINIT job will stay alive until the connection is closed
  • The QZDASOINIT job is returned to a pool, and then reused by a subsequent connection.
47
Adjusting Prestart Jobs
  • You can change how jobs are started and reused by using the Change Prestart Job Entry, CHGPJE, command.
  • On systems serving a lot of JDBC,ODBC and OLEDB activity, increase the number of pre-start jobs available and manipulate the thresholds.
  • Create a separate memory pool and change subsystem to use this pool. Size the pool to eliminate/reduce page faults.
48
Turn on Expert Cache
  • The AS/400 has the ability to cache data.
  • Using WRKSHRPOOL, look at the pools on your AS/400 system.  Make sure the base pool is set to *CALC.  This turns on object expert cache.
  • Create a separate memory pool and load frequently used objects into the pool using the SETOBJACC command.  This allows the object to be maintained in RAM.
49
Set up a separate pool for Servers
  • By default, QSERVER subsystem (QUSRWRK on V5), runs in the base memory pool.
  • Create a separate pool and adjust the subsystem description to run in this new pool.
  • Adjust the prestart job entries so that they run in the new pool.
  • This segregates the memory used by ODBC, JDBC and OLEDB into a separate space and reduces contention and fragmentation.
  • Make sure you set the pool to *CALC with the WRKSRHPOOLS command so that it uses expert cache.
50
Contact Information
  • More information about the query optimizer can be found in DB2 Universal Database for iSeries -Database Performance and Query Optimization (Redbook from IBM)
  • Download the free version of SQLThing to assist with your queries.
  • www.midrangeserver.com
  • Feel free to contact me, hfarner@SQLThing.com, I would appreciate feedback on this presentation.