Ask for only what you need....
Most of the consulting that I do in my
practice involves helping companies with SQL access issues
against their AS/400 legacy systems. Typically, these companies
are doing Active Server Pages or Java Server Pages in order
to display information from the system via the web. This week,
I want to share one of the most common performance problems
that I see in my practice and how you can avoid sending me
money by proactively solving the problem.
The issue is something called lazy-fare programming.
See, when you program using record level access, you are used
to opening a physical file or logical file and having all
of the fields available to your program just by referencing
them. Usually, when RPG programmers, and a lot of SQL programmers
are guilty of this shorthand too, construct SQL statements
they tend to use the SQL * character to indicate to SQL to
return all fields from the physical file. So, the following
statement might be found in an ASP or JSP page to list customers
that match an inquiry screen:
SELECT * FROM CUSMAS WHERE CUSNAME
Now, this is fine and dandy SQL. If there
is a logical file associated with the physical with the CUSNAME
field as the first key, it will probably return records quite
quickly. However, the number of records per second that a
query can return is directly effected by the number of fields
in the physical file. To wit: If CUSMAS has 10 fields the
query will run really fast, probably > 2,000 records per
second. But, if CUSMAS has 150 fields, the query will return
noticeably slower, (like < 750 records per second).
Communications and memory
Before I get into the why, it is probably
important to understand what is happening during the processing
of the SQL statement on both the client and the server. So,
bear with me for a moment as I illuminate the inner workings
of client and server communication.
On the client side,
whether you are using a Java program or a Visual Basic program
with ADO, the following steps occur when you attempt to run
the above query.
In the preparation
phase, the query is sent to the server and the server parses
the SQL statement to determine what information (fields) the
client wants and where the information comes from (physical
file). The AS/400 query optimizer then looks at the physical
file and its associated logical files to determine the
best way to get the data that you want. Once this is done,
the optimizer has what is called an execution plan. Having
a plan on how to get the data, the AS/400 then gets information
from the system catalogue about each field that your query
is asking to return and sends this information (metadata)
back to the client application. The metadata consists of the
name of the field, the data type, length, etc. Once the client
application gets the metadata, it constructs objects in memory
to hold the returned query information.
At this point, the
client can send the AS/400 an execution request. The execution
request causes the AS/400 to look for the data that satisfies
the query request by using the execution plan that it formulated.
Once it finds a record(s) that satisfy the request, the AS/400
sends an acknowledgement message to the client that the execution
request is complete.
the notification, the client can then request that the AS/400
return one or more records from the open query using by issuing
a fetch request. The AS/400 will acknowledge this request
with one or more records from the open data path, (one or
more depending on how the client issues the fetch request
and the size of the fetch buffers. But that is a different
column!) Now that you understand the rudiments of what is
happening during the execution of a query, lets examine why
SELECT * is a bad idea!
See, if you use SELECT * and a physical
file contains only ten fields, then only 10 pieces of metadata
are returned to the client application. Then, the client application
only has to allocate 10 buffers to hold the returned fields.
The AS/400 only has to read 10 fields of information per record
and only has to transform 10 fields from internal representation
to pretty client side representation. Now, contrast this with
a physical file that contains 150 fields of information. The
AS/400 has to look up and return metadata on 150 fields. The
client has to interpret 150 pieces of metadata to allocate
objects or buffers to hold 150 pieces of information. The
AS/400 has to read 150 fields and transform them from internal
storage format to pretty client format. Also, since the number
of fields is larger, the size of the records is larger so
less records fit into a transmission buffer causing more packets
to be sent and acknowledged in transferring the information.
Easy to understand, right?
Ok, so why is it bad. Its like my momma
used to tell me, only put on your plate what you are going
to eat. Most of the programs I see that have SELECT * do not
require all of the fields in the physical file, just 5 or
10. Why punish your CPI and squander your resources with wasted
effort that you are not going to look at anyway? Name your
fields and you increase the throughput of your information,
thereby increasing the scalability of your system on both
the client and server side.
Dont believe me? Here are a few numbers
from the real world. I had a client that was doing the above-mentioned
customer listing web page. His query was a SELECT * going
against a physical file with 179 columns of information. His
program was only using 15 fields of information from the result
set. Before he changed the query, the execution phase of the
query took .4 seconds and the records per second clocked in
at 250. After changing the query to only ask for the fields
required, the execution phase went from .4 seconds to .1 seconds
(less metadata and client memory allocation), and the records
per second returned from the AS/400 went to over 1,200. Also,
the time it took for JSP to render the page went from 2.2
seconds down to .9 seconds.
Remember, less is more