Ask for only what you need....

Most of the consulting that I do in my practice involves helping companies with SQL access issues against their AS/400 legacy systems. Typically, these companies are doing Active Server Pages or Java Server Pages in order to display information from the system via the web. This week, I want to share one of the most common performance problems that I see in my practice and how you can avoid sending me money by proactively solving the problem.

The issue is something called lazy-fare programming. See, when you program using record level access, you are used to opening a physical file or logical file and having all of the fields available to your program just by referencing them. Usually, when RPG programmers, and a lot of SQL programmers are guilty of this shorthand too, construct SQL statements they tend to use the SQL * character to indicate to SQL to return all fields from the physical file. So, the following statement might be found in an ASP or JSP page to list customers that match an inquiry screen:


Now, this is fine and dandy SQL. If there is a logical file associated with the physical with the CUSNAME field as the first key, it will probably return records quite quickly. However, the number of records per second that a query can return is directly effected by the number of fields in the physical file. To wit: If CUSMAS has 10 fields the query will run really fast, probably > 2,000 records per second. But, if CUSMAS has 150 fields, the query will return noticeably slower, (like < 750 records per second).

Communications and memory

Before I get into the why, it is probably important to understand what is happening during the processing of the SQL statement on both the client and the server. So, bear with me for a moment as I illuminate the inner workings of client and server communication.

On the client side, whether you are using a Java program or a Visual Basic program with ADO, the following steps occur when you attempt to run the above query.

1) Preperation
2) Execution
3) Fetch

In the preparation phase, the query is sent to the server and the server parses the SQL statement to determine what information (fields) the client wants and where the information comes from (physical file). The AS/400 query optimizer then looks at the physical file and it’s associated logical files to determine the best way to get the data that you want. Once this is done, the optimizer has what is called an execution plan. Having a plan on how to get the data, the AS/400 then gets information from the system catalogue about each field that your query is asking to return and sends this information (metadata) back to the client application. The metadata consists of the name of the field, the data type, length, etc. Once the client application gets the metadata, it constructs objects in memory to hold the returned query information.

At this point, the client can send the AS/400 an execution request. The execution request causes the AS/400 to look for the data that satisfies the query request by using the execution plan that it formulated. Once it finds a record(s) that satisfy the request, the AS/400 sends an acknowledgement message to the client that the execution request is complete.

Having received the notification, the client can then request that the AS/400 return one or more records from the open query using by issuing a fetch request. The AS/400 will acknowledge this request with one or more records from the open data path, (one or more depending on how the client issues the fetch request and the size of the fetch buffers. But that is a different column!) Now that you understand the rudiments of what is happening during the execution of a query, lets examine why SELECT * is a bad idea!

Practical Applications

See, if you use SELECT * and a physical file contains only ten fields, then only 10 pieces of metadata are returned to the client application. Then, the client application only has to allocate 10 buffers to hold the returned fields. The AS/400 only has to read 10 fields of information per record and only has to transform 10 fields from internal representation to pretty client side representation. Now, contrast this with a physical file that contains 150 fields of information. The AS/400 has to look up and return metadata on 150 fields. The client has to interpret 150 pieces of metadata to allocate objects or buffers to hold 150 pieces of information. The AS/400 has to read 150 fields and transform them from internal storage format to pretty client format. Also, since the number of fields is larger, the size of the records is larger so less records fit into a transmission buffer causing more packets to be sent and acknowledged in transferring the information. Easy to understand, right?

Ok, so why is it bad. It’s like my momma used to tell me, only put on your plate what you are going to eat. Most of the programs I see that have SELECT * do not require all of the fields in the physical file, just 5 or 10. Why punish your CPI and squander your resources with wasted effort that you are not going to look at anyway? Name your fields and you increase the throughput of your information, thereby increasing the scalability of your system on both the client and server side.

Don’t believe me? Here are a few numbers from the real world. I had a client that was doing the above-mentioned customer listing web page. His query was a SELECT * going against a physical file with 179 columns of information. His program was only using 15 fields of information from the result set. Before he changed the query, the execution phase of the query took .4 seconds and the records per second clocked in at 250. After changing the query to only ask for the fields required, the execution phase went from .4 seconds to .1 seconds (less metadata and client memory allocation), and the records per second returned from the AS/400 went to over 1,200. Also, the time it took for JSP to render the page went from 2.2 seconds down to .9 seconds.

Remember, less is more….