Enter the Vector, Victor!

Hello sir, I have a few questions and I hope that you would help me clear my doubts…! What are the situations when we add a EVI to a table on AS/00? What are the advantages of adding a EVI to a table? I referred IBM manuals but I was not able to interpret it clearly. Could you throw some light on this sir…..!! Ashwin, (Bangalore –india)

Ashwin, this is a really good question as EVI indexes can add a lot of performance to your query applications. I cover EVI indexes and their use in my book, iSeries and AS/400 SQL at Work, so here is an excerpt from the book explains EVI indexes. If you want to know more, you can order the book from www.sqlthing.com/book

Excerpt from iSeries and AS/400 SQL at Work Copyright 2000, Howard F. Arner, Jr. All Rights Reserved

Encoded Vector Indexes

With the advent of V4R3, a new type of index has been added to your arsenal of query response time weapons, the Encoded Vector Index (EVI). A logical file, regular index, on the AS/400 is always a Binary Radix Tree. An EVI index is a bitmap of key values. These indexes are useful when executing a query that could return a large number of values because they can be scanned more efficiently.

The AS/400 will use an index only if it thinks that the query will return less than 20% of the total number of records in the physical file. EVI indexes are useful to queries that will return or access between 20% and 60% of the data in a file. Above, 60% the AS/400 will usually elect to table scan. So, EVI’s fill an important performance niche on the AS/400.

































Table 6.1: This table represents transactions in physical file.

TRANSNUM STATE QUANTITY PARTID 0001 FL 22 AB21 0002 FL 32 KL32 0003 TN 91 AB19 0004 FL 13 KL32 0005 FL 21 AB01 0006 TN 9 AB32 0007 WY 2 AB01 The heart of the EVI is that it is based on a bitmap of values that represent whether the database record at that offset contains the value or not. Let’s use the table of data shown in Table 6.1 as an example. The table contains 7 rows of data. If a bitmap index were created on the STATE column, the resulting bitmap index will conceptually look like the data shown in Figure 6.1.

Figure 6.1: This figure shows the bitmap index that was created on the STATE column.

Notice the values in the bitmap column for the FL key. The first bit has a value of 1 indicating that the record at offset 1 in the table has the key value. The third bit is 0, indicating that the record at offset 3 in the table does not have the key value. If you were to query the database for records where the STATE column equals FL, the AS/400 could retrieve the bitmap and now knows the offsets for each FL record in the database.

In addition, the bitmaps can be combined via a logical OR to find records with multiple values. For example, querying for state equal to FL or WY would return a bitmap of 1101101 which indicates to retrieve the records at offset 1,2,4,5 and 7.

The problem with bitmap structures is that they take a lot of space. Let’s say the example transaction table has 200,000 records where all fifty states are referenced. The storage requirement for a bitmap index is 200,000 * 50 or 10,000,000 bits. These can get unwieldy when you have a large number of distinct key values in a bitmap index. They also cause problems in updating data as the index consists of X number of arrays of bits where X is the number of distinct key values. Adding a lot of data to the database causes the arrays to be re-dimensioned and extended continually. Enter the vector

EVI indexes are an advancement from IBM that allow all of the benefits of a bitmap index but do not have the storage overhead. Instead of using an array of bits for each key value, the AS/400 stores a single array of 1,2 or 4 byte vectors. The number of vectors in the array is equal to the number of records in the table. Each distinct value in the database will be given a vector as it is encountered, and each offset in the array will contain the vector value that represents the column value stored in that record in the table. So an EVI index is a combination of a table that maps keys to vectors and a singular array representing the vector locations in the database. Figure 6.2 shows a conceptual vector table for the transaction data in Table 6.1.

Figure 6.2: This represents the conceptual view of an EVI index over the data in Table 6.1.

When the AS/400 needs to find records in using the above index, it will read the table of vectors to retrieve the vector key for the column values it is searching for and then compare those values to each element in the vector array. This is quite speedy.

The number of bytes required to represent the column data is related directly to the number of distinct values in the column. If the number of distinct values is less than 256, the each vector will be only one byte. If the number of values is less than 32768, it can be represented using only a 2-byte vector. Database columns having more than 32768 distinct values will be represented by a 4-byte vector.

The point of the EVI methodology is that it takes less space than the traditional bitmap index. In addition, the EVI is maintainable in a production environment, whereas bitmap indexes are notorious for causing problems. Another advantage of EVI indexes is that the AS/400 can combine multiple EVI indexes together in memory when resolving a database query. Imagine you have two EVI indexes on the transaction table shown in Table 6.1. The first is on IDATE and the second is on STATE. Now imagine executing the following query:

	IDATE in (‘11/01/2002’,’11/12/2002’,’12/12/2002’) 

The query optimizer may choose to merge the two indexes together to get the result set. This merge can be very fast for statements that will retrieve large numbers of records, so you need to play with EVI indexes on your large tables.