| Frames | Modems | Help | Home Page | Chipsets | Search | No Frames |
| Diary Entries | See also Site Info & Diary. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 03 November 2003 | Optimising SQL statements - a practical walkthrough... | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
My last 4 months have been spent re-writing the site at modem-help.com - closeted away in my attic, like some demented monk, chained to the desk and only released for short periods for necessary ablutions, or a crust of bread, or a few hours sleep. The first stage is almost finished and the results will be seen shortly on the site. There have been 3 criteria:
Implicit in the last one is to radically shorten the preparation time of each page at the server. This latter has become a crucial issue for me since the sites moved to a co-located server in July 2003. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The sites were formerly hosted by UK Linux on a twin-Pentium III server. My sites shared the server with many other UKLinux paying customers, and after a time it became clear that the folks at UKLinux considered my sites to be crippling their server with poorly-written php/mysql scripts. Naturally, my attitude was that this was bunkum. After the experience of the last 4 months, I’m not so sure. Olivia is the name of the server that has delivered this page up to you. She is quite some beast - dual matched-intel Xeon 2.4 GHz cpus on a Gigabyte SR222E-R MB, 1 Gig memory, Adaptec zero channel RAID with 3 x 73.4 GB 10,000 rpm SCSI SCA HDDs running RedHat Linux v8.0, Apache/2.0.40, MySQL 3.23.56-log & PHP. Ridiculously over-specified for a 25,000 hits/day server. And yet, and yet...Olivia is running phpMyAdmin 2.5.1, and this version offers many features vital for bench-marking sql statements. The hosted-server is running 2.2.5 (from memory) and benchmarks are not available with this version - if you do not already have the later version, upgrade quick. Running my own server also means being able to access as the root MySQL user, and this gives access to runtime info & system variables that were unavailable in the hosted state. All of this became vital on the 16 Oct just gone. So, what’s offered below are live examples of benchmarking one of the final sql statements from the chipsets.php script. The example page is chipsets.php?mid=122&ncd=4485, the list of drivers for all modems with this chipset. The sql tables are listed at 14 Jan. The original php-script is: "SELECT m.MID,m.Name,n.NID,n.Name,n.dead,k.LTD,lt.Name,k.OTD,ot.Name,DATE_FORMAT(k.Updated,'%d %b %Y'),Version "._QF_MODEM." "._QJ_MFCS." "._QJ_LINKL." "._QJ_LINKT." LEFT JOIN ostypes ot ON k.OTD=ot.OTD WHERE ".(($nbd)?"b.NID=$nbd":"c.NID=$ncd")." && k.LTD="._QTL_DRIVR." GROUP BY m.Name,n.Name,ot.Name ORDER BY k.LTD DESC,ot.Name,k.Version DESC,m.Name,n.Name" _QF_MODEM and the other constants are pre-defined slivers of sql script:
define("_QF_MODEM","
FROM names n
"._QJ_MODEM ."
LEFT JOIN names nm ON dn.NMD=nm.NID
LEFT JOIN names no ON dn.NCD<=>no.NID
LEFT JOIN names a ON dn.CAD<=>a.NID
LEFT JOIN names b ON dn.CBD<=>b.NID
LEFT JOIN names c ON dn.CCD<=>c.NID
");
...which makes it easier to build the sql statements. Use of these constants does introduce some redundancy into the statements, but this affects neither the results nor the speed - MySQL just ignores the redundant portions - whilst the use of a global define allows every sql statement that uses it to be changed in one go if improvements are found, and also reduces the size of the php pages. For the page in question, this script turns out like this (the examples below are drawn from the phpMyAdmin pages): |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Less than 0.1 sec may not sound slow but, apart from the fact that there are scores of queries on each page, clicking the Explain SQL link (obviously not a link here) makes the point:
The first row (names n) is the problem: no keys, all rows in the table have to be examined, and a temporary table with filesort is being used - bad news all round. Clearly, this is a downside in the original decision in the php-script to use DEFINEs. These were all based on ‘names n’ as the first table (to maintain consistency). This was a good choice for most sql statements, but a poor choice for this one. Total rows examined is the product of all the ‘rows’ column: 11019 x 1 x 1 x 1 x 1 x 1 x 1 x 1 x 5 x 1 x 1 = 55,095 rows. ‘modems’ is the obvious target for the first table, and a very small change to the sql statement makes a dramatic change to the script duration: |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
More than twice as quick as the previous query. Explain SQL gives:
The time savings are coming purely because there are less modems than names; still no keys, all rows have to be examined and a temporary table with filesort. The number of rows that need to be examined is now 2596 x 5 = 12,980 (23.5% of the previous figure). The choice of key in the WHERE is the issue, as the original choice was wrong. c.NID (used to find the chipset name) is exactly equivalent to dn.NID, the modem chipset. Making this one change makes another dramatic difference: |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Now, this is more like it. More than ten times as quick as the original query, yet giving exactly the same results. Explain SQL gives:
Notice that the number of rows that need to be examined is still 12,980. Only the addition of where used; in the primary table has made the difference. Looking at the table indexes, CCD forms part of a combo-index NCD (chipset component names), designed originally to save disk space as the sites were on a space-limited hosted arrangement. That is most certainly not an issue now. Adding an index for CCD (plus all other components) allows another time boost: |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Another 50% time reduction. Explain SQL gives:
Notice that the number of rows that need to be examined is now 19 x 5 = 95, compared to the original 55,095, which explains the value of the index together with all other improvements. There is one further refinement that I can see to be made, and this involves k.LTD. This is the Type of the link (in this case it is Drivers). The scripts on this page show up to 4 different types of listings: drivers, firmware, manuals & support. Originally, 4 different connections were made, one for each type. Far more sensible is to make one connection and gather all the links, then use php to split them according to type. Removing the k.LTD conditional gives the following result: |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
A little longer than the single query, but much less time than 4 seperate queries. Explain SQL gives:
So, what’s the bottom line after all this work?
The non-optimised queries took 4 x 0.0787 = 0.3148 secs. This is more than 64 times quicker than the original statements - astonishing! - and the difference between a speedy page and a laggard one, particularly at times of high load. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||