20 March 2004: Cutting bandwidth costs + shortening download times with compression...
This page is a real credit to the Open Source movement.
See bottom of page for updated class Conteg.include (RFC-compliant Response/Request headers).
Pages on modem-help.com began to be gzip-compressed by default at the server starting at about 5 am this morning. If you have a problem viewing any page, the Home page is NOT compressed, and has a link which will change the default for all other pages to not-compressed (look for “Page Problems”).
Olivia--the server for both modem-help.com & modem-help.co.uk--is located at a co-location facility at Manchester University, Lancashire, England; many hundreds of electronic boxes huddled together in air-conditioned corridors, permanently connected to the internet backbone, LEDs winking at each other as they shuffle electronic files across the globe, their ceaseless activity witnessed only by the odd security guard pacing across spotless floors. That’s my fantasy, anyway, since I have never actually visited the site!
These pages began on free webspace on 13 Feb 1999, provided as part of a dial-up ISP connection with Freeserve. On 20 Mar 2000 I got the modem-help.co.uk domain via Micronicos. On 12 Aug 2001 I got free webspace with UKLinux and began experimenting with PHP + MySQL, then later, on 24 Feb 2002, registered the modem-help.com domain & obtained hosting facilities from UKLinux. Each of these steps involved greater cost on my part, but each step was small. They were all low-cost options which I could easily fund from my own pocket. The next step on the ladder was much, much bigger...
In June 2003 Olivia--the colo beast--was installed, made live & plugged into the Internet Backbone that runs up the West coast of England. She is a formidable machine--especially for a Linux box--with twin 2.4GHz Xeon CPUs (4 virtual CPUs), capable of handling scores of simultaneous connections with ease. This is excellent for site responsiveness & future growth, of course. The first stage of my .com site rewrite was uploaded on Christmas Day 2003, and the site bandwidth leapt immediately and has continued to grow as more & more people hit the site each day (1 million per month for .co.uk + .com combined). It is even recommended by the British Library (please forgive my snobbery). The next stage of the rewrite is certain to cause yet another vast increase in throughput. From a business point of view this is excellent, of course--more customers, more (potential) income. At a co-location, however, the owner pays for every bit & byte that goes both in & out of their machine. This makes my Yorkshire mind quiver (Yorkshiremen are renowned for their thrift). Imagine my joy, then, to discover that it was possible to implement gzip-compression on web-pages...
This must be the best kept secret on the web. Almost every modern browser is capable of receiving pre-compressed web-pages and presenting them in normal fashion. This includes Microsoft Internet Explorer... the point there being that ‘gzip’ is an open-source compression format used by--amongst others--Linux, which itself is viewed by Microsoft as the big-bad-daemon of computing. Among the few ‘browsers’ that cannot are Google’s search-bots. Tut tut, Google (the pages still appear, since the php routine checks for browser capability).
The point--and the bottom line--is that compression is reducing pages on my .com site to one-third of their previous size. Some are reduced to one-fifth or even less. The server takes less than 0.002 seconds to do this, and the client browser is unlikely to take much longer to re-inflate them. It really is a win-win situation. My bandwidth costs at the colo shrink by two-thirds, and your download times also shrink by two-thirds. Just imagine what a boon it would be in alleviating internet congestion if everyone implemented it on their web-pages. The one downside that I can think of is that the entire page has to be downloaded before it can be presented on the screen. Although many net-pages are like this anyway--do not, as one example, wrap the entire page in a table--my own have always been written so that something will appear immediately, even if later sections are delayed. This gives confidence in these days of the World Wide Wait.
Here is a screen shot taken from my bandwidth-monitoring page, viewed at about 9pm today; lines are drawn on a 5-minute average (green is out-traffic, blue is incoming):
In the next section will be links & info so that--if you are a webmaster with access to PHP--you can get stuck into this yourself, but it may be as well to point out that this is not the same as the compression which almost every modem implements as part of its normal working. V.44, as one good example, is a development of the earlier V.42bis compression protocol, and promises better throughput for V.92 modems even on the downstream path due to the more sophisticated routines employed. In fact, gzip pre-compression of pages makes this difference null & void, as any modem will find it difficult-to-impossible to compress such a bit-stream any further. The closest analogy on Windows would be to consider transferring a WinZip-ed .zip file.
Does this mean that the benefits for the client of pre-compression are removed by the loss of modem compression, then? No - mitigated, but by no means removed. A modem compresses only a handful of bytes at a time (depends on the buffer), whereas the gzip utility compresses the entire file at once. No modem will achieve two-thirds compression on a normal file. If you have a scientific bent, upload a text file to webspace, then a zipped version of the same file, and time them both (sorry, I really don’t have the time to do this myself).
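If you want a quick feel for the ratios involved without uploading anything, a few lines of PHP will do. This is purely an illustrative sketch (the repetitive sample text is made up, and real pages will compress somewhat less well than this):

```php
<?php
// Build some repetitive HTML-like text, as a stand-in for a web-page.
$page = str_repeat("<tr><td>modem-help.com diary entry</td></tr>\n", 200);

// gzencode() produces a gzip-format string; 9 = maximum compression.
$compressed = gzencode($page, 9);

printf("Original:   %d bytes\n", strlen($page));
printf("Compressed: %d bytes\n", strlen($compressed));
printf("Ratio:      %.1f%% of original size\n",
       100 * strlen($compressed) / strlen($page));
?>
```

Whole-file compression like this is exactly why gzip beats a modem's small per-buffer compression: the longer the file, the more repetition there is for it to exploit.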
This & the following sections are for webmasters with access to PHP who want to implement this for themselves. Also, here is an interesting page to test the possibilities. [30 Mar update: ob_gzhandler() is an alternative to the below, but was buggy until PHP 4.2.3. Olivia uses PHP 4.2.2 at this moment, so this is not an option for me.]
Principle of operation: Browsers which send an Accept-Encoding: gzip header are saying that they can decode a Content-Encoding: gzip file. Broadly, this means HTTP/1.1 browsers. Specifically: Internet Explorer 4.0 on, Lynx 2.6 on, Mozilla 0.9.4 on, Opera 5.12 on & Netscape Communicator 4.6 on. In general, if HTTP/1.1 is not selected in the browser, it will not decompress compressed files. As a side-comment, only the content of a file is ever compressed; headers are always sent without any compression.
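In PHP terms, the whole principle boils down to something like the following sketch. This is a simplified version of the test that classes such as gzip_encode perform internally, not the class's actual code:

```php
<?php
$html = '<html><body>...page content...</body></html>';

// What did the browser advertise in its request?
$accept = isset($_SERVER['HTTP_ACCEPT_ENCODING'])
        ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';

if (strpos($accept, 'gzip') !== false) {
    // Client can decode gzip: say so in the response headers...
    header('Content-Encoding: gzip');
    header('Vary: Accept-Encoding');  // warn caches that responses differ
    // ...and send the compressed body.
    echo gzencode($html, 9);
} else {
    // No capability advertised (e.g. a search-bot): send it plain.
    echo $html;
}
?>
```

Note the Vary header: without it, a shared cache could serve a gzip-ed copy to a browser that never asked for one.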
Bugs: There are bugs in ie5.5 & ie6 decompression, triggered by installing certain programs such as RealPlayer; these are fixed by having a file with the following specs:
The Apache web server has been able to gzip files since Autumn 2000, using the mod_gzip module (Apache/2 also has a deflate module, achieving a similar effect by a different route). In general, this is not part of a default installation. That is also the situation on my server.
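For completeness, the Apache/2 route would look something like this in httpd.conf. This is only a sketch of the mod_deflate directives, and it assumes the module has actually been compiled in--which, as noted above, it has not on my server:

```
# Load the deflate filter module (not part of a default build)
LoadModule deflate_module modules/mod_deflate.so

# Compress textual responses on the fly
AddOutputFilterByType DEFLATE text/html text/plain text/css
```

The attraction is obvious--every site on the box gets compression with no script changes at all--but it means rebuilding or reconfiguring Apache, which is the "pain & time" referred to below.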
Both the modem-help.com plus modem-help.co.uk sites are served up by Apache/2.0.40 at the time of writing. It was certainly a very attractive idea to use Apache to achieve web compression, not least because this would have compressed the .co.uk pages as well. In the end I decided to go the PHP route, principally because of the lesser pain & time involved.
Compression of php pages from within PHP itself takes just 3 lines per script, plus one additional line for the stats reporting. It actually took less time to implement & check than this page has taken to write. In addition, by using PHP it becomes possible to report live on the page both whether compression is active and, if so, the compression achieved.
Most of my work had already been done for me at Leknor.com with the gzip_encode Class - hence the comment at the top of the page. The one thing that was missing was the stats reporting. I added it, and the amended Class can be viewed as a text file (gzip_encode.include.txt) (23 March update: I couldn’t stand the thought of some people not being able to view the site because of bugs in their browser--not exactly very friendly--so spent a day adding a simple way to auto-switch compression off from user input). (18 June update: after a couple of months preparation, php on Olivia now has ‘register_globals = off’; this has needed some small changes in the Class.) The file includes full instructions, so I will not repeat it all here.
How it works: The Class requires that Output Buffering is turned on:
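The line referred to is PHP's standard output-buffering call, placed at the very top of the script:

```php
<?php
ob_start();   // from here on, all output is buffered, not sent
```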
Following this line, all output will be stored into an internal buffer rather than sent immediately to the browser. As a side-effect, page Headers (cookies, whatever) can be sent after echo statements (that would normally raise an error). On the very last line of the script is an invocation of an instance of the Class:
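Assuming the class name gzip_encode, as in the Leknor version, that final line is simply:

```php
new gzip_encode();   // compress the buffer & flush it, Headers included
```

The constructor does all the work as a side-effect, which is why nothing need be assigned.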
and that’s it! (Line 2 of the script is the declaration of the Class, usually an include near the top.) The Class gzips the contents of the buffer plus sets up the various Headers. Easy-peasy, and it seems to work fine.
Mine own contribution comes about because I am irredeemably anal-retentive! If you are going to add a new feature, then it is natural to me to also measure it. Good scientific method, but also a neat Catch-22 in this case, since nothing can be added to the page once it has been compressed, yet the stats can only be accumulated afterwards. The solution is to use place-holders in the page for the required values, and to run the compression sequence twice.
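A sketch of that place-holder idea, using a made-up marker name (this is the shape of the technique, not the actual code in my amended Class):

```php
<?php
// Pass 1: build the page with a place-holder where the stat will go.
ob_start();
echo "<p>Compression saved %%SAVED%% bytes on this page.</p>";
$page = ob_get_contents();
ob_end_clean();

// Dry-run compression, purely to measure the saving.
$trial = gzencode($page, 9);
$saved = strlen($page) - strlen($trial);

// Substitute the real value, then compress a second & final time.
$page = str_replace('%%SAVED%%', (string) $saved, $page);
header('Content-Encoding: gzip');
echo gzencode($page, 9);
?>
```

The reported figure is a few bytes out, of course, since substituting the value changes the page length--but at 0.002 seconds per compression, running it twice costs next to nothing.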
23 March update: my own version of the Class now includes a simple way for a user to switch off compression. This will clearly be a necessity if they have a buggy browser. It is another Catch-22, of course, as no such link can be seen on a compressed page when it is needed. The solution is to leave the Home page non-compressed, and to propagate any desire for non-compressed pages into all site links. The following are provided to achieve this:
22 Sep 2005 update: Conteg.include.0.10.txt [72 KB] (the link is actually v0.13.6)