A Proxy Server Experiment: an
Indication of the Changing Nature of the Web

MAJ Richard Howard
Directorate of Information Management United States Military Academy
West Point, NY 10996
Voice (914)938-7449
Fax: (914)938-5956
howard@exmail.usma.edu

MAJ Bernard J. Jansen
Department of EE & CS
United States Military Academy
West Point, NY 10996
Voice:(914) 938-3233
Fax: (914)938-5956
jjansen@acm.org

Abstract

With the growing reliance on connectivity to the World-Wide Web (Web), many organizations have been experiencing trouble servicing their users with adequate access and response time. Increased bandwidth (i.e., more connections to the Web) can relieve the access problem, but this approach may not decrease the access time. Additionally, increased bandwidth comes at greatly increased cost. Therefore, many organizations have turned to the use of proxy servers. A proxy server is a Web server that caches Internet resources for re-use by a set of client machines. The performance increases of proxy servers have been widely reported; however, we could not locate any recent test of proxy server performance. Given the exponential growth of the Web in just the last year, we wondered whether this growth would have an effect on the performance of proxy servers. Therefore, we conducted a 14-day proxy server experiment. The results of our experiment showed that the proxy server actually decreased performance, i.e., access time. We review this experiment, analyze why the proxy server failed to decrease access time, and draw conclusions on the changing nature of the Web and its impact on proxy servers.

Please Cite: Howard, R. and Jansen, B.J. 1998. A proxy server experiment: an indication of the changing nature of the web. Seventh International Conference on Computer Communications and Networks. Lafayette, Louisiana.


Introduction

With the growing reliance on connectivity to the World-Wide Web (Web), many organizations have been experiencing trouble servicing their users with adequate access and response time as network and server loads have increased dramatically. Increased bandwidth (i.e., more or "bigger" connections to the Web) can relieve the access problem, but increasing bandwidth may not decrease the response time for users. Additionally, increased bandwidth comes at greatly increased cost due to typical monthly charges. Therefore, many organizations, including businesses, schools, universities, government, and military organizations, have turned to the use of proxy servers.

A proxy server is a Web server that caches Internet resources for re-use by a set of client machines. Caching proxies have been introduced to improve system performance under the assumption that a page will be fetched many times before it is destroyed or modified. The performance increases of proxy servers have been widely reported. In fact, the praise of proxy servers has been almost universal. However, we could not locate any recently published, scholarly articles on tests of proxy server performance. Therefore, we conducted an experiment to measure whether the use of a proxy server would decrease access time for users on the university WAN.

We present a brief description of and uses for a proxy server. We then review recent proxy server literature, both from the trade press and from scholarly journals. Next, we discuss the methodology of the experiment and the nature of the network on which the experiment was conducted. We follow this discussion with the results of the experiment and conclude with thoughts on why our results differed from those previously reported and the implications for the future of proxy servers on the Web.

Proxy Servers

Proxy servers have two main purposes: they can improve performance and filter requests. By filtering requests, we mean that an organization might use a proxy server to prevent its employees from accessing a specific set of Web sites. For this experiment, we are more concerned with the first purpose of a proxy server, improving performance. Proxy servers can dramatically improve performance for users of an organization because the proxy server saves the results of all requests for a certain amount of time.

For example, consider the situation where both User A and User B access the Web through the same proxy server. First, User A requests a certain Web page, which we will refer to as Web Page 1. The proxy server will forward the request, based on the Uniform Resource Locator (URL), to the Web server where Web Page 1 resides. Depending on the network's Web connection, the number of graphics in the Web page, etc., this can be a time-consuming operation. Later, User B requests the same Web page. Instead of forwarding the request to the Web server where Web Page 1 resides, the proxy server simply returns the copy of Web Page 1 that it already fetched for User A. Since the proxy server is usually on the same network as the user, this is a much faster operation. If this series of actions is repeated over several to hundreds of users, the performance increase via reduced access time can be a real benefit to the users on a network. The major online services such as CompuServe and America Online, for example, employ an array of proxy servers to service thousands of users [7]. If User B had requested a Web page that had not been previously requested, the proxy server would forward the request to the real Web server designated in the URL.
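The request flow described above can be sketched in a few lines of Python. This is an illustrative simplification, not the software used in the experiment: the cache is a plain dictionary keyed by URL, and `fetch_from_origin` stands in for a real HTTP request to the origin server.

```python
# A minimal sketch of proxy caching: repeat requests for the same URL
# are answered from the local cache instead of the origin Web server.
cache = {}

def fetch_from_origin(url):
    # Placeholder for a real (potentially slow) HTTP fetch over the Internet.
    return "<html>page body for %s</html>" % url

def proxy_request(url):
    if url in cache:
        return cache[url], "HIT"    # served locally, no Internet transmission delay
    body = fetch_from_origin(url)   # forwarded to the origin server (the slow path)
    cache[url] = body               # stored for the next user who asks for it
    return body, "MISS"
```

In this sketch, User A's request produces a MISS and a fetch from the origin; User B's later request for the same URL produces a HIT served from the local cache.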

The storing of server responses by proxy servers is referred to as caching. As stated, it was this aspect of the proxy server that we were primarily interested in studying. Web pages are modified, deleted, and renamed continuously, so the proxy server must have a means of checking whether the page it has in cache is the most current version. Briefly, a Web caching proxy server "cruises" the Web and examines pages that are currently cached on the server. If a page has been modified, the proxy server stores the new version on a local drive. Some proxy servers can also use certain guidelines to follow links on that page and pull down related pages. Most proxy servers are extremely efficient. They can examine and store thousands of Web pages, and when any local user on the LAN asks for a specific stored page, the page flies out of a local drive or cache without Internet transmission delays.
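The freshness decision behind this re-validation can be illustrated with a small sketch. The function name and the one-hour cutoff below are our own assumptions for illustration; real proxies of the period typically combined an age limit like this with conditional HTTP requests to the origin server.

```python
# Assumed freshness rule: a cached entry is considered current only if it
# was fetched (or last re-validated) within `max_age` seconds; otherwise
# the proxy would re-fetch the page from the origin server.
def is_fresh(entry, now, max_age=3600.0):
    return (now - entry["fetched_at"]) < max_age

entry = {"url": "http://www.microsoft.com/", "fetched_at": 1000.0}
# Ten minutes after fetching, the cached copy is still served; two hours
# later it is stale and must be checked against the origin again.
```

A stale entry is not necessarily discarded: if the origin reports the page unchanged, the proxy can simply reset its timestamp and keep serving the cached copy.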

To ensure that the proxy server can do its job, the network must be set up so that all users needing access to the Web go through the proxy server as their Internet gateway. One can accomplish this access control through proper router setup, which places all users "behind" the firewall.

Literature Review

Organizations that use Web proxy servers report that the proxy server's caching technology has greatly reduced network costs. Many organizations bought proxy servers in order to decrease access time; however, they received an unexpected benefit when caching reduced traffic on their Internet connections. Industry analysts report that proxy servers often reduced traffic enough to eliminate the need to add bandwidth [3]. The demand for proxy servers among business, government, and academic organizations has been strong. Microsoft, Netscape, and Novell all offer proxy-server software as part of their Internet server suites. The demand is understandable given the reported increases in performance from users. Reported performance enhancements from proxy servers for end users are typically about 20 to 25%, i.e., a one-quarter decrease in access time [1] [2]. Organizations also reported high volumes of proxy server cache access, as high as 40%, and extremely active caches with thousands of Web documents [3].

There has been very little discussion of the limitations or negative aspects of proxy servers. In general, proxy servers cannot provide the sophisticated event statistics, reports, alarms, and audit tracking of standalone firewalls. However, a high-end proxy server goes for about $1,000, while firewalls cost from $5,000 to $50,000 [2]. Also, industry experts caution that proxy servers cannot take the place of a second, high-speed Web access line if the first is overloaded. Other than these points, there have been few reported drawbacks of proxy servers.

With all the positive reports, it is no wonder that proxy servers are still an active area of research. Most of the research focuses on methods to increase the performance aspect of proxy servers rather than the filtering aspect. Jeffery, Das, and Bernal [4] investigated the design and implications of an extended proxy server that shares cache resources not only within itself but also with near neighbors. They reported that a substantial reduction in network workload can be obtained from this proxy sharing. The shared cache also led to a corresponding increase in performance. The best performance came from a simple, non-hierarchical implementation model in which proxies access each other using the natural topology of the Web. Instead of cache sharing, Law, Nandy, and Chapman [6] investigated a distributed proxy server architecture that can increase service availability and provide system scalability coupled with load-balancing capability. The system employs a TCP-based switching mechanism that has a finer session granularity and more dynamic control over resource allocation. Finally, Nam and Lee [5] researched the ability of proxy servers to cache video.

Methodology

As noted, we could not locate any recent test of proxy server performance. Given the exponential growth of the Web in just the last year or two, we wondered if this growth would have an effect on the performance of proxy servers. Therefore, we conducted a 14-day proxy server experiment. We first installed a proxy server. We then reviewed the sites that our network users commonly visit and selected one site, www.microsoft.com, that was typical of the Web sites our users commonly visited. We then blocked direct access to this site for two weeks at the firewall. This action forced all users who wanted to visit www.microsoft.com to go through the newly installed proxy server. Without this block, users could bypass the proxy server and our data set would not be as dense. With any experiment of this type, one should get a sense of the size, traffic load, and nature of the network users.

The Network

The experiment was conducted at the United States Military Academy (USMA), a four-year, undergraduate institution. USMA graduates about 1000 students, called cadets, per year. Almost all of the graduating cadets immediately serve in the US Army. All cadets, all faculty, and the majority of the staff have computers and Internet access.

The USMA network has a Fiber Distributed Data Interface (FDDI) backbone running at 100 Mbps. About half of our users are connected to the backbone via dedicated 10 Mbps lines; other users are connected to the backbone via shared 10 Mbps lines. Our network is connected to the Web via two (2) connections, one DREN and one NIPRNET. DREN is the Defense Research and Engineering Network, and NIPRNET is the unclassified (but sensitive) Internet Protocol Routing Network. DREN is primarily for our education needs; NIPRNET is the military Internet. Both run at about T1 speeds, 1.5 Mbps. This network serves approximately four thousand cadets, the faculty and staff of the institution, plus the various staff agencies of the military post. Figure 1 illustrates the layout and size of the USMA network. Average utilization is 15-20% on the FDDI backbone, about 96% on the DREN gateway, and about 80% on the NIPRNET gateway.

 

Figure 1: USMA Network.

The cadets use the network and Web extensively for courses, both in the classroom and for research projects. The faculty also uses the network and Web for teaching preparation and research. Typical of the sites commonly visited is www.microsoft.com.

Results and Explanation

We were expecting to see a substantial performance gain by caching common documents that many users on the USMA network visit repeatedly. Based on the architecture of our network, as explained above, the proxy server should have retrieved a large number of cached documents and delivered them at close to USMA network speeds (100 Mbps around the backbone and generally 10 Mbps to the user). Unfortunately, this did not happen. As an example, from 0010 hrs to 1945 hrs on 17 December 1997, the proxy server accepted 290,000 server requests. It served only 12,000 documents from its cache, a hit ratio of about 4%. We were expecting about a 20% hit ratio.
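The hit ratio above follows directly from the two counts in the 17 December 1997 sample: cache hits divided by total requests accepted by the proxy server.

```python
# Hit ratio for the 17 December 1997 sample (0010-1945 hrs):
requests = 290000    # total requests accepted by the proxy server
cache_hits = 12000   # documents served from the local cache
hit_ratio = cache_hits / requests
print(round(hit_ratio * 100, 1))  # → 4.1 (percent), versus the ~20% expected
```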

The factor that we failed to consider is that many Web sites, including www.microsoft.com, are using dynamic Web pages and dynamic HTML to create their documents. Dynamic Web pages refer to Web content that changes each time it is viewed. For example, the same URL could result in a different page depending on any number of parameters, such as the geographic location of the reader, the time of day, previous pages viewed by the reader, or the profile of the reader. There are already many languages and technologies for producing dynamic content, including CGI scripts, Server-Side Includes, cookies, Java, JavaScript, and ActiveX. It appears that the number of dynamic Web pages will increase, especially with the advent of Dynamic HTML. Dynamic HTML is a set of new HTML extensions that will enable a Web page to react to user input without sending requests to the Web server. Microsoft and Netscape have submitted competing Dynamic HTML proposals to the W3C, which must now hammer out the final specification [7].
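Why dynamic pages defeat caching can be shown with a simplified rule of thumb. The function and the two URL patterns below are our own illustrative assumptions, not logic from any particular proxy product: if the response to a URL can differ per request, a naive proxy cannot safely reuse a stored copy.

```python
# An assumed cacheability heuristic: treat a URL as cacheable only when
# nothing suggests the response is generated per request. A query string
# or a CGI path are classic signals of dynamic content.
def looks_cacheable(url):
    return "?" not in url and "/cgi-bin/" not in url

# A static image is a caching candidate; a parameterized or CGI-generated
# page must be fetched anew from the origin server every time.
```

Real proxies also consult response headers such as Expires, but even this crude rule shows how a site built from dynamic pages leaves a proxy with almost nothing it can legitimately cache.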

So, with dynamic Web pages, every time you go to a site you may get a different page. It may contain mostly the same information as the last time you visited the site, but the page is created on the fly. The proxy server cannot cache this kind of Web page in the normal way. Instead, the proxy server acts as a middleman: it sends a request to the distant Web server, copies the documents to the proxy server's hard drive, and then delivers the documents to the user. This had two effects: (1) very few documents, only the static HTML documents, were stored in cache; and (2) for the dynamic Web pages, the proxy server was actually slowing down access time with the copying and updating of the documents.

Conclusion

For this experiment, we set out to see if the growth of the Web had changed the role and performance of proxy servers. We conducted a 14-day experiment and channeled users through a proxy server if they wanted to access www.microsoft.com, a major Web site. Based on previous trade reports, we expected about a 20% hit ratio. Instead, our hit ratio was one fifth of this expectation. We traced the cause to the increased use of dynamic Web pages. While the growth of the Web was not the major factor, it appears that as the Web has grown, it has been evolving from a static to a dynamic information repository. In this environment, the role of the proxy server will decrease as it becomes less able to deliver performance enhancements. Obviously, product redesign is in order.

References:

[1] DiDio, L. Proxy servers gain user appeal. Computerworld, April 21, 1997, v31 n16, p16.

[2] Machlis, S. Planning blunts Web traffic spikes. Computerworld, Nov 17, 1997, v31 n46, p6.

[3] Wallace, B. Web-caching servers cut network costs. Computerworld, Jan 26, 1998, v32 n4, p47.

[4] Jeffery, C.L.; Das, S.R.; Bernal, G.S. Proxy-sharing proxy servers. 7-10 May 1996. Proceedings of COM’96. First Annual Conference on Emerging Technologies and Applications in Communications.

[5] Nam, K.D.; Lee, H.T. Design of a virtual server for service interworking over heterogeneous networks. 20-22 Aug. 1997. 1997 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM). 10 Years Networking the Pacific.

[6] Law, K.L.E.; Nandy, B.; Chapman, A. A scalable and distributed WWW proxy system. 3-6 June 1997. Proceedings of IEEE International Conference on Multimedia Computing and Systems.

[7] http://www.pcwebopedia.com/dynamic_HTML.htm

[8] http://journals.ecs.soton.ac.uk/yr2/Fig1.htm