Sunday, January 15, 2006

Web Cache

Currently, web cache applications store macro-content query results. I assume that in the near future they will also store microcontent query results, which will dramatically change the way people get their answers: instead of getting many links to macro-content pages, a web cache will retrieve one page with many relevant microcontent chunks.

In order to check the feasibility of this idea I asked QT/Saver to collect some information on the subject:

Contents:

To Increase Speed
Multiple Cache Servers
Little Used In North America
Cachelink
Cache-Friendly
Oracle Application Server
Site Dedicated To Caching
Book on Web Caching
Web Proxy
Reverse Proxy
Web Cache Copyright

To Increase Speed

Caching is a way to store requested Internet objects (e.g. data like web pages) available via the HTTP, FTP, and gopher protocols on a system closer to the requesting site. Web browsers can then use a local cache such as Squid as a proxy HTTP server, reducing access time as well as bandwidth consumption. This is often useful for Internet service providers that want to increase speed for their customers, and for LANs that share an Internet connection. Because it is also a proxy (i.e. it behaves like a client on behalf of the real client), it provides some anonymity and security.
Suppose slow.example.com is a "real" web server, and www.example.com is a Squid cache server that "accelerates" it. The first time a page is requested from www.example.com, the cache server gets the actual page from slow.example.com, but for the next hour, day, or year (depending on cache configuration) every subsequent request gets the stored copy directly from the accelerator. The end result, without any action by the clients, is less traffic to the source server, meaning less CPU and memory usage, and less need for bandwidth.
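The accelerator behavior described above can be sketched in a few lines of Python. This is only an illustration, not how Squid is implemented: the class name TtlCache, the slow_origin function, and the one-hour lifetime are assumptions made for the example, and a fake clock stands in for real time so the expiry can be shown without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Keeps a fetched page for ttl_seconds, then refetches it from the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]       # still fresh: the origin is not contacted
        body = self._fetch(path)  # first request, or the stored copy expired
        self._store[path] = (now, body)
        return body


origin_hits = []  # records every request that reaches the "real" server


def slow_origin(path: str) -> bytes:
    """Hypothetical stand-in for slow.example.com."""
    origin_hits.append(path)
    return f"page for {path}".encode()


fake_now = [0.0]  # controllable clock for the demonstration
cache = TtlCache(slow_origin, ttl_seconds=3600, clock=lambda: fake_now[0])

cache.get("/index.html")   # miss: fetched from the origin
cache.get("/index.html")   # hit: served from the stored copy
fake_now[0] = 3601.0       # more than an hour later
cache.get("/index.html")   # stored copy expired: fetched again
print(len(origin_hits))    # prints 2
```

Three client requests reach the origin only twice; with a longer lifetime or more clients, the saving grows accordingly.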

Multiple Cache Servers
http://en.wikipedia.org/wiki/Web_cache Web cache - Wikipedia, the free encyclopedia
All major websites which routinely receive millions of queries per day require some form of web caching. If multiple cache servers are used together, these may coordinate using protocols like the Internet Cache Protocol and HTCP.
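The idea behind such coordination is that a cache which misses locally first asks its neighbors whether they hold the object before going to the origin server. The sketch below models only that idea with in-memory objects; it does not implement the ICP or HTCP wire formats, and the CacheNode class and its method names are invented for the illustration.

```python
from typing import Dict, List, Optional


class CacheNode:
    """One cache server that can ask sibling caches before going to the origin."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.siblings: List["CacheNode"] = []

    def has(self, url: str) -> bool:
        """Answer a sibling's query: True for a hit, False for a miss."""
        return url in self.store

    def lookup(self, url: str) -> Optional[str]:
        """Return the name of the cache that served url, or None if only
        the origin server can provide it."""
        if url in self.store:
            return self.name
        for sibling in self.siblings:
            if sibling.has(url):
                # Copy the object locally so the next request is a local hit.
                self.store[url] = sibling.store[url]
                return sibling.name
        return None


a, b = CacheNode("a"), CacheNode("b")
a.siblings = [b]
b.store["http://example.com/"] = b"hello"

first = a.lookup("http://example.com/")          # served by sibling "b"
second = a.lookup("http://example.com/")         # now a local hit on "a"
third = a.lookup("http://example.com/missing")   # nobody has it
print(first, second, third)                      # prints: b a None
```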

Little Used In North America

The Cache Now! campaign is designed to increase the awareness and use of proxy cache on the Web.
Web cache offers a win/win situation for both content providers and users, yet is little used in North America.

Cachelink
http://www.mangosoft.com/products/cachelink Mangosoft, Inc
Mangosoft's Cachelink software dramatically speeds access to commonly viewed web pages. Rather than going "outside" to the Internet to collect the information, Cachelink enables the information to be gathered "inside" within a local area network (LAN). This is achieved by storing web information within your local network, a technique known as "caching". Cachelink aggregates the cache from all of the PCs on a LAN and makes it available to the entire network.

Cache-Friendly
Caching Tutorial for Web Authors and Webmasters
The best way to make a script cache-friendly (as well as perform better) is to dump its content to a plain file whenever it changes. The Web server can then treat it like any other Web page, generating and using validators, which makes your life easier. Remember to only write files that have changed, so the Last-Modified times are preserved.
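A minimal sketch of that advice, assuming the script's output is available as bytes: the hypothetical helper write_if_changed rewrites the file only when the content differs, so an unchanged page keeps its modification time, which is what a web server typically uses for Last-Modified. The file name report.html is made up for the example.

```python
import os
import tempfile
from pathlib import Path


def write_if_changed(path: Path, content: bytes) -> bool:
    """Write content to path only if it differs from what is already there.

    Returns True when the file was (re)written, and False when it was left
    alone, in which case its modification time is preserved.
    """
    if path.exists() and path.read_bytes() == content:
        return False
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(content)
    os.replace(tmp, path)  # swap the new file in so readers never see a partial page
    return True


with tempfile.TemporaryDirectory() as d:
    page = Path(d) / "report.html"
    first = write_if_changed(page, b"<p>v1</p>")    # new file: written
    mtime = page.stat().st_mtime_ns
    second = write_if_changed(page, b"<p>v1</p>")   # same content: skipped
    unchanged = page.stat().st_mtime_ns == mtime
    print(first, second, unchanged)                 # prints: True False True
```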

Oracle Application Server
OracleAS Web Cache
Oracle Application Server Web Cache 10g is the software industry's leading application acceleration solution…Built-in workload management features ensure application reliability and help maintain quality of service under heavy loads. And new in this release, end-user performance monitoring features provide unparalleled insight into end-user service levels.

http://www.oracle.com/technology/products/ias/htdocs/9iaswebcache_fov.html Oracle9ias Web Cache--Feature Overview--Oracle Corporation
Oracle9iAS Web Cache improves the scalability, performance and availability of e-business Web sites.
Web Cache combines caching, compression and assembly technologies to accelerate the delivery of both static and dynamically generated Web content. As the first application server to implement ESI, Oracle9i Application Server boasts the industry's fastest edge server, with support for partial-page caching, personalization and dynamic content assembly at the network edge. Oracle9iAS Web Cache also provides back-end Web server load balancing, failover and surge protection features which ensure blazing performance and rock-solid up-time.
Oracle9iAS Web Cache understands the contents of HTTP headers (including cookies) and is capable of making caching and routing decisions based on administrator or application-defined cacheability rules. This "content awareness" makes it possible for administrators to cache different content for different categories of visitors, such as the ability to show full prices to new customers and discounted prices to returning customers.
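The general technique of caching different content per visitor category can be sketched as follows. This is not Oracle's rule syntax or API: the cookie name customer_id, the two categories, and the class CategoryCache are assumptions chosen to mirror the new-versus-returning-customer example. The key point is that the visitor category becomes part of the cache key.

```python
from http.cookies import SimpleCookie
from typing import Callable, Dict, List, Tuple


def visitor_category(cookie_header: str) -> str:
    """Classify a request from its Cookie header (hypothetical rule)."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    return "returning" if "customer_id" in cookies else "new"


class CategoryCache:
    """Caches one copy of each URL per visitor category."""

    def __init__(self, render: Callable[[str, str], str]) -> None:
        self._render = render
        self._store: Dict[Tuple[str, str], str] = {}

    def get(self, url: str, cookie_header: str = "") -> str:
        key = (url, visitor_category(cookie_header))
        if key not in self._store:
            self._store[key] = self._render(url, key[1])
        return self._store[key]


renders: List[str] = []  # records each time the application had to render


def render_price_page(url: str, category: str) -> str:
    renders.append(category)
    return "discounted price" if category == "returning" else "full price"


cache = CategoryCache(render_price_page)
new_page = cache.get("/offer")
returning_page = cache.get("/offer", "customer_id=42")
cache.get("/offer", "customer_id=77")  # reuses the "returning" entry
print(new_page, "|", returning_page, "|", renders)
# prints: full price | discounted price | ['new', 'returning']
```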

Site Dedicated To Caching
Web Caching and Content Delivery Resources
Welcome to my web cache and content delivery network pages. This site is dedicated to providing a comprehensive guide to the resources about and in support of caching and content delivery on the World Wide Web. If you know of something that we are missing, make sure you tell us about it! This field, like many areas of the Web, is constantly changing so bookmark this site and come back often!

Book on Web Caching
Online Catalog: Web Caching, First Edition
A properly designed web cache, by reducing network traffic and improving access times to popular web sites, is a boon to network administrators and web users alike. This book hands you all the technical information you need to design, deploy, and operate an effective web caching service. It also covers the important political aspects of web caching, including privacy and security issues.

Web Proxy
A common proxy application is a caching Web proxy. This provides a nearby cache of Web pages and files available on remote Web servers, allowing local network clients to access them more quickly or reliably.
When it receives a request for a Web resource (specified by a URL), a caching proxy looks for that URL in its local cache. If found, it returns the document immediately. Otherwise it fetches it from the remote server, returns it to the requester and saves a copy in the cache.
Google's Web Accelerator is an example of a split proxy.
Privoxy is a free, open source web proxy with privacy features.
Sun Java System Web Proxy Server was formerly known as Sun ONE Web Proxy Server.
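The lookup-then-fetch behavior of a caching Web proxy described in this section can be written as a small function. The sketch keeps everything in memory and takes the remote fetch as a parameter; make_caching_proxy and fake_remote are hypothetical names, and none of the products listed above is claimed to work this way internally.

```python
from typing import Callable, Dict, List, Tuple


def make_caching_proxy(
    fetch_remote: Callable[[str], bytes],
) -> Callable[[str], Tuple[bytes, str]]:
    """Build a request handler that returns (document, "HIT" or "MISS")."""
    cache: Dict[str, bytes] = {}

    def handle(url: str) -> Tuple[bytes, str]:
        if url in cache:
            return cache[url], "HIT"    # found locally: return immediately
        document = fetch_remote(url)    # otherwise fetch from the remote server
        cache[url] = document           # and save a copy for next time
        return document, "MISS"

    return handle


remote_calls: List[str] = []


def fake_remote(url: str) -> bytes:
    """Stand-in for a real HTTP fetch, so the sketch runs offline."""
    remote_calls.append(url)
    return b"<html>hello</html>"


proxy = make_caching_proxy(fake_remote)
statuses = [proxy("http://example.com/")[1] for _ in range(3)]
print(statuses, len(remote_calls))  # prints: ['MISS', 'HIT', 'HIT'] 1
```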

Reverse Proxy
A reverse proxy is a proxy server that is installed in the neighborhood of one or more servers.
There are several reasons for installing reverse proxy servers:
Encryption / SSL acceleration: when secure websites are created, the SSL encryption is sometimes not done by the webserver itself, but by a reverse proxy that is equipped with SSL acceleration hardware.
Load distribution: the reverse proxy can distribute the load to several servers, each server serving its own application area.
Caching static content: A reverse proxy can offload the webservers by caching static content, such as images. Proxy caching of this sort can often satisfy a considerable amount of website requests, greatly reducing the load on the central web server.
The Apache HTTP Server may be used as a reverse proxy.
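The load distribution case, where each backend serves its own application area, comes down to choosing a server from the request path. The sketch below shows one simple way to do that with longest-prefix matching; the ROUTES table, the backend addresses, and pick_backend are invented for the illustration and are not Apache configuration.

```python
from typing import Dict, Optional

# Hypothetical mapping from application area (path prefix) to backend server.
ROUTES: Dict[str, str] = {
    "/shop/": "http://shop-backend.internal:8080",
    "/shop/images/": "http://static-backend.internal:8080",
    "/blog/": "http://blog-backend.internal:8080",
}


def pick_backend(path: str, routes: Dict[str, str],
                 default: Optional[str] = None) -> Optional[str]:
    """Return the backend whose prefix matches path; the longest prefix wins."""
    best: Optional[str] = None
    for prefix in routes:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return routes[best] if best is not None else default


print(pick_backend("/shop/cart", ROUTES))
# prints: http://shop-backend.internal:8080
print(pick_backend("/shop/images/logo.png", ROUTES))
# prints: http://static-backend.internal:8080
print(pick_backend("/about", ROUTES, default="http://main-backend.internal:8080"))
# prints: http://main-backend.internal:8080
```

A real reverse proxy would forward the request to the chosen backend and relay the response; only the routing decision is shown here.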

Web Cache Copyright
Some people worry that web caching may be an act of copyright infringement. In 1998 the DMCA added rules to the United States Code (17 U.S.C. Sec. 512) that largely relieve system operators from copyright liability for the purposes of caching.
http://news.com.com/2100-1038_3-1024234.html Google cache raises copyright concerns | CNET News.com
As seemingly benign and beneficial as it is, some Web site operators take issue with the feature and digitally prevent Google from recording their pages in full by adding special code to their sites. Among other arguments, they say that cached pages at Google have the potential to detour traffic from their own site, or, at worst, constitute trademark or copyright violations. In the case of an out-of-date news page in Google's cache, a Web publisher could even face legal troubles because of false data remaining on the Web but corrected at its own site.
Admittedly, Google's cache is like any number of backdoors to information on the Web. For example, proxy servers can be the keys to a site that is banned by a visitor's hosting Web server. And technically, any time a Web surfer visits a site, that visit could be interpreted as a copyright violation, because the page is temporarily cached in the user's computer memory.
A provision in the Digital Millennium Copyright Act (DMCA) includes a safe harbor for Web caching. The safe harbor is narrowly defined to protect Internet service providers that cache Web pages to make them more readily accessible to subscribers. For example, AOL could keep a local copy of highly trafficked Web pages on its servers so that its members could access them with greater speed and less cost to the network. Various copyright lawyers argue that safe harbor may or may not protect Google if it were tested.
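The "special code" that site operators add is not named in the excerpt above; it is presumably the robots meta tag with the noarchive directive, which asks search engines not to offer a cached copy of the page. Under that assumption, the following sketch checks whether a page carries the directive. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # Assumption: directives aimed at "googlebot" are treated like "robots".
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def forbids_cached_copy(html: str) -> bool:
    """True if the page asks search engines not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


opted_out = '<html><head><meta name="robots" content="index, NOARCHIVE"></head></html>'
plain = "<html><head><title>News</title></head></html>"
print(forbids_cached_copy(opted_out), forbids_cached_copy(plain))
# prints: True False
```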
