Introduction
Caching matters a great deal for the static assets of a web page. Images and CSS files are commonly cached by the browser to avoid the cost of a network round trip. We can enable additional layers of content caching in a number of ways, for example through the CDNs offered by providers such as AWS, Cloudflare, and Akamai. They offer geolocation-based, highly available, and scalable content caching.
But here we are not going to talk about distributed caching; instead, we will look at caching at a more ground level. We will use an Apache httpd server to proxy a backend service. We will enable caching at the httpd server and see how headers such as Cache-Control, Last-Modified, and ETag control the way the cache is loaded and validated. Note that the end goal is to keep network traffic to a minimum.
Our demo environment
To simulate a proxy server and a backend server, we are going to create two Apache servers running on different EC2 instances:
ReverseProxy Server 13.127.108.184
BackendService 13.126.84.157
Throughout the post, we assume
Shared cache = Cache at the apache server
Local cache = Cache at the browser
Before we implement the cache, the behavior without a shared cache is roughly as shown below. In this case we only have the local cache.
If we add the shared cache in the Apache web server that proxies the backend service, the flow looks like this:
And for resources that are already cached locally:
Configuration at Reverse proxy
We first load the landing page, which has a link to the downstream service.
Remember that there is a difference between following a link to a page and loading it directly in the browser (F5). The former honors the local cache, while the latter sends Cache-Control: max-age=0 to revalidate the cache immediately.
We will discuss this behavior further, but for now let's have a look at the HTML:
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>load demo</title>
<style>
body {
font-size: 12px;
font-family: Arial;
}
</style>
<script src="https://code.jquery.com/jquery-3.5.0.js"></script>
</head>
<body>
<b>Projects:</b>
<ol id="new-projects"><li><a href="http://13.127.108.184/pics/files/mona.jpg">clicked</a></li></ol>
<script>
</script>
</body>
</html>
We now need to proxy requests to the backend service. The proxy pass configuration is straightforward:
RewriteEngine on
RewriteRule ^/pics/(.*)$ "http://13.126.84.157/img/$1" [P]
ProxyPassReverse "/pics/" "http://13.126.84.157/img/"
So when we request a URL under http://13.127.108.184/pics/, the proxy calls the downstream service using the URL rewrite, as follows:
http://13.127.108.184/pics/files/mona.jpg --> http://13.126.84.157/img/files/mona.jpg
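The mapping done by the RewriteRule can be mimicked in a few lines of Python. This is only an illustrative sketch of the pattern match and substitution, not how mod_rewrite is implemented; the `rewrite` function name is made up for this example, while the pattern and backend address are the ones used in this post.

```python
import re

# Same pattern as the RewriteRule; the backend address comes from this post.
PATTERN = re.compile(r"^/pics/(.*)$")
BACKEND = "http://13.126.84.157/img/"


def rewrite(path):
    """Return the backend URL for a proxied path, or None if the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    # $1 in the RewriteRule corresponds to the first capture group here.
    return BACKEND + match.group(1)


print(rewrite("/pics/files/mona.jpg"))
print(rewrite("/index.html"))
```

Paths outside /pics/ do not match the rule, so they are served by the proxy host itself rather than being forwarded.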
We have to configure our cache here too. Let's look at the cache configuration:
# Cache module
LoadModule cache_module modules/mod_cache.so
<IfModule mod_cache.c>
# Mode of caching is disk
LoadModule cache_disk_module modules/mod_cache_disk.so
<IfModule mod_cache_disk.c>
# Location to store the cache
CacheRoot "/tmp"
# Cache specifically files under /pics/files/
CacheEnable disk "/pics/files/"
# Number of directory levels
CacheDirLevels 1
# Number of characters in each directory name
CacheDirLength 1
# Don't allow the browser to take control of the cache
CacheIgnoreCacheControl On
</IfModule>
</IfModule>
CacheDirLevels decides how many levels of directories to create from the hash string, and CacheDirLength decides how many characters are in each directory name.
For example, if you have a file that hashes to “abcdefghijklmnopqrstuvwxyz”, then a CacheDirLevels of 2 and a CacheDirLength of 4 would lead to this file being stored in:
[path_of_cache_root]/abcd/efgh/ijklmnopqrstuvwxyz
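To make the splitting rule concrete, here is a minimal Python sketch that cuts a hash string into directory names. It only illustrates the layout described above: the `cache_path` function is a hypothetical helper, and the real mod_cache_disk computes its own hash and adds its own file suffixes, which are not modelled here.

```python
def cache_path(hashed, dir_levels, dir_length):
    """Split a hash string into dir_levels directories of dir_length characters each."""
    # One directory name per level, taken from the front of the hash.
    dirs = [hashed[i * dir_length:(i + 1) * dir_length] for i in range(dir_levels)]
    # Whatever is left over becomes the file name.
    rest = hashed[dir_levels * dir_length:]
    return "/".join(dirs + [rest])


# The example from the text: 2 levels of 4 characters.
print(cache_path("abcdefghijklmnopqrstuvwxyz", 2, 4))
# The values used in this post's configuration: 1 level of 1 character.
print(cache_path("abcdefghijklmnopqrstuvwxyz", 1, 1))
```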
CacheIgnoreCacheControl is a very important directive here. If it is off (the default), the client can bypass the shared cache: when a browser/user agent sends Cache-Control: max-age=0 (to revalidate) or Cache-Control: no-cache (to ignore the cache completely and reload), the shared cache at the server side is ignored and the request is passed to the downstream service, which we certainly don’t want. We want the server to control how the shared cache is served back to the browser.
We honor the client up to the proxy level and not beyond that.
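The effect of the directive can be sketched as a small decision function. This is a simplified model of the behavior described above, not Apache's actual logic; both function names and the `ignore_cache_control` flag (standing in for CacheIgnoreCacheControl) are assumptions made for illustration.

```python
def client_forces_refetch(cache_control):
    """True if the request's Cache-Control asks to revalidate or bypass the cache."""
    directives = [d.strip().lower() for d in cache_control.split(",") if d.strip()]
    return "no-cache" in directives or "max-age=0" in directives


def may_serve_from_shared_cache(cache_control, entry_is_fresh, ignore_cache_control):
    """Decide whether the proxy answers from its own cache (simplified model)."""
    if not entry_is_fresh:
        return False  # a stale entry has to be revalidated with the backend first
    if ignore_cache_control:
        return True   # the client's wishes stop at the proxy
    return not client_forces_refetch(cache_control)


# Ctrl+F5 style request against a fresh entry, directive off and then on.
print(may_serve_from_shared_cache("no-cache", True, False))
print(may_serve_from_shared_cache("no-cache", True, True))
```

With the flag off, a forced reload reaches the backend even though the shared entry is still fresh; with it on, the proxy answers from its cache.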
The cache validity is set by
Header append Cache-Control max-age=3600
This sets the validity of the shared cache to 3600 seconds (1 hour). This information is also sent back to the client, so the client’s local cache updates its validity accordingly.
Note that this tells the browser/UA to retain the local copy for this period of time before contacting the server at all.
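As a rough illustration of how a cache turns this header into a freshness decision, the sketch below extracts max-age and compares it with the age of the stored copy. It is a simplification (real caches also consider Expires, Age, and heuristics), and the helper names are hypothetical.

```python
def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def is_fresh(age_seconds, cache_control):
    """A copy is fresh while its age is below max-age; without max-age, assume stale."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age


print(is_fresh(600, "max-age=3600"))   # ten minutes old
print(is_fresh(4000, "max-age=3600"))  # older than one hour
```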
We will not be able to see this behavior if we don’t use a link. If you press Enter, F5, or Ctrl+F5, the browser will revalidate/reload explicitly without honoring the local cache validity.
That’s why we had a link in our home page instead of loading the image directly through the browser URL bar.
See this behavior in the images below.
When we use a link, that is, we click the link in the home page to load the image rather than loading it directly from the browser bar (home page -> URL):
When we hit Enter/F5 (technically this is sent as Cache-Control: max-age=0):
When we press Ctrl+F5 or use the disable cache option (technically this is sent as Cache-Control: no-cache):
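The three cases can be summarised as a small lookup table. The mapping below simply restates what this post observed; real browsers differ in detail, so treat it as an assumption rather than a specification. The action names and helper functions are made up for this sketch.

```python
from typing import Optional

# Request Cache-Control per browser action, as described in this post.
# Real browsers differ in detail, so treat this table as an assumption.
ACTION_TO_CACHE_CONTROL = {
    "link": None,             # clicking a link: no forcing header
    "enter_or_f5": "max-age=0",  # revalidate
    "ctrl_f5": "no-cache",    # full reload / disable cache
}


def request_cache_control(action: str) -> Optional[str]:
    """Return the Cache-Control value the browser sends for an action."""
    return ACTION_TO_CACHE_CONTROL[action]


def served_from_local_cache(action: str, local_copy_fresh: bool) -> bool:
    """Only a plain link click with a fresh local copy avoids the network."""
    return local_copy_fresh and request_cache_control(action) is None


for action in ACTION_TO_CACHE_CONTROL:
    print(action, request_cache_control(action), served_from_local_cache(action, True))
```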
Configuration at Resource
Here we will hold 2 images in 2 different directories so that we can see whether we can narrow down the directory we want to cache. In our case, that is only mona.jpg.
[root@ip-172-31-8-212 html]# tree
.
├── img
│   ├── admin
│   │   └── files
│   │       └── basketball.png
│   └── files
│       └── mona.jpg
└── index.html
We plan to cache /img/files and NOT /img/admin/files.
Be aware of the local cache
The shared cache does not store a 304 response from the downstream service; the cacheable response is a 200. This means that for the 0th cache load, we need a 200 response. Until that point, even if you enable caching, it will be skipped by clients that already have the latest copy in their local cache.
Loading up the cache can be done in one of the following ways:
- Clear cache from browser and hit url
- Open browser in disable cache mode (effectively this sets cache-control header to no-cache while sending request)
- A new client
- Use a curl to call the endpoint
Important thoughts
- The 0th client call (one of the four scenarios mentioned above) will hit the backend server and load the cache. Any subsequent call will be served from the cache.
- Once the shared cache is loaded, any further client request to revalidate a local cache (304) will result in the proxy server revalidating the shared cache with the backend resource once an hour (max-age).
- Any subsequent calls to revalidate the local cache by any client within an hour will be served directly as 304 from the shared cache.
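The points above can be played through with a toy simulation. It is a deliberately simplified model, assuming the backend resource never changes and time is passed in as plain seconds; the `Backend` and `SharedCache` classes are hypothetical and only count how often the backend is contacted.

```python
class Backend:
    """Toy backend whose resource never changes; it only counts requests."""

    def __init__(self):
        self.hits = 0

    def get(self, conditional):
        self.hits += 1
        # A conditional request for an unchanged resource yields 304, otherwise 200.
        return 304 if conditional else 200


class SharedCache:
    """Toy proxy cache that stores only 200 responses and revalidates after max_age."""

    def __init__(self, backend, max_age):
        self.backend = backend
        self.max_age = max_age
        self.stored_at = None  # time the entry was stored or last revalidated

    def handle(self, now, client_has_copy):
        """Return the status code sent back to the client."""
        if self.stored_at is None:
            # Cold cache: pass the request through; only a 200 loads the cache.
            status = self.backend.get(conditional=client_has_copy)
            if status == 200:
                self.stored_at = now
            return status
        if now - self.stored_at >= self.max_age:
            # Stale entry: revalidate once with the backend, then it is fresh again.
            self.backend.get(conditional=True)
            self.stored_at = now
        return 304 if client_has_copy else 200


backend = Backend()
proxy = SharedCache(backend, max_age=3600)
print(proxy.handle(0, client_has_copy=True), backend.hits)      # 304 is not stored
print(proxy.handle(10, client_has_copy=False), backend.hits)    # 0th load with a 200
print(proxy.handle(20, client_has_copy=True), backend.hits)     # served by the proxy
print(proxy.handle(3700, client_has_copy=True), backend.hits)   # one revalidation
print(proxy.handle(3800, client_has_copy=True), backend.hits)   # served by the proxy
```

The backend counter only increases for the pass-through call, the 0th load, and the single revalidation after the hour has elapsed.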
Cache logs
It is useful to see what the cache is doing under the hood. We set the log level to debug in httpd.conf.
Let's see it in action
Open a browser to simulate the calls
Tail the error log on the proxy server (to view cache logs)
tail -f /var/log/httpd/error_log
Tail the access log on the backend server (to view incoming requests)
tail -f /var/log/httpd/access_log
Let's load the cache using curl
curl http://13.127.108.184/pics/files/mona.jpg
On the server we see the shared cache created as below
[root@ip-172-31-13-147 tmp]# ll
drwx------ 3 root root 17 May 9 09:55 systemd-private-17a7e1ac8de74e63b1a3449f23a647ad-httpd.service-Dy0HSb
and also in the cache log
cache: Caching url http://13.127.108.184:80/pics/files/mona.jpg? for request /pics/files/mona.jpg
AH00770: cache: Removing CACHE_REMOVE_URL filter.
AH00737: commit_entity: Headers and body for URL http://13.127.108.184:80/pics/files/mona.jpg? cached.
Now that the cache is in place, let's make a call from the browser for a 200 response (using Ctrl+F5) and see if it is returned from the shared cache
AH00781: Incoming request is asking for a uncached version of /pics/files/mona.jpg, but we have been configured to ignore it and serve a cached response anyway
AH00763: cache: running CACHE_OUT filter
AH00764: cache: serving /pics/files/mona.jpg
Did you notice that the web server alerted us that the browser is trying for a full reload?
But we don’t want that because the cache is fresh. This is why it matters to ignore the browser’s attempts to force revalidation/reload of the shared cache, using CacheIgnoreCacheControl On.
Now we make a curl request (after 5 mins of cache expiry). We see that the proxy server (server1) revalidates the shared cache. The backend (server2) says there is no change, so it returns 304.
AH00737: commit_entity: Headers and body for URL http://13.127.108.184:80/pics/files/mona.jpg? cached., referer: http://13.127.108.184/
AH02971: cache: serving /pics/files/mona.jpg (revalidated), referer: http://13.127.108.184/
13.127.108.184 - - [09/May/2020:10:09:48 +0000] "GET /img/files/mona.jpg HTTP/1.1" 304 - "http://13.127.108.184/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
Now if we call from the browser, it is returned from the shared cache again
AH00698: cache: Key for entity /pics/files/mona.jpg?(null) is http://13.127.108.184:80/pics/files/mona.jpg?
AH00709: Recalled cached URL info header http://13.127.108.184:80/pics/files/mona.jpg?
AH00720: Recalled headers for URL http://13.127.108.184:80/pics/files/mona.jpg?
Note that we have been hitting the image directly until now
http://13.127.108.184/pics/files/mona.jpg
Had we used the link (within the 5-minute expiry period), it would still have been loaded from disk (the local cache).