Thursday, June 21, 2007

High Performance Reverse Proxy... Squid!


You may not know it, but much of the internet runs on a little-known piece of software called Squid. While Apache, PHP, Ruby, Flash, etc. may get all the glory, they would collapse on themselves without Squid.

Squid is a web caching server. In this article we will use it as a reverse-proxy HTTP accelerator for images. The same setup applies to pretty much any static content: SWFs, CSS files, file downloads, MP3s, etc. At HOTorNOT, we serve millions of static images via Squid.

Choosing Hardware

We are choosing a very commodity "server" for this job.
  • CPU - a well-priced 3.4 GHz Pentium D. Squid tends to be light on CPU, with most of the time spent in the interrupt loop. More processors don't tend to help, so a Xeon or Opteron would be overkill for this job.
  • Memory - 4 GB of 667 MHz DDR2. The faster the memory, the better. The amount depends on how much content we're trying to accelerate; we want to serve most of it from memory.
  • Disk - a very commodity 7.2k RPM SATA drive with NCQ (native command queuing). Seagate for the win here. Squid can be heavy on the disk in terms of concurrent random reads, and it's best to choose something with SCSI-like command queuing to help prioritize based on where
  • OS - Fedora Core 6, 64-bit. A 32-bit OS is faster for this kind of job, but it limits how much memory we can use. For any machine with >= 4 GB, go for 64-bit.
  • Squid - Squid 2.6

Compiling
Here are our compile-time options, with some explanation.

#it's common for squid to eat all 1024 default file handles, so we raise the limit first.
ulimit -n 8192
#what each option is for (a comment after a trailing backslash breaks the command, so they live up here):
# --enable-async-io   hands disk reads to multiple threads. good for concurrency. only good for linux.
# --enable-icmp       ICMP pinging, used for round-trip-time measurements. optional.
# --enable-snmp       for SNMP monitoring.
# --enable-htcp       for cache peering. only useful if you have more than 1 cache.
# --enable-ssl        some images need to be SSL'ed for the user to have that nice lock icon in the browser.
# --with-large-files  support for large files (logs etc).
# --with-build-environment=POSIX_V6_LP64_OFF64   need to set this or compile will die. you'd think it'd be automagic by now..
# --with-maxfd=8192   again, squid likes to eat file handles like a hungry hungry hippo.
./configure \
--prefix=/usr/local/squid \
--enable-async-io \
--enable-icmp \
--enable-snmp \
--enable-htcp \
--enable-ssl \
--with-large-files \
--with-build-environment=POSIX_V6_LP64_OFF64 \
--with-maxfd=8192
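
Before starting Squid, it's worth checking that the shell's limit really matches what was compiled in. This is only a minimal sketch, assuming a POSIX shell; check_fd_limit is a made-up helper name, not part of Squid.

```shell
# check_fd_limit is a hypothetical helper, not something Squid ships.
# It compares the shell's soft file-descriptor limit against the value
# passed to --with-maxfd (8192 in the configure line above).
check_fd_limit() {
    wanted=${1:-8192}
    current=$(ulimit -n)
    if [ "$current" = "unlimited" ] || [ "$current" -ge "$wanted" ]; then
        echo "ok: fd limit is $current"
    else
        echo "too low: fd limit is $current, want at least $wanted"
        return 1
    fi
}
```

Run check_fd_limit 8192 in the same shell (or init script) that launches Squid. Running squid -v also prints the configure options the binary was built with.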

Configuring

We basically want this machine to sit in front of our webservers, cache, and serve images for them. Here are a few key points of the configuration.

#set port to be 80 for http, with the default domain pix.hotornot.com
http_port 80 accel defaultsite=pix.hotornot.com

#our parent "cache" is our webservers at pix.hotornot.com
cache_peer pix.hotornot.com parent 80 7 no-query originserver

#use up the memory. use up that 4GB! almost...
cache_mem 3500 MB

#set the max size of an object (so you don't cache an abnormally large image)
maximum_object_size 500 KB
maximum_object_size_in_memory 500 KB

#aufs is for async-io, offload disk reads to a thread. big cache is always good.
#50000 is the cache size in MB; 16 and 256 are the number of first- and second-level directories.
cache_dir aufs /hot/tmp/squid_cache 50000 16 256

#don't cache 404's, always go to origin servers
negative_ttl 0 minutes
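
One caveat with the numbers above: cache_mem only sizes the in-memory object cache, and Squid also keeps an in-RAM index of everything in cache_dir. The arithmetic below is a rough sketch, assuming the commonly quoted rule of thumb of about 10 MB of index per GB of disk cache (the real figure varies by build and tends to be higher on 64-bit). estimate_squid_ram_mb is a hypothetical helper.

```shell
# estimate_squid_ram_mb is a hypothetical helper for back-of-envelope sizing.
# ASSUMPTION: roughly 10 MB of index RAM per GB of cache_dir (rule of thumb only).
# Arguments: cache_mem in MB, cache_dir size in MB.
estimate_squid_ram_mb() {
    cache_mem_mb=$1
    cache_dir_mb=$2
    index_mb=$(( cache_dir_mb * 10 / 1024 ))
    echo $(( cache_mem_mb + index_mb ))
}

estimate_squid_ram_mb 3500 50000   # prints 3988
```

With 3500 MB of cache_mem and a 50000 MB cache_dir, that comes to roughly 3988 MB on a 4096 MB box, which leaves very little for the OS. If the machine starts swapping, lowering cache_mem is the first thing to try.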


Optimizations

This is a pretty cool one. Whenever Squid reads a cached file, the filesystem updates the file's access time, which means a write to disk for every read. This becomes quite slow on a busy image server, so switch it off!

Edit your /etc/fstab, find the mount entry for the partition that holds the Squid cache, and change 'defaults' to 'defaults,noatime'.

Or do it on the command line! (This only lasts until the next reboot unless /etc/fstab is changed too.)

> mount -o remount,noatime /your_partition
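
To confirm the option took effect, the mount table can be checked. This is a small sketch, assuming Linux's /proc/mounts format (device, mount point, type, options). has_noatime is a hypothetical helper, and /your_partition is the same placeholder as above.

```shell
# has_noatime is a hypothetical helper. It succeeds (exit 0) if the given
# mount point is listed with the noatime option, and fails (exit 1) otherwise.
# The second argument defaults to /proc/mounts (Linux-specific).
has_noatime() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" '
        $2 == mp { opts = $4 }
        END { if (opts ~ /(^|,)noatime(,|$)/) exit 0; else exit 1 }
    ' "$mounts_file"
}

# Example:
#   has_noatime /your_partition && echo "noatime is on"
```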


A cheap server like the one listed here can push more than 30 Mbit/s of static traffic, no problem.
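
That number only holds if most requests are actually cache hits. Squid tags its responses with an X-Cache header (HIT or MISS), so a quick check is possible from any client. Here is a sketch, assuming curl is installed; cache_status and the image path are made up for illustration.

```shell
# cache_status is a hypothetical helper: it reads HTTP response headers on
# stdin and prints the first word of Squid's X-Cache header (HIT or MISS),
# or UNKNOWN if the header is missing.
cache_status() {
    tr -d '\r' | awk '
        tolower($1) == "x-cache:" { print $2; found = 1 }
        END { if (!found) print "UNKNOWN" }
    '
}

# Example (the image path is made up):
#   curl -sI http://pix.hotornot.com/example.jpg | cache_status
```

The first request for an object will normally be a MISS; repeat it and it should come back as a HIT.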



