Many people find that the simple web server they built to host one or two web sites becomes increasingly unmanageable as they add more sites. Insufficient planning often makes life much harder down the road, since it is very difficult, if not impossible, to implement large-scale changes on web servers that are already in production. You should therefore make sure that your web servers will scale, not just in terms of system performance, but also so that the time you spend administering the machine does not increase exponentially as 20 hosted sites turn into 200.
In practical terms, this means spending the time to develop tools that automate as many tasks as possible, and coming up with a layout that is both easy to maintain and flexible. I've come up with a few methods that may be helpful the next time you need to roll out an Apache-based web server. Over the past few years they have certainly saved me a substantial amount of time: whatever I spent up front on development has been repaid dozens of times over in reduced administration.
Since this document draws specifically on my own experiences, I'm going to assume you will be using the same software, listed below. That's not to say these ideas aren't applicable in other cases, just that you may need to do your own research into how to apply them. This brings up another point: this document is not a step-by-step walkthrough that will hold your hand the whole way. I am assuming that you are already reasonably proficient with common systems administration tasks and are just looking for new ideas to make your life easier.
- FreeBSD 4.x or 5.x
- Apache 1.3.x
- ProFTPd 1.2.x
- Cronolog
- Webalizer
File System Layout
Because I believe in adherence to a strict and coherent file system layout, I tend to follow FreeBSD's hier(7) manpage. As such, installing all of the above programs from FreeBSD's ports collection
works out perfectly since all of the programs will be installed in the correct location.
However, we should also consider how we'll handle the directories in which users' web sites will live. Since our ftpd is going to chroot users into their home directory, we don't necessarily want them chrooted right into their document root, since we can do some neat things if we take them back one level. Regardless of the actual base path (/usr/local/www, /var/www, /home, etc.), we should create some subdirectories. For example's sake, let's assume /usr/local/www is being used. Our layout could be:
/usr/local/www/www.domain.com/
/usr/local/www/www.domain.com/html/
/usr/local/www/www.domain.com/local-cgi/
/usr/local/www/www.domain.com/logs/
/usr/local/www/www.domain.com/stats/
The layout should now be obvious: html/ is the document root, local-cgi/ is the per-user cgi-bin, logs/ holds the site's Apache logs (so users can download them themselves, should they be so inclined), and stats/ holds the Webalizer stats (which will be aliased in Apache to www.domain.com/stats). By chrooting users into /usr/local/www/www.domain.com/, they can FTP in and upload their content while still having access to the other directories, and those directories stay out of the document root.
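The layout above could be created with something like the following. This is only a minimal sketch for a POSIX shell; make_site_dirs is a hypothetical helper name, and ownership and permissions are deliberately left out.

```shell
#!/bin/sh
# make_site_dirs BASE DOMAIN
# Creates BASE/DOMAIN plus the html, local-cgi, logs and stats subdirectories.
# (Hypothetical helper; the name and arguments are illustrative.)
make_site_dirs() {
    base=$1
    domain=$2
    for sub in html local-cgi logs stats; do
        mkdir -p "$base/$domain/$sub" || return 1
    done
}

# Example (as root): make_site_dirs /usr/local/www www.domain.com
```

Because mkdir -p is used, running it again for an existing site is harmless.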
Apache Configuration
One of the most common problems I see when people try to run hundreds of virtual hosts with Apache is that they keep all of their configuration in httpd.conf. Not only does this lead to a bloated config file, it also makes tracking down errors and making changes much harder than it needs to be. Apache has an Include directive, so by all means use it!
In my httpd.conf, I will define any global options I need, as well as the default http and https web sites. At the very bottom, I will specify a line:
Include /usr/local/etc/apache/virtual.conf
virtual.conf is a file which contains many lines, one for each hosted site, looking like so:
Include /usr/local/etc/apache/virtual/www.somedomain.com.conf
Include /usr/local/etc/apache/virtual/www.otherdomain.com.conf
Include /usr/local/etc/apache/virtual/subdomain.something.com.conf
Each of those included config files contains the virtual host declarations for the site mentioned in the filename. This lets me easily make custom changes on a per domain basis.
This also lets you do some neat tricks in terms of suspending sites. If the owner of www.somedomain.com hasn't paid their bill on time, just go into virtual.conf, comment out the site's line, and give Apache a HUP; the site is no longer accessible. You can make this even better by making your default virtual host one that simply has an index.html file stating that the requested site either does not exist or is suspended. If you want to be really fancy, make it a PHP script that gets the requested URL and prints it out in the page as well.
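Suspending a site this way is easy to script. The following is a rough sketch, not the exact procedure I use: suspend_site is a hypothetical function name, and it assumes each line of virtual.conf has the form "Include /path/to/virtual/DOMAIN.conf".

```shell
#!/bin/sh
# suspend_site CONF DOMAIN
# Comments out the Include line for DOMAIN in CONF (e.g. virtual.conf).
# The path is compared literally, so dots in the domain are not wildcards,
# and lines that are already commented out are left alone.
suspend_site() {
    conf=$1
    domain=$2
    new="$conf.new.$$"
    awk -v want="/$domain.conf" '
        $1 == "Include" && substr($2, length($2) - length(want) + 1) == want {
            print "#" $0
            next
        }
        { print }
    ' "$conf" > "$new" && mv "$new" "$conf"
}

# Example: suspend_site /usr/local/etc/apache/virtual.conf www.somedomain.com
# Apache still needs its HUP (or "apachectl graceful") afterwards.
```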
Some things you may want to include in each domain's configuration file:
- Assuming you globally defined /cgi-bin in your httpd.conf as a ScriptAlias to the system cgi-bin directory, define another ScriptAlias, /local-cgi, that maps to the local-cgi directory in the user's home directory. This gives you a common pool of CGI scripts which you offer and support, while letting users have their own personal CGI files which are not accessible by other users.
- Alias /stats to the stats directory in the user's home directory. We do this so that a careless user doesn't delete or overwrite their stats directory, which would prevent the next Webalizer run from completing.
- Add additional ServerAlias lines if you are using name-based virtual hosts and multiple hostnames point at a single web site.
- Enable additional features, such as mod_speling, to get case insensitivity when you are migrating a web site from a Windows server to a UNIX one.
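To make the list above concrete, here is a sketch of a shell function that prints a per-domain config file. It is an illustration rather than my actual template: print_vhost_conf is a hypothetical name, the access_log.%Y%m%d file name is an assumption, and the "combined" log format must already be defined in your httpd.conf.

```shell
#!/bin/sh
# print_vhost_conf ADDRESS DOMAIN [BASE]
# Prints an Apache virtual host block for DOMAIN to stdout.
# BASE defaults to /usr/local/www, matching the layout used in this document.
print_vhost_conf() {
    addr=$1
    domain=$2
    base=${3:-/usr/local/www}
    site="$base/$domain"
    cat <<EOF
<VirtualHost $addr>
    ServerName $domain
    DocumentRoot $site/html
    ScriptAlias /local-cgi/ $site/local-cgi/
    Alias /stats/ $site/stats/
    CustomLog "|/usr/local/sbin/cronolog $site/logs/access_log.%Y%m%d" combined
</VirtualHost>
EOF
}

# Example:
#   print_vhost_conf 10.11.12.13 www.widgets.com \
#       > /usr/local/etc/apache/virtual/www.widgets.com.conf
```

ServerAlias lines, or "CheckSpelling on" for mod_speling, could be added to the template in the same way for the sites that need them.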
ProFTPd Configuration
Because ProFTPd is fairly difficult to configure correctly, mostly due to the state of the documentation, I'll give you a bit of help. Here is a sample proftpd.conf file similar to the one I use. It should be reasonably obvious what it does when you view it, but I do a few things slightly differently that you may need to be aware of:
- I always create a user and group called ftp for ProFTPd to run as, rather than running it as nobody/nogroup. You can do the same, or change the config file to suit your own dummy users.
- I disable reverse DNS and ident lookups since I neither need nor care about either, and they just slow down connections.
- I have fairly strict access rules, allowing only 15 simultaneous connections, with a maximum of 2 connections per username. Obviously, increase those if your load requires it.
- The chroot target is set to ~, in other words the user's home directory as listed in the passwd file.
- I disable the chmod command for everyone except users in the staff group, because I do not want users changing permissions on files and directories and breaking their web site (say, by removing world read permissions on their document root).
- There is a section where you can reduce the privileges of certain users, for example denying some users the ability to delete files. This can be useful for creating upload-only accounts.
- There is a special section at the end where I give an example of per-directory overrides, in this case changing the default umask on a directory.
Log Rotation
Rotating Apache logs can be problematic on servers with many virtual hosts. Because you need to send Apache a HUP after you rotate a log out, you either end up sending many HUPs over a short period of time (rotating each log file individually and HUPing after each one), or you rotate all of the files en masse (which can take quite a while if there are large logs from active sites) and send one HUP at the end.
A much better solution I have found is to use a program called Cronolog (/usr/ports/sysutils/cronolog/ on FreeBSD). Because Apache allows log file declarations to be a pipe, you can pipe your logging
to Cronolog and it handles the log rotation for you automatically. You simply specify the filename, complete with a pattern, and when the date changes, it will begin writing to the new file, no
rotation necessary.
For example, in your main httpd.conf, you could put the following directive:
ErrorLog "|/usr/local/sbin/cronolog /var/log/httpd/error_log.%d"
This logs all error messages to a file in /var/log/httpd whose name varies with the day of the month. For example, on January 21st the file will be /var/log/httpd/error_log.21. The benefit of doing it that way is that, at most, you will only ever have 31 error log files and you won't need to prune out old ones since the next month, the log will be overwritten when it comes to that day again. Cronolog supports all manner of other variants, so you could even do:
ErrorLog "|/usr/local/sbin/cronolog /var/log/httpd/%Y/%m/%d/error_log"
This produces log files of the form /var/log/httpd/2003/01/21/error_log, allowing you to keep log files effectively forever (or until you run out of disk space).
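Cronolog's template specifiers are the same ones date(1) and strftime(3) use, so one quick way to check where a template will write today is to hand it to date. A small sketch (preview_log_name is a hypothetical helper name):

```shell
#!/bin/sh
# preview_log_name TEMPLATE
# Prints the file name a strftime-style template expands to right now.
preview_log_name() {
    date "+$1"
}

# Examples:
#   preview_log_name '/var/log/httpd/error_log.%d'
#   preview_log_name '/var/log/httpd/%Y/%m/%d/error_log'
```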
There is a patch available for ProFTPd that allows log file targets to be pipes, should you be so inclined, but I find it just as easy to use FreeBSD's newsyslog to do the rotation and have it HUP proftpd after it rotates out the common ftpd log.
Webalizer
Webalizer is relatively easy to configure, so my only recommendation is as to how you should run it. The one feature you'll want to enable, though, is the incremental feature. See the Webalizer docs
for more info. Aside from that, set the output directory to /stats in the user's home directory.
As usual, create an individual config file for each domain you're hosting, but also, create a shell script to run webalizer for that domain. So you might have:
/usr/local/etc/webalizer/conf/www.domain.com.conf
/usr/local/etc/webalizer/scripts/www.domain.com.sh
The script just needs to call webalizer correctly, most likely by piping yesterday's access_log for the site to webalizer with a flag telling it to use the correct config file.
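As a sketch of what such a per-domain script might run, the function below prints the webalizer command line for a site. Several things here are assumptions: the function names are hypothetical, the access log is assumed to be named access_log.YYYYMMDD under the site's logs directory, the log is passed as a file argument rather than piped, and the per-domain config file is assumed to set "Incremental yes" and an OutputDir pointing at the site's stats directory.

```shell
#!/bin/sh
# yesterday: print yesterday's date as YYYYMMDD.
# Tries the FreeBSD date syntax first, then falls back to the GNU one.
yesterday() {
    date -v-1d +%Y%m%d 2>/dev/null || date -d yesterday +%Y%m%d
}

# webalizer_cmd DOMAIN [YYYYMMDD]
# Prints the command a per-domain script would run; -c selects the config file.
webalizer_cmd() {
    domain=$1
    day=${2:-$(yesterday)}
    echo "/usr/local/bin/webalizer -c /usr/local/etc/webalizer/conf/$domain.conf /usr/local/www/$domain/logs/access_log.$day"
}

# Example: run yesterday's stats for one site
#   eval "$(webalizer_cmd www.domain.com)"
```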
Once you've got that much, all you need is a wrapper to run all of said shell scripts, which could be something as simple as:
find /usr/local/etc/webalizer/scripts -type f -name "*.sh" | sh
And then just cron that to run shortly after midnight.
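If you would rather know when one site's run fails, a slightly longer wrapper can report it. This is only a sketch; run_all is a hypothetical name and the cron line in the comment is an example, not a required schedule.

```shell
#!/bin/sh
# run_all DIR
# Runs every *.sh script in DIR with sh, reports failures on stderr,
# and returns non-zero if any script failed.
run_all() {
    dir=$1
    failed=0
    for script in "$dir"/*.sh; do
        [ -f "$script" ] || continue    # the glob matched nothing
        sh "$script" || { echo "failed: $script" >&2; failed=1; }
    done
    return $failed
}

# Example /etc/crontab entry, assuming this is saved as a script that calls
# run_all /usr/local/etc/webalizer/scripts (the script path is hypothetical):
#   5 0 * * * root /usr/local/sbin/run-webalizer.sh
```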
Automated Script
Assuming you've done all of the above, it now becomes totally trivial to automate adding new web sites. A simple script written in your language of choice (shell, Perl, PHP, etc.) can handle all of the steps and could even be interactive and prompt for the required info. I'll leave this as an exercise for the reader, but your basic procedure should be:
- Get as input the username, password, domain, and if it is IP based rather than name based virtual hosting, IP address. Print it out and prompt before you proceed so you can verify you put everything
in correctly.
- Perform forward and (if not name based) reverse DNS lookups to make sure that's set up first. Make it possible to override this step in case you can't update the DNS yet, for whatever reason.
- Create the user. The best way to do this on FreeBSD is with the "pw useradd" program, since you can put all of the relevant fields on the command line, and pipe in the password.
- Create the base directory structure in whatever your chosen location is, including all subdirectories. Then, set the ownership to the user, and set appropriate permissions.
- Create the Apache .conf file for the domain in /usr/local/etc/apache/virtual
- Append the Include statement to /usr/local/etc/apache/virtual.conf
- Do any other tasks you might need to (creating a default index.html file in the user's document root, creating a .htaccess file to protect their /stats directory, etc)
- Restart Apache to reload the configs. "apachectl graceful" is probably the best way to do this.
If you do this well, you should be able to add and activate a web site in a single command. For example, using my script, I could set up www.widgets.com (whose IP is 10.11.12.13) on my server like so:
addweb widgets widG3yp4ss 10.11.12.13 www.widgets.com
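For readers who want a starting point, here is a skeleton of how such a script could look. It is not my actual addweb script: BASE, APACHE_ETC and DRY_RUN are illustrative settings, the DNS checks and the confirmation prompt are omitted, and the generated virtual host is a bare minimum. With DRY_RUN=1 (the default in this sketch) it only prints what it would do.

```shell
#!/bin/sh
# Sketch of an addweb-style helper: addweb USER PASSWORD IP DOMAIN
BASE=${BASE:-/usr/local/www}
APACHE_ETC=${APACHE_ETC:-/usr/local/etc/apache}
DRY_RUN=${DRY_RUN:-1}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

addweb() {
    if [ $# -ne 4 ]; then
        echo "usage: addweb user password ip domain" >&2
        return 2
    fi
    user=$1 password=$2 ip=$3 domain=$4
    site="$BASE/$domain"
    conf="$APACHE_ETC/virtual/$domain.conf"
    include="Include $conf"

    echo "Adding $domain ($ip) for user $user"

    # Create the account; "pw useradd ... -h 0" reads the password from stdin.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ pw useradd $user -d $site -h 0"
    else
        echo "$password" | pw useradd "$user" -d "$site" -h 0 || return 1
    fi

    # Directory layout and ownership.
    run mkdir -p "$site/html" "$site/local-cgi" "$site/logs" "$site/stats" || return 1
    run chown -R "$user" "$site" || return 1

    # Per-domain config file, plus its Include line (added only once).
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ write $conf"
        echo "+ append to $APACHE_ETC/virtual.conf: $include"
    else
        printf '<VirtualHost %s>\n    ServerName %s\n    DocumentRoot %s/html\n</VirtualHost>\n' \
            "$ip" "$domain" "$site" > "$conf" || return 1
        grep -qxF "$include" "$APACHE_ETC/virtual.conf" 2>/dev/null ||
            echo "$include" >> "$APACHE_ETC/virtual.conf" || return 1
    fi

    run apachectl graceful
}

# Example (dry run): addweb widgets widG3yp4ss 10.11.12.13 www.widgets.com
```

Doing a dry run first is a cheap stand-in for the "print it out and prompt" step; set DRY_RUN=0 once the output looks right.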
Cleaning Up
You may be wondering why all of the config files and such I specified always used the fully qualified domain name. There is a very good reason for this. If you need to remove a web site, you should be
able to do so relatively safely and easily, and possibly even scripted. Since the filenames will be consistent, you can use find to your advantage to make things much easier.
Your procedure might be something like:
- Remove the Include entry from the /usr/local/etc/apache/virtual.conf
- Restart Apache
- Delete all of the config files relating to this domain. Assuming you followed my suggestions above, everything will be in /usr/local/etc. Thus, we can do:
find /usr/local/etc -type f -name "*www.domain.com*" -delete
- Remove the user ("pw userdel" will come in handy)
- Remove their home directory and all files
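The config-file part of this procedure can be sketched as follows. Note one deliberate difference from the find command above: a pattern like "*www.domain.com*" would also match a site such as www.domain.com.au, so this sketch matches the exact file names instead. remove_site_configs is a hypothetical name, and the .conf and .sh suffixes are the ones used in this document.

```shell
#!/bin/sh
# remove_site_configs ETC DOMAIN
# Drops DOMAIN's Include line (active or commented out) from
# ETC/apache/virtual.conf, then deletes DOMAIN.conf and DOMAIN.sh under ETC.
remove_site_configs() {
    etc=$1
    domain=$2
    vconf="$etc/apache/virtual.conf"
    awk -v want="/$domain.conf" '
        ($1 == "Include" || $1 == "#Include") &&
            substr($2, length($2) - length(want) + 1) == want { next }
        { print }
    ' "$vconf" > "$vconf.new" && mv "$vconf.new" "$vconf" || return 1
    find "$etc" -type f \( -name "$domain.conf" -o -name "$domain.sh" \) -exec rm -f {} \;
}

# Example: remove_site_configs /usr/local/etc www.domain.com
# Follow with "apachectl graceful", then remove the user and home directory
# (on FreeBSD, "pw userdel NAME -r" does both).
```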
Other Tricks
With everything nicely separated, reporting becomes easy, too. Let's say we want to find out how much disk space each site is taking up (assuming sites are laid out like /usr/local/www/www.domain.com):
find /usr/local/www -maxdepth 1 -mindepth 1 -type d | xargs du -sk | sort -r -n
We can get an approximate idea of bandwidth usage, too, by mining the Webalizer stats in a similar fashion (note: this is an ugly hack of a script that I've never bothered fixing, it could probably be
simplified), assuming that our webalizer directory works out to be /usr/local/www/www.domain.com/stats:
/usr/bin/find /usr/local/www -mindepth 3 -maxdepth 3 -name webalizer.hist |
  /usr/bin/awk -F/ '{print "/bin/echo " $5 "\n/usr/bin/grep -w \"^SWAPDATE\" " $0 " | /usr/bin/cut -d \"\ \" -f 6\n/bin/echo -n ^"}' |
  /usr/bin/sed s/SWAPDATE/`/usr/local/bin/gdate -d "1 month ago" +%m`/g |
  /bin/sh |
  /usr/bin/tr '\n' ',' |
  /usr/bin/tr '^' '\n' |
  /usr/bin/sort -t , -k 2 -n -r |
  /usr/bin/awk -F, '{ printf("%40s \tKByte Usage: %s\n",$1,$2)}'
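For what it's worth, the same report can probably be written more simply. The sketch below makes the same assumptions the pipeline above does about webalizer.hist (the month number is the first field and the KByte total is the sixth), takes the month as an argument instead of computing it with gdate, and like the original ignores the year. bandwidth_report is a hypothetical name.

```shell
#!/bin/sh
# bandwidth_report BASE MONTH
# For every BASE/<site>/stats/webalizer.hist, prints the site name and the
# sixth field of the line whose first field is MONTH, largest first.
bandwidth_report() {
    base=$1
    month=$2
    for hist in "$base"/*/stats/webalizer.hist; do
        [ -f "$hist" ] || continue    # no history files found
        domain=$(basename "$(dirname "$(dirname "$hist")")")
        # Compare numerically so that "01" and "1" both match January.
        awk -v m="$month" -v d="$domain" '$1 + 0 == m + 0 { print d "," $6 }' "$hist"
    done |
        sort -t , -k 2 -n -r |
        awk -F, '{ printf("%40s \tKByte Usage: %s\n", $1, $2) }'
}

# Example: bandwidth_report /usr/local/www 01
```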