Automated ClamAV Virus Scanning

Automating Linux Anti-Virus Using ClamAV and Cron

Thankfully Linux isn’t a platform which has a significant problem with Viruses, however it is always better to be safe than sorry. Luckily ClamAV is an excellent free anti-virus solution for Linux servers. However, at least on RedHat Enterprise 5 (RHEL5) the default install doesn’t offer any automated scanning and alerting. So here is what I’ve done:

The following steps assume you are using RHEL5, but should apply to other Linux distributions as well.

First, you’ll want to install ClamAV:

yum install clamav clamav-db clamd
/etc/init.d/clamd start

On RHEL5 at least this automatically sets up a daily cron job that uses freshclam to update the virus definitions, so that’s good.

Next I recommend removing the test virus files, although you can save this until after you test the rest of the setup:

rm -rf /usr/share/doc/clamav-0.95.3/test/

Now we want to setup our automation. I have a daily cron job that scans the entire server which can take several minutes, and then an hourly cron job that only scans files which were created or modified within the last hour. This should provide rapid notification of any infection without bogging your server down for 5 minutes every hour. The hourly scans run in a couple of seconds.

Each scanning script then checks the scan logs to see if there were any infected files found, and if so immediately sends you a notification e-mail (you could set this address to your mobile phone’s SMS account if you wanted).

The Daily Scan:

emacs /etc/cron.daily/clamscan_daily

Paste in:

#!/bin/bash

# email subject
SUBJECT="VIRUS DETECTED ON `hostname`!!!"
# Email To ?
EMAIL="me@domain.com"
# Log location
LOG=/var/log/clamav/scan.log

check_scan () {

	# Check the last set of results. If there are any "Infected" counts that aren't zero, we have a problem.
	if [ `tail -n 12 ${LOG}  | grep Infected | grep -v 0 | wc -l` != 0 ]
	then
		EMAILMESSAGE=`mktemp /tmp/virus-alert.XXXXX`
		echo "To: ${EMAIL}" >>  ${EMAILMESSAGE}
		echo "From: alert@domain.com" >>  ${EMAILMESSAGE}
		echo "Subject: ${SUBJECT}" >>  ${EMAILMESSAGE}
		echo "Importance: High" >> ${EMAILMESSAGE}
		echo "X-Priority: 1" >> ${EMAILMESSAGE}
		echo "`tail -n 50 ${LOG}`" >> ${EMAILMESSAGE}
		sendmail -t < ${EMAILMESSAGE}
	fi

}

clamscan -r / --exclude-dir=/sys/ --quiet --infected --log=${LOG}

check_scan
chmod +x /etc/cron.daily/clamscan_daily

The Hourly Scan:

emacs /etc/cron.hourly/clamscan_hourly

Paste in:

#!/bin/bash

# email subject
SUBJECT="VIRUS DETECTED ON `hostname`!!!"
# Email To ?
EMAIL="me@domain.com"
# Log location
LOG=/var/log/clamav/scan.log

check_scan () {

	# Check the last set of results. If there are any "Infected" counts that aren't zero, we have a problem.
	if [ `tail -n 12 ${LOG}  | grep Infected | grep -v 0 | wc -l` != 0 ]
	then
		EMAILMESSAGE=`mktemp /tmp/virus-alert.XXXXX`
		echo "To: ${EMAIL}" >>  ${EMAILMESSAGE}
		echo "From: alert@domain.com" >>  ${EMAILMESSAGE}
		echo "Subject: ${SUBJECT}" >>  ${EMAILMESSAGE}
		echo "Importance: High" >> ${EMAILMESSAGE}
		echo "X-Priority: 1" >> ${EMAILMESSAGE}
		echo "`tail -n 50 ${LOG}`" >> ${EMAILMESSAGE}
		sendmail -t < ${EMAILMESSAGE}
	fi

}

find / -not -wholename '/sys/*' -and -not -wholename '/proc/*' -mmin -61 -type f -print0 | xargs -0 -r clamscan --exclude-dir=/proc/ --exclude-dir=/sys/ --quiet --infected --log=${LOG}
check_scan

find / -not -wholename '/sys/*' -and -not -wholename '/proc/*' -cmin -61 -type f -print0 | xargs -0 -r clamscan --exclude-dir=/proc/ --exclude-dir=/sys/ --quiet --infected --log=${LOG}
check_scan
chmod +x /etc/cron.hourly/clamscan_hourly

Protected System

You should now have a well protected system with low impact to system performance and rapid alerting. Anti-Virus is only one piece of protecting a server, but hopefully this makes it easy to implement for everyone.

Server Backups to S3

Backing Up Your Server, Locally and Remotely

After my recent hard drive failure, improving my backup strategy was high on my list. My server has two drives, so what I wanted was a local backup on the second drive for quick recovery in case the primary drive failed, and a remote backup somewhere in case the whole server tanked. It had to be cost effective, and automated.

First I had to pick a remote backup option. I looked at backup options from my hosting provider, and getting a second server, but they were both too expensive. I have about 200 GB to back up, so things add up pretty fast. Finally I settled on Amazon’s S3 service. It’s affordable, it’s outside of my data center, fast (I get about 10 Mbit/sec to and from S3) and should be pretty reliable.

Once I’d decided on using S3, I started looking at software to handle the backup. Initially I tried using backup-manager, however I ran into several issues with it. First, the S3 upload can’t handle a file larger than 2 GB, so you can’t use the tar archiver. You have to use the dar archiver. Which doesn’t handle incrementals. It also seemed a bit flakey overall. The S3 uploads would fail without any useful error messages, the dar archiving burned a ton of CPU, etc… So I gave up on it.

I ended up rolling my own. I have a bash script that backs everything up to the secondary hard drive using rsync, and then backs that up to S3 using the s3cmd utility. You can’t backup to S3 directly if the directory you’re backing up contains files that will disappear during the S3 sync. For instance, if you’re backing up your user’s home directory, the s3cmd sync will first catalog all the files it needs to backup, and then will sync them up to Amazon. If some files don’t exist anymore (such as e-mail messages that have been deleted) the s3cmd sync fails with an error. Same goes for log files, spool files, blog and gallery cache files, etc… So even if you don’t have a secondary drive, you need to rsync your files over to a backup area, before you sync them up to S3.

I also wanted a bit of a safety net, in case I accidentally delete a file. So I have the rsync only sync over deletes every other Sunday, and on alternate Sundays I have the S3 sync propagate deletes up to S3. That means that any file I delete exists somewhere for 1-2 weeks.

Click below to see how the backups work and get the bash script I’m using.
Continue reading

ATG SEO – Tools and Traffic

SEO Tools

Other than using lynx to test your site, there are several other tools I would recommend utilizing in your quest for better SEO. One great tool is the SEO Analysis Tool from SEOWorkers.com. This tool quickly analyzes the page, and provides excellent reports on the size and relevancy of your meta tags. It also provides data on the frequency of keywords, singly and in groups of two and three. Here is a part of the a report for sparkred.com:

Google Webmaster Tools for SparkRed.com

I’ve already mentioned utilizing Google Analytics to monitor what keywords are bringing you good traffic versus bad. I’d also recommend another Google tool, Google Webmaster Tools. Google provides a TON of useful data here. You can see things like the crawl frequency, page response times during the crawl, and page rank of your pages (including a by month breakdown). You can also use quick links to see related pages, pages which link in to your site, all the indexed pages of your site, and more. It also gives you diagnostics into any problems with the crawl or your meta tags.

How to Promote Your Site

In order to drive traffic to your site and to increase your ranking in search engines, such as your Google Page Rank, you want high quality links pointing to your site. By high quality links I don’t mean spam links, link farms, or the like. I mean links on relevant sites pointing your site, driving high quality traffic.

There are two halves to this. The first is to provide really good quality content on your site. This means having a great site, it means updating your site frequently, and it means providing non-core but helpful related content. An example of that might be having a blog that provides related content. For instance for MyShoeStore.com you might have a blog with posts about the latest fashions and how to match shoes to each look, the best sales you’re currently running, the 10 most extreme shoes of all time, photos of the shoes worn by celebrities that week, etc… You want something that will have people returning to visit your site more frequently, and something that will make other people link to your site in their blogs, tweets, Facebook pages, IMs to friends, and so forth. If you have a blog post about the shoes on the red carpet at the People’s Choice awards, people will link to that from tons of forums discussing fashion. Those a great links! Relevant high quality links bringing in high quality traffic. And you just need a blogger. A college fashionista who wants to get into fashion journalism will do it for cheap.

The second is to promote your site on other sites. Find forums and communities that are filled with the people you want to attract to your site, signup, and contribute to the community conversation. Note I didn’t say spam the community. Post useful stuff. Even stuff that links to sites that aren’t yours. Be a real contributing helpful member of the site. In your site profile, have a link to your site, and when it’s relevant feel free to link to your site, your latest sale, whatever, as long as it’s relevant. Obviously if you’re an ATG developer or a VP of IT, it’s probably not you doing all this, but have someone do it.

Those two things are how you get traffic and a high page ranking in the search engines. It’s all about having good content and being an upstanding citizen of the relevant online communities.

Setting Up SPF, SenderId, Domain Keys, and DKIM

If you run a mail server, and if you hate spam, you should setup your mail server to make use of all the best anti-spam tools available. There are two sides to spam, sending and receiving.

On the receiving side, you have things like blacklists, spamassassin, bayesian filtering, and lots more. I’ll probably cover this side of things in greater depth in another post.

On the sending side, first and foremost, you have to ensure your server is not acting as an open relay, and allowing spam to be sent through it. After that’s done, you want to be sure that e-mail you send isn’t flagged as spam by people receiving it. And, being a good e-mail citizen, you you want to support the anti-spam standards that are out there.

There are four primary standards for verifying senders and servers.

Sender Policy Framework (SPF) – from their FAQ:

Sender Policy Framework (SPF) is an attempt to control forged e-mail. SPF is not directly about stopping spam – junk email. It is about giving domain owners a way to say which mail sources are legitimate for their domain and which ones aren’t. While not all spam is forged, virtually all forgeries are spam. SPF is not anti-spam in the same way that flour is not food: it is part of the solution.

SenderId – a Microsoft technology which is very similar to SPF:

The Sender ID framework, developed jointly by Microsoft and industry partners, addresses a key part of the spam problem: the difficulty of verifying a sender’s identity.

DomainKeys – from Wikipedia:

DomainKeys is an e-mail authentication system designed to verify the DNS domain of an e-mail sender and the message integrity.

DKIM – an evolved form of DomainKeys, from Wikipedia:

DKIM uses public-key cryptography to allow the sender to electronically sign legitimate emails in a way that can be verified by recipients. Prominent email service providers implementing DKIM (or its slightly different predecessor, DomainKeys) include Yahoo and Gmail. Any mail from these domains should carry a DKIM signature, and if the recipient knows this, they can discard mail that hasn’t been signed, or that has an invalid signature.

SenderId is primarily used by Microsoft mail services like Hotmail/MSN, while DomainKeys and DKIM are primarily used by Yahoo. SPF is used by many mail services.

I’m going to walk you through setting up these anti-spam technologies. I will be setting them up for my domain, digitalsanctuary.com, and using my mail server, which is postfix running on Debian. Your setup and requirements may vary.
Continue reading