
Technical Blog

This blog will contain content related to Java, Seam, Security, my sites and projects, as well as other technical subjects I am interested in.

Comments and questions are welcome!

DDOS Against 10MinuteMail

February 21st, 2010

You may have noticed that 10MinuteMail was unavailable for a few minutes at a time over the last couple of days. The site recently came under a DDOS attack which locked it up a few times. Most of the malicious traffic came from the Netherlands, Germany, and to a lesser extent other European countries and the USA. Initially I dealt with it by generating a list of the malicious IPs and adding them to my block list. However, the DDOS kept spreading (botnet?), so I finally did what I should have done ages ago and tuned my CSF/IPTables firewall to block DDOS patterns. So far so good :)
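For the "generate a list of the malicious IPs" step, here is a minimal sketch of one way to do it. It is not the script used on 10MinuteMail. It assumes the client IP is the first whitespace-separated field of each access-log line (as in the common Apache layout), and the function name `noisy_ips`, the threshold, and the sample lines are all made up for illustration.

```python
from collections import Counter


def noisy_ips(log_lines, threshold):
    """Return IPs that appear more than `threshold` times, busiest first."""
    # Assumption: the client IP is the first field on each log line.
    counts = Counter(line.split()[0] for line in log_lines if line.strip())
    return [ip for ip, hits in counts.most_common() if hits > threshold]


# Hypothetical sample data using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - "GET / HTTP/1.1" 200',
    '203.0.113.7 - - "GET / HTTP/1.1" 200',
    '203.0.113.7 - - "GET / HTTP/1.1" 200',
    '198.51.100.4 - - "GET /faq HTTP/1.1" 200',
]
print(noisy_ips(sample, threshold=2))
```

A static list like this only helps while the attacking IPs stay the same, which is why rate-based firewall rules end up being the better fix once the attack spreads.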

I have NO IDEA why anyone would be attacking 10MinuteMail. It’s very odd.

Making Search Keywords Easy

February 21st, 2010

I was recently contacted by SortFix, who introduced their offering to me and thought I might be interested in writing a blog post about it. (Full disclosure: I have received nothing from SortFix other than their e-mail request.)

SortFix is basically a value added search provider who wraps Google search results. Their approach is to analyze other keywords which appear frequently in your search results. You can then drag these other high frequency keywords to either the “Add to search” box, or the “Remove” box. There’s also a “Dictionary” box which defines keywords you may not know.

For example, if you search for “RS6”, you’ll get “Power words” like “v10”, “performance”, “audi”, “carlos”, “2010”, “2003”, “juan”, etc… By adding or removing those keywords, you can tune your search for either the 2003 edition of the RS6, or the new 2010 one, or you can check out the ride of the king of Spain, Juan Carlos.
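The general idea of pulling frequent extra keywords out of a result set can be sketched in a few lines. This is only an illustration of the concept and says nothing about how SortFix actually does it. The function name `power_words` and the sample snippets are hypothetical, and a real version would at least filter out stop words.

```python
import re
from collections import Counter


def power_words(query, snippets, top_n=5):
    """Return the words that show up in the most result snippets."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for snippet in snippets:
        # Count each word once per snippet so one long result cannot dominate.
        words = set(re.findall(r"[a-z0-9]+", snippet.lower()))
        # Skip the words the user already searched for.
        counts.update(words - query_terms)
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical result snippets for the "RS6" search.
snippets = [
    "Audi RS6 V10 performance review",
    "2010 Audi RS6 V10",
    "Audi RS6 2003 edition",
]
print(power_words("RS6", snippets, top_n=2))
```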

I can see this being useful for people who don’t have super good Google-Fu. I don’t see myself using it, but I can see it being useful for many other folks. Another point against it is that currently it’s a Flash based interface, and generally I avoid Flash as much as possible. Apparently they are working on a non-Flash version, which would be a nice improvement IMHO.

I really like the idea of offering up additional high frequency keywords to help people refine their searches. I can see this being very useful for onsite eCommerce searching, helping narrow down products based on common attributes, etc…

Make Google Ignore JSESSIONID

February 16th, 2010

Search engines like Google will often index content with params like JSESSIONID and other session or conversation scope params. This causes two problems. First, the links returned in the Google search results can have these parameters in them, resulting in “session not found” or other incompatible session state issues. Second, it can cause a single page of content to be indexed multiple times (with differing parameters), thus diluting your page’s rank.

I’ve posted two solutions to this issue in the past: Using Apache to ReWrite URLs to remove JSESSIONID and a more advanced solution of using a Servlet Filter to avoid adding JSESSIONID for GoogleBot Requests.
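To make the duplicate-indexing problem concrete, here is a minimal sketch of the "remove JSESSIONID from the URL" idea. It is not the Apache rule or the Servlet Filter from those earlier posts, just an illustration. It assumes the session ID appears as a `;jsessionid=...` path parameter, which is how servlet containers typically encode it when rewriting URLs. The function name and example URLs are made up.

```python
import re

# Matches the ";jsessionid=..." path parameter up to the next "?", "#" or ";".
_JSESSIONID = re.compile(r";jsessionid=[^?#;]*", re.IGNORECASE)


def strip_jsessionid(url):
    """Return the URL with any ;jsessionid=... path parameter removed."""
    return _JSESSIONID.sub("", url)


# Two crawls of the same page with different sessions (hypothetical URLs).
crawled = [
    "http://example.com/page.seam;jsessionid=ABC123?productId=5",
    "http://example.com/page.seam;jsessionid=XYZ789?productId=5",
]
unique_pages = {strip_jsessionid(url) for url in crawled}
print(unique_pages)
```

Without the stripping step a crawler sees two distinct URLs. With it, both collapse to the same page and parameters that matter, like productId, are left alone.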

Now there’s an even better way to handle this. Google has added an amazing new feature to their Webmaster Tools which allows you to specify how the GoogleBot indexer should handle various parameters. You can ignore certain parameters such as JSESSIONID, cid, and others, and also specifically not ignore other parameters such as productId, skuId, etc…

Log into your Google Webmaster Tools, and select the site you wish to work with. Under “Site Configuration” -> “Settings” there is a new section at the bottom called “Parameter handling”. Click on “adjust parameter settings” to expand the parameter handling configuration for your site. Sometimes Google will suggest various parameters it has discovered while crawling your site, and other times you just enter the parameters you want Google to ignore or pay attention to.

Google Webmaster Tools Parameter Handling Interface

This is a much more elegant solution to the JSESSIONID problem, and also allows you to easily handle other parameters your site may use for either session state or dynamic content generation correctly. The only downside is that this only impacts Google, whereas with the correct configuration my older two solutions can handle any Search Engine Bot. Maybe other search providers will or do provide a similar feature.

I’m 1337

February 10th, 2010

I can never post on HN again… :)

Why ATG’s Core Based Licensing is Stupid

February 2nd, 2010

ATG, like most enterprise software companies, started by licensing their product based on how many CPUs you ran it on. Back in 1999 this was a pretty fair way to do things. It meant that big companies running a very high traffic site on a big Sun E4500 or E10k paid a lot more than a smaller company running on a pair of E450s. They handled more traffic and hence ideally made more money off of the site, and therefore paid more. Overall it was a decently fair model, and very easy to enforce in the software. Upon startup the software checks how many CPUs the server has, compares that against the license file, and only starts if the licensed count matches or exceeds the CPU count. Makes sense, right? Most folks were doing the same thing at the time.
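The startup check described above boils down to a single comparison. Here is a minimal sketch of that logic. It is not ATG's actual code, and the function name and the licensed count are assumptions for illustration. Note that Python's `os.cpu_count()` reports logical CPUs as seen by the operating system.

```python
import os


def license_allows_startup(detected_cpus, licensed_cpus):
    """Start only if the license covers at least the detected CPU count."""
    return licensed_cpus >= detected_cpus


# os.cpu_count() can return None, so fall back to 1 in that case.
detected = os.cpu_count() or 1
licensed = 2  # hypothetical value that would come from the license file

if license_allows_startup(detected, licensed):
    print(f"Starting: licensed for {licensed} CPUs, detected {detected}")
else:
    print(f"Refusing to start: licensed for {licensed} CPUs, detected {detected}")
```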

One aspect of this system is that year over year, as processors got faster and faster (from 250 MHz to 480 MHz for example), you got more power for the same licensing cost. In general this was partially or fully offset by the increasing complexity of the software you were running, but worst case scenario it kept you from LOSING request handling ability over time, and best case scenario you were able to increase your traffic handling ability a bit, as Moore’s law drove clock speeds up.

On the chart above you see the green line which is the number of transistors on a CPU growing just like you’d expect from Moore’s law. This line can be thought to translate roughly to performance, and has been on this trend for over 30 years.

However, the blue line, which is clock speed (MHz or GHz) does something very odd around 2003. It flattens out. What happened?

Processor design changed. Going faster and faster (and smaller and smaller) couldn’t keep happening, due to physical and quantum limitations in our current chip technology. So instead, companies like Intel and AMD began designing and building CPUs that had multiple “cores”. They went wider instead of faster. Basically each core is like a mini-CPU, which means that if your software supports it, you can get more work done per second without needing a faster clock speed.

Instead of a CPU with a single 3.4 GHz core, we have a CPU with four 2.66 GHz cores. Now keep in mind that the actual useful performance of the CPU kept climbing as it has for the last 30 years. It’s just that instead of faster clock speeds, we moved into multiple cores.

The problem is that software reports each “core” as a CPU. That means a server with a single quad-core CPU appears to be a 4 CPU server. That means CPU/core based licensing now costs FOUR TIMES AS MUCH as it did last year for the same request handling ability. We’re not talking hundreds of dollars here. We’re talking hundreds of thousands or millions of dollars for each customer.

To make matters worse, the latest generation of CPUs, Intel’s Nehalems, use something called HyperThreading to make each core do more work. The upside is this generation of chips performs better than the old ones, as they should. The downside is they now report twice as many CPUs as they have actual cores, due to the HyperThreading. A quad core single CPU now reports as 8 CPUs. You can disable HyperThreading, but that actually introduces a 20-40% performance penalty (depending on how you benchmark it, etc…), so in most cases you’re actually getting LESS performance than you did from last year’s chips. At that point you can either cough up another $500,000 in license costs, or have your brand new server be slower than your old one. Great options.

Fortunately most companies saw this issue when it first reared its ugly head back in 2003, and have moved to socket based licensing. They are basically licensing the same way they always have, just redefining the CPU as the “socket” or the physical spot on the motherboard the CPU plugs into, and getting away from things like cores, hyperthreading, and that whole mess. Customers of companies which made that change (such as Oracle, JBoss, and many others) essentially end up paying the same as they always have, and everyone goes home happy. The licensing cost/performance curve for those folks has stayed pretty stable over the past 10-15 years.
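The difference between the two models is easy to see as arithmetic. Here is a minimal sketch that counts license units both ways. The function names are made up, and it counts units only, since actual per-unit prices aren't something I have.

```python
def core_based_units(servers, sockets, cores_per_socket, threads_per_core=1):
    """License units when every CPU the software sees counts as one unit."""
    return servers * sockets * cores_per_socket * threads_per_core


def socket_based_units(servers, sockets):
    """License units when only physical sockets are counted."""
    return servers * sockets


# One server with a single quad core CPU, HyperThreading off and then on.
print(core_based_units(1, 1, 4))                      # 4
print(core_based_units(1, 1, 4, threads_per_core=2))  # 8
print(socket_based_units(1, 1))                       # 1
```

Under socket based licensing the count stays at 1 no matter how many cores or hardware threads the chip exposes, which is exactly why those customers keep paying about what they always have.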

Unfortunately ATG has not changed their licensing at all. This means that ATG customers are paying 4-8 times as much for licenses as they would have in 2002. And it’s only getting worse. Processor design is continuing to go wider not faster, and ATG customers will continue to be massively penalized by this CPU architecture trend.

I’ve spoken to many people at ATG, and the response is generally the same: “We understand what you’re saying, we are aware of CPU architecture changes. But changing our licensing is a big deal and takes time to do right.” Okay, I buy that. You’ve had SEVEN years so far! This has been a growing issue since ~2003 and one that pretty much all the other players in the space have handled since then.

I posted about this almost two years ago in my Rant About Core Based Licensing, but unfortunately nothing has changed on the ATG front.

It’s getting harder and harder to get dual core CPU servers, and pretty soon you won’t be able to get anything smaller than a Nehalem quad with HyperThreading. This means that out of the box, if you want two small servers (for redundancy) you will need 16 cores of ATG Commerce licensing. That’s millions of dollars. If you disable HyperThreading, and take the 20%+ performance penalty, you “only” need 8 cores of ATG Commerce licensing. That’s still probably close to a million dollars (I don’t have actual costs handy). Not only is ATG penalizing all of their existing customers, but they’re really forcing themselves out of the mid-market they are trying to target.

The ATG “starter” bundles are becoming impossible to implement due to this as well. “Two cores of commerce” means you can run a single server, which doesn’t offer any redundancy. “Four cores of commerce” means that if you can manage to find new servers that still have a single dual core proc available, you’re limited to really old and slow chips. For instance, looking at available single processor servers from one major hosting provider, the “best” dual core you can get is a Xeon 3060 dual core 2.4 GHz with a 4 MB cache and 667 MHz RAM bus speed. The best single processor available is a Nehalem 5570 with a quad 2.93 GHz HyperThreaded chip with 8 MB caches and 1333 MHz RAM bus speed. Real world I’d expect the Nehalem to deliver at least four times the request handling ability of the 3060, if not more. If you’re using Oracle, JBoss, or almost any other piece of enterprise commercial software out there, you, the customer, can leverage the best hardware and get more bang for your license buck. You can upgrade and quadruple your real world performance for free (like you’ve been able to do for years and years). If you’re on ATG, the modern server will quadruple your price instead.

So if you’re an ATG customer, ATG partner, or ATG employee, be aware of this issue, and try to get ATG to adopt socket based pricing. Thanks. Exponentially increasing software costs hurt the customers in the short term, and will hurt ATG in the long term.