Technical Blog

This blog will contain content related to Java, Seam, Security, my sites and projects, as well as other technical subjects I am interested in.

Comments and questions are welcome!

Archive for the ‘Java’ Category

Sparkred Launches an ATG Mailing List

Monday, March 15th, 2010

Over at Spark::red ATG Hosting we’ve decided to launch a monthly newsletter. Once a month we’ll send out an e-mail with some very useful ATG content, technical tips and source code, business tricks and advice on leveraging ATG products to increase sales. We’ll talk about PCI compliance and how to reduce cart abandonment.

We won’t send more than one e-mail a month, we won’t spam you, bug you, bother you, or waste your time. Each mailing will be as packed full of genuinely useful information as possible.

Sign up for the world’s best ATG Technology and Business Newsletter!

10MinuteMail and Form Submission Charsets in Seam/JSF

Thursday, March 4th, 2010

I launched a minor update to 10MinuteMail.com last night. It contained:

  1. Changed the mail domain to owlpic.com
  2. Updated the Russian language translation (thanks to Vladimir)
  3. Fixed a bug where replying to an e-mail using a non-Latin character set would result in an unreadable e-mail (also thanks to Vladimir for pointing this out)

This last issue was an odd one to fix, so I wanted to document it here (although the same fix can be found elsewhere on the net).

10MinuteMail.com is pretty well internationalized. The site content is translated into over 30 languages and the pages are served as UTF-8. Incoming e-mails are also displayed using UTF-8 and show non-Latin character sets correctly. However, until this latest release, if you replied to an e-mail using non-Latin characters, the resulting e-mail contained gibberish instead of the correct characters.

I started off by adding UTF-8 as the specified character set for outgoing e-mails. That didn’t help. I added a UTF-8 encoding declaration attribute to the form element. That didn’t help either. Finally, after some frustration, googling, and trying a ton of things, I discovered that you have to set the request object’s character encoding programmatically for each request; otherwise the wrong encoding is used on the form contents and you end up with gibberish. I’m not sure whether the bug is in JBoss, JSF, Seam, or somewhere else. The easiest solution I’ve found so far is to create a small Servlet Filter that sets the encoding on the request, and to add that filter before your Seam filter in your web.xml. It worked for me.
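To make the symptom concrete, here is a minimal sketch of what I believe is going on. It assumes the container falls back to ISO-8859-1 when the request does not declare a character set, which is the default the Servlet specification describes. The class and method names are made up for illustration, and the sketch skips the URL-encoding step a real form post goes through; it only shows the encode/decode mismatch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of 10MinuteMail, JBoss, JSF, or Seam.
final class CharsetMismatchDemo {

    /** Simulates the browser: text typed into a form on a UTF-8 page goes out as UTF-8 bytes. */
    static byte[] encodeAsBrowser(String typed) {
        return typed.getBytes(StandardCharsets.UTF_8);
    }

    /** Simulates a container that assumes ISO-8859-1 because no encoding was set on the request. */
    static String decodeAsLatin1(byte[] wireBytes) {
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    /** Simulates the same container after the request encoding has been set to UTF-8. */
    static String decodeAsUtf8(byte[] wireBytes) {
        return new String(wireBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Privet" (hello) in Cyrillic, written with escapes so the source file encoding does not matter.
        String typed = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] wireBytes = encodeAsBrowser(typed);

        // Each Cyrillic letter is two bytes in UTF-8, so the wrong decoder yields twice as many characters.
        System.out.println("typed length:   " + typed.length());
        System.out.println("latin-1 length: " + decodeAsLatin1(wireBytes).length());
        System.out.println("utf-8 matches:  " + decodeAsUtf8(wireBytes).equals(typed));
    }
}
```

If that assumption holds, it would explain why declaring UTF-8 on the page and on the form was not enough: the bytes arrive intact, but the server decodes them with the wrong charset unless told otherwise.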

The filter:

package com.digitalsanctuary.seam;

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

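/**
 * Servlet filter that forces the request character encoding to UTF-8 so that
 * submitted form contents are decoded correctly.
 */
public class UTF8Filter implements Filter {

    /** The character encoding applied to every request. */
    private static final String UTF_8 = "UTF-8";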

    /**
     * Nothing to clean up; this filter holds no resources.
     *
     * @see javax.servlet.Filter#destroy()
     */
    public void destroy() {
    }

    /**
     * Sets the request character encoding to UTF-8, then passes the request down the chain.
     * The encoding has to be set before anything reads the request parameters, which is why
     * this filter must be mapped ahead of the Seam filter.
     *
     * @param pRequest
     *            the request
     * @param pResponse
     *            the response
     * @param pChain
     *            the chain
     * @throws IOException
     *             Signals that an I/O exception has occurred.
     * @throws ServletException
     *             the servlet exception
     * @see javax.servlet.Filter#doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse,
     *      javax.servlet.FilterChain)
     */
    public void doFilter(ServletRequest pRequest, ServletResponse pResponse, FilterChain pChain) throws IOException,
            ServletException {
        pRequest.setCharacterEncoding(UTF_8);
        pChain.doFilter(pRequest, pResponse);
    }

    /**
     * No initialization is needed; the filter takes no configuration.
     *
     * @param arg0
     *            the filter configuration (unused)
     * @throws ServletException
     *             the servlet exception
     * @see javax.servlet.Filter#init(javax.servlet.FilterConfig)
     */
    public void init(FilterConfig arg0) throws ServletException {
    }

}

An excerpt of web.xml:

....
	<filter>
		<filter-name>UTF8 Filter</filter-name>
		<filter-class>com.digitalsanctuary.seam.UTF8Filter</filter-class>
	</filter>

	<filter-mapping>
		<filter-name>UTF8 Filter</filter-name>
		<url-pattern>/*</url-pattern>
	</filter-mapping>

	<filter>
		<filter-name>Seam Filter</filter-name>
		<filter-class>org.jboss.seam.servlet.SeamFilter</filter-class>
	</filter>

	<filter-mapping>
		<filter-name>Seam Filter</filter-name>
		<url-pattern>/*</url-pattern>
	</filter-mapping>
....

Does anyone have a better fix or know exactly why this happens?

Make Google Ignore JSESSIONID

Tuesday, February 16th, 2010

Search engines like Google will often index content with params like JSESSIONID and other session or conversation scope params. This causes two problems. First, the links returned in the Google search results can have these parameters in them, resulting in “session not found” errors or other incompatible session state issues. Second, it can cause a single page of content to be indexed multiple times (with differing parameters), thus diluting your page’s rank.
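As a rough illustration of the duplicate-indexing problem, the sketch below strips the `;jsessionid=...` path parameter that servlet containers append during URL rewriting, and shows that two crawled URLs which look different are really the same page. The class name, the example URLs, and the session IDs are all made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class SessionIdStripper {

    // Matches ";jsessionid=<value>" up to the next path parameter, query string, or fragment.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^;?#]*", Pattern.CASE_INSENSITIVE);

    /** Returns the URL with any jsessionid path parameter removed. */
    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        String first = "http://example.com/product.jsp;jsessionid=ABC123?productId=42";
        String second = "http://example.com/product.jsp;jsessionid=XYZ789?productId=42";

        // Both URLs collapse to the same address once the session ID is gone.
        System.out.println(strip(first));
        System.out.println(strip(first).equals(strip(second)));
    }
}
```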

I’ve posted two solutions to this issue in the past: Using Apache to ReWrite URLs to remove JSESSIONID and a more advanced solution of using a Servlet Filter to avoid adding JSESSIONID for GoogleBot Requests.

Now there’s an even better way to handle this. Google has added an amazing new feature to their Webmaster Tools which allows you to specify how the GoogleBot indexer should handle various parameters. You can ignore certain parameters such as JSESSIONID, cid, and others, and also specifically not ignore other parameters such as productId, skuId, etc…

Log into your Google Webmaster Tools, and select the site you wish to work with. Under “Site Configuration” -> “Settings” there is a new section at the bottom called “Parameter handling”. Click on “adjust parameter settings” to expand the parameter handling configuration for your site. Sometimes Google will suggest various parameters it has discovered while crawling your site, and other times you just enter the parameters you want Google to ignore or pay attention to.

Google Webmaster Tools Parameter Handling Interface

This is a much more elegant solution to the JSESSIONID problem, and also allows you to easily handle other parameters your site may use for either session state or dynamic content generation correctly. The only downside is that this only impacts Google, whereas with the correct configuration my older two solutions can handle any Search Engine Bot. Maybe other search providers will or do provide a similar feature.

Why ATG’s Core Based Licensing is Stupid

Tuesday, February 2nd, 2010

ATG, like most enterprise software companies, started by licensing their product based on how many CPUs you ran it on. Back in 1999 this was a pretty fair way to do things. It meant that big companies running a very high traffic site on a big Sun E4500 or E10k paid a lot more than a smaller company running on a pair of E450s. They handled more traffic, ideally made more money off of the site, and therefore paid more. Overall it was a decently fair model, and very easy to enforce in the software: upon startup the software checks how many CPUs the server has, compares that against the license file, and only starts if the licensed count matches or exceeds the CPU count. Makes sense, right? Most folks were doing the same thing at the time.

One aspect of this system is that year over year, as processors got faster and faster (from 250 MHz to 480 MHz for example), you got more power for the same licensing cost. In general this was partially or fully offset by the increasing complexity of the software you were running, but worst case scenario it kept you from LOSING request handling ability over time, and best case scenario you were able to increase your traffic handling ability a bit, as Moore’s law drove clock speeds up.

On the chart above you see the green line which is the number of transistors on a CPU growing just like you’d expect from Moore’s law. This line can be thought to translate roughly to performance, and has been on this trend for over 30 years.

However, the blue line, which is clock speed (MHz or GHz) does something very odd around 2003. It flattens out. What happened?

Processor design changed. Due to physical and quantum limitations in our current chip technology, going faster and faster (and smaller and smaller) couldn’t keep happening. So instead, companies like Intel and AMD began designing and building CPUs that had multiple “cores”. They went wider instead of faster. Basically each core is like a mini-CPU, so if your software supports it, you can get more work done per second without needing a faster clock speed.

Instead of a CPU with a single 3.4 GHz core, we have a CPU with four 2.66 GHz cores. Now keep in mind that the actual useful performance of the CPU kept climbing as it has for the last 30 years. It’s just that instead of faster clock speeds, we moved into multiple cores.

The problem is that software reports each “core” as a CPU. That means a server with a single quad-core CPU appears to be a 4 CPU server. That means CPU/core based licensing now costs FOUR TIMES AS MUCH as it did last year for the same request handling ability. We’re not talking hundreds of dollars here. We’re talking hundreds of thousands or millions of dollars for each customer.
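A startup check of the kind described earlier might look roughly like the sketch below. This is not ATG’s actual code, which I haven’t seen; the class name, the `mayStart` method, and the licensed count are all assumptions for illustration. The one real API call is `Runtime.availableProcessors()`, which reports the processors available to the JVM, and on multi-core hardware that typically means logical processors (cores, and HyperThreads where enabled) rather than physical sockets.

```java
// Hypothetical license check; names and values are illustrative assumptions.
final class CpuLicenseCheckSketch {

    /** The product may start only if the license covers every CPU the runtime reports. */
    static boolean mayStart(int licensedCpus, int reportedCpus) {
        return licensedCpus >= reportedCpus;
    }

    public static void main(String[] args) {
        int licensedCpus = 2; // assumed value that would normally come from a license file

        // Counts logical processors, so one quad-core socket shows up as 4 (or more), not 1.
        int reportedCpus = Runtime.getRuntime().availableProcessors();

        System.out.println("Licensed CPUs: " + licensedCpus);
        System.out.println("Reported CPUs: " + reportedCpus);
        System.out.println("May start:     " + mayStart(licensedCpus, reportedCpus));
    }
}
```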

To make matters worse, the latest generation of CPUs, Intel’s Nehalems, use something called HyperThreading to make each core do more work. The upside is this generation of chips performs better than the old ones, as they should. The downside is they now report twice as many cores as they actually have, due to the HyperThreading. A quad core single CPU now reports as 8 CPUs. You can disable HyperThreading, but that actually introduces a 20-40% performance penalty (depending on how you benchmark it, etc…), so in most cases you’re actually getting LESS performance than you did from last year’s chips. At that point you can either cough up another $500,000 in license costs, or have your brand new server be slower than your old one. Great options.

Fortunately most companies saw this issue when it first reared its ugly head back in 2003, and have moved to socket based licensing. They are basically licensing the same way they always have, just redefining the CPU as the “socket” or the physical spot on the motherboard the CPU plugs into, and getting away from things like cores, hyperthreading, and that whole mess. Customers of companies which made that change (such as Oracle, JBoss, and many others) essentially end up paying the same as they always have, and everyone goes home happy. The licensing cost/performance curve for those folks has stayed pretty stable over the past 10-15 years.

Unfortunately ATG has not changed their licensing at all. This means that ATG customers are paying 4-8 times as much for licenses as they would have in 2002. And it’s only getting worse. Processor design is continuing to go wider not faster, and ATG customers will continue to be massively penalized by this CPU architecture trend.

I’ve spoken to many people at ATG, and the response is generally the same: “We understand what you’re saying, we are aware of CPU architecture changes. But changing our licensing is a big deal and takes time to do right.” Okay, I buy that. You’ve had SEVEN years so far! This has been a growing issue since ~2003 and one that pretty much all the other players in the space have handled since then.

I posted about this almost two years ago in my Rant About Core Based Licensing, but unfortunately nothing has changed on the ATG front.

It’s getting harder and harder to get dual core CPU servers, and pretty soon you won’t be able to get anything smaller than a Nehalem quad with HyperThreading. This means that out of the box, if you want two small servers (for redundancy) you will need 16 cores of ATG Commerce licensing. That’s millions of dollars. If you disable HyperThreading, and take the 20%+ performance penalty, you “only” need 8 cores of ATG Commerce licensing. That’s still probably close to a million dollars (I don’t have actual costs handy). Not only is ATG penalizing all of their existing customers, but they’re really forcing themselves out of the mid-market they are trying to target.
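The arithmetic behind those numbers can be sketched as follows. The class and method names are made up for illustration, and the sketch only counts license units; it says nothing about actual ATG pricing, which I don’t have handy.

```java
// Hypothetical calculator for license unit counts; no real prices are modelled.
final class LicenseCountSketch {

    /** Core-based licensing: every logical CPU the software sees needs a license. */
    static int coreBasedUnits(int servers, int socketsPerServer, int coresPerSocket, int threadsPerCore) {
        return servers * socketsPerServer * coresPerSocket * threadsPerCore;
    }

    /** Socket-based licensing: only physical sockets count, whatever is plugged into them. */
    static int socketBasedUnits(int servers, int socketsPerServer) {
        return servers * socketsPerServer;
    }

    public static void main(String[] args) {
        // Two single-socket quad-core servers, HyperThreading on (2 threads per core).
        System.out.println("Core-based, HT on:  " + coreBasedUnits(2, 1, 4, 2));
        // Same servers with HyperThreading disabled.
        System.out.println("Core-based, HT off: " + coreBasedUnits(2, 1, 4, 1));
        // Same servers under socket-based licensing.
        System.out.println("Socket-based:       " + socketBasedUnits(2, 1));
    }
}
```

Same two boxes, same redundancy: 16 or 8 units under core-based licensing versus 2 under socket-based licensing.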

The ATG “starter” bundles are becoming impossible to implement due to this as well. “Two cores of commerce” means you can run a single server, which doesn’t offer any redundancy. “Four cores of commerce” means that, if you can manage to find new servers that still have a single dual core proc available, you’re limited to really old and slow chips. For instance, looking at available single processor servers from one major hosting provider, the “best” dual core you can get is a Xeon 3060 dual core 2.4 GHz with a 4 MB cache and 667 MHz RAM bus speed. The best single processor available is a Nehalem 5570 with a quad 2.93 GHz HyperThreaded chip with 8 MB caches and 1333 MHz RAM bus speed. Real world I’d expect the Nehalem to deliver at least four times the request handling ability of the 3060, if not more. If you’re using Oracle, JBoss, or almost any other piece of enterprise commercial software out there, you, the customer, can leverage the best hardware and get more bang for your license buck. You can upgrade and quadruple your real world performance for free (like you’ve been able to do for years and years). If you’re on ATG, the modern server will quadruple your price instead.

So if you’re an ATG customer, ATG partner, or ATG employee, be aware of this issue, and try to get ATG to adopt socket based pricing. Thanks. Exponentially increasing software costs hurt the customers in the short term, and will hurt ATG in the long term.

Terrible Code

Monday, November 2nd, 2009

request.setParameter("qualifySkus", getSkusRepository(d, cItem));

  1. “qualifySkus” is confusing. Is it an array/list/collection of “qualifiedSKUs” or a flag that’s a result of “qualifyingSkus” or….
  2. “qualifySkus” should be a constant with a nice comment, not an in-line String.
  3. The method getSkusRepository seems like it would return a catalog repository, doesn’t it? Instead it takes in a List of String SkuIds, loads up the corresponding SKU RepositoryItems, removes any that have the property “isLive” set to false, and removes any that have a current inventory stock level of zero. It then returns an ArrayList of those filtered SKU RepositoryItems. Perhaps a better name might be “getLiveInStockSKUs”?
  4. What on earth is “d”? Even looking at the full code of this class, it’s very difficult to tell what d is meant to contain. It’s actually a List of Strings of SkuIds that are qualifying skus for a given promo. “qualifiedSkus” would be a better name.
  5. cItem is a commerce item. However it’s not actually used by the getSkusRepository method at all. There’s no reason to pass it in.
  6. This line is in an ATG droplet and shoves the result of the getSkusRepository method into a request param before servicing an oparam. However, as you can see, it doesn’t inspect the output of the method. As I explained above, the method actually filters a list of SKUs based on isLive and current inventory state. It’s very possible that there will be no live and in-stock SKUs, and the param’s value will be null or an empty list. In that case, we’d actually want to render a different oparam, which is defined and called elsewhere, but not here. Validate your output!

That’s six issues in one line. Please don’t write code like this.
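For contrast, here is a rough sketch of how the same logic could address those six points. It is deliberately self-contained rather than real ATG code: the `Sku` class stands in for a SKU RepositoryItem, the `catalog` map stands in for the repository lookup, and the oparam names `output` and `empty` are assumptions, not the names used in the original droplet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; stand-in types replace the ATG repository and request APIs.
final class QualifiedSkuSketch {

    /** Request parameter holding the live, in-stock SKUs that qualify for the promo. */
    static final String QUALIFIED_SKUS_PARAM = "qualifiedSkus";

    /** Assumed oparam names: one for results, one for the nothing-to-show case. */
    static final String OUTPUT_OPARAM = "output";
    static final String EMPTY_OPARAM = "empty";

    /** Stand-in for a SKU RepositoryItem. */
    static final class Sku {
        final String id;
        final boolean live;
        final long stockLevel;

        Sku(String id, boolean live, long stockLevel) {
            this.id = id;
            this.live = live;
            this.stockLevel = stockLevel;
        }
    }

    /** Loads the qualified SKUs and keeps only those that are live and in stock. */
    static List<Sku> getLiveInStockSkus(List<String> qualifiedSkuIds, Map<String, Sku> catalog) {
        List<Sku> liveInStockSkus = new ArrayList<Sku>();
        for (String skuId : qualifiedSkuIds) {
            Sku sku = catalog.get(skuId);
            if (sku != null && sku.live && sku.stockLevel > 0) {
                liveInStockSkus.add(sku);
            }
        }
        return liveInStockSkus;
    }

    /** Validates the output: an empty result gets its own oparam. */
    static String chooseOparam(List<Sku> liveInStockSkus) {
        return liveInStockSkus.isEmpty() ? EMPTY_OPARAM : OUTPUT_OPARAM;
    }

    public static void main(String[] args) {
        Map<String, Sku> catalog = new HashMap<String, Sku>();
        catalog.put("sku1", new Sku("sku1", true, 5));
        catalog.put("sku2", new Sku("sku2", false, 5)); // not live
        catalog.put("sku3", new Sku("sku3", true, 0));  // out of stock

        List<String> qualifiedSkuIds = Arrays.asList("sku1", "sku2", "sku3");
        List<Sku> liveInStockSkus = getLiveInStockSkus(qualifiedSkuIds, catalog);

        System.out.println(QUALIFIED_SKUS_PARAM + ": " + liveInStockSkus.size() + " SKU(s)");
        System.out.println("oparam to render: " + chooseOparam(liveInStockSkus));
    }
}
```

The unused commerce item is gone from the method signature, the parameter name lives in a commented constant, and the caller decides which oparam to render based on what actually came back.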