ATG SEO – URL Formats and Crawler Limits

URL Formats and Structures

By making your URLs expressive and relevant to the content and structure of the site, you help not only your search engine ranking but also your users, since they can easily tell where a given link will take them.

This is a bad URL:

This is a good URL:

It is chock full of descriptive words. The page allows you to “shop” for “mens shoes”, more specifically “fluevogs” in “size 12”. This makes it much easier for search engines to know the purpose of the page and also for users to know what a link will take them to.

In order to accomplish this, you should name your directories and pages as accurately and descriptively as possible. You should also structure your site’s content and URLs in a logical hierarchical fashion.
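As a rough illustration of descriptive, hierarchical naming, here is a minimal Python sketch that turns human-readable names into URL path segments. The function names `slugify` and `build_path` and the sample category names are assumptions made for this example, not part of any framework mentioned in this post.

```python
import re


def slugify(text):
    """Lower-case the text, drop apostrophes, and hyphenate the remaining words."""
    cleaned = re.sub(r"['’]", "", text.lower())
    words = re.findall(r"[a-z0-9]+", cleaned)
    return "-".join(words)


def build_path(*segments):
    """Join slugified segments, broadest first, into a hierarchical URL path."""
    return "/" + "/".join(slugify(segment) for segment in segments)


# Hypothetical hierarchy: section, category, brand, size.
print(build_path("Shop", "Men's Shoes", "Fluevogs", "Size 12"))
# prints "/shop/mens-shoes/fluevogs/size-12"
```

Ordering the segments from broadest to most specific keeps the URL in line with the site’s hierarchy, so every prefix of the path can itself be a meaningful page.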

Now your site may have a single actual JSP that handles displaying a category, and another one that handles displaying a product, any product. So you need to map the URL to actually serve up the content from

Depending on your technology there are different ways to do this.

If you’re using JBoss Seam it’s very easy to use rewrite patterns in the pages.xml mapping file. This not only handles mapping the incoming requests for pretty URLs to the actual resources on the backend, but also handles generating the pretty URLs within the site automatically, which is a huge time saver.

If you’re using Apache, you can use mod_rewrite to translate the pretty requested URLs to the ugly actual URLs. Of course, in that case you need to ensure that the pages of your site link to the correct pretty URLs themselves.
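Whatever technology does the work, the idea is the same two-way mapping: incoming pretty URLs are translated to the real backend resource, and the site’s own links are generated in the pretty form. The Python sketch below only illustrates that idea; it is not mod_rewrite or Seam configuration, and the URL pattern, the `/category.jsp` page, and the parameter names are all hypothetical.

```python
import re
from urllib.parse import urlencode

# Hypothetical pretty-URL pattern and backend page, chosen for illustration only.
PRETTY_PATTERN = re.compile(r"^/shop/(?P<category>[a-z0-9-]+)/(?P<brand>[a-z0-9-]+)/?$")
BACKEND_PAGE = "/category.jsp"


def to_backend(pretty_path):
    """Translate a requested pretty path into the actual backend URL, or None."""
    match = PRETTY_PATTERN.match(pretty_path)
    if match is None:
        return None  # not a pretty URL we know how to rewrite
    query = urlencode([("category", match["category"]), ("brand", match["brand"])])
    return BACKEND_PAGE + "?" + query


def to_pretty(category, brand):
    """Generate the pretty URL to use in links on the site (the reverse mapping)."""
    return f"/shop/{category}/{brand}"


print(to_backend("/shop/mens-shoes/fluevogs"))
# prints "/category.jsp?category=mens-shoes&brand=fluevogs"
```

Keeping both directions next to each other makes it harder for the links you generate to drift out of sync with the patterns you accept.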

If you’re using ATG, you should read the chapter of the ATG Programmers Guide titled Search Engine Optimization (chapter 10 for ATG 2006.3). This covers the ATG support for URL Templates and the Jump Servlet. There are a few downsides to be aware of: it’s not super simple to set up, and it only displays the pretty URLs to search engines, not to all users. I really prefer solutions that give users the benefits of readable URLs as well. The out-of-the-box ATG system has too much of a performance impact to use for all situations.

We’ll be releasing a high performance open source solution for URL re-writing in ATG eCommerce applications as part of the Open Source Foundation ATG eCommerce Framework in the near future.

Know Your Limits

Search engine spiders, like the GoogleBot, have limits on what they’ll parse and consider. For instance, the GoogleBot will only read the first ~101 KB of your page’s HTML; anything after that is ignored. So you need to ensure that your pages are smaller than 101 KB. This is also a best practice with regard to performance: keep your HTML as small as possible.
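A size check like this is easy to automate. The sketch below measures the UTF-8 size of a page’s HTML against the limit; the helper name `html_size_report` is made up for this example, and reading the ~101 KB figure as 101 * 1024 bytes is an assumption.

```python
# Assumption: the ~101 KB limit is interpreted as 101 * 1024 bytes.
LIMIT_BYTES = 101 * 1024


def html_size_report(html, limit=LIMIT_BYTES):
    """Return the UTF-8 size of the HTML and whether it is under the limit."""
    size = len(html.encode("utf-8"))
    return {"bytes": size, "within_limit": size < limit}


small_page = "<html><body><p>Fluevogs, size 12</p></body></html>"
print(html_size_report(small_page)["within_limit"])
# prints "True"
```

Measuring encoded bytes rather than characters matters for pages with non-ASCII text, where one character can take several bytes.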

Search engines will often display a small chunk of text with the search results; usually this is taken from the page’s description meta tag. Most will only show the first 160 characters of the description, so be sure that your description is shorter than 160 characters and makes sense to a human reader.
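If descriptions are generated from product or category copy, they can be trimmed automatically. Here is a minimal sketch, assuming a 160-character cap and that cutting at a word boundary is acceptable; `trim_description` is a hypothetical helper name.

```python
def trim_description(text, limit=160):
    """Collapse whitespace and cut the text to at most `limit` characters,
    backing up to the last full word when the cut lands mid-word."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    # If the next character is not a space, the cut split a word in two.
    if text[limit] != " " and " " in clipped:
        clipped = clipped[:clipped.rfind(" ")]
    return clipped.rstrip()


print(trim_description("  Shop   mens shoes  by Fluevog "))
# prints "Shop mens shoes by Fluevog"
```

A trimmed description still needs a human read-through, since a clean word boundary does not guarantee a sentence that makes sense.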

Many search engines will ignore or penalize pages that have more than 100 links. Keep the number of links on a single page to a reasonable level. If your primary navigation must have more than 100 links, you can load the second, third, and deeper levels of navigation via AJAX/JavaScript. This gives your users access to the full navigation structure from any page, while keeping things more reasonable for the search engine crawler. You’ll want to be sure that the crawler can still traverse the complete site structure using the more limited navigation it can see, that is, the non-AJAX navigation.
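To see how a page measures up, you can count the links in the HTML that is actually served, which is roughly what a crawler that does not run JavaScript would see. This is a small sketch using Python’s standard `html.parser` module; the `LinkCounter` class and `count_links` function are names invented for this example.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a> tags that carry an href attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Only anchors with an href are links a crawler can follow.
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1


def count_links(html):
    """Return the number of followable links in the given HTML."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.count


page = '<a href="/shop">Shop</a> <a name="top">Top</a> <a href="/shop/mens-shoes">Mens shoes</a>'
print(count_links(page))
# prints "2"
```

Links injected later via AJAX do not appear in the served HTML, so they are not included in this count.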






2 responses to “ATG SEO – URL Formats and Crawler Limits”

  1. Mark Jackson

    When will the “high performance open source solution for URL rewriting” be available? Any ETA? I’m working with two clients who have issues with URLs, and one of them is using the User Agent “cloaking” (out-of-the-box URL rewrite function).

    If it’s possible to speak with you directly, please let me know! I can be reached at mark (at)
