JBoss jsessionid Query Parameter Removal

Instead of just using the Apache mod_rewrite rules from my post on “Hiding jsessionid parameters from Google”, which rely on redirects, wouldn’t it be better to simply not output the jsessionid parameter into the URLs in the first place?

First, what are those jsessionid params, and why are they there?

For a web application to have state, i.e. remember things from one page request to the next (such as that you’re logged in, who you are, what is in your shopping cart, etc…), most web applications have something called a session. The session starts when you first hit the website, sticks with you while you are on the site, and expires after you have either logged out or been idle (i.e. not clicked on anything) for a set period of time (perhaps 30 minutes).

In general the actual session data is held on the server, things like your shopping cart, your user profile, all of that. However, in order to associate requests from your web browser with the correct session, your browser needs to pass something for the web application to recognize which session is yours. This is traditionally done in two ways:

First, and primarily, using a session-lifetime browser cookie (or two) which holds a session identifier and optionally some additional security token(s). The browser receives this cookie from the web application, and then sends the cookie back to the web application with each page request. The web application looks at the cookie, figures out which session is yours, and handles your page request appropriately.

Second, and usually only as a fallback for browsers which do not support cookies or have cookie support turned off, by rewriting every link in the web application that points to another page in the same application so that a session id is added to the URI of the link. This is usually done as a path parameter (following a ‘;’), but is sometimes done as a query parameter (following a ‘?’).

Since on the first request to a web application, the browser is not sending a session cookie, the web application has no way of knowing if the browser actually supports cookies or not. So for the first page, the web application will usually send back the session cookie AND rewrite all of the links on the page with the jsessionid just in case the cookie is not returned.
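To make the fallback concrete, here is a minimal sketch of what URL rewriting does to a link. It is not the container's actual code (a real servlet container does this inside HttpServletResponse.encodeURL()); the class and method names are hypothetical, and it only illustrates where the ;jsessionid= path parameter lands relative to a query string or fragment.

```java
// Hypothetical illustration only; not taken from any servlet container.
final class SessionUrlSketch {

    /** Appends the session id as a path parameter, before any query string or fragment. */
    static String appendSessionId(String url, String sessionId) {
        int cut = url.length();
        int query = url.indexOf('?');
        int fragment = url.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        return url.substring(0, cut) + ";jsessionid=" + sessionId + url.substring(cut);
    }

    public static void main(String[] args) {
        // prints /cart/view;jsessionid=ABC123
        System.out.println(appendSessionId("/cart/view", "ABC123"));
        // prints /search;jsessionid=ABC123?q=shoes
        System.out.println(appendSessionId("/search?q=shoes", "ABC123"));
    }
}
```

Every internal link on the first page is rewritten this way, which is why a client that never returns the cookie keeps seeing the parameter on every URL.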

So what’s the problem?

Search engine spiders, like Google’s GoogleBot, usually do not support cookies. This means that they see the site with the jsessionid parameter in every link and every requested URL. This leads to three related problems. First, the links that show up in a Google search include an ugly ‘jsessionid=xxxxxx’. Second, Google doesn’t recognize that the jsessionid parameter doesn’t change the page content, so each time the GoogleBot hits the site and gets a different jsessionid, it indexes all of the pages again. This leads to multiple result listings for the same page in search results. For instance, you might see the same page listed 7 times in a row. Third, by having multiple instances of the same page with the same content, the Google PageRank of the actual page is diluted and perhaps even penalized due to the duplicate presentations.

Because of these problems, we do not want the GoogleBot to see the jsessionid URI parameters.

In my earlier post, linked to above, I used Apache mod_rewrite to look for requests from GoogleBot, and send a redirect back to GoogleBot, redirecting it to the same URI it had initially requested, just stripped of the jsessionid parameter.
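The effect of that redirect on the URI can be sketched in Java. This is not the mod_rewrite rule from the earlier post, just an illustration of the same transformation; the class name and the regular expression are assumptions made for illustration, and only the path-parameter form (;jsessionid=...) is handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper: strips a ";jsessionid=..." path parameter from a URI.
final class JsessionidStripper {

    // Matches the parameter up to the next '?', '#', or ';' (case-insensitive).
    private static final Pattern PATH_PARAM =
            Pattern.compile(";jsessionid=[^?#;]*", Pattern.CASE_INSENSITIVE);

    static String strip(String uri) {
        return PATH_PARAM.matcher(uri).replaceAll("");
    }

    public static void main(String[] args) {
        // Two crawls with different session ids collapse to the same URI.
        System.out.println(strip("/search;jsessionid=ABC123?q=shoes"));
        System.out.println(strip("/search;jsessionid=ZZZ999?q=shoes"));
    }
}
```

Both calls in main yield /search?q=shoes, which is the single URL we would like the search engine to index.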

This time I’m going to use a Servlet Filter to prevent the jsessionid parameter from being inserted into the URL links on the page for GoogleBot requests. This is more elegant since there are no redirects.

First, I want to link to the web page which provided the starting point for the solution I used: JSESSIONID considered harmful

I took that approach and modified the filter code to only do this for GoogleBot requests, which will allow users who don’t support or allow cookies to still use the site.

I have one Java class: DisableUrlSessionFilter.java

package com.digitalsanctuary.util;

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

/**
 * Servlet filter which disables URL-encoded session identifiers.
 *
 *
 * Copyright (c) 2006, Craig Condit. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 * * Redistributions of source code must retain the above copyright notice,
 * this list of conditions and the following disclaimer.
 * * Redistributions in binary form must reproduce the above copyright notice,
 * this list of conditions and the following disclaimer in the documentation
 * and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 *
 * Modified by Devon Hillard (devon@digitalsanctuary.com) to only filter for GoogleBot,
 * not for users without cookies enabled.
 *
 */
@SuppressWarnings("deprecation")
public class DisableUrlSessionFilter implements Filter {

    /**
     * The string to look for in the User-Agent header to identify the GoogleBot.
     */
    private static final String GOOGLEBOT_AGENT_STRING = "googlebot";

    /**
     * The request header with the User-Agent information in it.
     */
    private static final String USER_AGENT_HEADER_NAME = "User-Agent";

    /**
     * Filters requests to disable URL-based session identifiers.
     *
     * @param pRequest
     *                the request
     * @param pResponse
     *                the response
     * @param pChain
     *                the chain
     *
     * @throws IOException
     *                 Signals that an I/O exception has occurred.
     * @throws ServletException
     *                 the servlet exception
     */
    public void doFilter(final ServletRequest pRequest, final ServletResponse pResponse, final FilterChain pChain)
	    throws IOException, ServletException {
	// skip non-http requests
	if (!(pRequest instanceof HttpServletRequest)) {
	    pChain.doFilter(pRequest, pResponse);
	    return;
	}

	HttpServletRequest httpRequest = (HttpServletRequest) pRequest;
	HttpServletResponse httpResponse = (HttpServletResponse) pResponse;

	boolean isGoogleBot = false;

	// The instanceof check above guarantees httpRequest is non-null here.
	// Plain null check instead of StringUtils, which was never imported.
	String userAgent = httpRequest.getHeader(USER_AGENT_HEADER_NAME);
	if (userAgent != null && userAgent.toLowerCase().indexOf(GOOGLEBOT_AGENT_STRING) > -1) {
	    isGoogleBot = true;
	}

	if (isGoogleBot) {
	    // wrap response to remove URL encoding
	    HttpServletResponseWrapper wrappedResponse = new HttpServletResponseWrapper(httpResponse) {
		@Override
		public String encodeRedirectUrl(final String url) {
		    return url;
		}

		@Override
		public String encodeRedirectURL(final String url) {
		    return url;
		}

		@Override
		public String encodeUrl(final String url) {
		    return url;
		}

		@Override
		public String encodeURL(final String url) {
		    return url;
		}
	    };

	    // process next request in chain
	    pChain.doFilter(pRequest, wrappedResponse);
	} else {
	    pChain.doFilter(pRequest, pResponse);
	}
    }

    /**
     * Unused.
     *
     * @param pConfig
     *                the config
     *
     * @throws ServletException
     *                 the servlet exception
     */
    public void init(final FilterConfig pConfig) throws ServletException {
    }

    /**
     * Unused.
     */
    public void destroy() {
    }
}

and the servlet filter configuration in my web.xml file:

	<filter>
		<filter-name>DisableUrlSessionFilter</filter-name>
		<filter-class>
			com.digitalsanctuary.util.DisableUrlSessionFilter
		</filter-class>
	</filter>

....

	<filter-mapping>
		<filter-name>DisableUrlSessionFilter</filter-name>
		<url-pattern>/*</url-pattern>
	</filter-mapping>

So far, it seems to be working beautifully. It only impacts the GoogleBot, and it successfully strips the jsessionid parameter from the links on the site.
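If you want to sanity-check the User-Agent matching outside the container, the check can be pulled into a small standalone helper. This is a sketch, not part of the filter above: the class name is hypothetical, and the sample User-Agent strings are only examples. Keep in mind that the User-Agent header is supplied by the client and can be spoofed.

```java
import java.util.Locale;

// Hypothetical standalone version of the filter's User-Agent test.
final class GoogleBotCheck {

    private static final String GOOGLEBOT_AGENT_STRING = "googlebot";

    /** Returns true if the header value contains "googlebot", ignoring case. */
    static boolean isGoogleBot(String userAgent) {
        return userAgent != null
                && userAgent.toLowerCase(Locale.ROOT).contains(GOOGLEBOT_AGENT_STRING);
    }

    public static void main(String[] args) {
        System.out.println(isGoogleBot(
                "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
        System.out.println(isGoogleBot("Mozilla/5.0 (Windows NT 10.0)"));
        System.out.println(isGoogleBot(null)); // header missing
    }
}
```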

Enjoy!

HowGoodIWas.com Beta Launch

The How Good I Was website has just launched its Friends and Family Beta. The company is not mine, but I did the development of the site.

The published Goal: To deliver on-line and community services that provide social networking and media distribution capabilities targeted at the non-professional ex-athlete and their teams.

  • Showcase your athletic accomplishments…
  • Preserve and share memories, photos, and videos…
  • Reconnect with former teammates, coaches and fans…
  • Discuss and debate all things sports…

Please check it out, and send us all your feedback either using the Contact Us link on the bottom of every page, or at this e-mail address: feedback@hgiw.com

RichFaces Modal Panels, s:graphicImage, and IE6

If, like me, you are using the Seam s:graphicImage tag to serve an image from within a RichFaces modal panel, you may have run into an issue where in IE6 the image does not get displayed, and you get the dreaded red X of failure. It works fine in all other browsers, including IE7, and works outside of the modal panel, but not from within the modal panel.

It’s not a problem with the image data (saving the image from another browser and serving it up directly works fine). I suspect it’s a timing issue with the rendering of the modal panel. For me, it was serving up the red X about 90% of the time under IE6.

The “fix” is to stop using the s:graphicImage tag within the modal, and use a Servlet to stream out the image data instead. It’s pretty easy.

One gotcha I had was that I already had a Servlet handling video output, and I couldn’t find an example of how to configure two separate paths into the seam web:context-filter Servlet Filter (which allows access to Seam components, like the entity manager to load up the video/image items). A helpful response on the forums gave me this solution:


Also, if you’re struggling to figure out why a library’s new feature isn’t working for you, no matter how many permutations of the documented usage you try, check the versions in the manifest files in the library’s jars. Maybe, like me, you upgraded the jars in one project, but forgot to upgrade them in this one….
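One way to check those versions without unzipping each jar by hand is a few lines of plain JDK code. This is a minimal sketch: the class name is hypothetical, and it assumes the jar records its version under the standard Implementation-Version manifest attribute, which not every library does.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: prints the Implementation-Version of each jar given on the command line.
final class JarVersionCheck {

    /** Returns the Implementation-Version main attribute, or null if it is absent. */
    static String implementationVersion(Manifest manifest) {
        if (manifest == null) {
            return null; // the jar has no META-INF/MANIFEST.MF
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (JarFile jar = new JarFile(path)) {
                System.out.println(path + " -> " + implementationVersion(jar.getManifest()));
            }
        }
    }
}
```

Run it against the same jar in both projects and compare the output; a null simply means the attribute is not set in that jar.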

Seam EntityHome Design Pattern

I’ve been using Seam for over a year. At some point the “Home” object was introduced to the documentation (Chapter 11). Reading the documentation didn’t convince me of the point. Being an ATG guy at heart, I still prefer using “form handlers” for managing my important entities. So I haven’t bothered.

However, just recently I ran into a little problem with LazyInitializationExceptions. I’m sure you’ve run into them yourself. Basically, when Hibernate loads an entity for you, it’s loaded by an entity manager which is available to manage that object within a specific scope. This affects persisting changes. Also, if you have properties on that entity that are marked FetchType.LAZY, those properties can only be lazily loaded while the same entity manager is available. If it’s not, you get LazyInitializationExceptions. No fun.

So in Seam, what I usually do to avoid this, is to create a long running conversation, and load the entity within the context of that conversation. Then, as long as you’re still in that long running conversation, you can lazily load all the properties you want.

Normally this is fine. However, in my latest project the application will send e-mails to users when they get a new inter-user message. If they click on the link in the e-mail, they come to the site, but without the existing long running conversation query parameter. Their user component is session scoped, so they’re still logged in, but the user object is now outside of its conversation, and if you attempt to access lazily loaded properties, for instance the user’s messages they are trying to see based on the e-mail, blammo: exception central. Unhappy users.

I was pointed to this page: Using EntityHome for entities in long-running contexts

It showed me how to use the EntityHome to avoid the whole problem. Basically it works like this:

When the user logs in, set the user entity’s id into a session scoped component. Don’t bother with long running conversations (at least not for the user). The User Home component’s Factory method creates a user component using the session scoped user id, anytime the user component is referenced. This component entity is loaded within the context of the current conversation (if it hasn’t already been loaded). So, presto-magic, no lazy loading issues.
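The shape of that idea can be shown without any framework. The sketch below is not Seam’s EntityHome API; the class name, the loader function (standing in for an entity manager lookup), and the String “entity” in the usage example are all hypothetical. It only illustrates the split: the session keeps nothing but the id, and each conversation gets its own home object that loads the entity on first use.

```java
import java.util.function.Function;

// Framework-free illustration; these are NOT Seam classes.
final class UserHomeSketch<T> {

    private final long userId;              // the only thing kept in the session
    private final Function<Long, T> loader; // stands in for an entity manager lookup
    private T instance;                     // loaded at most once per home object

    UserHomeSketch(long userId, Function<Long, T> loader) {
        this.userId = userId;
        this.loader = loader;
    }

    /** Plays the role of the factory method: load on first access, then reuse. */
    T getInstance() {
        if (instance == null) {
            instance = loader.apply(userId);
        }
        return instance;
    }

    public static void main(String[] args) {
        Function<Long, String> loader = id -> "user-" + id; // pretend database load
        // One home per conversation: each loads its own fresh copy from the same id.
        UserHomeSketch<String> conversationA = new UserHomeSketch<>(42L, loader);
        UserHomeSketch<String> conversationB = new UserHomeSketch<>(42L, loader);
        System.out.println(conversationA.getInstance());
        System.out.println(conversationB.getInstance());
    }
}
```

Because the entity is always loaded by whatever is managing the current conversation, a user arriving from an e-mail link with no conversation parameter simply gets a freshly loaded copy.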

It let me fix the issue with about 20 lines of code, and little trouble. It’s working perfectly so far.