Hiding jsessionid parameter from Google

If you’re running a website on JBoss, you may discover that Google has indexed your pages with a jsessionid parameter in the links (appended to the path as ;jsessionid=<id>).

Googlebot does not accept cookies, so JBoss falls back to putting the jsessionid parameter in each URL in order to maintain session state without cookies. These parameters can hurt your Google rank and indexing efficiency: the same page can be indexed multiple times under different session IDs, which dilutes its ranking. They also make for ugly links.

If you still want to support users who have cookies disabled, but would like Google to see cleaner links, you can use Apache’s mod_rewrite to change the links for Googlebot only, leaving the normal behavior in place for the rest of your users.

Assuming mod_rewrite is enabled in your Apache instance, add this to your Apache configuration:

	# Strip jsessionid from URLs requested by Googlebot.
	# RewriteEngine On is needed once per server or virtual host context.
	RewriteEngine On
	RewriteCond %{HTTP_USER_AGENT} (googlebot) [NC]
	RewriteRule ^(.*);jsessionid=[A-Za-z0-9]+(.*)$ $1$2 [L,R=301]

This rule says: for any request whose user agent contains “googlebot” (matched case-insensitively), send a 301 redirect to the same URL without the jsessionid. It seems to work nicely.
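To see what the pattern does to a given URL, here is a minimal Python sketch that applies the same regular expression outside Apache. It only imitates the pattern match: the function name and the sample paths are made up for illustration, and it assumes Python's `re` behaves like Apache's regex engine for this simple pattern.

```python
import re

# Same pattern as the RewriteRule above. mod_rewrite applies it to the
# URL path; this sketch does the same on plain strings.
JSESSIONID_RULE = re.compile(r"^(.*);jsessionid=[A-Za-z0-9]+(.*)$")


def strip_jsessionid(path):
    """Return the path the rule would redirect to, or None if it does not match.

    Hypothetical helper for illustration only; it is not part of Apache or JBoss.
    """
    match = JSESSIONID_RULE.match(path)
    if match is None:
        return None  # no jsessionid in the path, so the rule would not fire
    # $1$2 in the RewriteRule: everything before and after the session id
    return match.group(1) + match.group(2)


print(strip_jsessionid("/shop/item.jsp;jsessionid=0A1B2C3D4E5F"))  # /shop/item.jsp
print(strip_jsessionid("/shop/item.jsp"))                          # None
```

A `None` result corresponds to the rule not matching, in which case Apache serves the request unchanged.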

By | 2010-02-16T20:16:53+00:00 May 19th, 2008|General|3 Comments

3 Comments

  1. […] of just using the Apache mod_rewrite rules from my post on “Hiding jsessionid parameters from Google”, which uses redirects, wouldn’t it be better to simply not output the jsessionid parameter into […]

  2. […] posted two solutions to this issue in the past: Using Apache to ReWrite URLs to remove JSESSIONID and a more advanced solution of using a Servlet Filter to avoid adding JSESSIONID for GoogleBot […]

  3. Andrew February 24, 2010 at 11:23 am

    Thanks for the example of rewriting the URLs for Google. We also use it to make sure that the spider we use does not create files with jsessionid in the file names. One thing to be aware of is that the valid characters for a JBoss jsessionid can include “-”, “+” and “*”. We use the following:

    RewriteCond %{HTTP_USER_AGENT} (AESpider) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (googlebot) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (MSNBot) [NC]
    RewriteRule ^([^;]+);jsessionid=[A-Za-z0-9\-\+\*]+\.[A-Za-z0-9]+(.*)$ $1$2 [L,R=301]

    Andrew
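The difference between the two patterns is easiest to see side by side. The following Python sketch runs both against a made-up session id that contains “-”, “+”, “*” and a dot-separated suffix; the sample paths and the `rewrite` helper are hypothetical, and the sketch only imitates the regex match, not Apache itself.

```python
import re

# Pattern from the original post: letters and digits only.
ORIGINAL = re.compile(r"^(.*);jsessionid=[A-Za-z0-9]+(.*)$")
# Pattern from the comment above: also allows -, + and * in the id,
# and requires a ".suffix" after it.
EXTENDED = re.compile(r"^([^;]+);jsessionid=[A-Za-z0-9\-\+\*]+\.[A-Za-z0-9]+(.*)$")


def rewrite(pattern, path):
    """Return group 1 + group 2 (the $1$2 target), or None if there is no match."""
    match = pattern.match(path)
    return match.group(1) + match.group(2) if match else None


# Hypothetical path with a session id using the extra characters and a suffix.
path = "/docs/page.html;jsessionid=Ab3-Cd+9*Zz.node1"

print(rewrite(ORIGINAL, path))  # /docs/page.html-Cd+9*Zz.node1
print(rewrite(EXTENDED, path))  # /docs/page.html

# The extended pattern does not match an id that has no ".suffix".
print(rewrite(EXTENDED, "/docs/page.html;jsessionid=Ab3Cd9Zz"))  # None
```

With the original pattern, the id stops matching at the first “-”, so the rest of the id leaks into the redirect target. The extended pattern consumes the whole id, but because it requires the dot-separated suffix, it only fits setups where session ids always carry one.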
