If you’re running a website on JBoss, you may discover that Google has indexed your pages with a jsessionid parameter (the ;jsessionid=... suffix) in the links.
Googlebot does not send cookies, so JBoss falls back to embedding the session ID in each link in order to maintain session state without them. These parameters can hurt your Google ranking and indexing efficiency: the same page can be indexed multiple times under different session IDs, which dilutes its ranking. They also make for ugly links.
If you still want to support users who have cookies disabled, but would like Google to see cleaner links, you can use Apache’s mod_rewrite to rewrite the links for Googlebot only, leaving the normal behavior in place for the rest of your users.
Assuming you have mod_rewrite enabled in your Apache instance, add the following to your Apache configuration:
# Redirect Googlebot to the same URL with the jsessionid stripped out
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot) [NC]
RewriteRule ^(.*);jsessionid=[A-Za-z0-9]+(.*)$ $1$2 [L,R=301]
This rule says that for requests whose user agent contains “googlebot” (matched case-insensitively), Apache issues a 301 redirect to the same URL without the jsessionid. It seems to work nicely.
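To see what the rule does without touching a live server, here is a minimal Python sketch that mimics the condition and the substitution. It is only an approximation: mod_rewrite uses PCRE, and Python's re module should treat this particular expression the same way. The rewrite function, the sample paths, and the user agent strings are made up for illustration. The query string is left out because RewriteRule matches only the URL path and Apache carries the query string over on its own.

```python
import re

# Same expression as the RewriteRule; the bot check mirrors RewriteCond ... [NC].
RULE = re.compile(r"^(.*);jsessionid=[A-Za-z0-9]+(.*)$")
BOT = re.compile(r"googlebot", re.IGNORECASE)


def rewrite(path, user_agent):
    """Return the redirect target, or None when the rule does not apply."""
    if not BOT.search(user_agent):
        return None  # condition failed: ordinary visitors are left alone
    match = RULE.match(path)
    if match is None:
        return None  # no jsessionid in the path, nothing to redirect
    return match.group(1) + match.group(2)  # the "$1$2" substitution


# Illustrative user agent strings (assumptions, not taken from real logs).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"

print(rewrite("/store/item.jsp;jsessionid=0A1B2C3D4E5F", GOOGLEBOT_UA))  # /store/item.jsp
print(rewrite("/store/item.jsp;jsessionid=0A1B2C3D4E5F", BROWSER_UA))    # None
print(rewrite("/store/item.jsp", GOOGLEBOT_UA))                          # None
```

Only the first request would be redirected; a regular browser keeps its jsessionid link, and a Googlebot request that is already clean passes straight through.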
Thanks for the example of rewriting the URLs for Google. We also use it to make sure that the spider we use does not create files with jsessionid in the file names. One thing to be aware of is that the list of valid characters for a JBoss session ID can include “-”, “+” and “*”. We use the following:
RewriteCond %{HTTP_USER_AGENT} (AESpider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (googlebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (MSNBot) [NC]
RewriteRule ^([^;]+);jsessionid=[A-Za-z0-9\-\+\*]+\.[A-Za-z0-9]+(.*)$ $1$2 [L,R=301]
Andrew
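The difference between the two patterns is easiest to see side by side. The sketch below is a rough Python approximation of the "$1$2" substitution, not the way Apache itself evaluates rules; the strip helper, the sample paths, and the session IDs (including the ".node1" suffix) are hypothetical values chosen to exercise the character classes.

```python
import re

# Pattern from the post, and the extended pattern from the comment.
ORIGINAL = re.compile(r"^(.*);jsessionid=[A-Za-z0-9]+(.*)$")
EXTENDED = re.compile(r"^([^;]+);jsessionid=[A-Za-z0-9\-\+\*]+\.[A-Za-z0-9]+(.*)$")


def strip(pattern, path):
    """Apply the "$1$2" substitution; return the path unchanged on no match."""
    match = pattern.match(path)
    return match.group(1) + match.group(2) if match else path


# Hypothetical ID containing "-", "+" and "*", followed by a ".node1" suffix.
with_suffix = "/store/item.jsp;jsessionid=aB3-Xy+9*Q.node1"
# Hypothetical purely alphanumeric ID with no suffix.
plain = "/store/item.jsp;jsessionid=0A1B2C3D4E5F"

print(strip(ORIGINAL, with_suffix))  # /store/item.jsp-Xy+9*Q.node1
print(strip(EXTENDED, with_suffix))  # /store/item.jsp
print(strip(ORIGINAL, plain))        # /store/item.jsp
print(strip(EXTENDED, plain))        # /store/item.jsp;jsessionid=0A1B2C3D4E5F
```

With the original pattern, the ID match stops at the first character outside [A-Za-z0-9] and the rest of the ID is carried into the redirect target. The extended pattern handles that case, but because it requires a "." followed by an alphanumeric suffix, it leaves IDs without such a suffix untouched. Which form fits depends on what your server's session IDs actually look like.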