ASP.NET 2.0 Mozilla Browser Detection Hole

It has recently come to my attention that there is something drastically wrong with the way search engines have been indexing my ASP.NET 2.0 blog.

As I've started to explain previously, this is because of the way browser detection is set up. To give a brief rundown, ASP.NET 2.0 has a default browser definition that assumes the browser is fairly capable and supports common things such as JavaScript and cookies. A browser definition can be inherited by other definitions, which then override specific properties for a particular browser or browser version.

Apparently, around March 2006, Google started rolling out updates that changed the Googlebot's user-agent string from:

"Googlebot/2.1 (+http://www.googlebot.com/bot.html)" to
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The reason for the change was so that Googlebot could identify itself as Mozilla/5.0 compliant, which should allow it to be accepted by more web servers. However, this breaks the detection pattern in ASP.NET. (It was always broken for Yahoo Slurp; I just don't know if anyone ever noticed.)

When the user agent was just "Googlebot/2.1", it could not be matched by any specific definition, so it used the "default.browser" detection file, which defaults to a browser of reasonable capabilities. After the change, it landed in the "mozilla.browser" file because it was matched on the word "Mozilla". The instructions that follow in "mozilla.browser" then try to establish exactly which platform and variant of Mozilla it is: for example, whether it's Firefox running on OS X, or the older Mozilla Gecko rendering engine. But because there is no definition for a generic Mozilla/5.0 compatible browser, it gets the most relevant match, which is the lowest Mozilla/1.0 compatible settings. Bad!
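The fall-through described above can be illustrated with a toy model. This is only a sketch: the three definitions, their match patterns, and the helper name `resolve` are hypothetical stand-ins, not the contents of the shipped .browser files, and the `cookies` values simply follow the behavior described in this post.

```python
import re

# Hypothetical, heavily simplified stand-ins for browser definitions.
# Each entry names its parent, a pattern to test against the user agent,
# and the capabilities it overrides. Values follow the post's description.
DEFINITIONS = {
    "Default": {"parent": None, "match": r".*", "caps": {"cookies": True}},
    "Mozilla": {"parent": "Default", "match": r"Mozilla", "caps": {"cookies": False}},
    "Firefox": {"parent": "Mozilla", "match": r"Firefox/", "caps": {"cookies": True}},
}


def resolve(user_agent):
    """Walk from Default to the deepest matching child, merging overrides."""
    node = "Default"
    path = [node]
    caps = dict(DEFINITIONS[node]["caps"])
    while True:
        # Find the first child of the current node whose pattern matches.
        child = next(
            (name for name, d in DEFINITIONS.items()
             if d["parent"] == node and re.search(d["match"], user_agent)),
            None,
        )
        if child is None:
            return path, caps
        node = child
        path.append(node)
        caps.update(DEFINITIONS[node]["caps"])


old_bot = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
new_bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(resolve(old_bot))  # stays on Default, cookies assumed
print(resolve(new_bot))  # stops at Mozilla, cookies not assumed
```

In this model the old string never leaves "Default", while the new one enters "Mozilla" and finds no more specific child, so it keeps the restrictive parent settings.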

Because of this bad detection, the request gets the default Mozilla/1.0 settings, which assume NO COOKIES. ASP.NET therefore inserts the session ID into the URL and issues a 302 response (content temporarily moved). What makes this situation even worse is that search engines follow these redirects by default and index the content on the other side. So basically every time some random user agent that claims to be Mozilla/5.0 compliant hits the site, it gets Mozilla/1.0 capabilities. What is needed is something to bridge this gap.
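To check whether a crawler is being sent to session-stamped URLs, it can help to look for the session segment in redirect targets or indexed links. The sketch below assumes the usual ASP.NET 2.0 cookieless form, where the ID appears as a "/(S(...))" path segment; the helper name `split_session_segment` and the sample paths are made up for illustration.

```python
import re

# Assumed shape of a cookieless session segment: "/(S(<id>))" inside the path.
_SESSION_SEGMENT = re.compile(r"/\(S\((?P<sid>[A-Za-z0-9]+)\)\)")


def split_session_segment(path):
    """Return (session_id, path_without_segment); session_id is None if absent."""
    m = _SESSION_SEGMENT.search(path)
    if m is None:
        return None, path
    # Cut the segment out and keep whatever surrounds it.
    clean = path[:m.start()] + path[m.end():]
    return m.group("sid"), clean or "/"


# Hypothetical redirect target of the kind a misdetected crawler would receive.
print(split_session_segment("/(S(lit3py55t21z5v55vlm25s55))/default.aspx"))
print(split_session_segment("/default.aspx"))
```

If search results for your site contain paths where the first value is not None, the crawler was most likely treated as a browser without cookie support.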

Fortunately, there is something that can be done that won't even require a recompile of your ASP.NET 2.0 application. Simply create a "genericmozilla5.browser" file in the "/App_Browsers" folder in the root of your application with the following contents:

<browsers>
  <browser id="GenericMozilla5" parentID="Mozilla">
    <identification>
      <userAgent match="Mozilla/5\.(?'minor'\d+).*[C|c]ompatible; ?(?'browser'.+); ?\+?(http://.+)\)" />
    </identification>
    <capabilities>
      <capability name="majorversion" value="5" />
      <capability name="minorversion" value="${minor}" />
      <capability name="browser" value="${browser}" />
      <capability name="Version" value="5.${minor}" />
      <capability name="activexcontrols" value="true" />
      <capability name="backgroundsounds" value="true" />
      <capability name="cookies" value="true" />
      <capability name="css1" value="true" />
      <capability name="css2" value="true" />
      <capability name="ecmascriptversion" value="1.2" />
      <capability name="frames" value="true" />
      <capability name="javaapplets" value="true" />
      <capability name="javascript" value="true" />
      <capability name="jscriptversion" value="5.0" />
      <capability name="supportsCallback" value="true" />
      <capability name="supportsFileUpload" value="true" />
      <capability name="supportsMultilineTextBoxDisplay" value="true" />
      <capability name="supportsMaintainScrollPositionOnPostback" value="true" />
      <capability name="supportsVCard" value="true" />
      <capability name="supportsXmlHttp" value="true" />
      <capability name="tables" value="true" />
      <capability name="vbscript" value="true" />
      <capability name="w3cdomversion" value="1.0" />
      <capability name="xml" value="true" />
      <capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" />
    </capabilities>
  </browser>
</browsers>

This will match generic Mozilla-compatible browsers and spiders with user-agent strings such as:

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
  • Mozilla/5.0 (compatible; AbiLogicBot/1.0; +http://www.abilogic.com/bot.html)
  • Mozilla/5.0 (compatible; AnyApexBot/1.0; +http://www.anyapex.com/bot.html)
  • Mozilla/5.0 (compatible; BecomeBot/3.0; MSIE 6.0 compatible; +http://www.become.com/site_owners.html)
  • Mozilla/5.0 (compatible; MojeekBot/2.0; http://www.mojeek.com/bot.html)
  • Mozilla/5.0 (compatible; Scrubby/2.2; +http://www.scrubtheweb.com/)
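The pattern can be sanity-checked outside ASP.NET. The sketch below is a Python translation of the userAgent expression, not the .NET matcher itself: the only change is that the .NET named groups (?'name'...) are written as (?P<name>...), and it assumes an unanchored search. The helper name `describe` and the Firefox string are illustrative additions.

```python
import re

# Python translation of the userAgent pattern from the .browser file.
# Note: "[C|c]" is kept verbatim; as a character class it also admits a
# literal "|", so "[Cc]" would be the tighter way to write it.
GENERIC_MOZILLA5 = re.compile(
    r"Mozilla/5\.(?P<minor>\d+).*[C|c]ompatible; ?(?P<browser>.+); ?\+?(http://.+)\)"
)

# The user-agent strings listed in the post.
SPIDERS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/5.0 (compatible; AbiLogicBot/1.0; +http://www.abilogic.com/bot.html)",
    "Mozilla/5.0 (compatible; AnyApexBot/1.0; +http://www.anyapex.com/bot.html)",
    "Mozilla/5.0 (compatible; BecomeBot/3.0; MSIE 6.0 compatible; +http://www.become.com/site_owners.html)",
    "Mozilla/5.0 (compatible; MojeekBot/2.0; http://www.mojeek.com/bot.html)",
    "Mozilla/5.0 (compatible; Scrubby/2.2; +http://www.scrubtheweb.com/)",
]


def describe(user_agent):
    """Return the captured minor version and browser name, or None if no match."""
    m = GENERIC_MOZILLA5.search(user_agent)
    if m is None:
        return None
    return {"minor": m.group("minor"), "browser": m.group("browser")}


for ua in SPIDERS:
    print(describe(ua))

# Strings that should fall outside the definition: the old Googlebot string
# and an illustrative Firefox user agent without the "compatible" token.
print(describe("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
print(describe("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0"))
```

All seven spider strings match, while the last two return None, so regular browsers should continue to be handled by their own, more specific definitions.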

Other Notes

The MSNBOT never had this problem either because, like the original Googlebot string, it was never matched by a specific definition and thus received the "default.browser" settings, which support cookies.

My solution is not a complete fix, and I think Microsoft could have done one thing better here. Because the browser string goes into the "mozilla.browser" file, there needs to be another level where, once it is known to be Mozilla/5.0 compliant, it gets the appropriate defaults before detection starts to figure out exactly what browser it is. Even though the exact user agent wouldn't be established with this approach, it would at least support future browsers claiming to be compliant at a higher level than just "Mozilla".

Posted on Monday, December 11, 2006 12:58 PM

Comments on this post

# re: ASP.NET 2.0 Mozilla Browser Detection Hole

I just wanted to thank you for this post. It has saved my life. I also wanted to comment on a couple of keywords that may get this article many more hits from search engines. URL rewriting is one of them, because this helps fix a bug that keeps many of the search engines from being able to index a friendly URL in ASP.NET.

Here is a link to a site that goes into the problem in a little more detail. But yours is a catch-all solution, whereas this one only handles Google. http://todotnet.com/archive/0001/01/01/7472.aspx

Thanks again!
Left by Very Thankful Reader on Jul 01, 2007 12:02 AM

# re: ASP.NET 2.0 Mozilla Browser Detection Hole

You saved me! Could not figure out why every search engine coming to the site was getting 500 errors. I knew it had to do with the urlmappings but could not figure it out.

Thanks much!
Left by Steve on Jul 08, 2007 12:24 PM

# re: ASP.NET 2.0 Mozilla Browser Detection Hole

BRILLIANT! I needed this solution and was about to write it myself. You just saved me some hours!
Thanks.
Left by Doug on Aug 12, 2007 6:49 PM

# re: ASP.NET 2.0 Mozilla Browser Detection Hole

This is very helpful.
Thanks!
Left by Yair on Nov 11, 2007 8:31 AM

# re: ASP.NET 2.0 Mozilla Browser Detection Hole

Is Google aware of this bug?
Is there any chance they will crawl the problematic sites again?
Left by Yair Bar-On on Nov 11, 2007 6:13 PM

# re: ASP.NET 2.0 Mozilla Browser Detection Hole

I think we have to thank Microsoft for this horrible situation.

Like everyone else here, I'm very thankful for this post. It saved me from many things. Thank you, thank you.
Left by Paul on Nov 14, 2007 2:52 AM