ASP.NET 2.0 Mozilla Browser Detection Hole

It has recently come to my attention that there is something drastically wrong with the way search engines have been indexing my ASP.NET 2.0 blog.

As I've started to explain previously, this is because of the way browser detection is set up. To give a brief rundown: ASP.NET 2.0 has a default browser definition which assumes the browser is fairly capable and supports common features such as JavaScript and cookies. Other definitions can inherit from a browser definition and override specific properties to tailor it to a particular browser or browser version.

Apparently, around March 2006, Google started rolling out updates that changed Googlebot's user-agent string from:

"Googlebot/2.1 (+http://www.googlebot.com/bot.html)" to
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The reason for this change is so that Googlebot can identify itself as Mozilla/5.0 compatible, which should allow it to be accepted by more web servers. However, this breaks the detection pattern in ASP.NET. (Detection was always broken for Yahoo! Slurp; I just don't know if anyone ever noticed.)

When the user-agent string was just "Googlebot/2.1" it didn't match anything and fell through to the "default.browser" detection file, which defaults to a browser of reasonable capabilities. After the change it lands in the "mozilla.browser" file, because it is detected on the word "Mozilla". The definitions that follow in "mozilla.browser" then try to establish exactly which platform and variant of Mozilla it is, for example whether it's Firefox running on OS X or the older Mozilla Gecko rendering engine. But because there is no definition for a generic Mozilla/5.0-compatible browser, it gets the closest remaining match: the lowest Mozilla/1.0-compatible settings. Bad!

Because of this bad detection, the Mozilla/1.0 settings assume NO COOKIES, so ASP.NET inserts the session ID into the URL and issues a response status 302 (Moved Temporarily). What makes this situation even worse is that the default behavior of search engines is to follow these redirects and index the content on the other side. So basically, every time some random user agent that claims to be Mozilla/5.0 compatible hits the site, it gets Mozilla/1.0 capabilities. What is needed is something to bridge this gap.
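If you want to see exactly what ASP.NET has decided about a request, a small diagnostic page that dumps a few properties of Request.Browser makes the misdetection easy to spot. The page name (BrowserCheck.aspx) and the choice of properties below are just my own illustration for testing, not part of the fix:

// Code-behind for a hypothetical BrowserCheck.aspx page that prints the
// capabilities ASP.NET 2.0 resolved for the current request.
using System;
using System.Web;
using System.Web.UI;

public partial class BrowserCheck : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        HttpBrowserCapabilities caps = Request.Browser;

        Response.ContentType = "text/plain";
        Response.Write("User-Agent: " + Request.UserAgent + "\n");
        Response.Write("Browser:    " + caps.Browser + " " + caps.Version + "\n");
        Response.Write("Cookies:    " + caps.Cookies + "\n");   // false means cookieless session URLs
        Response.Write("EcmaScript: " + caps.EcmaScriptVersion + "\n");
        Response.Write("Tag writer: " + caps.TagWriter + "\n");
        Response.End();   // skip the rest of the page lifecycle
    }
}

Hitting that page with a spoofed Googlebot user-agent string will report Cookies as False before the fix below, and True afterwards.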

Fortunately, there is something that can be done that won't even require a recompile of your ASP.NET 2.0 application. Simply create a "genericmozilla5.browser" file in the "/App_Browsers" folder in the root of your application with the following contents:

<browsers>
  <browser id="GenericMozilla5" parentID="Mozilla">
    <identification>
      <userAgent match="Mozilla/5\.(?'minor'\d+).*[C|c]ompatible; ?(?'browser'.+); ?\+?(http://.+)\)" />
    </identification>
    <capabilities>
      <capability name="majorversion" value="5" />
      <capability name="minorversion" value="${minor}" />
      <capability name="browser" value="${browser}" />
      <capability name="Version" value="5.${minor}" />
      <capability name="activexcontrols" value="true" />
      <capability name="backgroundsounds" value="true" />
      <capability name="cookies" value="true" />
      <capability name="css1" value="true" />
      <capability name="css2" value="true" />
      <capability name="ecmascriptversion" value="1.2" />
      <capability name="frames" value="true" />
      <capability name="javaapplets" value="true" />
      <capability name="javascript" value="true" />
      <capability name="jscriptversion" value="5.0" />
      <capability name="supportsCallback" value="true" />
      <capability name="supportsFileUpload" value="true" />
      <capability name="supportsMultilineTextBoxDisplay" value="true" />
      <capability name="supportsMaintainScrollPositionOnPostback" value="true" />
      <capability name="supportsVCard" value="true" />
      <capability name="supportsXmlHttp" value="true" />
      <capability name="tables" value="true" />
      <capability name="vbscript" value="true" />
      <capability name="w3cdomversion" value="1.0" />
      <capability name="xml" value="true" />
      <capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" />
    </capabilities>
  </browser>
</browsers>

This will match generic Mozilla-compatible browsers and spiders with user-agent strings such as the following (a quick way to sanity-check the match expression against these strings is sketched after the list):

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
  • Mozilla/5.0 (compatible; AbiLogicBot/1.0; +http://www.abilogic.com/bot.html)
  • Mozilla/5.0 (compatible; AnyApexBot/1.0; +http://www.anyapex.com/bot.html)
  • Mozilla/5.0 (compatible; BecomeBot/3.0; MSIE 6.0 compatible; +http://www.become.com/site_owners.html)
  • Mozilla/5.0 (compatible; MojeekBot/2.0; http://www.mojeek.com/bot.html)
  • Mozilla/5.0 (compatible; Scrubby/2.2; +http://www.scrubtheweb.com/)
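For anyone who wants to verify the match expression before deploying it, the (?'name'...) group syntax used in .browser files is ordinary .NET regular expression syntax, so it can be exercised directly with System.Text.RegularExpressions. A throwaway console sketch along these lines (the class name and the sample strings are mine, purely for illustration):

// Sanity check that the expression from genericmozilla5.browser matches typical bot strings.
using System;
using System.Text.RegularExpressions;

class GenericMozilla5MatchTest
{
    static void Main()
    {
        const string pattern =
            @"Mozilla/5\.(?'minor'\d+).*[C|c]ompatible; ?(?'browser'.+); ?\+?(http://.+)\)";

        string[] agents =
        {
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
            "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
            "Mozilla/5.0 (compatible; MojeekBot/2.0; http://www.mojeek.com/bot.html)"
        };

        foreach (string agent in agents)
        {
            Match m = Regex.Match(agent, pattern);
            Console.WriteLine(m.Success
                ? "matched: " + m.Groups["browser"].Value + " (version 5." + m.Groups["minor"].Value + ")"
                : "NO MATCH: " + agent);
        }
    }
}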

Other Notes

The MSNBOT also never had this problem, because, like the original Googlebot string, it was never matched and so received the "default.browser" settings, which support cookies.

My solution is not a complete fix, and I think Microsoft could have done one thing better here. Because the user-agent string lands in the "mozilla.browser" file, there needs to be another level where, once the browser is known to be Mozilla/5.0 compliant, it receives appropriate defaults before ASP.NET tries to work out exactly which browser it is. Even though the exact user agent wouldn't be established with this approach, it would at least support future browsers claiming compliance at a higher level than just "Mozilla".
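To make the idea concrete, the missing level would look something like the sketch below: a generic Mozilla/5.0 definition with sensible defaults that more specific definitions could then inherit from and refine. This is only an illustration of the structure I'm suggesting (the Mozilla5 and GooglebotCrawler ids and the Googlebot child definition are hypothetical), not something that ships with ASP.NET 2.0:

<browsers>
  <!-- Hypothetical intermediate level: any Mozilla/5.0 user agent gets capable defaults. -->
  <browser id="Mozilla5" parentID="Mozilla">
    <identification>
      <userAgent match="Mozilla/5\.(?'minor'\d+)" />
    </identification>
    <capabilities>
      <capability name="majorversion" value="5" />
      <capability name="minorversion" value="${minor}" />
      <capability name="cookies" value="true" />
      <capability name="ecmascriptversion" value="1.2" />
      <capability name="tagwriter" value="System.Web.UI.HtmlTextWriter" />
    </capabilities>
  </browser>

  <!-- A specific crawler could then refine that level instead of falling back to Mozilla/1.0. -->
  <browser id="GooglebotCrawler" parentID="Mozilla5">
    <identification>
      <userAgent match="Googlebot/(?'version'\d+\.\d+)" />
    </identification>
    <capabilities>
      <capability name="crawler" value="true" />
    </capabilities>
  </browser>
</browsers>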

Tags: .NET, Internet, Code Snippets, Errors and Bugs, ASP.NET
Posted by: Brendan Kowitz
Last revised: 21 Sep 2013 12:15PM

Comments

7/1/2007 12:02:42 PM
I just wanted to thank you for this post. It has saved my life. I also wanted to mention a couple of keywords that may get this article many more hits from search engines. URL rewriting is one of them, because this fixes a bug that keeps many search engines from being able to index a friendly URL in ASP.NET.

Here is a link to a site that goes into the problem in a little more detail, but your solution is a catch-all where that one only handles Google: http://todotnet.com/archive/0001/01/01/7472.aspx

Thanks again!
7/9/2007 12:24:58 AM
You saved me! Could not figure out why every search engine coming to the site was getting 500 errors. I knew it had to do with the urlmappings but could not figure it out.

Thanks much!
8/13/2007 6:49:56 AM
BRILLIANT! I needed this solution and was about to write it myself. You just saved me some hours!
Thanks.
11/14/2007 3:52:47 PM
I think we have to thank Microsoft for this horrible situation.

Like all the people here, I'm very thankful for this post... it saved me from many things... thank you, thank you
11/23/2007 7:39:04 PM
this is Microsoft, it means too micro & too soft..
8/16/2008 5:52:35 AM
This is the most useful blog post I think I've ever read!
9/21/2008 6:33:58 AM
you are magic! thank you.
1/1/2009 2:30:44 PM
Very awesome researching and fixing skills. 100% METAL!
1/27/2009 2:45:10 PM
You're a lifesaver Brendan - total lifesaver!

Thanks!
2/10/2009 8:02:59 AM
Thanks for the post... really saved my bacon!

Just a quick note that Ask Jeeves will still be unable to index the site.

Fixed the issue by copying the fix above but changing the expression to:

<userAgent match="Mozilla/2\.(?'minor'\d+).*[C|c]ompatible;" />

Hope this helps someone else out
3/26/2009 10:37:09 PM
I wonder why on earth you haven't been contacted back by Google?

Gem of a solution. Thanks a lot :)
7/3/2009 1:33:18 PM
Thank you Admin.
