Search Engine Bot Detection
After I first noticed a large build-up of strange session URLs in Google searches for my domain, I did a little research and discovered the cause: .NET was not detecting that the search engine spider was Mozilla/5.0 compliant, and was inserting a junk session ID into the URL.
It's been nearly a week since I made changes to correct this, and there's already a small indication of the 'healing' process. The pages are not out of the index yet (and yes, I do know that I 'could' remove them manually), but there...
It has recently come to my attention that there is something drastically wrong with the way search engines have been indexing my ASP.NET 2.0 blog.
As I started to explain previously, this is because of the way browser detection is set up. To give a brief rundown: ASP.NET 2.0 has a default browser definition, which seems to assume that the default browser is fairly capable and supports common things such as JavaScript and cookies. A browser definition can be inherited by other definitions, which can then override specific properties for a particular browser or browser version.
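As a rough sketch of what that inheritance looks like, a custom definition can be placed in the site's App_Browsers folder as a .browser file. The file name, the id GenericCrawler, and the capability values below are assumptions for illustration, not necessarily the fix used on this blog. The sketch also assumes the spider's user agent begins with Mozilla/, so that the built-in Mozilla parent definition matches before this child is considered.

```xml
<!-- App_Browsers/crawlers.browser (hypothetical file name) -->
<browsers>
  <!-- Inherits every capability from the built-in "Mozilla" definition,
       then overrides only the ones listed below. The id is made up. -->
  <browser id="GenericCrawler" parentID="Mozilla">
    <identification>
      <!-- Regular expression tested against the User-Agent header -->
      <userAgent match="Googlebot" />
    </identification>
    <capabilities>
      <!-- Assumed values: flag the client as a crawler and declare cookie
           support so ASP.NET does not fall back to cookieless URLs -->
      <capability name="crawler" value="true" />
      <capability name="cookies" value="true" />
    </capabilities>
  </browser>
</browsers>
```

Any capability not listed in the child is simply taken from the parent, which is why the override can stay this small.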
Apparently in...
ASP.NET and Dirty Urls
There are two things that have been bothering me about the pages Google indexes from an ASP.NET application. The first is that ASP.NET session URLs are somehow ending up in the Google index. This is bad because searchers who click these links are likely to get a 500 error (internal server error), since they will be trying to access a page from an expired session.
How is Google finding all these 'bad' URLs?
Well, apparently there is no browser definition in ASP.NET 2.0 for Googlebot's user-agent string, so when the spider...