I’ve had idle chats with techie friends wondering whether search engines such as Google simply index the raw HTML of a web page, or whether they actually parse and “display” it as a true browser would.

I’ve generally believed they must be doing the latter. These days, much of a page’s content may be generated by JavaScript and therefore never appears in the raw HTML as served by the web server; an indexer that reads only that HTML would miss a lot of relevant content.
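As a toy sketch of the problem (all names and strings invented), imagine a page whose article body is filled in by a script after load. A crawler that reads only the served HTML never sees the text:

```javascript
// What the web server actually sends over the wire: an empty shell.
function serveRawHtml() {
  return '<div id="content"></div><script src="app.js"></script>';
}

// What the page script would do once a real browser loads it
// (simulated here as a string transform, since we have no DOM):
function renderWithJavaScript(rawHtml) {
  const generated = "Welcome! Here is the article text."; // built client-side
  return rawHtml.replace(
    '<div id="content"></div>',
    `<div id="content">${generated}</div>`
  );
}

const raw = serveRawHtml();
const rendered = renderWithJavaScript(raw);

console.log(raw.includes("article text"));      // false: a naive indexer misses it
console.log(rendered.includes("article text")); // true: a rendering browser sees it
```

The gap between those two strings is exactly what's at stake: index the first and the article is invisible; render the page and it isn't.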

More broadly, a user’s concept of a page is “the stuff I see in the browser,” and the expectation is that Google “sees” the same thing. Running JavaScript would be necessary to achieve this.

Today I got a bit of confirmation from my SiteMeter stats:

SiteMeter runs a client-side JavaScript snippet in the footer of my page, which reports the traffic data back to their servers and also displays a little image. Unlike with server-side logs, the page has to be rendered — its scripts executed — for SiteMeter to count the visit.
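Roughly how such a tracker works — this is a sketch, with an invented URL and parameter names, not SiteMeter’s actual protocol: the footer script assembles a beacon-image URL describing the visit, and the browser only ever requests that image if the script actually runs.

```javascript
// Build the beacon URL a footer tracker might request.
// (tracker.example.com and all parameter names are hypothetical.)
function buildBeaconUrl(accountId, visit) {
  const params = new URLSearchParams({
    site: accountId,
    page: visit.page,
    referrer: visit.referrer,
    js: "1", // the request arriving at all proves JavaScript ran
  });
  return `https://tracker.example.com/beacon.gif?${params}`;
}

// In a real page, the footer script would then append the image:
//   const img = document.createElement("img");
//   img.src = buildBeaconUrl("my-account",
//     { page: location.pathname, referrer: document.referrer });
//   document.body.appendChild(img);

const url = buildBeaconUrl("my-account", {
  page: "/post/42",
  referrer: "https://www.google.com/",
});
console.log(url);
```

Any client that fetches only raw HTML never executes the snippet, never requests the beacon, and so never shows up in the stats — which is what makes a logged Googlebot visit interesting.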

The fact that the above visit from the Googlebot was counted at all is a strong hint that Google rendered my page in order to index it. Further, you’ll note that SiteMeter thinks the “browser” has JavaScript enabled.

So I imagine that the big indexing farm at Google actually runs something like Chrome: it downloads the page, renders it, and then saves the fully realized DOM for indexing. It’s like a thousand automated users.