Overview

The correct mime-type for XHTML 1.0, of course, is application/xhtml+xml. At this point in time (2007), most modern browsers understand this mime-type and correctly handle the served document as an XHTML (that is, XML) document. And since browsers generally advertise the mime-types they accept when they make a request, things should "just work." However, some browsers say they accept everything even though they do not know what to do with application/xhtml+xml.

Microsoft's Internet Explorer (MSIE) is the primary culprit. It advertises that it accepts all mime-types (by sending "Accept: */*" with the request), but when you serve it such a file, all it does is ask you where you would like to save it, perhaps so you can open it in another browser that knows how to read it.

If you'd like your MSIE-using readers to be able to view your pages without hassle, you'll have to take matters into your own hands.

The actual changes required are quite simple, but it took some time and experimentation to get things working perfectly, so I wrote this how-to to save you the time and hassle I went through.

The general approach we'll take is to use the .xhtml file extension for XHTML files, configure Apache to serve the correct mime-type (application/xhtml+xml) for files that end in .xhtml, and add a rewrite rule to override this for browsers that advertise that they don't accept application/xhtml+xml or that say they accept everything but are really lying (MSIE).

Preliminaries

For the purposes of this how-to, I assume you have Apache installed on your server and are reasonably comfortable configuring it, and that you are aware of the perils of using XHTML properly. I'm not going to cover what XHTML is or why document.write won't work correctly or why a page might appear horribly grotesque when correctly served as XHTML.

If you're still with me, great! There's just one last thing to cover before we begin: which Apache files to alter.

Since the details of Apache's configuration files vary between different GNU/Linux distributions, I'm just going to say that all of the recommended Apache changes go in whatever your global Apache configuration file is. On my server, which runs Gentoo GNU/Linux and is configured to use virtual hosts, the appropriate file is /etc/apache/vhosts.d/00_default_vhost.conf, which gets included into /etc/apache/httpd.conf. On your server, the main configuration file may be something other than httpd.conf, or you might want to add the changes to a per-directory configuration file (usually .htaccess). Check the documentation of your distribution if unsure.

Configure Fallback Behavior

First, we must ensure that files with an .xhtml extension are served with the application/xhtml+xml mime-type. Apache's mod_mime is the module responsible for associating a mime-type with a served file based on the file's extension. In order to serve .xhtml files as application/xhtml+xml, we add the following mod_mime directive:

AddType application/xhtml+xml;q=0.8 .xhtml

If we wish to serve index.xhtml to requests that don't specify a filename, such as this tutorial if you're viewing it at the intended location, we add the following line to ensure that directory requests return index.xhtml if present (note that .xhtml comes before .html):

DirectoryIndex index.xhtml index.html

Configure mod_rewrite

At this point, Apache will serve all .xhtml files as application/xhtml+xml. In order to serve a file as text/html if the user-agent states that it doesn't accept application/xhtml+xml (or if the user-agent is MSIE), we use Apache's mod_rewrite module to override the default mime-type when necessary.

<IfModule mod_rewrite.c>
RewriteEngine on
# Uncomment RewriteBase line if adding inside per-directory
# configuration files (e.g., .htaccess):
# RewriteBase /
RewriteCond %{REQUEST_URI} \.xhtml$
RewriteCond %{HTTP_USER_AGENT} MSIE [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml\s*;\s*q=0\.?0*(\s|,|$)
RewriteRule .* - [T=text/html]
</IfModule>

The first rewrite condition is that the request is for an .xhtml file. Otherwise, rewrite processing stops immediately, and we make no change to the mime-type. The second and third conditions are joined by the [OR] flag, so together they require that either the user-agent that made the request is Internet Explorer or the agent explicitly states (with a q value of 0) that it does not accept application/xhtml+xml. Taken all together, the conditions and the rule state that if we get a request for an .xhtml file and either the user-agent is MSIE or the agent states that it does not accept application/xhtml+xml, then the file should be served with the text/html mime-type.
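
To make the grouping of the conditions concrete, here is a rough shell model of the same decision. It is only a sketch under a few assumptions: served_type is a hypothetical helper (not part of Apache), it takes the request URI as it looks after any directory-index resolution, it uses the POSIX class [[:space:]] in place of \s, and it reports the bare mime-type without any parameters.

```shell
# Hypothetical helper that mimics the RewriteCond/RewriteRule logic above.
# Usage: served_type URI USER_AGENT ACCEPT
served_type() {
  uri=$1
  user_agent=$2
  accept=$3

  # Condition 1: only requests for .xhtml files are candidates.
  case $uri in
    *.xhtml) ;;
    *) echo 'unchanged'; return ;;
  esac

  # Condition 2: the User-Agent header contains "MSIE".
  case $user_agent in
    *MSIE*) echo 'text/html'; return ;;
  esac

  # Condition 3: Accept lists application/xhtml+xml with q=0 (or 0.0, 0.00, ...).
  if printf '%s\n' "$accept" |
     grep -Eq 'application/xhtml\+xml[[:space:]]*;[[:space:]]*q=0\.?0*([[:space:]]|,|$)'
  then
    echo 'text/html'
  else
    # No override: the AddType default applies.
    echo 'application/xhtml+xml'
  fi
}
```

For example, served_type /index.xhtml "" "*/*, application/xhtml+xml;q=0" prints text/html, while the same call with q=0.8 prints application/xhtml+xml.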

Change XHTML extensions

If you need to change some of your file extensions to .xhtml, now is the time to do so. If you're on Unix, Linux, or Mac OS X, and you just want to change all .html files to .xhtml, here's a handy one-liner that will change all files in a directory, recursively:

# Safe for file names that contain spaces; mv -i asks before overwriting.
find base_html_dir/ -type f -name '*.html' -exec sh -c \
'mv -i "$1" "${1%.html}.xhtml"' _ {} \;
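
Before renaming anything, it may be worth previewing what would change. The following is a minimal sketch, assuming a POSIX shell and find; preview_renames is a hypothetical helper name, and it only prints the planned renames without moving any files.

```shell
# Hypothetical helper: list each planned rename under a directory
# without touching any file.
# Usage: preview_renames base_html_dir/
preview_renames() {
  find "$1" -type f -name '*.html' -exec sh -c '
    for f in "$@"; do
      # ${f%.html} strips the trailing .html from the path.
      printf "%s -> %s\n" "$f" "${f%.html}.xhtml"
    done' _ {} +
}
```

If the printed list looks right, run the actual rename.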

Verify Apache Modifications

At this point, all that remains is to restart Apache.
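
It can be worth checking the configuration syntax before restarting, so that a typo does not take the server down. A minimal sketch, assuming your control script is apachectl (some distributions name it apache2ctl or use an init script instead); reload_apache is a hypothetical helper, and it performs a graceful restart rather than a hard one.

```shell
# Hypothetical helper: run a syntax check, and only if it passes,
# ask Apache to restart gracefully.
# Usage: reload_apache [control-script]   (defaults to apachectl)
reload_apache() {
  ctl=${1:-apachectl}
  "$ctl" configtest && "$ctl" graceful
}
```

Run it with sufficient privileges, for example as root.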

If you wish to verify Apache's behavior, you can use curl or a similar tool to make a request with different Accept and User-Agent headers. Here is a brief transcript that shows correct behavior requesting my homepage:

calvins@turing ~ $ curl --head --header "Accept: " --user-agent "" \
protempore.net/~calvins/
HTTP/1.1 200 OK
Date: Mon, 24 Sep 2007 20:47:22 GMT
Server: Apache
Last-Modified: Mon, 24 Sep 2007 08:55:12 GMT
ETag: "1e477f-79b-43addc741bc00"
Accept-Ranges: bytes
Content-Length: 1947
Content-Type: application/xhtml+xml; q=0.8

calvins@turing ~ $ curl --head --header \
"Accept: */*" --user-agent "" \
protempore.net/~calvins/
HTTP/1.1 200 OK
Date: Mon, 24 Sep 2007 20:47:56 GMT
Server: Apache
Last-Modified: Mon, 24 Sep 2007 08:55:12 GMT
ETag: "1e477f-79b-43addc741bc00"
Accept-Ranges: bytes
Content-Length: 1947
Content-Type: application/xhtml+xml; q=0.8

calvins@turing ~ $ curl --head --header "Accept: */*" --user-agent \
"Windows-RSS-Platform/1.0 (MSIE 7.0; Windows NT 5.1)" \
protempore.net/~calvins/
HTTP/1.1 200 OK
Date: Mon, 24 Sep 2007 20:48:11 GMT
Server: Apache
Last-Modified: Mon, 24 Sep 2007 08:55:12 GMT
ETag: "1e477f-79b-43addc741bc00"
Accept-Ranges: bytes
Content-Length: 1947
Content-Type: text/html

calvins@turing ~ $ curl --head --header \
"Accept: */*, application/xhtml+xml;q=0" \
--user-agent "" protempore.net/~calvins/
HTTP/1.1 200 OK
Date: Mon, 24 Sep 2007 20:49:10 GMT
Server: Apache
Vary: Accept
Last-Modified: Mon, 24 Sep 2007 08:55:12 GMT
ETag: "1e477f-79b-43addc741bc00"
Accept-Ranges: bytes
Content-Length: 1947
Content-Type: text/html

calvins@turing ~ $ curl --head --header \
"Accept: */*, application/xhtml+xml;q=0.8" \
--user-agent "" protempore.net/~calvins/
HTTP/1.1 200 OK
Date: Mon, 24 Sep 2007 20:49:16 GMT
Server: Apache
Last-Modified: Mon, 24 Sep 2007 08:55:12 GMT
ETag: "1e477f-79b-43addc741bc00"
Accept-Ranges: bytes
Content-Length: 1947
Content-Type: application/xhtml+xml; q=0.8

calvins@turing ~ $

As you can see, the Content-Type response header is text/html when application/xhtml+xml is present in the Accept request header with a q value of 0, as well as when the user-agent is Internet Explorer. Otherwise, including when the Accept header is empty or is just */*, the content type is application/xhtml+xml, as it should be.
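
When repeating these checks, it may be easier to reduce each response to its Content-Type value. The following is a minimal sketch, assuming a POSIX shell with awk; content_type is a hypothetical helper name that reads raw response headers on standard input.

```shell
# Hypothetical helper: print the value of the first Content-Type header
# found on standard input (header name matched case-insensitively).
content_type() {
  awk 'tolower($1) == "content-type:" {
         sub(/^[^:]*:[ \t]*/, "")   # drop the header name
         sub(/\r$/, "")             # drop the trailing carriage return
         print
         exit
       }'
}

# Example use with curl, following the transcript above:
#   curl --silent --head --header "Accept: */*" --user-agent "" \
#     protempore.net/~calvins/ | content_type
```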