Inguza Technology AB

technology, analysis and solutions

URL canonicalization

One vital part of search engine optimization (SEO) is ensuring that each page of information can be reached through just one URL.

This is harder to achieve than most people think. Consider a normal HTML page with the following path:

/somelink/index.html

This page can actually be reached through three different URLs:

http://somehost/somelink/index.html
http://somehost/somelink
http://somehost/somelink/

If the site also supports encryption (https), the page can be reached through three more:

https://somehost/somelink/index.html
https://somehost/somelink
https://somehost/somelink/
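
To make the duplication concrete, here is a minimal Python sketch that collapses the six variants into one canonical form. The function name canonical_url and its rules (force one scheme, collapse repeated slashes, drop index.html, add a trailing slash to directory-style paths) are assumptions chosen for illustration, not a complete normalization algorithm.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url, scheme="http"):
    """Collapse common variants of a URL into one form (illustrative rules only)."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)        # collapse repeated slashes
    path = re.sub(r"/index\.html?$", "/", path)     # drop the directory index file
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment and "." not in last_segment:    # directory-style path, no trailing slash
        path += "/"
    if not path:
        path = "/"
    # Assumption: the http form is the canonical one; pass scheme="https" to prefer https.
    return urlunsplit((scheme, parts.netloc.lower(), path, parts.query, ""))

variants = [
    "http://somehost/somelink/index.html",
    "http://somehost/somelink",
    "http://somehost/somelink/",
    "https://somehost/somelink/index.html",
    "https://somehost/somelink",
    "https://somehost/somelink/",
]
unique = {canonical_url(u) for u in variants}
print(unique)  # {'http://somehost/somelink/'}
```

All six addresses reduce to a single URL, which is the one a search engine should see.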

There are several techniques to avoid this.

Meta tag

The most important technique is to declare one canonical URL for the page in its HTML head. Although it is often called a meta tag, the declaration is a link element. The HTML syntax is like this:

<html>
...
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
<link rel="canonical" href="http://somehost/somelink/" />
...
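
One way to check that a page really declares a single canonical URL is to parse it and collect every canonical link element. The sketch below uses Python's standard html.parser module; the class CanonicalLinkFinder, the helper find_canonical, and the sample page are hypothetical names and data made up for this example.

```python
from html.parser import HTMLParser

class CanonicalLinkFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here by default.
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.hrefs.append(attributes.get("href"))

def find_canonical(html_text):
    """Return the list of canonical URLs declared in html_text."""
    finder = CanonicalLinkFinder()
    finder.feed(html_text)
    return finder.hrefs

# Hypothetical sample page for illustration.
page = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="canonical" href="http://somehost/somelink/" />
</head>
<body></body>
</html>"""

print(find_canonical(page))  # ['http://somehost/somelink/']
```

A result with zero entries, or with more than one, suggests the page needs attention.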

URL rewriting

Redirect index.html to /

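With Apache mod_rewrite, you can do it like this:

Alternative 1

# Matches both index.html and index.htm; the dot is escaped and the pattern is anchored at the end.
RewriteRule ^(.*)/index\.html?$ $1/ [R=301,L]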

Alternative 2

# Same redirect, but capturing from the full request path in a condition.
RewriteCond %{REQUEST_URI} ^(.*)/index\.html?$
RewriteRule ^ %1/ [R=301,L]
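
Before deploying a rule like this, it can help to try the regular expression against a few sample paths. The sketch below mirrors the pattern in Python's re module, with the dot escaped and the end anchored. The function redirect_index is a hypothetical helper, and it only approximates the matching step, not Apache's full request handling.

```python
import re

# Python mirror of the index redirect pattern: capture everything before /index.html or /index.htm.
INDEX_RULE = re.compile(r"^(.*)/index\.html?$")

def redirect_index(path):
    """Return the 301 target for a request path, or None when the rule does not match."""
    match = INDEX_RULE.match(path)
    return match.group(1) + "/" if match else None

print(redirect_index("/somelink/index.html"))  # /somelink/
print(redirect_index("/index.htm"))            # /
print(redirect_index("/somelink/"))            # None
```

Escaping the dot matters: without it, a path such as /somelink/indexXhtml would also be redirected.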

Redirect somelink to somelink/

With Apache mod_rewrite, you can do it like this:

RewriteCond %{REQUEST_URI} ^((?:.*/)?[^/\.]+)$
RewriteRule ^.*$ %1/ [L,R=301]
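
The condition above only matches paths whose last segment contains no dot and no trailing slash, so files such as index.html are left alone. A small Python sketch can confirm this on sample paths. The function redirect_add_slash is a hypothetical helper that reuses the same regular expression; it does not model the rest of Apache's processing.

```python
import re

# Same expression as the RewriteCond: optional directory prefix, then a last segment
# made only of characters that are neither "/" nor ".".
SLASH_COND = re.compile(r"^((?:.*/)?[^/\.]+)$")

def redirect_add_slash(path):
    """Return the 301 target with a trailing slash, or None when the condition fails."""
    match = SLASH_COND.match(path)
    return match.group(1) + "/" if match else None

print(redirect_add_slash("/somelink"))             # /somelink/
print(redirect_add_slash("/somelink/"))            # None
print(redirect_add_slash("/somelink/index.html"))  # None
```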