The Express newspaper has cocked up its implementation of the rel=canonical tag SO BADLY that it has created an infinite number of duplicate webpages, many of which now have links from elsewhere on the internet.
Using rel = canonical properly
You use the rel=canonical tag to tell Google that a given URL is actually a version of another URL, and that the search engine should treat the first URL as if it were that second, main URL.
It’s useful if you have multiple copies of a page in different directories, have lots of versions of the same page (e.g. because WordPress makes two versions of every page), or allow anyone to rewrite your URLs so it looks like you’re insulting Pippa Middleton’s sister.
Make a mistake with rel=canonical, however, and it can wipe your website off the face of the internet.
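If you want to check what a page is declaring as its canonical URL without squinting at view-source, something like the following rough sketch would do it. It uses only Python's standard-library HTML parser; the `CanonicalFinder` and `find_canonical` names and the example.com page are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so split before comparing
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonical(html):
    """Return a list of all canonical hrefs declared in the HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page (not a real site) declaring its main URL
page = """
<html><head>
<title>Example</title>
<link rel="canonical" href="http://example.com/story/123">
</head><body></body></html>
"""
print(find_canonical(page))
```

A healthy page should give you exactly one URL back, and fetching that URL should give you the same one again.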
Using rel = canonical to make infinite URLs
The Express site’s CMS is creating a duplicate version of every single page via the rel=canonical tag. And then a 3rd version, and then a 4th … and it’s never stopping until it gets to infinity.
Take a sample page like this one: http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system
If you look at the HTML code, you can find:
<link rel="canonical" href="http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NO">
The CMS has miscoded the canonical URL to include the first bit of the URL relating to the individual page (the AV-referendum-Why-we-must-vote-NO bit) twice.
If you visit that supposedly canonical URL, you see this, with the page-specific bit in there three times.
<link rel="canonical" href="http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NO">
Go to that URL, and you find it there 4 times. Etc.
I got bored at http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NO
but this will never stop. Each time you visit the canonical URL, a new canonical URL is created.
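To see why this never settles, here is a small simulation. I have no idea what the Express CMS code actually looks like, so `buggy_canonical` is purely a guess that reproduces the observed behaviour: it takes whatever URL was requested and sticks the slug fragment on the end. The `follow_canonicals` helper is likewise invented for this sketch.

```python
BASE = "http://www.express.co.uk/features/view/244786/"
SLUG = "AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system"
# The fragment that shows up one extra time on every hop
FRAGMENT = "AV-referendum-Why-we-must-vote-NO"


def buggy_canonical(requested_url):
    """Assumed bug: the canonical is built from whatever URL was requested."""
    return requested_url + FRAGMENT


def follow_canonicals(start_url, canonical_for, max_hops=10):
    """Follow canonical links until one points at itself, or give up."""
    chain = [start_url]
    for _ in range(max_hops):
        next_url = canonical_for(chain[-1])
        if next_url == chain[-1]:
            return chain, True  # the chain settled on a self-referencing URL
        chain.append(next_url)
    return chain, False  # still producing new URLs when we stopped


chain, settled = follow_canonicals(BASE + SLUG, buggy_canonical, max_hops=5)
for url in chain:
    # Number of copies of the fragment, and total URL length
    print(url.count(FRAGMENT), len(url))
print("settled:", settled)
```

A sane canonical points at itself once you reach it, so the chain stops after one hop. This one just keeps getting longer for as long as you care to follow it.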
All these URLs are working pages because the Express only looks at the number in the URL to decide what content to show. So http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system is the same as http://www.express.co.uk/features/view/244786/vote-YES is the same as http://www.express.co.uk/features/view/244786/who-exactly-specced-this-CMS.
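If only the number matters, the obvious way out is to build the canonical from the number too, and ignore whatever slug happened to be in the request. The sketch below assumes a lookup from story ID to the one true slug; `STORIES`, `story_id` and `canonical_url` are hypothetical stand-ins, not anything taken from the Express site.

```python
import re
from urllib.parse import urlsplit

# Hypothetical lookup table standing in for the CMS database
STORIES = {
    244786: "AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system",
}

# Section, then the numeric ID; anything after the ID is ignored
PATH_RE = re.compile(r"^/(features|posts)/view/(\d+)(?:/.*)?$")


def story_id(path):
    """Return the numeric story ID from a path, or None if it doesn't match."""
    match = PATH_RE.match(path)
    return int(match.group(2)) if match else None


def canonical_url(path):
    """Build the canonical from the ID and the stored slug, never the request."""
    match = PATH_RE.match(path)
    if not match:
        return None
    section, sid = match.group(1), int(match.group(2))
    slug = STORIES.get(sid)
    if slug is None:
        return None
    return f"http://www.express.co.uk/{section}/view/{sid}/{slug}"


paths = [
    "/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system",
    "/features/view/244786/vote-YES",
    "/features/view/244786/who-exactly-specced-this-CMS",
]
for path in paths:
    print(story_id(path), canonical_url(path))
```

Every variant, made up or not, now points at the same URL, and that URL points at itself.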
Dozens of URLs for each Express story
Sometimes these duplicate canonical URLs aren’t in Google’s index (I guess as each one is cancelled out by the next one), although you can find them. This search, for instance, has this URL showing up: http://www.express.co.uk/posts/view/242092/DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-DEBATE-Is-Britain-a-soft-touch-for-benefit-spongers-
Even worse, the first URL that appears for that search is the printable URL of the page with no adverts on!
And as that search, with 55 results, reveals, the Express has a massive problem with duplicate content.
The Express then makes the problem even worse …
This is a problem it makes worse via its use of Tynt to add URLs when you copy and paste content. So if you copy and paste the first sentence from this URL: http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-system, what you end up with is this:
“BY the time you read this you will have probably already voted No to AV in today’s referendum.
Read more: http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NO#ixzz1LW2s00ge”.
The Express uses Tynt to add the read more bit and the URL to what you’ve copied.
But, yes, the code they are adding contains the wrong URL with two versions of the page slug. Follow that link and copy a sentence and you end up with this:
“BY the time you read this you will have probably already voted No to AV in today’s referendum.
Read more: http://www.express.co.uk/features/view/244786/AV-referendum-Why-we-must-vote-NO-to-the-new-voting-systemAV-referendum-Why-we-must-vote-NOAV-referendum-Why-we-must-vote-NO#ixzz1LW31La3e”
Yup, another new URL created by the system that’s designed to channel links to the main story.
You can see this in action on this page on the Daily Mail where someone has copied the opening para from some other batshit story, and the Tynt URL points to http://www.express.co.uk/posts/view/244206/EU-wants-to-merge-uk-with-franceEU-wants-to-merge-uk-with-franceEU-wants-to-merge-uk-with-france#ixzz1LCIcD5jI.
This might explain why the Express can’t rank in first place for a paragraph from its own story.
To sum up
The Express isn’t appearing top of Google’s results for searches using its own content, and Google is serving up versions of its pages with no adverts on. That’s all because Google can’t work out which page is the correct one: the Express constantly points to yet another URL for every single page, even the made-up ones.