Canonical Tag


A canonical tag, also known as a canonical link or "rel canonical," is a tag in the source code of a page that indicates to search engines that a master copy of the page exists. Canonical tags are used in SEO to help search engines index the correct URL and avoid duplicate content. 

What is a canonical tag?

A canonical tag is an HTML link element in the head section of a page's source code. It points to the master page, the canonical URL, for pages with the same or similar content. If a canonical URL is correctly marked, search engines will generally index only this URL, so duplicate content issues can be avoided.


Canonical URL example

In general, when indicating a canonical URL, Google recommends absolute URLs, i.e. the entire URL including the protocol.

The following two URLs have the same content.

https://www.example.com/example.htm
https://www.example.com/examplepage/?session_id=xyz.htm

The first one is the standard resource, and the second one carries a session ID, as commonly used by online shops to store user-related data such as items in the shopping cart.

As the first URL is more important, this should become the canonical version, and the canonical tag should be integrated into the head element of the second URL to refer to the first page.

This will indicate to Google and other search engines that the first URL is more important and that it is the version that should be indexed and shown in the SERPs.

In this example, the canonical tag is placed in the metadata of the second URL and should look like this:

<link rel="canonical" href="https://www.example.com/example.htm" />
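
To verify a tag like this programmatically, the head of a page can be parsed and its canonical link elements collected. The following is a minimal sketch using Python's standard html.parser module. The names CanonicalParser and find_canonicals, as well as the sample page, are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects href values of <link rel="canonical"> elements inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            rel_values = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_values and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html):
    """Return all canonical hrefs declared in the head of the given HTML."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonicals


# Hypothetical page served under the session URL from the example above.
page = """<html><head>
<title>Example</title>
<link rel="canonical" href="https://www.example.com/example.htm" />
</head><body><p>Content</p></body></html>"""

print(find_canonicals(page))  # ['https://www.example.com/example.htm']
```

A page should yield exactly one entry. An empty list means that no canonical tag was found in the head, and more than one entry means that the tag was declared repeatedly.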

When to use a canonical tag

It's important to use a canonical tag when the same or very similar content exists on more than one URL. Here are some common use cases for canonical tags:

  • The homepage can be reached from different URLs (for example www.domain.com, domain.com, www.domain.com/index.html, and so on)
  • Pages can be reached with and without trailing slashes (“/”) or with different letter casing in the URL
  • Because of URL rewriting, the server only evaluates one ID and accepts variations of the rest of the address
  • IDs (such as session IDs or product filters) are used that don’t change the content
  • Content is presented in different versions (e.g. print version, PDF, etc.)
  • There are HTTPS variants of the site
  • The URL is still available under an HTTP version without SSL encryption
  • The same content is also published on other, external websites
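
Several of these cases can be reduced to one preferred URL by a simple normalization rule. The sketch below is one possible approach, not a recommendation from Google. The preferred host, the list of ignored parameters, and the list of index files are assumptions for a hypothetical site, and the function expects URLs that include the protocol.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed policy for a hypothetical site; adjust every value to the real setup.
PREFERRED_HOST = "www.example.com"
BARE_HOST = "example.com"
IGNORED_PARAMS = {"session_id"}          # parameters that do not change the content
INDEX_FILES = {"index.html", "index.htm", "index.php"}


def preferred_url(url):
    """Map a URL variant to the URL that should be used as the canonical."""
    parts = urlsplit(url)

    # Host names are case-insensitive; fold the bare domain into the www host.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST

    # Treat /index.html and similar files as the directory itself.
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment in INDEX_FILES:
        path = path[: -len(last_segment)]

    # Drop parameters that only identify the session.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]

    # Always prefer the HTTPS version and discard the fragment.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


print(preferred_url("http://example.com/index.html"))
# https://www.example.com/
print(preferred_url("https://www.example.com/shoes?session_id=xyz&page=2"))
# https://www.example.com/shoes?page=2
```

The value returned by such a function is what would be written into the href of the canonical tag on each variant.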

Canonical tag best practices

  • Canonical tags & pagination: When paginating websites with rel="next" and rel="prev", each page should refer to itself via canonical, or there should be a "view-all" page where all products are visible in one overview. This ensures that pages linked to from those paginated pages can also be found by search engine crawlers.
  • Canonical tags & hreflang: If a website uses hreflang, the URLs should either refer to themselves with a canonical tag or should not use a canonical tag at all. If a page with hreflang annotations carries a canonical tag pointing to a different language version, Google receives conflicting signals. While the hreflang tag shows that there is another language version available, the canonical tag would declare that other version to be the original URL.
  • Canonical tags & Noindex: With the noindex tag, webmasters can convey to Google that a URL should not be indexed. If a canonical tag refers to this page, Google receives unclear signals, as a canonical URL is a relevant page a webmaster wants to be indexed. Webmasters should therefore decide between a canonical and noindex tag.
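
The hreflang and noindex rules above can be expressed as a small check on the metadata of a single page. This is only a sketch: the function name conflicting_signals and its parameters are hypothetical, and the values are assumed to have been extracted from the page beforehand.

```python
def conflicting_signals(page_url, canonical=None, robots_meta="", has_hreflang=False):
    """Return a list of conflicts between canonical, noindex and hreflang.

    page_url     -- URL of the page being checked
    canonical    -- href of its canonical tag, or None if there is none
    robots_meta  -- content of the robots meta tag, e.g. "noindex, follow"
    has_hreflang -- True if the page carries hreflang annotations
    """
    problems = []
    directives = {d.strip().lower() for d in robots_meta.split(",") if d.strip()}

    # A page should carry either noindex or a canonical tag, not both.
    if canonical is not None and "noindex" in directives:
        problems.append("noindex combined with a canonical tag")

    # With hreflang, the canonical must be self-referencing or absent.
    if has_hreflang and canonical is not None and canonical != page_url:
        problems.append("hreflang page canonicalizes to a different URL")

    return problems


print(conflicting_signals(
    "https://www.example.com/de/",
    canonical="https://www.example.com/en/",
    has_hreflang=True,
))
# ['hreflang page canonicalizes to a different URL']
```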

Common canonical tag errors

Canonical tags are powerful. However, if applied incorrectly, websites or certain pages of a website may be completely ignored by Google, which could be a disaster for traffic and sales. Before implementing a canonical tag pointing to another page, a webmaster should first decide whether the content is in fact the same and then familiarize themselves with the common canonical tag errors.  

Common errors include: 

  • A canonical URL responds with a 404 status code. Canonical URLs must always be available, as a 404 error will confuse the crawler.
  • Combining “noindex”, “disallow” or “nofollow” directives with canonical URLs is not recommended.[1]
  • The canonical link element is placed in the body of a document or appears more than once. It belongs in the head and should be specified only once per page.
  • A relative path is specified as a canonical link element. This may cause the Googlebot to misinterpret the tag, so that it loses its effect. For this reason, the link should always be specified as a complete URL in the canonical tag.
  • The exact syntax is ignored. Every character counts when specifying the URL: it makes a difference whether the canonical tag refers to https://page.com/ or https://page.com. The protocol matters as well. Google announced HTTPS as a ranking signal in 2014 and prefers HTTPS pages over equivalent HTTP pages as canonical URLs.[2] The canonical tag should therefore point from the HTTP version to the HTTPS page, not vice versa.
  • The canonical tags of all pages refer to the homepage of the website. This is incorrect, as it signals that every page is a duplicate of the homepage.
  • Canonical chains or cross-references: Incorrect use of the canonical tag results in canonical chains (page A points to B, which points to C) or cross-references (A and B point to each other). The target page of a canonical link should not itself point to another canonical URL.
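
Some of these errors can be detected from crawl data without fetching anything again. The sketch below assumes a hypothetical dictionary that maps each crawled URL to the href of its canonical tag. It flags relative paths, canonicals that point from HTTPS to HTTP, and chains or cross-references. It does not check status codes such as 404, which would require a request to each target.

```python
from urllib.parse import urlsplit


def audit_canonicals(canonical_of):
    """Return (page, issue) pairs for a mapping of page URL -> canonical href."""
    issues = []
    for page, target in canonical_of.items():
        target_parts = urlsplit(target)

        # A canonical without protocol or host is a relative path.
        if not target_parts.scheme or not target_parts.netloc:
            issues.append((page, "relative canonical URL"))
            continue

        # The canonical should never downgrade from HTTPS to HTTP.
        if urlsplit(page).scheme == "https" and target_parts.scheme == "http":
            issues.append((page, "canonical points from HTTPS to HTTP"))

        # The target of a canonical should be self-referencing.
        if target != page:
            next_target = canonical_of.get(target)
            if next_target is not None and next_target != target:
                kind = "cross-reference" if next_target == page else "canonical chain"
                issues.append((page, kind))
    return issues


# Hypothetical crawl result.
crawl = {
    "https://www.example.com/a": "https://www.example.com/b",
    "https://www.example.com/b": "https://www.example.com/c",
    "https://www.example.com/c": "https://www.example.com/c",
    "https://www.example.com/d": "/c",
    "https://www.example.com/e": "http://www.example.com/c",
}

for page, issue in audit_canonicals(crawl):
    print(page, "->", issue)
# https://www.example.com/a -> canonical chain
# https://www.example.com/d -> relative canonical URL
# https://www.example.com/e -> canonical points from HTTPS to HTTP
```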

How to check canonical tags

To avoid checking the source code of every page on your website, it's best to run a crawl using free software like Ryte to check the canonical tags across your entire website all at once.

Ryte has a full report dedicated to canonical tags that quickly provides an overview of potential problems with all the canonical tags implemented on your website and allows you to drill down into canonical status codes, usage, and target categorization.  

Importance for SEO

Canonical tags tell search engines that there is a standard resource, or more relevant page, helping to resolve issues with similar or duplicate content.

Duplicate content has a negative impact on SEO and can be detected using a duplicate content checker or by running a crawl with free crawling technology like Ryte. 

References

  1. Mueller (Google) regarding the combination of noindex and canonical. reddit.com. Accessed on November 28, 2018.
  2. General Guidelines for All Canonicalization Methods. support.google.com. Accessed on November 28, 2018.

Web Links