You might have read many times that duplicate content may lead to a penalty, but I will show you how duplicate URLs affect page authority and the preventive measures you can take against them. This article is a case study with examples from some industry-leading websites. Let me break it down from the start.
Duplicate Content vs Duplicate URL
When content appears more than once on the internet, whether on the same website or on different websites, it is known as duplicate content.
When a single webpage can be reached, or fails to load, at multiple URLs within the same website, it is known as a duplicate URL. I say “accessible or inaccessible” because the status of a given URL variation may be live, or it may have been left alone as a 404.
Difference between Duplicate Content and Duplicate URL
They might look the same, but:
- Duplicate content can happen across websites, while duplicate URLs happen within a single website and look almost identical to each other.
- Duplicate content repeats all or part of content that is already available elsewhere on the internet, while duplicate URLs serve exactly the same content within one website, whether that version is accessible or inaccessible.
- Duplicate content can lead to a penalty from Google, while duplicate URLs won’t cause a penalty in most cases; their effect shows up instead as ranking shifts and page authority being split across versions.
How Do Duplicate URLs Affect Ranking?
If you have multiple (duplicate) URLs for a page, Google will choose one URL as the canonical version and crawl it; all other URLs will be considered duplicates and crawled less often.
Duplicate URLs can dilute page authority
URL variations such as http://example.com, https://example.com, http://www.example.com and https://www.example.com all point to the same webpage content. If they are not canonicalized or redirected to one single version, they are known as duplicate URLs.
All of these URLs are treated as different pages. If 10 backlinks point to www.example.com and 12 backlinks point to example.com, the link juice is passed to two different pages instead of all 22 links supporting one.
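To make the 10 + 12 split concrete, here is a minimal Python sketch that counts backlink targets per raw URL and then per normalized URL. The normalization rule (treat http/https, www/non-www and a trailing slash as one page, preferring https://www.) is an assumption for this example only; it naively adds “www.” to any host and is not how search engines actually compute authority, it simply shows how many links each version collects.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Collapse scheme, www and trailing-slash variations into one key.

    Assumption: the preferred version is https://www.<host><path>/.
    """
    # urlsplit only detects the host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased
    if not host.startswith("www."):
        host = "www." + host  # naive: fine for this example, wrong for other subdomains
    path = parts.path.rstrip("/")
    return f"https://{host}{path}/"


# Hypothetical backlink targets matching the example: 10 links to
# www.example.com and 12 links to example.com.
backlink_targets = ["www.example.com"] * 10 + ["example.com"] * 12

per_url = Counter(backlink_targets)
consolidated = Counter(normalize(u) for u in backlink_targets)

print(per_url)       # Counter({'example.com': 12, 'www.example.com': 10})
print(consolidated)  # Counter({'https://www.example.com/': 22})
```

Without consolidation, neither version gets more than 12 links; once one version is chosen (by canonical tag or redirect), all 22 count toward it.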
Even though both pages provide the same information to users, search engines will treat them as different webpages if they are not canonicalized.
This may also count as content duplication, but most sites won’t be flagged for duplication within a single website. Search engines are now smart enough to recognize the duplicate content and rank the most authoritative version, unless the intent behind the duplication is deceptive or manipulative.
I have used simple, free tools to analyze the effect of duplicate URLs and to find them:
- Monitor Backlinks
- Google Sheets
- Merkle Technical SEO tool
- Uptime Robot
Why Should You Worry About Duplicate URLs?
As we already know, duplicate URLs are bad for user experience and might divide the link juice. Let me show you the example of Wappalyzer.
Case 1 – Wappalyzer
Have you ever used wappalyzer.com? Wappalyzer is a cross-platform utility that uncovers the technologies used on websites. It detects content management systems, e-commerce platforms and more.
Duplicate URL: https://wappalyzer.com
Error: Invalid SSL
You might get “This site can’t be reached” or “Connection is not protected”, unless they have resolved the problem by the time you read this article. That’s why I’ve added a screenshot of the error.
But you can normally reach the webpage from a Google search because, as Google states, “it will find the authoritative version and index it”.
Check out the working version – www.wappalyzer.com (it works, doesn’t it?)
URLs are distinct from each other when they differ in scheme (http vs https), and the same applies to the subdomain (www vs no www). So here https://wappalyzer.com acts as a separate version, one that is unreachable and unmaintained.
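You can see why these count as separate URLs by splitting them into parts with Python’s standard `urllib.parse.urlsplit`. This small sketch compares the scheme and host (netloc) of the four scheme/subdomain combinations for wappalyzer.com; it only parses strings and makes no network requests, so it says nothing about which version actually works.

```python
from urllib.parse import urlsplit


def origin(url: str) -> tuple:
    """Return the (scheme, host) pair that distinguishes one URL version from another."""
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc)


variants = [
    "http://wappalyzer.com",
    "https://wappalyzer.com",       # the broken version discussed above
    "http://www.wappalyzer.com",
    "https://www.wappalyzer.com",   # assumed to be the working version
]

for url in variants:
    print(url, "->", origin(url))

# Every combination of scheme and subdomain is a different URL.
origins = {origin(u) for u in variants}
print(len(origins))  # 4
```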
Learn about the URL parts.
Coming back to why you need to worry about duplicate URLs:
Let’s take a look at their backlink profile for just the non-working URL.
That’s 2.4K backlinks from 615 domains, including 13 backlinks from Drupal, 1,808+ from GitHub and more. And that’s just for one URL version.
Whenever a user clicks one of these backlinks, they land on the “unreachable” page, so there may be a huge drop-off of users, including some who might have considered their premium service. This is on top of the lost link juice. By fixing the page, they can improve the user experience, increase traffic and might even see more conversions.
It’s the same for search engine bots: when a bot follows a backlink to the target URL, it records the same error, so page authority is not passed in this case. It doesn’t matter how many backlinks point to this dead page.
Case 2 – BigCommerce
BigCommerce is one of the leading e-commerce solution providers.
Duplicate URL – https://bigcommerce.com/
Error: No status code/Not reachable – Blank page.
The URL doesn’t work, and instead of throwing an error it just shows a blank page. Analyzing it with the technical SEO tool or checking the HTTP headers gives the same result.
Let’s take a look at their backlink profile:
I have been monitoring this for quite a long period. Previously there were around 200K+ backlinks pointing to this URL, but I think the webmasters of the linking websites have since changed them to the working one.
Even so, the duplicate URL still gets link juice, and many websites still point to it, including Neil Patel’s blog. Even if they don’t use this URL for marketing or don’t build backlinks to it, fixing it would be a great choice, because their business is based online.
And the sad part is that one of their premium themes (Peak by Pixel Union), which costs $195, has the unreachable duplicate URL in its footer.
Case 3 – Teem
This case is quite different from the other websites mentioned above: they don’t seem to have a duplicate URL, but rather an error in the canonical URL.
URL – https://www.teem.com/
Error – Canonical URL
If you dig into their source code, or use an extension like MozBar or something similar, you can see the canonical URL of the webpage. I found it using my simple solution, which is attached at the end of the article.
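If you would rather check the source code programmatically, the minimal sketch below uses Python’s standard `html.parser` to collect every `<link rel="canonical">` href and flag one that points away from the page itself. The sample HTML, the page URL and the Google+ profile URL are hypothetical placeholders, not Teem’s actual markup.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags land here too.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; attribute values can be None.
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page head where the canonical tag points to a social profile.
sample = """
<html><head>
  <link rel="publisher" href="https://plus.google.com/+ExampleBrand">
  <link rel="canonical" href="https://plus.google.com/+ExampleBrand"/>
</head><body></body></html>
"""
page_url = "https://www.example.com/"  # hypothetical URL of the page being checked

for href in find_canonicals(sample):
    if href.rstrip("/") != page_url.rstrip("/"):
        print("Canonical points elsewhere:", href)
```

Note that the `rel="publisher"` link is ignored; only the canonical tag tells search engines which URL is the preferred version.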
There might be confusion between rel="publisher" and rel="canonical", but search bots will read the tag as declaring the Google+ profile to be the canonical version of the page.
How to fix it?
Link juice is split across distinct URLs even when they serve the same page. You have two options to fix it.
Canonical vs Redirect – You can canonicalize the other URLs to one single version, but before canonicalizing, you must fix those pages so they don’t return any error to end users and bots. If you have duplicate URLs within your website, I suggest 301-redirecting them to one single version so you won’t have multiple versions to monitor all the time.
Prefer a canonical tag if it’s a case of duplicate content.
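The redirect option boils down to one rule: every duplicate version answers with a 301 pointing at the single preferred URL. Here is a minimal Python sketch of that decision, assuming a hypothetical site whose preferred version is https://www.example.com; in practice you would configure this in your web server, CDN or CMS rather than in application code. The path is kept as-is, so trailing-slash handling is left to the site.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"           # assumption: the site is served over HTTPS
PREFERRED_HOST = "www.example.com"   # hypothetical preferred host
DUPLICATE_HOSTS = {"example.com", "www.example.com"}


def redirect_target(url: str):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in DUPLICATE_HOSTS:
        return None  # not one of this site's host variations
    # Keep the path and query string; drop any fragment (browsers never send it).
    target = urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, ""))
    return None if target == url else target


for url in ["http://example.com/pricing", "https://example.com/?ref=blog", "https://www.example.com/"]:
    print(url, "->", redirect_target(url) or "no redirect (preferred version)")
```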
Now you’re worried about duplicate URLs, right?
How to find the Duplicate URL?
You can find the status of each URL simply by typing it into the browser, but I have created a simple worksheet with the URL parts separated that can help you find duplicate URLs much more easily. Please feel free to make a copy of it.
Get a copy from here – Duplicate URL checker
The sheet is built with the possible misleading variations of a webpage URL. Just enter the domain name in cell C2, then drag it down to the last cell. The sheet will automatically give you the status code, redirected URL and canonical URL of each variation. You can also change the subdomain or filename to match the website.
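If you prefer a script to the sheet, here is a minimal Python sketch of the same idea using only the standard library. It builds scheme × subdomain × trailing-slash variations of a page and requests each one without following redirects, so you see each URL’s own status code and Location header. The exact variation list is an assumption and may differ from the sheet; it does not check canonical tags, and `report` is a hypothetical helper that makes real network requests when called.

```python
import http.client
import itertools
import urllib.error
import urllib.request


def url_variations(domain: str, page: str = "") -> list:
    """Build scheme x subdomain x trailing-slash variations of a page URL.

    page is a path such as "about"; an empty page means the homepage,
    where a trailing slash makes no difference, so only "/" is used.
    """
    page = page.strip("/")
    paths = [f"/{page}", f"/{page}/"] if page else ["/"]
    return [
        f"{scheme}://{sub}{domain}{path}"
        for scheme, sub, path in itertools.product(("http", "https"), ("", "www."), paths)
    ]


class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None stops urllib from following 3xx responses; they raise HTTPError instead.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


_opener = urllib.request.build_opener(NoRedirect)


def check(url: str, timeout: float = 10):
    """Return (status code or error text, Location header) for a single URL."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:  # 3xx, 4xx and 5xx responses
        return err.code, err.headers.get("Location")
    except (OSError, http.client.HTTPException) as err:  # DNS, SSL, timeouts, empty replies
        return f"error: {err}", None


def report(domain: str, page: str = "") -> None:
    for url in url_variations(domain, page):
        status, location = check(url)
        print(url, status, location or "")
```

Calling `report("example.com", "about")` prints one line per variation; an invalid SSL certificate or a blank response with no status line shows up as an error string rather than a number.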
It is good if a webpage returns a “200” status code for only two variations: one with a trailing slash (/) at the end and one without.
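The rule of thumb above can be expressed as a small check over your results. This Python sketch assumes you already have a mapping from each URL variation to its status code, or to an error string for unreachable versions; the sample data is hypothetical and mimics a site with a broken HTTPS non-www version. A result counts as healthy when no variation fails and at most two variations return 200, and those two differ only by the trailing slash.

```python
def summarize(results: dict):
    """Split variations into those answering 200 and those that fail outright."""
    ok = [url for url, status in results.items() if status == 200]
    broken = [
        url for url, status in results.items()
        if not isinstance(status, int) or status >= 400  # error strings or 4xx/5xx
    ]
    return ok, broken


def looks_healthy(results: dict) -> bool:
    ok, broken = summarize(results)
    if broken or len(ok) > 2:
        return False
    # The 200s should be one URL with and without a trailing slash.
    return len({url.rstrip("/") for url in ok}) == 1


# Hypothetical results: everything redirects except a broken HTTPS non-www version.
sample = {
    "http://example.com/about": 301,
    "http://example.com/about/": 301,
    "http://www.example.com/about": 301,
    "http://www.example.com/about/": 301,
    "https://example.com/about": "error: invalid SSL certificate",
    "https://example.com/about/": "error: invalid SSL certificate",
    "https://www.example.com/about": 200,
    "https://www.example.com/about/": 200,
}

print(looks_healthy(sample))  # False: two variations are unreachable
fixed = {url: 301 if isinstance(status, str) else status for url, status in sample.items()}
print(looks_healthy(fixed))   # True: only the trailing-slash pair answers 200
```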
Let me know in the comment section if you find any other websites with this problem; I will add them here along with your LinkedIn profile. And subscribe to my newsletter to stay posted.