Avoiding duplicate content by noindexing the search label and mobile pages on your Blogger blog is very important if you don't want to suffer the so-called Google Panda algorithm hit.
Duplicate content is probably the worst enemy of every blogger who practices SEO.
Yes, I have read many times, including from Matt Cutts, that duplicate content won't hurt. But the bitter reality we suffer from duplicate content is the opposite of what he says: duplicate content does hurt our sites.
The easiest way to know whether your blog or website has duplicate content is to search for your site with this query: site:www.yourblog.com. Google will then display the number of indexed pages.
Compare that number of results with the number of articles you have actually published.
For example, if you have written about 500 articles but Google displays 1,000 results, the other 500 are duplicate content.
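To make this concrete, here are two example queries typed straight into Google's search box (replace the domain with your own; the /search/label path is just one common source of the surplus, as covered later in this post):

site:www.yourblog.com
site:www.yourblog.com/search/label

The first counts everything Google has indexed for the site; the second counts only the label pages.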
My Google Panda penalty experience.
In early 2014, my technology blog got hit by Panda for the first time. It was a bad hit that dropped my traffic by almost 100%. Below is the screenshot of the Panda penalty my blog suffered.
I usually got 8 or 9 thousand visitors every day, but starting January 11, 2014, my traffic began to drop and kept dropping, down to 157 visitors a day.
Can you imagine how unpleasant the situation was for me? I had to make some precise fixes to bring the traffic back to its original level.
And now the blog's traffic is gradually increasing; it is not yet at the level I am targeting, but it is improving over time.
What did I do to recover from the Google Panda hit? The answer is shared here, so just keep reading.
What is duplicate content?
In the realm of blogging, duplicate content simply means a page that has more than one URL. For example, suppose your article has two URLs like this:
- http://yourname.com/a-article.html
- http://yourname.com/a-article.html?m=1
Both URLs point to the same content page, which makes it hard for search engines like Google to determine which URL is the original and should be shown on the result page.
Panda algorithm
Google uses the so-called Panda algorithm to evaluate the blogs indexed on its servers. Panda's job is to internally and routinely scan those blogs and decide whether they are important (and deserve good rankings) or just a bunch of paltry blogs.
Duplicate content is one of the factors the Panda algorithm uses to decide whether a blog should be ranked well or dropped as low as possible.
And of course, Panda receives all its data from Google's robots (bots), which regularly crawl and index our blogs. Those bots can make many mistakes while crawling and indexing.
Did you know that Google's bots can make mistakes? Yes, they can make a destructive one: still indexing no-index pages, even though we have set a disallow rule in robots.txt or a robots meta tag in the head area.
Those Google bots then hand the indexed data to Panda, which examines it and decides whether to raise or topple the rankings of the blogs in question.
The Google Panda penalty is a domain-level penalty. That means if one, two, three, or four of your articles get hit by Panda, your entire domain bears the consequence: it is punished as a whole, and your rank can drop to the level of your worst nightmare.
Panda targets...
The main areas this algorithm targets are all on the internal side of your blog. They are:
- Duplicate content among your posts and URLs, or copied from other blogs.
- Low-quality blogs and articles.
- Content-farm blogs.
- Articles that are too short; less than 50 words can trigger a Panda hit (they say).
- Placing many advertisements could raise Panda's eyebrows too.
But in this post I just want to focus on the duplicate content that happened on my blog (not this blog, the other one).
So if you feel you have low-quality articles, articles that are too short, or too many advertisements, then you know what to do.
The following are my quick tips for dealing with thin content and advertisements.
Deal with articles that are too short or thin.
Edit those overly short articles or remove them from your blog; yes, remove them if you think that's the better solution.
Place three ads at most.
As for advertisements, be sure to place no more than three ads on a single page.
And again, this post focuses only on duplicate content created by URLs.
So before these mistakes kill your blog entirely, it's time to direct the bots to the right pages and keep them away from the pages that shouldn't be indexed.
No! A canonical URL doesn't help much, at least not for me!
People routinely suggest using URL canonicalization so that Google's bots will take only the canonical URL as the reference.
Yet the non-canonical URLs are still indexed and still appear on Google's result pages, duplicating the real content at the canonical URL.
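For reference, a canonical tag in a Blogger template is typically a single line inside <head> like the sketch below; data:blog.canonicalUrl is the data tag Blogger exposes for the clean form of the current page's URL (your template may already include something like it):

<link expr:href='data:blog.canonicalUrl' rel='canonical'/>

Even with this tag in place, my non-canonical URLs kept showing up in the index, which is why I moved on to the noindex approach below.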
In Blogger, we are provided with labels to categorize our articles, our posts are archived monthly, and there are mobile-friendly URLs for optimizing the layout for smaller devices.
But that is the main issue: the URLs from labels, archives, and mobile pages are all indexed, and those unnecessary index entries put many useless pages on Google's search result pages.
As a result, they are marked as duplicate content by the Panda algorithm.
It is time to put an end to the misery and exclude the search label URLs, mobile URLs, and archive URLs from Google's index.
In this post you will find what I did to recover from the Google Panda algorithm hit that left my blog dying and in need of care.
Recover from a Panda algorithm hit by removing duplicate content from Google's result pages for Blogger.
In this section you will find the steps I performed, one after another, to optimize a number of areas in my Blogger blog. All of the steps relate to removing duplicate content; in Blogger, the areas that can create it are the search label, archive, and mobile URLs.
How to noindex unnecessary URLs to avoid duplicate content in Blogspot
We will be dealing with the search label, archive, and mobile (?m=) URLs. These kinds of URLs duplicate much of the wording of the real posts, which are the pages that should be indexed and ranked as high as possible on Google's result pages.
Noindex and remove Blogspot search label URLs
Blogger's search label URLs can contribute a considerable amount of duplicate content if your blog has many labels.
It is recommended that these kinds of URLs not be indexed if you suffer from content duplication and are currently hit by Panda.
Use robots.txt.
To noindex them, you can use robots.txt and place a rule like this (note the path is /search/label/, singular, which is how Blogger builds its label URLs):
Disallow: /search/label/
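For context, a complete custom robots.txt for Blogger (pasted under Settings -> Search Preferences -> Custom robots.txt) might look roughly like the sketch below; the Sitemap line assumes Blogger's default sitemap location, so adjust the domain to yours:

User-agent: *
Disallow: /search/label/
Allow: /

Sitemap: http://www.yourblog.com/sitemap.xml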
Use a meta tag.
Aside from using robots.txt, you can also place a meta tag to tell Google's robots not to index the search label pages and their URLs. Inside <head>, paste the following tag.
<b:if cond='data:blog.searchLabel'>
<meta content='noindex,nofollow' name='robots'/>
</b:if>
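If you have never edited a Blogger template before: the snippet goes in Template -> Edit HTML, pasted right after the opening <head> tag (back up your template first). The same placement applies to the archive and mobile snippets later in this post.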
See the image below for an example.
Remove search label URL from Google
You can also remove the already indexed search label URLs from Google by using the Remove URLs tool in Google Webmaster Tools.
- Find a Blogger search label URL that is already indexed (do so by typing this into Google: site:http://yourblogname.com/search/label).
- Then log in to your Google Webmaster Tools account and pick your blog from the list.
- Go to Google Index -> Remove URLs, paste the Blogger search label URL into the provided form (Create a new removal request), and hit Continue.
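As a hypothetical example, the URL you would paste into the removal form looks something like this (the label name here is made up):

http://yourblogname.com/search/label/SEO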
Now all you need to do is wait for Google to update its index and exclude all the indexed search label URLs from its result pages.
Noindex and remove Blogspot archive URLs from Google
Archives let our readers find the articles published at certain times: monthly, weekly, or yearly.
The idea of archiving our posts is good, but since Google's robots are like a three-year-old boy, they hardly differentiate between an archive URL and the real URL; they index them all and mark them as duplicate content, and that's bad.
Thus it is better to keep these URLs out of the sight of Google's bots by setting them to noindex and removing them all from Google's index.
Use a meta robots tag.
You can disallow Google from indexing the archives by simply pasting the following lines into your template's <head>.
<b:if cond='data:blog.pageType == "archive"'>
<meta content='noindex,nofollow' name='robots'/>
</b:if>
See the image as a reference.
Use header tags
You can also use Blogger's own Custom Robots Header Tags to noindex the archive URLs that can create duplicate content for your individual posts. You can set it just like in the following picture.
(Settings -> Search Preferences).
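In case the screenshot doesn't display, the configuration I applied was roughly the following; the option names come from Blogger's Custom Robots Header Tags panel, so treat the exact labels as approximate:

- Home page: all
- Archive and search pages: noindex
- Default for posts and pages: all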
Remove archive URLs from the Google index.
If your archive URLs are indexed by Google and shown on the search result pages, you should remove them from Google's results the same way you removed the search label URLs through Google Webmaster Tools (see the image above).
Noindex and remove mobile URLs (m) from Google's results
Mobile URLs, with ?m=0 or ?m=1 at the end of your URL, can definitely create trouble.
All those m= URLs appear on Google's result pages; sometimes they compete with the original URL, and this can alert Google Panda and make it think your blog is not correctly maintained.
I suggest you deal with these kinds of URLs too.
Use Robots.txt
Robots.txt can help us deal with almost any duplicate content, and it can also disallow Google from indexing the mobile URLs. Paste the lines below (under the same User-agent block as before) to forbid Google from indexing them.
Disallow: /?m=1
Disallow: /?m=0
Disallow: /*?m=1
Disallow: /*?m=0
Disallow: /*/*/*.html?m=0
Disallow: /*/*/*.html?m=1
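As a side note, Google honors the * wildcard in Disallow rules, so the six lines above could likely be collapsed into the single rule below, which matches any URL whose query string starts with m=. I kept the explicit list because it is easier to audit, so treat this one-liner as an untested simplification:

Disallow: /*?m=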
Use a meta robots tag.
Sometimes we need to add more emphasis to our command to Google's robots, so also add the following meta tag inside your <head>.
<b:if cond='data:blog.isMobile'>
<meta content='noindex,nofollow' name='robots'/>
</b:if>
See the image.
Remove mobile (m) URLs from Google's results
Complete your effort by removing the already indexed mobile URLs from Google. The procedure is the same as for removing the search label URLs in Google Webmaster Tools.
Additional tips.
Besides applying the techniques above, I also checked all the low-quality articles on my blog: articles with too few words, articles that were not very informative, and articles that duplicated other articles.
I edited those articles, combined a number of the short ones into single lengthy articles, and deleted many of the others that I knew wouldn't help me if they were kept.
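A hedged side tip: when you delete or merge posts this way, Blogger's Settings -> Search Preferences also offers a Custom Redirects section, so you can point the deleted URLs at the surviving article instead of letting them return 404s.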
And those actions seem to work: the blog's traffic is gradually increasing now.
Conclusion.
All the procedures above will, of course, remove the possible duplicate content plaguing your Blogger/Blogspot blog. By applying them, you have at least done something to fix the problem that possibly caused your Panda algorithm hit.
Now all you need to do is wait for the next update; hopefully you will recover from the Panda hit.