What Can You Do About Duplicate Content?

Duplicate content is one of those SEO topics that confuses many people. In fact, one of the reasons it's so confusing is a video Google put out to try to answer the question. In that video, Greg Grothaus explains what duplicate content is and how Google looks at it. But the confusing part is what he says right near the beginning of the video:

“First and foremost I want to clear up a myth that has been going around called the duplicate content penalty. Generally speaking, people are worried that Google has a penalty for sites that have duplicate content on them.... What we're doing for that specific query we are omitting the print article. It's not a penalty.”

And on the Webmaster Central Blog they write:

“There's no such thing as a ‘duplicate content penalty.’”

These statements have made many content authors relax a bit too much about duplicate content. In fact, while there is no “duplicate content penalty,” duplicate content can still hurt your website's rankings. Here's a look at why duplicate content really is bad, and what you can do about it.

When is Duplicate Content Bad?

One of the easiest ways to determine whether your duplicate content is going to hurt your site is to ask yourself why you want to duplicate the content. If your reasons have more to do with you than with your readers, such as wanting more eyes on the content, wanting the content to display higher in search, or wanting to manipulate search results, then you shouldn't do it.

In their article on duplicate content, Google specifically says:

“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”

Google may take action against your site, up to removing it completely from its results, if it feels you are trying to manipulate search results. However, in this case, the duplicate content is a symptom of spam, and spamming is the reason your site would be removed.

But duplicate content can also be bad, or yield less effective results, even if you're not trying to manipulate search results. Search engine providers want to give their customers a varied group of results, not 10 or 100 pages that all have exactly the same content. To avoid this, many search engines leave the near-duplicates out and display a notice like this in the search results:

“In order to show you the most relevant results, we have omitted some entries very similar to the results already displayed.”

These additional results are often available if the customer clicks another link, but most people don't click it and so don't see the additional results.

If you have written an article and placed it on one website, and then a month later decide you want to put it up on another site, that second site may end up in the additional results that aren't displayed. This is not a penalty. This is Google acknowledging that the first site posted the article first, and thus is considered the owner of the content. Someone running a search would not want or need to read the same article twice.

When is Duplicate Content Okay?

The most common type of duplicate content is caused by several URLs on a website pointing to the same page. For example, these three URLs point to the exact same page, my home page:

http://webdesign.about.com
http://webdesign.about.com/
http://webdesign.about.com/index.htm

It is possible that a search engine could index all three URLs separately, and thus would have three pages of duplicate content in its index. Yes, they all point to just one page, but from the search engine's perspective, it has been indexed three ways. The same thing can happen on data-driven sites, where different URL parameters lead to the same content. Search engines treat this type of duplication as accidental, and often simply select the best URL to represent them all.
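To make the idea concrete, here is a minimal Python sketch of how the three home page URLs above could be collapsed into a single preferred form. The function name normalize_url and the list of index file names are assumptions made for illustration; this is not how any particular search engine actually chooses a representative URL.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed set of default index file names; adjust for your own server.
INDEX_NAMES = {"index.htm", "index.html"}


def normalize_url(url: str) -> str:
    """Collapse equivalent URLs into one preferred form (illustrative only)."""
    parts = urlsplit(url)
    # An empty path and "/" refer to the same resource.
    path = parts.path or "/"
    # Drop a trailing index file name but keep the directory's trailing slash.
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_NAMES:
        path = head + "/"
    # Scheme and host are case-insensitive, so lowercase them; drop any fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


if __name__ == "__main__":
    urls = [
        "http://webdesign.about.com",
        "http://webdesign.about.com/",
        "http://webdesign.about.com/index.htm",
    ]
    # All three variants collapse to the same single URL.
    print({normalize_url(u) for u in urls})
```

The sketch keeps the query string untouched, because deciding which URL parameters change the content depends on the site.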

Another form of duplicate content appears when you create platform- or function-specific versions of your site, for instance, if you recreate your entire site with mobile-friendly styles or you have printer-friendly versions of your pages. This type of duplicate content has a definite reader-centric purpose, and will not result in any penalties or problems in search engines. These pages may appear in the “additional results” section of a search, but if a reader then amended their search to look for mobile or print, these pages would show up instead of the standard web pages.

The first thing you should do about duplicate content is avoid creating it in the first place. But once you have it, there are some things you can do to reduce the load on search engines and help your site be better indexed:

Use permanent redirects. A 301 redirect tells search engine spiders that a page has moved permanently, so they will remove the old URL from their index and replace it with the new one.
Link to index pages the same way every time. Above you saw three ways to point to my home page, but to help search engines you should choose one of the methods and use it every time. This is true for any page that uses an index file name, such as index.htm or index.html. Personally, I prefer to include the trailing slash, but leave off the file name. So I always link to http://webdesign.about.com/.
Use noindex on syndicated content. Ask people who use your syndicated content to include the noindex meta tag on their copies of your articles so that the duplicate content is not included in search engine indexes.
Avoid repetition of boilerplate copy. If you have pages that have long strings of copyright or other boilerplate information at the bottom, it's better to move that content to a separate page, and link to it from your content pages.
Expand similar content pages. If you have pages that have very similar information — such as a travel site with limited information about multiple cities — you should add to your pages to provide more unique information about the page topic. If you can't do that, you should combine the similar pages into one.
Provide more unique content. If you are running an affiliate site, it can be tempting to simply copy and paste the product details from a store, such as Amazon, onto your pages, but this doesn't add any value, so search engines will simply link to the original site instead. By providing unique content about the products, your site will avoid duplicating the store's site and your pages will have a better chance of ranking well.
Use rel=canonical. This is a link relationship that Google and other major search engines, including Bing and Yahoo, use to identify the preferred version of content. It is a way of telling the search engine that a given page is the preferred version and any duplicates should defer to it.
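Three of the items above (permanent redirects, noindex, and rel=canonical) come down to a status code or a single tag in the page head. The Python sketch below shows roughly what each one looks like. The helper names permanent_redirect, canonical_link, and robots_noindex are hypothetical, and how you actually send the redirect or place the tags depends on your server or publishing system.

```python
from html import escape
from typing import Dict, Optional, Tuple


def permanent_redirect(
    requested: str, preferred: str
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return a 301 status and Location header when the requested URL
    is not the preferred one; return None when no redirect is needed."""
    if requested == preferred:
        return None
    return 301, {"Location": preferred}


def canonical_link(preferred: str) -> str:
    """Build the rel=canonical tag that goes in the <head> of each duplicate."""
    return f'<link rel="canonical" href="{escape(preferred, quote=True)}">'


def robots_noindex() -> str:
    """Build the noindex meta tag for the <head> of a syndicated copy."""
    return '<meta name="robots" content="noindex">'


if __name__ == "__main__":
    home = "http://webdesign.about.com/"
    print(permanent_redirect("http://webdesign.about.com/index.htm", home))
    print(canonical_link(home))
    print(robots_noindex())
```

A redirect suits cases where readers never need the old URL, while the canonical tag suits duplicates that must stay reachable, such as printer-friendly pages.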