<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>elettrosmo &#187; SE Optimization</title>
	<atom:link href="http://elettrosmo.wordpress.com/category/se-optimization/feed/" rel="self" type="application/rss+xml" />
	<link>http://elettrosmo.wordpress.com</link>
	<description>Just another WordPress.com weblog</description>
	<lastBuildDate>Wed, 29 Nov 2006 18:41:11 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='elettrosmo.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/a4e667513ea0012add4235eadf698405?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>elettrosmo &#187; SE Optimization</title>
		<link>http://elettrosmo.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://elettrosmo.wordpress.com/osd.xml" title="elettrosmo" />
		<item>
		<title>Duplicate Content: What You Ought to Know About</title>
		<link>http://elettrosmo.wordpress.com/2006/11/29/duplicate-content-what-you-ought-to-know-about/</link>
		<comments>http://elettrosmo.wordpress.com/2006/11/29/duplicate-content-what-you-ought-to-know-about/#comments</comments>
		<pubDate>Wed, 29 Nov 2006 18:41:11 +0000</pubDate>
		<dc:creator>elettrosmo</dc:creator>
				<category><![CDATA[SE Optimization]]></category>

		<guid isPermaLink="false">http://elettrosmo.wordpress.com/2006/11/29/duplicate-content-what-you-ought-to-know-about/</guid>
		<description><![CDATA[Originally published at &#8220;Duplicate Content: What You Ought to Know About&#8221;.
Take a look at your website. How much of your content might a search engine algorithm consider duplicate? Even if you never copy anyone, you can&#8217;t answer &#8216;none&#8217;, because someone may be copying you. Duplicate [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Originally published at <a href="http://www.seoresearcher.com/duplicate-content-what-everybody-ought-to-know-about.htm">&#8220;Duplicate Content: What You Ought to Know About&#8221;</a>.</p>
<p>Take a look at your website. How much of your content might a search engine algorithm consider duplicate? Even if you never copy anyone, you can&#8217;t answer &#8216;none&#8217;, because someone may be copying you. <strong>Duplicate content</strong> is one of the biggest issues both for search engines, which try to keep their results relevant, and for webmasters, who try to avoid search engine penalties.</p>
<p><strong>Penalties</strong> for duplicate content can be really harmful. They are not just a downgrade in rankings but a move to the supplemental results, which are barely visible to most web users. Normally Google is expected to select one of the duplicate URLs to display in its SERPs and relegate the others to the <strong>supplemental results</strong>. Unfortunately, this is not always the case. In the thread &#8220;Duplicate content observation&#8221; on the WebmasterWorld.com forum, you can read about a case in which an original, high-quality, authoritative page was removed from Google&#8217;s index together with its duplicates. Since this can happen even to the most honest webmaster, it is easy to imagine how much attention the issue gets on any SEO forum.</p>
<p><strong>Types of Duplicate Content</strong></p>
<p>Duplicate content has a wider definition than &#8216;copy-paste&#8217; plagiarism: it is not just content scraped from a competitor&#8217;s site, a SERP or an RSS feed. Several other situations are also generally referred to as duplicate content.</p>
<p><strong>Circular Navigation</strong></p>
<p><strong>Jake Baille</strong> from <em>TrueLocal</em> loosely defines circular navigation as having multiple paths through a website. In practice, this means the same content is accessible via different URLs. For example, a single article might be retrieved by any of these links:<br />
<em>- www.mysite.com/articles/1/<br />
- www.mysite.com/article1/<br />
- www.mysite.com/articles.php?id=1</em></p>
<p>Forum threads are another legitimate use of multiple URLs. Each thread may be accessible via a link like <em>www.myforum.com/index.php/topic.1201.html</em>, and each message within the thread has a URL like <em>www.myforum.com/index.php/topic.1201.msg.01.html</em>. In the eyes of a search engine, all these links lead to different pages with identical content. The solution? Link to your pages in one consistent way, or apply robots.txt exclusion rules.</p>
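<p>Applied to the three article URLs listed above, robots.txt exclusion rules might look like the hypothetical file below, which leaves /articles/1/ crawlable and blocks the other two variants. This is a minimal sketch, assuming www.mysite.com/articles/1/ is the URL you want indexed; Python's standard urllib.robotparser module is used only to check which URLs the rules block. The original robots.txt convention matches plain path prefixes, so excluding every per-message URL of the forum example would need wildcard support, which not every crawler (or this parser) offers.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep /articles/1/ as the one crawlable copy
# and exclude the alternative URLs that serve the same article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /articles.php
Disallow: /article1/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, agent="*"):
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


for url in ("http://www.mysite.com/articles/1/",
            "http://www.mysite.com/article1/",
            "http://www.mysite.com/articles.php?id=1"):
    print(url, "->", "crawl" if is_crawlable(url) else "blocked")
```

Each Disallow line is matched as a prefix of the requested path, so /articles.php also covers /articles.php?id=1 and any other query string on that script.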
<p>The same problem arises when other people link to you using different-looking URLs. Since these external links are beyond your control, you should set up a 301 (permanent) redirect to the canonical URL you want displayed.</p>
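<p>A minimal sketch of such a redirect, written with Python's standard http.server module purely for illustration (on a real site this is usually configured in the web server itself). The canonical host www.mysite.com and the ALIASES table are hypothetical: any request for a non-canonical host or a known alias path gets a 301 pointing at the canonical URL.</p>

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical canonical host and alias table; adapt to your own URL scheme.
CANONICAL_HOST = "www.mysite.com"
ALIASES = {
    "/article1/": "/articles/1/",
    "/articles.php?id=1": "/articles/1/",
}


def redirect_target(host, path):
    """Return the canonical URL to redirect to, or None if the request is already canonical."""
    host = host.split(":")[0].lower()  # ignore any port in the Host header
    canonical_path = ALIASES.get(path, path)
    if host == CANONICAL_HOST and canonical_path == path:
        return None
    return "http://" + CANONICAL_HOST + canonical_path


class CanonicalRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.headers.get("Host", CANONICAL_HOST), self.path)
        if target is None:
            # Already canonical: serve the page (placeholder body here).
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"canonical page\n")
        else:
            # 301 Moved Permanently tells crawlers to index the target URL instead.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
```

To try it, serve the handler with HTTPServer(("127.0.0.1", 8000), CanonicalRedirectHandler).serve_forever(); note that a request whose Host header is not www.mysite.com, including a local test request, is always redirected. If you redirect a URL, do not also block it in robots.txt: a crawler that obeys the block never requests the URL and so never sees the redirect.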
<p><strong>Printer-Friendly Versions</strong></p>
<p>Offering printer-friendly versions of pages is common practice, and it adds value for visitors. But a printer-friendly version is also a prominent example of duplicate content! Fortunately, a simple fix solves the issue: add a &#8216;noindex&#8217; robots meta tag (&lt;meta name="robots" content="noindex" /&gt;) to the head of your print pages.</p>
<p><strong>Product-Only Pages</strong></p>
<p>Similar-looking product pages are common in online stores, since they are typically generated from a single template. Often two different product pages share a description that varies in just a few words or numbers, which causes them to be filtered out as duplicate content. This issue has no easy solution. Either you rewrite robots.txt so that only one of the product descriptions is crawled, giving up search engine traffic for the rest of them, or you roll up your sleeves and add something different to each product page, such as testimonials, which is time-consuming or nearly impossible depending on the number of product types in your stock.</p>
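<p>Before choosing between those options, it may help to find out which product pages are actually too similar. The sketch below uses Python's standard difflib.SequenceMatcher to compare descriptions word by word; this is not the method search engines use, the 0.9 threshold is an arbitrary assumption, and the product names and descriptions are made up for illustration.</p>

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(descriptions, threshold=0.9):
    """Return (name_a, name_b, ratio) for product pairs whose descriptions are at least `threshold` similar."""
    pairs = []
    # Compares every pair, which is fine for a small catalog but grows quadratically.
    for (name_a, text_a), (name_b, text_b) in combinations(descriptions.items(), 2):
        # Compare word sequences, so a changed number counts as one changed word.
        ratio = SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
        if ratio >= threshold:
            pairs.append((name_a, name_b, round(ratio, 2)))
    return pairs


# Hypothetical catalog: two template-generated pages differing in one number.
products = {
    "widget-100": "Sturdy steel widget with a 100 mm blade and a two-year warranty.",
    "widget-150": "Sturdy steel widget with a 150 mm blade and a two-year warranty.",
    "gadget": "Compact plastic gadget for the kitchen, dishwasher safe.",
}

print(near_duplicates(products))  # [('widget-100', 'widget-150', 0.92)]
```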
<p><strong>How Do Duplicate Content Filters Work?</strong></p>
<p>There are several data mining algorithms for detecting similar text passages. The one reportedly used by search engines is w-shingling. Each document can be represented by a kind of fingerprint, its set of shingles: the contiguous subsequences of w tokens (blocks of text). The ratio of the size of the intersection of two documents&#8217; shingle sets to the size of their union can be used to measure the documents&#8217; resemblance. Another algorithm that can be used for duplicate detection is the Levenshtein distance (edit distance).</p>
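<p>A minimal sketch of both measures in Python. It follows the textbook definitions (Broder's resemblance for w-shingles, and the classic dynamic-programming edit distance), not any search engine's actual, unpublished implementation; the default shingle width w=4 is an arbitrary choice, and large-scale systems typically hash and sample shingles instead of comparing full sets.</p>

```python
def shingles(text, w=4):
    """Return the set of w-shingles: every run of w consecutive word tokens."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=4):
    """Size of the shingle-set intersection divided by the size of the union (0.0 to 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa.union(sb)
    if not union:
        return 0.0  # both texts are shorter than w tokens: nothing to compare
    return len(sa.intersection(sb)) / len(union)


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions or substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or keep) cs
        prev = curr
    return prev[-1]


original = "the quick brown fox jumps over the lazy dog"
copy = "the quick brown fox leaps over the lazy dog"
print(resemblance(original, copy, w=3))  # 4 shared shingles out of 10 distinct: 0.4
print(levenshtein("kitten", "sitting"))  # 3
```

Note how a single changed word removes every shingle that overlaps it, so one edit in a nine-word text drops the resemblance from 1.0 to 0.4 at w=3.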
<p>It is natural to expect a duplicate content filter to identify the original and rank it higher. The simplest way to detect the original would be to compare indexing dates, on the assumption that the original is published and crawled earlier than its copies. But with the advent of RSS feeds, new content can be distributed almost instantly, so this approach is no longer reliable.</p>
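<p>The naive date-based heuristic, and the way RSS syndication defeats it, can be sketched in a few lines. This is an illustration of the reasoning only; the URLs (on the reserved .example domain) and crawl times are invented.</p>

```python
from datetime import datetime


def guess_origin(copies):
    """Naive heuristic: treat the copy that was crawled first as the original."""
    return min(copies, key=lambda entry: entry["first_crawled"])["url"]


copies = [
    {"url": "http://original.example/article",
     "first_crawled": datetime(2006, 3, 1, 12, 0)},
    # Hypothetical aggregator that republished the article from the site's
    # RSS feed and happened to be crawled before the original page.
    {"url": "http://aggregator.example/copy",
     "first_crawled": datetime(2006, 3, 1, 9, 30)},
]

print(guess_origin(copies))  # http://aggregator.example/copy (the wrong page)
```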
<p>As for the original&#8217;s right to be ranked higher, this is not always honored. In this <a href="http://www.webconfs.com/duplicate-content-filter-article-1.php">article</a> you can read about an article distribution experiment. An article was syndicated twice, producing as many as 19,000 copies. After some time, Google, Yahoo and MSN purged their indices, leaving just a few of the duplicates. MSN&#8217;s filter not only identified the original but also put it at the top of the search results. Yahoo also identified the original, but on the results page for the article&#8217;s title, the original&#8217;s position fluctuated, apparently in response to the way Yahoo weighs relevancy and authority.</p>
<p>To the tester&#8217;s amusement, Google&#8217;s filtered index did not include the original at all! Evidently Google featured only those copies of the article that it considered relevant and authoritative, with no regard for the original source of the content. I&#8217;ve already mentioned a forum thread where a similar problem is discussed. Both stories took place in 2005 and early 2006, and so far I have found no evidence that the issue has been resolved.</p>
<p>For more information on search engine optimization and marketing, check out our <a href="http://www.seoresearcher.com/">SEO Training Materials</a> website.</p>
<p>About the Author: Oleg Ishenko, an Internet marketing professional, gives useful advice on search engine optimization at his website <a href="http://www.seoresearcher.com/">SEO Training Materials</a>.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://elettrosmo.wordpress.com/2006/11/29/duplicate-content-what-you-ought-to-know-about/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/44b340149818a9f5aefcd5922213a97a?s=96&#38;d=identicon" medium="image">
			<media:title type="html">elettrosmo</media:title>
		</media:content>
	</item>
	</channel>
</rss>