<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Digital fingerprints for content &#8212; would this help against plagiarism?</title>
	<atom:link href="http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/</link>
	<description>empowered by monkeys</description>
	<pubDate>Sat, 31 Jul 2010 01:40:42 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
		<item>
		<title>By: Misc Links &#171; Security Enthusiast</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-104542</link>
		<dc:creator>Misc Links &#171; Security Enthusiast</dc:creator>
		<pubDate>Tue, 28 Jul 2009 04:56:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-104542</guid>
		<description>[...] Website Maxpower Digital fingerprints for content — would this help against plagiarism? Found this interesting &#8211; Saved link here for later reference. [...]</description>
		<content:encoded><![CDATA[<p>[...] Website Maxpower Digital fingerprints for content — would this help against plagiarism? Found this interesting &#8211; Saved link here for later reference. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathon</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-96358</link>
		<dc:creator>Jonathon</dc:creator>
		<pubDate>Sun, 20 Jan 2008 13:27:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-96358</guid>
		<description>Despite adding spaces around the less than and greater than signs (is there a coding-related term for those?), the code still did not display.
So now I've removed the greater than and less than signs:

br/   br/   small    [[this post from example.com]]  /small  !--test_-_content_taken_from_example.com--</description>
		<content:encoded><![CDATA[<p>Despite adding spaces around the less than and greater than signs (is there a coding-related term for those?), the code still did not display.<br />
So now I&#8217;ve removed the greater than and less than signs:</p>
<p>br/   br/   small    [[this post from example.com]]  /small  !&#8211;test_-_content_taken_from_example.com&#8211;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathon</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-96357</link>
		<dc:creator>Jonathon</dc:creator>
		<pubDate>Sun, 20 Jan 2008 13:24:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-96357</guid>
		<description>the code didn't display, despite using backticks and &lt;code&gt; &lt;/code&gt; tags, so here it is again, this time with extra spaces around the :

[[this post from example.com]]</description>
		<content:encoded><![CDATA[<p>the code didn&#8217;t display, despite using backticks and <code> </code> tags, so here it is again, this time with extra spaces around the :</p>
<p>[[this post from example.com]]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathon</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-96356</link>
		<dc:creator>Jonathon</dc:creator>
		<pubDate>Sun, 20 Jan 2008 13:21:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-96356</guid>
		<description>To not confuse my readers, I tried 'commenting-out' the fingerprint, so it's not visible, making it small (if commenting out didn't work), and adding line breaks so the fingerprint is in a stand-alone paragraph, not stuck up against the tail end of the first paragraph.  

In Feeds, none of my code is treated as code - it is all displayed.  

Is there a way to get feeds to treat code as code?

I hope the code will display below
&lt;code&gt;`[[this post from example.com]]&lt;!--test_-_content_taken_from_example.com--&gt;`&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>To not confuse my readers, I tried &#8216;commenting-out&#8217; the fingerprint, so it&#8217;s not visible, making it small (if commenting out didn&#8217;t work), and adding line breaks so the fingerprint is in a stand-alone paragraph, not stuck up against the tail end of the first paragraph.  </p>
<p>In Feeds, none of my code is treated as code - it is all displayed.  </p>
<p>Is there a way to get feeds to treat code as code?</p>
<p>I hope the code will display below<br />
<code>`[[this post from example.com]]<!--test_-_content_taken_from_example.com-->`</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The Swamp &#187; A summary of the internet (circa June 10, 2006)</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4893</link>
		<dc:creator>The Swamp &#187; A summary of the internet (circa June 10, 2006)</dc:creator>
		<pubDate>Sat, 10 Jun 2006 20:17:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4893</guid>
		<description>[...] There&#8217;s also a goldmine of content about digital fingerprinting. [...]</description>
		<content:encoded><![CDATA[<p>[...] There&#8217;s also a goldmine of content about digital fingerprinting. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Bailey</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4151</link>
		<dc:creator>Jonathan Bailey</dc:creator>
		<pubDate>Tue, 02 May 2006 14:35:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4151</guid>
		<description>First off, it's pretty trivial to put ESNs into your RSS feed. Editing the WordPress RSS feed takes a little extra effort (you have to change permissions on the RSS template file and may have to edit it by hand), but it can be done with little knowledge.

All you have to do is take the exact same Numly template tag you used for your regular template and place it within the content section of the post. You will probably have to edit the plugin to remove the DIV element (so it will validate), and you can then modify it so that it blends in better. You could, theoretically, change the number itself to text and just link to the validation page, for example.

The Numly plugin makes these kinds of edits pretty easy; it is especially well made in that regard. I'm the worst PHP programmer out there, but it only took me a few minutes to switch to the new Numly server when the change happened.

Regardless, the problems you list aren't unique to any one digital fingerprinting system; they apply to all digital fingerprints. Pretty much every system out there is in danger of being hacked off, either by machines or by humans. Numly Numbers, SIPs, even your fingerprints can all be hacked off. Right now some sploggers are stealing only the first few sentences of an entry, even if the full feed is available; others ignore everything inside special CSS structures; and still others remove ALL code from the post.

There's no way to completely secure any fingerprint against being hacked off, either by accident or by intent.

Also, all methods suffer from the same problem of maintaining a large database. Whether you use SIPs, ESNs or tags, your database grows quickly and creates problems. One post a day for a year creates 365 entries in a database that have to be tracked. No small feat, no matter the format.

Without some way to cull old and unneeded entries, even with RSS feeds, the process could get very taxing.

Finally, we might want to look into working with Feedburner. Not only do they track uses of a feed, including scrapers, but their FeedFlare service has an interesting API that might be useful.

Personally though, I don't think that there is any one right method for handling this. All methods have advantages and disadvantages but doubling up risks doubling the burden.

I am excited about some companies that I'm hearing about that are using new methods to protect RSS feeds and posts. But their services are probably weeks, if not months, away.

In the meantime, there's no easy route to protection and, though I still love the ideas you present, especially the one you mentioned about adding an SIP box, we have to realize that total protection isn't possible and all systems will have flaws.</description>
		<content:encoded><![CDATA[<p>First off, it&#8217;s pretty trivial to put ESNs into your RSS feed. Editing the WordPress RSS feed takes a little extra effort (you have to change permissions on the RSS template file and may have to edit it by hand), but it can be done with little knowledge.</p>
<p>All you have to do is take the exact same Numly template tag you used for your regular template and place it within the content section of the post. You will probably have to edit the plugin to remove the DIV element (so it will validate), and you can then modify it so that it blends in better. You could, theoretically, change the number itself to text and just link to the validation page, for example.</p>
<p>The Numly plugin makes these kinds of edits pretty easy; it is especially well made in that regard. I&#8217;m the worst PHP programmer out there, but it only took me a few minutes to switch to the new Numly server when the change happened.</p>
<p>Regardless, the problems you list aren&#8217;t unique to any one digital fingerprinting system; they apply to all digital fingerprints. Pretty much every system out there is in danger of being hacked off, either by machines or by humans. Numly Numbers, SIPs, even your fingerprints can all be hacked off. Right now some sploggers are stealing only the first few sentences of an entry, even if the full feed is available; others ignore everything inside special CSS structures; and still others remove ALL code from the post.</p>
<p>There&#8217;s no way to completely secure any fingerprint against being hacked off, either by accident or by intent.</p>
<p>Also, all methods suffer from the same problem of maintaining a large database. Whether you use SIPs, ESNs or tags, your database grows quickly and creates problems. One post a day for a year creates 365 entries in a database that have to be tracked. No small feat, no matter the format.</p>
<p>Without some way to cull old and unneeded entries, even with RSS feeds, the process could get very taxing.</p>
<p>Finally, we might want to look into working with Feedburner. Not only do they track uses of a feed, including scrapers, but their FeedFlare service has an interesting API that might be useful.</p>
<p>Personally though, I don&#8217;t think that there is any one right method for handling this. All methods have advantages and disadvantages but doubling up risks doubling the burden.</p>
<p>I am excited about some companies that I&#8217;m hearing about that are using new methods to protect RSS feeds and posts. But their services are probably weeks, if not months, away.</p>
<p>In the meantime, there&#8217;s no easy route to protection and, though I still love the ideas you present, especially the one you mentioned about adding an SIP box, we have to realize that total protection isn&#8217;t possible and all systems will have flaws.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: deepthought</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4122</link>
		<dc:creator>deepthought</dc:creator>
		<pubDate>Tue, 02 May 2006 04:46:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4122</guid>
		<description>Great comments and thoughts thus far.  Firstly, I don't want to reinvent the wheel.  If something exists out there that can work -- I'd choose that.  While I really like numly numbers, the current implementation (at least as I have it configured) doesn't have the number as part of the content and the numbers aren't part of the RSS feed.  Again, this could be the way I have it set up.  

Regardless, a Numly number looks out of place.  Any human content scraper will easily be able to determine that they probably shouldn't copy it too.  

Feed copyright plugins that add custom text sound like a good idea.  My only concern is that if your fingerprint is only in your feed, that doesn't help you find the sourcecode scraper or a cut and paster.

Using a statistically improbable phrase is a good idea.  However, as Jonathan points out, it requires a SIP for each piece of content you make.  Managing your SIP database and searching for each one on any kind of regular basis will be a challenging endeavour.

Invisible text &lt;em&gt;is&lt;/em&gt; generally considered bad form (and for good reason).  However, there is precedent for having invisible text.  Look at the source code for therapistfinder.net (that's 'therapist finder', not 'the rapist finder' as I first read it).  Invisible text is used for speech readers and accessibility (see &lt;a href="http://www.seologic.com/faq/hidden-text.php" rel="nofollow"&gt;seologic&lt;/a&gt; for an explanation).

There would be no violation of the google TOS if the end users sign up for an API key.  See &lt;a href="http://www.benhammersley.com/projects/google_to_rss_using_soap_api.html" rel="nofollow" &gt;Google Search to RSS using SOAP API&lt;/a&gt; for an example script.

Also, google blog search already has rss feeds for search.  Here is a
Google Blog Search RSS for '&lt;a href="http://blogsearch.google.com/blogsearch_feeds?hl=en&#038;q=digital+fingerprint&#038;ie=utf-8&#038;num=10&#038;output=atom"&gt;digital fingerprint'&lt;/a&gt;.  This could be good for finding bloggers at least.

You all raise valid issues and so far, it seems you guys like the idea in principle.  Going from idea to the 'right' idea to code to practice is a tricky undertaking.  Many hands make light work. :)

Another idea just occurred: how about an additional textarea inserted in the WordPress post editing screen where you can enter a SIP or phrase from your content.  On publish, a link (visible only to the admin and similar to the 'edit' links used on many themes) to the Google search for the SIP is created.  Now, anytime the author wants to check if the post may have been plagiarised, they just click the SIP link.  Hrmmmm...</description>
		<content:encoded><![CDATA[<p>Great comments and thoughts thus far.  Firstly, I don&#8217;t want to reinvent the wheel.  If something exists out there that can work &#8212; I&#8217;d choose that.  While I really like numly numbers, the current implementation (at least as I have it configured) doesn&#8217;t have the number as part of the content and the numbers aren&#8217;t part of the RSS feed.  Again, this could be the way I have it set up.  </p>
<p>Regardless, a Numly number looks out of place.  Any human content scraper will easily be able to determine that they probably shouldn&#8217;t copy it too.  </p>
<p>Feed copyright plugins that add custom text sound like a good idea.  My only concern is that if your fingerprint is only in your feed, that doesn&#8217;t help you find the sourcecode scraper or a cut and paster.</p>
<p>Using a statistically improbable phrase is a good idea.  However, as Jonathan points out, it requires a SIP for each piece of content you make.  Managing your SIP database and searching for each one on any kind of regular basis will be a challenging endeavour.</p>
<p>Invisible text <em>is</em> generally considered bad form (and for good reason).  However, there is precedent for having invisible text.  Look at the source code for therapistfinder.net (that&#8217;s &#8216;therapist finder&#8217;, not &#8216;the rapist finder&#8217; as I first read it).  Invisible text is used for speech readers and accessibility (see <a href="http://www.seologic.com/faq/hidden-text.php"  rel="nofollow">seologic</a> for an explanation).</p>
<p>There would be no violation of the google TOS if the end users sign up for an API key.  See <a href="http://www.benhammersley.com/projects/google_to_rss_using_soap_api.html"  rel="nofollow" >Google Search to RSS using SOAP API</a> for an example script.</p>
<p>Also, google blog search already has rss feeds for search.  Here is a<br />
Google Blog Search RSS for &#8216;<a href="http://blogsearch.google.com/blogsearch_feeds?hl=en&#038;q=digital+fingerprint&#038;ie=utf-8&#038;num=10&#038;output=atom" >digital fingerprint&#8217;</a>.  This could be good for finding bloggers at least.</p>
<p>You all raise valid issues and so far, it seems you guys like the idea in principle.  Going from idea to the &#8216;right&#8217; idea to code to practice is a tricky undertaking.  Many hands make light work. <img src='http://www.maxpower.ca/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Another idea just occurred: how about an additional textarea inserted in the WordPress post editing screen where you can enter a SIP or phrase from your content.  On publish, a link (visible only to the admin and similar to the &#8216;edit&#8217; links used on many themes) to the Google search for the SIP is created.  Now, anytime the author wants to check if the post may have been plagiarised, they just click the SIP link.  Hrmmmm&#8230;</p>
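<p>A minimal sketch of that SIP link, in Python rather than WordPress PHP and purely as an illustration: the helper name <code>sip_search_url</code> is hypothetical, and it assumes the public Google search URL with its <code>q</code> parameter, quoting the phrase to ask for an exact match.</p>

```python
from urllib.parse import urlencode

# Assumed search endpoint; swap in whichever search engine you prefer.
GOOGLE_SEARCH = "https://www.google.com/search"


def sip_search_url(phrase: str) -> str:
    """Build a search link for an exact-phrase query (hypothetical helper)."""
    # Collapse stray whitespace so the stored SIP matches the published text.
    cleaned = " ".join(phrase.split())
    if not cleaned:
        raise ValueError("SIP must not be empty")
    # Wrapping the phrase in double quotes requests an exact match;
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return GOOGLE_SEARCH + "?" + urlencode({"q": f'"{cleaned}"'})


if __name__ == "__main__":
    print(sip_search_url("empowered by  monkeys"))
```

<p>A plugin would presumably store the SIP alongside the post and show the resulting link only to the admin; that part is not sketched here.</p>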
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Bailey</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4111</link>
		<dc:creator>Jonathan Bailey</dc:creator>
		<pubDate>Tue, 02 May 2006 02:22:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4111</guid>
		<description>OK, I'll take a few cracks at this. 

First, the idea seems fairly sound. I've seen similar ideas discussed elsewhere and generally gotten behind them. However, there are a few things to consider before launching into it.

For starters, you don't want to reinvent the wheel. Numly Numbers already provide digital fingerprinting. Each ESN will be unique to that entry and, since you already use the Numly plugin, it's trivial to add the numbers to your feed.

Second, there are already feed copyright plugins that append unique information to each entry in the feed. You can customize this to your liking, adding any kind of strange term you want.

Third, I have to wonder how much is gained versus just using statistically improbable phrases from the work itself. While having a fingerprint might simplify the process, if the fingerprint is omitted, the simplicity is for naught. 

Fourth, hiding it with CSS could hurt you in several ways. First, it could enable scrapers to skip it, as many now do look at the page they're scraping. Second, it wouldn't inhibit copy-and-paste plagiarism. Finally, search engines tend to penalize sites that hide text, any text. How much of a penalty I can't say, but I've seen it happen before.

On that note, the idea has a lot of merit. I do have to warn though that, if you're automatically generating Google searches, you might want to make sure you're not violating the TOS. Several have gotten in trouble for that.

Personally, I like the idea and would be interested in playing around with this plugin. 

Hope that you are well!</description>
		<content:encoded><![CDATA[<p>OK, I&#8217;ll take a few cracks at this. </p>
<p>First, the idea seems fairly sound. I&#8217;ve seen similar ideas discussed elsewhere and generally gotten behind them. However, there are a few things to consider before launching into it.</p>
<p>For starters, you don&#8217;t want to reinvent the wheel. Numly Numbers already provide digital fingerprinting. Each ESN will be unique to that entry and, since you already use the Numly plugin, it&#8217;s trivial to add the numbers to your feed.</p>
<p>Second, there are already feed copyright plugins that append unique information to each entry in the feed. You can customize this to your liking, adding any kind of strange term you want.</p>
<p>Third, I have to wonder how much is gained versus just using statistically improbable phrases from the work itself. While having a fingerprint might simplify the process, if the fingerprint is omitted, the simplicity is for naught. </p>
<p>Fourth, hiding it with CSS could hurt you in several ways. First, it could enable scrapers to skip it, as many now do look at the page they&#8217;re scraping. Second, it wouldn&#8217;t inhibit copy-and-paste plagiarism. Finally, search engines tend to penalize sites that hide text, any text. How much of a penalty I can&#8217;t say, but I&#8217;ve seen it happen before.</p>
<p>On that note, the idea has a lot of merit. I do have to warn though that, if you&#8217;re automatically generating Google searches, you might want to make sure you&#8217;re not violating the TOS. Several have gotten in trouble for that.</p>
<p>Personally, I like the idea and would be interested in playing around with this plugin. </p>
<p>Hope that you are well!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Matthieu</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4108</link>
		<dc:creator>Chris Matthieu</dc:creator>
		<pubDate>Tue, 02 May 2006 01:23:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4108</guid>
		<description>I, too, am a huge believer in digital fingerprints!  Numly allows authors to fingerprint their uploaded works such as images, music, documents, etc.  We use this fingerprint to associate the work with the author.  This way, an orphaned work can be reassociated with the author, and anyone wishing to use an image that they found on the web can upload it to Numly and determine if it has an associated Numly Number and thus an author or artist who can be contacted.

I like the idea of digitally fingerprinting content in blogs and RSS feeds as well.  This is a similar idea to the micro id concept.  We would be willing to help in any way that we could.  Feel free to contact me at chris at numly.com.</description>
		<content:encoded><![CDATA[<p>I, too, am a huge believer in digital fingerprints!  Numly allows authors to fingerprint their uploaded works such as images, music, documents, etc.  We use this fingerprint to associate the work with the author.  This way, an orphaned work can be reassociated with the author, and anyone wishing to use an image that they found on the web can upload it to Numly and determine if it has an associated Numly Number and thus an author or artist who can be contacted.</p>
<p>I like the idea of digitally fingerprinting content in blogs and RSS feeds as well.  This is a similar idea to the micro id concept.  We would be willing to help in any way that we could.  Feel free to contact me at chris at numly.com.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Matt</title>
		<link>http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4105</link>
		<dc:creator>Matt</dc:creator>
		<pubDate>Tue, 02 May 2006 00:12:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.maxpower.ca/digital-fingerprints-for-content-would-this-help-against-plagiarism/2006/05/01/#comment-4105</guid>
		<description>I think something like that would be good.  While I have used content from other sites, I  only quote the article, link to the original and rarely do I steal an entire post from another blogger.</description>
		<content:encoded><![CDATA[<p>I think something like that would be good.  While I have used content from other sites, I  only quote the article, link to the original and rarely do I steal an entire post from another blogger.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
