<p>Re post #11 – one reason I would be particularly concerned about false positives with plagiarism-detection software is the current prevalence of blogging, social media, and web sites with user-edited content, such as Wikipedia. For example, I sometimes edit Wikipedia where I see factual errors in areas where I have some expertise; I’ve written a couple of books, and it is easy enough for me to simply fix the Wikipedia article by lifting the sentence or paragraph from one of my books that has the correct information.</p>
<p>But if I were to submit some sort of academic paper that was partly a cut-and-paste job taken from my own previous writing on the subject… would my work be flagged as plagiarized from Wikipedia, because I made the mistake of lending my own work to that collaborative effort?  </p>
<p>Similarly – when my daughter was in high school living abroad, she kept a blog. In that blog there were several posts that I thought would make good fodder for college admissions essays. One in particular had an especially engaging opening line; the post itself was revelatory, poignant and humorous – so I conveyed my feelings to my d. Months later, when my d. was sitting with the Harry Bauld book in hand, inspiration struck – and that blog post was revived. She actually reworked and revised the initial idea considerably – but there were some phrases, including the opening sentence, that were simply too good to let go. But even if she had not revised that blog post… it was her post – but for a time it was out there on the internet, subject to being spidered by Google and/or copied by others and reposted without attribution. How would she have ever proved it was her own, even if given the chance?</p>
<p>Professionally, I have done a lot of web site content production, both with my own writings and in posting the properly attributed work of others on web sites – and, as anyone can see, in doing my part to help keep the CC forums full of words. Some of the content that I have written is under my real name – some, like the thousands of CC postings, is done under internet pseudonyms, used to protect my family’s privacy. I have also seen my work – and the copyrighted work of others that was posted on one of the web sites I manage – copied verbatim and reposted on many other web sites, often without attribution. When I find it, I generally write to the owners of the other sites and request either removal of the text or proper attribution – but I don’t find everything. There is a particular article, written by someone I work with, that has been posted and reposted all over the internet for more than a dozen years – and this week I discovered it copied verbatim, without attribution, in a research journal publication (!)</p>
<hr>
<p>Obviously, none of these observations apply to the situation where 30 separate essays seem to come from the same place.   If my daughter’s poignant blog post was disseminated around the globe and copied by 30 other students for use in their college essays, that would clearly have been plagiarism.  </p>
<p>I am just talking about the problem that arises in documenting the original source in a world in which youngsters are putting their words out for public dissemination from their early teens – and others are copying and reposting and recirculating.   </p>
<p>“Due process” and notice of the allegation of plagiarism would at least give an individual the opportunity to demonstrate the original source.  If someone copies stuff from my books, a large portion of which can be easily accessed via Google books – and then I reuse my own stuff – I can always go back to the original, copyrighted book and its date of publication to demonstrate the source of the work. </p>
<p>But I can see a situation where such proof of originality would not be so easy for a youthful blogger whose work is readily copied and reposted to other blogs, and who may very well write under a pseudonym.   </p>
<p>I myself often have great difficulty locating the original source of information when I am writing – I’ve read it on one web site, but a Google search of a phrase turns up the same word-for-word content on other web sites – and there is no way to ascertain where it originated.</p>