Thoughts on plagiarism and the case against Claudine Gay

Plagiarism has been all over the news since the end of 2023, when Claudine Gay, President of Harvard University, was accused of plagiarizing parts of her 1998 doctoral dissertation, a dissertation that had previously won her a Harvard award for the best essay or dissertation related to political science [1]. These accusations were followed by further allegations of plagiarism in journal articles she had published [2]. Ultimately, the accusations, combined with earlier botched testimony before Congress on anti-Semitism, forced her resignation in early January 2024 [3]. I was outraged when I first heard of the accusations, which I think was the point of the original stories. Later, however, I learned where the stories came from: one was the Washington Free Beacon (WFB), a conservative outlet akin to Breitbart, and the other was the New York Post, a right-leaning newspaper owned by Rupert Murdoch. I began to wonder whether this was part of a concerted campaign to discredit Harvard’s first Black woman president.

I then looked further into the specifics of the allegations, and I have to admit that, at face value, there are problems with how Gay cites her material and references in both her dissertation and her publications. In some instances, material appears to be lifted without attribution. In others, Gay reproduces material almost verbatim from a reference and cites the source only at the end of the paragraph. Gay’s detractors claim she should have used quotation marks in these instances. One of the examples reported by the WFB [2] is a paragraph in a report she wrote for the Public Policy Institute of California [4]. Gay wrote:

The Voting Rights Act of 1965 is often cited as one of the most significant pieces of civil rights legislation passed in our nation’s history”… The central parts of the measure are Sect. 2 and Sect. 5. Section 2 reiterates the guarantees of the 15th amendment, prohibiting any state or political subdivision from adopting voting practices that ‘deny or abridge the right of any citizen of the United States to vote on account of race or color.’ Sect. 5, imposed only on ‘covered’ jurisdictions with a history of past discrimination, requires Justice Department preclearance of changes in any electoral process or mechanism.

The WFB claims that Gay borrows several sentences from a 1999 book by David Canon [5] and, while making a few small changes to the text, does so without using quotation marks or citing Canon. Canon’s text is:

The VRA is often cited as one of the most significant pieces of civil rights legislation passed in our nation’s history (Days 1992; Parker 1990, 1)… The central parts of the VRA are Sect. 2 and Sect. 5. The former prohibits any state or political subdivision from imposing a voting practice that ‘will deny or abridge the right of any citizen of the United States to vote on account of race or color.’ The latter was imposed only on ‘covered’ jurisdictions with a history of past discrimination, which must submit changes in any electoral process or mechanism to the federal government for approval.

In Gay’s passage, I have highlighted the text that appears in both passages and is not a direct quote of the Voting Rights Act. You tell me: is this plagiarism? David Canon, the person from whom she supposedly plagiarized, says she did not plagiarize his text [2]. Describing the Voting Rights Act as “one of the most significant pieces of civil rights legislation passed in our nation’s history” seems like everyday parlance today. A simple Google search quickly turns up similarly hyperbolic statements about the VRA. Wikipedia, for example, cites a 2009 History Channel documentary describing the VRA as “one of the most far-reaching pieces of civil rights legislation in U.S. history” [6]. That is not so different from what Canon and Gay wrote. This example is akin to my writing that “amino acids are the building blocks of proteins.” Do I need to cite this? Do I need to put it in quotation marks? Common knowledge and stock phrases should not need quotation marks. Nevertheless, this is but one of the examples used by the WFB and NY Post, and there are many others like it with similar issues.

All of this got me thinking, ‘What exactly is plagiarism?’, and down the rabbit hole I went. My reading has led me to conclude that plagiarism as a field of study is a hot mess. Plagiarism is one of those things we think we understand, but when it comes down to it, it is difficult to define. Merriam-Webster’s dictionary defines plagiarize as “to steal and pass off (the ideas or words of another) as one’s own: use (another’s production) without crediting the source” or “to commit literary theft: present as new and original an idea or product derived from an existing source” [7]. Under this definition, no intent is needed; simply doing it is enough.

While the dictionary definition is simple, plagiarism can be hard to pin down in practice. Certainly, copying parts of a source or entire paragraphs without attribution is plagiarism and is easy to recognize as such. But consider the concept of mosaic or paraphrasing plagiarism [8], in which parts of a source are used and a “few words or phrases” are added or altered to modify it. Even when the source is credited, many still consider this plagiarism. Let me illustrate with an example from the Harvard Guide to Using Sources [9]. In its discussion of mosaic plagiarism, the guide presents a passage in which a student plagiarizes a paper by Persad et al. [10] (example 1):

In order to advocate for the use of medical television shows in the medical education system, it is also important to look at the current bioethical curriculum. In the more than 3500 h of training that students undergo in medical school, only about 60 h are focused on bioethics, health law, and health economics (Persad et al. 2008). It is also problematic that students receive this training before they actually have spent time treating patients in the clinical setting. Most of these hours are taught by instructors without current publications in the field.

This is the type of citation used by Gay and criticized by the press, in which a single citation at the end of a sentence covers closely borrowed text. Harvard then shows how to correct it:

In order to advocate for the use of medical television shows in the medical education system, it is also important to look at the current bioethical curriculum. According to Persad et al. (2008), only about 1% of teaching time throughout the four years of medical school is spent on ethics. As the researchers argue, this presents a problem because the students are being taught about ethical issues before they have a chance to experience those issues themselves. They also note that more than 60% of instructors teaching bioethics to medical students have no recent publications in the subject.

Does this really improve matters? Is the revision any clearer? In reading the “plagiarized” text, is the writer trying to hide that they did not write the sentence? When I recycle my own text, am I supposed to write, “According to me…”? I disagree with the Harvard Guide to Using Sources and would argue that the student’s original text was acceptable. I am reasonably sure that many scientists are guilty of using this same citation style themselves. I also see other problems with how paraphrasing and plagiarism are defined. What constitutes a “few words”? Two words? Three words? Whatever cutoff one chooses will be equally arbitrary. At what point is a sentence or paragraph sufficiently different from the original that it no longer constitutes plagiarism? Who makes that decision? A software program? A reviewer? A teacher? A right-wing activist? And therein lies the problem. If plagiarism is going to be used to end or derail careers, as in the case of Claudine Gay, it needs to be better defined. Its definition needs more quantitative rigor, because right now, except in overt cases, plagiarism is in the eye of the beholder (Do I need to put that phrase in quotes? I don’t know).

There is disagreement among researchers and academics about whether the evidence presented by the WFB and NY Post was sufficient to show that Gay plagiarized parts of her work. In my opinion, there is no smoking gun. Plagiarism, at its heart, is about stealing ideas. None of the passages cited against Gay were material to the central theses of her dissertation or articles. By Merriam-Webster’s definition, and in my opinion, none of them rises to the level of plagiarism. They were all background information of the kind provided in the introduction of a scientific paper. I am sure others would disagree with my assessment. And that is the problem: we cannot agree on what plagiarism is.

As editor of the journal, and having published close to 100 papers and reviewed probably twice as many, I feel I have enough experience to form my own opinions on the subject. Here are some of my thoughts on plagiarism:

There are levels of plagiarism ranging from low-level to overt. The penalties must vary with the severity. Claudine Gay, in my opinion, had a series of low-level instances, none of any serious nature. The penalty she incurred far outweighed the crime.

I think that in the Introduction and Background sections of manuscripts, more leeway should be given regarding when to use quotation marks. Some have argued that three or four consecutive words are enough to require quotes. Balderdash. If we don’t allow latitude, the background section will soon become a swamp of quotation marks that disrupt the flow of the manuscript without adding anything significant to the paper. If you find yourself copying and pasting from another article into yours, proper attribution must be given. I believe it is acceptable to take an entire sentence, change a few words, and place the citation at the end of the sentence without resorting to quotation marks or stilted phrases like “According to the authors…”. Attribution, however, is non-negotiable. In the background section, quotation marks are the solution of last resort.

For the Discussion section, however, I believe quotation marks are needed, because this is the section where ideas are formulated and the authors put their research into context with prior research. It is necessary to distinguish whose ideas are whose and which ideas belong to the authors.

More allowance should be made for self-plagiarism, which, in my mind, is a nonsensical concept. Certainly, recycling data from prior publications is forbidden, but what we are really concerned with is stealing another’s intellectual property, i.e., ideas, words, etc. Republishing your own work is outside the spirit of what we genuinely care about: stealing from others. The idea that you can plagiarize your own ideas or sentences is not what people have in mind when they think of ‘plagiarism.’ Someone once said that you shouldn’t have to “torture” your words just to pass some software detection system [11]. That said, recycling your own previously published material may raise copyright issues, so authors should consider that before doing so. Guidelines have been published on when self-copying is and is not acceptable [12]. In general, I agree with them.

Lastly, greater education about plagiarism must be provided in other countries, where evidence suggests that it is considered more permissible in certain cultures [13, 14].

Every manuscript submitted to Springer journals, including this one, has an iThenticate® plagiarism report automatically generated for it (Fig. 1). At the top of every report is a similarity index (the percentage of text matching content in an internal iThenticate database), followed by every instance of possibly plagiarized text. Each journal’s editor must decide what similarity score is high enough to warrant concern. At my daughter’s university, all submitted papers must have an iThenticate similarity score below 28%, an oddly specific number, almost as if they could not decide between 25% and 33% and roughly split the difference. Perhaps this arbitrary specificity is meant to lend the chosen value an artificial air of credibility. From personal experience, I can tell you that the false-positive rate of the similarity index is high. For example, in a recent submission to the journal, the statement “The first two authors contributed equally to this paper” was flagged as plagiarism. The statement “using physiologically based pharmacokinetic (PBPK) models. The GastroPlus™ (sic) was used to develop the PBPK models, which were refined and validated with observed data” was flagged as plagiarism because of its similarity to a paper by Rajoli et al. [15]:

“The manuscript describes the pharmacokinetics of tizoxanide, the active metabolite of nitazoxanide using a physiologically based pharmacokinetic (PBPK) model. The validated PBPK model was used to estimate the optimal doses required for SARS-CoV-2 treatment or prevention.”

I bolded the iThenticate text in question. Neither of these examples is plagiarism. Today, hundreds of journals use software such as iThenticate to check every submitted manuscript for plagiarism. Because these tools are so easy to use, one could naively and wrongly conclude from a report alone that the authors are plagiarists. I believe that software output, accepted without critical examination, was the primary evidence behind the accusations against Claudine Gay.
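To make the point about arbitrary cutoffs concrete, here is a toy similarity index written in Python. It is a sketch under stated assumptions, not iThenticate’s algorithm, whose internals are not described in this commentary. It scores a passage by the percentage of its words that fall inside a run of n consecutive words also found in a source; the function names, the tokenization rule, and the choice of n are all hypothetical. The test strings are the opening sentences of the Gay and Canon passages quoted earlier, and the threshold is the 28% cutoff mentioned above.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical rule: lowercase and keep runs of letters/digits, so
    # punctuation and capitalization never prevent a match.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_set(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    # Every run of n consecutive words in the source text.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity_percent(submission: str, source: str, n: int = 4) -> float:
    """Toy index: percent of the submission's words covered by some run of
    n consecutive words that also appears, in the same order, in the source."""
    if n < 1:
        raise ValueError("n must be at least 1")
    sub = tokenize(submission)
    if not sub:
        return 0.0
    source_runs = ngram_set(tokenize(source), n)
    covered = [False] * len(sub)
    for i in range(len(sub) - n + 1):
        if tuple(sub[i:i + n]) in source_runs:
            for j in range(i, i + n):
                covered[j] = True
    return 100.0 * sum(covered) / len(sub)


gay = ("The Voting Rights Act of 1965 is often cited as one of the most significant "
       "pieces of civil rights legislation passed in our nation's history")
canon = ("The VRA is often cited as one of the most significant pieces of civil "
         "rights legislation passed in our nation's history")

THRESHOLD = 28.0  # scores at or above this would fail a "below 28%" rule
for n in (3, 4, 8, 25):
    score = similarity_percent(gay, canon, n)
    print(f"n={n:2d}: {score:5.1f}%  flagged={score >= THRESHOLD}")
```

With these two sentences, the single 20-word shared run produces the same score for every n from 2 to 20, and the score drops to zero once n exceeds 20. The verdict is decided by where the cutoff is set, not by anything in the text itself.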

Plagiarism happens. On that, we can all agree. Unfortunately, the definitions in use are vague enough that it is hard to prove plagiarism has occurred in all but the most extreme instances. Some schools have adopted zero-tolerance policies under which students could be expelled over plagiarism concerns. Perhaps we should thank the WFB and NY Post for bringing this issue to the forefront. I, for one, did not pay much attention to it before the Gay incident. But now I can see how muddy the definitions are, how subjective the process is, how individual the interpretation is, and how arbitrary the penalties are. What is needed is for organizations like the American Association for the Advancement of Science or the National Science Foundation to convene an expert panel to better define what plagiarism is, when inadequate citation rises to the level of plagiarism, what the penalties for plagiarism should be, when it is acceptable to self-plagiarize, and so on. Many unanswered questions need to be addressed. Until more uniformity is reached on plagiarism, we risk companies, organizations, and individuals weaponizing our scientific integrity against us to further their own agendas rather than those of science.

Fig. 1

Portion of the iThenticate report for this commentary
