The new paper “Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting” (home page) shows how to fool Google’s academic ranking bots. The researchers invented a fake author (“Marco Alberto Pantani-Contador”) and created six fake “working papers” under his name with the least possible effort:

In a process that lasted less than a half day’s work, we draft a small text, copied and pasted some more from the EC3 research group’s website, included several graphs and figures, translated it automatically into English using Google Translate [!] and divided it into six documents.

EC3 (Evaluación de la Ciencia y de la Comunicación Científica) is the authors’ real research group. Each fake document referenced 129 of the group’s papers, for a total of 774 citations.

Afterwards, we created a simple webpage under the University of Granada domain including references to the false papers and linking to the full text, in order to let Google Scholar index the content. […] Google indexed these documents nearly a month after they were uploaded, on 12 May, 2012. At that time the members of the research group used as study case along with the three co-authors of this paper, received an alert from GS Citations pointing out that some MA Pantani-Contador had cited their Works. The citation explosion was thrilling, especially in the case of the youngest researchers where their citation rates were multiplied by six, notoriously increasing in size their profiles.

The authors’ h-index and especially i10-index likewise improved. The h-index of the journals that had published the real papers would also rise if the fake references were counted. Moreover, two weeks after the fake papers were deleted from the Internet, Google Scholar had still not reverted its statistics to their earlier state.
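
To see why six junk documents can move these numbers so much, here is a minimal Python sketch of the two metrics, using their standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the i10-index is the number of papers with at least 10 citations. The citation counts in `before` are made up for illustration and are not taken from the paper. The sketch also assumes that each of the six fake documents cites every one of the researcher's papers exactly once.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Figures reported in the post: six fake documents, 129 references each.
FAKE_DOCUMENTS = 6
REFERENCES_PER_DOCUMENT = 129
total_fake_citations = FAKE_DOCUMENTS * REFERENCES_PER_DOCUMENT

# Hypothetical citation counts for one young researcher (illustrative only).
before = [12, 7, 5, 4, 2, 1, 0]

# Assumption: every fake document cites each of these papers once,
# so each paper gains one citation per fake document.
after = [count + FAKE_DOCUMENTS for count in before]

print(total_fake_citations)                    # 774
print(h_index(before), i10_index(before))      # 4 1
print(h_index(after), i10_index(after))        # 6 4
```

With these invented numbers, six extra citations per paper lift the h-index from 4 to 6 and the i10-index from 1 to 4. The i10-index jumps more because it is a plain threshold count, which fits the observation that the i10-index improved the most.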

Although Google Scholar is only meant to index and retrieve all kinds of academic material in its widest sense, the inclusion of GS Citations and GS Metrics, which are evaluating tools, must include the introduction of monitoring tools and the establishment more rigid criteria for indexing documents.

At the time of this experiment, Google evidently did not perform any such monitoring, nor did it allow researchers to flag fraudulent citations. As it stands, Google’s naive approach is inadequate for reliable scholarly bibliometrics.

2012-12-10: Co-author Nicolás Robinson-García notified me of EC3’s overview article on the subject, “Google Scholar Metrics: an unreliable tool for assessing scientific journals” (PDF). Even without deliberate manipulation, GSM’s automated analysis of journals and articles has serious shortcomings, “making its use inadvisable for assessment purposes.”

2014-11-08: Nature speaks with Anurag Acharya, co-creator of Google Scholar. Aside from some insights into how the indexing system works, Acharya states that Google Scholar does not make any money – it’s basically a pet project by a small group of ex-academics. Accordingly, don’t expect Google to put any resources into policing fake research. The users are supposed to do that themselves.

