Original research
Does knowledge have a half-life? An observational study analyzing the use of older citations in medical and scientific publications
  1. Natalie L.Y. Chow [1],
  2. Natalie Tateishi [2],
  3. Alexa Goldhar [3],
  4. Rabia Zaheer [4],
  5. Donald A. Redelmeier [5,6],
  6. Amy H. Cheung [7,8],
  7. Ayal Schaffer [7,8],
  8. Mark Sinyor [7,8]
  1. Department of Anatomy and Cell Biology, Western University, London, Ontario, Canada
  2. Department of Microbiology and Immunology, Western University, London, Ontario, Canada
  3. Department of Biology, Queen's University, Kingston, Ontario, Canada
  4. Department of Education Services, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
  5. Department of Medicine, University of Toronto Faculty of Medicine, Toronto, Ontario, Canada
  6. Department of Evaluative Clinical Sciences, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
  7. Department of Psychiatry, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
  8. Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada

Correspondence to Dr Mark Sinyor; mark.sinyor{at}sunnybrook.ca

Abstract

Objectives In the process of scientific progress, prior evidence is both relied on and supplanted by new discoveries. We use the term ‘knowledge half-life’ to refer to the phenomenon in which older knowledge is discounted in favour of newer research. By quantifying the knowledge half-life, we sought to determine whether research published in more recent years is preferentially cited over older research in medical and scientific articles.

Design An observational study employing a directed, systematic search of current literature.

Data sources BMJ, PNAS, JAMA, NEJM, The Annals of Internal Medicine, The Lancet, Science and Nature were searched.

Eligibility criteria for selecting studies Original research articles were sampled from the first issue of each year of eight high-impact medical and scientific journals over a 25-year span (1996–2020). The outcome of interest was the difference between the publication year of the article and that of each reference cited, termed ‘citation lag’.

Data extraction and synthesis Analysis of variance was used to identify significant differences in citation lag.

Results A total of 726 articles and 17 895 references were included with a mean citation lag of 7.5±8.4 years. Across all journals, >70% of references had been published within 10 years of the citing article. Approximately 15%–20% of referenced articles were 10–19 years old, and articles more than 20 years old were cited infrequently. Medical journal articles had references with significantly shorter citation lags compared with general science journal articles (p≤0.01). Articles published in or before 2009 had references with significantly shorter citation lags compared with those published in 2010–2020 (p<0.001).

Conclusions This study found evidence of a small increase in the citation of older research in medical and scientific literature over the past decade. This phenomenon deserves further characterisation and scrutiny to ensure that ‘old knowledge’ is not being lost.

  • health informatics
  • public health
  • health equity
  • statistics & research methods
  • medical education & training
  • qualitative research

Data availability statement

Data are available on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

STRENGTHS AND LIMITATIONS OF THIS STUDY

  • Strengths of this study include the quantification of citation lag in both medical and scientific journals using a large, randomised sample of published articles from high-impact journals.

  • The main limitation of this study is that this design did not allow us to investigate why researchers chose to use certain references but merely quantified how they used them.

  • This study also only examined a select group of top journal articles dating back to 1996, and therefore, we are unable to comment on whether findings would differ for earlier publications, other major journals, or lower impact medical and science journals.

Introduction

The scientific endeavour involves the additive pursuit of knowledge through both observation and experimentation. Through that process, novel evidence and findings add to and, sometimes, supplant prior work. However, much like fine wine, the age of a scientific article does not necessarily indicate a decrease in intrinsic value. The fact that a particular study is ‘old’ does not make its findings out-dated or irrelevant. For example, in the field of suicide prevention, means restriction is known to be among the most effective population-based strategies1; yet some of the strongest evidence for this notion is decades old.2 3 The degree to which the scientific literature may be biased against (or in favour of) older research findings has yet to be investigated.

Our research group took an interest in this question when a peer reviewer suggested that the senior investigator (MS) find a more recent publication to cite rather than a high-quality article published in 2005 which was described as ‘old’. This led to the question of whether scientific literature, even from the relatively recent past, was being discounted based on its age. We, therefore, sought to identify whether the process of creation and publication of peer-reviewed research may have a hidden bias against older knowledge, specifically by examining the age of references cited. If newer research is favoured over older research with a rationale that something recent is more valuable, then it would indicate the presence of a ‘knowledge half-life’. Though many have previously investigated citation networks and the probability of citation for research articles, we are unaware of any previous rigorous scientific quantification of the concept of a ‘knowledge half-life’.4–7

This study aimed to characterise and measure change over time in the use of older citations in high-impact general medical journals. General science journals were included as comparators to identify whether any findings were specific to medicine or reflect trends in science in general. The a priori hypotheses were (A) that older citations (≥10 and ≥20 years old at the time of an article’s publication) would be relatively uncommon (eg, represent fewer than 25% of all citations), (B) that the magnitude of this finding would be significantly larger in general medical compared with general science journals and (C) that references to older citations would decrease over time.

Methods

Data sources

The journals of interest in this study were five of the highest impact general medical journals.8 These include the British Medical Journal (BMJ), the Journal of the American Medical Association (JAMA), The Lancet, the Annals of Internal Medicine and the New England Journal of Medicine (NEJM). Data were also abstracted from three of the highest impact non-medical, general science comparator journals9: Nature, Science and the Proceedings of the National Academy of Sciences of the United States of America (PNAS).

Inclusion criteria

Full-length original research articles with primary data were eligible for inclusion in this study. Only references found in the introduction and discussion of each article were eligible for inclusion given this study’s emphasis on quantifying the use of older or newer scientific knowledge to contextualise a study and its findings. References in the methods section were excluded as these citations frequently relate to specific methodologies and statistical software used, the age of which was not the focus of this study’s research question. Case reports, research letters, meta-analyses and reviews were excluded.

Sampling strategy and data abstraction

The epoch of interest was the most recent 25-year span at the time of data collection (1996–2020). Articles were accessed through online journal archives. As it was not practically feasible to abstract data from every issue of every journal, a systematic sampling strategy was employed in which the first issue published in each journal in each calendar year was sampled. If there were more than five articles in the selected issue, only the first five were abstracted. If the issue was sectioned, such as by discipline, five eligible articles were randomly selected using a random number generator. If no eligible articles were found in the first issue, articles from the next issue were used. If there were no eligible subsequent issues, then that year of the journal was excluded. Three principal coders (NC, NT and AG) were responsible for collecting data. To ensure that the abstracted data were consistent between the three coders, a reliability test was performed, demonstrating satisfactory inter-rater agreement (kappa>0.8) for all variables.
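The sampling rules described above can be sketched in Python as follows. This is only an illustration: the function names `sample_issue` and `sample_year`, the dictionary layout of an issue, and the choice to keep every article when a sectioned issue has five or fewer are assumptions made here, not details reported by the study.

```python
import random
from typing import List, Optional


def sample_issue(eligible_articles: List[str], sectioned: bool,
                 rng: Optional[random.Random] = None, limit: int = 5) -> List[str]:
    """Pick up to `limit` eligible articles from a single issue.

    Unsectioned issue: take the first `limit` articles in printed order.
    Sectioned issue: draw `limit` articles at random.
    Assumption: if the issue has `limit` or fewer articles, keep them all.
    """
    if len(eligible_articles) <= limit:
        return list(eligible_articles)
    if sectioned:
        rng = rng or random.Random()
        return rng.sample(eligible_articles, limit)
    return eligible_articles[:limit]


def sample_year(issues: List[dict]) -> List[str]:
    """Use the first issue of the year that contains eligible articles.

    `issues` is a hypothetical list of {"eligible": [...], "sectioned": bool}
    records in publication order. An empty result means the journal-year
    is excluded.
    """
    for issue in issues:
        if issue["eligible"]:
            return sample_issue(issue["eligible"], issue["sectioned"])
    return []
```

Passing a seeded `random.Random` instance makes the random draw for sectioned issues reproducible, which would help if the sample ever needed to be regenerated.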

For each reference, the difference between the article's publication year and the reference's publication year, hereafter referred to as the ‘citation lag’, was recorded. For example, the citation lag would be 5 for a reference published in 2015 that was cited in a 2020 paper.
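As a minimal sketch of this definition, the citation lag is a simple subtraction of years. The function name `citation_lag` and the treatment of a missing reference year are illustrative assumptions; the study itself recorded its variables in spreadsheets.

```python
from typing import Optional


def citation_lag(article_year: int, reference_year: Optional[int]) -> Optional[int]:
    """Return the citing article's year minus the reference's year.

    Assumption: a reference whose year is unknown yields None so that it
    can be left out of any later analysis.
    """
    if reference_year is None:
        return None
    return article_year - reference_year


# Worked example from the text: a 2015 reference cited in a 2020 paper.
print(citation_lag(2020, 2015))  # 5
```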

Sex of the lead author was also abstracted, first based on name; if the name was ambiguous, an online search for text or images was performed to determine whether sex could be identified. This information was collected to investigate whether male and female first authors differed in their reference to older citations.

Statistical analysis

A primary analysis of variance (ANOVA) test was performed to identify significant differences in citation lag by year. The independent and dependent variables used in the ANOVA test were year of article publication and mean citation lag, respectively. The first ANOVA test pooled the data from all journals. Secondary post hoc tests (independent samples t-tests alongside Levene’s test for equality of variances) were then carried out, comparing citation lag between categories of three variables. The first test compared citation lag between medical and non-medical journal types. The second test compared citation lag between recent and remote publications, where recent publications were those published from 2010 to 2020 and remote publications were those published in or before 2009. The third test compared citation lag between articles with male lead authors and those with female lead authors. The outcome variable was not normally distributed; however, we proceeded with t-tests and one-way ANOVA given the large sample size.
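To make the primary test concrete, the sketch below computes the F statistic of a one-way ANOVA using only the Python standard library. The function name `one_way_anova_f` and the toy numbers are invented for illustration and are not the study's data or software. Obtaining a p value additionally requires the F distribution, for which a statistics package would normally be used.

```python
from statistics import mean
from typing import Dict, List


def one_way_anova_f(groups: Dict[int, List[float]]) -> float:
    """F statistic: between-group mean square over within-group mean square.

    `groups` maps a publication year to the citation lags observed in it.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (years)
    n = len(all_values)    # total number of observations
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2
                     for vals in groups.values())
    ss_within = sum((v - mean(vals)) ** 2
                    for vals in groups.values() for v in vals)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data (hypothetical): three years with three citation lags each.
toy = {1996: [2, 4, 6], 2008: [3, 5, 7], 2020: [7, 9, 11]}
print(one_way_anova_f(toy))  # 5.25
```

In the toy data the group means are 4, 5 and 9 around a grand mean of 6, giving a between-group sum of squares of 42 on 2 degrees of freedom and a within-group sum of squares of 24 on 6 degrees of freedom, hence F = 21/4 = 5.25.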

Patient and public involvement

None.

Results

A total of 726 articles were eligible for inclusion in the study with a total of 18 114 abstracted references. Data were missing or unavailable from 219 references (<2%) and thus 17 895 references were included in the statistical analysis.

Mean citation lag by year for each journal is presented in figure 1 and table 1. Overall, 72% of articles cited in scientific journals and 76% of articles cited in medical journals were published within the prior decade (figure 2). Overall, 20% and 18% of references cited in scientific and medical journals, respectively, were from 10 to 19 years prior, and <10% of referenced articles had citation lags of 20 years or more across all journals (figure 2). The overall trends by year for general medical and general science journals are shown in figure 3.
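The three age bands reported above can be reproduced from a list of citation lags with a short binning routine. The sketch below is illustrative only: the names `lag_bin` and `bin_proportions` are hypothetical, and placing a lag of exactly 10 years in the 10 to 19 band is an assumption about where the boundaries fall.

```python
from collections import Counter
from typing import Dict, Iterable

BANDS = ("0-9", "10-19", "20+")


def lag_bin(lag: int) -> str:
    """Assign a citation lag (in years) to one of the three bands."""
    if lag < 10:
        return "0-9"
    if lag < 20:
        return "10-19"
    return "20+"


def bin_proportions(lags: Iterable[int]) -> Dict[str, float]:
    """Share of references falling in each band."""
    counts = Counter(lag_bin(lag) for lag in lags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no citation lags supplied")
    return {band: counts.get(band, 0) / total for band in BANDS}


# Hypothetical lags, not study data.
print(bin_proportions([0, 3, 9, 10, 19, 20, 35, 5]))
# {'0-9': 0.5, '10-19': 0.25, '20+': 0.25}
```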

Table 1

Number and percentage of references cited across journals based on magnitude of citation lag (1996–2020)

Figure 1

Mean citation lag for articles published in medical and non-medical science journals from 1996 to 2020. (A) Citation lag for medical journals: the Journal of the American Medical Association (JAMA), the New England Journal of Medicine (NEJM), the British Medical Journal (BMJ), Annals of Internal Medicine and The Lancet. (B) Citation lag for non-medical science journals: Science, PNAS, Nature.

Figure 2

Counts (A) and proportions (B) of references cited across journals based on magnitude of citation lag (1996–2020).

Figure 3

Mean citation lag for medical and non-medical science journals over time (1996–2020).

Overall, a small increasing trend in mean citation lag was present across the span of 25 years (an increase of 0.055 years in mean citation lag per year). There also appeared to be a slight difference in the trends of citation lag between journal types; medical journals demonstrated an overall citation lag increase of 0.122 per year, while non-medical science journals showed a decrease in citation lag of 0.018 per year. Notably, there were several years with significantly shorter mean citation lags overall (p<0.05). The mean citation lag for articles published across all journals in 1997 was significantly shorter than that of 2000, 2008, 2012–2016 and 2018–2020. The mean citation lags for 1998 and 2002 were both significantly shorter than that of 2019. Lastly, the mean citation lag for 2006 was significantly shorter than that of 2000, 2008, 2012, 2014–2016 and 2018–2020.
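A per-year trend of this kind can be estimated as the slope of a straight line fitted to yearly mean citation lags. The text does not state how the trend was computed, so the ordinary least-squares fit below is an assumption, and the function name `ols_slope` and the example values are hypothetical.

```python
from typing import Sequence


def ols_slope(years: Sequence[float], mean_lags: Sequence[float]) -> float:
    """Least-squares slope of mean citation lag regressed on publication year."""
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(mean_lags) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, mean_lags))
    sxx = sum((x - x_bar) ** 2 for x in years)
    return sxy / sxx


# Hypothetical series that rises by 0.05 years of lag per calendar year.
years = [1996, 1997, 1998, 1999, 2000]
lags = [7.0 + 0.05 * (y - 1996) for y in years]
print(round(ols_slope(years, lags), 3))  # 0.05
```

Under a strictly linear trend, a slope of 0.055 per year would correspond to a rise of about 1.3 years in mean citation lag between 1996 and 2020 (0.055 × 24).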

General medical journals had a shorter mean citation lag of 7.15±8.03 years compared with 7.89±8.83 years for non-medical journals (p≤0.001, F=39.756). Across all journals, articles published in or before 2009 (remote) had significantly shorter mean citation lags (ie, inclusion of references to slightly more recent articles) than those published from 2010 to 2020 (p≤0.001, F=62.240). Citation lag did not differ between articles with male or female lead authors (p=0.381, F=4.011).

Discussion

We believe this is the first study to investigate the effects of the phenomenon that we term a knowledge half-life. Our findings demonstrate that the overwhelming majority of references cited in both medical and scientific publications are published within the prior decade with very few citations of papers that are more than 20 years old. This is consistent with our first a priori hypothesis and indicates that a knowledge half-life exists and may be a substantial issue in the medical and scientific literature. Note that the concept of a knowledge half-life differs from other forms of selective citation such as citation bias, which occurs when journals selectively publish research findings with positive or favourable results.10 Also consistent with our second a priori hypothesis, we found some evidence that, in recent years, articles in general medical journals have tended to cite slightly more recent literature than non-medical science journals.

Contrary to our third a priori hypothesis, the knowledge half-life appeared to increase over time but not by a magnitude that appears practically meaningful. Outlier events may have contributed to ANOVA findings of specific years with shorter mean citation lags. An example would be if major discoveries prompted research in that year; lack of past research would make it hard to cite older articles. Additionally, lead author sex did not influence the vintage of the articles used as references. This is in line with previous research showing that differences in gender of the participants involved in the process of peer review do not impact the publication of research articles.11

There are several potential reasons for these findings that are worth consideration. First, it is possible that some scientists favour the citation of more recent research due to new developments in rapidly changing fields that contradict past research or circumstances. Similarly, it is important to note that in fields like medicine, new technological advancements can make earlier observations obsolete; thus, current research is preferred in these fields. In contrast, fields such as physics that have always had advanced instrumentation tend to have a greater respect for older work and theories considered ‘building blocks’ of the field. As such, the role of technology within different areas of science is crucial for determining the recency of research articles that authors tend to cite.

Furthermore, authors may choose to cite the most recent meta-analyses, systematic reviews and/or scoping reviews instead of referencing multiple older sources, which would result in shorter citation lags. Additionally, the literature on topics in science and medicine may be expanding exponentially such that there is simply a larger number of recent studies available for citation by researchers.

It seems beyond doubt that the above factors are at least partially responsible for the findings observed here and none should necessarily be a cause for concern. However, there are other contributors that should be of potential concern to the scientific community. First, industry sponsorship might play a role, particularly in medical research, as there may be bias in favour of referencing research on products, including medications, that are currently on-patent. Due to limitations in access to such information, sponsorship was not investigated in this study. The second and even more concerning potential contributor is that older research is just less accessible, less read or less known.12 Lack of access to certain vintage research is of great concern as researchers may miss some potential ‘Sleeping Beauty’ articles that are old, but nonetheless could guide and impact novel developments.13 Ironically, this was a limitation our research team encountered while collecting data for this study. One of the practical reasons that this study of knowledge half-life began with studies published in 1996 is that it was much more difficult to access earlier papers in some of the journals. It stands to reason that scientists would similarly choose to cite newer research if it is easier to access. This process could potentially explain our finding that more recent research is citing slightly older articles given that the epoch of digitally available publications has lengthened over time. Likewise, because the scientific literature is often overwhelmingly large, researchers may often be selective in what they read and may choose to ignore older papers. They may also have a bias that leads them to consider older research obsolete just like they might view an older smartphone. At least anecdotally, this would seem to be an issue. 
While working on this manuscript, the senior author received a review of another paper which read, in part, as follows: ‘I would include more up-to-date references, leaving out some that were published more than a decade ago’.

The conclusion remains that peer-reviewed research papers are not the same as smartphone technology, and such knowledge should be better retained. More research is certainly needed in this area; however, the scientific community ought to consider strategies to mitigate these potential problems to avoid losing valuable knowledge gathered in the past.

Limitations

The main limitation of this study is that this design did not allow us to investigate why researchers chose to use certain references but merely quantified how they used them. Previous research has shown the impact factor of an article’s original journal of publication is the best predictor for whether the article will be cited.14 Further research is needed to understand motivations behind how citations are selected, and similar studies should take into account the journal impact factors of references, as well as other known contributors to article citation, such as the reference articles’ research areas, placement in the journal, and population of study.15 Additionally, two of the comparator journals used (Science and Nature) tend to favour more recent and pressing developments; as such, authors may cite more recent research to make the paper look more up-to-date and important. This study also only examined a select group of top journal articles dating back to 1996, and therefore, we are unable to comment on whether findings would differ for earlier publications, other major journals, or lower impact medical and science journals. Lastly, as described in the results section, publication years for a small proportion (<2%) of references could not be confirmed.

Conclusions

This is the first study to highlight the effects of knowledge half-life on the articles researchers choose to reference in their papers. It raises the question of whether important medical and scientific knowledge may be inadvertently lost over time. There are many potential reasons why knowledge half-life might exist, some of which are benign and some of which may pose a substantial risk of losing ‘old’ knowledge. Further research should be conducted on this topic to explore those potential issues and to identify any additional trends or influencing factors. Lastly, we hope that researchers will keep our findings in mind even after the half-life of this study.

Ethics statements

Patient consent for publication

Ethics approval

As this study used secondary data already accessible to the public, a research ethics board approval was not required. Primary articles from high-impact medical journals were accessed through existing subscriptions held by the research team. Variables from primary articles were abstracted and recorded on to Excel spreadsheets for analysis. Researchers removed personal identifiers before recording, such as the author’s first and last name, to maintain anonymity.

References

Footnotes

  • Contributors All authors were involved in the conception, evaluation and critical review of this research, notably NC, NT, AG and MS. NC, NT and AG were responsible for data collection, conducting, reporting and writing the manuscript. NC and RZ collaborated on data analysis. MS, DR, AC and AS served as scientific advisors who aided the initial planning of the study and reviewed the study manuscript extensively. NC is the guarantor.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.