Outcomes of National Institutes of Health Peer Review of Clinical Grant Applications
  1. Theodore A. Kotchen,
  2. Teresa Lindquist,
  3. Anita Miller Sostek,
  4. Raymond Hoffmann,
  5. Karl Malik,
  6. Brent Stanfield
  1. From the Departments of Medicine (T.A.K.) and Biostatistics (R.H.), Medical College of Wisconsin, Milwaukee, WI; Center for Scientific Review (T.L., A.M.S., K.M., B.S.), National Institutes of Health, Bethesda, MD
  1. Address correspondence to: Dr. Theodore Kotchen, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226; e-mail: tkotchen@mcw.edu.

Abstract

Purpose We previously reported that National Institutes of Health (NIH) peer review outcomes in 2002 were slightly but significantly less favorable for grant applications for clinical research than for laboratory research. The present analysis was undertaken to determine if factors related to the review process might contribute to this difference.

Methods The impact of each of the following factors on median priority scores and funding rates for clinical and nonclinical R01 grant applications was evaluated: (1) the percentage of clinical applications assigned for review to a study section, (2) the requested direct costs, and (3) the clinical research experience of the reviewers.

Results Confirming our previous observation, in both 1994 and 2004, median priority scores and funding rates for R01 applications were less favorable for clinical research. In 1994, clinical applications did not fare as well in study sections reviewing relatively low percentages of clinical applications. This was not the case in 2004. Although requested direct costs were greater for clinical than for nonclinical R01 applications, median priority scores within each category were actually more favorable for applications requesting greater funding. Assignment of priority scores was not different for reviewers with or without experience conducting clinical research.

Conclusion These data do not support the hypothesis that the less favorable review outcomes for clinical applications are related to these review factors. We suggest that peer review outcomes for clinical research will benefit from the recent refinement of NIH review criteria, emphasizing the unique contributions of clinical investigation, and from increased training opportunities for clinical investigators.

Key Words
  • clinical research
  • grant applications
  • peer review
  • research funding

The imperative to translate discoveries in the basic sciences into the clinical arena is widely acknowledged. Nevertheless, there is continuing concern about the vitality of clinical research and the viability of the clinical investigator.1-7 In response to this concern, the National Institutes of Health (NIH) recently took a number of steps to facilitate clinical research, such as expanding support of training for careers in clinical research, establishing debt relief programs for clinical investigators, and increasing support of the budget of the General Clinical Research Centers.8 Soon after becoming director of the NIH in May 2002, Dr. Elias Zerhouni convened a series of meetings to chart a “roadmap” for medical research in the twenty-first century.9 An important goal of the roadmap is to provide direction and support for the translation of basic science discoveries into the clinical arena, and re-engineering the clinical research enterprise is one of its underlying themes.

Nevertheless, there is a perception among clinical investigators that the NIH peer review process may not adequately serve their area of research. Based on data from two grant cycles in 2002, a recent study demonstrated that median priority scores and funding rates were slightly but significantly less favorable for clinical than for nonclinical grant applications.10 The present analysis was undertaken to identify some of the possible reasons for the less favorable review outcomes for clinical applications. Based on 1994 data, Williams and colleagues reported that clinical grant applications do not fare as well in the review process when evaluated in study sections reviewing relatively few clinical applications.11 This observation was the impetus for the recommendation by a previous NIH Director's Panel on Clinical Research that “patient-oriented grant applications…be evaluated by study sections in which at least half the grant applications involve patient-oriented research.”12 In the present analysis, we further evaluated the relationship between review outcomes and the “density” of clinical applications assigned to a study section. We also sought to determine if reviewers with experience conducting clinical research score applications differently from reviewers who have not participated in clinical research. Further, because clinical research is, in general, more expensive than nonclinical research, we evaluated the relationship between review outcomes and the requested direct costs in grant applications. Although reviewers are advised that requested costs should not influence their assignment of a priority score, the concern has been expressed that reviewers may be biased against more costly applications.

METHODS

To determine if peer review outcomes for clinical applications differ in study sections that review either relatively low or relatively high percentages of clinical applications, Center for Scientific Review (CSR) study sections were divided into four groups based on the percentage of clinical applications they were assigned to review: (1) ≤ 25%, (2) 26 to 50%, (3) 51 to 75%, and (4) > 75%. Review outcomes by density group were compared for R01 applications reviewed in 2004, and a similar analysis was carried out for applications reviewed by Division of Research Grants study sections a decade earlier. For both time periods, the data sets included applications reviewed in study sections that satisfied the following criteria: (1) the study section was chartered or met on a recurring basis, (2) it reviewed primarily R01 applications, (3) it met for three consecutive rounds, and (4) it did not review exclusively clinical or exclusively nonclinical applications. The data include review outcomes from 88 study sections in 1994 and 97 study sections in 2004.
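For illustration, a minimal sketch in Python of the density grouping described above; the study section names and percentages are hypothetical examples, not actual CSR data.

```python
# Sketch: assigning study sections to clinical-"density" groups.
# Section names and percentages below are hypothetical examples.

def density_group(pct_clinical: float) -> str:
    """Return the density group for a study section, given the
    percentage of clinical applications it was assigned to review."""
    if pct_clinical <= 25:
        return "<= 25%"
    elif pct_clinical <= 50:
        return "26-50%"
    elif pct_clinical <= 75:
        return "51-75%"
    else:
        return "> 75%"

sections = {"SS-A": 18.0, "SS-B": 42.5, "SS-C": 60.0, "SS-D": 88.0}
groups = {name: density_group(pct) for name, pct in sections.items()}
print(groups)
# {'SS-A': '<= 25%', 'SS-B': '26-50%', 'SS-C': '51-75%', 'SS-D': '> 75%'}
```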

An application was defined as "clinical" if the applicant checked "yes" on page 1 of the grant application in response to a query about involvement of human subjects. The only exceptions were applications that were assigned a human subjects code but also carried Exemption Code 4 (E4); these applications were considered nonclinical. The instructions for grant applicants (PHS form 398) define the E4 exemption as follows: "Research involving the collection or study of existing data, documents, records, pathological specimens, or diagnostic specimens, if these sources are publicly available or if the information is recorded by the investigator in such a manner that subjects cannot be identified…" The applicants' coded designation of inclusion of human subjects and the appropriateness of the E4 exemption are verified at the time of review by the scientific review administrator and study section, and codes deemed to be incorrect are changed.

In an additional analysis, the relationship between requested direct costs and review outcomes was examined for all R01 applications submitted to the NIH for one grant cycle in 2003 (October Council). Clinical and nonclinical applications were considered separately.

Peer review outcomes included median priority scores and funding rates. Although funding decisions are made by the specific Institutes and Centers at the NIH and not by study sections, funding is generally closely related to the outcome of peer review. Priority scores ranged from 100 (most favorable) to 500 (least favorable). Unscored applications are those considered to be in the lower 50% in the initial review; they are not discussed by the review group. In determining a median priority score, unscored applications were assigned a score of 501. To standardize scores among review groups, priority scores for R01 applications were converted to percentile rankings, based on the priority scores assigned to applications (including unscored applications) reviewed during the current plus the two previous review rounds. Data for applications scoring within the 20th percentile are presented; these applications are considered to be in the "likely to be funded" range. The percentages of unscored applications, applications scoring within the 20th percentile, and funded applications were compared using the Yates-corrected chi-square test. Median priority scores were compared using the Mann-Whitney U test. In all statistical tests, the threshold for significance was set at p ≤ .05.
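These comparisons can be sketched in a few lines of Python with SciPy; the scores and counts below are invented for illustration, and the actual NIH percentiling over three review rounds is more involved.

```python
# Sketch of the outcome comparisons described above (toy data).
import numpy as np
from scipy import stats

# Priority scores for two hypothetical groups of applications;
# unscored applications are assigned 501, as in the text.
clinical = np.array([150, 210, 275, 340, 501, 501])
nonclinical = np.array([120, 180, 230, 310, 420, 501])

# Percentile rank of one score within a pooled set of scores
# (the actual procedure pools the current plus two previous rounds).
pool = np.concatenate([clinical, nonclinical])
pct_rank = stats.percentileofscore(pool, 210)

# Mann-Whitney U test comparing the two score distributions.
u_stat, p_mw = stats.mannwhitneyu(clinical, nonclinical,
                                  alternative="two-sided")

# Yates-corrected chi-square on a hypothetical 2x2 funded/unfunded table.
table = np.array([[30, 170],    # clinical: funded, not funded
                  [55, 195]])   # nonclinical: funded, not funded
chi2, p_chi, dof, expected = stats.chi2_contingency(table, correction=True)

print(f"percentile rank: {pct_rank:.1f}, "
      f"Mann-Whitney p: {p_mw:.3f}, chi-square p: {p_chi:.3f}")
```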

A separate analysis was undertaken with applications submitted for the October 2004 Council round to determine if study section members with clinical research experience evaluate clinical and nonclinical R01 applications differently from reviewers with no clinical research experience. To evaluate the interaction between reviewers and the type of grant application, we selected study sections that reviewed both clinical and nonclinical applications and excluded study sections that reviewed only small percentages of either type. Review groups selected for this analysis included the 30 CSR study sections whose assignments in the May 2004 Council round included 25 to 75% clinical applications. The designation of reviewers with clinical research experience was made by two independent observers on the basis of the reviewers' grant funding histories and publication records, derived from one or more of the following sources: the NIH biographical sketch included with a recent grant application, curriculum vitae, personal or institutional Web site, or PubMed. An investigator with clinical research experience was defined as an individual who has had grant funding (either as principal investigator or coinvestigator) and/or has contributed to a peer-reviewed publication for a research study in which the main focus involved interaction with living human subjects. Two exceptions to the requirement for direct interaction with human subjects were investigators with a grant and/or publication history in (1) health services research that depends on review of databases and (2) epidemiologic research based on review of databases to describe the possible causes, incidence, prevalence, and characteristics of a disease. Use of deidentified human tissues or cells was not considered clinical research. The initial concurrence rate between the two coders for the determination of whether a reviewer did or did not have clinical research experience was 83%. Each of the two coders classified 35% of the reviewers as having clinical research experience. Generally, a lack of concurrence was related to insufficient information; in those instances, additional information was sought through a more intensive review of grant applications and/or publications.
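As a small illustration, the concurrence rate between two coders reduces to simple percent agreement; the labels below are toy data, not the actual coder determinations.

```python
# Sketch: percent agreement between two coders' determinations of
# clinical research experience (1 = has experience; toy labels).
coder1 = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
coder2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

agreement = sum(a == b for a, b in zip(coder1, coder2)) / len(coder1)
print(f"concurrence rate: {agreement:.0%}")  # 80% for this toy data
```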

To standardize priority scores across the 30 study sections, each reviewer's priority score for each grant application was converted to a z score, based on study section-specific means and standard deviations. This resulted in 22,028 individual z scores. These z scores were then ranked across study sections by converting them into percentiles, using a table of the normal distribution. These model-based percentiles are to be distinguished from the usual NIH percentiling for each study section, which is based on the study section's previous voting record. Average z scores and corresponding percentiles were compared among the following four groups: clinical applications, clinical reviewers; clinical applications, nonclinical reviewers; nonclinical applications, clinical reviewers; and nonclinical applications, nonclinical reviewers. A two-way analysis of variance was used to estimate and test the effects of human subject inclusion, the reviewer's clinical research experience, and their interaction.13 The corresponding confidence intervals include an adjustment to the standard error for the variability of multiple reviewers scoring the same application and multiple applications being scored by the same reviewers. SAS, version 9.1 (SAS Institute, Cary, NC), proc mixed with the restricted maximum likelihood fitting method was used for estimation of the parameters and their variances. Goodness of fit was evaluated with the Akaike Information Criterion.13 Residual analysis was used to test the adequacy of the assumption of the normal distribution for each study section.
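A simplified sketch of the within-study-section standardization and the model-based percentile conversion, using NumPy/SciPy on toy scores; the mixed-model estimation itself (performed here with SAS proc mixed) is omitted.

```python
# Sketch: convert one study section's raw priority scores to z scores,
# then to model-based percentiles via the normal CDF (toy data).
import numpy as np
from scipy.stats import norm

scores = np.array([145.0, 200.0, 260.0, 315.0, 410.0])

# Standardize using the study section's own mean and SD.
z = (scores - scores.mean()) / scores.std(ddof=1)

# Map z scores onto a common percentile scale across study sections,
# using a table of the normal distribution (here, the normal CDF).
percentiles = norm.cdf(z) * 100
print(np.round(percentiles, 1))
```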

RESULTS

Impact of “Density” of Clinical Applications within a Study Section on Review Outcomes

Reflecting a change in NIH guidelines to study sections introduced in 1995, higher percentages of applications were “unscored” in 2004 than in 1994. Nevertheless, in both 1994 and 2004, median priority scores were less favorable (p < .001) for clinical than for nonclinical R01 applications, and smaller percentages (p < .001) of clinical than of nonclinical applications scored within the 20th percentile and were funded (Table 1). During each of the two time periods, the differences in review outcomes between clinical and nonclinical applications were similar.

Table 1

Outcome Measures for Clinical and Nonclinical R01 Grant Applications in 1994 and in 2004

Between 1994 and 2004, the percentage of clinical R01 applications that were reviewed in study sections whose overall assignment included ≤ 25% clinical applications decreased from 25.2 to 15.8%, whereas the percentage of clinical R01 applications reviewed in study sections whose overall assignment included > 75% clinical applications increased from 21.4 to 55.8%. In 1994, review outcomes (median priority scores, percentage of applications within the 20th percentile, and percentage of applications funded) for clinical applications were significantly less favorable than for nonclinical applications when reviewed in those study sections whose review assignments included fewer than 50% clinical applications (Table 2). However, in 1994, review outcomes for clinical and nonclinical applications did not differ in study sections reviewing more than 75% clinical applications. In contrast, in 2004, review outcomes were significantly less favorable for clinical applications in each of the four clinical density groups of study sections. Further, in 2004, within each density group, the magnitude of the differences between clinical and nonclinical applications was similar.

Table 2

Outcome Measures for Clinical and Nonclinical R01 Applications by Percentage of Clinical Applications Assigned to Study Sections in 1994 and in 2004

Impact of Requested Funds on Review Outcomes

The relationship between requested support and priority scores was evaluated for all R01 applications submitted to the NIH that were considered in the October 2003 Council round. Comparing clinical versus nonclinical applications, average amounts of requests for first-year direct costs ($321,584 vs $241,130) and total direct costs ($1,437,073 vs $1,101,826) were greater for clinical applications (p < .001). For both clinical and nonclinical applications, median priority scores were less favorable (p < .001) and funding rates tended to be lower for those applications requesting less than $250,000 in annual direct costs than for applications requesting $250,000 to $500,000 (Table 3). Overall, total requested direct costs tended to be inversely correlated with priority scores for both clinical (r = −.09, p < .01) and nonclinical (r = −.14, p < .01) applications; that is, higher requested costs were associated with more favorable priority scores.

Table 3

Outcome Measures for Clinical and Nonclinical R01 Applications by Requested Average Annual Direct Costs (October 2003 Council)
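The inverse correlation reported above can be illustrated with a short sketch; the cost and score pairs below are invented, chosen only so that higher requests pair with lower (more favorable) scores.

```python
# Sketch: Pearson correlation between requested direct costs and
# priority scores (toy data; a negative r indicates that higher
# requested costs go with lower, i.e., more favorable, scores).
import numpy as np
from scipy.stats import pearsonr

costs = np.array([150_000, 220_000, 300_000, 410_000, 480_000])
scores = np.array([310.0, 280.0, 250.0, 230.0, 200.0])

r, p = pearsonr(costs, scores)
print(f"r = {r:.2f}, p = {p:.3f}")
```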

Impact of Reviewers' Clinical Research Experience on Review Outcomes

This analysis included 876 reviewers serving on one of 30 study sections. These study sections reviewed 1,469 applications, 39% of which were clinical (range 15.3-79.5%). The median priority scores for the clinical and nonclinical applications included in this analysis were 246.0 and 239.0, respectively; 17.8% of the clinical applications and 23.7% of the nonclinical applications scored within the 20th percentile (p < .01). Thirty-five percent of the reviewers were determined to have experience conducting clinical research (range 2.9-79.4% within study sections). Further, 39.6% of the reviewers had the MD degree; 52.7% of MD reviewers and 23.8% of non-MD reviewers were determined to have clinical research experience. The percentage of clinical applications reviewed by each of the 30 study sections was positively correlated with the percentage of reviewers with clinical research experience (r = .67, p < .001).

Table 4 lists the number of individual z scores for each of the four study groups (clinical applications, clinical reviewers; clinical applications, nonclinical reviewers; nonclinical applications, clinical reviewers; and nonclinical applications, nonclinical reviewers) and the mean z scores and model-derived percentiles for each group. Mean z scores and mean percentiles were virtually identical for applications evaluated by reviewers with or without clinical research experience (p = .60), whereas these outcomes were considerably less favorable for applications involving human subjects (p = .0006). The estimated interaction between the effect of human subjects applications and the reviewers' clinical research experience was also not significant (p = .69). Table 5 demonstrates the magnitude of the “human subjects effect” within potentially fundable percentile ranges. The percentile ranges are not appreciably altered by the clinical research experience of the reviewers. However, the model predicts that percentile ranges for clinical applications are shifted to less favorable ranges, and the magnitude of this shift is not affected by including the reviewers' clinical research experience in the model.

Table 4

Impact of Inclusion of Human Subjects in Grant Applications and Clinical Research Experience of Reviewers on Mean z Scores and Model-Based Mean Percentiles

Table 5

Predictions of Shifts in Model-Based Percentile Ranges Attributed to Inclusion of Human Subjects in Grant Applications and to Clinical Research Experience of Reviewers

DISCUSSION

In both 1994 and 2004, peer review outcomes for clinical research applications were less favorable than for applications not involving human subjects. Noteworthy is the similarity of the magnitude of the differences in review outcomes between clinical and nonclinical applications in both 1994 and 2004. These differences are somewhat greater than we have previously reported, based on 2002 data, perhaps because the definition of clinical research has been modified to exclude applications in which use of deidentified tissues or databases is the only involvement of human subjects. This revised definition of clinical research continues to encompass studies of mechanisms of disease, clinical trials or interventions, evaluation of new technologies, behavioral research, and epidemiologic studies. This inclusive definition of clinical research is consistent with that recommended by a previous NIH Director's Panel on Clinical Research12 and with current NIH guidelines for defining clinical research.

Based on 1994 data, review outcomes for clinical applications were less favorable than outcomes for nonclinical applications when reviewed in study sections whose review assignments included fewer than 50% clinical applications but not in study sections reviewing predominantly clinical applications. This is consistent with an earlier observation that clinical applications do not fare as well in “low-density” study sections.11 Consequently, one goal of a recent CSR study section reorganization has been to avoid assignment of clinical applications to study sections whose review assignments include fewer than 25% clinical applications, and in 2004, lower percentages of clinical applications were reviewed in low-density study sections than in 1994. Overall, considering all study sections combined, the differences in review outcomes and funding rates between clinical and nonclinical applications were similar in 1994 and 2004. However, in contrast to 1994, in 2004, review outcomes were less favorable for clinical applications, even in study sections reviewing greater than 75% clinical applications. Although potentially of concern, the emergence of this gap between clinical and nonclinical applications in 2004 is related, at least in part, to the greater percentage of clinical applications reviewed in high-density study sections in 2004. By definition, only 20% of applications are included within the upper 20th percentile, and in 2004, 19.3% of clinical applications in the study sections reviewing > 75% clinical applications were scored within the upper 20th percentile.

Our data suggest that the greater cost of clinical projects also does not account for their less favorable review outcomes. For both clinical and nonclinical projects, review outcomes were actually more favorable for applications requesting higher costs, possibly because applications requesting higher costs may have been submitted by more experienced investigators.

Review outcomes for both clinical and nonclinical applications were not influenced by the reviewers' experience, or lack of experience, in conducting clinical research. Further, comparison of review outcomes of clinical applications in study sections with either fewer or more than 25% MD reviewers also revealed no differences in median priority scores or percentage of applications scoring within the 20th percentile (data not shown). Consistent with the definition of clinical research that was applied to grant applications, a similarly inclusive definition was also applied to the clinical research experience of the reviewers. Approximately half of the MD reviewers were deemed not to have clinical research experience, and of those reviewers with clinical research experience, almost 25% were non-MDs (psychologists, epidemiologists, statisticians, dentists, and individuals with PhD degrees in a number of other disciplines).

Another review factor that might have contributed to the less favorable outcomes for clinical projects is the set of criteria used to evaluate grant applications. Approximately 45 years ago, Ernest Allen, then chief of the Division of Research Grants (the predecessor of the CSR) at the NIH, described shortcomings in unsuccessful grant applications.14 Among these were weaknesses in the research problem, the approach, and the experience and competence of the investigator. Over time, review criteria have been refined and clearly articulated (significance, approach, innovation, investigators, environment) to assist both applicants and peer reviewers. As of January 2005, these criteria have been further refined for investigator-initiated grant applications to better accommodate interdisciplinary, translational, and clinical projects.15 Adoption of these revisions by study sections may have a favorable impact on future review outcomes for clinical grant applications.

Other potential explanations for the discrepancy in review outcomes for clinical and nonclinical grant applications may be related to the complexity and limitations of conducting clinical research. For example, a previous study reported that human subject concerns raised at the time of review contribute to but do not totally explain the less favorable review outcomes for clinical applications.10 Providing research training opportunities for clinical investigators will better prepare them to submit competitive grant applications. In 1999, the NIH introduced the Clinical Research Curriculum Award (K30) for academic institutions to develop and implement a clinical research curriculum. These are generally 2-year programs, and, on average, they produce a total of approximately 300 clinician-investigator graduates each year. Support for other career development awards for clinical scientists has also increased over the past 5 years, including the Mentored Clinical Scientist Development Award (K08), the Mentored Clinical Scientist Development Program Award (K12), the Mentored Patient-Oriented Research Career Development Award (K23), and the Midcareer Investigator Award in Patient-Oriented Research (K24). Perhaps related to the expansion of these training programs, in 2003, there was a substantial increase over the previous 6 years in the percentage of grant applications to the NIH submitted by physicians and in the overall percentage of NIH awards to physicians (Table 6).

Table 6

Grant Applications to the National Institutes of Health Submitted by and Awarded to MDs

It was not the intent of this analysis to provide a comprehensive overview of NIH support of clinical research. The focus of this study is review outcomes of R01 grant applications. When making funding decisions about grant applications, NIH institutes may not rely exclusively on priority scores. Funding rates also depend on the budgets and priorities of the individual institutes. In addition, review outcomes for other mechanisms used to support clinical research were not included in this analysis, for example, contracts (a mechanism often used to fund large multicenter clinical trials) and General Clinical Research Center support.

In summary, peer review outcomes for clinical grant applications to the NIH continue to be less favorable than review outcomes for nonclinical applications. The results of this retrospective analysis, based on 2003-2004 data, suggest that this discrepancy is not accounted for by the “density” of clinical applications reviewed in a study section, by the reviewers' experience in conducting clinical research, or by the greater cost of conducting clinical research. CSR will continue to evaluate the appropriateness of the assignment of grant applications to specific review groups and will track review outcomes of clinical grant applications on an ongoing basis. Recent modifications in the review criteria to accommodate the contributions of clinical research and increased funding opportunities for clinical research training represent strategies and opportunities for improving peer review outcomes for clinical grant applications.

ACKNOWLEDGMENTS

We are grateful to Michelle Arku, Chuck Dumais, Mary Elizabeth Mason, Karen Oden, Holley Sullivan, and Pam Sullivan for their invaluable assistance in providing data for this study. We also thank the 28 scientific review administrators who provided the information that allowed us to link individual reviewers' priority scores with each grant application while maintaining the anonymity of the reviewers.
