Inter-rater reliability and concurrent validity of ROBINS-I: Protocol for a cross-sectional study

Maya M. Jeyaraman, Rasheda Rabbani, Nameer Al-Yousif, Reid C. Robson, Leslie Copstein, Jun Xia, Michelle Pollock, Samer Mansour, Mohammed T. Ansari, Andrea C. Tricco, Ahmed M. Abou-Setta

Research output: Journal Publication, Article, peer-review

14 Citations (Scopus)


Background: The Cochrane Bias Methods Group recently developed the "Risk of Bias (ROB) in Non-randomized Studies of Interventions" (ROBINS-I) tool to assess ROB in non-randomized studies of interventions (NRSI). It is important to establish consistency in how the tool is applied and interpreted across review teams, and to understand whether specialized training and guidance improve the reliability of the resulting assessments. Therefore, the objective of this cross-sectional study is to establish the inter-rater reliability (IRR), inter-consensus reliability (ICR), and concurrent validity of ROBINS-I. Furthermore, because this is a relatively new tool, it is important to understand the barriers to using it (e.g., evaluator burden, such as the time needed to conduct assessments and reach consensus).
Methods: Reviewers from four participating centers will appraise the ROB of a sample of NRSI publications using the ROBINS-I tool in two stages. For IRR and ICR, two pairs of reviewers will assess the ROB for each NRSI publication. In the first stage, reviewers will assess the ROB without any formal guidance. In the second stage, reviewers will be provided with customized training and guidance. At each stage, each pair of reviewers will resolve conflicts and arrive at a consensus. To calculate the IRR and ICR, we will use Gwet's AC1 statistic. For concurrent validity, reviewers will appraise a sample of NRSI publications using both the Newcastle-Ottawa Scale (NOS) and ROBINS-I. We will analyze the concordance between the two tools for similar domains and for the overall judgments using Kendall's tau coefficient. To measure evaluator burden, we will record the time taken to apply ROBINS-I (without and with guidance) and the NOS. To assess the impact of customized training and guidance on evaluator burden, we will use generalized linear models. We will use Microsoft Excel to manage study data and SAS 9.4 to analyze them.
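To illustrate the two agreement statistics named above, the following Python sketch computes Gwet's AC1 for a single pair of raters and Kendall's tau-b for two ordinal ratings of the same studies. It is a minimal sketch under stated assumptions, not the authors' analysis code (the protocol states the analysis will be run in SAS 9.4): the function names, the example ratings, and the 1-to-4 ordinal coding of ROBINS-I judgments (low, moderate, serious, critical) are hypothetical. The AC1 chance-agreement term follows the two-rater form P_e = (1/(q-1)) Σ_k π_k(1-π_k), where q is the number of categories and π_k is the mean of the two raters' marginal proportions for category k. The tie-corrected tau-b variant is an assumption chosen here because ordinal ROB judgments produce many ties; the protocol itself says only "Kendall's tau".

```python
from math import sqrt
from typing import Hashable, Sequence


def gwet_ac1(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
             categories: Sequence[Hashable]) -> float:
    """Gwet's AC1 for two raters who each rated the same n items."""
    n, q = len(rater_a), len(categories)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same, non-empty set of items")
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    if any(r not in categories for r in list(rater_a) + list(rater_b)):
        raise ValueError("every rating must be one of the listed categories")
    # Observed agreement: share of items on which both raters chose the same category.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: built from the average of the two raters' marginal proportions.
    p_e = 0.0
    for k in categories:
        pi_k = (rater_a.count(k) + rater_b.count(k)) / (2 * n)
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1
    return (p_a - p_e) / (1 - p_e)


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Tie-corrected Kendall's tau-b via a simple O(n^2) pair count."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded entirely
            elif dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x_only)
                 * (concordant + discordant + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Hypothetical overall ROBINS-I judgments from one reviewer pair.
    scale = ["low", "moderate", "serious", "critical"]
    a = ["low", "low", "moderate", "serious"]
    b = ["low", "moderate", "moderate", "serious"]
    print(f"AC1 = {gwet_ac1(a, b, scale):.2f}")  # prints AC1 = 0.68

    # Hypothetical coding: ROBINS-I 1 (low) to 4 (critical) vs. NOS total stars.
    robins = [1, 2, 2, 3, 4]
    nos_stars = [8, 7, 6, 5, 3]
    print(f"tau-b = {kendall_tau_b(robins, nos_stars):.2f}")
```

Under this hypothetical coding, more NOS stars indicate higher methodological quality while a higher ROBINS-I code indicates more serious bias, so agreement between the tools would appear as a negative tau. The O(n^2) pair count is adequate for review-sized samples; library routines such as `scipy.stats.kendalltau` compute tau-b more efficiently.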
Discussion: The quality of evidence from systematic reviews that include NRS depends partly on the study-level ROB assessments. The findings of this study will contribute to an improved understanding of the ROBINS-I tool and how best to use it.

Original language: English
Article number: 12
Journal: Systematic Reviews
Issue number: 1
Publication status: Published - 13 Jan 2020


Keywords

  • Concurrent validity
  • Cross-sectional study
  • Inter-consensus reliability
  • Inter-rater reliability
  • Non-randomized studies

ASJC Scopus subject areas

  • Medicine (miscellaneous)

