Cochrane's risk of bias tool for non-randomized studies (ROBINS-I) is frequently misapplied: A methodological systematic review.
Journal of clinical epidemiology
OBJECTIVES: We aimed to review how 'Risk of Bias In Non-randomized Studies-of Interventions' (ROBINS-I), a Cochrane risk of bias assessment tool, has been used in recent systematic reviews. STUDY DESIGN AND SETTING: Database and citation searches were conducted in March 2020 to identify recently published reviews using ROBINS-I. Reported ROBINS-I assessments and data on how ROBINS-I was used were extracted from each review. Methodological quality of the reviews was assessed using AMSTAR 2 ('A MeaSurement Tool to Assess systematic Reviews'). RESULTS: Of 181 hits, 124 reviews were included. Risk of bias was serious/critical in 54% of assessments on average, most commonly due to confounding. Quality of the reviews was mostly low, and modifications and incorrect use of ROBINS-I were common: 20% of reviews modified the rating scale, 20% understated overall risk of bias, and 19% included critical-risk-of-bias studies in evidence synthesis. Poorly conducted reviews were more likely to report low/moderate risk of bias (predicted probability 57% [95% CI: 47-67] in critically low-quality reviews vs. 31% [19-46] in high/moderate-quality reviews). CONCLUSION: Low-quality reviews frequently apply ROBINS-I incorrectly and may thus inappropriately include, or give too much weight to, uncertain evidence. Readers should be aware that such problems can lead to incorrect conclusions in reviews.
10.1016/j.jclinepi.2021.08.022
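The review above contrasts predicted probabilities of a low/moderate overall rating between critically low-quality and high/moderate-quality reviews. As a hedged illustration (not the authors' actual model or coefficients), a logistic model's predicted probability is recovered from fitted coefficients by inverting the logit; the hypothetical coefficients below were chosen so the outputs match the abstract's 57% and 31%:

```python
from math import exp

def predicted_probability(intercept: float, coef: float, x: float) -> float:
    """Invert the logit: p = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + exp(-(intercept + coef * x)))

# Hypothetical coefficients for illustration only: x = 1 codes a
# critically low-quality review, x = 0 a high/moderate-quality one.
b0, b1 = -0.80, 1.08
p_high_quality = predicted_probability(b0, b1, 0)  # ~0.31
p_low_quality = predicted_probability(b0, b1, 1)   # ~0.57
```
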
Evaluation of the risk of bias in non-randomized studies of interventions (ROBINS-I) and the 'target experiment' concept in studies of exposures: Rationale and preliminary instrument development.
Environment international
Assessing the risk of bias (RoB) of individual studies is a critical part of determining the certainty of a body of evidence from non-randomized studies (NRS) that evaluate potential health effects of environmental exposures. The recently released RoB in NRS of Interventions (ROBINS-I) instrument has undergone careful development for health interventions. Using the fundamental design of ROBINS-I, which includes evaluating RoB against an ideal target trial, we explored developing a version of the instrument to evaluate RoB in exposure studies. During three sequential rounds of assessment, two or three raters (evaluators) independently applied ROBINS-I to studies from two systematic reviews and one case-study protocol that evaluated the relationship between environmental exposures and health outcomes. Feedback from raters, methodologists, and topic-specific experts informed important modifications to tailor the instrument to exposure studies. We identified the following areas of distinction for the modified instrument: terminology, formulation of the ideal target randomized experiment, guidance for cross-sectional studies and exposure assessment (both quality of the measurement method and concern for potential exposure misclassification), and evaluation of issues related to study sensitivity. Using the target experiment approach significantly affects how environmental and occupational health studies are considered in the Grading of Recommendations Assessment, Development and Evaluation (GRADE) evidence-synthesis framework.
10.1016/j.envint.2018.08.018
GRADE guidelines: 18. How ROBINS-I and other tools to assess risk of bias in nonrandomized studies should be used to rate the certainty of a body of evidence.
Journal of clinical epidemiology
OBJECTIVE: To provide guidance on how systematic review authors, guideline developers, and health technology assessment practitioners should approach the use of the risk of bias in nonrandomized studies of interventions (ROBINS-I) tool as a part of GRADE's certainty rating process. STUDY DESIGN AND SETTING: The study design and setting comprised iterative discussions, testing in systematic reviews, and presentation at GRADE working group meetings with feedback from the GRADE working group. RESULTS: We describe where to start the initial assessment of a body of evidence with the use of ROBINS-I and where one would anticipate the final rating would end up. GRADE accounts for issues that mitigate concerns about confounding and selection bias through the upgrading domains: large effects, dose-effect relations, and situations in which plausible residual confounders or other biases increase certainty. These domains will need to be considered in an assessment of a body of evidence when using ROBINS-I. CONCLUSIONS: The use of ROBINS-I in GRADE assessments may allow for a better comparison of evidence from randomized controlled trials (RCTs) and nonrandomized studies (NRSs) because they are placed on a common metric for risk of bias. Challenges remain, including appropriate presentation of evidence from RCTs and NRSs for decision-making and how to optimally integrate RCTs and NRSs in an evidence assessment.
10.1016/j.jclinepi.2018.01.012
Integrating large language models in systematic reviews: a framework and case study using ROBINS-I for risk of bias assessment.
BMJ evidence-based medicine
Large language models (LLMs) may facilitate and expedite systematic reviews, although how best to integrate LLMs into the review process remains unclear. This study evaluates GPT-4's agreement with human reviewers in assessing risk of bias using the Risk Of Bias In Non-randomised Studies of Interventions (ROBINS-I) tool and proposes a framework for integrating LLMs into systematic reviews. The case study demonstrated that raw per cent agreement was highest for the ROBINS-I domain of 'Classification of Intervention'. The Kendall agreement coefficient was highest for the domains of 'Participant Selection', 'Missing Data' and 'Measurement of Outcomes', suggesting moderate agreement in these domains. Raw agreement on the overall risk of bias across domains was 61% (Kendall coefficient = 0.35). The proposed framework for integrating LLMs into systematic reviews consists of four domains: rationale for LLM use, protocol (task definition, model selection, prompt engineering, data entry methods, human role and success metrics), execution (iterative revisions to the protocol) and reporting. We identify five basic task types relevant to systematic reviews: selection, extraction, judgement, analysis and narration. Given the level of agreement with a human reviewer in the case study, pairing artificial intelligence with an independent human reviewer remains necessary.
10.1136/bmjebm-2023-112597
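The study above reports raw per cent agreement and Kendall coefficients between GPT-4 and human reviewers. As a minimal sketch (the ratings below are invented for illustration, not the study's data), both statistics can be computed for paired ordinal risk-of-bias judgements; the tie-corrected tau-b variant is used here because ROBINS-I ratings typically contain many ties:

```python
from itertools import combinations
from math import sqrt

# Ordinal coding of ROBINS-I overall judgements.
SCALE = {"low": 0, "moderate": 1, "serious": 2, "critical": 3}

def raw_agreement(a, b):
    """Proportion of studies on which the two raters agree exactly."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def kendall_tau_b(a, b):
    """Tie-corrected Kendall rank correlation for paired ordinal ratings."""
    n = len(a)
    n0 = n * (n - 1) // 2  # all pairs of studies
    concordant = discordant = ties_a = ties_b = 0
    for (xi, yi), (xj, yj) in combinations(zip(a, b), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            ties_a += 1
        if dy == 0:
            ties_b += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / sqrt((n0 - ties_a) * (n0 - ties_b))

# Hypothetical ratings for eight studies (illustration only).
human = [SCALE[r] for r in ["low", "moderate", "serious", "serious",
                            "critical", "moderate", "low", "serious"]]
gpt4 = [SCALE[r] for r in ["low", "serious", "serious", "moderate",
                           "critical", "moderate", "moderate", "serious"]]

agreement = raw_agreement(human, gpt4)  # 0.625 on this toy data
tau = kendall_tau_b(human, gpt4)
```
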
The ROBINS-I and the NOS had similar reliability but differed in applicability: A random sampling of observational studies from systematic reviews/meta-analyses.
Journal of evidence-based medicine
OBJECTIVE: There is a lack of evidence on the usage of the quality assessment tool the Risk Of Bias In Nonrandomized Studies-of Interventions (ROBINS-I). This article aimed to measure the reliability, criterion validity, and feasibility of the ROBINS-I and the Newcastle-Ottawa Scale (NOS). METHODS: A sample of systematic reviews or meta-analyses of observational studies was selected from Medline (2013-2017) and assessed by two reviewers using the ROBINS-I and the NOS. We reported reliability in terms of the first-order agreement coefficient (AC1) statistic. The correlation coefficient was used to explore the criterion validity of the ROBINS-I. We compared the feasibility of the ROBINS-I and the NOS by recording the time to complete an assessment and the instances where assessment was difficult. RESULTS: Five systematic reviews containing 41 cohort studies were included. Interobserver agreement on the individual domains of the ROBINS-I and the NOS was substantial, with mean AC1 statistics of 0.67 (95% CI: 0.50-0.83) and 0.73 (95% CI: 0.65-0.81), respectively. The criterion validity of the ROBINS-I against the NOS was moderate (K = 0.52). The time to assess a single study with the ROBINS-I fell from 7 hours initially to 3 hours, compared with 30 minutes for the NOS. Both reviewers rated "bias due to departure from the intended interventions" as the most time-consuming domain in the ROBINS-I, whereas the NOS items were rated as equally time-consuming. CONCLUSIONS: The ROBINS-I and the NOS seem to provide similar reliability but vary in applicability. The over-complicated nature of the ROBINS-I may limit its usage, and a simplified version is needed.
10.1111/jebm.12427
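The reliability comparison above rests on the AC1 statistic. As a hedged sketch (the ratings below are invented for illustration), Gwet's first-order agreement coefficient for two raters corrects observed agreement Pa by a chance-agreement term Pe based on average category prevalence:

```python
from collections import Counter

CATEGORIES = ["low", "moderate", "serious", "critical"]

def gwet_ac1(rater1, rater2, categories=CATEGORIES):
    """Gwet's AC1 for two raters: (Pa - Pe) / (1 - Pe), where chance
    agreement Pe uses the average category prevalence across both raters."""
    n = len(rater1)
    k = len(categories)
    # Observed agreement: proportion of studies rated identically.
    pa = sum(a == b for a, b in zip(rater1, rater2)) / n
    # pi_c: share of all 2n ratings falling in category c.
    counts = Counter(rater1) + Counter(rater2)
    pe = sum((counts[c] / (2 * n)) * (1 - counts[c] / (2 * n))
             for c in categories) / (k - 1)
    return (pa - pe) / (1 - pe)

# Hypothetical judgements for ten cohort studies (illustration only).
r1 = ["low", "moderate", "moderate", "serious", "low",
      "serious", "critical", "moderate", "low", "serious"]
r2 = ["low", "moderate", "serious", "serious", "low",
      "serious", "critical", "moderate", "moderate", "serious"]
ac1 = gwet_ac1(r1, r2)
```

Unlike Cohen's kappa, AC1 is less sensitive to skewed category prevalence, which is why it is often preferred for risk-of-bias ratings where most studies cluster in one or two categories.
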
Inter-rater reliability and concurrent validity of ROBINS-I: protocol for a cross-sectional study.
Jeyaraman Maya M, Rabbani Rasheda, Al-Yousif Nameer, Robson Reid C, Copstein Leslie, Xia Jun, Pollock Michelle, Mansour Samer, Ansari Mohammed T, Tricco Andrea C, Abou-Setta Ahmed M
Systematic reviews
BACKGROUND: The Cochrane Bias Methods Group recently developed the "Risk of Bias (ROB) in Non-randomized Studies of Interventions" (ROBINS-I) tool to assess ROB in non-randomized studies of interventions (NRSI). It is important to establish consistency in its application and interpretation across review teams. In addition, it is important to understand whether specialized training and guidance will improve the reliability of the assessments. Therefore, the objective of this cross-sectional study is to establish the inter-rater reliability (IRR), inter-consensus reliability (ICR), and concurrent validity of ROBINS-I. Furthermore, as this is a relatively new tool, it is important to understand the barriers to using it (e.g., the time to conduct assessments and reach consensus, i.e., evaluator burden). METHODS: Reviewers from four participating centers will appraise the ROB of a sample of NRSI publications using the ROBINS-I tool in two stages. For IRR and ICR, two pairs of reviewers will assess the ROB for each NRSI publication. In the first stage, reviewers will assess the ROB without any formal guidance. In the second stage, reviewers will be provided customized training and guidance. At each stage, each pair of reviewers will resolve conflicts and arrive at a consensus. To calculate the IRR and ICR, we will use Gwet's AC statistic. For concurrent validity, reviewers will appraise a sample of NRSI publications using both the Newcastle-Ottawa Scale (NOS) and ROBINS-I. We will analyze the concordance between the two tools for similar domains and for the overall judgments using Kendall's tau coefficient. To measure the evaluator burden, we will assess the time taken to apply the ROBINS-I (without and with guidance) and the NOS. To assess the impact of customized training and guidance on evaluator burden, we will use generalized linear models. We will use Microsoft Excel and SAS 9.4 to manage and analyze study data, respectively.
DISCUSSION: The quality of evidence from systematic reviews that include NRSI depends partly on the study-level ROB assessments. The findings of this study will contribute to an improved understanding of the ROBINS-I tool and how best to use it.
10.1186/s13643-020-1271-6