Systematic Reviews to Answer Health Care Questions
Chapter 12 • Assessing and Rating the Strength of the Body of Evidence
question is viewed across the analytic framework to determine whether there is adequate evidence to support a complete chain of linkages connecting the preventive service to health outcomes, and the degree to which the evidence directly addresses the populations, conditions, and outcomes identified in the research questions. The evidence is graded as good, fair, or poor. For this method, the systematic reviewers assess the evidence for the first step, and the USPSTF members assess the evidence for the second step, similar to guideline groups rating the strength of a recommendation in a guideline.

This chapter highlights the GRADE and AHRQ EPC methods because they closely align with accepted standards, are explicit and similar, and are widely used. This book refers to the evaluation of the body of evidence as strength of evidence while acknowledging that other groups use the term quality of evidence. Although both terms are accurate and acceptable, the term “quality” has also been applied to the assessment of internal validity (risk of bias) of individual studies, and use of both terms could be confusing.

HOW TO ASSESS THE STRENGTH OF EVIDENCE

Systematic reviewers assess the strength of evidence, whereas guideline development groups determine the strength of a recommendation based on the evidence. This section describes how to assess the strength of evidence as a final step in synthesizing studies in a systematic review based on the GRADE and AHRQ EPC methods.

Assessing the strength of evidence begins by determining how well studies address methodological domains (characteristics). These include study limitations, directness, consistency, precision, and reporting bias of the body of evidence (Table 12.2). Additional domains for observational studies include magnitude of effect (strength of association), dose–response association, and plausible confounding that could change the observed effect.
The GRADE method assigns ratings by identifying problems with the body of evidence rather than affirming the lack of a problem, 5,11 whereas the AHRQ EPC method uses the inverse approach. 3 In GRADE, a body of evidence consisting of randomized controlled trials (RCTs) begins with a high strength of evidence rating, whereas that consisting of observational studies begins with a low strength of evidence rating. It is important to be aware of which approach is used to avoid misinterpretation. Also, although the domains represent distinct concepts, they often overlap or are interwoven. Nonetheless, breaking the concepts into domains improves transparency and outlines the rationale behind the overall rating. The ratings of strength of evidence were developed to be applied to individual outcomes. Depending on the purpose of the systematic review, ratings can also be applied to research questions. The approach is similar regardless.

Each domain is evaluated separately and given a rating based on specific metrics. In the GRADE method, most domains are rated as no, serious, and very serious, whereas publication bias is rated as undetected or strongly suspected. Each domain starts at the highest level, and the levels are then reduced depending on the specific limitations of the evidence. GRADE provides guidance about how to reduce the levels within a domain for various types of limitations. 5,11 In the AHRQ EPC method, the metrics are different for each domain. For example, study limitations are rated high, medium, or low; directness is direct or indirect; consistency is consistent, inconsistent, or unknown/none; and precision is rated precise or imprecise. AHRQ EPC ratings are based on global judgments about the evidence, rather than reducing levels based on limitations. 3

The overall rating is based on the ratings of the individual domains. Neither method uses a cumulative scoring system to reach an overall rating.
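For reviewers who record domain ratings in a spreadsheet or script, the rating vocabularies described above can be written down as a small lookup table. The following Python sketch is illustrative only: the dictionary layout and the names `RATING_SCALES`, `GRADE_STARTING_RATING`, `check_rating`, and `record_domain_ratings` are hypothetical and are not part of GRADE or AHRQ EPC. It also assumes that "most domains" in GRADE means the four domains other than publication bias, and it omits an AHRQ EPC scale for reporting bias because none is stated in the text.

```python
# Rating vocabularies transcribed from the text. The layout and all names
# below are hypothetical, not an official GRADE or AHRQ EPC artifact.
_GRADE_LEVELS = ("no", "serious", "very serious")

RATING_SCALES = {
    "GRADE": {
        # Assumption: "most domains" refers to these four domains.
        "study limitations": _GRADE_LEVELS,
        "directness": _GRADE_LEVELS,
        "consistency": _GRADE_LEVELS,
        "precision": _GRADE_LEVELS,
        "publication bias": ("undetected", "strongly suspected"),
    },
    "AHRQ EPC": {
        "study limitations": ("high", "medium", "low"),
        "directness": ("direct", "indirect"),
        "consistency": ("consistent", "inconsistent", "unknown/none"),
        "precision": ("precise", "imprecise"),
        # No scale for reporting bias is given in the text, so it is omitted.
    },
}

# Starting strength of evidence in GRADE, by study design.
GRADE_STARTING_RATING = {
    "randomized controlled trials": "high",
    "observational studies": "low",
}


def check_rating(method, domain, rating):
    """Return the rating if it is allowed for this method and domain."""
    allowed = RATING_SCALES[method][domain]
    if rating not in allowed:
        raise ValueError(
            f"{rating!r} is not a {method} rating for {domain}; "
            f"expected one of {allowed}"
        )
    return rating


def record_domain_ratings(method, ratings):
    """Validate a dict of domain -> rating and return a checked copy.

    No score is summed here on purpose: the overall rating is a judgment
    informed by these entries, not a value computed from them.
    """
    return {
        domain: check_rating(method, domain, rating)
        for domain, rating in ratings.items()
    }
```

A call such as `record_domain_ratings("AHRQ EPC", {"directness": "indirect", "precision": "imprecise"})` returns the same entries after checking them, whereas passing a GRADE term such as `"serious"` for an AHRQ EPC domain raises a `ValueError`. This guards against mixing the two vocabularies, which is the misinterpretation the text warns about.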
Both methods use four categories, including high, moderate, and low. The GRADE method uses a very low category, whereas the AHRQ EPC method uses an insufficient category (Table 12.3). When the body of evidence consists of both trials and observational studies, ratings can be done initially for the separate study designs to accommodate the inherent issues related
Copyright © 2024 Wolters Kluwer, Inc. Unauthorized reproduction of the content is prohibited.