RAND Appropriateness Method (RAM)
I love finding something that is new to me about a topic I thought I was well informed about. That happened recently while working on a project to create guidelines for publications about missed, rationed, or unfinished nursing care. Our work group is made up of two nurses from Switzerland, one from Italy, and me.
The graduate student from Switzerland has been responsible for setting up a Delphi-like process to learn which elements of a research article are particularly important to include when reporting research results about missed, rationed, or unfinished nursing care. She settled on the RAND/UCLA Appropriateness Method (RAM).
Even though I used the Delphi method (which was also developed by RAND) for my own dissertation in 1988, I had never heard of the RAM. So I searched for background and an explanation of the method. I also checked to see which publications in nursing had reported using the RAM. I found very few examples of the RAM being used to evaluate nursing interventions or policies. See the bottom of this post, however, for an example that might be of interest to nurses.
The overview below comes directly from the RAND/UCLA Appropriateness Method User's Manual.
With so few randomized controlled trials (termed the “gold standard” of research) on which to base decisions about appropriateness of nursing actions, I believe the RAM has a promising role to play in evidence-based decision making in nursing.
I would love to hear from you about your thoughts on the method and its role in nursing or in your own field.
An Overview of the Method
The basic steps in applying the RAM are shown in Figure 1. First, a detailed literature review is performed to synthesise the latest available scientific evidence on the procedure to be rated. At the same time, a list of specific clinical scenarios or “indications” is produced in the form of a matrix which categorises patients who might present for the procedure in question in terms of their symptoms, past medical history and the results of relevant diagnostic tests. These indications are grouped into “chapters” based on the primary presenting symptom leading to a patient’s being referred for treatment or considered for a particular procedure.
Figure 1: The RAND/UCLA Appropriateness Method
An example of a specific indication for coronary revascularization in the chapter on “Chronic Stable Angina” is: A patient with severe angina (class III/IV) in spite of optimal medical therapy, who has 2-vessel disease without involvement of the proximal left anterior descending artery, an ejection fraction of between 30 and 50%, a very positive stress test, and who is at low to moderate surgical risk.
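As a rough illustration (this is not from the manual), the matrix of indications can be thought of as every combination of a few clinical factors. The Python sketch below builds such a list. The factor names and levels in FACTORS are hypothetical, chosen only to echo the angina example above; a real panel would define its own.

```python
from itertools import product

# Hypothetical clinical factors and levels, loosely modelled on the
# "Chronic Stable Angina" example. These are illustrative assumptions only.
FACTORS = {
    "angina_class": ["I/II", "III/IV"],
    "vessels_diseased": [1, 2, 3],
    "proximal_lad_involved": [False, True],
    "ejection_fraction": ["<30%", "30-50%", ">50%"],
    "surgical_risk": ["low to moderate", "high"],
}


def build_indications(factors):
    """Return one dict per combination of factor levels (one matrix cell each)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


indications = build_indications(FACTORS)
print(len(indications))  # 2 * 3 * 2 * 3 * 2 = 72 indications
```

Even this small set of factors yields 72 scenarios, which gives a sense of why the indications are grouped into chapters before they are rated.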
A panel of experts is identified, often based on recommendations from the relevant medical societies. The literature review and the list of indications, together with a list of definitions for all terms used in the indications list, are sent to the members of this panel.
For each indication, the panel members rate the benefit-to-harm ratio of the procedure on a scale of 1 to 9, where 1 means that the expected harms greatly outweigh the expected benefits, and 9 means that the expected benefits greatly outweigh the expected harms. A middle rating of 5 can mean either that the harms and benefits are about equal or that the rater cannot make the judgement for the patient described in the indication. The panellists rate each of the indications twice, in a two-round “modified Delphi” process. In the first round, the ratings are made individually at home, with no interaction among panellists.
In the second round, the panel members meet for 1-2 days under the leadership of a moderator experienced in using the method. Each panellist receives an individualised document showing the distribution of all the experts’ first round ratings, together with his/her own specific ratings. During the meeting, panellists discuss the ratings, focusing on areas of disagreement, and are given the opportunity to modify the original list of indications and/or definitions, if desired.
After discussing each chapter of the list of indications, they re-rate each indication individually. No attempt is made to force the panel to consensus. Instead, the two-round process is designed to sort out whether discrepant ratings are due to real clinical disagreement over the use of the procedure (“real” disagreement) or to fatigue or misunderstanding (“artifactual” disagreement).
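The individualised feedback document each panellist receives before the second-round meeting can be sketched in a few lines of Python. This is an assumption about the layout, not the manual's actual form: for one indication it counts how many panellists chose each point on the 1 to 9 scale and marks the recipient's own rating with an asterisk. The function names are hypothetical.

```python
from collections import Counter


def feedback_row(ratings, own_rating):
    """Summarise first-round ratings (1-9) for one indication.

    Returns the number of panellists at each scale point plus the
    recipient's own rating, so they can see where they sit in the group.
    """
    counts = Counter(ratings)
    distribution = [counts.get(point, 0) for point in range(1, 10)]
    return {"distribution": distribution, "own": own_rating}


def format_row(row):
    """Render the nine counts in scale order, marking the own rating with *."""
    cells = []
    for point, count in enumerate(row["distribution"], start=1):
        mark = "*" if point == row["own"] else " "
        cells.append(f"{count}{mark}")
    return " ".join(cells)


# Made-up ratings from a nine-member panel; the recipient rated this indication 7.
first_round = [2, 3, 3, 5, 7, 7, 8, 8, 9]
row = feedback_row(first_round, own_rating=7)
print("1  2  3  4  5  6  7  8  9")
print(format_row(row))
```

Seeing the full spread next to one's own rating is what lets the meeting focus on the indications where the panel is split.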
Finally, each indication is classified as “appropriate,” “uncertain” or “inappropriate” for the procedure under review in accordance with the panellists’ median score and the level of disagreement among the panellists. Indications with median scores in the 1-3 range are classified as inappropriate, those in the 4-6 range as uncertain, and those in the 7-9 range as appropriate. However, all indications rated “with disagreement,” whatever the median, are classified as uncertain. “Disagreement” here basically means a lack of consensus, either because there is polarisation of the group or because judgements are spread over the entire 1 to 9 rating scale.
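The classification rule can be written out as a short Python sketch. The median bands come from the paragraph above. The disagreement test is an assumption: the passage describes disagreement only loosely, so the sketch uses one simple polarisation rule (at least three ratings at each end of the scale, a threshold suited to a panel of about nine) and lets the caller swap in another. Treating half-point medians such as 3.5 or 6.5 as uncertain is also an assumption.

```python
from statistics import median


def polarised(ratings, min_each_end=3):
    """Assumed disagreement rule: enough ratings in both 1-3 and 7-9."""
    low = sum(1 for r in ratings if 1 <= r <= 3)
    high = sum(1 for r in ratings if 7 <= r <= 9)
    return low >= min_each_end and high >= min_each_end


def classify(ratings, disagreement=polarised):
    """Classify one indication from its second-round ratings (1-9)."""
    # Disagreement overrides the median, whatever its value.
    if disagreement(ratings):
        return "uncertain"
    m = median(ratings)
    if m <= 3:
        return "inappropriate"
    if m < 7:  # covers 4-6, plus half-point medians (assumption)
        return "uncertain"
    return "appropriate"


print(classify([7, 8, 8, 9, 9, 7, 8, 6, 7]))  # appropriate (median 8)
print(classify([1, 2, 3, 7, 8, 9, 8, 8, 9]))  # uncertain (polarised panel)
```

The second example has a median of 8, yet it is still classified as uncertain because three panellists rated it in the 1 to 3 range.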
As discussed in Chapter 8, various alternative definitions for disagreement have been used throughout the history of the RAM. Appropriateness studies sometimes categorise levels of agreement further to identify indications rated “with agreement” and those rated with “indeterminate” agreement (neither agreement nor disagreement). Depending on how the appropriateness criteria are to be used, it may sometimes be desirable to identify those indications rated with greater or lesser levels of agreement.
If necessity criteria are also to be developed, a third round of ratings takes place, usually by mail, in which panellists are asked to rate the necessity of those indications that have been classified as appropriate by the panel. The RAM definition of necessity (Kahan et al., 1994a) is that:
- The procedure is appropriate, i.e., the health benefits exceed the risks by a sufficient margin to make it worth doing.
- It would be improper care not to offer the procedure to a patient.
- There is a reasonable chance that the procedure will benefit the patient.
- The magnitude of the expected benefit is not small.
All four of the preceding criteria must be met for a procedure to be considered as necessary for a particular indication. To determine necessity, indications rated appropriate by the panel are presented for a further rating of necessity. This rating is also done on a scale of 1 to 9, where 1 means the procedure is clearly not necessary and 9 means it clearly is necessary. If panellists disagree in their necessity ratings or if the median is less than 7, then the indication is judged as “appropriate but not necessary.” Only appropriate indications with a necessity rating of 7 or more without disagreement are judged “necessary.”
Comparison with Other Group Judgement Methods
The RAM is only one of several methods that have been developed to identify the collective opinion of experts (Fink et al., 1984). Although it is often called a “consensus method,” it does not really belong in that category, because its objective is to detect when the experts agree, rather than to obtain a consensus among them. It is based on the so-called “Delphi method,” developed at RAND in the 1950s as a tool to predict the future, which was applied to political-military, technological and economic topics (Linstone et al., 1975).
The Delphi process has since also come to be used in a variety of health and medical settings. The method generally involves multiple rounds, in which a questionnaire is sent to a group of experts who answer the questions anonymously. The results of the survey are then tabulated and reported back to the group, and each person is asked to answer the questionnaire again. This iterative process continues until there is a convergence of opinion on the subject or no further substantial changes in the replies are elicited.
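The stopping condition for a classic Delphi ("no further substantial changes in the replies") can be sketched as a comparison between two successive rounds. The Python below is only one possible reading: it treats a round as settled when no question's median answer moves by more than a chosen tolerance. The function names and the tolerance of 0.5 are assumptions, not part of any published Delphi protocol.

```python
from statistics import median


def round_medians(responses):
    """responses maps a question id to the list of numeric answers it received."""
    return {question: median(answers) for question, answers in responses.items()}


def has_converged(previous, current, tolerance=0.5):
    """True when no question's median moved by more than `tolerance`.

    Assumes both rounds asked the same questions.
    """
    prev, cur = round_medians(previous), round_medians(current)
    return all(abs(cur[q] - prev[q]) <= tolerance for q in prev)


# Made-up answers from three anonymous experts over three rounds.
round_1 = {"q1": [2, 5, 9], "q2": [4, 5, 6]}
round_2 = {"q1": [6, 7, 8], "q2": [5, 5, 6]}
round_3 = {"q1": [6, 7, 7], "q2": [5, 5, 5]}

print(has_converged(round_1, round_2))  # False: the q1 median moved from 5 to 7
print(has_converged(round_2, round_3))  # True: both medians are unchanged
```

A stable median does not by itself show that opinions have drawn closer together, so a fuller check might also track the spread of answers from round to round.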
The RAM is sometimes miscast as an example of the Nominal Group Technique (NGT). NGT is a highly structured process in which participants are brought together and asked to write down all their ideas on a particular subject. The moderator asks each person to briefly describe the most important idea on his or her list, and continues around the table until everyone’s ideas have been listed. After discussion of each topic, participants are asked to individually rank order or rate their judgement of the item’s importance on a numerical scale. Different mathematical techniques are used to aggregate the results. The RAM, unlike the NGT, begins with a highly structured list of clinical indications, and the discussion is tightly linked to the basic measurement of appropriateness.
A third group judgement method is the Consensus Development Conference. The U.S. National Institutes of Health (NIH) have a mandate to evaluate and disseminate information about health care technologies and biomedical research (Kanouse, 1989). To this end, they have developed what are known as NIH Consensus Conferences, which bring together a wide variety of participants, including physicians, researchers and consumers, who are charged with developing a mutually acceptable consensus statement to answer specific, pre-defined questions about the topic. This process includes conducting a literature review, summarising the current state of knowledge, presentations by experts and advocates, and audience discussion. These conferences frequently last 2 or more days, and do not end until the participants have agreed on a written statement.
Many European countries have developed their own versions of Consensus Conferences.
At its centre, the RAM is a modified Delphi method that, unlike the original Delphi, provides panellists with the opportunity to discuss their judgements between the rating rounds. Contrary to the fears of the original developers of Delphi, experience with the RAM and the contemporaneous literature on group processes both indicate that the potential for bias in a face-to-face group can be largely controlled by effective group leadership (e.g., Kahan et al., 1994b). Thus, while panellists receive feedback on the group’s responses, as is done in the classic Delphi method, they have a chance to discuss their answers in a face-to-face meeting, similar to the NGT and NIH Consensus Conferences.
The following article used a modified RAM and illustrates the application of the method to measuring quality of care.
Improving Methods for Measuring Quality of Care: A Patient-Centered Approach in Chronic Disease. Barbara G. Bokhour, Mary Jo Pugh, Jaya K. Rao, Ruzan Avetisyan, Dan R. Berlowitz, and Lewis E. Kazis. Medical Care Research and Review, Vol. 66, Issue 2, pp. 147-166.