Last week, the United States Medical Licensing Examination (USMLE) announced that Step 1, the first of three required licensing examinations for medical trainees, will stop reporting three-digit scores and instead report only a pass/fail designation as early as January 2022. The three-digit scoring systems for Step 2 Clinical Knowledge (CK) and Step 3, and the pass/fail system for Step 2 Clinical Skills, will remain unchanged.
In explaining its decision to change the Step 1 scoring system, the USMLE noted that its co-sponsors, the National Board of Medical Examiners (NBME) and the Federation of State Medical Boards (FSMB), “believe that changing Step 1 score reporting to pass/fail can help reduce some of the current overemphasis on USMLE performance, while also retaining the ability of medical licensing authorities to use the exam for its primary purpose of medical licensure eligibility.” We have made similar arguments for a shift to pass/fail scoring in the past, out of concern for the three-digit score’s pernicious effect on medical curricula and medical student well-being, and for its misuse in selecting candidates for specialty training.
It is only fair that we give the USMLE and its corporate sponsors credit for making the right decision. As we noted in a piece for in-House prior to the Invitational Conference on USMLE Scoring last year, there were plenty of reasons, including glaring financial conflicts of interest and worrisome written comments from the leaders of the NBME and FSMB, to suspect that the USMLE and its sponsors would maintain the status quo. Ultimately, the harms of maintaining the “Step 1 Climate” became too great to ignore, even for the test’s corporate sponsors.
Since the news broke, reactions have been strong. Many medical students, residents, and faculty are overjoyed, viewing the change as a long overdue course correction for medical education. Others have strongly criticized the decision, arguing that it will remove an essential “objective” metric for selecting medical students for residencies and harm individuals from international medical schools and less prominent American medical schools.
These criticisms are not entirely without merit, but they ignore the realities of how we got here. USMLE Step 1, a multiple-choice test of basic science knowledge, was never intended to be used as it is today. The reliance on Step 1 scores in resident selection was a decision rooted in convenience, not evidence: since all students have Step 1 scores available by the time they apply for residency, Step 1 scores can be used as a “filter” to reduce a mountain of applications to a manageable pile. However, the skills required to score well on Step 1 are not the same as those required to become a good doctor.
Step 1 scores are objective, but that doesn’t mean they are meaningful, precise, or predictive of residency success. All program directors would agree that a Step 1 score of 250 is higher than a score of 235. But does that mean the applicant with the higher score will become a better resident than the applicant with the lower one? In fact, because the standard error of difference for USMLE Step 1 is 8 points, scores must differ by 16 points or more for a program director to conclude with 95% confidence that there is even a significant difference in test performance between the two applicants.
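The 16-point threshold follows from a short calculation. As a rough sketch, assuming score differences are approximately normally distributed and a two-sided 95% criterion is used (the critical value $z_{0.975} \approx 1.96$ is our assumption; the 8-point standard error of difference is the figure cited above):

$$
\Delta_{\min} = z_{0.975} \times \mathrm{SED} \approx 1.96 \times 8 = 15.68 \approx 16 \text{ points}
$$

Under these assumptions, the 15-point gap between scores of 250 and 235 falls just short of the threshold.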
On the other hand, who will be a better resident: a student with an extensive research background, or one who started a free clinic? A student who is the first in their family to attend college, or a member of the Gold Humanism Honor Society? Deciding which of these applicants is more likely to succeed in a particular residency program requires human judgment, and different programs will likely make different decisions. Outsourcing that decision to a three-digit score has been convenient, but it prevents program directors from viewing applicants as individuals and making mission-based selection decisions.
Lastly, while a few international medical graduates (IMGs) with high Step 1 scores and ambitions to practice medicine in the United States have benefited from the scored Step 1 system, the truth is that the vast majority have not. Average Step 1 scores and match rates for IMGs lag significantly behind those for American allopathic and osteopathic students. The majority of United States residency programs do not consider non-U.S. citizen IMGs, and IMGs are especially rare in the most selective specialties. As Dr. Benjamin Mazer has pointed out, the school from which a medical student graduates already significantly affects where he or she matches. Making Step 1 pass/fail does not change that unpleasant truth.
To some degree, however, these criticisms highlight a larger point: Step 1’s transformation to pass/fail should not be viewed as an end in itself. There is a deeper rot in the way candidates are selected for medical residencies. Today, many physicians do not practice in the field that most appealed to them, privilege and race often impede many from training in medicine, and medical education often focuses on esoteric basic science facts in an effort to teach to the test rather than emphasize the larger art and science of medicine. As a result, programs screen and select candidates based on narrow and arbitrary measures unrelated to their ability to practice as a doctor, and a small group of applicants takes up an outsized number of interview slots. If the ultimate outcome of this transition is to emphasize yet another imperfect, albeit more clinically relevant, numeric metric like Step 2 CK, another arms race will ensue, and an opportunity for transformative change will be missed.
What would transformative change look like? We believe residency programs should move away from the idea that any one-size-fits-all numeric metric can identify the applicants best suited for success in every specialty and program, and should instead embrace holistic review. Students from international and less-prestigious medical schools rightly point out that they may not have the same access to C.V.-boosting research opportunities or high-profile letters of recommendation as their peers at more prestigious schools. But it is only by considering applicants as individuals that “distance traveled” and success with the available resources can even be considered. No objective measure can do that for us.
Considering an applicant’s experiences, personal attributes, and academic qualifications in combination will likely better match applicants with the right residency program. But unless we couple the pass/fail Step 1 with efforts to limit the volume of applications programs receive, program directors will gravitate to another numeric screening metric out of necessity. Accordingly, application and interview caps, lotteries, and early acceptance programs to residency should all be considered as ways to stop application fever and make applying to residency fairer and more rational. These proposed changes are just the start, and the next two years provide ample opportunity to rethink how we teach medical students and select them for residency.
Ending Step 1 Mania was a great step forward for medical education in America. But the time for change is just beginning.
Image credit: The (Blue) Study Stack (CC BY-NC-ND 2.0) by wenzday01