The Case for Antidepressants in 2022
A Resource Guide for Clinicians, Journalists, Patients, and the General Public
“As for the antidepressant controversy, in retrospect it can be understood as one of those social phenomena that arise when long-standing grievance meets an opportunity for pushback…” Peter Kramer, Ordinarily Well
Skepticism about the efficacy of antidepressants is nothing new. The debate has been raging for decades, and even in my short career as a psychiatrist I myself have gone through periods in which I have “lost faith” in their efficacy. A skeptical stance toward antidepressants tends to dominate media coverage in waves. The publication of the serotonin hypothesis paper in Molecular Psychiatry in July of this year has unleashed a new and intense wave of such stories in the media (even though the paper had nothing direct to say about antidepressants). A recent story in Newsweek covers a lot of familiar territory in terms of doubts about antidepressant efficacy. I have spoken to numerous journalists in these past months, and I can attest that journalists, too, struggle to make sense of the scientific evidence and to reconcile research evidence with the experience of clinicians and patients.
Despite frequent negative coverage, clinicians and patients remain, for the most part, unpersuaded that these medications lack efficacy. Antidepressants continue to be among the most commonly prescribed medications in the Western world. The curious thing is that most clinicians, at least in my experience, have tremendous difficulty countering the common arguments put forth in favor of antidepressant inefficacy. They appeal either to clinical experience (which is a legitimate source of knowledge, for sure) or to treatment constraints (“what choice do we have?”). These arguments are not without merit, but such responses embolden the critics, who view themselves as “evidence-based,” and cement their certainty that the scientific evidence is on their side.
There is, however, a lot of scientific evidence to support the efficacy of antidepressants. For whatever reason, this body of research is not common knowledge and not accessible in the same way as research that questions the efficacy of antidepressants. So my goal in this post is to bring together the evidence in favor of antidepressants that challenges the claim that antidepressants lack clinically meaningful efficacy in the treatment of depression.
An important qualification: The antidepressant debate is complicated and multifaceted. The assertion that antidepressants lack clinically meaningful efficacy is different from the claim that antidepressants are overprescribed, or the claim that adverse effects of antidepressants have historically been under-recognized, or the claim that psychosocial treatments should be prioritized over pharmacological treatment for the average patient. These assertions should not be conflated. In making a case that antidepressants possess clinically meaningful efficacy, I am not addressing these other claims.
Another qualification: I am also not making a claim that the scientific evidence of efficacy is so persuasive that the most strident skeptics will be convinced. Aside from the general difficulties of persuading skeptics (evident in areas such as climate change and COVID-19 vaccines), the evidence in favor of antidepressants is limited in crucial ways that allow skeptics to remain unpersuaded. One can almost always ask for more rigorous evidence. The crucial thing in my opinion is whether the evidence is strong enough to generate a scientific consensus, and in fact a consensus does exist in favor of antidepressants in the form of multiple, independent evidence-based practice guidelines.
Antidepressants outperform placebo in randomized clinical trials in a manner that is statistically significant (that is, the results are unlikely to have occurred by chance alone). This was confirmed in one of the largest meta-analyses ever conducted, which included 522 clinical trials and 116K subjects: “In terms of efficacy, all antidepressants were more effective than placebo, with ORs ranging between 2·13 (95% credible interval [CrI] 1·89–2·41) for amitriptyline and 1·37 (1·16–1·63) for reboxetine.” This finding has been demonstrated in multiple other meta-analyses, and is not by itself subject to dispute. Pretty much everyone accepts that antidepressants outperform placebo in a statistically significant fashion.
The problem, however, is that the average difference between antidepressant and placebo in improvement in depressive symptoms (as measured by the most commonly used rating scale, the Hamilton Depression Rating Scale, or HDRS) is 2 points, or an effect size of 0.3, which is considered small. In a recent analysis of trials in the FDA database, HDRS score improved by 8 points on average in the placebo group vs 9.8 points in the antidepressant group. HDRS-17 total score ranges from 0 to 52. A difference of 2 points is not very meaningful at face value.
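To make the link between the 2-point difference and the effect size of 0.3 concrete, here is a minimal sketch in Python. It assumes the effect size is a standardized mean difference (raw difference divided by a pooled standard deviation). The function names are illustrative, and the pooled SD is back-calculated from the two figures quoted above rather than taken from any trial report.

```python
def standardized_mean_difference(mean_diff: float, pooled_sd: float) -> float:
    """Express a raw score difference in units of the pooled SD."""
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    return mean_diff / pooled_sd


def implied_pooled_sd(mean_diff: float, effect_size: float) -> float:
    """Back out the pooled SD implied by a raw difference and an effect size."""
    if effect_size == 0:
        raise ValueError("effect_size must be non-zero")
    return mean_diff / effect_size


# Figures quoted in the text: 2 HDRS points, effect size 0.3.
sd = implied_pooled_sd(2.0, 0.3)
print(f"Implied pooled SD: {sd:.1f} HDRS points")

# Round trip: the same SD turns the 2-point difference back into 0.3.
print(f"Effect size: {standardized_mean_difference(2.0, sd):.2f}")
```

Under this assumption, a "small" effect size of 0.3 simply means the groups differ by about a third of the typical spread in HDRS change scores. It says nothing about how that difference is distributed across individual patients.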
This is the heart of the controversy: if antidepressants are clinically effective, why only a 2 point difference from placebo? How could a 2 point difference be indicative of anything but lack of clinically significant efficacy?
There are several non-mutually exclusive ways of responding to this:
An average difference of 2 points obscures variation in treatment response, and there are subgroups that display a substantial difference from placebo.
HDRS total score is an inappropriate measure of antidepressant efficacy; it may be sufficient to demonstrate a statistical separation from placebo, but does not appropriately quantify the magnitude of benefit.
Despite appearances, a 2 point HDRS average difference from placebo is indeed clinically meaningful.
#1. Different trajectories of response
Traditionally this argument has been made using “response rates” (50% reduction in symptom severity) and “remission rates” (near complete resolution of depressive symptoms). The average response to placebo in meta-analyses is around 35% compared to about 50% for antidepressants. While clinicians find thinking in terms of response and remission rates to be more clinically meaningful than average HDRS score (for good reasons), critics have objected to such binary categorizations as arbitrary and artificially inflating efficacy. There is an argument to be made in favor of response and remission rates, but I won’t dwell on that here.
Fortunately, differences in response to antidepressants can also be demonstrated without appeal to arbitrary thresholds of response rate and remission rate.
Thase et al. 2011 used a special statistical model to demonstrate that patient subgroups of those benefiting or not benefiting from treatment could be identified, and that about 20% of patients benefited from escitalopram but not from placebo treatment (which corresponds to a number needed to treat, or NNT, of 5).
In the recent Stone et al. 2022 analysis of antidepressant trials in the FDA database, 3 trajectories of response patterns were found, with average HDRS-17 score reductions of 16.0 (large response), 8.9 (nonspecific response), and 1.7 points (minimal response). Compared to placebo, antidepressant treatment was more likely to show large responses (24.5% v 9.6%) and less likely to show minimal responses (12.2% v 21.5%). Most responses (60-70%), however, were in the non-specific category without prominent difference between antidepressant and placebo. The NNT for large response for antidepressant vs placebo was 6.7. (There is a case to be made that this may be an underestimate based on differences among antidepressants in the database and because of inclusion of trials with pediatric patients, but I’ll bypass that here to keep things simple.)
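The NNT figures above follow from simple arithmetic: NNT is the reciprocal of the difference in the proportion of patients achieving an outcome with treatment versus control. Here is a minimal sketch in Python, with an illustrative function name, using only the percentages quoted in the text.

```python
def number_needed_to_treat(p_treated: float, p_control: float) -> float:
    """NNT = 1 / (proportion with outcome on treatment - proportion on control)."""
    diff = p_treated - p_control
    if diff <= 0:
        raise ValueError("treatment must outperform control for a finite positive NNT")
    return 1.0 / diff


# Stone et al. 2022: large response in 24.5% on antidepressant vs 9.6% on placebo.
nnt_large = number_needed_to_treat(0.245, 0.096)
print(f"NNT for large response: {nnt_large:.1f}")  # prints 6.7

# Thase et al. 2011: about 20% benefit from drug but not from placebo.
nnt_thase = number_needed_to_treat(0.20, 0.0)
print(f"NNT (Thase et al.): {nnt_thase:.1f}")  # prints 5.0

# Conventional response rates: about 50% on antidepressant vs about 35% on placebo.
nnt_response = number_needed_to_treat(0.50, 0.35)
print(f"NNT for response: {nnt_response:.1f}")  # prints 6.7
```

Read this way, an NNT of 6.7 means that for roughly every seven patients given an antidepressant instead of placebo, one additional patient has a large response.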
A medication that demonstrates a large response in 25% of patients (vs 10% of those in the placebo group) and reduces the likelihood of a minimal response is by no means a medication with “marginal efficacy” and with therapeutic effects that are “clinically meaningless.” This constitutes a clear signal of efficacy in a subset of patients that is otherwise obscured by a small average difference. The NNT of 6-7 also falls well within the respectable range when it comes to treatments in general medicine. Anyone who thinks that this constitutes marginal efficacy lacks a sense of perspective on what treatment efficacy generally looks like in medicine and clinical psychology.
The efficacy certainly leaves much to be desired. It’s not as high as we would like it to be. It can nonetheless be described as, in the words of Peter Kramer, “ordinarily well.” I would call it “reasonable.” It’s also notable that this is efficacy data for a single trial of an antidepressant. In clinical practice, it often takes 2 or 3 trials to find an antidepressant that the patient finds to be efficacious and tolerable, and the success rates are correspondingly higher in clinical practice.
#2. HDRS total score is not the right measure
Another response to the dilemma of “2 point HDRS difference from placebo” is to point out the limitations of using HDRS-17 for the purpose of evaluating antidepressant efficacy. The 17 items included in the scale represent an idiosyncratic mix. There is, for example, only one item about depressed mood, but three items about insomnia. It also includes symptoms peripherally related to depression, such as hypochondriasis, sexual symptoms, and gastrointestinal symptoms. The inclusion of sexual and gastrointestinal symptoms is particularly problematic because these are also recognized side effects of antidepressant medications, so the scale poorly discriminates between depressive symptoms and medication side effects. Furthermore, each item is given equal weight and the validity of adding these items into a single meaningful score is questioned by many.
The paper that makes this most obvious is Hieronymus et al, 2016, in which they conducted patient-level post-hoc analyses of antidepressant effects on individual items of HDRS, in particular the depressed mood item. You can see Table 3 from the article here, which shows an average effect size of 0.27 for HDRS total score, but 0.40 for the depressed mood item. It is also interesting to see the pattern of response among different items: there are many that change poorly, again emphasizing the inability of total score to illuminate the magnitude of antidepressant efficacy.
Furthermore, they also note that the depressed mood item more consistently separates active treatment from placebo at week 6: “While 18 out of 32 comparisons (56%) failed to separate active drug from placebo at week 6 with respect to reduction in HDRS-17-sum, only 3 out of 32 comparisons (9%) were negative when depressed mood was used as an effect parameter (P<0.001). The observation that 29 out of 32 comparisons detected an antidepressant signal from the tested SSRI suggests the effect of these drugs to be more consistent across trials than previously assumed.”
Considerations such as these may be related to the fact that in the PANDA study in primary care patients, while the differences in the depression rating scales (PHQ-9 and BDI-II) were unimpressive (not statistically significant for PHQ-9 and only statistically significant for BDI-II at 12 weeks), a crude 1-item measure of “Feeling better (self-rated improvement)” showed prominent separation: 59% felt better with antidepressant at 12 weeks compared to 42% with placebo.
#3. A 2-point average difference from placebo is nonetheless meaningful
There are different ways of supporting this point. The first relates to what we have discussed in #1 — averages can obscure subgroup differences.
A second way is to compare it to the effect of other standard treatments. In the case of depression, psychotherapy provides a meaningful comparison. A number of trials have compared antidepressant treatment to short-term manualized psychotherapy (usually CBT) and the results are pretty clear across multiple trials: antidepressants and short-term manualized psychotherapy have equal effects on depression rating scales, and the combination of both is superior to either treatment alone. This is best illustrated in the Cuijpers et al. 2020 meta-analysis. A consequence of this is that anyone firmly committed to the idea of antidepressants being marginally effective is also forced to accept that short-term manualized psychotherapy, which constitutes the bulk of RCT evidence for psychotherapy, is also marginally effective in the acute treatment of depression. (There is some data to suggest that psychotherapy and the combination of psychotherapy and antidepressant have more sustained long-term effects than antidepressant alone, but I won’t go into that here.)
Another comparison is with other treatments in general medicine. The go-to reference for that is Leucht et al, 2012, which shows that antidepressants have an effect size (vs placebo) that is comparable to other standard treatments in medicine.
A third sort of response makes the point that a 2-point difference from the placebo group is different from a 2-point difference from no treatment. Treatment in the placebo group is not simply a dummy pill, but also involves extensive weekly evaluations with a lot of inbuilt psychosocial support and at times financial compensation. The improvement seen in the placebo group is likely a mixture of natural history, regression to the mean, expectancy effects, and therapeutic effects of indirect psychosocial support.
This points to a problem with the additivity of antidepressant and “placebo” effects. The best discussion of this that I’ve seen comes from Peter Kramer in Ordinarily Well. He uses the example of vodka vs placebo tonic water to make the point. If we see “4 points” of intoxication/incoordination on an imaginary scale with consumption of vodka, vs “2 points” of intoxication/incoordination with placebo tonic water, the “real” effect of vodka is not vodka minus placebo. We have subtracted too much because effects of alcohol and expectancy are not additive. Vodka + placebo/expectancy (4 points + 2 points) does not equal 6 points; it is still 4 points.
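The vodka example can be put into a few lines of Python. This is only a sketch: the function names are illustrative, and using `max` to model overlapping (non-additive) effects is my own simplifying assumption, chosen because it reproduces the numbers in the example. It is not a model that Kramer specifies.

```python
def observed_if_additive(drug_effect: float, placebo_effect: float) -> float:
    """Drug-arm outcome if drug and placebo effects simply stack."""
    return drug_effect + placebo_effect


def observed_if_overlapping(drug_effect: float, placebo_effect: float) -> float:
    """Drug-arm outcome if the larger effect subsumes the smaller one
    (an assumed, deliberately crude non-additive rule)."""
    return max(drug_effect, placebo_effect)


def subtraction_estimate(drug_arm: float, placebo_arm: float) -> float:
    """The conventional trial estimate: drug arm minus placebo arm."""
    return drug_arm - placebo_arm


true_vodka_effect = 4.0  # points of intoxication from vodka alone
expectancy_effect = 2.0  # points seen with placebo tonic water

# If effects were additive, subtraction would recover the true effect.
additive_arm = observed_if_additive(true_vodka_effect, expectancy_effect)
print(additive_arm, subtraction_estimate(additive_arm, expectancy_effect))  # 6.0 4.0

# If effects overlap, the drug arm shows 4 points and subtraction yields 2,
# which understates the true 4-point effect of vodka.
overlap_arm = observed_if_overlapping(true_vodka_effect, expectancy_effect)
print(overlap_arm, subtraction_estimate(overlap_arm, expectancy_effect))  # 4.0 2.0
```

The subtraction step is only valid under the first model. Under any rule where the effects overlap, it subtracts too much.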
In the case of depression, we already know that antidepressant and psychotherapy effects are not additive. “[T]he common outcome in psychiatric research is, two and two do not equal four.” Kramer writes. “Placebo and antidepressant effects are unlikely to be additive. Much of what medication accomplishes, it achieves on its own... In antidepressant trials, almost certainly, full additivity does not apply—and yet our calculations, including ones for effect sizes, assume it. Virtually every formal estimate of antidepressant efficacy arises from a premise, the right to subtract, that is unproven and likely wrong. Our estimates of drug efficacy run too low.”
The additivity assumption is also noted by Irving Kirsch, who is otherwise skeptical of antidepressant efficacy: “If antidepressant drug effects and antidepressant placebo effects are not additive… then antidepressant drugs have substantial pharmacologic effects that are duplicated or masked by placebo.”
According to Kramer, this is precisely what is going on, and leads us to a certain uncomfortable conclusion: “we cannot count on additivity. This uncertainty presents a challenge for evidence-based psychiatry: Our controlled trials, conventionally analyzed, may not reflect reality. Despite our use of randomization, they are likely subject to a consistent confound, arising from a technical bias against antidepressants. We know that antidepressants work. We cannot say how well.”
This post is already long enough as it is but I have a few more points to raise, which I will do briefly.
# It has been suggested by critics that antidepressants act as “active placebos” which cause unblinding from side effects, leading to enhanced expectancy, accounting for the 2-point average difference. This suggestion is countered by Hieronymus et al 2018 and Lisinski et al 2020, who found no relationship between superiority over placebo and report of adverse effects, nor did they find an association between adverse event severity and antidepressant response. They concluded that the antidepressant effect is not dependent on side effects breaking the blind.
# Some critics have also speculated with undue confidence that the antidepressant benefit over placebo can be accounted for by antidepressant effects on emotional blunting. While emotional blunting is a recognized adverse effect of antidepressants that occurs in some patients, evidence from clinical trials does not support the explanation that emotional blunting mediates treatment response.
# It has also been expressed by some critics that in the absence of low serotonergic activity or “serotonergic dysfunction,” there is nothing for antidepressants to “correct” and thus there is no plausible mechanism of action other than enhanced placebo and blunting. That is not true. While the jury is still out on whether depression involves some sort of alterations in the serotonin system given the limitations of our current methods, there are numerous hypothesized mechanisms of action that do not involve correcting low serotonin activity. Antidepressants may not correct a serotonergic deficiency or deficit, but they do act on serotonergic and monoaminergic systems that are involved in regulation of mood and behavior. These pathways need not be “dysfunctional” for antidepressants to act on them (just as analgesics, antipyretics, and antitussives work on pathways that are not inherently dysfunctional to produce symptomatic effects). Furthermore, antidepressant effects on neurogenesis and neuroplasticity are at present leading scientific hypotheses regarding the mechanisms of action.
# The maintenance efficacy of antidepressants is in all likelihood currently over-estimated. This is due both to problems in trial design (see Ghaemi’s discussion of this) and to poor accounting of antidepressant withdrawal effects that emerge when antidepressants are rapidly switched to placebo. I find it difficult to avoid the conclusion that maintenance efficacy data is severely confounded. The long-term use of antidepressants on a routine basis requires re-examination by researchers. In the recent Lewis et al, 2021 study, which had a relatively short taper period of 2 months (generally considered inadequate to control antidepressant withdrawal), relapse had occurred by 52 weeks in 39% of the maintenance group and in 56% of the discontinuation group.
# The propensity of antidepressants to cause physiological dependence (not the same as “addiction”) and withdrawal has historically been under-recognized. The consensus around this issue has changed considerably in recent years, thanks in part to the voices of the harmed patient community. See guidance from the Royal College of Psychiatrists and this article by Adele Framer: “What I have learnt from helping thousands of people taper off antidepressants and other psychotropic medications”
# Existing RCTs generally exclude patients with clinically severe depression of the variety that is associated with melancholic, catatonic or psychotic features, that may require psychiatric hospitalization, and for which treatments like ECT are often utilized.
# Practice guidelines currently recommend lifestyle changes, psychoeducation, and psychological interventions for mild depression and encourage antidepressant plus psychotherapy for more severe depression, but “all treatments (psychological interventions, psychoeducation, exercise, antidepressants and ECT) can be administered from the outset, dependent on the clinical presentation and patient preference.” (See Malhi et al 2022 for a comparison of guidelines from the UK and Australia/New Zealand, which show a lot of convergence in recommendations)
# “Antidepressants” have efficacy not only in depression but across a range of “internalizing” disorders which include anxiety disorders, obsessive-compulsive disorder, and post-traumatic stress disorder.
# What I find reassuring is the triangulation of evidence from multiple sources: animal models, neurobiology of depression and antidepressant mechanisms, clinical trials (both RCTs and open-label), clinical experience, and patient experience all point towards clinically meaningful effects of antidepressants. (That said, probably around 1/4 to 1/3 of patients with depression do not respond well to standard first-line antidepressants, hence the need and urgency to discover other effective pharmacological treatments.)
# My interactive opinion piece for the New York Times: How Is Depression Treated? Let Me Show You. (Sep 2022)
# While I contest the idea that antidepressants have marginal and clinically irrelevant efficacy (indeed their modest but clinically relevant effects are acknowledged by pretty much every depression treatment guideline), I am sympathetic to the assertions that antidepressants are overprescribed in practice, often continued in the maintenance phase unreflectively, and that other forms of psychological and social interventions are alarmingly under-utilized. Their adverse effects are also poorly incorporated in clinical decision-making and informed consent. It is unfortunate that so much time and energy is spent arguing whether antidepressants “work” or “don’t work” when that time and energy could instead be spent examining the appropriate place of antidepressants in the clinical treatment of depression, how to personalize treatment, and how to utilize and improve access to non-pharmacological interventions.
Nicely done, Awais! The points you make dovetail very neatly with the piece I did for Psychiatric Times, in which I discuss the nature of the placebo condition; what the "15%" figure in the Stone et al study really means for clinicians; and why a narrow fixation on numbers can obscure the main reason psychiatric medications are worthwhile; i.e., because they reduce, even if modestly, the suffering and incapacity of extremely debilitating and sometimes lethal illnesses.
https://www.psychiatrictimes.com/view/antidepressants-placebos-and-lithium-some-parting-thoughts
All the best,
Ron
Ronald W. Pies, MD
Professor Emeritus of Psychiatry
SUNY Upstate Medical University
I think there is an inconsistency in this argument. It was stated that the maintenance efficacy data is severely confounded. However, I believe the justification for this, primarily withdrawal effects, also casts doubt on the RCTs that were referenced in support of antidepressant efficacy.
Stone et al. (2022)¹ stated that one of the limitations of their paper was that "the effects of treatment history of antidepressants or discontinuations before study entry on our results are unknown". They cited Hunter et al. (2015)², who found that previous exposure to antidepressants caused a greater separation between drug and placebo in a subsequent trial.
The vast majority of trials do not exclude people who have had previous exposure to antidepressants. This means that any drug-placebo difference could be due to this and may not be a true treatment effect that would manifest in people who are drug-naive.
While withdrawal effects from antidepressants are poorly understood, there are many patient reports of symptoms persisting for months and sometimes years after discontinuation. Presumably lasting changes have occurred in these people, which is the mechanism behind these persistent symptoms. These changes may alter their response to antidepressants at a later date. It's also possible that those in the placebo arm may have a decreased response, due to experiencing withdrawal from prior treatment.
With this in mind, I think it's difficult to argue that the rest of the evidence base is less confounded than the maintenance trials. Unless a large RCT is conducted on purely drug-naive people, we can't know what is a true drug effect and what is confounded by prior antidepressant use.
¹ Stone M B, Yaseen Z S, Miller B J, Richardville K, Kalaria S N, Kirsch I et al. Response to acute monotherapy for major depressive disorder in randomized, placebo controlled trials submitted to the US Food and Drug Administration: individual participant data analysis
² Hunter AM, Cook IA, Tartter M, Sharma SK, Disse GD, Leuchter AF. Antidepressant treatment history and drug-placebo separation in a placebo-controlled trial in major depressive disorder.