Forgetting curve · spaced retesting · the retest problem

Spaced retesting flattens the curve only when the retest is retrieval.

Ebbinghaus drew the original forgetting curve in 1885; Murre and Dros replicated it in 2015 and recovered the same exponential decay. The textbook fix, spread the practice across days and weeks, is real. Every guide on the curve stops there. The piece they skip is the mechanism that makes retesting fail in practice even when the intervals are correct.

A static flashcard stem stops being a retrieval somewhere around revisit three. By then the first few words of the wording have become a cue for the stored answer; you fire back the letter in a second, the scheduler reads the speed as mastery and pushes the card out, and the underlying fact decays without any retrieval to rebuild it. The curve is still falling. The deck just no longer has a way to see it.


Matthew Diakonov, Written with AI

Published May 12, 2026 · 6 min read

Direct answer · verified 2026-05-12

Does spaced retesting flatten the forgetting curve?

Only when each retest is a genuine retrieval. Spaced retesting against a stable-stem flashcard typically collapses into recognition by attempt three; the curve keeps falling under a high accuracy score because the scheduler is measuring stem familiarity, not memory of the fact. Rotating the stem on each revisit (same card ID, same FSRS schedule, only the wording changes) keeps retests honest, and that is what actually flattens the curve.

The mechanism is documented in Murre and Dros's 2015 replication of Ebbinghaus and in the spacing literature (Karpicke and Roediger 2008; Cepeda et al. 2008). What none of the standard write-ups address is the retrieval-to-recognition drift inside the retest itself.

What every other guide on this stops at

The standard story goes like this. Ebbinghaus learned lists of nonsense syllables, measured how much he could re-learn at various delays, and produced the now-famous curve: most of what you have studied is gone within 24 hours, the rest decays over days and weeks. Spaced retesting bends the curve back up before it bottoms out. Repeat the retest at increasing intervals and the decay flattens. Anki, Quizlet, RemNote, Brainscape all build on this. End of article, the implication being that any spaced-retesting app will do the work.
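The standard story can be put in a few lines of arithmetic. The sketch below is a toy model, not Ebbinghaus's own fitted formula: it assumes retention decays as exp(-t / S) for a "stability" S measured in days, and that every successful retest doubles S. The function names, the starting stability, and the doubling factor are all illustrative assumptions.

```python
import math


def retention(days_since_review: float, stability_days: float) -> float:
    """Assumed exponential forgetting curve: R(t) = exp(-t / S)."""
    return math.exp(-days_since_review / stability_days)


def simulate(review_days, initial_stability=1.0, growth=2.0):
    """Return the retention measured just before each review.

    Assumption for illustration: every review is a successful retrieval
    and multiplies stability by `growth`.
    """
    stability = initial_stability
    last_review = 0.0
    seen = []
    for day in review_days:
        seen.append(round(retention(day - last_review, stability), 3))
        stability *= growth
        last_review = day
    return seen


# Reviews on days 1, 3, 7 and 15: the gaps are 1, 2, 4 and 8 days.
spaced = simulate([1, 3, 7, 15])
print(spaced)
```

Under these assumptions the retention just before each review stays the same while the gap between reviews doubles each time. That is what "flattening" means in this model: the same amount of forgetting takes longer and longer to happen.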

That account is true at the level of the schedule and false at the level of the retest. The math behind the spacing effect assumes that when the card resurfaces you actually try to retrieve the fact. The schedule has no way to distinguish a retrieval from a recognition pass. It treats your speed and accuracy as memory strength. If a card has drifted from a retrieval prompt into a recognition prompt, the schedule will keep pushing it to longer intervals on the assumption that you are mastering the fact, when what you are mastering is the wording of the stem.
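A toy scheduler makes the blind spot concrete. This is a deliberately simplified rule, not FSRS and not any real app's algorithm; the multipliers and the five-second cutoff are invented for illustration. The point is only what the function takes as input.

```python
from dataclasses import dataclass


@dataclass
class Review:
    correct: bool
    seconds: float  # response time on this pass


def next_interval(interval_days: float, review: Review) -> float:
    """Toy rule (not FSRS): the only inputs are accuracy and speed."""
    if not review.correct:
        return 1.0  # a missed card comes back tomorrow
    bonus = 1.3 if review.seconds < 5 else 1.0  # fast answers read as "easy"
    return interval_days * 2.0 * bonus


# Three correct passes on an identically worded card, each one faster.
interval = 1.0
for review in [Review(True, 14.0), Review(True, 6.0), Review(True, 1.5)]:
    interval = next_interval(interval, review)
print(interval)
```

Nothing in `Review` records whether the answer came from the fact or from the opening words of the stem, so the fastest, most recognition-driven pass earns the longest interval.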

The silent fade, traced through one card

Here is the same renal physiology card across three revisits in a static-stem deck. The wording does not change. The response time falls. The schedule reads the speed gain as mastery and pushes the card out. What is actually happening inside your head is annotated on each line.

static-stem-deck.log

The same card with the stem rotated

Now the working version. The card ID is unchanged, the FSRS schedule is unchanged, the correct answer slot is unchanged, and the source citation (slide 23 of the cardiology deck this fact came from) is unchanged. Only the wording of the stem rotates. Each revisit lands on a stem the brain has not seen before, so the retrieval has to reach the property rather than the opener.

rotated-stem-deck.log

Recognition pretending to be retrieval

Feature	Static-stem retest (Anki, Quizlet, ChatGPT-generated decks)	Rotating-stem retest (Studyly)
What revisit three actually measures	Recognition of a memorized stem shape	Retrieval of the underlying fact
Response time on the same card	Falls pass over pass; mostly stem familiarity	Stable; lookup work is real on every pass
Scheduler signal	Tracks a recognition trace pretending to be memory	Tracks the underlying memory across rotating stems
Card ID and FSRS schedule	Stable, but pointed at the wrong signal	Stable, pointed at the right signal
Exam-day transfer	Brittle on stems worded differently than the deck	Robust; you have already retrieved against rotated wordings
Effort cost to set up	Hand-authoring decks, 1 to 2 hours per 90-slide lecture	Around 60 seconds per 90-slide deck, 200 questions across four formats

You can watch the mechanic on the homepage

The carousel in section 03 of studyly.io shows this in motion. One underlying fact (the thick ascending limb of the loop of Henle is impermeable to water), three rephrased stems, cycling every 2.8 seconds. The correct option stays at slot B; the source citation stays fixed; only the wording of the stem changes. The point of the demo is to make the variation legible at a glance, because reading about stem rotation does not produce the same feeling as watching the same fact get tested three different ways in eight seconds.

The implementation lives in src/components/RephraseCarousel.tsx in the studyly-website repo. Lines 5 through 39 hold the three stem variants; line 60 sets the interval to 2,800 ms; the same card state, correct slot, and source slide are stable across all three variants. The drill loop in the app does exactly this on real cards, with the variants regenerated against the source span on each surfacing instead of being picked from a fixed list of three.

The four numbers that matter

2,800 ms · Rotation interval on the live homepage demo

81.3 · Question-quality score on a three-document eval

~60 s · Time to convert a 90-slide deck to 200 questions

5 to 10 min/day · Working cadence over 8 to 12 weeks

Two signals that your retests have drifted

You do not need a tool to check this on your own deck. There are two signals that the retest has drifted into recognition. The first: response time on a card keeps dropping pass over pass while the wording stays identical, and you do not feel the lookup happening anymore. The gap between fluent on this card and fluent on this concept is recognition gain. The schedule cannot tell them apart and you can.

The second: before you click the option, try to write the fact in your own words. On a card you have actually retained, this is easy. On a card you have only memorized the wording of, this is surprisingly hard, even though the answer letter is on the tip of your tongue and you can pick it on the multiple-choice screen in under two seconds. The gap between the two is the silent fade the scheduler does not see.
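The first signal can be checked mechanically if your deck exports response times. The sketch below is a hypothetical helper, not a feature of any particular app; the three-pass minimum and the three-second threshold are assumptions you would tune. It only flags candidates. The write-it-in-your-own-words check is still what confirms the drift.

```python
def drift_suspected(times_s, fast_threshold_s=3.0):
    """Flag a card whose response time falls on every pass and ends up very fast.

    times_s: response times in seconds for successive passes on a card
    whose wording never changed. Thresholds are illustrative assumptions.
    """
    if len(times_s) < 3:
        return False  # too few passes to call it a trend
    falling = all(later < earlier for earlier, later in zip(times_s, times_s[1:]))
    return falling and times_s[-1] < fast_threshold_s


print(drift_suspected([14.0, 6.0, 1.5]))  # steadily faster, now near-instant
print(drift_suspected([9.0, 10.5, 8.0]))  # lookup effort still varies
```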

If either signal shows up on a card you care about, the fix is not a different schedule. The schedule is fine. The fix is rotating the stem on revisit, by hand if you must (write three or four wordings of the same question and surface them in rotation against the same card) or by using a tool that holds a stable card and rotates the stem for you.
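Doing it by hand amounts to a small data structure: one card, several wordings, and a counter. The sketch below is a minimal version under that assumption. The class, the card ID, and the three stems are illustrative, not taken from any real deck or from Studyly's implementation.

```python
from dataclasses import dataclass


@dataclass
class Card:
    card_id: str      # stays fixed, so the schedule keeps following this card
    stems: list[str]  # hand-written wordings of the same question
    answer: str
    source: str
    times_shown: int = 0

    def next_stem(self) -> str:
        """Return the next wording in rotation; answer and source never change."""
        stem = self.stems[self.times_shown % len(self.stems)]
        self.times_shown += 1
        return stem


card = Card(
    card_id="renal-017",  # hypothetical ID
    stems=[
        "Which segment of the loop of Henle is impermeable to water?",
        "Water cannot follow solute out of which part of the loop of Henle?",
        "Which segment of the loop of Henle actively reabsorbs salt "
        "while staying impermeable to water?",
    ],
    answer="Thick ascending limb",
    source="slide 23",
)

shown = [card.next_stem() for _ in range(4)]
```

With three wordings, the fourth revisit lands back on the first stem. Under a spaced schedule that is typically weeks later, which is the reason three or four variants go a long way.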

What this does not replace

The retesting layer is the concept layer. It covers recalling a property, knowing when to use which structure, what a slide claimed, how a mechanism works, and which option is correct among four nearby distractors given a clinical vignette. Rotating the stem on the retest is the move that keeps that layer honest.

It is not a substitute for working a problem set, for writing out a derivation by hand, or for the slow re-reading of a primary source that a thesis requires. Those are different practices with their own forgetting curves. The retest is the part of your studying that asks: do I still have this fact cold, in twenty seconds, six weeks after the lecture? That is the part that drifts silently into recognition under a static stem, and the part this page is about.

Forgetting curve and spaced retesting, frequently asked

Does spaced retesting actually flatten the forgetting curve?

Yes, but only to the extent that each retest is genuine retrieval. The 2015 replication of Ebbinghaus by Murre and Dros recovered the same exponential decay shape in modern subjects, and the spacing literature (Cepeda et al. 2008; Karpicke and Roediger 2008) is unambiguous that distributing the same total practice time across more sessions roughly doubles long-term retention. The catch the textbook explanation skips is that the math assumes every revisit is a retrieval. If your retest has quietly become a recognition pass against a stem you have seen four times in identical wording, the spacing schedule is measuring sentence familiarity, not memory of the fact.

What is the failure mode the standard explanations skip?

Static-stem flashcards drift from retrieval into recognition somewhere around revisit three. The brain begins indexing the answer on the first few words of the stem rather than on the underlying fact, your response time drops, the spaced-repetition scheduler interprets that speed as mastery and pushes the card to longer intervals, and the underlying memory decays without any retrieval to rebuild it. The curve is still falling; the deck just no longer has a way to see it. By the exam, the fluent feeling on the deck does not transfer to a stem worded differently than the one you memorized.

How can I tell my retest has become recognition rather than retrieval?

Two practical signals. First, your response time on a card keeps dropping pass over pass while the wording stays identical; that gap, between fluent on this card and fluent on this concept, is recognition gain rather than retrieval gain. Second, if you try to write out the fact in your own words before answering, you will sometimes find you cannot, even though you would click the right multiple-choice option in under two seconds. That gap is the silent fade the scheduler does not see.

Does this mean spaced repetition does not work?

It works. The schedule, the intervals, the FSRS math are not the problem. The retest itself is the problem when the stem stops varying. Rotate the stem on each revisit while keeping the same card ID and the same schedule and the spacing math goes back to measuring what it is supposed to measure. The point is not to abandon spacing; it is to keep the retest honest so the spacing can do its job.

How is this different from generating new questions from the same source?

Generating new questions makes new cards with new IDs. The FSRS schedule resets and you lose the trajectory that told you you knew this fact on day one, struggled with it on day eight, and got it back on day twenty-three. Auto-rephrasing keeps the same card and the same schedule and only rotates the stem on each surfacing. The underlying fact, the correct answer slot, and the source citation stay fixed. You preserve the entire trajectory and the retest stays a real retrieval.
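The difference shows up as soon as review history is keyed by card ID. The sketch below uses a plain dictionary as a stand-in for a scheduler's records; the IDs and the `record` helper are hypothetical, and the days match the example trajectory above.

```python
from collections import defaultdict

history = defaultdict(list)  # card_id -> list of (day, answered_correctly)


def record(card_id: str, day: int, correct: bool) -> None:
    """Append one review outcome to the card's trajectory."""
    history[card_id].append((day, correct))


# Rephrasing: the same ID on every revisit, so one trajectory accumulates.
record("renal-017", 1, True)
record("renal-017", 8, False)
record("renal-017", 23, True)

# Regenerating: the new question gets a new ID and starts with no past.
record("renal-017-regen-a", 23, True)

rephrased_trajectory = history["renal-017"]
regenerated_trajectory = history["renal-017-regen-a"]
```

The rephrased card carries all three data points into its next interval. The regenerated one looks like a card seen once, and is scheduled like one.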

Can I see this rotation working anywhere?

Yes, the homepage at studyly.io shows it live. The carousel cycles three rephrased stems for the same loop-of-Henle fact every 2.8 seconds; the correct option (the thick ascending limb) is always option B and the source citation is fixed; only the wording of the question changes. The implementation is in src/components/RephraseCarousel.tsx, lines 5 to 39 for the three stem variants, line 60 for the setInterval that drives the rotation.

Does ChatGPT solve this if I ask it to reword the question on each revisit?

It can produce a rewording. What it cannot do is hold a stable card ID, a stable answer key, a stable source citation, and an FSRS schedule across rewordings, then guarantee the rewrite is still testing the same concept and not a similar-sounding one. The first four are scheduling problems; the fifth is a quality problem. Studyly scored 81.3 on a held-out three-document eval (factual correctness, stem clarity, distractor plausibility, question-type coverage). Turbolearn scored 57.8 on the same documents and rubric. The eval is the rubric a rewording has to pass on every revisit, not just on first generation.

Will this work for exam boards like USMLE, NCLEX, MCAT?

Yes, and it fits those exams particularly well because boards reuse the same underlying facts across many surface forms (declarative, vignette, negative-form, image-based). A drill loop where the stem rotates the way the exam rotates is the closest practice condition to the test condition. The case-style generator runs on every fact slide, so a 90-slide cardiology deck produces roughly fifty case-style stems alongside the MCQs and free-response cards, and any of them can resurface in a rotated wording.

What is the working daily interval for this to actually flatten the curve?

Five to ten minutes a day for eight to twelve weeks beats four hours a day for one week followed by forgetting it all within four. The FSRS-style schedule handles the exact intervals (a card you got right last week resurfaces in a few days, a card you missed resurfaces in a day, a well-rooted card pushes out to weeks then months). The variable you control is consistency, not interval math. In the app, each deck grows a tree, decks chain into a river, and weekly leagues run on the side. Those mechanics exist because the failure mode of spaced repetition over an eight-to-twelve-week horizon is not the schedule; it is the human quitting in week three.

Drop a lecture deck and watch the rotation

Upload a lecture PPTX or PDF, a textbook chapter, or a YouTube lecture URL. About 60 seconds later you have around 200 questions across four formats for a 90-slide deck, each card auto-rephrasing on every revisit so the retest stays a retrieval. Free tier on app.jungleai.com, no card required.