USMLE Step 1 · question generators

Two tools share the name "Step 1 question generator." Only one shows its work.

Look up "USMLE Step 1 question generator" and the first things you find are saved ChatGPT and Claude prompts. They take a topic name, no source file, and emit a fluent vignette with a confidently labelled answer. Nothing in that loop ever checks the answer. It is whatever the model decided.

That is one kind of generator. The other kind takes a real document (your professor's slide deck, a First Aid chapter, a textbook PDF) and builds questions whose answer keys are verified against that file. This page is about the difference, why it matters more for Step 1 than for almost any other exam, and how to generate questions you can actually trust.

Matthew Diakonov · 9 min read

Direct answer · verified 2026-05-15

Yes, tools generate Step 1 questions. The one that matters is the one with a source document.

"USMLE Step 1 question generator" names two opposite tools. An ungrounded prompt generator writes questions from a topic name with no source, so its answer key is unverifiable: the model can be wrong and cannot tell you when it is. A document-grounded generator builds questions from a file you upload and traces every answer back to it. Studyly is the grounded kind. On a held-out three-document eval it scored 81.3, ahead of Unattle 78.0, Gauntlet 68.0, and Turbolearn 57.8. You can confirm the ungrounded pattern in the prompt behind the most prominent hit at docsbot.ai, and the eval methodology on the quality page.

The word "generator" is hiding two different tools

Most guides on this topic treat "Step 1 question generator" as one category and then list options inside it. That framing is the mistake. The options do not differ by polish or price. They differ by something structural: whether the tool has a source document to build from.

An ungrounded prompt generator is a saved instruction you paste into ChatGPT or Claude. You give it a topic. It gives you questions. There is no file involved, so the model has to invent both the stem and the answer from whatever it absorbed in training. The output looks like a real NBME item. Whether it is correct is a separate question that the tool never answers.

A document-grounded generator starts from a file: a lecture deck, a First Aid chapter, a textbook PDF, a set of notes. It extracts the testable facts from that file and writes questions anchored to them. The answer is not the model's opinion; it is what the source says. When you miss a question, the tool can hand you back the exact slide or page the fact came from. That single property, a source you can return to, is the whole difference.
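To make the structural difference concrete, here is a minimal sketch of the grounded pattern, assuming nothing about Studyly's actual implementation: the `GroundedQuestion` type, the toy substring check in `verify_key`, and the rifampin example are all illustrative, and a real system would use a much stronger verification step.

```python
from dataclasses import dataclass

@dataclass
class GroundedQuestion:
    stem: str
    options: list[str]
    answer: str
    source_ref: str  # e.g. "slide 14": the place you return to on a miss

def verify_key(answer: str, source_text: str) -> bool:
    # Toy check: the keyed fact must appear in the source. A real system
    # would use something stronger (entailment, citation matching); the
    # point is that *some* check against the document exists at all.
    return answer.lower() in source_text.lower()

# Usage: a key the source does not support gets rejected.
slide = "Slide 14: Rifampin inhibits bacterial DNA-dependent RNA polymerase."
q = GroundedQuestion(
    stem="A patient on TB therapy asks how rifampin works. It inhibits which enzyme?",
    options=["DNA gyrase", "RNA polymerase", "30S ribosomal subunit",
             "Dihydrofolate reductase"],
    answer="RNA polymerase",
    source_ref="slide 14",
)
assert verify_key(q.answer, slide)          # supported by the slide: keep
assert not verify_key("DNA gyrase", slide)  # unsupported key: reject
```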

What the most prominent "Step 1 question generator" actually is

This is not a straw man. Below is the literal instruction behind the generator that ranks first for this search, a saved prompt published on docsbot.ai. It is a clean example of the ungrounded pattern, and reading it once tells you exactly what the category can and cannot do.

ungrounded-step-1-prompt.txt

There is nothing wrong with the prompt as a prompt. It is well formed, it asks for four options and an explanation, it covers the right subjects. The problem is upstream of the wording: the only input is a topic name. With no source document, the model fills the stem and the key from training data, and no later step compares that key to anything. The generator cannot be more reliable than the model's memory of pathology, and it has no mechanism to flag the moments when that memory is wrong.
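Reduced to code, the category looks like the sketch below. `llm_complete` is a hypothetical stand-in for any chat-model call, and the prompt string paraphrases the pattern rather than quoting the docsbot.ai text; what matters is what the function is missing.

```python
def generate_ungrounded(topic: str, llm_complete) -> str:
    # The entire input is a topic name. No file is read at any point.
    prompt = (
        f"Write a USMLE Step 1 clinical vignette on {topic} with four "
        "options, a labelled correct answer, and a short explanation."
    )
    return llm_complete(prompt)
    # There is no source to compare the key against, so no later step can
    # verify it. Whatever the model emits *is* the answer key.
```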

Why this matters more for Step 1 than for most exams

For a low-stakes quiz, an occasional wrong answer key is an annoyance. For Step 1 it is a specific kind of damage. The whole point of practice questions is repetition: you drill a fact until it is automatic. If the fact you are drilling is wrong, repetition does not protect you, it cements the error. You will recall it fast and confidently on exam day, and it will be wrong.

An ungrounded generator is most dangerous exactly where you need it most: the dense, detail-heavy corners of pharmacology, biochemistry, and microbiology where you are not sure of the answer yourself. If you already knew the material cold, you would catch the bad key. You use a generator because you do not, which means you are the least equipped person to audit its output. The tool that most needs a verification step is the one being used by the person least able to supply one manually.

So the real question to ask a Step 1 question generator is not "does it write good-looking vignettes?" They all do. The question is "when this answer key is wrong, what catches it?" For an ungrounded prompt the honest answer is nothing. For a grounded generator the answer is the source document and the rubric, which is what the rest of this page is about.

The four checks a Step 1 question should pass before you drill it

A question is worth your time only if it survives four checks. These are the criteria Studyly scores every generated item on, and they double as a checklist for auditing any generator's output, including an ungrounded one.

Trust checklist · per question

  • The correct answer traces to a specific source you can open, not the model's memory
  • The stem reads cleanly on the first pass, with no re-reading to find what is being asked
  • The wrong answers are plausible and similar in length, not obvious throwaway options
  • The set mixes recall, application, and case-based items instead of all one shape

The first item is the one an ungrounded generator cannot satisfy by construction. Items two through four are about craft and can, with effort, be hit by a good prompt. Item one is about provenance, and provenance is not something a prompt can add after the fact. Either the question was built from a source you can open, or it was not.
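As a reading aid, the checklist collapses into a gate like the one below. Treating the four criteria as pass/fail booleans is this page's simplification, not Studyly's actual rubric, which scores items rather than gating them.

```python
from dataclasses import dataclass

@dataclass
class TrustCheck:
    traces_to_source: bool       # 1. provenance: a slide or page you can open
    stem_is_clear: bool          # 2. craft: readable on the first pass
    distractors_plausible: bool  # 3. craft: similar length, no giveaways
    adds_type_variety: bool      # 4. craft: recall / application / case mix

    def worth_drilling(self) -> bool:
        # Checks 2-4 can be rescued by a better prompt. Check 1 cannot:
        # either a source document exists behind the item or it does not.
        return (self.traces_to_source and self.stem_is_clear
                and self.distractors_plausible and self.adds_type_variety)
```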

Ungrounded prompt generator vs document-grounded generator

The same comparison, line by line. The left column is the saved-prompt category. The right column is a generator that starts from a file you upload.

Feature | Ungrounded prompt generator | Document-grounded generator
Source material | A topic name only | A file you upload: slides, PDF, First Aid chapter, notes
Answer-key check | None; the key is whatever the model emits | Verified against the actual source document
When you miss a question | No source to return to | Explain panel cites the slide or page the fact came from
Distractor calibration | Uncalibrated, often an obvious giveaway | Scored by a four-criterion rubric before it reaches you
Coverage | Generic blueprint topics | Exactly what your own source material covers
On revisit | Same wording every time | Rephrased so you cannot pattern-match the answer
Measured quality | Untested | 81.3 on a held-out three-document eval

The rows are not a feature wishlist. Every one of them follows from the single structural fact at the top: whether a source document exists. The answer-key check, the explain panel, the coverage profile, the measured eval score, all of it is downstream of having a file to anchor to.

Anchor fact · the held-out three-document eval

81.3 on questions graded against the source, not the model's memory

Three source documents (a slide deck, a textbook chapter, a paper) were held out. Each tool generated questions from the same three documents. Every output was graded on the same four criteria: factual correctness, stem clarity, distractor quality, and question-type coverage. The factual-correctness criterion is the load-bearing one. In Studyly's own words, every answer is verified "against the actual PDF / slide content, not the model's pretrained knowledge." That is the check an ungrounded prompt has no way to run.

Studyly · 81.3
Unattle · 78.0
Gauntlet · 68.0
Turbolearn · 57.8

Higher is better. Field average across the other three tools is 67.9. The full methodology and the per-document sub-scores are on the quality page. An ungrounded prompt is not on this board because it has no source document to be graded against; there is nothing to check factual correctness over.
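For readers who want the shape of the computation, the sketch below shows how a per-tool aggregate like 81.3 falls out of per-item grades on the four criteria. Equal weighting across criteria is an assumption; only the aggregates are published, not the weights or the per-criterion grades.

```python
CRITERIA = ("factual_correctness", "stem_clarity",
            "distractor_quality", "type_coverage")

def tool_score(item_grades: list[dict[str, float]]) -> float:
    # item_grades: one {criterion: 0-100 grade} dict per generated item,
    # pooled across the three held-out documents.
    per_item = [sum(g[c] for c in CRITERIA) / len(CRITERIA)
                for g in item_grades]
    return sum(per_item) / len(per_item)
```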

How to generate Step 1 questions you can trust

The workflow is short. Pick the source document closest to what you actually need to know. For school block exams, that is the professor's lecture deck for the block. For the boards-wide blueprint, it is a First Aid section or a chapter PDF. For a topic you keep missing in UWorld, it is your own typed notes on that topic.

Upload it. Studyly converts a roughly 90-slide deck into about 60 to 80 questions in around 60 seconds, in multiple-choice, free-response, and case-style formats, plus image-occlusion cards on any labelled figures. Drill in short five-minute passes rather than one long sitting. When you miss an item, open the cited slide or page instead of arguing with the explanation, because the explanation points at a document you can read.

On revisit, the same fact comes back rephrased: a different demographic, a different presenting detail, a reordered option list. That matters because the failure mode of any question you have seen before is recognizing the wording instead of recalling the fact. An ungrounded prompt hands you the identical stem every time, which lets you pattern-match your way to a streak that does not survive contact with the real exam. The deeper write-up on that rephrase loop is in the vignette drill guide.
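A minimal sketch of that rephrase loop, assuming a simple template mechanism: the variation axes (demographics, presenting detail, option order) come from the paragraph above, and everything else here is illustrative rather than how Studyly implements it.

```python
import random

def revisit(stem_template: str, options: list[str], answer: str,
            rng: random.Random) -> tuple[str, list[str], int]:
    # New surface, same keyed fact.
    stem = stem_template.format(age=rng.choice([19, 34, 47, 62]),
                                sex=rng.choice(["man", "woman"]))
    shuffled = options[:]
    rng.shuffle(shuffled)  # a reordered option list defeats letter-memory
    return stem, shuffled, shuffled.index(answer)  # key tracked by content

# Usage: the same fact comes back looking different on every pass.
stem, opts, key_index = revisit(
    "A {age}-year-old {sex} on TB therapy notices orange-tinted tears and "
    "urine. Which drug is responsible?",
    ["Rifampin", "Isoniazid", "Ethambutol", "Pyrazinamide"],
    "Rifampin",
    random.Random(),
)
```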

When an ungrounded prompt is fine, and when nothing beats UWorld

The honest version of this argument has to admit two things. First, an ungrounded prompt generator is genuinely fine for warm-up volume and for testing whether you can recall a topic at all. If you ask ChatGPT for ten quick glycolysis questions and you miss six of them, you have learned something useful about your prep regardless of whether all ten keys were perfect. Use it as a thermometer, not as a textbook.

Second, a generator of either kind is not a replacement for UWorld or AMBOSS. Those banks have the strongest item editors in the field: calibrated distractors, physician-written explanations, and items reviewed against the published content outline. For first-pass and second-pass timed-block work they are the standard, and they should be. A grounded generator does not compete with them on item-writing polish.

Where a grounded generator wins is the territory the commercial banks structurally cannot cover: your school's specific lectures, the notes only you wrote, and the situation where you have already exhausted the finite commercial banks and need fresh items on your own material. Use UWorld and AMBOSS for the blueprint and the timed format. Use a document-grounded generator for everything your own documents cover and the banks do not. Skip the ungrounded prompt for anything you intend to actually memorize.

Try it on a real document

Upload one lecture deck. See questions traced back to the slide.

Free tier on app.jungleai.com, no card. A roughly 90-slide deck converts into about 60 to 80 questions in around 60 seconds. Works on PowerPoint, PDF, Keynote, scanned slides, First Aid chapters, textbook chapters, and YouTube lectures.

Common questions about USMLE Step 1 question generators

Is there an AI tool that generates USMLE Step 1 questions?

Yes, several. But the term covers two opposite categories of tool. The first is the ungrounded prompt generator: a saved ChatGPT or Claude prompt that says, in effect, 'write me Step 1 questions on pharmacology.' It has no source file, so it produces a stem and an answer key entirely from the model's training data. The second is the document-grounded generator: you upload a real file (your professor's slide deck, a First Aid chapter, a textbook PDF), and it produces questions whose answer keys are checked against that file. Studyly is the second kind. The practical difference is whether you can open the document the answer came from when you get a question wrong.

Are the ChatGPT or Claude 'Step 1 question generator' prompts safe to study from?

They are useful for warm-up volume and dangerous for high-stakes recall. An ungrounded prompt cannot tell you when it is wrong. It will write a fluent, NBME-shaped vignette with a confidently labelled correct answer, and that answer is whatever the model emitted, not something checked against a reference. For Step 1, where you are trying to lock in facts you will be tested on once, studying from a question whose key you cannot verify means you can memorize an error and never find out. If you use an ungrounded prompt, treat every answer as a claim to confirm against First Aid or UWorld, not as ground truth.

I have finished UWorld and AMBOSS. Can a generator give me more Step 1 questions?

Yes, and this is one of the strongest honest use cases for a document-grounded generator. UWorld and AMBOSS are finite. Once you have seen every item once or twice, the marginal value of a third pass drops because you are recognizing items rather than reasoning through them. A grounded generator does not solve that with a bigger generic bank; it solves it by making questions from material the commercial banks never touched: your school's lecture decks, your own annotated notes, a First Aid section you keep missing. The questions are new because the source is yours, and the answer keys trace back to a document you can open.

My school's block exams do not match UWorld. Can I generate questions from my lecture slides?

That is the exact gap a lecture-grounded generator fills. UWorld is written against the published Step 1 content outline, not against what your professor emphasized this block. Upload the PowerPoint or PDF for a lecture and Studyly converts it into roughly 60 to 80 questions in about 60 seconds, anchored to specific slide numbers. When you miss one, the explain panel points you back to the slide you originally read. Use UWorld for the field-wide blueprint; use a grounded generator on your own decks for what your school will actually put on the block exam.

Can I turn a First Aid chapter or a textbook PDF into Step 1 questions?

Yes. A document-grounded generator does not care whether the source is a slide deck, a First Aid chapter photographed page by page, a textbook chapter PDF, a typed set of notes, or a YouTube lecture. It cares that there is a testable fact in the source to anchor a question to. The shorter the source, the fewer questions per upload, but the loop is identical: upload, get questions in about a minute, drill in short sessions, and trace any miss back to the page it came from.

Does a generated question count as 'new format' Step 1 practice?

The item content is the same. Effective May 14, 2026, Step 1 is fourteen 30-minute blocks of 20 questions each instead of seven 60-minute blocks of 40 (280 items and seven hours of question time either way), but a clinical vignette is still a clinical vignette. A generated question drills the same cognitive task: read the stem, recover the fact, pick the option. What a generator does not replicate is the timed-block container, so use a generator for between-block drilling and content coverage, and run real 20-item, 30-minute timed blocks in a commercial QBank to rehearse the format itself.

How is a document-grounded generator different from a question bank like UWorld?

A question bank is a fixed library of items written and reviewed by full-time editors. Its strength is item quality: calibrated distractors, physician-written explanations, a known blueprint. Its limit is that it is the same library for every student in the country. A grounded generator is not a library; it is a function that turns whatever document you give it into questions on demand. Its strength is coverage of your specific material and unlimited fresh items. Its limit is that the output is machine-written, so quality depends on the grounding and the rubric behind it. The two are complements, not substitutes.

What stops a generated Step 1 question from having a wrong answer key?

Grounding plus a rubric. Grounding means the answer is verified against the source document, in Studyly's words 'against the actual PDF / slide content, not the model's pretrained knowledge.' The rubric means every generated item is scored on four criteria before it reaches you: factual correctness, stem clarity, distractor quality, and question-type coverage. That rubric is what produced the 81.3 score on a held-out three-document eval, ahead of Unattle at 78.0, Gauntlet at 68.0, and Turbolearn at 57.8. An ungrounded prompt has neither a source to check against nor a rubric gate, which is why its keys cannot be trusted.

Related on this site: USMLE vignette drill from your lectures (the slide-to-clinical-stem transformation behind the questions), and the May 14, 2026 Step 1 format change (how the fourteen 30-minute blocks change how you should drill).
