NotebookLM quiz question quality: what the source-grounding fixes, and what it cannot.
NotebookLM's quiz feature is good at one thing and capped on another, and the two get blurred together every time someone reviews it. The facts in the questions are reliable. The format of the questions is not deep. This page separates those two, grades the quiz on the four things that actually decide whether a practice question helps you on exam day, and is honest about where a stronger tool changes the answer and where it does not.
Direct answer · verified May 17, 2026
NotebookLM quiz questions are factually reliable because they are grounded only in the sources you upload, and when you answer wrong, the explanation cites the exact line the keyed answer came from. The ceiling is format: the quiz feature produces multiple-choice questions only, so it tests whether you can recognize a fact among four options, not whether you can apply it inside a scenario. For memorizing terms, that is enough. For an exam that is mostly application, recognition practice leaves the harder half untrained.
Verified against Google's official NotebookLM Help page on generating flashcards and quizzes and the Google announcement of the quiz feature.
What the quiz feature actually generates
NotebookLM added flashcards and quizzes on September 8, 2025, and brought them to the Android and iOS apps in November 2025. Before grading the output it helps to be exact about what the feature does, because most of it is genuinely well built.
What NotebookLM's quiz does well
- Grounds every question only in the sources you upload, not in the model's general knowledge of the topic.
- Cites the source on a wrong answer: click explain and it walks through why the keyed answer is correct, with a pointer back to your material.
- Lets you set difficulty to easy, medium, or hard, set the question count, and type a prompt to focus the quiz on one topic.
- Saves progress, marks cards got it or missed it, and lets you replay only what you missed (added in the April 2026 update).
None of that is filler. Source-grounding is the structural fix for the most common failure in AI question generators: a question whose keyed answer is quietly wrong because the model answered it from a textbook instead of from your slide. NotebookLM closes that gap. The free tier caps you at 10 quizzes per day, which is the limit a student converting a full day of lectures hits first, but on correctness the feature earns its reputation.
Grade it on the four things that decide a practice question
A practice question is good or bad on four axes: factual correctness of the keyed answer, clarity of the stem, plausibility of the distractors, and the mix of question types across the whole set. NotebookLM's quiz does not score evenly across them.
Factual correctness — strong
Because the answer is keyed from your uploaded source and the explanation cites it, the keyed answer matches what you actually study from. This is the dimension you cannot self-grade, and it is the one NotebookLM handles best.
Clarity — strong
Fluent, readable stems are the thing language models are genuinely good at. NotebookLM's questions rarely read as ambiguous or double-barreled.
Distractor quality — mixed
The wrong options are usually on-topic, but on recall-style questions one distractor is often eliminable on length or phrasing alone. A distractor only trains you if it is a mistake you could plausibly make. Setting difficulty to hard helps here; it does not fully close it.
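The "eliminable on length alone" problem can be checked mechanically. The sketch below is one rough way to do it, not anything NotebookLM or Studyly is known to run: the function name `length_outliers` and the `ratio` threshold of 1.8 are assumptions chosen for illustration. It flags any option whose length is far from the median length of the other options.

```python
from statistics import median


def length_outliers(options: list[str], ratio: float = 1.8) -> list[str]:
    """Return options that stand out on length alone.

    An option is flagged when it is more than `ratio` times longer or
    shorter than the median length of the remaining options. The
    threshold is an illustrative assumption, not a published standard.
    """
    flagged = []
    for i, option in enumerate(options):
        other_lengths = [len(o) for j, o in enumerate(options) if j != i]
        typical = median(other_lengths)
        if len(option) > typical * ratio or len(option) * ratio < typical:
            flagged.append(option)
    return flagged


# Options from the heart-chamber question: all similar in length.
balanced = ["Right atrium", "Right ventricle", "Left atrium", "Left ventricle"]

# Hypothetical recall-style question where one option gives itself away.
lopsided = [
    "Mitochondria",
    "Nucleus",
    "Ribosome",
    "The organelle responsible for producing most of the cell's ATP supply",
]

print(length_outliers(balanced))  # nothing flagged
print(length_outliers(lopsided))  # only the long option is flagged
```

Length is only one of the cues a test-taker can exploit, so a clean result here does not mean the distractors are plausible; it only rules out the cheapest giveaway.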
Question-type coverage — weak
This is the real ceiling. The quiz feature produces multiple-choice questions and nothing else. There is no free-response, no clinical-vignette-style case question, no image-occlusion card. Every question is the same cognitive task: recognize the right option among four.
The recall ceiling: recognition is not application
A multiple-choice recognition question and an application question are different tasks. The first hands you the answer and four decoys and asks you to pick. The second hands you a situation and asks you to produce the fact yourself, under conditions the question writer chose. Most exams in memorization-heavy programs lean on the second. The example below tests one fact the first way.
One fact, two kinds of question
Which heart chamber receives oxygenated blood from the pulmonary veins?
A) Right atrium
B) Right ventricle
C) Left atrium
D) Left ventricle
You scan four options, eliminate the two right-side ones, and pick C. You never had to retrieve the fact cold.
- Answer is one of four options already in front of you
- Distractors can often be eliminated without knowing the fact
- Trains recognition, which fades faster than retrieval
NotebookLM's quiz feature can only ever produce the recognition version of that question. Turning difficulty up to hard makes the multiple-choice question harder; it does not turn it into an application question. If your exam is mostly application, a quiz tool that only produces recognition questions is training the easier half of the skill.
Shuffle is not rephrasing
NotebookLM's April 23, 2026 update was a real improvement: progress saving, got it and missed it marking, replay only the missed questions, and card shuffling. Shuffle reorders the cards so you cannot memorize that the third question's answer is B. That defeats positional pattern-matching, and it is worth having.
What shuffle does not touch is the wording of the question itself. The stem on retake three is the same sentence it was on retake one. After two or three passes you stop answering from the anatomy and start answering from the phrasing: the question that opens with “Which heart chamber receives” is the pulmonary-veins one, answer C, and you know that without re-deriving anything. A retake that does not change the stem quietly converts into a memory test for the question, not the material, and that is the moment a quiz stops teaching you.
This is a different problem from a wrong answer key, and it is specific to how you reuse a quiz rather than how it is first generated. It is also the gap that decides whether the same upload is worth drilling fifteen times or only once.
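The difference between shuffling and rephrasing is easy to see in a small model. The sketch below is not NotebookLM's implementation: the `Card` type, the `shuffled_retake` function, and the two extra cards are hypothetical, written only to show that reordering a deck leaves every stem byte-for-byte identical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    stem: str
    options: tuple[str, ...]
    answer: str


def shuffled_retake(deck: list[Card], seed: int) -> list[Card]:
    """Return a reordered copy of the deck; the cards themselves are untouched."""
    rng = random.Random(seed)
    retake = deck[:]
    rng.shuffle(retake)
    return retake


deck = [
    Card(
        "Which heart chamber receives oxygenated blood from the pulmonary veins?",
        ("Right atrium", "Right ventricle", "Left atrium", "Left ventricle"),
        "Left atrium",
    ),
    # Hypothetical extra cards, added only so there is something to shuffle.
    Card(
        "Which heart chamber pumps blood into the aorta?",
        ("Right atrium", "Right ventricle", "Left atrium", "Left ventricle"),
        "Left ventricle",
    ),
    Card(
        "Which vessel carries deoxygenated blood from the heart to the lungs?",
        ("Aorta", "Pulmonary artery", "Pulmonary vein", "Superior vena cava"),
        "Pulmonary artery",
    ),
]

retake = shuffled_retake(deck, seed=3)

# Position may change between attempts, but the wording never does:
# every stem in the retake is exactly a stem from the first attempt.
same_wording = sorted(c.stem for c in retake) == sorted(c.stem for c in deck)
print(same_wording)  # True
```

Whatever the order, the opening words of each stem still identify its answer, which is the cue the retake ends up rewarding.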
Where Studyly differs, and where it does not
NotebookLM and Studyly agree on the part that matters most: questions grounded only in your own material. The differences are in format coverage and how a quiz behaves on retake.
| Feature | NotebookLM quiz | Studyly |
|---|---|---|
| Questions grounded only in your uploaded material | Yes, quizzes draw only from your sources | Yes, generated against the exact slide deck, PDF, or lecture you upload |
| Wrong-answer explanation cites the source | Yes, click explain for a cited overview | Yes, explain-my-mistake links the original slide or PDF page |
| Question formats from one source | Multiple-choice only | MCQ, free-response, case-style, and image-occlusion flashcards |
| Question stem on retake | Card shuffle only; the stem wording stays the same | Stem auto-rephrased on revisit so you cannot pattern-match the wording |
| Spaced review and habit | Progress saved, replay missed cards (April 2026) | Spaced-repetition scheduling built in, one tree per deck for the daily habit |
| Published question-quality score | No public eval | 81.3 on a held-out three-document eval |
NotebookLM's quiz feature shipped September 8, 2025; details reflect its April 23, 2026 update. Studyly's 81.3 is an internal eval run by Jungle, the company behind Studyly, not an independent audit. NotebookLM was not a tool in that eval, so no comparable score exists for it.
“81.3: Studyly's score on a held-out three-document eval graded on factual correctness, clarity, distractor quality, and question-type coverage. Unattle scored 78.0, Gauntlet 68.0, Turbolearn 57.8, field average 67.9.”
Internal eval run by Jungle, the company behind Studyly. Methodology and per-criterion scores at studyly.io/quality.
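One detail worth checking when reading those numbers is what "field average" covers. The arithmetic below uses only the four scores quoted above. It suggests the 67.9 figure is the mean of the three other tools with Studyly excluded, but that is an inference from the published numbers, not a statement of the eval's methodology; if the eval included tools not listed here, the match could be coincidental.

```python
# Scores as quoted in the eval summary above.
scores = {
    "Studyly": 81.3,
    "Unattle": 78.0,
    "Gauntlet": 68.0,
    "Turbolearn": 57.8,
}

# Mean of the three other tools, leaving Studyly out.
others = [score for tool, score in scores.items() if tool != "Studyly"]
mean_without_studyly = sum(others) / len(others)

# Mean of all four tools, Studyly included.
mean_all_four = sum(scores.values()) / len(scores)

print(round(mean_without_studyly, 1))  # 67.9, matching the quoted field average
print(mean_all_four)                   # about 71.3, which does not match
```

Read this way, Studyly's 81.3 sits about 13.4 points above the average of the other three tools, and about 3.3 points above the next-highest score.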
Read that table fairly. NotebookLM is a strong free quiz tool, and for memorizing terms from a clean set of sources it does the job. If your exam is mostly application, you want case-style questions; if you plan to drill the same deck repeatedly, you want a stem that changes so the retake stays a real test. Those are the two places the answer changes.
Free tier on app.jungleai.com, no credit card. Upload the same deck you would put into NotebookLM and see all four formats come out of it.
Frequently asked
Are NotebookLM's quiz questions accurate?
Mostly yes, and for a specific reason. NotebookLM generates quiz questions grounded only in the sources you upload, not from a model's general knowledge of the topic, and when you get a question wrong it shows an explanation with a citation pointing back to the line in your material. Answering from general knowledge instead of from your material is the single biggest cause of wrong answer keys in AI question generators, and NotebookLM closes that gap. The honest caveat is the one Google states itself: the quiz is only as good as the material you upload. Feed it a sloppy slide and it will key a sloppy answer, faithfully.
What question types does NotebookLM's quiz feature support?
Multiple-choice questions. That is the quiz feature. NotebookLM also has a separate Learning Guide chat mode, added in its April 2026 update, that asks probing open-ended questions in conversation, but that is a chat experience, not a gradeable quiz. If you want a free-response question, a clinical-vignette-style case question, or an image-occlusion card, the quiz feature does not produce them. It produces a stem and four options.
Can you make NotebookLM quiz questions harder?
Yes, within limits. The customization panel lets you set difficulty to easy, medium, or hard, set the question count to fewer, standard, or more, and type a prompt to focus the quiz on a topic. Setting it to hard genuinely produces tougher questions. What difficulty does not change is the format: a hard question is still a harder multiple-choice question. You are turning up the difficulty of recognition, not switching to application.
Does NotebookLM reword quiz questions when you retake them?
No. NotebookLM's April 23, 2026 update added the ability to save progress, mark cards as got it or missed it, replay only the questions you missed, and shuffle card order. Shuffling the order stops you from memorizing a position. It does not change the wording of the question stem. By the second or third retake of the same quiz you can answer from the first few words of the stem rather than from the material, which is the point at which a retake stops teaching you anything.
How many quizzes can you make on NotebookLM's free tier?
The free tier allows 10 quizzes per day. That is generous for casual use and a real constraint if you are converting a full day of lectures and want to drill each deck several times. The paid tiers raise the cap substantially. The per-day quiz count, not the per-quiz question count, is the limit most students hit first.
Is NotebookLM's quiz feature good enough for a memorization-heavy program like med school?
It is good for the memorize-the-term layer: definitions, dates, mechanisms, anything where recognition is the skill. It is weaker for the layer most clinical exams actually test, which is whether you can apply a fact inside a scenario you have not seen. A multiple-choice recognition question and a vignette-style case question are different cognitive tasks, and a quiz feature that only produces the first one leaves the second one untrained. For a program where the exam is mostly application, that gap matters.
Does Studyly's question quality beat NotebookLM's?
On the dimension you can measure, question-type coverage, Studyly produces four formats from one source where NotebookLM's quiz produces one. On overall quality, Studyly scored 81.3 on a held-out three-document eval run by Jungle, the company behind Studyly, graded on factual correctness, clarity, distractor quality, and question-type coverage. NotebookLM was not a tool in that eval, so there is no comparable number for it, and you should read 81.3 as Studyly's own measurement rather than an independent audit. The clearest, checkable difference is format coverage, not a head-to-head score.
Related reading on this site
- AI-generated practice question quality: the part you cannot grade yourself. The wrong-answer-key failure, and why source-grounding is the fix.
- Quiz generator from PDF: what happens on take #2. Why a stem that never changes turns a retake into a memory test.
- Auto-rephrasing practice questions. The mechanism that keeps a reworded question testing the same fact.
- Anki card distractor quality: the five failure modes. What separates a distractor that trains you from one that does not.