
AI Graded a 21-Page Test Script in Under 2 Minutes: FirstMarker Phase 1 Results from a Barbados Primary School
FirstMarker Team · February 23, 2026
FirstMarker Phase 1 Results
Real-World Classroom Trial: Barbados Primary School
Over the past week, I conducted Phase 1 testing of FirstMarker at a primary school in Barbados.
The objective was simple:
Can AI meaningfully reduce grading time while maintaining acceptable accuracy in a real classroom setting?
This was not a lab simulation.
This was a live test using actual student scripts.
Context
- Location: Primary school in Barbados
- Subject: Language Arts
- Script length: 21 pages
- Format: Booklet-style test paper
- Validation method: Multiple AI grading passes for consistency
This represents the first structured validation phase of FirstMarker in a school environment.
Scanning Workflow: 21 Pages in Under 2 Minutes
Before grading begins, scripts must be digitized.
For this trial, scripts were scanned using vFlat, a mobile document scanning app optimized for books and bound materials.
Using vFlat:
- A 21-page test booklet was scanned in under 2 minutes
- Pages were automatically flattened and cropped
- Bound script curvature was corrected
- Image clarity was sufficient for AI analysis
This scanning speed makes high-volume grading feasible.
Grading Speed: Under 2 Minutes
Once scanned, FirstMarker processed the full 21-page script in under 2 minutes.
For comparison:
- Manual grading typically takes 15–25 minutes per script
- Entering marks into spreadsheets adds additional time
- Fatigue can reduce grading consistency over large batches
FirstMarker completed:
- Text extraction
- Rubric comparison
- Score calculation
- Structured feedback generation
All in under 120 seconds.
Detailed Feedback for Every Script
Each script did not simply receive a score.
FirstMarker generated:
- Structured strengths
- Areas for improvement
- Question-level feedback
- Clear breakdown aligned to the rubric
This is important.
The system does more than calculate marks: it produces actionable feedback for students.
That significantly increases its value during revision periods.
Consistency Results (Language Arts, No Diagrams)
The same scripts were graded multiple times to test reliability.
Findings:
- Most results were identical across runs
- Some varied by ±1–2 marks
- Larger variations were rare
This suggests:
- Stable rubric interpretation
- Consistent scoring logic
- Minor variation primarily linked to handwriting recognition
For text-heavy Language Arts papers without diagrams, internal consistency was strong.
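One way to put numbers on this kind of run-to-run check is to record the total mark from each grading pass and summarise the spread per script. The sketch below is illustrative only: the function name, the script IDs, and the marks are hypothetical and are not data from the trial, and FirstMarker's own consistency check may be implemented differently.

```python
def consistency_summary(runs_by_script, tolerance=2):
    """Summarise agreement between repeated grading passes.

    runs_by_script maps a script ID to the list of total marks
    returned by each grading pass on that script.
    """
    if not runs_by_script:
        raise ValueError("need at least one script")
    # Spread = highest mark minus lowest mark across passes on one script.
    spreads = {sid: max(marks) - min(marks) for sid, marks in runs_by_script.items()}
    n = len(spreads)
    identical = sum(1 for s in spreads.values() if s == 0)
    within = sum(1 for s in spreads.values() if s <= tolerance)
    return {
        "scripts": n,
        "identical_rate": identical / n,
        "within_tolerance_rate": within / n,
        "max_spread": max(spreads.values()),
    }


# Hypothetical marks for illustration; NOT results from the Barbados trial.
example_runs = {
    "script_01": [42, 42, 42],
    "script_02": [37, 38, 37],
    "script_03": [29, 29, 31],
    "script_04": [45, 45, 45],
}
summary = consistency_summary(example_runs)
print(summary)
```

Reporting the largest spread alongside the two rates keeps rare large variations visible instead of letting them disappear into an average.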
Limitation Identified: Poor Handwriting
The primary weakness observed was poor handwriting.
When handwriting was:
- Clear: scores were highly consistent
- Moderately messy: minor variation
- Very difficult to read: occasional misinterpretation
This is not a grading logic issue.
It is a vision/OCR limitation.
Language Arts responses are sensitive to word-level interpretation.
A single misread word can affect scoring.
This limitation must be transparently acknowledged.
Important Positioning
FirstMarker should not be used as the final authority for official examination marks.
Instead, it is best positioned as:
- A first-pass grading assistant
- A consistency checker
- A moderation support tool
- A formative assessment accelerator
The teacher remains the final decision-maker.
Professional judgement is preserved.
Best Use Case: Exam Preparation (e.g., BSSSE)
FirstMarker is particularly well suited for:
- Exam preparation practice (e.g., BSSSE preparation)
- Mock examinations
- Large revision batches
- Rapid feedback cycles
During exam season, teachers may correct hundreds of pages in a short timeframe.
Cutting grading time from roughly 20 minutes to under 2 minutes per script (plus under 2 minutes for scanning) is transformational.
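As a rough back-of-the-envelope check, the per-script figures reported above can be scaled to a batch. The sketch below assumes a hypothetical batch of 30 scripts, uses 20 minutes as the manual baseline, and treats the "under 2 minutes" scan and grade times as upper bounds of 2 minutes each. It leaves out the time a teacher spends reviewing the AI output, which Phase 1 did not measure.

```python
def batch_minutes(scripts, manual_min=20.0, scan_min=2.0, grade_min=2.0):
    """Return (manual_total, assisted_total, saved) in minutes for a batch.

    manual_min is the assumed manual grading time per script;
    scan_min and grade_min are upper bounds for the assisted workflow.
    Teacher review time is deliberately not included.
    """
    manual_total = scripts * manual_min
    assisted_total = scripts * (scan_min + grade_min)
    return manual_total, assisted_total, manual_total - assisted_total


# Hypothetical batch size of 30 scripts, chosen for illustration only.
manual, assisted, saved = batch_minutes(30)
print(f"manual: {manual:.0f} min, assisted: {assisted:.0f} min, saved: {saved:.0f} min")
```

Under these assumptions the batch drops from 600 minutes to at most 120 minutes of scanning and processing, before any teacher review is added back.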
Phase 1 Summary
- Conducted at a primary school in Barbados
- 21 pages scanned in under 2 minutes
- 21 pages graded in under 2 minutes
- Detailed feedback returned for every script
- Strong internal consistency for text-based Language Arts
- Performance impacted by poor handwriting
- Not intended to replace final teacher judgement
What Comes Next
Phase 1 focused on speed and internal consistency.
Next phases will include:
- Direct comparison against teacher-assigned marks
- Accuracy measurement (mean absolute error, ±1 tolerance rates)
- Expanded subject testing
- Handwriting robustness improvements
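For readers unfamiliar with the two accuracy measures named above, here is a minimal sketch of how they could be computed once teacher-assigned marks are available. Mean absolute error is the average size of the gap between the AI mark and the teacher mark, and the ±1 tolerance rate is the share of scripts where that gap is at most one mark. The function name and the marks are hypothetical; no teacher comparison was run in Phase 1.

```python
def accuracy_metrics(ai_marks, teacher_marks, tolerance=1):
    """Return (mean_absolute_error, within_tolerance_rate) for paired marks."""
    if len(ai_marks) != len(teacher_marks) or not ai_marks:
        raise ValueError("need two non-empty mark lists of equal length")
    # Absolute gap between AI and teacher mark for each script.
    errors = [abs(a - t) for a, t in zip(ai_marks, teacher_marks)]
    mae = sum(errors) / len(errors)
    within = sum(1 for e in errors if e <= tolerance) / len(errors)
    return mae, within


# Hypothetical marks for illustration; NOT measured results.
ai = [42, 37, 30, 45, 18]
teacher = [42, 38, 33, 45, 18]
mae, within_one = accuracy_metrics(ai, teacher)
print(f"MAE: {mae:.2f} marks, within ±1: {within_one:.0%}")
```

The two numbers answer different questions: the mean absolute error shows the typical size of a disagreement, while the tolerance rate shows how often the AI mark lands close enough to be usable as a first pass.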
Final Reflection
FirstMarker is not about replacing teachers.
It is about reducing administrative burden so teachers can focus on instruction, intervention, and student growth.
Phase 1 demonstrates that AI-assisted grading can be:
- Fast
- Structured
- Consistent
- Practically deployable
Further validation is ongoing.
More results soon.