AI Graded a 21-Page Test Script in Under 2 Minutes: FirstMarker Phase 1 Results from a Barbados Primary School

FirstMarker Team · February 23, 2026

Tags: AI grading, AI marking software, AI in education, teacher productivity tools, exam marking software, primary school assessment, BSSSE preparation, Barbados education

FirstMarker Phase 1 Results

Real-World Classroom Trial – Barbados Primary School

Over the past week, I conducted Phase 1 testing of FirstMarker at a primary school in Barbados.

The objective was simple:

Can AI meaningfully reduce grading time while maintaining acceptable accuracy in a real classroom setting?

This was not a lab simulation.
This was a live test using actual student scripts.

📍 Context

Location: Primary school in Barbados
Subject: Language Arts
Script length: 21 pages
Format: Booklet-style test paper
Validation method: Multiple AI grading passes for consistency

This represents the first structured validation phase of FirstMarker in a school environment.

📱 Scanning Workflow: 21 Pages in Under 2 Minutes

Before grading begins, scripts must be digitized.

For this trial, scripts were scanned using vFlat, a mobile document scanning app optimized for books and bound materials.

Using vFlat:

A 21-page test booklet was scanned in under 2 minutes
Pages were automatically flattened and cropped
Bound script curvature was corrected
Image clarity was sufficient for AI analysis

At under 2 minutes per booklet, scanning is unlikely to be the bottleneck when grading a full class set of scripts.

⏱ Grading Speed: Under 2 Minutes

Once scanned, FirstMarker processed the full 21-page script in under 2 minutes.

For comparison:

Manual grading typically takes 15–25 minutes per script
Entering marks into spreadsheets adds additional time
Fatigue can reduce grading consistency over large batches

FirstMarker completed:

Text extraction
Rubric comparison
Score calculation
Structured feedback generation

All in under 120 seconds.
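To put the per-script figures in batch terms, the arithmetic can be sketched as below. This is a rough estimate only: the class size of 30 is a hypothetical value, not a figure from the trial, and the calculation covers the grading step alone, leaving out scanning and teacher review.

```python
def batch_minutes(scripts: int, minutes_per_script: float) -> float:
    """Total grading time for a batch of scripts, in minutes."""
    return scripts * minutes_per_script


CLASS_SIZE = 30  # hypothetical class size, NOT a figure from the trial

# Manual range (15 to 25 minutes per script) as quoted in the comparison above.
manual_low = batch_minutes(CLASS_SIZE, 15)
manual_high = batch_minutes(CLASS_SIZE, 25)

# "Under 2 minutes" is treated as an upper bound of 2 minutes per script.
ai_upper = batch_minutes(CLASS_SIZE, 2)

print(f"Manual: {manual_low / 60:.1f} to {manual_high / 60:.1f} hours")
print(f"FirstMarker (upper bound): {ai_upper / 60:.1f} hours")
```

Under these assumptions, a class set that would take 7.5 to 12.5 hours to grade by hand would take at most about 1 hour of processing time.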

🧠 Detailed Feedback for Every Script

Each script received more than a score.

FirstMarker generated:

Structured strengths
Areas for improvement
Question-level feedback
Clear breakdown aligned to the rubric

This matters because the system does more than calculate marks: it produces actionable feedback for students.

That makes it considerably more useful during revision periods.

📊 Consistency Results (Language Arts – No Diagrams)

The same scripts were graded multiple times to test reliability.

Findings:

Most results were identical across runs
Some varied by ±1–2 marks
Larger variations were rare

This suggests:

Stable rubric interpretation
Consistent scoring logic
Minor variation primarily linked to handwriting recognition

For text-heavy Language Arts papers without diagrams, internal consistency was strong.
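One way to quantify this kind of run-to-run consistency is sketched below. The function name, the summary fields, and the marks are illustrative assumptions: the marks are made up and are not trial data. "Spread" here means the highest mark minus the lowest mark a script received across repeated passes.

```python
from typing import Dict, List


def consistency_summary(runs: Dict[str, List[int]]) -> Dict[str, float]:
    """Summarise how much repeated grading passes disagree, per script."""
    if not runs:
        raise ValueError("at least one script is required")
    # Spread = highest mark minus lowest mark across passes for one script.
    spreads = [max(scores) - min(scores) for scores in runs.values()]
    total = len(spreads)
    return {
        "identical_rate": sum(s == 0 for s in spreads) / total,
        "within_2_rate": sum(s <= 2 for s in spreads) / total,
        "max_spread": float(max(spreads)),
    }


# Made-up marks for illustration only; these are NOT trial data.
example_runs = {
    "script_a": [42, 42, 42],
    "script_b": [37, 38, 37],
    "script_c": [29, 29, 31],
    "script_d": [45, 45, 45],
}

summary = consistency_summary(example_runs)
print(summary)
```

Reporting the identical rate and the within-2 rate side by side keeps the three findings above (identical, small variation, rare larger variation) separately visible instead of folding them into one average.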

✍️ Limitation Identified: Poor Handwriting

The primary weakness observed was poor handwriting.

When handwriting was:

Clear → scores were highly consistent
Moderately messy → minor variation
Very difficult to read → occasional misinterpretation

This appears to be a vision/OCR limitation rather than a grading logic issue.

Language Arts responses are sensitive to word-level interpretation.
A single misread word can affect scoring.

This limitation must be transparently acknowledged.

⚖️ Important Positioning

FirstMarker should not be used as the final authority for official examination marks.

Instead, it is best positioned as:

A first-pass grading assistant
A consistency checker
A moderation support tool
A formative assessment accelerator

The teacher remains the final decision-maker.

Professional judgement is preserved.
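As one possible way to support that workflow, repeated passes could be used to prioritise which scripts the teacher reads most closely. The sketch below is an assumption about how such a triage step might look, not a description of FirstMarker's implementation; the threshold of 2 marks and the sample marks are hypothetical. Every script would still go to the teacher for sign-off.

```python
from typing import Dict, List

REVIEW_THRESHOLD = 2  # hypothetical cut-off in marks; would need tuning per paper


def needs_closer_review(scores: List[int], threshold: int = REVIEW_THRESHOLD) -> bool:
    """True when repeated passes disagree by more than `threshold` marks."""
    return max(scores) - min(scores) > threshold


def triage(runs: Dict[str, List[int]]) -> List[str]:
    """Return the IDs of scripts whose passes disagree most, sorted by ID."""
    return sorted(sid for sid, scores in runs.items() if needs_closer_review(scores))


# Made-up marks for illustration only; these are NOT trial data.
runs = {
    "script_a": [42, 42, 42],  # identical across passes
    "script_b": [30, 34, 31],  # spread of 4, e.g. hard-to-read handwriting
    "script_c": [38, 39, 38],  # spread of 1
}

flagged = triage(runs)
print(flagged)
```

A large spread does not say which pass was right, only that the passes disagreed, so the flag is a prompt for human attention and not a correction.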

🎯 Best Use Case: Exam Preparation (e.g., BSSSE)

FirstMarker is particularly well suited for:

Exam preparation practice (e.g., BSSSE preparation)
Mock examinations
Large revision batches
Rapid feedback cycles

During exam season, teachers may mark hundreds of pages in a short timeframe.
Cutting grading time from roughly 20 minutes to under 2 minutes per script would be about a tenfold reduction.

📌 Phase 1 Summary

✅ Conducted at a primary school in Barbados
✅ 21 pages scanned in under 2 minutes
✅ 21 pages graded in under 2 minutes
✅ Detailed feedback returned for every script
✅ Strong internal consistency for text-based Language Arts
⚠️ Performance impacted by poor handwriting
❌ Not intended to replace final teacher judgement

What Comes Next

Phase 1 focused on speed and internal consistency.

Next phases will include:

Direct comparison against teacher-assigned marks
Accuracy measurement (mean absolute error, ±1 tolerance rates)
Expanded subject testing
Handwriting robustness improvements
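The two accuracy measures named above can be computed as in the following sketch. The function name and the sample marks are illustrative assumptions; no teacher comparison data has been collected yet, so the numbers are made up.

```python
from typing import List, Tuple


def accuracy_metrics(
    ai_marks: List[float], teacher_marks: List[float], tolerance: float = 1
) -> Tuple[float, float]:
    """Return (mean absolute error, share of scripts within +/- tolerance)."""
    if not ai_marks or len(ai_marks) != len(teacher_marks):
        raise ValueError("mark lists must be non-empty and the same length")
    errors = [abs(a - t) for a, t in zip(ai_marks, teacher_marks)]
    mae = sum(errors) / len(errors)
    within = sum(e <= tolerance for e in errors) / len(errors)
    return mae, within


# Made-up marks for illustration only; these are NOT trial data.
ai = [40, 35, 28, 44]
teacher = [41, 35, 31, 44]

mae, within_1 = accuracy_metrics(ai, teacher)
print(f"MAE: {mae:.2f} marks, within ±1: {within_1:.0%}")
```

Mean absolute error gives the typical size of the gap from the teacher's mark, while the tolerance rate shows how many scripts land close enough to be accepted with little adjustment; one large miss can raise the first without changing the second much.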

Final Reflection

FirstMarker is not about replacing teachers.

It is about reducing administrative burden so teachers can focus on instruction, intervention, and student growth.

Phase 1 demonstrates that AI-assisted grading can be:

Fast
Structured
Consistent
Practically deployable

Further validation is ongoing.

More results soon.