
AI Graded a 21-Page Test Script in Under 2 Minutes: FirstMarker Phase 1 Results from a Barbados Primary School

FirstMarker Team · February 23, 2026

Tags: AI grading, AI marking software, AI in education, teacher productivity tools, exam marking software, primary school assessment, BSSSE preparation, Barbados education

FirstMarker Phase 1 Results

Real-World Classroom Trial – Barbados Primary School

Over the past week, I conducted Phase 1 testing of FirstMarker at a primary school in Barbados.

The objective was simple:

Can AI meaningfully reduce grading time while maintaining acceptable accuracy in a real classroom setting?

This was not a lab simulation.
This was a live test using actual student scripts.


📍 Context

  • Location: Primary school in Barbados
  • Subject: Language Arts
  • Script length: 21 pages
  • Format: Booklet-style test paper
  • Validation method: Multiple AI grading passes for consistency

This represents the first structured validation phase of FirstMarker in a school environment.


📱 Scanning Workflow: 21 Pages in Under 2 Minutes

Before grading begins, scripts must be digitized.

For this trial, scripts were scanned using vFlat, a mobile document scanning app optimized for books and bound materials.

Using vFlat:

  • A 21-page test booklet was scanned in under 2 minutes
  • Pages were automatically flattened and cropped
  • Bound script curvature was corrected
  • Image clarity was sufficient for AI analysis

This scanning speed makes high-volume grading feasible.


โฑ Grading Speed: Under 2 Minutes

Once scanned, FirstMarker processed the full 21-page script in under 2 minutes.

For comparison:

  • Manual grading typically takes 15–25 minutes per script
  • Entering marks into spreadsheets adds additional time
  • Fatigue can reduce grading consistency over large batches

FirstMarker completed:

  • Text extraction
  • Rubric comparison
  • Score calculation
  • Structured feedback generation

All in under 120 seconds.


🧠 Detailed Feedback for Every Script

Each script received more than a score.

FirstMarker generated:

  • Structured strengths
  • Areas for improvement
  • Question-level feedback
  • Clear breakdown aligned to the rubric

This is important.

The system does not only calculate marks: it produces actionable feedback for students.

That significantly increases its value during revision periods.
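
As an illustration of what such structured feedback might look like as data, here is a minimal Python sketch. The class names, field names, and sample comments are hypothetical and chosen only to mirror the four bullet points above; they are not FirstMarker's actual output format.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionFeedback:
    """Feedback for one question (hypothetical structure)."""
    question: str
    marks_awarded: int
    marks_available: int
    comment: str


@dataclass
class ScriptFeedback:
    """Feedback for one whole script (hypothetical structure)."""
    script_id: str
    strengths: list[str] = field(default_factory=list)
    areas_for_improvement: list[str] = field(default_factory=list)
    questions: list[QuestionFeedback] = field(default_factory=list)

    @property
    def total_awarded(self) -> int:
        # Sum of the marks awarded across all questions.
        return sum(q.marks_awarded for q in self.questions)

    @property
    def total_available(self) -> int:
        # Sum of the marks available across all questions.
        return sum(q.marks_available for q in self.questions)


# Made-up example content, for illustration only.
feedback = ScriptFeedback(
    script_id="script_A",
    strengths=["Clear topic sentences"],
    areas_for_improvement=["Check subject-verb agreement"],
    questions=[
        QuestionFeedback("Q1", 4, 5, "Main idea identified; one detail missing."),
        QuestionFeedback("Q2", 3, 5, "Answer is correct but not in full sentences."),
    ],
)
print(f"{feedback.script_id}: {feedback.total_awarded}/{feedback.total_available}")
```

Keeping question-level items separate from the overall strengths and weaknesses lets the total be derived from the breakdown, so the score and the feedback cannot drift apart.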


📊 Consistency Results (Language Arts – No Diagrams)

The same scripts were graded multiple times to test reliability.

Findings:

  • Most results were identical across runs
  • Some varied by ±1–2 marks
  • Larger variations were rare

This suggests:

  • Stable rubric interpretation
  • Consistent scoring logic
  • Minor variation primarily linked to handwriting recognition

For text-heavy Language Arts papers without diagrams, internal consistency was strong.
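
A repeated-pass check of this kind can be summarised with a few lines of Python. The sketch below is one possible way to do it, not the procedure used in the trial: the function name, the use of max-minus-min spread as the variation measure, the 2-mark threshold, and the sample marks are all assumptions made for illustration.

```python
def consistency_summary(runs_by_script, tolerance=2):
    """Summarise how much each script's mark varies across grading passes.

    runs_by_script maps a script ID to the list of marks it received,
    one mark per grading pass. The spread of a script is its highest
    mark minus its lowest mark.
    """
    spreads = {sid: max(marks) - min(marks) for sid, marks in runs_by_script.items()}
    n = len(spreads)
    return {
        # Share of scripts that received the same mark on every pass.
        "identical_rate": sum(s == 0 for s in spreads.values()) / n,
        # Share of scripts whose spread is within the tolerance.
        "within_tolerance_rate": sum(s <= tolerance for s in spreads.values()) / n,
        "max_spread": max(spreads.values()),
        # Scripts whose marks varied enough to deserve a teacher's look.
        "flag_for_review": sorted(sid for sid, s in spreads.items() if s > tolerance),
    }


# Made-up marks for illustration only; not data from the trial.
example_runs = {
    "script_A": [41, 41, 41],
    "script_B": [37, 38, 37],
    "script_C": [29, 33, 30],
}
summary = consistency_summary(example_runs)
print(summary)
```

In this made-up example, script_C would be flagged because its marks span 4 points across passes, which is the kind of case a teacher would want to re-check by hand.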


✍️ Limitation Identified: Poor Handwriting

The primary weakness observed was poor handwriting.

When handwriting was:

  • Clear → scores were highly consistent
  • Moderately messy → minor variation
  • Very difficult to read → occasional misinterpretation

This is not a grading logic issue.

It is a vision/OCR limitation.

Language Arts responses are sensitive to word-level interpretation.
A single misread word can affect scoring.

This limitation must be transparently acknowledged.


⚖️ Important Positioning

FirstMarker should not be used as the final authority for official examination marks.

Instead, it is best positioned as:

  • A first-pass grading assistant
  • A consistency checker
  • A moderation support tool
  • A formative assessment accelerator

The teacher remains the final decision-maker.

Professional judgement is preserved.


🎯 Best Use Case: Exam Preparation (e.g., BSSSE)

FirstMarker is particularly well suited for:

  • Exam preparation practice (e.g., BSSSE preparation)
  • Mock examinations
  • Large revision batches
  • Rapid feedback cycles

During exam season, teachers may mark hundreds of pages in a short timeframe.
Reducing first-pass grading time from about 20 minutes to under 2 minutes per script, before scanning and teacher review, is a substantial saving.
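
To put the per-script figures into batch terms, here is a small worked calculation in Python. The batch size of 30 scripts is a hypothetical class size, the 20-minute manual figure is the midpoint of the 15–25 minute range quoted earlier, and the 2-minute scanning and grading figures are the "under 2 minutes" upper bounds from this trial. Teacher review time is not included, so treat the result as a rough estimate only.

```python
def batch_minutes(n_scripts, grading_min_per_script, scanning_min_per_script=0.0):
    """Total minutes to process a batch, given per-script times in minutes."""
    return n_scripts * (grading_min_per_script + scanning_min_per_script)


N_SCRIPTS = 30  # hypothetical batch size, not a figure from the trial

# Manual marking: about 20 minutes per script, no scanning step.
manual = batch_minutes(N_SCRIPTS, grading_min_per_script=20)

# AI-assisted first pass: up to 2 minutes scanning plus up to 2 minutes grading.
assisted = batch_minutes(N_SCRIPTS, grading_min_per_script=2, scanning_min_per_script=2)

saved = manual - assisted
print(f"Manual: {manual:.0f} min, assisted: {assisted:.0f} min, saved: {saved:.0f} min")
```

Under these assumptions the batch drops from 600 minutes to at most 120 minutes of processing, leaving the remaining time for the teacher's own review of the first-pass marks.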


📌 Phase 1 Summary

  • ✅ Conducted at a primary school in Barbados
  • ✅ 21 pages scanned in under 2 minutes
  • ✅ 21 pages graded in under 2 minutes
  • ✅ Detailed feedback returned for every script
  • ✅ Strong internal consistency for text-based Language Arts
  • ⚠️ Performance impacted by poor handwriting
  • ❌ Not intended to replace final teacher judgement

What Comes Next

Phase 1 focused on speed and internal consistency.

Next phases will include:

  • Direct comparison against teacher-assigned marks
  • Accuracy measurement (mean absolute error, ±1 tolerance rates)
  • Expanded subject testing
  • Handwriting robustness improvements
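
For readers unfamiliar with the two accuracy measures named above, the following Python sketch shows how they could be computed once teacher-assigned marks are available. The function name and the sample marks are made up for illustration; no teacher comparison data exists yet from Phase 1.

```python
def accuracy_metrics(ai_marks, teacher_marks, tolerance=1):
    """Compare AI marks with teacher marks for the same scripts.

    Returns the mean absolute error (average size of the gap, in marks)
    and the share of scripts where the AI mark is within `tolerance`
    marks of the teacher's mark.
    """
    if len(ai_marks) != len(teacher_marks) or not ai_marks:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(a - t) for a, t in zip(ai_marks, teacher_marks)]
    return {
        "mean_absolute_error": sum(errors) / len(errors),
        "within_tolerance_rate": sum(e <= tolerance for e in errors) / len(errors),
    }


# Made-up marks for illustration only; not data from the trial.
ai = [41, 37, 30, 25]
teacher = [40, 37, 33, 25]
metrics = accuracy_metrics(ai, teacher)
print(metrics)
```

In this made-up example the absolute gaps are 1, 0, 3 and 0 marks, giving a mean absolute error of 1.0 mark and a ±1 tolerance rate of 75%. Reporting both is useful because a low average error can still hide a few scripts with large gaps.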

Final Reflection

FirstMarker is not about replacing teachers.

It is about reducing administrative burden so teachers can focus on instruction, intervention, and student growth.

Phase 1 demonstrates that AI-assisted grading can be:

  • Fast
  • Structured
  • Consistent
  • Practically deployable

Further validation is ongoing.

More results soon.