Issue #266

Why AI Can Ace the Bar Exam but Can't Read a Clock

The smartest tool you own has a toddler-sized blind spot — and knowing where it lives is the difference between using AI and getting burned by it.

By Jerry Croteau


[Image: AI clock-reading accuracy, 13% vs. humans at 89%]

Show a top AI model a photo of an ordinary wall clock and ask it the time. There's a good chance it gets it wrong.

Not a trick clock. Not Roman numerals or a melting Dalí face. Just a clock. On the ClockBench benchmark, humans read analog clocks with 89% accuracy. The best AI model managed 13%. A University of Edinburgh study presented at a 2025 research conference found that the leading models (GPT-4o, Gemini, Claude) read the clock hands correctly less than a quarter of the time, and botched calendar-date math once in every five tries.

Here's the part that should make you sit up: those exact same models pass the bar exam, draft medical diagnoses doctors agree with, and write production code. They can do the hard thing and fail the easy thing — in the same breath.

The mountain range, not the ladder

We tend to imagine AI climbing a ladder: a little smarter every month, rung by rung, until it passes us. That's the wrong picture.

Wharton professor Ethan Mollick and a team from Harvard, MIT, and BCG gave us a better one: the jagged frontier. AI capability isn't a smooth line. It's a mountain range. Towering, superhuman peaks sit right next to valleys where the model fumbles things a six-year-old handles. Two tasks that look equally hard to you can land on opposite sides of that ridge. The difficulty you perceive tells you almost nothing about whether AI will nail it or face-plant.

[Figure: The jagged frontier of AI. Tasks that feel equally hard to a human land on opposite sides of the ridge. Peaks, where AI is superhuman: bar exam, write code, diagnose. Valleys, where a six-year-old wins: read a clock, count letters, calendar math.]

Why the easy stuff breaks

The failures aren't random, and they're weirdly reassuring once you understand them.

Take the clock. When one writer fed clock images to Claude and watched it fail, she tried something clever: she described where the hands were pointing in words. The model calculated the time instantly, no problem. The reasoning was never broken — the breakdown happened upstream, in how the model sees an image and turns angles into something it can think about. Spatial perception, not intelligence.
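To see how little reasoning is left once the hand positions are known, here is a minimal Python sketch of that last step. It assumes the angles are given in degrees clockwise from 12, and the function name `time_from_hand_angles` is our own illustration, not anything from the writer's experiment or the benchmarks above.

```python
def time_from_hand_angles(hour_deg: float, minute_deg: float) -> str:
    """Turn hand angles (degrees clockwise from 12) into an h:mm reading."""
    # The minute hand sweeps 360 degrees in 60 minutes: 6 degrees per minute.
    minute = round(minute_deg / 6) % 60
    # The hour hand moves 30 degrees per hour plus 0.5 degrees per minute,
    # so remove the minute drift before working out the hour.
    hour = round((hour_deg - minute * 0.5) / 30) % 12
    return f"{12 if hour == 0 else hour}:{minute:02d}"


print(time_from_hand_angles(305, 60))   # 10:10
print(time_from_hand_angles(15, 180))   # 12:30
```

The arithmetic is two divisions. The hard part, for a vision model, is getting trustworthy angles out of the pixels in the first place.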

The famous "how many R's in strawberry" stumble is the same story from a different angle. The model doesn't read letters — it reads chunks of text called tokens, so "counting the letters" is asking it to see something it was never looking at. Not stupidity. A blind spot with a known cause.
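A short Python sketch makes the contrast visible. Ordinary code walks the word one character at a time, so counting is trivial. The token split and the IDs below are made up purely for illustration; real tokenizers split words differently, and nothing here reflects any particular model's vocabulary.

```python
def count_letter(word: str, letter: str) -> int:
    """Count a letter by inspecting each character, ignoring case."""
    return sum(1 for ch in word.lower() if ch == letter.lower())


print(count_letter("strawberry", "r"))  # 3

# Hypothetical token split, for illustration only. Real tokenizers differ.
hypothetical_tokens = ["str", "aw", "berry"]
# Made-up IDs standing in for what a model receives instead of letters.
hypothetical_ids = [1001, 2002, 3003]

# The pieces still spell the word, but the IDs carry no visible letters.
print("".join(hypothetical_tokens), hypothetical_ids)
```

Seen that way, the question "how many R's?" asks the model about characters it only ever meets bundled inside opaque chunks.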

That's the reframe: these aren't signs AI is overhyped. They're a map. Every documented valley is a place you now know to keep a human in the loop — and every peak is a place you can lean in hard.

Where it's genuinely superhuman

And the peaks are real. In BCG's large field experiment, consultants using AI on tasks inside the frontier finished 12% more tasks, 25% faster, with 40% higher quality. Developers using AI coding assistants have posted productivity jumps north of 50%. On knowledge work that lives on the peaks — drafting, summarizing, structured analysis, first-pass code, brainstorming, translation — the gap runs the other way: the machine laps you.

The trap isn't that AI is weak. It's that the peaks are so impressive you start trusting the valleys too.

[Infographic: The Jagged Frontier. The valleys where AI fails and the peaks where it wins, and why.]

How to actually use this

You don't need to memorize a list of what AI can and can't do — the frontier shifts every few months anyway. You need a habit: before you delegate a task, ask which side of the ridge it's on.

A quick rule of thumb: AI is strong when the task is about language, patterns, and synthesis (turn this into that, find the through-line, draft a version). It gets shaky when the task needs precise perception, exact counting, real-world spatial sense, or live-updating truth (read this gauge, count these exactly, what's true right now). When you're not sure, give it the task and a way to show its work — then spot-check the valleys.

To make that concrete for your job, we built an interview-style prompt. It asks about your actual role and the tasks you do each week, then hands you back a personalized map: green-light work to delegate now, red-light work to keep human, and the yellow-light tasks worth testing with a safety net.

Your job	Hand to AI now	Keep human	The surprise
Marketing manager	First-draft copy, A/B variants, repurposing one post into ten	Reading live campaign dashboards off a screenshot	It writes the ad but miscounts the metrics in the chart
Paralegal	Summarizing case law, drafting clauses, spotting inconsistencies	Calculating filing deadlines from a date	Brilliant on the brief, shaky on 'what day is 45 days out'
Nurse / clinician	Drafting patient-education notes, summarizing research	Reading an analog wall clock or a gauge from a photo	Explains the condition expertly, can't reliably read the dial
Software developer	First-pass code, refactors, test scaffolding, docs	Counting exact characters / column alignment by eye	Ships the function, then miscounts the brackets it just wrote
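For valley tasks like the "45 days out" question, one safety net is to do the exact part in ordinary code and let the AI handle the prose around it. This is a minimal Python sketch using the standard `datetime` module. The helper name `days_out` and the start date are our own example, and it only counts calendar days: real filing deadlines depend on court rules about weekends and holidays, which this does not handle.

```python
from datetime import date, timedelta


def days_out(start: date, days: int) -> date:
    """Return the calendar date a given number of days after start."""
    return start + timedelta(days=days)


# Example start date chosen for illustration only.
deadline = days_out(date(2025, 1, 15), 45)
print(deadline.isoformat())  # 2025-03-01
# weekday() counts from Monday = 0, so 5 means Saturday.
print(deadline.weekday())    # 5
```

The result lands on a Saturday, which is exactly the kind of detail a human who knows the filing rules should check.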


The next time AI dazzles you, remember the clock. The most useful thing you can know about a genius is exactly where it's blind — because that's the spot where you are still the expert.