Thursday, February 12, 2026

Why Your Tests Passed But Your API Failed - The SerDe Boundary Problem

I recently spent several hours debugging a bug that shouldn't have existed. My test suite? 510 passing tests with solid coverage. My BDD tests? Validating every tool's contract like a bouncer at an exclusive club. Yet when I deployed to production, half my API was broken.

Tests pass - prod is on fire: This is fine

The tests lied. Well, not exactly lied—they just weren't testing what I thought they were testing.

The Bug That Passed All My Tests

I had MCP tools that accepted dataclass parameters:

@dataclass
class AzureDevOpsPRContext:
    organization: str
    project: str
    repository: str
    pr_id: int

@mcp.tool()
async def azure_devops_pr_comment_analysis(
    pr_context: Optional[AzureDevOpsPRContext] = None,
    ...
):
    org = pr_context.organization  # Works in tests, crashes in production

In my unit tests: ✅ All green, baby!
Via Copilot in production: ❌ AttributeError: 'dict' object has no attribute 'organization'

Chef's kiss.

What Was I Actually Testing?

Here's what my "comprehensive" unit test looked like:

# My unit test - so pure, so naive
async def test_pr_comment_analysis():
    context = AzureDevOpsPRContext(
        organization="msazure",
        project="Commerce",
        repository="PaymentsDataPlatform",
        pr_id=14679218
    )
    
    result = await azure_devops_pr_comment_analysis(pr_context=context)
    
    assert result  # Passes! Ship it!

This test validates:

  • ✅ Python function signature is correct
  • ✅ Internal logic works
  • ✅ Return type is valid
  • ✅ I'm a competent developer

This test does NOT validate:

  • ❌ JSON deserialization
  • ❌ The MCP protocol boundary
  • ❌ How clients actually invoke the tool
  • ❌ Literally the deployment path

unit tests: perfect. production: on fire. integration tests: conspicuously absent.

The Brutal Reality of Production

Here's what actually happens when a client calls my API:

Copilot (JSON-RPC client)

Sends: {"pr_context": {"organization": "msazure", ...}}

MCP Protocol Layer (the betrayer)

Deserializes JSON → Python dict

My Beautiful Function

Receives: pr_context = {"organization": "msazure", ...} # Not a dataclass!

Confidently executes: pr_context.organization

💥 AttributeError: 'dict' object has no attribute 'organization'

🔥 Production burns

My unit test completely bypassed this path. It handed my function a fully-formed Python dataclass instance, never once exercising the JSON serialization/deserialization that happens in the real world.

It's like testing a parachute by throwing it on the ground and confirming it's made of fabric.

Wait, What's Actually Being Tested Here?

Let me spell it out with a handy table, because I'm nothing if not a documentation enthusiast:

Test TypeWhat It TestsExercises SerDe?Catches This Bug?
Unit testPython APINoNo
Integration testFull protocolYesYes

My 100% test coverage was testing the wrong layer entirely. I had perfect coverage of my Python semantics while my protocol semantics burned in production.

You Have Yet to Convince Me, Sir!

"But Jack," I hear you saying, "surely you just wrote bad tests?"

REST, gRPC, GraphQL, and MCP all pointing at each other: same SerDe footgun, different protocol

Fair. But here's why this matters: every API that crosses a serialization boundary has this exact footgun waiting for you.

REST? Check.
gRPC? Check.
GraphQL? Check.
MCP? Check.
That custom binary protocol you swore you'd document? Definitely check.

If your tests don't exercise the actual wire protocol, they're testing a fantasy version of your system.

So, Now What?

The fix? Write tests that actually exercise the protocol:

async def test_pr_comment_analysis_via_mcp():
    """Test tool invocation through actual MCP protocol"""
    
    # This goes through the full serialization cycle
    result = await mcp_client.call_tool(
        "azure_devops_pr_comment_analysis",
        {
            "pr_context": {
                "organization": "msazure",
                "project": "Commerce",
                "repository": "PaymentsDataPlatform",
                "pr_id": 14679218
            }
        }
    )
    
    # Now we're testing what clients actually do

This test caught the bug immediately. The dataclass parameter didn't deserialize from JSON to a Python dataclass—it stayed as a dict, exactly as the protocol specified.

The Actual Fix (AKA: How I Should Have Done It)

python

# Before (broken in production, passed all tests)
@mcp.tool()
async def azure_devops_pr_comment_analysis(
    pr_context: Optional[AzureDevOpsPRContext] = None,
    ...
):
    org = pr_context.organization

# After (works everywhere, imagine that)
@mcp.tool()
async def azure_devops_pr_comment_analysis(
    pr_url_or_id: str,
    ...
):
    # Construct dataclass internally from primitive
    context = azure_devops_establish_pr_context(pr_url_or_id)
    org = context.organization

The pattern: Primitives in, complex objects out. Construct your dataclasses internally where you control the object creation, not at the API boundary where JSON deserialization is a wild west of "good luck, cowboy."

Why AI-Assisted Development Makes This Worse

When you hand-write code, you think about the deployment path. You remember that time in 2019 when serialization bit you, and you code defensively.

When AI generates code, it optimizes for passing your tests. It looks at your test suite, sees the pattern, and generates more code that follows that pattern. If your tests don't exercise the actual protocol boundary, AI will happily generate code that works beautifully in tests and catastrophically in production.

They're (not) the same picture meme: code that passes tests and code that works in production

My tests were well-written. My coverage was high. My BDD contracts were solid. My CI was green. And I had a completely broken API in production because none of it tested the actual deployment path.

My Admittedly Biased Conclusion

Here's what I learned the hard way:

  1. Unit tests validate Python semantics, not protocol semantics
  2. Integration tests must exercise the actual client→server path
  3. 100% coverage means nothing if you're testing the wrong layer
  4. SerDe boundaries are where type assumptions go to die
  5. AI-generated code amplifies this problem - it passes your tests without understanding your deployment

If you're building APIs that cross serialization boundaries—and unless you're writing a desktop calculator app, you probably are—write integration tests that actually invoke through the protocol. Your unit tests are lying to you, and they don't even feel bad about it.

Don't ask me how I know.


Friday, January 2, 2026

When AI Tooling Actually Works: Building a Browser Extension in 10 Hours


Yes, I'm writing a positive post about AI-assisted development. I know, I'm as shocked as you are. If you've read my previous posts, you know I'm not exactly bullish on enterprise AI tooling that promises to replace developers. But here's the thing: when you treat AI as a productivity tool rather than a replacement for engineering discipline, some interesting things can happen.

Case in point: I built a fully-tested, TypeScript-based browser extension from concept to Microsoft Edge Extension store submission in roughly 10 hours. And yes, Claude 4.5 (via GitHub Copilot) was instrumental in making that happen.

The Context: A Hiring Manager's Gift

Recently, I had a conversation with a hiring manager who dismissed me because I didn't have TypeScript experience. Never mind the 15+ years of software engineering across multiple languages and platforms - no TypeScript checkbox meant no interview.


So, I decided to fix that gap while solving an actual problem I'd been meaning to address: YouTube's web interface doesn't give you the "Videos" tab when viewing a channel like it does on mobile. Instead, it drops you into the "Home" tab where featured videos are displayed. That may sound just fine to you, but to me, an avid subscriber to many channels, it's rather annoying to have to click to "Videos" to see their latest work everytime. So, I finally scratched that itch.

The Workflow That Actually Worked

Here's what I did differently than the typical "AI will do everything" approach:

1. Started with BDD Test Specifications

I didn't just say "build me an extension." I started with stating my goal and asking for a design doc for me to review. The biggest mistake many make is allowing it to code off a single prompt, and then chasing the AI into seeing your vision. (The same mistake happens in many enterprise organizations with humans at the wheel, too.) I then shared my Behavior-Driven Development testing documentation framework (see my prior post on BDD). This gave the AI:

  • Clear acceptance criteria
  • Expected behaviors
  • Edge cases to consider
  • A quality bar to meet


2. Incremental Iteration

Initial implementation was JavaScript (perfectly appropriate for a browser extension). But then I asked: "Would TypeScript make sense here?"

The AI pivoted seamlessly, rebuilt the solution in TypeScript, and I got hands-on experience with the language that hiring manager wanted. Turns out, learning a new language when you have 15+ years of experience in others isn't actually that hard.

3. Verification of Claims

Here's where it gets interesting. The AI confidently reported 100% code coverage.

I checked. It was 57%.


When I called it out, it didn't argue. It worked to actually achieve 100% coverage. It got to 97% (remember my past discussion of AI laziness?), but asked me if I wanted to push for 100%. I reminded it that BDD is all about specifying what a system will do, and 97% is 3% unspecified. This is the critical difference: I verified the claims. An inexperienced developer might have just believed the initial report and moved on with inadequate test coverage.

4. Real Problem-Solving

Theory met reality when the extension didn't work in the browser. Turns out, there's non-trivial complexity in making TypeScript, Node.js tooling, and pure JavaScript browser requirements all play nicely together.

The AI helped navigate the compatibility shimming, but this required back-and-forth, debugging context, and engineering judgment about which solutions made sense.

What I Brought to the Table

This is the part that matters: the AI was a force multiplier, not a replacement. I brought:

  • Domain Knowledge: Knowing that TypeScript would be valuable, understanding browser extension architecture
  • Quality Standards: Demanding actual 100% coverage, not accepting claims at face value
  • Testing Discipline: Actually running the extension and catching failures
  • DevOps Expertise: Setting up CI/CD with linting, testing, code coverage, and proper branch protection


Without these, the project would have been:

  • Written in a language I didn't care about learning
  • Reported as having test coverage it didn't have
  • Non-functional in actual browser use
  • Lacking any deployment gates or quality controls

The Honest Assessment

What AI Did Well:

  • Rapid scaffolding and boilerplate generation for a browser extension
  • Pivoting between languages without rewriting from scratch
  • Suggesting patterns and approaches I could evaluate
  • Explaining TypeScript concepts as we worked
  • Iterating toward 100% coverage once I insisted on it

Where I Had to Course-Correct:

  • Verifying claimed test coverage vs actual coverage
  • Debugging the runtime environment mismatches
  • Making architectural decisions about structure
  • Determining what "good enough" looked like
  • Setting up CI/CD and branch protection policies


The Takeaway

AI-assisted development works when:

  1. The engineer sets clear requirements (not vague "make me a thing")
  2. Claims are verified (trust, but verify - actually, just verify)
  3. The engineer maintains quality standards (100% means 100%, not 57%)
  4. Real-world testing happens (if it doesn't run, it doesn't ship)
  5. The AI is a tool, not a decision-maker

This isn't "AI replaces developers." This is "experienced engineer uses modern tooling to accelerate development while maintaining professional standards."

And yeah, I learned TypeScript in the process. Turns out it's not mystical knowledge - it's just static typing for JavaScript. If you've done C#, Java, or any other statically-typed language, you'll pick it up in an afternoon.


The Result

  • Fully functional browser extension: ✅
  • 100% test coverage (actual, not claimed): ✅
  • TypeScript experience: ✅
  • CI/CD pipeline with quality gates: ✅
  • Published to Edge Extension store: ✅
  • Time investment: ~10 hours

Not bad for a language I "didn't have experience with."



You can check out the source at github.com/grimlor/youtube-videos-tab-extension. Feel free to judge my TypeScript - I've had it for about 10 hours now.

Why Your Tests Passed But Your API Failed - The SerDe Boundary Problem

I recently spent several hours debugging a bug that  shouldn't have existed . My test suite? 510 passing tests with solid coverage. My B...