"Well that's interesting"

Written by: Dean Zarras

In prepping for the release of ClearFactr's MCP server, where you'll be able to create models from scratch, edit them, and compute with them using natural language prompts, I tripped up Claude Desktop on something. It's the kind of thing that might otherwise be impossible to find or know about. Blog-post worthy, for sure!

I asked Claude to build me a Black Scholes options pricing model, straight into ClearFactr. That is, not an Excel file that I'd run through ClearFactr's importer, but a model that would simply appear when I hit the refresh button in ClearFactr's Model Browser.

Claude was very thorough, and did a reconciliation of its own work before writing the formulas to the new model. It highlighted the following:


[Screenshot 2026-04-16 at 7.51.38 PM]

Admittedly, I was quite alarmed!

Amongst the thousands of unit tests we have for the "inner sanctum" of the product -- the valuation engine -- had we, after 13+ years, somehow missed this? Before running off to add yet another unit test to isolate the problem and fix it, I had this little exchange with Claude Desktop:

[Screenshot 2026-04-16 at 7.53.16 PM]

Worth repeating here: "...so you'd never spot it from internal checks alone... it masked the error completely."

*It's worth noting that when you run an Excel file through ClearFactr's importer, it also does a cell-by-cell reconciliation.

I first ran to Excel to recreate the situation and a few variations, and then coded things up on my side, without changing any valuation engine code. In cases like this, the process is always the same:

  1. Recreate the bug in a unit test and watch the test fail
  2. Find and fix the bug
  3. Re-run the test and see that it passes

But my new unit tests passed on the first try. No failures.
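For flavor, here's a minimal sketch of the kind of reference test involved: a self-contained Black-Scholes call price checked against a widely published textbook value. The function name, parameter values, and tolerance are illustrative only; this is not ClearFactr's actual engine or test code.

```python
import math

def black_scholes_call(S, K, r, sigma, T):
    """European call price under Black-Scholes (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    # Standard normal CDF via the error function (no external dependencies)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def test_call_price_matches_reference():
    # Textbook case: S=100, K=100, r=5%, sigma=20%, T=1 year -> ~10.4506
    price = black_scholes_call(100.0, 100.0, 0.05, 0.20, 1.0)
    assert abs(price - 10.4506) < 1e-3

test_call_price_matches_reference()
```

The point of step 1 is exactly this shape: encode the suspect scenario, pin the expected value to an independent reference, and watch the test fail before touching anything.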

I double-checked the formulas, scratched my head, and then went back to Claude (note the example formula is slightly different here):

[Screenshot 2026-04-16 at 8.03.50 PM]

Now I thought, what would Grok say?

Here's the output from that convo (blue highlighting is mine):

[Screenshot 2026-04-16 at 8.07.16 PM]

So I went back to Claude and had this quick exchange:

[Screenshot 2026-04-16 at 8.14.09 PM]

What to make of all this?

For all of the amazing progress in LLMs' ability to build compute models with the Excel language, increasingly subtle fine-tuning work remains to be done. ClearFactr's unit tests give us confidence because we're comparing our code's behavior against a reference standard. With LLM-generated models, it's a bit of a brain bender to wonder what that standard should be in any given situation, and how a user would implement the tests.

For a Black Scholes model like this one, someone would need to compare the results of the new LLM-generated model against the outputs and behaviors of a trusted alternative. ClearFactr quickly identified 6 inputs and 30 outputs. It's an incredibly useful little model, but let's emphasize the word "little." What someone would need to do with 60 inputs and 300 outputs, or dramatically more than that, is the stuff of many more, and very tedious, conversations.
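To make the scale of that comparison concrete, here's a hedged sketch of what sweeping the input space and diffing every output against a trusted reference might look like. The callables, grid, and tolerance are all hypothetical; this is not ClearFactr's API.

```python
from itertools import product

def compare_models(generated, reference, input_grid, output_names, tol=1e-8):
    """Compare two models (callables mapping an inputs dict to an outputs dict)
    across every combination of the input grid; return the mismatches found."""
    mismatches = []
    names = list(input_grid)
    for values in product(*input_grid.values()):
        inputs = dict(zip(names, values))
        got, want = generated(inputs), reference(inputs)
        for out in output_names:
            if abs(got[out] - want[out]) > tol:
                mismatches.append((inputs, out, got[out], want[out]))
    return mismatches
```

Even a modest sweep explodes quickly: 6 inputs with 5 test values each is 15,625 scenarios, each checked across all 30 outputs. At 60 inputs the grid is no longer enumerable, and you're into sampling strategies and judgment calls about which outputs matter most.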

We'd love to hear your thoughts!
