Over the past month, I’ve been working on an agentic AI system that, given a data model and a set of target websites, automatically builds a PostgreSQL database by extracting and organizing data from those sites. This project was not only fun to build but also a tremendous learning experience.
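For context, here’s the shape of the thing, boiled down to a sketch (all names here are illustrative, not the real code):

```python
from dataclasses import dataclass

@dataclass
class ExtractionJob:
    """One pipeline run: a data model plus the sites to mine."""
    data_model: dict        # entity and relationship definitions
    target_urls: list[str]  # websites to extract from

def extract_records(url: str, data_model: dict) -> list[dict]:
    """Stand-in for the agentic analysis step: read the page and map
    its content onto the data model's entities."""
    raise NotImplementedError  # the interesting part lives here

def build_database(job: ExtractionJob, dsn: str) -> None:
    """Extract from each target site, then persist to PostgreSQL."""
    for url in job.target_urls:
        records = extract_records(url, job.data_model)
        # persist(records, dsn) would issue the INSERTs into Postgres
```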
My initial data model drove the analysis engine to build the database well; however, I discovered a flaw in my modeling: I had conflated a business with the service it offered, largely because the two shared the same name. (An organization that offers a specific service is easily mistaken for that service when both go by an identical name.) After refining the model to distinguish these entities and rebuilding the database against the new model, I saw a significant improvement in data collection, without any change to the acquisition logic. That’s the striking part: nothing varied except how well the data model matched reality.
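To make the fix concrete, the change looked roughly like this in schema terms (illustrative names, not my actual model):

```python
from dataclasses import dataclass

# Before: one entity doing double duty. A row named "Acme Plumbing"
# could mean the company or the thing it sells, so the extractor
# collapsed the two.
@dataclass
class BusinessOrService:
    name: str

# After: the organization and its offering are separate entities tied
# together by a relationship, even when their names are identical.
@dataclass
class Organization:
    name: str

@dataclass
class Service:
    name: str
    provider: Organization  # the organization offering this service
```

With the split in place, "Acme Plumbing" the company and "Acme Plumbing" the service become two linked records instead of one ambiguous one.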
The new data model significantly increased the system’s recall (it captured more of the available data). This surprised me; I didn’t expect the model change to have such a profound impact. Precision improved slightly as well, which is a nice bonus. Interestingly, I also noticed that a "dead end" in the new model (ugh, a spot where I made a fresh modeling mistake) is seeing much lower recall. That dead end further corroborates the observation, since it underscores how much a well-aligned model matters for effective automated data collection in my system. Fortunately, it sits in a small, peripheral portion of the model, so I’m not concerned; I may end up removing that portion entirely.
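For anyone keeping score the same way I am, here’s how I’m framing recall and precision, with made-up counts purely to illustrate (these are not my real figures):

```python
def recall(true_extracted: int, available: int) -> float:
    """Share of the facts available on the sites that were captured."""
    return true_extracted / available

def precision(true_extracted: int, total_extracted: int) -> float:
    """Share of the captured facts that are actually correct."""
    return true_extracted / total_extracted

# Hypothetical numbers, only to show the shape of the improvement:
print(recall(420, 1000), precision(420, 470))  # old model
print(recall(760, 1000), precision(760, 830))  # new model
```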
I believe this observation could generalize across domains, and I plan to test that soon. Because the analysis logic stayed fixed while only the model changed, the recall and precision deltas are attributable to the model itself, which suggests a new approach to quantitatively scoring data models. If the hypothesis holds, it could yield a technique for programmatically evaluating and improving data models based on their alignment with reality, using metrics like recall and precision inside an objective function. Seems reasonable to me.
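A minimal sketch of what that objective function might be, assuming a plain F-beta score with recall weighted more heavily (since recall is where I saw the big swing):

```python
def model_alignment_score(recall: float, precision: float,
                          beta: float = 2.0) -> float:
    """F-beta over an extraction run as a stand-in objective;
    beta > 1 weights recall above precision."""
    if recall == 0.0 and precision == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

Run the same acquisition logic against each candidate model, score the runs, and the score becomes a fitness function for iterating on the model itself.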
I'd love to hear your perspectives!