8 Comments
Sam Illingworth:

Love this, Ryan! I wonder if a patient version could also be on the cards, so that they could better understand how their data is used and how to protect it as well. 🙏

Ryan Sears, PharmD:

Thanks so much for the feedback, Sam! I'm interested in exploring your idea more. What are your specific uncertainties or fears? What would you want to know more about, and how would you like to see it explained?

Sam Illingworth:

Thanks Ryan. I guess I am just really interested in what patients even know about how their data is being used.

Karen Spinner:

This is a great example of how vibe coding allows domain experts to solve problems quickly!🌟

Would love to hear how beta testing goes…and curious what tech stack you built this on. 🤔

Ryan Sears, PharmD:

Thanks so much! I'll do my best to describe the tech stack:

Runtime: Python 3.12+ managed via uv

Reactive Frontend: Solara - built on Reacton. Uses a virtual DOM and reactive state to handle UI updates without full-page refreshes.

Core Engine: Epic's Seismometer 0.5 - a specialized framework for clinical model validation. It's built on top of Pandas and PyArrow for high-speed Parquet ingestion and data manipulation, with Matplotlib/Plotly for generating ROC curves, calibration plots, and fairness cohort tables.

Data Format: Apache Parquet - used to store the synthetic patient data generated by Synthea, giving efficient schema enforcement and faster I/O than CSV.

State Management: Singleton Pattern. Uses a global Seismogram instance to maintain configuration and data state across the application lifecycle.

The primary challenge was that Seismometer is architected to be "environment-aware." It is designed to live inside an Epic Hyperspace instance, where it expects a "handshake" from an OIDC provider or a specific filesystem structure to find its metadata.json and config.yml.

To run this as a standalone "Flight Simulator," I had to perform several "man-in-the-middle" style patches to the framework's startup sequence:

1. Bypassing the Handshake (The "Force-Feed")

Normally, Seismometer initiates a load_data() call that looks for environment variables and specific paths to resolve clinical events. In a standalone environment, this leads to a NoneType resolution loop.

The Hack: I manually instantiated the ConfigProvider and then force-fed a Pandas DataFrame directly into the Seismogram instance. This bypassed the internal "Data Discovery" logic that was hardcoded to look for the Epic context.
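To make the pattern concrete, here's a minimal, self-contained sketch of that "force-feed" move. The class and attribute names (`DataLoader`, `dataframe`, `CLINICAL_DATA_PATH`) are hypothetical stand-ins for illustration, not seismometer's actual API:

```python
import os


class DataLoader:
    """Stands in for a framework loader that expects environment context."""

    def __init__(self):
        self.dataframe = None

    def load_data(self):
        # In the real framework this step resolves environment variables and
        # filesystem paths; standalone, the lookup fails and init never completes.
        path = os.environ.get("CLINICAL_DATA_PATH")
        if path is None:
            raise RuntimeError("No environment context to resolve data from")


def force_feed(loader, frame):
    """Skip load_data() entirely: assign the in-memory data directly."""
    loader.dataframe = frame
    return loader


# Usage: any in-memory table-like object works for the sketch.
loader = DataLoader()
force_feed(loader, [{"patient_id": 1, "score": 0.87}])
print(loader.dataframe[0]["score"])  # prints 0.87 - no handshake needed
```

The design point is that the discovery logic and the data attribute are decoupled: if you can construct the data yourself, you never need the environment the loader was hardcoded to expect.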

2. Singleton State Poisoning

Because Seismometer uses a Singleton pattern, if the initialization failed once (due to a bad path), the "poisoned" state stayed in memory.

The Hack: I had to implement a Hard Purge in the Solara initialization function: sm.Seismogram._instances = {}. This wipes the library's internal memory on every reload, forcing it to accept our manual configuration injection.
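Here's a toy reproduction of that failure mode and the purge, assuming a cache keyed the way `_instances` suggests (the real library's internals may differ; this `Seismogram` is a stand-in):

```python
class Seismogram:
    _instances = {}  # class-level cache: the singleton's "memory"

    def __new__(cls, config_path="good.yml"):
        if cls not in cls._instances:
            inst = super().__new__(cls)
            inst.config_path = config_path  # cached even if this config is bad
            cls._instances[cls] = inst
        return cls._instances[cls]


# First construction with a bad path "poisons" the cache:
bad = Seismogram(config_path="missing.yml")
still_bad = Seismogram(config_path="good.yml")
assert still_bad.config_path == "missing.yml"  # new arguments are ignored!

# The hard purge: wipe the cache so the next construction starts clean.
Seismogram._instances = {}
fresh = Seismogram(config_path="good.yml")
assert fresh.config_path == "good.yml"
```

This is why the purge has to run on every reload: a singleton returns whatever was cached first, so a single failed initialization silently outlives the code that caused it.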

3. Internal Attribute Injection (The AutomationManager)

The library’s reporting components (the Auditor View) use an internal AutomationManager that is strictly coupled to the configuration registry.

The Hack: I performed Low-Level Attribute Injection. By using setattr(sg, '_config', config), we satisfied the private internal slots that the framework's developers didn't intend for users to touch. This allowed the "Model Evaluation" tabs to render without the library knowing it was running in a "fake" environment.
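A stripped-down sketch of that injection, with illustrative names (`AutomationManager.render` and the `_config` shape here are hypothetical, not seismometer's real internals):

```python
class AutomationManager:
    """Stands in for a reporting component coupled to a config registry."""

    def __init__(self, owner):
        self._owner = owner

    def render(self):
        # Reads the owner's private _config slot directly.
        config = getattr(self._owner, "_config", None)
        if config is None:
            raise RuntimeError("No configuration registry available")
        return f"Rendering evaluation tabs for model '{config['model']}'"


class Seismogram:
    pass  # public API offers no standalone way to set _config


sg = Seismogram()
setattr(sg, "_config", {"model": "readmission-risk"})  # the injection
manager = AutomationManager(sg)
print(manager.render())  # prints: Rendering evaluation tabs for model 'readmission-risk'
```

Because the component only checks that the attribute exists, not where it came from, satisfying the private slot is enough for downstream rendering to proceed as if the full environment were present.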

Essentially, I built a "Mock-Environment Wrapper": taking a tool designed to be a "cog" in a massive corporate machine and building a custom "housing" for it so it can spin independently. The result is a high-fidelity simulation that uses the exact same math and visualization logic as the production Epic tool, but runs entirely on someone's local machine.

Karen Spinner:

Very cool! And amazing troubleshooting. Are you planning to offer this as a product?

Ryan Sears, PharmD:

It'll be up on GitHub once the features are fully built out. I also plan on including "tutorials" that explain each statistical output and have examples of bias so users can see what the graphs will look like when the model's not working as intended.

Karen Spinner:

This should be a really useful open source resource! 🎉