It's a circular dilemma: healthcare professionals are denied access to the tools required to build AI literacy because they aren't yet competent, and they can't become competent without the tools.
Domain expert here. You’ll need to get your data into an EHR. I don’t see any health industry protocols in your article. I know you may want to circumvent them, but they’re valuable and without them, this approach lacks credibility.
I read this like asking someone to train on a theoretical airplane simulator when they’re actually trying to get qualified on a 747. I can’t tell if your simulator is the Ace Combat 7 video game or something real.
Also, concerns around privacy are very real. I see people knocking HIPAA, but without HIPAA it would’ve been the Wild West. I have seen someone fired for stealing x-rays of a quarterback’s hand so that person could bet on a football game. The hand is the hand of a human being, personal data, and should be treated respectfully. There are other reasons hackers go after healthcare data.
Love that you’re simulating data. I’ve generated a lot of simulated data, and it’s hard to get it right, particularly when you don’t think about the specific use case ahead of time. It’s really hard to get good distributions for specific conditions without big example datasets. Also, many of the healthcare companies you’re competing with that have data feeds also have agreements to be able to use the data for product improvement. We did. It’s invaluable for coming up with good simulated data.
Hi Jeremy,
I appreciate very much your willingness to provide feedback on this project. I was hoping that someone with domain expertise would come across this project and provide guidance.
The fact of the matter is, everything that you said in your comment is true. I do not knock HIPAA and I certainly am not calling for reducing privacy protections so people can learn AI model validation.
Your comment reinforces the structural reality of the Access-Competency Paradox perfectly:
You state that without an EHR, health industry protocols, and good patient data, my approach “lacks credibility.” True - that’s the Competency portion.
The Access portion of the paradox is the issue for people who want to learn, and the portion I’m attempting to address with my flight sim. I make no claims that my GUI looks anything like an EHR, nor do I claim that my synthetic patient data contains the complexity or richness of a real human being’s chart. But, for all the important reasons you mentioned, only people who already have informatics and model validation experience have access to a real EHR with real patients.
If you will do me the kindness of further sharing your expertise, I would love to hear your thoughts on a few questions:
1.) How can the process of learning AI model validation in healthcare be democratized? People want to upskill but have no available options.
2.) What would be a more rigorous/realistic architecture and methodology for the project if you were to plan it yourself?
Again, I greatly appreciate the response, and agree with your feedback on all counts. What I’m trying to do is create a responsible way for people to learn and train that does not put real humans in harm’s way.
Thank you and take care!
I learned by getting a job in healthcare and then starting the company. I don’t know where you find the job that I found. I got lucky. It wasn’t democratic. It was serendipity. That’s the way most people get jobs. There’s nothing democratic about getting a certain kind of job. There never has been. Healthcare was probably my third choice.
Right now, my main question is who is your customer? If it’s yourself and you want a job working in AI in healthcare, that’s different than: my customers are doctors and patients. I think it’s much easier to serve doctors and patients as customers. If you solve a problem for a doctor or a patient, they’re more likely to trust you with their data. If you’re trying to solve a problem for yourself with someone else’s data, they’re probably not going to give it to you.
I would not call that undemocratic.
Don’t start with AI as your problem. I think that’s the whole problem with the industry. Everyone is starting with AI as a problem when the actual problems are out there. There are so many important problems out there, particularly in healthcare.
If you want to learn AI, learn AI.
Here’s one: how do you help underfunded, undermanned lung cancer screening programs manage their patient populations? If you did this, you would save thousands of lives and improve people’s quality of life significantly. (I actually think LLMs are a huge step forward and very applicable for this.)
I learned about such a problem by talking to the person who was doing the work, while working a job in the industry.
Sorry I had to dictate this. Please excuse typos.
Jeremy, I think we are looking at this from two different angles. You are looking at this like a Hospital Administrator hiring a surgeon (you want someone who has already operated on real people).
I am looking at this like a Medical School Dean. We don't let first-year students operate on live patients. We make them practice on cadavers and plastic simulations first.
My tool is the plastic simulation. It isn't "real," and it doesn't solve a patient's problem. But it ensures that when people do finally get that AI validation job (and the real patient access that comes with it), they don't make a rookie mistake on day one.
Love this Ryan! I wonder if a patient version could also be on the cards, so that patients could better understand how their data is used and how to protect it as well. 🙏
Thanks so much for the feedback, Sam! I'm interested in exploring your idea more - what are your specific uncertainties/fears? What would you want to know more about and how would you like to see it explained?
Thanks Ryan. I guess I am just really interested in what patients even know about how their data is being used.
This is a great example of how vibe coding allows domain experts to solve problems quickly!🌟
Would love to hear how beta testing goes…and curious what tech stack you built this on. 🤔
Thanks so much! I'll do my best to describe the tech stack:
Runtime: Python 3.12+ managed via uv
Reactive Frontend: Solara - built on Reacton. Uses a virtual DOM and reactive state to handle UI updates without full-page refreshes.
Core Engine: Epic Seismometer 0.5 - A specialized framework for clinical model validation. It's built on top of Pandas and PyArrow for high-speed Parquet ingestion and data manipulation, as well as Matplotlib/Plotly for generating ROC curves, calibration plots, and fairness cohort tables.
Data Format: Apache Parquet - used to store the synthetic patient data generated by Synthea, providing schema enforcement and faster I/O than CSV.
State Management: Singleton Pattern. Uses a global Seismogram instance to maintain configuration and data state across the application lifecycle.
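The singleton state management mentioned above can be sketched in a self-contained toy form. The class name `Seismogram` mirrors the library's, but this is an illustration of the pattern, not the library's actual implementation:

```python
class Seismogram:
    """Toy sketch of a singleton holding global config/data state.

    Mirrors the pattern described above; NOT the real seismometer code.
    """
    _instances = {}  # class-level registry shared across the app

    def __new__(cls):
        # Return the existing instance if one was already created
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]

    def __init__(self):
        # Guard so re-running the constructor doesn't wipe existing state
        if not hasattr(self, "config"):
            self.config = None
            self.dataframe = None

a = Seismogram()
a.config = {"model": "risk_v1"}
b = Seismogram()
print(b.config)  # same instance, so the config persists across "loads"
```

The upside is that every view in the app sees the same data; the downside, as described below, is that bad state persists just as stubbornly as good state.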
The primary challenge was that Seismometer is architected to be "environment-aware." It is designed to live inside an Epic Hyperspace instance, where it expects a "handshake" from an OIDC provider or a specific filesystem structure to find its metadata.json and config.yml.
To run this as a standalone "Flight Simulator," I had to perform several "man-in-the-middle" style patches to the framework's startup sequence:
1. Bypassing the Handshake (The "Force-Feed")
Normally, Seismometer initiates a load_data() call that looks for environment variables and specific paths to resolve clinical events. In a standalone environment, this leads to a NoneType resolution loop.
The Hack: I manually instantiated the ConfigProvider and then force-fed a Pandas DataFrame directly into the Seismogram instance. This bypassed the internal "Data Discovery" logic that was hardcoded to look for the Epic context.
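In spirit, the force-feed looks like this self-contained toy sketch. The names `ConfigProvider` and `Seismogram` follow the ones above, but the signatures are illustrative assumptions, and a list of dicts stands in for the pandas DataFrame:

```python
class ConfigProvider:
    """Toy config provider; the real one resolves Epic-specific paths."""
    def __init__(self, usage_config, data_dir=None):
        self.usage = usage_config
        self.data_dir = data_dir  # None: no Epic filesystem to discover

class Seismogram:
    """Toy object whose load_data() normally triggers environment discovery."""
    def __init__(self, config):
        self.config = config
        self.dataframe = None

    def load_data(self):
        # The real version walks the Epic context for metadata.json/config.yml;
        # standalone, that discovery resolves to None and loops.
        raise RuntimeError("no Epic context to discover")

# The "force-feed": build the config manually, then inject the data
# directly, never touching the discovery path at all.
config = ConfigProvider(usage_config={"target": "readmission_risk"})
sg = Seismogram(config)
sg.dataframe = [  # stands in for a pandas DataFrame of synthetic patients
    {"patient_id": 1, "score": 0.82, "outcome": 1},
    {"patient_id": 2, "score": 0.14, "outcome": 0},
]
print(len(sg.dataframe))  # data is live without any environment handshake
```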
2. Singleton State Poisoning
Because Seismometer uses a Singleton pattern, if the initialization failed once (due to a bad path), the "poisoned" state stayed in memory.
The Hack: I had to implement a Hard Purge in the Solara initialization function: sm.Seismogram._instances = {}. This wipes the library's internal memory on every reload, forcing it to accept our manual configuration injection.
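A self-contained sketch of why the purge is needed (toy classes standing in for the library; the real fix is the one-liner above):

```python
class Seismogram:
    """Toy singleton whose registry can trap a half-initialized instance."""
    _instances = {}

    def __new__(cls):
        if cls not in cls._instances:
            cls._instances[cls] = super().__new__(cls)
        return cls._instances[cls]

def init_app(data_path):
    sg = Seismogram()
    if data_path is None:             # simulate a bad path on first load
        sg.poisoned = True            # instance is now half-initialized...
        raise ValueError("bad path")  # ...but it stays in _instances
    sg.poisoned = False
    return sg

try:
    init_app(None)                    # first load fails
except ValueError:
    pass

# Without a purge, the next load silently reuses the poisoned instance:
assert getattr(Seismogram(), "poisoned", False) is True

# The Hard Purge: wipe the registry so the next load starts clean.
Seismogram._instances = {}
sg = init_app("synthetic_patients.parquet")
print(sg.poisoned)  # fresh instance accepted the new configuration
```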
3. Internal Attribute Injection (The AutomationManager)
The library’s reporting components (the Auditor View) use an internal AutomationManager that is strictly coupled to the configuration registry.
The Hack: I performed Low-Level Attribute Injection. By using setattr(sg, '_config', config), we satisfied the private internal slots that the framework's developers didn't intend for users to touch. This allowed the "Model Evaluation" tabs to render without the library knowing it was running in a "fake" environment.
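The idea, in a self-contained toy form. The `setattr(sg, '_config', config)` call is taken from the description above; this `AutomationManager` is a stand-in to show the pattern, not the library's real class:

```python
class AutomationManager:
    """Toy stand-in for the reporting component's internal manager."""
    def __init__(self, owner):
        self.owner = owner

    def render_tabs(self):
        # Reads a private attribute the framework normally sets during startup
        cfg = getattr(self.owner, "_config", None)
        if cfg is None:
            raise RuntimeError("no configuration registry bound")
        return f"Model Evaluation tabs for {cfg['model']}"

class Seismogram:
    pass  # toy: has no _config attribute until we inject one

sg = Seismogram()
config = {"model": "readmission_risk_v1"}

# Low-level attribute injection: satisfy the private slot the framework
# expects, without ever running its official startup sequence.
setattr(sg, "_config", config)

manager = AutomationManager(sg)
print(manager.render_tabs())
```

The trade-off is the usual one with reaching into private attributes: it works today, but any upstream rename of `_config` breaks the wrapper silently.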
Essentially, I built a "Mock-Environment Wrapper": I took a tool designed to be a "cog" in a massive corporate machine and built a custom "housing" for it so it can spin independently. The result is a high-fidelity simulation that uses the exact same math and visualization logic as the production Epic tool, but runs entirely on someone's local machine.
What an amazing tech stack! Thanks for sharing this Ryan. I had similar questions coming from a client, but the solution was way less intensive.
Thanks so much, Jenny! I think the truly amazing thing about it all is that the models are smart enough, and the agentic IDEs are robust enough, to give a non-coder like me the ability to chain something like this together.
Like, I’m so naïve to it that I don’t even know how much effort this would have taken a knowledgeable person before. Would it have taken a team? How long would the project have taken?
Like Karen was saying, AI is allowing people with domain expertise the ability to make cool things - and knowing how to code is no longer a gatekeeper.
This is the beautiful part: you don’t yet know how much effort it will take, so you build fearlessly.
Sometimes I feel the more knowledgeable we are, the more time we spend weighing every edge case and debating, when we could already have a prototype in hand to validate the idea.
Very cool! And amazing troubleshooting. Are you planning to offer this as a product?
It'll be up on GitHub once the features are fully built out. I also plan on including "tutorials" that explain each statistical output and have examples of bias so users can see what the graphs will look like when the model's not working as intended.
This should be a really useful open source resource! 🎉