LLM-Guided Causal Bayesian Network Construction for Pediatric Patients on ECMO

Published in Artificial Intelligence in Medicine, 2025

Quick Overview

In this work, we propose a framework for constructing Causal Bayesian Networks (CBNs). Our approach combines data-driven methods with knowledge from two key sources: human experts and large language models (LLMs). We experimentally validated our method on the small PELICAN dataset, which includes physiological and laboratory variables from 71 ECMO patients. We compared the CBNs generated by our framework against those learned purely from data and those directly elicited from LLMs, all benchmarked against a ground-truth graph provided by experts.

Causal Bayesian Networks

Causal Bayesian Networks (CBNs) are a subclass of Bayesian Networks (BNs). A BN represents the joint probability distribution over a set of variables by factorizing it over a directed acyclic graph (DAG). Each node of the DAG corresponds to a variable, and each directed edge between two variables denotes direct influence. If the directed edges also denote direct causal relationships, the BN is considered a CBN: an edge X $\rightarrow$ Y means that the variable X is a cause of Y. CBNs combine BNs' ability to reason under uncertainty with the semantics of causality, supporting both probabilistic reasoning and reasoning about interventions. This structure makes CBNs well suited to domains like medicine: the models are interpretable, making them easier for clinicians to understand, and they represent causal pathways explicitly, allowing clinicians to perform counterfactual reasoning and to reason about possible interventions on a patient.
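To make the factorization concrete, here is a minimal sketch of a three-node chain CBN X $\rightarrow$ Y $\rightarrow$ Z over binary variables. All probability values are illustrative, not taken from the PELICAN data.

```python
# DAG: X -> Y -> Z, so the joint factorizes as
#   P(X, Y, Z) = P(X) * P(Y | X) * P(Z | Y)
# Values below are made up for illustration only.

P_X = {0: 0.7, 1: 0.3}                  # P(X)
P_Y_given_X = {0: {0: 0.9, 1: 0.1},     # P(Y | X = 0)
               1: {0: 0.2, 1: 0.8}}     # P(Y | X = 1)
P_Z_given_Y = {0: {0: 0.8, 1: 0.2},     # P(Z | Y = 0)
               1: {0: 0.3, 1: 0.7}}     # P(Z | Y = 1)

def joint(x, y, z):
    """Joint probability read off the DAG factorization."""
    return P_X[x] * P_Y_given_X[x][y] * P_Z_given_Y[y][z]

# A valid factorization must sum to 1 over all assignments.
total = sum(joint(x, y, z)
            for x in (0, 1) for y in (0, 1) for z in (0, 1))
```

Because each local table is a proper conditional distribution, the product automatically defines a normalized joint, which is what makes the DAG a compact representation.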

Learning Causal Bayesian Networks using Data

Learning Causal Bayesian Networks from data is not an easy task, especially in critical domains like medicine. Interventional data, which is required to properly learn CBNs, is often impractical or unethical to obtain. Additionally, we are forced to assume causal sufficiency, i.e., that the causal relations between the variables under consideration are not mediated by external or confounding factors. Causal sufficiency may not hold in medicine, as it is hard to know for sure all the underlying factors causing a disease.

Therefore, we are forced to learn causal models from observational data, while assuming causal sufficiency, using greedy search methods like Greedy Search and Score (GSS), constraint-based methods like Peter-Clark (PC), and more advanced methods like Fast Causal Inference (FCI).
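As a rough illustration of the constraint-based idea behind PC, the sketch below recovers an undirected skeleton over binary variables by keeping an edge only when a marginal independence test rejects. This is a toy: the full PC algorithm also conditions on subsets of neighbors and then orients edges, and the variable names and data here are synthetic.

```python
import math

def g_stat(data, a, b):
    """G-statistic for marginal independence of two binary variables."""
    n = len(data)
    counts = [[0, 0], [0, 0]]
    for row in data:
        counts[row[a]][row[b]] += 1
    g = 0.0
    for i in (0, 1):
        for j in (0, 1):
            obs = counts[i][j]
            exp = sum(counts[i]) * (counts[0][j] + counts[1][j]) / n
            if obs:
                g += 2 * obs * math.log(obs / exp)
    return g

CRIT = 3.841  # chi-square critical value, 1 df, alpha = 0.05

def skeleton(data, variables):
    """Keep an (undirected) edge only when independence is rejected."""
    return {(a, b)
            for i, a in enumerate(variables)
            for b in variables[i + 1:]
            if g_stat(data, a, b) > CRIT}

# Synthetic rows: Y copies X; Z is independent of both.
data = [{"X": i % 2, "Y": i % 2, "Z": (i // 2) % 2} for i in range(40)]
# skeleton(data, ["X", "Y", "Z"]) keeps only the X-Y edge.
```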

Large Language Models

Large Language Models (LLMs) are a class of generative models that represent the probability distribution over natural language text using neural networks, typically based on the transformer architecture. Compared to earlier language models like Hidden Markov Models (HMMs), LLMs are orders of magnitude larger in scale, with some of the largest models having billions of parameters. This large size allows them to capture intricate statistical patterns from large corpora of natural language text.

LLMs are trained on vast amounts of text, including medical corpora, which often enables them to generate text containing relevant and accurate clinical information. As a result, they can be used as approximate knowledge sources in medicine. With an appropriately designed prompt, these models can be asked to identify causal relationships between variables and generate a Causal Bayesian Network. However, these models are not truly causal and can hallucinate.
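One simple way to turn an LLM reply into graph edges is to ask for one "Cause -> Effect" line per relationship and parse the reply, discarding anything outside the known variable set (a cheap guard against hallucinated variables). The reply format, helper name, and variable names below are hypothetical, not the paper's exact protocol.

```python
import re

# Hypothetical reply format: one "Cause -> Effect" line per edge.
EDGE_RE = re.compile(r"^\s*(\w+)\s*->\s*(\w+)\s*$")

def parse_edges(reply, variables):
    """Extract directed edges from an LLM reply, keeping only
    edges whose endpoints are known variables."""
    edges = set()
    for line in reply.splitlines():
        m = EDGE_RE.match(line)
        if m and m.group(1) in variables and m.group(2) in variables:
            edges.add((m.group(1), m.group(2)))
    return edges

reply = """Lactate -> pH
HeartRate -> Lactate
UnknownVar -> pH"""
edges = parse_edges(reply, {"Lactate", "pH", "HeartRate"})
# The UnknownVar line is dropped because it is not a known variable.
```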

Theory Refinement and Large Language Models for constructing Causal Bayesian Networks

To address the challenge of constructing Causal Bayesian Networks (CBNs) in domains where interventional data is limited or unavailable, we propose a hybrid framework that integrates expert knowledge, large language models (LLMs), and data-driven refinement.

Expert-Guided Constraint Definition

We begin by eliciting the variables of interest from domain experts (e.g., critical-care physicians). Experts also specify a set of causally impossible relationships: edges that must be excluded from any valid causal graph. These constraints ensure the model adheres to known domain-specific causal logic.

LLM-Based Graph Initialization

Next, we use a pre-trained LLM to propose an initial causal graph structure. A prompt designed to elicit causal relationships between the identified variables is issued multiple times to the model. The resulting graphs are aggregated by pooling their edges. To enforce acyclicity, we eliminate cycles by removing edges that appear in fewer responses, thereby retaining the more consistently suggested relationships.
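The pooling-and-cycle-breaking step can be sketched as follows: count how many sampled graphs propose each edge, take the union, and while any cycle remains, delete the least frequently proposed edge that lies on a cycle. Function names and the tie-breaking details are our own illustration, assuming edge "votes" as the consistency measure.

```python
from collections import Counter

def _adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    return adj

def _reachable(adj, src, dst):
    """Is there a directed path from src to dst?"""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(adj.get(u, ()))
    return False

def pool_and_break_cycles(graphs):
    """Pool edges from repeated LLM queries, then drop the least
    frequently proposed edge on a cycle until the graph is a DAG."""
    votes = Counter(e for g in graphs for e in g)
    edges = set(votes)
    while True:
        adj = _adj(edges)
        # An edge (u, v) lies on a cycle iff v can reach u.
        cyclic = [e for e in edges if _reachable(adj, e[1], e[0])]
        if not cyclic:
            return edges
        edges.remove(min(cyclic, key=lambda e: votes[e]))

# Three hypothetical LLM responses; C -> A appears only once,
# so it is the edge sacrificed to break the A -> B -> C -> A cycle.
graphs = [{("A", "B"), ("B", "C"), ("C", "A")},
          {("A", "B"), ("B", "C")},
          {("A", "B")}]
dag = pool_and_break_cycles(graphs)
```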

Subtractive Data-Driven Refinement

We then refine the graph using observational data. This phase involves subtractive refinement: we first remove all edges known to be causally impossible (as defined by the experts). We then perform a local search that iteratively removes additional edges whenever doing so improves the Minimum Description Length (MDL) score, a measure that balances model fit and complexity.

Full Data-Driven Refinement

Finally, we conduct a more flexible refinement step that allows the addition, deletion, or reversal of edges, again guided by improvements in the MDL score. This step enables the discovery of novel or subtle causal relationships not initially captured.
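The subtractive step can be sketched as a greedy local search: delete an edge whenever doing so lowers the score. The code below is a toy version, not the paper's implementation; it assumes fully binary variables and uses a simplified MDL score (negative log-likelihood plus a (k/2) log N parameter penalty, lower is better).

```python
import math

def mdl(data, variables, parents):
    """Toy MDL score for binary data: negative log-likelihood
    plus 0.5 * (#params) * log(N). Lower is better."""
    n = len(data)
    score = 0.0
    for v in variables:
        ps = parents.get(v, [])
        counts = {}  # parent configuration -> [count of v=0, count of v=1]
        for row in data:
            key = tuple(row[p] for p in ps)
            counts.setdefault(key, [0, 0])[row[v]] += 1
        for c0, c1 in counts.values():
            tot = c0 + c1
            for c in (c0, c1):
                if c:
                    score -= c * math.log(c / tot)
        score += 0.5 * (2 ** len(ps)) * math.log(n)  # one free param per config
    return score

def subtractive_refine(data, variables, edges):
    """Greedy local search: delete an edge whenever it lowers MDL."""
    edges = set(edges)
    improved = True
    while improved:
        improved = False
        parents = {v: [u for u, w in edges if w == v] for v in variables}
        base = mdl(data, variables, parents)
        for e in sorted(edges):
            trial = {v: [u for u, w in edges - {e} if w == v]
                     for v in variables}
            if mdl(data, variables, trial) < base:
                edges.remove(e)
                improved = True
                break
    return edges

# Synthetic rows: Y copies X, while Z is independent of both,
# so the spurious edge Z -> Y should be pruned and X -> Y kept.
data = [{"X": i % 2, "Y": i % 2, "Z": (i // 2) % 2} for i in range(40)]
refined = subtractive_refine(data, ["X", "Y", "Z"],
                             {("X", "Y"), ("Z", "Y")})
```

The penalty term is what drives the pruning: keeping Z as a parent of Y doubles the number of conditional probability entries without improving fit, so removing the edge lowers the score.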

Recommended citation: Mathur, S. et al. (2025). LLM-Guided Causal Bayesian Network Construction for Pediatric Patients on ECMO. In: Bellazzi, R., Juarez Herrero, J.M., Sacchi, L., Zupan, B. (eds) Artificial Intelligence in Medicine. AIME 2025. Lecture Notes in Computer Science, vol 15735. Springer, Cham. https://doi.org/10.1007/978-3-031-95841-0_48