👨‍🏫 Context and Motivation#

Causal Inference#

Motivation: Evaluate the efficiency of an intervention
Treatment effect motivation
Patient trajectory in health observational databases
Methodological question: Numerous modeling choices
Modeling choices in causal inference

Different modeling choices can each be valid from an expert's point of view, yet yield different estimates.

Stability of the ATE, an illustration with MIMIC-III

In the MIMIC-III database, I tried to measure the effect on in-stay mortality of having an imaging procedure during the ICU stay, compared with not having one, for patients hospitalized in the ICU with stroke-related symptoms. The confounding variables are 13 common physiological measures.

_images/ate_stability_aggregation_window_13_base_measures.png

🤯 The ATE is not stable across aggregation windows, estimators, or identification formulas.
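To make the instability concrete, here is a minimal sketch on an invented toy cohort (not MIMIC-III data). It compares two identification choices: a naive difference in means and a stratified adjustment on a single confounder. The rows, the stratum names and both function names are assumptions made for illustration.

```python
from collections import defaultdict


def make_rows(stratum, treated, n, deaths):
    """Build n hypothetical patients for one stratum and arm, `deaths` of whom died."""
    return [(stratum, treated, 1 if i < deaths else 0) for i in range(n)]


# Hypothetical toy cohort: (severity stratum, treated flag, died flag).
# Severe patients are treated more often AND die more often (confounding).
rows = (
    make_rows("severe", 1, 8, 4) + make_rows("severe", 0, 2, 1)
    + make_rows("mild", 1, 2, 0) + make_rows("mild", 0, 8, 0)
)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_ate(rows):
    """Difference in mortality between treated and control, ignoring confounders."""
    treated = mean(y for _, t, y in rows if t == 1)
    control = mean(y for _, t, y in rows if t == 0)
    return treated - control


def stratified_ate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    by_stratum = defaultdict(list)
    for stratum, t, y in rows:
        by_stratum[stratum].append((t, y))
    ate = 0.0
    for members in by_stratum.values():
        weight = len(members) / len(rows)
        effect = (mean(y for t, y in members if t == 1)
                  - mean(y for t, y in members if t == 0))
        ate += weight * effect
    return ate


print(f"naive ATE: {naive_ate(rows):.2f}")
print(f"stratified ATE: {stratified_ate(rows):.2f}")
```

On this toy cohort the naive estimate is 0.30 while the stratified estimate is 0.00: the same data support two different answers depending on the identification formula chosen.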

Chronic Kidney Failure#

Dialysis puts a heavy burden on both the patient's lifestyle and the healthcare system.

As the authors of the IDEAL trial on Early versus Late Initiation of Dialysis stress [1]:

In clinical practice, there is considerable variation in the timing of the initiation of maintenance dialysis for patients with stage V chronic kidney disease.

Nephrologists and public health doctors have a strong interest in studying this variability in healthcare practices and in evaluating whether the timing of dialysis initiation affects patient outcomes.

🎯 Study main objective: Reproduce the IDEAL trial from Cooper et al. [1] with APHP data.


❓️ Medical question#

The PICO framework is useful to structure the treatment effect question.
  • Population: Adult patients recently diagnosed with stage 5 CKF, with at least one dialysis procedure recorded in the database.

  • Intervention: Late initiation of dialysis after stage 5 diagnosis.

  • Comparator: Early initiation of dialysis after stage 5 diagnosis.

  • Outcome: The primary outcome is all-cause mortality at a given horizon (e.g. 1 or 2 years). The secondary outcome is Major Adverse Cardiovascular Events (MACE). A third outcome will focus on a quality-of-life proxy, such as the number of hospital visits in the year following dialysis initiation.

Medical question

In adult patients recently diagnosed with stage 5 CKF, what is the effect of delayed dialysis initiation on mortality and Major Adverse Cardiovascular Events (MACE)?


🔎 Data exploration#

We ask this question using the data from CSE project 180026, extraction of 20221027.

  • Format: I2B2

  • 🤒 Inclusion of patients: All patients with at least one stay (associated with a document) at Necker, from the start of the information system up to October 2022.

  • 🗃 Inclusion of data: All contacts between included patients and any APHP Information System.

  • Some numbers: 348 309 patients, 168 439 concepts, 20 681 providers.

General Information#

Tables description

| Tables | Nb Unique Patients | Nb Unique Visits | Nb Rows | Nb Columns |
|---|---|---|---|---|
| I2b2_concept | NA | NA | 168439 | 9 |
| I2b2_observation_ccam | 221437 | 591021 | 10151670 | 22 |
| I2b2_observation_cim10 | 241122 | 713940 | 9054159 | 22 |
| I2b2_observation_doc | 344711 | 1973424 | 7237315 | 22 |
| I2b2_observation_ghm | 241585 | 715971 | 3855805 | 22 |
| I2b2_observation_lab | 208820 | 730247 | 97786558 | 22 |
| I2b2_observation_med | 83581 | 164130 | 5106297 | 22 |
| I2b2_observation_microbio | 119068 | 309802 | 2199006 | 22 |
| I2b2_observation_pacs | 150217 | 355415 | 3501057 | 22 |
| I2b2_observation_physio | 129772 | 309186 | 28737904 | 22 |
| I2b2_observation_ufr | 348309 | 3456566 | 3804265 | 22 |
| I2b2_ontology | NA | NA | 262334 | 25 |
| I2b2_patient | 348309 | NA | 348309 | 21 |
| I2b2_provider | NA | NA | 20681 | 9 |
| I2b2_visit | 348309 | 10858184 | 10858184 | 29 |

Extraction demographics
_images/age_sex_i2b2_patient.png
Count of start_datetime in each table
_images/all_observations.png

🤯 The time span of the data is implausible: no data were being collected back in 1990!

Comparison to the SNDS#

To check the quality of the data, I compared, for a given code, the number of occurrences in the EDS with the number of occurrences in the national database (SNDS).

The national billing database (PMSI) gives aggregation numbers for each hospital. It is:

Individual Billing code discrepancies
  • CIM10:N185, CKF stage 5: In the SNDS, I counted the main diagnoses (DP) coded as N185 across all RUM at Necker in 2015: 409. The same computation on the cim10 table of the EDSH gives 203 codes.

  • CIM10:Z491, dialysis: 7576 in the SNDS compared to 5522 in the EDS.
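As a worked example, the two counts above can be turned into a share of SNDS codes that are absent from the EDS. The definition used here (one minus the EDS count over the SNDS count) and the function name `missing_ratio` are assumptions for illustration; the text does not specify how the ratio of missingness is computed.

```python
# Counts quoted above for Necker in 2015: (SNDS count, EDS count).
counts_2015 = {
    "CIM10:N185": (409, 203),
    "CIM10:Z491": (7576, 5522),
}


def missing_ratio(n_snds, n_eds):
    """Share of SNDS codes not found in the EDS (hypothetical definition)."""
    return 1 - n_eds / n_snds


ratios = {code: missing_ratio(*pair) for code, pair in counts_2015.items()}
for code, ratio in ratios.items():
    print(f"{code}: {ratio:.1%} missing")
```

Under this definition, roughly half of the N185 codes and a bit more than a quarter of the Z491 codes are missing from the EDS.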

Ratio of missingness for all main diagnoses codes for the year 2015

I repeated the same computation systematically for all main diagnosis codes in 2015.

Ratio of missingness for all dp codes for the year 2015

🤯 The EDS seems to be missing a lot of data!

Our extraction randomized the dates

Our extraction did not request the exact dates of events. It appears that the dates were randomly shifted (consistently within each patient), sometimes by several years from the original date.
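The sketch below illustrates one way such a shift could work, assuming a single random offset per patient. The actual procedure used for the extraction is not documented here, so the function `shift_dates`, its parameters and the example events are all hypothetical. The point is that intervals within a patient are preserved while calendar dates are not.

```python
import random
from datetime import date, timedelta


def shift_dates(events, max_shift_days=3 * 365, seed=0):
    """Shift every event date by one random offset per patient (hypothetical scheme)."""
    rng = random.Random(seed)
    offsets = {}
    shifted = []
    for patient_id, event_date in events:
        if patient_id not in offsets:
            # Draw the offset once, then reuse it for all events of this patient.
            days = rng.randint(-max_shift_days, max_shift_days)
            offsets[patient_id] = timedelta(days=days)
        shifted.append((patient_id, event_date + offsets[patient_id]))
    return shifted


# Invented events: two for patient p1, one for patient p2.
events = [
    ("p1", date(2015, 1, 10)),
    ("p1", date(2015, 3, 1)),
    ("p2", date(2015, 1, 10)),
]
shifted = shift_dates(events)

original_gap = events[1][1] - events[0][1]
shifted_gap = shifted[1][1] - shifted[0][1]
print("within-patient gap preserved:", original_gap == shifted_gap)
```

With a scheme like this, delays between events of the same patient remain usable, but counts per calendar year (such as the 2015 comparison with the SNDS) can no longer be trusted.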


🩺 Phenotyping#

Population selection#

We follow the population selection of the Swedish national renal registry study on the effect of the timing of dialysis initiation on mortality [2].

Three sub-populations:
  • Aged over 18 years at hospitalization

  • Chronic Kidney Failure, stage 5, deepacau.cohort.ckf.get_ckf_cohort(), identified by at least one of three sources:

    • 🧪 An eGFR measurement between 10 and 20 mL/min/1.73 m², with a previous eGFR measurement between 10 and 30 mL/min/1.73 m² as confirmation.

    • 🏥 A billing code of CKF stage 5 (CIM10: N185).

    • 📃 A mention of CKF stage 5 in the medical record:

      • eds.matcher for terminal-stage CKF: ckf_5=r"((insuff?isanc?t?e?s?\s)|(maladie\s))renale?s?\sterminale?s?"

      • eds.contextual-matcher for CKF stage 5: same ckf5 regex with contextual stade_rg = r"stade ([12345])"

  • Dialysed at least once, deepacau.cohort.ckf.get_dialysis_cohort(): A billing code of dialysis.
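To show what the two regular expressions quoted above capture, here is a minimal sketch using only Python's standard `re` module. It does not reproduce the eds.matcher or eds.contextual-matcher pipelines. The accent-stripping step and the helper names `normalize`, `mentions_terminal_ckf` and `ckf_stage` are assumptions, as is the example note.

```python
import re
import unicodedata

# Patterns copied from the text above.
ckf_5 = r"((insuff?isanc?t?e?s?\s)|(maladie\s))renale?s?\sterminale?s?"
stade_rg = r"stade ([12345])"


def normalize(text):
    """Lowercase and strip accents so that 'rénale' matches 'renale' (assumption)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def mentions_terminal_ckf(text):
    """True if the note mentions terminal kidney failure or terminal kidney disease."""
    return re.search(ckf_5, normalize(text)) is not None


def ckf_stage(text):
    """Return the stage number following 'stade', or None if absent."""
    match = re.search(stade_rg, normalize(text))
    return int(match.group(1)) if match else None


# Invented example sentences.
note = "Patient suivi pour insuffisance rénale terminale, stade 5."
other = "Insuffisance rénale chronique stade 3."

print(mentions_terminal_ckf(note), ckf_stage(note))
print(mentions_terminal_ckf(other), ckf_stage(other))
```

The first note matches both patterns, while the second only yields a stage number, which illustrates why the stage has to be checked in context rather than relying on the terminal-stage pattern alone.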

Exclusion criteria, not yet implemented

No history of kidney replacement therapy; and at least one available measurement of systolic blood pressure, diastolic blood pressure, total calcium, phosphate, albumin, and haemoglobin.

Selection flowchart

Using eds-scikit.recipe.flowchart

_images/flowchart.svg
Trial population demographics

Patients with CKF stage 5 and dialysis, age at first dialysis.

_images/trial__age_at_first_dialysis.png
Description of the ckf5, dialysis and trial Population

Table 1 Based on deepacau.featurization.events.get_table_1() and deepacau.featurization.events.get_wide_table()

Mean (sd) for each concept and each population.#

| Concept | CKF 5 | Dialysis | Trial population |
|---|---|---|---|
| age_to_inclusion_start | 58.7277 (19.457) | 53.3167 (20.6295) | 57.1449 (18.7626) |
| chronic_kidney_failure count | 39.1662 (138.3599) | 33.1784 (126.3923) | 41.8095 (143.4094) |
| dialysis count | 141.9865 (388.7395) | 140.251 (389.1568) | 141.9865 (388.7395) |
| dp_kidney_replacement_therapy count | 5.1205 (5.1463) | 4.6961 (4.8738) | 5.1398 (5.1587) |
| dp_myocardial_infarction count | 2.7265 (2.1745) | 2.6307 (2.1413) | 2.7426 (2.2675) |
| dp_stroke count | 3.4138 (2.4893) | 3.1282 (2.1977) | 3.6567 (2.6234) |
| eGFR mean | 33.3221 (24.639) | 52.2776 (36.722) | 32.9352 (24.4086) |
| female | 0.4052 (0.4909) | 0.4492 (0.4974) | 0.3953 (0.4889) |
| kidney_transplant_complications count | 5.1758 (6.9251) | 5.3259 (7.612) | 5.2105 (6.9525) |
| mortality@1years | 0.127 (0.3329) | 0.074 (0.2618) | 0.0969 (0.2959) |
| mortality@5years | 0.2315 (0.4218) | 0.1369 (0.3438) | 0.2039 (0.4029) |
| n_visits | 77.5412 (70.6758) | 77.0354 (72.7288) | 81.1354 (72.2207) |
| n_patients | 6530 | 13823 | 5695 |

A focus on the 3-criteria subpopulation#

Inclusion with three data sources

Using Matplotlib-venn

Inclusion with three data sources
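The regions of such a three-source diagram correspond to plain set operations on patient identifiers. The sketch below uses invented identifiers and is not the deepacau implementation; only the sub-population labels are taken from the description table that follows.

```python
# Hypothetical patient identifiers flagged by each source.
nlp = {"p1", "p2", "p3", "p6"}
cim10 = {"p1", "p2", "p4"}
bio = {"p1", "p3", "p5"}

subpopulations = {
    "ckf5_nlp|cim10|bio": nlp | cim10 | bio,   # flagged by at least one source
    "ckf5_nlp&cim10&bio": nlp & cim10 & bio,   # flagged by all three sources
    "ckf5 cim10 only": cim10 - nlp - bio,      # billing code only
    "ckf5 bio only": bio - nlp - cim10,        # eGFR criterion only
    "ckf5 nlp only": nlp - cim10 - bio,        # text mention only
}

for name, patients in subpopulations.items():
    print(f"{name}: {sorted(patients)}")
```

Note that the three "only" groups and the intersection do not add up to the union: patients flagged by exactly two sources (here p2 and p3) belong to none of them.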
Description of the different inclusions

Table 1 Based on deepacau.featurization.events.get_table_1() and deepacau.featurization.events.get_wide_table()

Mean (sd) for each concept and each population.#

| Concept | ckf5_nlp \|cim10\|bio | ckf5_nlp &cim10&bio | ckf5 cim10 only | ckf5 bio only | ckf5 nlp only |
|---|---|---|---|---|---|
| age_to_inclusion_start | 58.7277 (19.457) | 57.0639 (17.8613) | 61.7224 (17.8872) | 68.4013 (18.3552) | 51.027 (19.9777) |
| chronic_kidney_failure count | 39.1662 (138.3599) | 62.3466 (179.6032) | 12.4525 (63.2396) | 6.6588 (7.9964) | 6.0909 (10.6357) |
| dialysis count | 141.9865 (388.7395) | 181.917 (444.2142) | 46.1306 (194.3021) | 8.2895 (10.0849) | 12.1818 (13.3029) |
| dp_kidney_replacement_therapy count | 5.1205 (5.1463) | 5.4426 (3.3551) | 4.9902 (3.932) | 4.7639 (4.6741) | 3.136 (4.1795) |
| dp_myocardial_infarction count | 2.7265 (2.1745) | 3.1636 (2.7154) | 1.9231 (1.2065) | 2.5517 (1.4524) | 2.6 (1.6852) |
| dp_stroke count | 3.4138 (2.4893) | 4.0667 (2.6196) | 2.0 (0.866) | 2.4118 (1.8169) | 2.0 (0.8165) |
| eGFR mean | 33.3221 (24.639) | 27.1938 (18.211) | 42.0589 (33.1835) | 28.9253 (16.9971) | 62.3343 (30.2024) |
| female | 0.4052 (0.4909) | 0.3789 (0.4851) | 0.3782 (0.4849) | 0.4128 (0.4923) | 0.4619 (0.4985) |
| kidney_transplant_complications count | 5.1758 (6.9251) | 5.318 (5.3447) | 4.4773 (5.6749) | 2.881 (2.432) | 5.3409 (9.6034) |
| mortality@1years | 0.127 (0.3329) | 0.0549 (0.2278) | 0.1226 (0.328) | 0.3294 (0.47) | 0.0632 (0.2433) |
| mortality@5years | 0.2315 (0.4218) | 0.1844 (0.3878) | 0.2245 (0.4173) | 0.4317 (0.4953) | 0.1195 (0.3244) |
| n_visits | 77.5412 (70.6758) | 83.6354 (65.8512) | 65.7686 (60.1839) | 61.9068 (66.4879) | 88.377 (85.2246) |

The cim10 and biology criteria focus on hospitalized patients, who have more laboratory measurements, which drives the mean eGFR towards low values.

Intervention/Comparator#

  • Intervention: Late initiation of dialysis after stage 5 diagnosis.

  • Comparator: Early initiation of dialysis after stage 5 diagnosis.

Different approaches are possible to define early vs late initiation. Each of them introduces a bias in the treatment effect analysis.

From initialization
  • Lead time bias: If patients are included at the time of first dialysis (from initialization inclusion), lead time bias is introduced. In this case, patients who start dialysis earlier (when the disease has not yet progressed much) have an artificial survival advantage.

Illustration from Fu et al. [2]

Lead time bias
  • Survival bias: In the from initialization inclusion method, the inclusion itself creates an artificial advantage for late starters. The late dialysis starters who are included are more robust than early starters, because fragile late starters who died before dialysis initiation are never included. Correcting this bias requires controlling for all factors that influence baseline risk, which is hard.
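Both biases can be illustrated with simple arithmetic. The numbers below are invented, with months counted from the stage 5 diagnosis; the function name `survival_from_dialysis` is hypothetical.

```python
def survival_from_dialysis(dialysis_month, death_month):
    """Survival as measured when follow-up starts at first dialysis."""
    return death_month - dialysis_month


# Lead time bias: the same patient, dying at the same date,
# under two possible dialysis start times.
death_month = 60
early_survival = survival_from_dialysis(6, death_month)
late_survival = survival_from_dialysis(24, death_month)
lead_time = early_survival - late_survival

# Survival bias: four patients planned for a late start at month 24.
# Those who die before month 24 never start dialysis and are never included.
planned_start = 24
death_months = [12, 18, 40, 60]
included = [d for d in death_months if d > planned_start]
mean_all = sum(death_months) / len(death_months)
mean_included = sum(included) / len(included)

print(f"apparent advantage of the early start: {lead_time} months")
print(f"mean survival, all late-planned patients: {mean_all} months")
print(f"mean survival, included late starters: {mean_included} months")
```

The early start appears to add 18 months of survival although the date of death is unchanged, and the included late starters look healthier (50 months) than the full group they came from (32.5 months).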

From threshold

Immortal time bias: To avoid lead time and survival biases, researchers have conducted studies in which follow-up starts from a common point in time, e.g. when each patient reaches eGFR = 20 mL/min/1.73 m². This from threshold analysis uses future information, the eGFR at dialysis, to classify patients into the exposure or control group. Thus only patients who survive until dialysis are included. Consequently, between inclusion and dialysis initiation, all included patients are immortal.

Illustration from Fu et al. [2]

Lead time bias
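A small invented cohort makes the immortal period of the from threshold design visible. Months are counted from the day eGFR first reaches 20 mL/min/1.73 m²; the cut-off of 10 used to separate early from late starters, the field names and the patients are all hypothetical.

```python
# Hypothetical cohort. dialysis_month and egfr_at_dialysis are None when the
# patient died before ever starting dialysis; death_month is None if alive.
cohort = [
    {"id": "a", "dialysis_month": 3, "egfr_at_dialysis": 15, "death_month": 30},
    {"id": "b", "dialysis_month": 20, "egfr_at_dialysis": 7, "death_month": 45},
    {"id": "c", "dialysis_month": None, "egfr_at_dialysis": None, "death_month": 8},
    {"id": "d", "dialysis_month": 18, "egfr_at_dialysis": 6, "death_month": None},
]

EGFR_CUTOFF = 10  # hypothetical threshold between early and late start


def classify(patient):
    """Assign a group from the eGFR at dialysis, i.e. from future information."""
    if patient["egfr_at_dialysis"] is None:
        return None  # never dialysed: cannot be classified, hence excluded
    return "early" if patient["egfr_at_dialysis"] > EGFR_CUTOFF else "late"


groups = {p["id"]: classify(p) for p in cohort}

# For every included patient, the months between time zero and dialysis
# are "immortal": the patient had to survive them to be classified at all.
immortal_months = {
    p["id"]: p["dialysis_month"] for p in cohort if classify(p) is not None
}
deaths_before_dialysis = sum(
    1
    for p in cohort
    if classify(p) is not None
    and p["death_month"] is not None
    and p["death_month"] < p["dialysis_month"]
)

print(groups)
print(immortal_months)
print("deaths before dialysis among included patients:", deaths_before_dialysis)
```

Patient c, who died at month 8 without dialysis, drops out of the analysis, and the late starters accumulate the longest immortal periods, which mechanically favours the late group.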

References#

[1] Bruce A. Cooper, Pauline Branley, Liliana Bulfone, John F. Collins, Jonathan C. Craig, Margaret B. Fraenkel, Anthony Harris, David W. Johnson, Joan Kesselhut, Jing Jing Li, Grant Luxton, Andrew Pilmore, David J. Tiller, David C. Harris, and Carol A. Pollock. A Randomized, Controlled Trial of Early versus Late Initiation of Dialysis. New England Journal of Medicine, August 2010. URL: https://doi.org/10.1056/NEJMoa1000552.

[2] Edouard L. Fu, Marie Evans, Juan-Jesus Carrero, Hein Putter, Catherine M. Clase, Fergus J. Caskey, Maciej Szymczak, Claudia Torino, Nicholas C. Chesnaye, Kitty J. Jager, Christoph Wanner, Friedo W. Dekker, and Merel van Diepen. Timing of dialysis initiation to reduce mortality and cardiovascular events in advanced chronic kidney disease: nationwide cohort study. BMJ, 2021. URL: https://www.bmj.com/content/375/bmj-2021-066306.