Study population
Health 2000 is a nationwide population-based survey (N = 8,028) in Finland that was performed in 2000–2001 17,28. In 2001–2002, a subsample of the Health 2000 participants, aged 45–74 years at baseline and living in six large cities (Helsinki, Turku, Tampere, Kuopio, Joensuu and Oulu) and their surroundings, was recruited for research focusing primarily on cardiovascular health (n = 1,526) 29. Of the 1,526 participants, 1,257 individuals who had no missing data in any of the variables presented in Table 1 were included in the present study. As shown in Table 1, 45% of the participants in this sample had a CVD diagnosis at baseline. More detailed information on the Health 2000 survey can be found at: https://thl.fi/en/web/thlfi-en/research-and-expertwork/projects-and-programmes/health-2000-2011.
Variables in the analysis
For this study, we selected 30 health-related variables, including the cf-DNA level, that we hypothesised to be associated with mortality (see statistical analysis). These variables are described in Table 1. The data were collected in health examinations, interviews and questionnaires in the 2000–2001 and 2001–2002 surveys. Lifestyle factors, education, and diseases other than diabetes and CVDs were available only from the 2000–2001 survey. All other data were collected in the 2001–2002 survey. The information on disease diagnoses (yes/no), smoking (yes/no), SRH, eating habits, and education level originated from the interview, and the information on alcohol consumption and exercise originated from the questionnaire.
Education level corresponds to the total number of years in school, and this variable was categorised into tertiles. SRH was assessed with a question: “Is your present state of health: poor, rather poor, moderate, rather good or good?”. The body composition described by BMI (kg/m2) was based on measured height and weight. Fasting blood samples were collected in the health examination. A question, “During the past week, how often (number of days/week) have you eaten fresh vegetables (excluding potatoes)?” was used as an indicator of habitual vegetable consumption. As an indicator of alcohol consumption level, total quantity of alcohol (in grams) consumed in a week was used30. A question, “In a typical week during your leisure time, how often do you perform for more than 10 min such a physical activity that can be considered as an intense exercise (e.g. running, aerobic, heavy outdoor housekeeping)?” was used as an indicator of physical activity.
cf-DNA and other blood biomarkers were measured in EDTA plasma collected in the 2001–2002 survey. The plasma samples were centrifuged for 20 min at 1,800 × g and stored at −70 °C. The cf-DNA level was quantified in 2012 from plasma that was thawed prior to analysis, using the method described in Jylhava et al.10. Briefly, the level of cf-DNA in plasma was measured using a Quant-iT High-Sensitivity dsDNA Assay Kit and a Qubit Fluorometer v.1 (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. The level of plasma ghrelin was measured according to Lähdeaho et al.31, and the level of plasma adiponectin according to Santaniemi et al.32. The other blood biomarkers, namely levels of apolipoproteins A1 and B, fasting glucose, insulin, HDL, LDL and total cholesterol, triglycerides, resistin, CRP, IL-6, and TNF-alpha, were analysed as described in Malo et al.33. Detailed assay dates are provided in Supplementary Table S6.
The effect of storage time on cf-DNA levels was assessed by an experimental quality control analysis. Specifically, in 2020 we re-measured cf-DNA levels in 34 EDTA-plasma samples that were first measured right after collection in 2010 27. These samples had been stored at −70 °C throughout this period and thawed only once. Absolute median differences in the cf-DNA levels measured in 2010 vs. 2020 were assessed using the Mann–Whitney U test. Spearman rank correlations between the 2010 and 2020 measurements were used to assess the degree to which the rank order of the samples was maintained.
Separate indicator variables were created for CVD and for respiratory disease: each was coded as 1 if the participant had one or more diagnoses in the respective disease group and as 0 otherwise. CVD diagnoses included myocardial infarction, coronary heart disease, heart failure, arrhythmia, hypertension, stroke, deep vein thrombosis, and other CVDs. Respiratory diseases included asthma, chronic obstructive pulmonary disease, chronic bronchitis, and other unspecified respiratory diseases. The indicator of diabetes diagnosis refers to any type of diabetes.
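The coding rule for the disease indicators can be illustrated with a minimal Python sketch. The field names below are hypothetical (the survey's actual variable names are not given here), and the snippet is only an illustration of the 0/1 coding, not the code used in the study.

```python
# Hypothetical field names; the survey's real variable names are not stated in the text.
CVD_DIAGNOSES = (
    "myocardial_infarction", "coronary_heart_disease", "heart_failure",
    "arrhythmia", "hypertension", "stroke", "deep_vein_thrombosis", "other_cvd",
)
RESPIRATORY_DIAGNOSES = (
    "asthma", "copd", "chronic_bronchitis", "other_respiratory",
)


def disease_indicator(record, diagnoses):
    """Return 1 if the record reports at least one of the listed diagnoses, else 0."""
    return int(any(record.get(name, False) for name in diagnoses))


# Illustrative participant record (made up for this example).
participant = {"hypertension": True, "asthma": False}
cvd = disease_indicator(participant, CVD_DIAGNOSES)
respiratory = disease_indicator(participant, RESPIRATORY_DIAGNOSES)
```

A diagnosis that is absent from the record is treated as "no" in this sketch; in the study itself, participants with missing data were excluded beforehand.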
Dates of death were drawn on the 31st of December 2017 from the National Register on Causes of Death maintained by Statistics Finland. Mean length of the all-cause mortality follow-up was 15 (standard deviation 0.5) years.
Statistical analysis
The difference in each study variable (in Table 1) between survivors and non-survivors was analysed using the Mann–Whitney U test for continuous variables and Pearson’s chi-squared test for categorical variables. Correlations between cf-DNA and the other continuous variables, and thus potential collinearities in the survival model, were explored using Spearman’s rank correlation coefficients. The correlation matrix was ordered using hierarchical clustering and visualised as a heatmap using the R package ggcorrplot v0.1.3.
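As a reminder of what Spearman's coefficient measures, the following Python sketch computes it as the Pearson correlation of tie-averaged ranks. It uses only the standard library and is an illustration under that textbook definition; the analyses themselves were run in R, and the function names here are our own.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotone transformation of a variable leaves the coefficient unchanged, which is why it suits skewed biomarker distributions.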
The relationship between cf-DNA and mortality was analysed and visualised using Kaplan–Meier cumulative survival curves. First, participants were categorised into two groups so that individuals in the highest gender-wise cf-DNA quartile were assigned to the group “elevated cf-DNA levels” and all other individuals to the group “cf-DNA level is in the normal range” (Fig. 1). Then, to analyse whether the cf-DNA level exhibits a dose-responsive relationship with mortality, it was categorised into tertiles (Supplementary Fig. 2). Differences between elevated and normal cf-DNA levels, as well as across the cf-DNA tertiles, were assessed using the log-rank test. In all other analyses, cf-DNA was treated as a continuous variable. For the subsequent Cox models (see below), cf-DNA values were multiplied by 10 so that the HR of cf-DNA would represent the risk associated with a 0.1 μg/ml increase in the cf-DNA level.
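The grouping and the Kaplan–Meier estimate described above can be sketched in Python as follows. This is a simplified illustration, not the study code: the function names are our own, the quartile cut-off relies on the default interpolation of statistics.quantiles (the quantile definition used in the study is not stated), and the toy data are made up.

```python
from statistics import quantiles


def elevated_flags(values, genders):
    """Flag values above the upper-quartile cut-off computed within each gender."""
    cutoffs = {}
    for gender in set(genders):
        group = [v for v, g in zip(values, genders) if g == gender]
        cutoffs[gender] = quantiles(group, n=4)[2]  # third cut point = upper quartile
    return [v > cutoffs[g] for v, g in zip(values, genders)]


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Toy follow-up times (years) and event indicators, for illustration only.
toy_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Computing the cut-off within each gender ensures that the "elevated" group contains roughly the top quarter of men and the top quarter of women, rather than being dominated by whichever gender has higher levels overall.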
Using Cox regression, we first analysed the univariate associations of age and gender with mortality, and then adjusted the analysis of age for gender and vice versa. We then analysed all the variables in Table 1 individually for their associations with mortality, adjusting each model for age and gender. Those variables that remained significant were then entered simultaneously into a multivariate Cox model. Variables that remained significant (p < 0.05) in this multivariate Cox model were kept in the model, yielding a final mortality prediction model. Because cf-DNA has attracted attention in CVD medicine as a prognostic tool, and the sample by design includes a high proportion of participants with CVD, the association between mortality and cf-DNA in the fully adjusted final model was also analysed stratified by CVD status. The proportional hazards assumption (i.e. independence of time) for the final Cox model and for each of the predictors in the model was evaluated using diagnostics based on the Schoenfeld residual correlation statistics. This was performed using the cox.zph() function in the R package survival v2.44-1.1.
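For reference, the Cox model and the effect of rescaling a predictor before fitting can be written as follows. The notation (x for the cf-DNA level, z for the remaining covariates, k for the scaling constant) is introduced here for illustration and does not come from the original analysis.

```latex
\[
h(t \mid x, \mathbf{z}) = h_0(t)\,\exp\!\left(\beta x + \boldsymbol{\gamma}^{\top}\mathbf{z}\right),
\qquad \mathrm{HR}_{x} = e^{\beta}
\]
\[
x^{*} = k\,x \;\Longrightarrow\; \beta^{*} = \frac{\beta}{k},
\qquad \mathrm{HR}_{x^{*}} = e^{\beta/k} = \left(\mathrm{HR}_{x}\right)^{1/k}
\]
```

In words, multiplying a predictor by a constant k leaves the fitted model unchanged but makes the reported HR correspond to an increase of 1/k original units rather than one full unit.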
Lastly, we analysed the added value of cf-DNA by using the following approaches suitable for censored data and nested models.
Harrell’s C
We first assessed the predictive accuracy of each final Cox model variable individually and of the full final model with and without cf-DNA. For this purpose, univariate Cox models were fitted for each of the final model predictors, and the full final model was fitted with and without cf-DNA. Harrell’s C statistic was then calculated for each model using the cindex() function in the R package dynpred. Harrell’s C is a concordance index appropriate for right-censored survival data: it assesses the agreement (concordance) between predictions and outcomes by comparing events and non-events, while also accounting for events that occurred at different points in time.
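The idea behind Harrell’s C can be shown with a short Python sketch. This is a simplified textbook version written for illustration, not the dynpred implementation: a pair of participants is comparable when the one with the shorter follow-up had an observed death, pairs with tied follow-up times are skipped, and tied risk scores count as half-concordant.

```python
def harrell_c(times, events, risk_scores):
    """Concordance index for right-censored data.

    A higher risk score is expected to go with an earlier death.
    events: 1 = death observed, 0 = censored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier member of a comparable pair must have died
        for j in range(n):
            if times[i] < times[j]:  # i died while j was still under follow-up
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data for illustration only: four participants, one of them censored.
toy_c = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.1, 0.3, 0.7])
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking, so the added value of a predictor shows up as the increase in C when it is included in the model.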
LR test
The LR test was used to assess whether the addition of cf-DNA to the final model improved the model fit. The LR test was performed using the anova() function in the R package survival.
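For two nested models that differ by a single parameter, such as the final model with and without cf-DNA, the LR statistic is twice the difference in (partial) log-likelihood and is referred to a chi-squared distribution with one degree of freedom. The Python sketch below illustrates this arithmetic; the log-likelihood values are made-up numbers chosen for the example, not results from the study.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """LR test for nested models differing by one parameter (1 degree of freedom).

    Returns the test statistic and its p-value.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    # Chi-squared upper tail with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
lr_statistic, lr_p = likelihood_ratio_test(-1000.0, -998.0792)
```

With these illustrative inputs the statistic is about 3.84, which sits at the conventional 5% significance threshold for one degree of freedom.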