Selected Publications

Longitudinal targeted maximum likelihood estimation (LTMLE) has very rarely been used to estimate dynamic treatment effects in the context of time-dependent confounding affected by prior treatment when faced with long follow-up times, multiple time-varying confounders, and complex associational relationships simultaneously. Reasons for this include the potential computational burden, technical challenges, restricted modeling options for long follow-up times, and limited practical guidance in the literature. However, LTMLE has desirable asymptotic properties, i.e., it is doubly robust, and can yield valid inference when used in conjunction with machine learning. It also has the advantage of easy-to-calculate analytic standard errors, in contrast to the g-formula, which requires bootstrapping. We use a topical and sophisticated question from HIV treatment research to show that LTMLE can be used successfully in complex realistic settings, and we compare results to competing estimators. Our example illustrates the following practical challenges common to many epidemiological studies: (1) long follow-up time (30 months); (2) gradually declining sample size; (3) limited support for some intervention rules of interest; (4) a high-dimensional set of potential adjustment variables, increasing both the need and the challenge of integrating appropriate machine learning methods; and (5) consideration of collider bias.
Our analyses, as well as simulations, shed new light on the application of LTMLE in complex and realistic settings: We show that (1) LTMLE can yield stable and good estimates, even when confronted with small samples and limited modeling options; (2) machine learning utilized with a small set of simple learners (if more complex ones cannot be fitted) can outperform a single, complex model, which is tailored to incorporate prior clinical knowledge; and (3) performance can vary considerably depending on interventions and their support in the data, and therefore critical quality checks should accompany every LTMLE analysis. We provide guidance for the practical application of LTMLE.
In SIM, 2019

Effect modification and collapsibility are critical concepts in epidemiological research when assessing the validity of using regression for the estimation of causal effects. Monte Carlo simulations and code supporting the letter can be found at Luque’s GitHub repository.
In AJPH, 2019
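
Non-collapsibility of the odds ratio can be shown with a few lines of arithmetic. The sketch below is illustrative (the numbers are our own, not taken from the letter, whose simulations are provided in R): even with a perfectly balanced covariate, i.e., no confounding, the marginal odds ratio differs from the common conditional odds ratio.

```python
# Non-collapsibility of the odds ratio: even when Z is independent of
# treatment A (so Z is not a confounder), the marginal OR differs from
# the common conditional OR. Risks below are hypothetical.

def odds(p):
    return p / (1 - p)

# Outcome risks by treatment A and a balanced covariate Z
# (P(Z=1) = 0.5 in both treatment arms).
risk = {  # risk[(A, Z)]
    (1, 1): 0.80, (0, 1): 0.50,
    (1, 0): 0.50, (0, 0): 0.20,
}

# Conditional odds ratios within strata of Z: both equal 4.0
or_z1 = odds(risk[(1, 1)]) / odds(risk[(0, 1)])
or_z0 = odds(risk[(1, 0)]) / odds(risk[(0, 0)])

# Marginal risks, averaging over Z (standardization)
r1 = 0.5 * risk[(1, 1)] + 0.5 * risk[(1, 0)]  # 0.65
r0 = 0.5 * risk[(0, 1)] + 0.5 * risk[(0, 0)]  # 0.35
or_marginal = odds(r1) / odds(r0)  # about 3.45, not 4.0

print(or_z1, or_z0, round(or_marginal, 2))
```

The stratum-specific odds ratios are both 4.0, yet the marginal odds ratio is about 3.45, despite the absence of confounding.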

Classical epidemiology has focused on the control of confounding, but it is only recently that epidemiologists have started to focus on the bias produced by colliders. A collider for a certain pair of variables (e.g. an outcome Y and an exposure A) is a third variable (C) that is caused by both. In a directed acyclic graph (DAG), a collider is the variable in the middle of an inverted fork (i.e. the variable C in A → C ← Y). Controlling for, or conditioning an analysis on, a collider (i.e. through stratification or regression) can introduce a spurious association between its causes. This potentially explains many paradoxical findings in the medical literature, where established risk factors for a particular outcome appear protective. We use an example from non-communicable disease epidemiology to contextualize and explain the effect of conditioning on a collider. We generate a dataset with 1000 observations, and run Monte-Carlo simulations to estimate the effect of 24-h dietary sodium intake on systolic blood pressure, controlling for age, which acts as a confounder, and 24-h urinary protein excretion, which acts as a collider. We illustrate how adding a collider to a regression model introduces bias. Thus, to prevent paradoxical associations, epidemiologists estimating causal effects should be wary of conditioning on colliders. We provide R code in easy-to-read boxes throughout the manuscript, and a GitHub repository for the reader to reproduce our example. We also provide an educational web application allowing real-time interaction to visualize the paradoxical effect of conditioning on a collider.
In IJE, 2019
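
The simulation described above can be sketched in a few lines. The paper provides R code; below is a minimal Python re-creation with illustrative coefficients (our own, not the paper's): age confounds the sodium–blood-pressure relationship, and urinary protein excretion is a collider caused by both sodium intake and blood pressure.

```python
import numpy as np

rng = np.random.default_rng(777)
n = 1000

# Data-generating process (coefficients are illustrative):
# age confounds sodium -> SBP; urinary protein is a collider,
# caused by both sodium intake and systolic blood pressure.
age = rng.normal(65, 5, n)
sodium = age / 18 + rng.normal(0, 1, n)                    # driven by age
sbp = 1.05 * sodium + 2.0 * age + rng.normal(0, 1, n)      # true effect = 1.05
protein = 2.0 * sbp + 2.8 * sodium + rng.normal(0, 1, n)   # collider

def ols(y, X):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + list(X))
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_correct = ols(sbp, [sodium, age])            # adjusts for the confounder only
b_collider = ols(sbp, [sodium, age, protein])  # also conditions on the collider

print("sodium effect, correct model:  %.2f" % b_correct[1])
print("sodium effect, collider model: %.2f" % b_collider[1])
```

The correctly adjusted model recovers an effect near the true 1.05, while conditioning on the collider flips the estimate to a strongly negative, paradoxically "protective" value.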

When estimating the average treatment effect for a binary outcome, methods that incorporate propensity scores, the G-formula, or Targeted Maximum Likelihood Estimation (TMLE) are preferred over naïve regression approaches, which often lead to misspecified models. Some methods require correct specification of the outcome model, whereas others require correct specification of the exposure model. Doubly-robust methods require correct specification of only one of these models. TMLE is a semi-parametric doubly-robust method that improves the chances of correct model specification by allowing flexible estimation using non-parametric machine-learning methods, and it requires weaker assumptions than its competitors. We provide a step-by-step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology where assumptions about correct model specification and positivity (i.e., that no study participant has zero probability of receiving the treatment) are nearly violated. This article provides a concise and reproducible educational introduction to TMLE for a binary outcome and exposure. The reader should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. Extensive R code is provided in easy-to-read boxes throughout the article for replicability. Stata users will find a testing implementation of TMLE and additional material in the appendix and at the following GitHub repository:
In SIM, 2018
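
The step-by-step TMLE procedure for a binary outcome can be sketched compactly. The tutorial's own implementation is in R (with a Stata version); below is a minimal Python sketch on illustrative simulated data, using plain logistic regressions in place of machine learning for the nuisance models.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1 - p))

def fit_logistic(X, y, iters=50):
    """Plain Newton-Raphson logistic regression (intercept included in X)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ beta)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, X.T @ (y - p))
    return beta

# Simulated data with confounding (all coefficients are illustrative).
W1 = rng.normal(size=n)
W2 = rng.binomial(1, 0.5, n)
A = rng.binomial(1, expit(-0.5 + 0.8 * W1 + 0.6 * W2))
Y = rng.binomial(1, expit(-1.0 + 1.0 * A + 0.7 * W1 + 0.5 * W2))

X = np.column_stack([np.ones(n), W1, W2])

# Step 1: initial outcome model Q(A, W) and its predictions under A=1, A=0.
bQ = fit_logistic(np.column_stack([X, A]), Y)
QA = expit(np.column_stack([X, A]) @ bQ)
Q1 = expit(np.column_stack([X, np.ones(n)]) @ bQ)
Q0 = expit(np.column_stack([X, np.zeros(n)]) @ bQ)

# Step 2: propensity score g(W) = P(A=1 | W).
g = expit(X @ fit_logistic(X, A))

# Step 3: clever covariate and fluctuation — logistic regression of Y
# on H with offset logit(Q), fitting a single epsilon by Newton steps.
H = A / g - (1 - A) / (1 - g)
eps = 0.0
for _ in range(50):
    p = expit(logit(QA) + eps * H)
    eps += np.sum(H * (Y - p)) / np.sum(H**2 * p * (1 - p))

# Step 4: targeted predictions and the average treatment effect.
Q1s = expit(logit(Q1) + eps / g)
Q0s = expit(logit(Q0) - eps / (1 - g))
ate = np.mean(Q1s - Q0s)
print("TMLE ATE estimate: %.3f" % ate)
```

Since both nuisance models are correctly specified here, the targeted estimate stays close to the plain g-computation plug-in; the fluctuation step matters most when the initial outcome model is misspecified.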

We propose a structural framework for population-based cancer epidemiology and evaluate the performance of double-robust estimators for a binary exposure in cancer mortality. We performed numerical analyses to study the bias and efficiency of these estimators. Furthermore, we compared two different model selection strategies based on 1) the Akaike and Bayesian Information Criteria and 2) machine-learning algorithms, and illustrated double-robust estimators’ performance in a real setting. In simulations with correctly specified models and near-positivity violations, all but the naïve estimators presented relatively good performance. However, the augmented inverse-probability treatment weighting estimator showed the largest relative bias. Under dual model misspecification and near-positivity violations, all double-robust estimators were biased. Nevertheless, the targeted maximum likelihood estimator showed the best bias-variance trade-off, more precise estimates, and appropriate 95% confidence interval coverage, supporting the use of the data-adaptive model selection strategies based on machine-learning algorithms. We applied these methods to estimate adjusted one-year mortality risk differences in 183,426 lung cancer patients diagnosed after admission to an emergency department versus non-emergency cancer diagnosis in England, 2006–2013. The adjusted mortality risk (for patients diagnosed with lung cancer after admission to an emergency department) was 16% higher in men and 18% higher in women, suggesting the importance of interventions targeting early detection of lung cancer signs and symptoms.
In AJE, 2017
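
The augmented inverse-probability treatment weighting (AIPW) estimator compared above can be written in one line once the nuisance functions are available. A minimal Python sketch on illustrative simulated data follows; to keep the estimator itself in focus, the true outcome regressions and propensity score are plugged in (a real analysis would estimate them, and double robustness means only one of the two needs to be correct).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

def expit(x):
    return 1 / (1 + np.exp(-x))

# Illustrative data: W confounds both treatment A and binary outcome Y.
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.7 * W))
Y = rng.binomial(1, expit(-0.5 + 0.9 * A + 0.8 * W))

# Nuisance functions (here the true ones, for illustration):
Q1 = expit(-0.5 + 0.9 + 0.8 * W)  # E[Y | A=1, W]
Q0 = expit(-0.5 + 0.8 * W)        # E[Y | A=0, W]
g = expit(0.7 * W)                # P(A=1 | W)

# AIPW estimator of the average treatment effect: the IPW residual term
# "augments" the outcome-regression plug-in, giving double robustness.
aipw = np.mean(
    (A * (Y - Q1) / g + Q1) - ((1 - A) * (Y - Q0) / (1 - g) + Q0)
)
print("AIPW ATE estimate: %.3f" % aipw)
```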

Although smoking during pregnancy may lead to many adverse outcomes, numerous studies have reported a paradoxical inverse association between maternal cigarette smoking during pregnancy and preeclampsia. Using a counterfactual framework, we aimed to explore the structure of this paradox as being a consequence of selection bias. Using a case–control study nested in the Icelandic Birth Registry (1309 women), we show how this selection bias can be explored and corrected for. Cases were defined as any case of pregnancy-induced hypertension or preeclampsia occurring after 20 weeks’ gestation, and controls as normotensive mothers who gave birth in the same year. First, we used directed acyclic graphs to illustrate the common bias structure. Second, we used classical logistic regression and mediation analytic methods for dichotomous outcomes to explore the structure of the bias. Lastly, we performed both deterministic and probabilistic sensitivity analyses to estimate the amount of bias due to an uncontrolled confounder and corrected for it. The biased effect of smoking was estimated to reduce the odds of preeclampsia by 28% (OR 0.72, 95% CI 0.52, 0.99) and, after stratification by gestational age at delivery (<37 vs. ≥37 gestation weeks), by 75% (OR 0.25, 95% CI 0.10, 0.68). In a mediation analysis, the natural indirect effect showed an OR > 1, revealing the structure of the paradox. The bias-adjusted estimation of the smoking effect on preeclampsia showed an OR of 1.22 (95% CI 0.41, 6.53). The smoking-preeclampsia paradox appears to be an example of (1) selection bias, most likely caused by studying cases prevalent at birth rather than all incident cases from conception in a pregnancy cohort, (2) omitting important confounders associated with both smoking and preeclampsia (preventing the outcome from developing), and (3) controlling for a collider (gestation weeks at delivery).
Future studies need to consider these aspects when studying and interpreting the association between smoking and pregnancy outcomes.
In EJEP, 2016

Recent Talks

Clinical Epidemiology in the Era of the Big Data Revolution: New Opportunities.
Nov 8, 2017
Ensemble Learning Targeted Maximum Likelihood Estimation, at London.
Oct 15, 2017

Effect modification and collapsibility when estimating the effect of public health interventions: A Monte-Carlo simulation comparing classical multivariable regression adjustment versus the G-Formula, based on a cancer epidemiology illustration

The American Journal of Public Health series Evaluating Public Health Interventions offers excellent practical guidance to researchers in public health. The eighth part of the series gave a valuable introduction to effect estimation of time-invariant public health interventions. The authors of that article suggested that, in terms of bias and efficiency, there is no advantage to using modern causal inference methods over classical multivariable regression modeling. However, this statement is not always true. Most importantly, both effect modification and collapsibility are important concepts when assessing the validity of using regression for causal effect estimation.
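
The G-formula amounts to standardization: estimate stratum-specific risks under each intervention and average them over the covariate distribution. A minimal Python sketch under an illustrative data-generating process with effect modification follows (coefficients are our own, not from the letter); under effect modification, no single conditional regression coefficient equals this marginal contrast.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000

def expit(x):
    return 1 / (1 + np.exp(-x))

# Illustrative data with effect modification: the effect of A on the
# binary outcome Y differs by V (interaction term), so a single
# conditional coefficient cannot summarize the marginal effect.
V = rng.binomial(1, 0.4, n)
A = rng.binomial(1, 0.5, n)  # randomized, so no confounding
Y = rng.binomial(1, expit(-1.0 + 0.4 * A + 0.5 * V + 1.0 * A * V))

# G-formula (standardization): average the stratum-specific risks under
# "everyone treated" vs "no one treated" over the distribution of V.
risk = lambda a, v: Y[(A == a) & (V == v)].mean()
pV = V.mean()
r1 = (1 - pV) * risk(1, 0) + pV * risk(1, 1)
r0 = (1 - pV) * risk(0, 0) + pV * risk(0, 1)
rd = r1 - r0
print("marginal risk difference (g-formula): %.3f" % rd)
```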

Colliders in Epidemiology: an educational interactive web application

A collider for a certain pair of variables (outcome and exposure) is a third variable that is influenced by both of them. Controlling for, or conditioning the analysis on, a collider (i.e., through stratification or regression) can introduce a spurious association between its causes (exposure and outcome), potentially explaining why the medical literature is full of paradoxical findings [6]. In DAG terminology, a collider is the variable in the middle of an inverted fork (i.e., variable W in A → W ← Y). While this methodological note will not close the vexing gap between correlation and causation, it will contribute to the increasing awareness and the general understanding of colliders among applied epidemiologists and medical researchers.

Ensemble Learning for Model Prediction in Cancer Epidemiology

Data-adaptive ensemble learning methods based on the Super Learner, which uses cross-validation for model selection, are well suited to improving model selection and prediction in cancer epidemiology. By selecting the optimal weighted combination from a set of candidate machine-learning algorithms, ensemble learning improves model accuracy and prediction.
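
The Super Learner idea can be sketched in a few lines: obtain out-of-fold predictions from each candidate learner, then choose the convex combination of learners that minimizes cross-validated error. Below is a toy Python sketch with two deliberately simple candidates (the real SuperLearner R implementation uses rich candidate libraries and NNLS-type weight fitting; this is only the skeleton of the method).

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400

# Illustrative nonlinear regression problem.
x = rng.uniform(-2, 2, n)
y = np.sin(2 * x) + 0.5 * x + rng.normal(0, 0.3, n)

# Candidate learners (deliberately simple): an intercept-only mean
# and a straight-line OLS fit. Each returns a prediction function.
def fit_mean(xtr, ytr):
    m = ytr.mean()
    return lambda xs: np.full_like(xs, m)

def fit_line(xtr, ytr):
    b = np.polyfit(xtr, ytr, 1)
    return lambda xs: np.polyval(b, xs)

learners = [fit_mean, fit_line]

# Step 1: cross-validated (out-of-fold) predictions for each learner.
k = 5
folds = np.arange(n) % k
Z = np.zeros((n, len(learners)))
for f in range(k):
    tr, te = folds != f, folds == f
    for j, fit in enumerate(learners):
        Z[te, j] = fit(x[tr], y[tr])(x[te])

# Step 2: convex weights minimizing cross-validated MSE (grid search
# over the simplex; real implementations solve this with NNLS).
grid = np.linspace(0, 1, 101)
mse = [np.mean((y - (a * Z[:, 0] + (1 - a) * Z[:, 1]))**2) for a in grid]
alpha = grid[int(np.argmin(mse))]
print("weight on mean learner: %.2f, on line learner: %.2f" % (alpha, 1 - alpha))

# Step 3: refit each learner on all data and combine with the weights.
preds = alpha * fit_mean(x, y)(x) + (1 - alpha) * fit_line(x, y)(x)
```

By construction, the cross-validated error of the weighted ensemble is never worse than that of the best single candidate on the grid, which is the practical appeal of the method.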

Targeted Maximum Likelihood Estimation: A Tutorial for Applied Researchers

TMLE is a semiparametric doubly-robust method for causal inference that improves the chances of correct model specification by allowing flexible estimation using non-parametric machine-learning methods, and it requires weaker assumptions than its competitors.


cvAUROC is a Stata program that implements k-fold cross-validation of the area under the ROC curve (AUC) for a binary outcome after fitting a logistic regression model. Evaluating the predictive performance (AUC) of a set of independent variables using all cases from the original analysis sample tends to result in an overly optimistic estimate. K-fold cross-validation can be used to generate a more realistic estimate of predictive performance.
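
The optimism that cvAUROC corrects for is easy to demonstrate. cvAUROC itself is a Stata program; the sketch below is an illustrative Python analogue on simulated data (not the program's code), comparing the apparent AUC with a pooled k-fold cross-validated AUC.

```python
import numpy as np

rng = np.random.default_rng(9)

def expit(x):
    return 1 / (1 + np.exp(-x))

def auc(y, s):
    """Rank-based AUC (Mann-Whitney): P(case score > control score)."""
    m = len(s)
    ranks = np.empty(m)
    ranks[np.argsort(s)] = np.arange(1, m + 1)
    n1 = int(y.sum())
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * (m - n1))

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic regression with a tiny ridge for stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        pr = expit(X @ beta)
        H = (X * (pr * (1 - pr))[:, None]).T @ X + 1e-3 * np.eye(X.shape[1])
        beta += np.linalg.solve(H, X.T @ (y - pr))
    return beta

# Only 2 of 15 predictors carry signal, so in-sample discrimination
# overstates out-of-sample performance.
n, p = 300, 15
Z = rng.normal(size=(n, p))
y = rng.binomial(1, expit(0.8 * Z[:, 0] - 0.8 * Z[:, 1]))
X = np.column_stack([np.ones(n), Z])

# Apparent AUC: model fitted and evaluated on the same observations.
auc_apparent = auc(y, X @ fit_logistic(X, y))

# K-fold cross-validated AUC: each fold is scored by a model fitted on
# the remaining folds, then the pooled out-of-fold scores are ranked.
k = 10
folds = np.arange(n) % k
scores = np.empty(n)
for f in range(k):
    tr, te = folds != f, folds == f
    scores[te] = X[te] @ fit_logistic(X[tr], y[tr])
auc_cv = auc(y, scores)
print("apparent AUC: %.3f  cross-validated AUC: %.3f" % (auc_apparent, auc_cv))
```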


I am a Distance Learning Module Organizer for the MSc in Epidemiology at the LSHTM and I teach in the following short courses:

Also, I am developing software for teaching and research: