Research areas

Statistical methods for oral health research

Oral health is essential for overall well-being, but did you know that oral diseases affect more than half of the people in the world? Despite the high prevalence of oral diseases and the many statistical challenges that arise in analyzing complex oral health data, methodological research motivated by these challenges is limited. I develop statistical methods and tools to answer oral health questions arising from complex multilevel data.

Biased sampling designs in complex surveys and observational studies

To study the relationship between a disease and its risk factors, we collect information from a sample of individuals. Ideally, this sample should represent the population we want to study. But when time and money are limited, the sample must be small. To make meaningful inferences about the entire target population, including its smaller subgroups, we need to purposefully oversample people from under-represented groups and then correct for the resulting sampling bias using appropriate statistical methods. I study the implications of using biased sampling designs under special circumstances.
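As a minimal sketch of the basic idea behind "correcting for the sampling bias", the Python below applies inverse-probability (design) weighting to a stratified sample in which a small group is oversampled. This is the textbook correction, not the specific methods studied here; the `Stratum` class, the stratum sizes, and the outcome values are all hypothetical and assume the population size of each stratum is known.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    population_size: int         # N_h: people in this stratum (assumed known)
    sample_values: list[float]   # outcomes (e.g. 1 = disease, 0 = no disease) for sampled people


def naive_mean(strata):
    # Ignores the design: every sampled person counts equally.
    values = [v for s in strata for v in s.sample_values]
    return sum(values) / len(values)


def weighted_mean(strata):
    # Each sampled person in stratum h stands in for N_h / n_h people,
    # i.e. the inverse of their probability of being sampled.
    total = 0.0
    weight_sum = 0.0
    for s in strata:
        n_h = len(s.sample_values)
        w = s.population_size / n_h
        total += w * sum(s.sample_values)
        weight_sum += w * n_h
    return total / weight_sum


# Hypothetical design: 10 people sampled from each stratum, so the
# minority stratum (1,000 people) is heavily oversampled relative to
# the majority stratum (9,000 people).
strata = [
    Stratum("majority", 9000, [1] + [0] * 9),      # 10% prevalence in sample
    Stratum("minority", 1000, [1] * 5 + [0] * 5),  # 50% prevalence in sample
]
print(naive_mean(strata))     # biased toward the oversampled group
print(weighted_mean(strata))  # reweighted to the population composition
```

The unweighted mean treats the oversampled group as if it made up half the population; weighting by N_h / n_h restores each stratum's true share, which is why oversampling and weighting go hand in hand.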

Multiple imputation for missing data

Multiple imputation (MI) is a common approach to handling missing data in observational and experimental studies. MI is straightforward to implement with cross-sectional data that contain no derived variables, such as interaction terms. But if I want to include an interaction effect in the analysis, and one or both of the variables that make up the interaction are missing, how should I impute them? Should I impute the original variables and then compute the interaction, or impute the interaction term directly? And what if I have clustered or longitudinal data with missing outcomes? How do I incorporate the correlation between observations into the imputation model? These are the questions I try to answer.
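Whatever imputation model is chosen, a standard MI analysis ends by fitting the analysis model to each of the M completed datasets and pooling the results with Rubin's rules. The Python below is a minimal sketch of that pooling step only; the function name `pool_rubin` and the example numbers are hypothetical, and it omits the degrees-of-freedom adjustment used for confidence intervals.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lists")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance
    return q_bar, t, math.sqrt(t)


# Hypothetical regression coefficient estimated in M = 3 imputed datasets,
# each with squared standard error 0.5.
est, total_var, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, total_var, se)
```

The between-imputation term `b` is what carries the extra uncertainty due to missingness; an imputation model that ignores, say, an interaction or within-cluster correlation can distort both the pooled estimate and this variance.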

Modeling agreement in cancer screening

Radiologists read mammograms to screen for breast cancer, but how consistent are the results between two radiologists, or three? Traditional agreement statistics, such as Cohen's kappa, cannot measure agreement among many radiologists who each read many mammograms. A generalized linear mixed-effects model (GLMM) can be used to compute a comprehensive summary measure of agreement (and of association, if the screening result is an ordinal score) among many radiologists' mammogram readings. I also authored an R package, modelkappa, that calculates GLMM-based agreement and association.
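To show why a pairwise statistic falls short, the Python below is a minimal sketch of Cohen's kappa for exactly two raters. It is not the GLMM-based measure and does not reproduce the modelkappa package; the "recall"/"normal" labels and the ten readings are hypothetical. With many radiologists, kappa would have to be computed for every pair separately, which is the limitation a single model-based summary avoids.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each classify the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    # Observed agreement: share of items given the same category by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, computed from each rater's marginal category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical readings of the same 10 mammograms by two radiologists.
radiologist_1 = ["recall"] * 4 + ["normal"] * 6
radiologist_2 = ["recall"] * 3 + ["normal", "recall"] + ["normal"] * 5
print(cohens_kappa(radiologist_1, radiologist_2))
```

Here the two radiologists agree on 8 of 10 films, but kappa discounts the agreement expected by chance from how often each one calls "recall", giving a value well below 0.8.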

Selected publications

A complete list of my published works is available from my Google Scholar page.

R packages