So what exactly is a risk factor, anyway?

We epidemiologists are pretty darn picky about the words we use, how we use them, and what they mean (why else would we have so many different types of incidence measures?). Sometimes nuances in phrasing are harmless, like whether we call the sport cycling, bicycling, or biking (unless you're talking to a cyclist who takes themself very seriously). Other times, what seems like nuance can make a world of difference.

When talking about measures of likelihood, it's easy to conflate risk and odds. Risk is the number of cases divided by everyone at risk; odds are the number of cases divided by the number of non-cases. In specific situations, most notably when the outcome is rare, these numbers approximate one another. But if you don't know the difference between them, or what those situations are, you can run into problems, such as severely biasing your results. If you're talking about how chewy your apple pie is, a biased answer is probably harmless. If you're interpreting results from a research study, it's not. So now that I've established a reason for being picky, let's go through an example.
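To make the risk-versus-odds distinction concrete, here's a minimal Python sketch using invented counts (not from any real study) for two hypothetical groups of 1,000 people each. It computes the risk ratio and the odds ratio for a rare outcome and for a common outcome that share the same risk ratio. The function names are my own, purely for illustration.

```python
# Hypothetical 2x2 counts, invented for illustration only (not real data).


def risk(cases: int, total: int) -> float:
    """Risk (cumulative incidence): cases divided by everyone at risk."""
    return cases / total


def odds(cases: int, total: int) -> float:
    """Odds: cases divided by non-cases."""
    return cases / (total - cases)


def compare(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return (risk ratio, odds ratio) for exposed vs. unexposed groups."""
    rr = risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)
    or_ = odds(exposed_cases, exposed_total) / odds(unexposed_cases, unexposed_total)
    return rr, or_


# Rare outcome: 2% of exposed vs. 1% of unexposed develop it
rare_rr, rare_or = compare(20, 1000, 10, 1000)

# Common outcome: 60% of exposed vs. 30% of unexposed develop it
common_rr, common_or = compare(600, 1000, 300, 1000)

print(f"Rare outcome:   RR = {rare_rr:.2f}, OR = {rare_or:.2f}")
print(f"Common outcome: RR = {common_rr:.2f}, OR = {common_or:.2f}")
```

With these made-up numbers, the rare outcome gives a risk ratio of 2.00 and an odds ratio of about 2.02, so the two are nearly interchangeable. The common outcome keeps the risk ratio at 2.00 but pushes the odds ratio to 3.50. Reading that 3.50 as "three and a half times the risk" would overstate the association considerably.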

Risk factors, correlates, covariates, predictors, causes, lions, tigers, and bears... oh wait, never mind. But let's dig deeper and clarify what distinguishes these terms. Risk factor: a commonly used but vague term (I've already used it many times in my dissertation). According to the World Health Organization, a risk factor is “any attribute, characteristic or exposure of an individual that increases the likelihood of developing a disease or injury.” But does that mean that a risk factor is a potential cause? Not necessarily.


I think it's worth taking a step back to think about context. Why even ask about risk factors in health science? From a public health standpoint, I'd argue that we ask about risk factors so that we can intervene and, hopefully, prevent future cases. As Yogi Berra once said, “It's difficult to make predictions, especially about the future.” Since we can't actually predict the future, and most of our methods of data analysis tell us about averages in groups, we instead look for characteristics that, on average, are associated with higher probabilities of the outcome of interest.

Without boring you with jargon, one of the ways we do this is through statistical adjustment in regression (simply put: controlling for the impact of other competing explanations). In exploratory studies, this is one way that risk factors are identified. When there could be confounding, one can include the confounding variable as a covariate in the model to account for that competing explanation. A covariate could also be a risk factor, or another potential cause. Or it could be something else that's associated with your outcome but isn't a potential cause.
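Here's a rough simulation sketch of what adjustment does, under assumptions I've made up for illustration: a shared cause `z` drives both a characteristic `x` and the outcome `y`, while `x` itself has no effect on `y`. For simplicity it uses a continuous outcome and ordinary least squares via NumPy, rather than the logistic or log-binomial models that are more typical for disease risk, so treat it as a conceptual sketch, not an analysis template. The helper `ols_coefficients` is a hypothetical name of my own.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)
n = 100_000

# Hypothetical data-generating process (an assumption for illustration):
# z is a shared cause of both x and y; x has NO effect on y.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.5 * z + rng.normal(size=n)


def ols_coefficients(outcome, *predictors):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones_like(outcome), *predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


crude = ols_coefficients(y, x)        # y ~ x (no adjustment)
adjusted = ols_coefficients(y, x, z)  # y ~ x + z (adjusting for the shared cause)

print(f"Crude coefficient on x:    {crude[1]:.2f}")
print(f"Adjusted coefficient on x: {adjusted[1]:.2f}")
```

The crude coefficient on `x` should land near 0.73 (that's 0.8 × 1.5 / 1.64 under this setup), so `x` looks like a risk factor. Once `z` is in the model, the coefficient on `x` drops to roughly zero: in this simulated world, `x` is a correlate that flags higher risk, not a cause.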

Think about a time someone went through a list of potential risk factors with you, like a list of risk factors for stroke. A quick look at information from the Mayo Clinic turns up a number of risk factors for diabetes. The risk factors for type 2 diabetes include weight, physical inactivity, family history, race, age, gestational diabetes, polycystic ovary syndrome, high blood pressure, and abnormal cholesterol and triglyceride levels. Some of these risk factors could be causes, some could be correlates, and some could be other health conditions that share the same causes.

If you examine other health behaviors and conditions, you'll likely see a similar mix of risk factor types. That leads me to a question: Is it worth calling something a “risk factor” if it's not a potential cause?

In my dissertation, I'm studying child maltreatment. Specifically, I'm studying whether people who experience maltreatment in childhood are more likely than other people to maltreat their own children when they become adults. This is also called intergenerational child maltreatment, continuity of child maltreatment, and more broadly, cyclical violence or adversity. It's a commonly held belief, but the research out there is inconclusive, and there's much left to study. One of the aims of my research is to test for differences in the frequency of intergenerational maltreatment between groups. While there's a lot of scientific theory (and pop psychology) out there about the causes of violence, it's not easy to name a specific cause. Some known risk factors for perpetrating child maltreatment include interpersonal violence and substance abuse, and there are known disparities in Child Protective Services (CPS) contact between racial and ethnic groups.

Are all of these risk factors? Are some of them covariates? And what exactly is the difference between a risk factor and a covariate? I honestly don't know. Knowing potential causes could be useful, since that's what eventually makes preventing the outcome possible. But aren't non-causal risk factors useful in their own right?

Do risk factors have to be a matter of personal choice? I might choose to bike without a helmet, knowing it increases my risk of a traumatic brain injury if I end up in a crash. But biking on streets without bicycling infrastructure is associated with a higher risk of crashes (and, subsequently, a higher risk of traumatic brain injury; see here and here). Most of us don't choose how streets are designed, and those of us not living in bike-friendly cities can't always find a bicycle boulevard. Talking about how risk is distributed across things people don't choose is a complicated conversation.

If a risk factor is changeable or preventable (like substance abuse, to the extent that it is preventable), wouldn't we want to help people prevent it? We can't prevent people's race or gender (at least not in a world I want to live in). We can't prevent people's family history (without superpowers). It would be a stretch to argue that someone's demographic makeup causes their illness or health behavior. But it's still useful to know these risk factors, especially if it means you can direct resources to groups at greater risk and help prevent the outcome.

All of this discussion leads me to questions I've been wrestling with for a long while: if we can't predict the future, but knowing that something increases risk allows us to prevent an adverse outcome, does it matter whether the risk factor is a cause? If something puts you at increased risk, is that enough information? And how much increased risk is necessary for action?