Cox Proportional Hazard

Survival regression with covariates

The Cox proportional hazards model is perhaps the most common survival model. In several ways it resembles traditional linear regression, with two main differences:

1. We regress against the hazard rate (the probability of dying, conditional on still being alive).
2. We use the exponential of $x\beta$, i.e. $e^{x\beta}$, instead of just $x\beta$, to ensure the values are always positive.
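A minimal sketch of the second point in plain Python: the hazard for a subject with covariates `x` is the baseline hazard multiplied by $e^{x\beta}$, so even a negative linear predictor yields a positive hazard. The function name `hazard` and all numbers are illustrative assumptions, not part of any library.

```python
import math

def hazard(baseline: float, x: list[float], beta: list[float]) -> float:
    """Cox-style hazard at one time point: h0(t) * exp(x . beta)."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return baseline * math.exp(linear_predictor)

# Hypothetical inputs: baseline hazard 0.05 and two covariates.
beta = [0.7, -1.2]
x = [0.0, 2.0]

xb = sum(xi * bi for xi, bi in zip(x, beta))
print(f"x.beta = {xb}")                         # negative: unusable as a hazard on its own
print(f"hazard = {hazard(0.05, x, beta):.5f}")  # exp(x.beta) keeps the result positive
```

When every covariate is zero, `hazard` returns the baseline unchanged, which is why the baseline plays a role similar to an intercept.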

The key concept is the set of members of the population still at risk (the risk set). The regression works by predicting, for each time period, the probability that a subject converts/dies, conditional on not yet having died/converted.
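The risk-set idea is what the Cox partial likelihood is built on. At each observed event time, the subject who died contributes its hazard divided by the summed hazard of everyone still at risk; $h_0(t)$ appears in both and cancels, so $\beta$ can be fitted without ever estimating the baseline. The sketch below assumes a single covariate, no tied event times, and a small hypothetical dataset (`DATA`), and fits $\beta$ with Newton-Raphson. The helper names are hypothetical, and real implementations additionally handle ties, many covariates and standard errors.

```python
import math

# Hypothetical toy data: (time, event, x). event=1 means death/conversion
# was observed; event=0 means censored (still alive when observation stopped).
DATA = [
    (2.0, 1, 1.0),
    (3.0, 1, 0.0),
    (4.0, 0, 1.0),
    (5.0, 1, 1.0),
    (6.0, 1, 0.0),
    (8.0, 0, 0.0),
]

def risk_set(data, t):
    """Subjects still at risk at time t: everyone whose recorded time is >= t."""
    return [row for row in data if row[0] >= t]

def log_partial_likelihood(beta, data):
    """Cox log partial likelihood for one covariate, assuming no tied event times."""
    total = 0.0
    for time, event, x in data:
        if event:
            denom = sum(math.exp(xj * beta) for _, _, xj in risk_set(data, time))
            total += x * beta - math.log(denom)
    return total

def score_and_information(beta, data):
    """First derivative of the log partial likelihood and its negative second derivative."""
    score, info = 0.0, 0.0
    for time, event, x in data:
        if not event:
            continue
        weights = [(math.exp(xj * beta), xj) for _, _, xj in risk_set(data, time)]
        s0 = sum(w for w, _ in weights)
        s1 = sum(w * xj for w, xj in weights)
        s2 = sum(w * xj * xj for w, xj in weights)
        mean_x = s1 / s0              # hazard-weighted mean covariate in the risk set
        score += x - mean_x
        info += s2 / s0 - mean_x ** 2  # hazard-weighted variance in the risk set
    return score, info

def fit_beta(data, iterations=25):
    """Newton-Raphson on the log partial likelihood, starting from beta = 0."""
    beta = 0.0
    for _ in range(iterations):
        score, info = score_and_information(beta, data)
        beta += score / info
    return beta

beta_hat = fit_beta(DATA)
print(f"beta_hat = {beta_hat:.4f}, hazard ratio = {math.exp(beta_hat):.4f}")
```

Censored subjects never contribute a numerator term, but they still appear in the denominators for as long as they remain in the risk set.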

The model is semi-parametric. This means it places no restriction on how the hazard changes through time; however, it does restrict how features can affect the hazard. Specifically, features must have a constant, proportional effect (hence the name).

For example, if smoking increases the hazard of getting cancer, it must do so by a fixed proportion in every time period: if smoking makes cancer twice as likely, it must make cancer twice as likely in every period. See [this post](post/testing_proportional_hazards) for more on testing whether this assumption holds.
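To make the smoking example concrete, the sketch below uses a baseline hazard that rises and then falls over time (an arbitrary, hypothetical shape), while the smoking coefficient is set to $\ln 2$. The ratio between smokers and non-smokers stays fixed at 2 in every period even though the baseline itself changes. The names `baseline_hazard`, `BETA_SMOKING` and `hazard` are illustrative assumptions.

```python
import math

def baseline_hazard(t: float) -> float:
    """Hypothetical baseline hazard that rises, then falls, over time.
    The Cox model places no restriction on this shape."""
    return 0.02 * t * math.exp(-t / 10)

BETA_SMOKING = math.log(2)  # hypothetical: smoking doubles the hazard

def hazard(t: float, smoker: int) -> float:
    """h(t|x) = h0(t) * exp(x * beta) with a single binary covariate."""
    return baseline_hazard(t) * math.exp(smoker * BETA_SMOKING)

for t in [1, 5, 10, 20, 40]:
    ratio = hazard(t, 1) / hazard(t, 0)
    print(f"t={t:>2}  h0={baseline_hazard(t):.4f}  smoker/non-smoker ratio={ratio:.3f}")
```

If real data showed the ratio drifting over time instead, the proportional hazards assumption would be violated.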


$$h(t|x) = h_0(t)e^{x\beta}$$

$h_0(t)$ represents the “baseline” hazard, the hazard experienced by the reference group (all covariates equal to zero, so $e^{x\beta} = 1$). It can be thought of as similar to the intercept in traditional regression, although it cannot be negative (a negative hazard makes no sense in the context of our model). The exponential function is used to prevent negative values for the same reason.

With this equation in mind, the proportional nature of the model is easy to see if we divide the hazard rates for two different covariate values, $x_0$ and $x_1$. The baseline hazard cancels immediately, and the ratio of exponentials becomes a single exponential of the difference:

$$\frac{h(t|x_0)}{h(t|x_1)} = \frac{h_0(t)e^{x_0\beta}} {h_0(t)e^{x_1\beta}} = e^{(x_0 - x_1)\beta}$$
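A quick numerical check of this derivation, using hypothetical coefficients and covariate vectors: the ratio computed directly from $h_0(t)e^{x\beta}$ matches $e^{(x_0 - x_1)\beta}$ whatever value the baseline takes, and a one-unit increase in a single covariate $k$ (others held fixed) multiplies the hazard by $e^{\beta_k}$. The helper names are illustrative only.

```python
import math

def linear_predictor(x, beta):
    """x . beta for plain Python lists."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def hazard_ratio(x0, x1, beta):
    """h(t|x0) / h(t|x1) = exp((x0 - x1) . beta); no baseline hazard needed."""
    diff = [a - b for a, b in zip(x0, x1)]
    return math.exp(linear_predictor(diff, beta))

# Hypothetical coefficients and covariate vectors.
beta = [0.5, -0.3, 1.1]
x0 = [1.0, 2.0, 0.0]
x1 = [0.0, 2.0, 1.0]

closed_form = hazard_ratio(x0, x1, beta)
for h0 in [1e-4, 0.01, 0.3, 5.0]:  # arbitrary baseline values at some time t
    direct = (h0 * math.exp(linear_predictor(x0, beta))) / (h0 * math.exp(linear_predictor(x1, beta)))
    print(f"h0={h0:<6}  direct ratio={direct:.6f}  closed form={closed_form:.6f}")

# A one-unit increase in covariate 0 (others fixed) multiplies the hazard by exp(beta[0]).
print(hazard_ratio([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], beta), math.exp(beta[0]))
```

Note that the second covariate has the same value in `x0` and `x1`, so its coefficient drops out of the ratio entirely.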