Definition and use of survival analysis
Construction of the survival curve using the life-table and the Kaplan-Meier methods
Comparison of the curves graphically by visual inspection
Interpretation and testing of survival curves
Key Words and Terms:
Survival analysis is the study of the occurrence and timing of events. Covariates are studied to determine their effect
on survival duration. Although its methods apply to both retrospective and prospective data, they are best suited to the latter. Two features
of survival analysis are not found in conventional statistics: censoring and time-dependent covariates (time-varying explanatory
variables).
Three methods of survival analysis are commonly used: the life-table method, the Kaplan-Meier method, and proportional
hazards regression.
Survival curves are used for a preliminary examination of the data. The median survival time can be read off the curves. Visual
inspection can show whether there are obvious differences between two groups and whether those differences are increasing over time.
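Reading the median off a curve amounts to finding the earliest time at which the survival probability falls to 0.5 or below. A minimal Python sketch, assuming the curve is supplied as a hypothetical list of (time, survival) points sorted by time; the function name `median_survival` and the data are illustrative only:

```python
from typing import Optional, Sequence, Tuple


def median_survival(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Return the earliest time at which survival is 0.5 or below.

    `curve` holds (time, cumulative survival) points of a step curve,
    sorted by time. Returns None when the curve never reaches 0.5,
    in which case the median cannot be read off the curve.
    """
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None


# Hypothetical curve: time in months, cumulative survival probability.
curve = [(0, 1.0), (3, 0.90), (6, 0.72), (9, 0.55), (12, 0.44), (18, 0.30)]
print(median_survival(curve))  # 12
```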
Survival analysis is used in the follow-up of patients receiving various experimental therapies. It is also used
to evaluate survival after the diagnosis of specific diseases and to summarize and compare mortality in different
groups. The methods extend to non-medical uses such as survival of animals in drug trials, survival
of light bulbs, survival of machine tools and other equipment, survival of friendships, time to promotion, and time to divorce.
The techniques of survival analysis are employed in various disciplines, for example event history analysis in sociology, reliability
analysis and failure time analysis in engineering, and duration analysis in economics.
MEASUREMENT OF TIME
There are several ways of measuring time to the event of interest. Time may be measured as a duration, for example
time since birth (age), time since a given event, or time since the last occurrence of the same event. Time may also be measured
as calendar time, although this is less common in clinical trials. Examples of time periods include time to relapse, survival
after relapse, time to death, and time to infection or any other complication. In survival analysis our interest is in survival
duration, which is usually the time measured from zero time until the event of interest (failure/relapse, death, first response)
or until censoring. Zero time may be defined as the point in time when the hazard starts operating,
the point of randomization, the time of enrolment into the study, the date of the first visit, the date of the first symptoms,
the date of diagnosis, or the date of starting treatment. The best zero time is the point of randomization. Using the time of
diagnosis or the start of treatment may introduce bias, because socio-economic factors may determine access to diagnostic and treatment
facilities. Survival duration is computed by subtracting the zero time from the time of failure or censoring. Thus we may be
interested in the time from the start of treatment to the first response. Sometimes the interest is in the length of remission
(remission duration), and sometimes in the tumor-free time. Survival can be described as relative survival or absolute
survival. Relative survival compares the 1-year survival of trial subjects with that of the general population. Absolute survival is the
proportion of trial subjects who survive to a given time, such as 5 years. Absolute survival is more commonly used.
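The subtraction of zero time from the time of failure or censoring can be sketched in Python as follows. This assumes zero time is the randomization date and durations are counted in days; the function `survival_duration` and the dates are hypothetical. The returned flag records whether the event was observed (True) or the subject was censored (False).

```python
from datetime import date
from typing import Optional, Tuple


def survival_duration(zero_time: date, event_date: Optional[date],
                      last_contact: date) -> Tuple[int, bool]:
    """Return (duration in days, event observed) for one subject.

    If event_date is None the subject is censored, and the duration runs
    from zero time to the date of last contact instead.
    """
    end = event_date if event_date is not None else last_contact
    duration = (end - zero_time).days
    if duration < 0:
        raise ValueError("end date precedes zero time")
    return duration, event_date is not None


# Hypothetical subjects; zero time is taken to be the randomization date.
print(survival_duration(date(2020, 1, 15), date(2020, 7, 1), date(2020, 7, 1)))  # (168, True)
print(survival_duration(date(2020, 3, 1), None, date(2020, 12, 31)))             # (305, False)
```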
PROBLEM OF CENSORING
A central problem in survival analysis is censoring. Censoring occurs when an individual is not followed up until the occurrence
of the event of interest. Censoring leads to loss of information due to incomplete observation. Those not followed up fully
may have a different experience, which could bias the study. Censoring is caused by loss to follow-up, withdrawal
from the study, study termination when subjects had different dates of enrolment, or death due to a competing
risk. Censored observations contribute to the analysis until the time of censoring. The analysis of censored data assumes
that if censored subjects had been followed beyond the point in time at which they were censored, they would have had the
same rates of outcomes as those not censored at that time. Similar censoring patterns in the different treatment
groups suggest that this assumption holds.
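A minimal sketch of how censored subjects enter the analysis, assuming each subject is stored as a hypothetical (group, duration, event) record with event=False for a censored observation. A subject counts in the number at risk only up to its own duration, and the share censored per group gives a crude first look at whether censoring patterns are similar (it ignores when the censoring happens). All names and data are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (group, duration, event), event=False if censored.
Record = Tuple[str, float, bool]


def number_at_risk(records: List[Record], t: float) -> int:
    """Count subjects still under observation at time t.

    A subject counts while t <= its duration, so a censored subject
    contributes only up to its censoring time.
    """
    return sum(1 for _, duration, _ in records if duration >= t)


def censored_fraction_by_group(records: List[Record]) -> Dict[str, float]:
    """Share of censored observations in each group (ignores timing)."""
    totals: Dict[str, int] = defaultdict(int)
    censored: Dict[str, int] = defaultdict(int)
    for group, _, event in records:
        totals[group] += 1
        if not event:
            censored[group] += 1
    return {group: censored[group] / totals[group] for group in totals}


records = [("A", 5, True), ("A", 8, False), ("A", 12, True), ("A", 15, False),
           ("B", 3, True), ("B", 7, False), ("B", 10, True), ("B", 14, True)]
print(number_at_risk(records, 8))           # 5
print(censored_fraction_by_group(records))  # {'A': 0.5, 'B': 0.25}
```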
5.3.2 THE LIFE TABLE METHOD
MANUAL CONSTRUCTION OF THE LIFE-TABLE (8 COLUMNS)
Column #1 is the time at the start of the time interval. The first row of the table is assigned time 0. Column
#2 is the number of subjects under observation at the start of the time interval, O. Column #3 is the number who died during
the time interval, D. Column #4 is the number withdrawn during the time interval, W. Withdrawals are considered to occur at
the start of the time interval. We assume that there are no secular trends in the risk of death in different calendar periods,
and that those who withdraw and those who stay under observation have the same probability of death. Column #5 is the number
under observation during the interval, computed as O - W. Column #6 is the probability of dying in the interval, computed
as P = D / (O - W). Column #7 is the probability of surviving to the end of the interval, given survival to its start, computed
as Q = 1 - P. Column #8 is the cumulative probability of survival from time 0 until the end of the interval. Cumulative survival
starts at 1.0 at time 0, and each row's value is computed by multiplying that row's Q by the cumulative survival probability of
the prior row. The survival probabilities in column #8 are plotted against time in column #1 to generate a survival curve. Two or
more curves can be generated depending on the treatment or experimental groups.
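The eight-column construction can be sketched in Python as below. This is an illustration under stated assumptions, not a standard routine: the input format (a starting cohort size plus one (start time, D, W) tuple per interval) and all names are hypothetical, withdrawals are taken to occur at the start of the interval as in the text, and O for each interval is the previous O minus its deaths and withdrawals.

```python
from typing import List, NamedTuple, Tuple


class LifeTableRow(NamedTuple):
    start: float         # column 1: time at start of interval
    observed: int        # column 2: O, under observation at start
    deaths: int          # column 3: D, died during interval
    withdrawn: int       # column 4: W, withdrawn during interval
    at_risk: int         # column 5: O - W
    p_death: float       # column 6: P = D / (O - W)
    q_survive: float     # column 7: Q = 1 - P
    cum_survival: float  # column 8: survival from time 0 to end of interval


def life_table(n_start: int,
               intervals: List[Tuple[float, int, int]]) -> List[LifeTableRow]:
    """Build life-table rows from (start time, deaths, withdrawals) tuples."""
    rows: List[LifeTableRow] = []
    observed = n_start
    cum_survival = 1.0  # survival probability at time 0
    for start, deaths, withdrawn in intervals:
        at_risk = observed - withdrawn  # withdrawals assumed at interval start
        if at_risk <= 0:
            raise ValueError(f"no subjects at risk in interval starting at {start}")
        p_death = deaths / at_risk
        q_survive = 1.0 - p_death
        cum_survival *= q_survive
        rows.append(LifeTableRow(start, observed, deaths, withdrawn,
                                 at_risk, p_death, q_survive, cum_survival))
        observed -= deaths + withdrawn  # becomes O for the next interval
    return rows


# Hypothetical cohort of 100 subjects followed over three yearly intervals.
table = life_table(100, [(0, 10, 5), (1, 8, 7), (2, 6, 4)])
for row in table:
    print(row.start, row.observed, row.deaths, row.withdrawn, row.at_risk,
          round(row.p_death, 3), round(row.q_survive, 3),
          round(row.cum_survival, 3))
```

With these hypothetical figures, column #8 falls to about 0.895, 0.803 and 0.730 over the three intervals.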
The life-table method works well with large data sets and when the time of occurrence of an event cannot be measured
precisely. It has the advantage of allowing a credible analysis without knowing the exact times of censoring or withdrawal.
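The life-table method is not efficient in handling withdrawals, and this could be a source of bias. The choice of the
interval is also arbitrary. The method assumes that withdrawals occur at a fixed point in each interval (the start, in the
construction above; the common actuarial variant assumes the mid-point and uses O - W/2 as the number at risk), which may not be the case.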
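To see why the placement of withdrawals matters, the sketch below computes column #6 under the start-of-interval assumption and under the mid-interval (actuarial) adjustment, which counts each withdrawal as half a subject at risk. The function name and the figures are hypothetical.

```python
def interval_death_probability(observed: int, deaths: int, withdrawn: int,
                               withdrawal_point: str = "start") -> float:
    """Probability of death in one interval (column #6).

    withdrawal_point="start": all withdrawals leave before exposure,
        so the number at risk is O - W.
    withdrawal_point="mid": each withdrawal is counted as exposed for half
        the interval, so the number at risk is O - W/2 (actuarial adjustment).
    """
    if withdrawal_point == "start":
        at_risk = observed - withdrawn
    elif withdrawal_point == "mid":
        at_risk = observed - withdrawn / 2
    else:
        raise ValueError("withdrawal_point must be 'start' or 'mid'")
    if at_risk <= 0:
        raise ValueError("no subjects at risk")
    return deaths / at_risk


# Hypothetical interval: O = 100, D = 10, W = 20.
print(interval_death_probability(100, 10, 20, "start"))  # 0.125
print(interval_death_probability(100, 10, 20, "mid"))    # 10/90, about 0.111
```

The more withdrawals an interval has, the larger the gap between the two estimates.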
5.3.3 THE KAPLAN-MEIER METHOD
INTRODUCTION AND DEFINITION
The Kaplan-Meier (KM) method defines a risk set at each time a failure occurs and computes the conditional probability
of death at that time, given survival up to just before it.
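In the usual notation, where d_i is the number of deaths at time t_i and n_i is the number at risk just before t_i, the KM estimate of cumulative survival is the product of the conditional survival probabilities at the event times up to t:

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Times at which only withdrawals occur have d_i = 0, so they leave the product unchanged and only reduce n_i at later event times.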
MANUAL CONSTRUCTION OF THE KM TABLE
Column #1 is the time of occurrence of an event, t_i. It is an exact time, not a time interval. It
is not fixed in advance but is defined by the occurrence of deaths or withdrawals. Deaths and withdrawals occur at different times.
The notation t_i refers to any time at which a death, withdrawal, or censoring occurs.
Column #2 is the number of subjects at risk at time t_i. This number decreases progressively down the column
as deaths, withdrawals, and censored observations are subtracted. Column #3 is the
number of deaths at time t_i. Column #4 is the number of withdrawals at time t_i. Column #5 is the probability of death at time
t_i, computed as the number of deaths at time t_i (column #3) divided by the number at risk just before time t_i (column #2).
Withdrawals are recorded in the table but are treated as non-events: a withdrawal affects only the number at risk when the next
death occurs. Column #6 is the probability of survival at time t_i, computed as 1 minus the probability of death at time t_i.
Column #7 is the cumulative survival from time 0 to time t_i. It is computed by multiplying the row's probability of
survival (column #6) by the cumulative survival of the previous row.
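The seven-column KM table can be sketched in Python as follows, assuming hypothetical input as a list of (time, died) pairs where died=False marks a withdrawal or censored observation. A subject whose time equals t_i is counted as at risk at t_i, which is the usual convention if a withdrawal and a death share a time. Names and data are illustrative.

```python
from typing import List, NamedTuple, Tuple


class KMRow(NamedTuple):
    time: float          # column 1: t_i
    at_risk: int         # column 2: number at risk just before t_i
    deaths: int          # column 3: deaths at t_i
    withdrawals: int     # column 4: withdrawals at t_i
    p_death: float       # column 5: deaths / at_risk
    p_survive: float     # column 6: 1 - p_death
    cum_survival: float  # column 7: survival from time 0 to t_i


def kaplan_meier_table(observations: List[Tuple[float, bool]]) -> List[KMRow]:
    """Build KM rows from (time, died) pairs; died=False is a withdrawal."""
    rows: List[KMRow] = []
    cum_survival = 1.0
    for t in sorted({time for time, _ in observations}):
        at_risk = sum(1 for time, _ in observations if time >= t)
        deaths = sum(1 for time, died in observations if time == t and died)
        withdrawals = sum(1 for time, died in observations if time == t and not died)
        p_death = deaths / at_risk  # 0 at rows with only withdrawals
        p_survive = 1.0 - p_death
        cum_survival *= p_survive
        rows.append(KMRow(t, at_risk, deaths, withdrawals,
                          p_death, p_survive, cum_survival))
    return rows


# Hypothetical data: times in months; False marks a withdrawal.
data = [(2, True), (3, False), (5, True), (5, True), (7, False), (9, True), (11, False)]
for row in kaplan_meier_table(data):
    print(row.time, row.at_risk, row.deaths, row.withdrawals,
          round(row.p_death, 3), round(row.cum_survival, 3))
```

The rows at months 3, 7 and 11 carry only withdrawals: their probability of death is 0, yet they lower the number at risk for the next death.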
The Kaplan-Meier method is best suited to small data sets in which the time of event occurrence is measured precisely.
The Kaplan-Meier method improves on the life-table method in the handling of withdrawals. The life-table method considers
withdrawals to occur at the start of the interval, but in reality withdrawals occur throughout the interval. This assumption
could therefore create bias or imprecision. The Kaplan-Meier method avoids this complication by not fixing the time intervals
in advance. Intervals are defined in two ways: (a) an interval ends when the end-point event of interest occurs; (b) an interval
ends when a withdrawal occurs.
5.3.4 REGRESSION METHODS
Of the regression methods, proportional hazards regression is the most popular. This procedure uses
regression methods proposed in 1972 by the British statistician Sir David Cox in his famous paper ‘Regression Models
and Life-Tables’, published in the Journal of the Royal Statistical Society (Series B). It became one of the most cited papers in
statistics.
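In standard notation (not given in this text), the proportional hazards model relates the hazard h(t) of a subject with covariates x_1, ..., x_p to an unspecified baseline hazard h_0(t):

```latex
h(t \mid x_1, \ldots, x_p) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p)
```

For two subjects, the ratio of their hazards is exp of the sum of beta_j times the difference in their x_j values, which does not involve t. This constant ratio over time is what "proportional hazards" refers to.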
5.3.5 COMPARING SURVIVAL CURVES
The curves can be compared by visual inspection, or specialized statistical tests can be used.
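The text does not name a specific formula. One widely used test for comparing two survival curves is the log-rank test, which compares the observed number of deaths in one group with the number expected if both groups shared the same survival. A minimal Python sketch, assuming each group is a hypothetical list of (time, died) pairs; the p-value uses the chi-square distribution with 1 degree of freedom, computed through `math.erfc`:

```python
import math
from typing import List, Tuple


def log_rank_test(group_a: List[Tuple[float, bool]],
                  group_b: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) of the two-group log-rank test.

    Each group is a list of (time, died) pairs; died=False means censored.
    """
    pooled = [(t, d, 0) for t, d in group_a] + [(t, d, 1) for t, d in group_b]
    death_times = sorted({t for t, died, _ in pooled if died})
    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        n_b = sum(1 for u, _, g in pooled if u >= t and g == 1)
        d_a = sum(1 for u, died, g in pooled if u == t and died and g == 0)
        d = sum(1 for u, died, _ in pooled if u == t and died)
        n = n_a + n_b
        observed_a += d_a
        expected_a += d * n_a / n          # expected deaths in group A at t
        if n > 1:                          # hypergeometric variance term
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical groups: all deaths in the first group occur earlier.
early = [(1, True), (2, True), (3, True)]
late = [(4, True), (5, True), (6, True)]
chi2, p = log_rank_test(early, late)
print(round(chi2, 3), round(p, 3))
```

The statistic is the same whichever group is labelled A, because observed and expected deaths summed over both groups are equal.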