1.0 INTRODUCTION TO EPIDEMIOLOGY
1.1 DEFINITION, SCOPE, and CLASSIFICATION
Epidemiology is the study of the distribution and determinants of disease and injury. Two triads are involved in epidemiology: (a) the agent, host, and environment triad and (b) the time, place, and person triad. The primary goals of epidemiology are prevention, control, and, in rare instances, eradication of disease and injury.
Epidemiology began as a study of epidemics and extended to cover infectious and later non-infectious diseases. It has now become a methodological discipline that is used to study disease and non-disease phenomena.
Qualitative epidemiology deals with qualitative descriptions. Quantitative epidemiology deals with numerical descriptions. Observational epidemiology
is based on observation of human phenomena. Experimental epidemiology involves assessment of the effects of intervention against
a disease phenomenon. Theoretical epidemiology deals with mathematical and methodological issues. Descriptive epidemiology
describes the patterns of disease occurrence in terms of place, time and person. Analytic epidemiology seeks to discover the
underlying causes of diseases.
Another branch of epidemiology deals with preventive medicine. Clinical epidemiology deals with diagnosis, management, and prognosis of disease. Hospital
epidemiology deals with nosocomial infections and other aspects of hospital operations that can be studied using epidemiological
methodology. Drug or pharmaco-epidemiology studies phenomena of adverse reactions and side-effects of drugs. Genetic epidemiology
studies the patterns of inheritance of disease from parents and how genetic and
environmental factors interact in the final pathway of disease causation. Molecular epidemiology deals with phenomena at the
molecular level. Occupational epidemiology studies diseases due to exposure to hazardous material or working conditions in
the work-place. Environmental epidemiology studies the impact of air, water, and soil pollution on health.
The supporting disciplines of epidemiology are the clinical sciences, demographic sciences, data and information sciences, behavioral sciences, and environmental sciences.
1.2 IMPORTANCE OF EPIDEMIOLOGY
Epidemiology is used
in clinical medicine, public health, and actuarial sciences. The major activities of an epidemiologist are: study design including
selection of the study sample, data collection, data analysis, data interpretation, and initiation of action programs to prevent
disease and promote health. Professional practice and careers in epidemiology are found in government (Ministry of Health), universities, hospitals, the private sector (drug manufacturers), and research institutes.
1.3 PIONEERS OF EPIDEMIOLOGY
Several pioneers contributed to the early growth of the discipline. Hippocrates made the first recorded epidemiological observations by describing
the relation of disease to climate and geography. John Snow (1813-1858) recognized the importance of field epidemiology in
his study of the London cholera epidemic and its relation to water pollution.
William Budd (1811-1880) described the spread of typhoid due to ingestion of infected material from patients. William Farr realized that cycles of epidemics could be described mathematically. Major Greenwood (1880-1949), chief of epidemiology and vital statistics at the London School of Hygiene and Tropical Medicine, worked on models of epidemics.
1.4 EPIDEMIOLOGIC METHODOLOGY
1.5 HISTORICAL EVOLUTION OF EPIDEMIOLOGIC KNOWLEDGE
Five stages can be identified in the evolution of epidemiological knowledge.
These are the ancient period (up to 1500), the post-renaissance period (1500-1750), the sanitary period (1750-1870), the infectious disease period (1870-1945), and the modern epidemiology period starting in 1945 (also considered the chronic disease period).
In the ancient period, inter-personal disease transmission, the connection between disease and the environment, and quarantine and isolation were known. Around 400 BC Hippocrates suggested a relation between disease on one hand and lifestyle and environmental factors on the other.
The post renaissance period witnessed rapid growth of knowledge of pathology,
and transmission as well as control of disease. In the 1660s Bacon and others developed inductive logic that provided a philosophical
basis for epidemiology. Girolamo Fracastoro (1478-1553) suggested that disease spread by direct contact and by small living
particles. In 1683 Van Leeuwenhoek saw microorganisms under the microscope. In 1662 Captain John Graunt analyzed births and
deaths and described disease in population quantitatively with significant epidemiological observations and determinations.
In 1747 James Lind discovered the prevention of scurvy by conducting one of the first experimental trials on humans. In 1798
Edward Jenner discovered vaccination. Ramazzini wrote on occupational health in 1700. Percivall Pott (1714-1788) linked scrotal cancer to chimney soot.
In the sanitary period concern was about environmental correlates of disease;
quarantine and isolation were used for disease control.
During the infectious disease period, the microbial basis of disease became
firmly established when Louis Pasteur (1822-1895) and Robert Koch (1843-1900) developed the germ theory through experimentation.
Robert Koch, the father of bacteriology, identified the causative organisms of anthrax (1876), tuberculosis (1882), and cholera (1883). He developed Koch's postulates, criteria for determining an infectious etiology of disease. In 1847 Ignaz Philipp Semmelweis suggested hand-washing to avoid obstetric infection. John Snow described the association between cholera
and contaminated water by forming and testing a series of hypotheses thus being a pioneer of analytic epidemiology. William
Budd in 1857-73 concluded that typhoid was contagious. In 1839 William Farr started the discipline of vital statistics as
a system of regular collection and interpretation of data and set up a system for routine summaries of causes of death. Joseph
Lister introduced antiseptic surgery in 1865. Manson Barr, Bruce-Chwatt and others studied the transmission of mosquito-borne infections such as malaria and yellow fever.
Towards the end of the infectious disease period, there were developments
in knowledge of non-infectious disease and statistical methodology. Non-infectious diseases (nutritional, occupational, psychiatric,
and environmental) were identified and studied. In 1905 beriberi was found to be associated with eating milled rice. In 1920 Joseph Goldberger published a descriptive field study relating pellagra to diets high in cereals and canned foods and free of fresh animal products. Elmer McCollum, a professor at Johns Hopkins from 1918, discovered vitamin-deficiency diseases. Statistical
theory and practice developed rapidly towards the close of the 19th century to keep up with developments in basic
research and public health all of which required statistical analysis.
The period of modern epidemiology starting in 1945 is the chronic disease
epoch. By 1945 there was convergence of the non-mechanistic concepts of disease (environment, social, and behavioral basis
of disease) and the mechanistic concepts of disease (molecular, biological, agent-host interaction). Health was defined in
a broad sense as: physical, mental, psychological, and spiritual well-being. Scientists recognized the multi-causal nature
of disease (genetic, psycho-social, physiological, and metabolic). The period witnessed a demographic transition (ageing populations) as well as an epidemiologic transition (a change from communicable to non-communicable diseases). It also witnessed major studies that helped redefine the direction of epidemiology and public health. In 1949 the Framingham Heart Study began
as the first cohort study of the causative factors of cardiovascular disease. In 1950 Doll and Hill, Levin et al., Schreck et al., and Wynder and Graham published the first case-control studies of smoking and lung cancer. In 1954 the field trials
of the Salk polio vaccine were the largest formal human experiment. In 1971-1972 the North Karelia Project and the Stanford
Three Community studies were launched as the first community-based cardiovascular disease prevention programs. Further methodological
developments were witnessed in this period. In 1960 MacMahon published the first epidemiology textbook with systematic treatment
of study design. In 1959 Mantel and Haenszel developed statistical procedures for case control studies. In the 1970s logistic
regression and log-linear regression were developed as new multivariate analytic methods. From the 1970s to the present there have been continuing developments in computer hardware and software. In the 1990s molecular techniques began to be applied to the study of large populations.
1.6 ETHICO-LEGAL ISSUES IN EPIDEMIOLOGY
A study involving humans
must get approval from a recognized body. For approval the study must fulfil certain criteria. It must be scientifically valid.
It is unethical to waste resources (time and money) on a study that will give invalid conclusions. In 1992 the Council for International Organizations of Medical Sciences published 'Guidelines for Ethical Review of Epidemiological Studies'.
Among ethical considerations are: individual vs. community rights, benefits vs. risks, informed consent, privacy and confidentiality,
and conflict of interest.
Study interpretation and communication of findings to the public pose problems. Risk reports that are not yet confirmed
are picked up by the media and create unnecessary public concern. Study findings affect policy. Epidemiologists must know
how to communicate risk to the public. It is an ethical obligation to report research findings to subjects so that they may
take measures to lessen risk. Epidemiological evidence is different from legal evidence. Epidemiological evidence may not be accepted in a court of law because it has few certainties; it is concerned with populations whereas legal evidence pertains to individuals.
2.0 EPIDEMIOLOGIC STUDIES: INTRODUCTION
2.1 SAMPLE SIZE DETERMINATION
The size of the sample depends on the hypothesis, the budget, the study duration, and the precision required.
If the sample is too small the study will lack sufficient power to answer the study question. A sample bigger than necessary
is a waste of resources. Power is the ability to detect a difference and is determined by the significance level, the magnitude of
the difference, and sample size. The bigger the sample size the more powerful the study. Beyond an optimal sample size, increase
in power does not justify costs of larger sample. There are procedures, formulas, and computer programs for determining sample
sizes for different study designs.
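These procedures can be illustrated with the standard normal-approximation formula for comparing two proportions. This is a sketch only (the function name and default values are illustrative, and purpose-built software should be used for real studies):

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for detecting a difference
    between two proportions (unpooled normal-approximation formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for power = 0.80
    variance_term = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_term / (p1 - p2) ** 2)
```

Detecting a rise from 10% to 20% at the usual 5% significance level and 80% power requires roughly 200 subjects per group; note how the required size grows sharply as the difference to be detected shrinks.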
2.2 SOURCES OF SECONDARY DATA
Secondary data is from decennial censuses, vital statistics, routinely collected data, epidemiological studies,
and special health surveys. Census data is reliable. It is wide in scope covering demographic, social, economic, and health
information. The census describes population composition by sex, race/ethnicity, residence, marital status, and socio-economic indicators.
Vital events are births, deaths, marriages and divorces, and some disease conditions. Routinely collected data are cheap but may be unavailable or incomplete. They are obtained from medical facilities, life and health insurance companies, institutions (such as prisons, the army, and schools), disease registries,
and administrative records. Observational epidemiological studies are of 3 types: cross-sectional, case-control, and
follow-up/cohort studies. Special surveys cover a larger population than epidemiological studies and may be health, nutritional,
or socio-demographic surveys.
2.3 PRIMARY DATA COLLECTION BY QUESTIONNAIRE
Questionnaire design involves content, wording
of questions, format and layout. The reliability and validity of the questionnaire
as well as practical logistics should be tested during the pilot study. Informed consent and confidentiality must be respected.
A protocol sets out data collection procedures. Questionnaire administration by face-to-face interview is the best but is
expensive. Questionnaire administration by telephone is cheaper. Questionnaire administration by mail is very cheap but has
a lower response rate. Computer-administered questionnaire is associated with more honest responses.
2.4 PHYSICAL PRIMARY DATA COLLECTION
Data can be obtained by clinical examination, standardized psychological/psychiatric
evaluation, measurement of environmental or occupational exposure, and assay of biological specimens (endobiotic or xenobiotic)
and laboratory experiments. Pharmacological experiments involve bioassay, quantal dose-effect curves, dose-response curves,
and studies of drug elimination. Physiology experiments involve measurements of parameters of the various body systems. Microbiology
experiments involve bacterial counts, immunoassays, and serological assays. Biochemical experiments involve measurements of
concentrations of various substances. Statistical and graphical techniques are used to display and summarize this data.
2.5 DATA MANAGEMENT AND DATA ANALYSIS
Self-coding or pre-coded questionnaires are preferable. Data is input as text, multiple choice, numeric, date and
time, and yes/no responses. Data in the computer can be checked manually against the original questionnaire. Interactive data
entry enables detection and correction of logical and entry errors immediately.
Data editing is the process of correcting data collection and data entry errors, such as invalid or inconsistent values. The data is 'cleaned' using logical, statistical, range, and consistency checks. All values should be at the same level of precision (number of decimal places) to make computations consistent and decrease rounding-off errors. The kappa statistic is used to measure inter-rater agreement. Data is validated and its consistency is tested. The main data problems are missing data, coding and entry errors, inconsistencies, irregular patterns, digit preference, outliers, rounding-off / significant figures, questions with multiple valid responses, and record duplication. Data transformation is
the process of creating new derived variables preliminary to analysis and includes mathematical operations such as division,
multiplication, addition, or subtraction; mathematical transformations such as logarithmic, trigonometric, power, and z-transformations.
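The kappa statistic mentioned above can be computed directly from two raters' codes for the same items. A minimal sketch (the function name is illustrative) of Cohen's kappa for two raters:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters beyond chance.
    1 = perfect agreement, 0 = agreement no better than chance."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # chance agreement: product of each rater's marginal proportions
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)
```

Identical ratings give kappa = 1; ratings that agree only as often as chance predicts give kappa = 0.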
Data analysis consists of data summarization,
estimation and interpretation. Simple manual inspection of the data is needed before statistical procedures. Preliminary examination consists of looking at tables and
graphics. Descriptive statistics are used to detect errors, ascertain the normality of the data, and check cell sizes.
Missing values may be imputed or incomplete observations may be eliminated. Tests for association, effect, or trend
involve construction and testing of hypotheses. The tests for association are the t, chi-square, linear correlation, and logistic
regression tests or coefficients. The common effect measures are the odds ratio, risk ratio, and rate difference. Measures of trend can
discover relationships that are not picked up by association and effect measures. The probability, likelihood, and regression
models are used in analysis. Analytic procedures and computer programs vary for continuous and discrete data, for person-time
and count data, for simple and stratified analysis, for univariate, bivariate and multivariate analysis, and for polychotomous
outcome variables. Procedures are different for large samples and small samples.
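For a single 2x2 table, the chi-square test for association and the common effect measures can be computed together. A sketch assuming exposed/unexposed rows and diseased/non-diseased columns (the function name is illustrative):

```python
def analyze_two_by_two(a, b, c, d):
    """a = exposed diseased, b = exposed healthy,
    c = unexposed diseased, d = unexposed healthy.
    Returns (chi-square with 1 df, odds ratio, risk ratio)."""
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table (no continuity correction)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + b)) / (c / (c + d))
    return chi2, odds_ratio, risk_ratio
```

With 20/100 exposed and 10/100 unexposed subjects diseased, the risk ratio is 2.0, the odds ratio 2.25, and the chi-square about 3.92, right at the 5% significance threshold of 3.84, which illustrates why the test for association is examined before the effect measures.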
3.0 EPIDEMIOLOGIC STUDIES: DIFFERENT STUDY DESIGNS
3.1 CROSS-SECTIONAL DESIGN
The cross-sectional study, also called the prevalence study or naturalistic sampling, has the objective of determination
of prevalence of risk factors and prevalence of disease at a point in time (calendar time or an event like birth or death). Disease and exposure are ascertained simultaneously. A cross-sectional study can be
descriptive or analytic or both. It may be done once or may be repeated. Individual-based
studies collect information on individuals. Group-based (ecologic) studies collect aggregate information about groups of individuals.
Cross-sectional studies are used in community diagnosis, preliminary study of disease etiology, assessment of health status,
disease surveillance, public health planning, and program evaluation.
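Prevalence from a cross-sectional sample is a simple proportion. A sketch with an approximate (Wald) 95% confidence interval, adequate for large samples and non-extreme prevalences (the function name is illustrative):

```python
import math

def prevalence_with_ci(cases, sample_size, z=1.96):
    """Point prevalence with an approximate 95% confidence interval."""
    p = cases / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)  # standard error of a proportion
    return p, (p - z * se, p + z * se)
```

For example, 50 cases found in a sample of 1000 give a prevalence of 5% with an interval of roughly 3.6% to 6.4%.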
3.2 CASE-CONTROL DESIGN
The case-control study is popular because of its low cost, rapid results, and flexibility. It uses a small number
of subjects. The source population for cases and controls must be the same. Cases are sourced from clinical records, hospital
discharge records, disease registries, data from surveillance programs, employment records, and death certificates. Cases
are either all cases of a disease or a sample thereof. Only incident cases (new cases) are selected. Controls must be from
the same population base as the cases and must be like cases in everything except having the disease being studied. Information
comparability between the case series and the control series must be assured. Hospital, community, neighborhood, friend, dead,
and relative controls are used. There is little gain in efficiency beyond a 1:2 case control ratio unless control data is
obtained at no cost. Confounding can be prevented or controlled by stratification and matching. Exposure information is obtained
from interviews, hospital records, pharmacy records, vital records, disease registry, employment records, environmental data,
genetic determinants, biomarker, physical measurements, and laboratory measurements.
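Because a case-control study samples on disease status, its effect measure is the odds ratio. A sketch of the odds ratio with the Woolf (log-based) confidence interval (the function name is illustrative):

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Returns the odds ratio and its Woolf 95% confidence interval."""
    odds_ratio = (a * d) / (b * c)
    # standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An interval that excludes 1 indicates an association at the 5% significance level.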
3.3 FOLLOW-UP DESIGN
A follow up study (also called cohort study, incident study, prospective study, or longitudinal study), compares
disease in exposed to disease in non-exposed groups after a period of follow-up. It can be prospective (forward), retrospective
(backward), or ambispective (both forward and backward) follow-up. In a nested case control design, a case control study is
carried out within a larger follow up study. The follow-up cohorts may be closed (fixed cohort) or open (dynamic cohort).
Analysis of fixed cohorts is based on cumulative incidence (CI) and that of open cohorts on the incidence rate (IR). The study population is divided into the exposed
and unexposed populations. A sample is taken from the exposed and another sample is taken from the unexposed. Both the exposed
and unexposed samples are followed for appearance of disease. The ascertainment of the outcome event must be standardized
with clear criteria. Follow-up can be achieved by letter, telephone, surveillance of death certificates and hospitals. Care
must be taken to make sure that surveillance, follow-up, and ascertainment for the 2 groups are the same.
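The two cohort measures, cumulative incidence (cases per persons at risk in a fixed cohort) and incidence rate (cases per person-time in a dynamic cohort), give rise to the risk ratio and the rate ratio respectively. A minimal sketch (function names are illustrative):

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Ratio of cumulative incidences (fixed/closed cohorts)."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

def rate_ratio(cases_exposed, pt_exposed, cases_unexposed, pt_unexposed):
    """Ratio of incidence rates per unit person-time (dynamic/open cohorts)."""
    return (cases_exposed / pt_exposed) / (cases_unexposed / pt_unexposed)
```

For example, 30 cases among 1000 exposed versus 10 among 1000 unexposed gives a risk ratio of 3.0; with person-time denominators of 4500 and 5000 person-years the rate ratio is about 3.3.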
3.4 RANDOMIZED DESIGN: COMMUNITY TRIALS
A community intervention study targets the whole community and not individuals. There are basically 2 different
study designs. In a single community design, disease incidence is measured before and after intervention. In a 2-community
design, one community receives an intervention whereas another one serves as the control. Allocation of a community to either
the intervention or the control group is by randomization. The intervention and the assessment of the outcome may involve
the whole community or a sample of the community. Outcome measures may be individual level measures or community level measures.
3.5 RANDOMIZED DESIGN: CLINICAL
The aim of randomization in controlled clinical trials is to make sure that there is no
selection bias and that the two series are as alike as possible by randomly balancing confounding factors. Patients with a
disease are allocated randomly to 2 groups. One group receives the drug being tested. The other group, also called the comparison
group, receives a placebo or receives another drug being compared. Equal allocation in randomization is the most efficient
design. Methods of randomization include alternation of cases and sealed serially numbered envelopes. Stratified randomization is akin to the block design of experimental studies. Randomization is not successful with small samples and does not always ensure balance of confounding factors.
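Allocation lists can be generated by computer. A sketch of permuted-block randomization (a hypothetical helper; real trials use concealed, pre-generated allocation lists) that keeps the two arms balanced throughout recruitment:

```python
import random

def permuted_block_assignments(n_subjects, block_size=4, seed=None):
    """Generate treatment/control assignments in balanced permuted blocks."""
    assert block_size % 2 == 0, "block size must be even for 1:1 allocation"
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_subjects:
        # each block contains equal numbers of each arm, in random order
        block = (["treatment"] * (block_size // 2)
                 + ["control"] * (block_size // 2))
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_subjects]
```

Within every completed block the arms are exactly balanced, so the running imbalance can never exceed half a block, which mitigates the problem randomization has with small samples.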
4.0 EPIDEMIOLOGICAL STUDIES: ANALYSIS & INTERPRETATION
Data analysis affects practical decisions and should therefore be taken seriously. Simple manual inspection of the data can help identify outliers, reveal commonsense relationships, and alert the investigator to errors in
computer analysis. Two procedures are employed in analytic epidemiology: test for association and measures of effect. The
test for association is done first. The assessment of the effect measures is done after finding an association. Effect measures
are useless in situations in which tests for association are negative. The common tests for association are: t-test, F test,
chi-square, the linear correlation coefficient, and the linear regression coefficient. The effect measures commonly employed
are: Odds Ratio, Risk Ratio, Rate difference. Measures of trend can discover relationships that are too small to be picked
up by association and effect measures.
An epidemiological study should be considered as a sort of measurement with
parameters for validity, precision, and reliability. Validity is a measure of accuracy. Precision measures variation in the
estimate. Systematic errors lead to bias and therefore invalid parameter estimates. Random errors lead to imprecise parameter estimates.
Internal validity is concerned with the results of each individual study.
Internal validity is impaired by study bias. External validity is generalizability of results. Traditionally results are generalized
if the sample is representative of the population. In practice generalizability is achieved by looking at results of several
studies each of which is individually internally valid. It is therefore not the objective of each individual study to be generalizable
because that would require assembling a representative sample.
Precision is a measure of the lack of random error. An effect measure with a narrow confidence interval is said to be precise. An effect measure with a wide confidence interval is imprecise. Precision
is increased in three ways: increasing the study size, increasing study efficiency, and care taken in measurement of variables
to decrease mistakes.
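The effect of study size on precision can be seen directly in the width of a confidence interval, which shrinks with the square root of the sample size. A sketch for a proportion (the function name is illustrative):

```python
import math

def wald_ci_width(p, n, z=1.96):
    """Width of the Wald confidence interval for a proportion."""
    return 2 * z * math.sqrt(p * (1 - p) / n)
```

Quadrupling the sample size only halves the interval width, which is why increasing study size beyond a certain point stops paying for itself.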
Meta analysis refers to methods used to combine data from more than one study to produce
a quantitative summary statistic. Meta analysis enables computation of an effect estimate for a larger
number of study subjects, making it possible to detect statistical significance that would be missed if analysis were based on
small individual studies. Meta analysis also enables study of variation across several population subgroups
since it involves several individual studies carried out in various countries and populations. Criteria must be set for what
articles to include or exclude. Information is abstracted from the articles on a standardized data abstract form with standard
outcome, exposure, confounder, or effect modifying variables.
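The pooling step can be sketched with the standard inverse-variance (fixed-effect) method, which weights each study's log odds ratio by the reciprocal of its variance (the function name and the values below are illustrative):

```python
import math

def inverse_variance_pool(log_effects, standard_errors):
    """Fixed-effect pooled estimate and standard error on the log scale."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * x for w, x in zip(weights, log_effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# two hypothetical studies, each estimating an odds ratio of 2
pooled, se = inverse_variance_pool([math.log(2), math.log(2)], [0.5, 0.5])
```

Pooling two equally precise studies leaves the estimate unchanged but shrinks the standard error by a factor of the square root of 2, which is how meta-analysis recovers significance missed by small individual studies.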
Confounding is the mixing up of effects. Confounding bias arises when the disease-exposure relationship is distorted by an extraneous factor called
the confounding variable. The confounding variable is not actually involved in the exposure-disease relationship. It is however
predictive of disease but is unequally distributed between exposure groups. Being related both to the disease and the risk
factor, the confounding variable could lead to a spurious apparent relation between disease and exposure if it is a factor
in the selection of subjects into the study.
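Control of confounding by stratification can be sketched with the Mantel-Haenszel summary odds ratio, computed across strata of the confounder. The numbers below are invented to show a crude association that disappears on stratification:

```python
def mantel_haenszel_odds_ratio(strata):
    """Summary odds ratio over strata; each stratum is a 2x2 table
    (a, b, c, d) = (exposed cases, exposed controls,
                    unexposed cases, unexposed controls)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# two strata of a confounder; the odds ratio is exactly 1.0 within each stratum
strata = [(90, 10, 9, 1), (1, 9, 10, 90)]
crude = (91 * 91) / (19 * 19)                  # collapsed table: about 22.9
adjusted = mantel_haenszel_odds_ratio(strata)  # 1.0 after stratification
```

The crude (collapsed) table suggests a strong exposure-disease association, but the stratum-specific and Mantel-Haenszel odds ratios are 1.0: the apparent effect was entirely due to the confounder being unequally distributed between exposure groups.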