Presentation at the orthopedics Research Seminar held on 18th June at the Kulliyah of Medicine, IIUM by Prof Dr Omar Hasan Kasule, Sr. Deputy Dean for Research. UIAM



1.1 The Study Protocol

1.2 Steps of the Study

1.3 Different Designs of a Phase 3 Study

1.4 Sample Size Determination

1.5 Randomization in Phase 3 Trials



2.1 Type of Data Collected

2.2 Design of CRF Forms

2.3 Blinding

2.4 Stopping Rules

2.5 Quality Control



3.1 Fixed Sample Vs Sequential Analysis

3.2 Analytic Methods

3.3 Causes of Differences between the 2 Series

3.4 Problems with Study Design

3.5 Problems with Analysis




Scope: The study protocol details the medical and administrative aspects of the study. This includes the following: definition of objectives, the background to the study, definition of the sample and the treatments, methods of data collection and data analysis.


Chapters of the protocol: title page; background and introduction; objectives of the trial; patient selection criteria; trial design; therapeutic regimen: dose and toxicity; required evaluations: clinical, laboratory, and follow-up; criteria of evaluation; registration and randomization of patients; forms and procedure for data collection; statistical procedures; administrative responsibilities; informed consent; references; regulatory requirements; drug ordering; appendices (consent form etc).


Definition of objectives: A clinical trial is a serious and expensive undertaking that must not be started before defining clear, specific, and attainable objectives.


Literature review: Once objectives are set, literature review is carried out for similar studies or similar outcomes.


Definition of patients: method of randomization, inclusion and exclusion criteria, treatment allocation, and withdrawals. The criteria should ensure a homogeneous population without being too rigid. Registration procedures must be defined explicitly. Rigorous enrollment procedures should ensure that the subject does not have the outcome at the time of enrollment.


Description of treatment: treatments, response assessment (single-blind & double-blind), protocol departures, definition of end-points and criteria of efficacy, duration and frequency of treatment. Treatments may be single agent, combined modality, or adjuvant therapy. Description of treatment administration includes what to do in case of side effects or adverse drug reactions. A schema is a treatment plan in graphic form.


Sample size: The sample size is fixed in advance in fixed sample studies. In a sequential trial the final sample size is determined as the study progresses. What really matters is the number of events and not the number of subjects. A study of 1000 patients with 5 events is a weak study.


Data Collection: The exact methods of data collection must be described in such detail that any knowledgeable person will be able to carry out the protocol without further instructions. Measurement of effects and end-points must be defined. The data collection section must define the items of data to be collected. The following data items are usually needed: identification data (patient identifier, trial identifier, and institution identifier); administrative data (names of physicians and research associates); regulatory data (institutional review board approval, informed consent); the case record form (who fills the form and when; standard layout of the form including standard header, number of items, designation of decimal points, unit of measurement used, and definition of the alternative responses unknown, not available or not applicable, and not done); and coding of the CRF responses (multiple choices, numeric self-coding, non-numeric self-coding). The design of the case record form (CRF) is important for accuracy of data collection. With a computerized CRF, data are entered directly into the data-base. The choice between online and offline data entry depends on the nature of the study and the preferences of the institution. Online data entry has the problem that there is no paper record to check for mistakes. Sometimes direct data capture is possible from laboratory and clinical measuring instruments.


Data-base: The protocol describes the design of the study data-base, including details of data retrieval and security features. The data-base should be designed such that automatic editing checks are made for eligibility criteria and data inaccuracies as data are entered. The system must also be able to check for timely submission of data from the clinical centers. The relational data-base is more commonly used than the hierarchical. Attention must be paid to data security. Audit trails for changes made to the data must be kept up to date and available. Three groups of study files are maintained: the protocol file with all protocol changes, the regulatory file, and the patient file.


Method of analysis: The statistical analytic philosophy must be determined in advance. The basic approaches include: (a) the classical frequentist statistics; (b) the Bayesian approach, which requires having a prior probability. The prior probability may be subjective or could be objective, based on previous data; (c) the likelihood approach, in which the inference is drawn as the trial progresses. The study is terminated as soon as a significant result is obtained.


Regulatory measures: Information must be obtained on whether the institution where the research is undertaken is in good standing. Is the investigator authorized? Are regulatory requirements met? Are patients eligible? Demographic data. Measures to ensure protocol compliance.


Quality control: Quality control measures must be put in place. The local clinical site is responsible for ensuring timeliness, completeness, and consistency of data. It must also make sure that patient identifiers are correct and that the necessary privacy measures have been taken. The coordination center is responsible for systematic review of the data and making sure it is complete. The following are QA responsibilities of the central data coordination center: eligibility checks, logging data receipt, checking for correct identifiers, checking for data completion, range and field type checks, logical and consistency checks carried out manually or by computer, assessment of study end-points, clinical review, and feedback to participants. A data monitoring committee whose members are independent of the study investigators will monitor issues such as safety data and carry out interim analyses.


GCP responsibilities of the local participating site: ethics committee approval, patient recruitment, patient informed consent, collection and record of data required by the protocol,  reporting of adverse effects, ensuring protocol compliance, ordering and storing study drugs.


GCP responsibilities of the study coordination center: confirmation of the ethics committee approval, confirming informed consent, confirming the accuracy of the data by regular visits to the study site, screening qualifications of study personnel, quality control of CRF, monitoring adverse effects, analysis of the trial and report of results.



  1. Decision if single hospital or multicenter studies
  2. Inclusion and exclusion criteria
  3. Study feasibility: funding, patient accrual, trained personnel, and a pilot study or computer simulation. If patient accrual at one center is not enough, the resort is to multi-center trials. Multicenter trials allow rapid accrual of patients but need an efficient data coordination center to work well.
  4. Approvals
  5. Informed consent
  6. Registration
  7. Randomization
  8. Eligibility
  9. Data Collection
  10. Study closure (off-protocol)
  11. Data Analysis



Completely randomized design: In this design subjects are randomly allocated to treatment groups in such a way that likelihood of belonging to any group is the same for all subjects. It has the advantage of being simple to administer and dealing with the subject only once in allocation. Its disadvantage is that it allows only comparison between subjects and not within subjects.


Stratified design or randomized block design: In a stratified design subjects are randomly allocated to treatment groups separately for each stratum or block. It has the advantage of allowing within-block comparisons and using smaller sample sizes than the completely randomized design. Its disadvantage is that the allocation process is prolonged and patients may drop out. Typical stratification factors are histology, performance status etc. Stratification may also be carried out to achieve institutional balance. Stratifying on too many factors is not recommended. Imbalances can be adjusted for at analysis. The larger the study, the less the need for stratification.


Factorial design: assesses 2 or more treatments at the same time, displayed as shown below for two drugs A and B


             Drug B+        Drug B-

Drug A+      A and B        A only

Drug A-      B only         Neither


Cross-over: In a cross-over design, each subject is his own control. Subjects are allocated to sequences and each sequence contains all the treatments. The advantage of the cross-over design is that comparisons are made within subjects rather than between subjects, which decreases confounding effects. This design requires a smaller sample size than the completely randomized design. Its disadvantage is spill-over (carry-over) effects, in which treatment effects are carried over into the next treatment in the sequence. Drop-outs are likely because the allocation process is prolonged.



Sample size determination must take into consideration the number of participants, the total number of end-points, and the differences in compliance. Formulas described below are used to compute the total number of participants. Since there are many end-points in a study, there must be a method of selecting which ones are used in sample size determination. The general guideline is to use end-points associated with ‘high risk’ and those that can assure an adequate period of follow-up. The difference in compliance between the two groups is difficult to determine. A pilot study may provide some guidance. Results of the pilot study may be used to eliminate potential non-compliers before randomization. In the end, analysis of the study may have to be based on ‘intention to treat’.


The hypothesis testing approach is used to determine study size. The null and alternative hypotheses are stated. Sample size depends on type 1 error (alpha), type 2 error (beta) and the expected difference (Delta). Alpha is customarily set at 0.05. Beta is usually set no less than 4-5 times the value of alpha, usually 0.1 or 0.2. The power of the study is 1 - beta. There are different formulas for sample size for different situations: comparing success rates or comparing survival distributions. The sample size for comparing a single success rate against a standard or for comparing 2 success rates is

$$n = \frac{(Z_{\alpha/2} + Z_{\beta})^2}{2\,\{\arcsin\sqrt{\theta_1} - \arcsin\sqrt{\theta_2}\}^2}$$

where n = number of subjects in each group, and $\theta_1$ and $\theta_2$ are the expected response rates for the two treatments. The sample size for comparing more than 2 rates is

$$n = \frac{\lambda^2}{2\,\{\arcsin\sqrt{\theta_1} - \arcsin\sqrt{\theta_0}\}^2}$$

where $\theta_0$ = minimum rate, $\theta_1$ = maximum rate, and $\lambda$ = noncentrality parameter looked up from appropriate tables based on $\alpha$ and $\beta$. The sample size for comparing two survival distributions is

$$n = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^2}{(\ln \Delta)^2}$$

where $\Delta$ is the ratio of hazard functions evaluated as $\Lambda_1 / \Lambda_2$. The rules for sequential samples are different from those for fixed samples.
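The two-rate (arcsine) and survival formulas can be tried out numerically. The sketch below is only an illustration: the function names, the default beta of 0.2, and rounding up to the next whole subject are assumptions made here, and the arcsine is taken in radians.

```python
from math import asin, ceil, log, sqrt
from statistics import NormalDist


def z(tail_prob):
    """Upper-tail standard normal quantile, e.g. z(0.025) is about 1.96."""
    return NormalDist().inv_cdf(1 - tail_prob)


def n_two_rates(theta1, theta2, alpha=0.05, beta=0.2):
    """Subjects per group for comparing two success rates (arcsine formula)."""
    diff = asin(sqrt(theta1)) - asin(sqrt(theta2))  # radians
    return ceil((z(alpha / 2) + z(beta)) ** 2 / (2 * diff ** 2))


def n_two_survival(hazard_ratio, alpha=0.05, beta=0.2):
    """n from the survival formula; hazard_ratio is Lambda1 / Lambda2."""
    return ceil(2 * (z(alpha / 2) + z(beta)) ** 2 / log(hazard_ratio) ** 2)


if __name__ == "__main__":
    print(n_two_rates(0.5, 0.3))   # expected response rates of 50% vs 30%
    print(n_two_survival(1.5))     # hazard ratio of 1.5
```

A smaller expected difference (rates closer together, or a hazard ratio closer to 1) drives the required n up sharply, because the difference enters the denominator squared.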


A simpler formula for the number in the treated group can be written as

$$n = \frac{(Z_{\alpha} + Z_{\beta})^2 \,\{(R + 1)/R\}\, p(1-p)}{(p_1 - p_2)^2}$$

where n is the number in the treated group, R is the ratio of the number treated to the untreated, and $p = (p_2 + R p_1) / (1 + R)$. At $\alpha$ = 0.05 (two-sided), $Z_{\alpha}$ = 1.96; $Z_{\beta}$ = 1.645 corresponds to $\beta$ = 0.05. If n is fixed, the power of the study, $1 - \beta$, is computed from the relationship

$$Z_{\beta} = \left[\frac{n\,(p_1 - p_2)^2}{\{(R + 1)/R\}\, p(1-p)}\right]^{1/2} - Z_{\alpha}$$
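A short numerical sketch of this simpler formula and the matching power relationship follows. It assumes the intended form has (Za + Zb) squared in the numerator and (p1 - p2) squared in the denominator, so that the power relationship is the same equation solved for Zb. The function names are hypothetical, and R, p1 and p2 are used exactly as the formula is written.

```python
from math import ceil, sqrt
from statistics import NormalDist


def pooled_rate(p1, p2, R):
    """p = (p2 + R*p1) / (1 + R), as defined alongside the formula."""
    return (p2 + R * p1) / (1 + R)


def n_simple(p1, p2, R=1.0, z_alpha=1.96, z_beta=1.645):
    """n = (Za + Zb)^2 * ((R+1)/R) * p(1-p) / (p1 - p2)^2, rounded up."""
    p = pooled_rate(p1, p2, R)
    return ceil((z_alpha + z_beta) ** 2 * ((R + 1) / R) * p * (1 - p)
                / (p1 - p2) ** 2)


def power_for_fixed_n(n, p1, p2, R=1.0, z_alpha=1.96):
    """Solve the same relationship for Zb and convert it to power, 1 - beta."""
    p = pooled_rate(p1, p2, R)
    z_beta = sqrt(n * (p1 - p2) ** 2 / (((R + 1) / R) * p * (1 - p))) - z_alpha
    return NormalDist().cdf(z_beta)


if __name__ == "__main__":
    n = n_simple(0.5, 0.3)
    print(n, round(power_for_fixed_n(n, 0.5, 0.3), 3))
```

Feeding the computed n back into the power function should return roughly the power implied by the chosen Zb, which is a quick check that the two expressions agree.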


In many practical situations, one study center may not collect an adequate number of cases. Several studies of inadequate size may be combined using meta-analysis to reach a reliable conclusion. Meta-analysis is feasible if the studies are similar in design.



Study non-participants and participants come from a reference population. The participants, also called the experimental population, are then randomized to a treatment and a comparison group. Participants may differ in systematic ways from non-participants. It is therefore important to obtain minimal information from non-participants in order to assess the potential for bias. Volunteer participants may differ by age, gender, SES, and educational levels.


Randomization is random assignment and not random selection. The study sample is not a random sample and need not be representative of any particular population. The aim of randomization in controlled clinical trials is to make sure that (a) There is no selection bias (b) the two series are as alike as possible by randomly balancing confounding factors. Some confounding factors are known whereas others are not known. Randomization may not always be successful.


Cases ineligible after randomization can be included in the analysis following the rule ‘once randomized always analyzed’.

Equal allocation in randomization is the most efficient design.


Pre-randomization: (a) exclusions (b) testing reliability/compliance of subjects


Methods of randomization: (a) Alternate cases: the first case is put in one group. The next one to be recruited is put in the other group. This continues on an alternate basis. (b) Sealed, serially numbered envelopes: an envelope is opened as each patient is recruited and the group to which he is allocated is read from the card inside. The cards in the envelopes are determined from tables of random numbers. (c) Random permuted blocks (d) Biased coin technique (e) Dynamic methods. Adaptive randomization is when the allocation ratio varies according to results obtained so far.
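Two of these methods, random permuted blocks and the biased coin, are easy to sketch in code. This is a minimal illustration only: the arm labels, the block size of 4, and the coin probability of 2/3 are assumptions chosen for the example, not values prescribed above.

```python
import random


def permuted_blocks(n_patients, block_size=4, arms=("A", "B"), seed=None):
    """Random permuted blocks: each block holds every arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n_patients]


def biased_coin(n_patients, p=2 / 3, seed=None):
    """Biased coin: favour, with probability p, the arm that is behind."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    allocation = []
    for _ in range(n_patients):
        if counts["A"] == counts["B"]:
            prob_a = 0.5          # balanced so far: fair coin
        elif counts["A"] < counts["B"]:
            prob_a = p            # A is behind: favour A
        else:
            prob_a = 1 - p        # B is behind: favour B
        arm = "A" if rng.random() < prob_a else "B"
        counts[arm] += 1
        allocation.append(arm)
    return allocation


if __name__ == "__main__":
    print(permuted_blocks(12, seed=1))
    print(biased_coin(12, seed=1))
```

Permuted blocks guarantee balance at the end of every block, while the biased coin only nudges the groups towards balance and keeps each individual allocation unpredictable.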


Stratified randomization: This is akin to the block design of experimental studies. It aims at improving the balancing of confounding factors.
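Stratified randomization can be sketched as one stream of random permuted blocks per stratum. The class name, the stratum labels, and the block size below are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


class StratifiedRandomizer:
    """Keeps a separate stream of random permuted blocks for each stratum."""

    def __init__(self, arms=("A", "B"), block_size=4, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = tuple(arms)
        self.block_size = block_size
        self._rng = random.Random(seed)
        # stratum -> allocations still unused in that stratum's current block
        self._pending = defaultdict(list)

    def assign(self, stratum):
        """Return the arm for the next patient registered in this stratum."""
        if not self._pending[stratum]:
            block = list(self.arms) * (self.block_size // len(self.arms))
            self._rng.shuffle(block)
            self._pending[stratum] = block
        return self._pending[stratum].pop()


if __name__ == "__main__":
    randomizer = StratifiedRandomizer(seed=3)
    for stratum in [("squamous", "PS 0-1"), ("squamous", "PS 0-1"),
                    ("adeno", "PS 2+")]:
        print(stratum, randomizer.assign(stratum))
```

Because every stratum fills its own blocks, the arms stay balanced within each stratum. With many stratification factors most blocks are never completed, which is one reason stratifying on too many factors is discouraged.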


Limitation of randomization: (a) Randomization is not successful with small samples (b) Randomization does not always ensure correct conclusions




Patient data: eg performance status, weight


Tumor data: ICD, histopathology, TNM, Measurement (palpation, radiology, phototherapy, CAT scan); tumor markers eg gonadotrophins for trophoblast


Response to treatment: We may talk of response by one organ or response by the whole patient. Response assessment may be qualitative or quantitative. The qualitative assessment is preferred. The response scale is ordinal. The following are normally used for solid tumors: Complete response, Partial response, No response, Disease progression, No Evidence of Disease (NED), Recurrence. Response measurement in non-solid tumors is more difficult and is often measured indirectly. Although not measurable, the response is evaluable. Duration of response is an important parameter. Defining the start of response is difficult but defining the end of response is easy.


Survival: disease-free survival, time to recurrence, survival until death


Adverse effects: Type of toxicity, severity, onset, duration (acute/transient vs long-term/chronic). Toxicity must be distinguished from medical complications of the disease. The longer the duration of treatment the higher the occurrence of toxicity. More frequent observation increases pick-up of toxicity. Proper toxicity analysis should include all patients even the non-evaluable. Patients whose drug dose was reduced because of toxicity should be analyzed separately. There are patient-related factors for toxicity.


Quality of life: Quality of life is not easy to measure. Quantifiable scales are preferred. Methods of measurement: clinical observation, clinical interview, self report by patient. Levels of inquiry: physical, chemical, anatomical, biochemical, physiological (organs and systems), psychological, social, and socio-psychological. Measuring instruments must have: validity, reliability, norms, and feasibility.



  • Principles: logical order, clear and not ambiguous, minimize text, self-explanatory questions, every question must be answered
  • Scales: nominal, ordinal, interval, ratio
  • Disease classification: ICD and TNM



Blinding is used to avoid biases. There are three types of blinding: single blinding, double blinding, and triple blinding. Single blinding is when the person assessing the response knows the diagnosis but not the treatment. Double blinding is when the person assessing knows neither the treatment nor the diagnosis. This is not possible in surgery and similar circumstances. The practical problems of ensuring double-blind status can be avoided by using a panel of physicians to review the outcome while leaving the usual physician to continue his work as usual. Maintaining a complete double-blind prevents the early detection of benefits that would allow the trial to be stopped. In triple blinding the physician, the subject, and the data analyst are blinded to the treatment and the diagnosis.



Stopping rules must be established from the start. Usually the trial is stopped when there is evidence of a difference or when there is risk to the treatment group.



  • Loss of information: due to eligibility criteria, evaluability criteria, adequacy of treatment criteria, and censoring: death, loss to follow-up
  • Data editing: missing values, range checks, cross-checking for consistency, sequence checks, missing data (random or non-random)
  • Follow up: delayed follow-up, loss to follow-up, withdrawal, exclusions. Compliance in trials can be improved by reliability tests before subject recruitment.
  • Outliers
  • Institutional differences in reporting, patient management
  • Data consistency: use a review panel or carry out inter-observer rating
  • Inaccurate data:
  • Protocol non-adherence in randomization, measurement, treatment, follow-up, data management and analysis
  • Maintenance of subject compliance with treatment is difficult. The comparison group may also have compliance problems.
  • Ascertainment of outcome may be incomplete or biased.
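The data editing checks listed above (missing values, range checks, sequence checks) can be sketched as a small function run on each record as it is entered. All field names, limits, and the two date fields below are hypothetical and stand in for whatever the CRF actually defines.

```python
def edit_checks(record, ranges, required):
    """Return a list of problems found in one CRF record (empty if clean)."""
    problems = []

    # Missing values: every required item must be present
    for field in required:
        if record.get(field) is None:
            problems.append(f"{field}: missing")

    # Range checks: numeric items must fall inside plausible limits
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            problems.append(f"{field}: {value} outside {low}-{high}")

    # Sequence check: treatment cannot start before registration.
    # Dates are ISO strings (YYYY-MM-DD), which sort correctly as text.
    registered = record.get("registration_date")
    started = record.get("treatment_start")
    if registered and started and started < registered:
        problems.append("treatment_start: earlier than registration_date")

    return problems


if __name__ == "__main__":
    record = {"patient_id": "001", "weight_kg": 410,
              "performance_status": None,
              "registration_date": "2004-06-18",
              "treatment_start": "2004-06-10"}
    for problem in edit_checks(record,
                               ranges={"weight_kg": (30, 200),
                                       "performance_status": (0, 4)},
                               required=["patient_id", "performance_status"]):
        print(problem)
```

Returning a list of problems rather than stopping at the first one lets the coordination center send a single query back to the clinical site per record.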




A sequential trial is one designed in such a way that there is continuous assessment of results in order to be able to stop the trial as soon as a difference or no difference has been determined. A pre-determined stopping rule must be used to avoid bias.

Sequential trials have the following advantages: (a) economy of time and money since fewer patients are involved (b) Achievement of a specific precision (c) ethical considerations: the trial is not continued beyond what is necessary to establish the result.

The disadvantage of sequential trials is that a lower level of statistical significance has to be used. This makes the study too strict.
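Why a stricter significance level is needed can be seen in a small simulation. The sketch below assumes no true treatment effect, five equally spaced looks, and standard normal observations standing for standardized treatment differences. The function name and all settings are illustrative, and the stricter critical value of 2.413 is the Pocock boundary for five looks at an overall alpha of 0.05.

```python
import random
from math import sqrt


def false_positive_rate(looks=(20, 40, 60, 80, 100), z_crit=1.96,
                        n_trials=2000, seed=0):
    """Share of null trials declared significant at any of the looks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        total, n = 0.0, 0
        for look in looks:
            while n < look:
                total += rng.gauss(0.0, 1.0)  # no true treatment effect
                n += 1
            if abs(total / sqrt(n)) > z_crit:  # z statistic at this look
                hits += 1
                break                          # trial stopped early
    return hits / n_trials


if __name__ == "__main__":
    print("one final look, 1.96:  ", false_positive_rate(looks=(100,)))
    print("five looks, 1.96:      ", false_positive_rate())
    print("five looks, 2.413:     ", false_positive_rate(z_crit=2.413))
```

Testing at the usual 1.96 on every look inflates the chance of a false positive well above 5%, while the stricter boundary brings it back to about 5% at the price of requiring stronger evidence at each look.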



  • Compare response proportions: chi-square, exact test, chi-square for trend
  • Drawing survival curves: K-M & life-table
  • Comparing survival & remission: Wilcoxon and log-rank
  • Prognostic factors related to response: Cox logistic regression
  • Prognostic factors of remission, duration, and survival times: Cox proportional hazards model
  • Meta-analysis: This is a technique of combining data from several related clinical trials. Each study is treated as a treatment block.
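As an illustration of one method from this list, the Kaplan-Meier (K-M) survival curve can be computed in a few lines. This is a minimal sketch with a hypothetical function name and made-up follow-up times. It applies the usual product-limit rule, and subjects censored at a given time are counted as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    events[i] is 1 if the subject had the event (death, relapse) at times[i]
    and 0 if the subject was censored at times[i].
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored both leave the risk set
    return curve


if __name__ == "__main__":
    months = [1, 2, 3, 4, 5]
    status = [1, 0, 1, 1, 0]  # 0 = censored
    for t, s in kaplan_meier(months, status):
        print(t, round(s, 3))
```

Censored subjects never lower the curve themselves; they only shrink the number at risk for later event times.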



  • Sampling variation/chance: controlled by statistical tests of significance
  • Inherent differences between the two series: controlled by randomization
  • Differences in the handling and evaluation of the 2 series: controlled by double-blinding
  • True effects of the treatment
  • Non compliance may cause a difference between the two groups. Non compliance may be overt or covert.
  • Drop-in is a term used to describe a situation in which patients in one group take the treatment of the other group.



  • Lack of full documentation/accounting for patients
  • Removing 'bad' cases from series
  • Not censoring the dead
  • Retrospective censoring: ‘clean up’ data by removing cases due to ‘competing causes of death’
  • Retrospective stratification
  • Publication bias
  • Ethics: The following ethical issues arise: (a) withholding a potentially beneficial treatment from the controls (b) the new agent may have unknown risks (c) lack of informed consent (d) Is double-blinding proper? (e) is using a placebo proper?



  • Maturation of data: If the study is analyzed later, different results are obtained
  • Misuse of ‘p-value’: no proper substantive or statistical questions and conclusions; use of part of data; use of inappropriate formula
  • Numerator problems: errors in measuring response
  • Denominator problems (# at risk): exclusion of the ineligible or non-evaluable, early death, loss to follow-up, failure to complete therapy due to toxicity, refusal of further therapy, inadequate data, major protocol violations. To solve denominator problems, several denominators should be used in the calculations and the results compared: registered and eligible; registered, eligible, and treated; registered, eligible, and adequately treated.
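The suggestion to compare several denominators can be sketched as follows. The record layout (boolean flags named eligible, treated, adequately_treated, responded) is hypothetical and only serves the illustration.

```python
def response_rates(patients):
    """Response rate under each of the three denominators, for comparison."""
    eligible = [p for p in patients if p["eligible"]]
    treated = [p for p in eligible if p["treated"]]
    adequate = [p for p in treated if p["adequately_treated"]]

    groups = {
        "registered and eligible": eligible,
        "registered, eligible, and treated": treated,
        "registered, eligible, and adequately treated": adequate,
    }
    rates = {}
    for name, group in groups.items():
        responders = sum(1 for p in group if p["responded"])
        # None marks an empty denominator rather than a rate of zero
        rates[name] = responders / len(group) if group else None
    return rates


if __name__ == "__main__":
    # Ten registered patients: 8 eligible, 6 treated, 4 adequately treated,
    # with 3 responders (illustrative numbers only)
    patients = [{"eligible": i < 8, "treated": i < 6,
                 "adequately_treated": i < 4, "responded": i < 3}
                for i in range(10)]
    for name, rate in response_rates(patients).items():
        print(f"{name}: {rate:.2f}")
```

If the rate changes markedly from one denominator to the next, the conclusion depends on who was excluded, and that should be reported alongside the result.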
Accuracy is lack of bias. Precision is lack of variability. Methods of controlling or minimizing bias include written protocols, blinding (double or single), and minimizing drop-out from the study.

Omar Hasan Kasule, Sr. June 2004