ISLAMIC MEDICAL EDUCATION RESOURCES 04

0104-NON-PARAMETRIC ANALYSIS OF CONTINUOUS DATA USING MEDIANS

By Professor Omar Hasan Kasule Sr.

Learning Objectives:

Definition and properties of non-parametric methods

Strengths and weaknesses of non-parametric methods

Parametric and non-parametric methods: correspondence and comparison

Situations in which non-parametric tests are used  

Key Words and Terms:

Friedman

Kendall

Kruskal-Wallis

Non-parametric

Rank correlation coefficient

Wilcoxon rank sum test

Sign test

Wilcoxon signed rank test

Spearman rank order correlation coefficient

 

1.0 INTRODUCTION

1.1 DEFINITION AND NATURE

Non-parametric methods do not rely on normality assumptions. They were first introduced as rough, 'quick and dirty' methods and became popular because of their ease of use and freedom from normality assumptions. They were later found to be powerful and valid even for normally distributed data, being about 95% as efficient as the more complicated and involved parametric methods. Their popularity is likely to wane with the availability of easy computing, which negates the major advantage of simplicity of non-parametric tests.

 

1.2 ADVANTAGES

These methods are simple to understand and employ. They do not need complicated mathematical operations, so computation is rapid. They make few assumptions about the distribution of the data; all they require is that the data can be arrayed in ranks. They can be used for non-Gaussian data, and also for data whose distribution is not known, because no normality assumption is needed. They have the further advantage that they can be used for data expressed only as ranks. These methods are more robust, although the robustness is gained at the expense of power.

 

1.3 DISADVANTAGES

Non-parametric methods can be used efficiently only for small data sets. With data sets that have many observations, ranking becomes cumbersome and the methods cannot be applied with ease. It is difficult to estimate precision because computation of confidence intervals is cumbersome. These methods are also not easy to use with complicated experimental designs. Non-parametric methods are less efficient than parametric methods for normally distributed data; they require a larger sample size than comparable parametric methods to be able to reject a false null hypothesis. Hypothesis testing with non-parametric methods is less specific than hypothesis testing with parametric methods.

 

1.4 CHOICE BETWEEN PARAMETRIC AND NON-PARAMETRIC

Parametric methods are most powerful (i.e. have lower type 2 error) when normality assumptions hold. They are also more efficient in using all available data. Non-parametric methods are less powerful and less efficient for normal data; they are concerned with the direction, and not the size, of the difference between the groups being compared. They are most powerful for non-normal data. Non-parametric methods should never be used where parametric methods are possible, and should therefore be used only if the test for normality is negative. In general non-parametric methods are used where the assumptions of the central limit theorem do not apply. They are also used in situations in which the distribution of the parent population is not known. If the unknown distribution is not normal, the non-parametric tests are the right choice; if the distribution turns out to be normal, at least 95% efficiency is still achieved compared to parametric tests. In the case of ordinal or ranked data there is no analytic choice other than non-parametric tests.

 

When faced with a data set it is worth testing it for normality in order to decide the analysis to be used. The Lilliefors test is a simple test of normality that does not assume any particular mean or standard deviation. The variables are standardized and the standardized variables are tested for normality.
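The standardize-then-test idea above can be sketched in Python with scipy, using a hypothetical skewed sample. Note this is only an approximation of the Lilliefors procedure: a plain KS test after estimating the mean and SD from the data is anti-conservative, and the true Lilliefors test uses corrected critical values (available, for example, as statsmodels.stats.diagnostic.lilliefors).

```python
import numpy as np
from scipy import stats

# Hypothetical, clearly non-normal sample; in practice this is the study data
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)

# Standardize with the sample mean and SD, then test against N(0, 1)
z = (x - x.mean()) / x.std(ddof=1)
stat, p = stats.kstest(z, "norm")
print(f"KS statistic = {stat:.3f}, nominal p = {p:.4f}")
```

A small p-value here suggests the data are not normal and a non-parametric analysis is preferable.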

 

1.5 CORRESPONDENCE OF PARAMETRIC & NON-PARAMETRIC

Situation                               Parametric test                   Non-parametric test

1 sample                                z-test, t-test                    Sign test

2 independent sample means              t-test                            Rank sum test

2 paired sample means                   t-test                            Signed rank test

3 or more independent sample means      ANOVA (1-way)                     Kruskal-Wallis

Multiple comparisons of means           ANOVA (2-way)                     Friedman

Correlation                             Pearson                           Spearman

Comparing survival curves               Proportional hazards regression   Log rank test

 

Virtually every parametric test has an equivalent non-parametric one, as shown in the table above. Note that the Mann-Whitney test gives results equivalent to those of the Wilcoxon rank sum test, and the Kendall rank correlation gives results comparable to those of the Spearman coefficient. The signed rank and rank sum tests are based on the median.

 

2.0 SPEARMAN RANK CORRELATION COEFFICIENT

The 2 samples being compared are each ordered and a rank is assigned to each observation. The differences between the ranks of corresponding pairs of observations are computed and then squared, and the squared differences are summed. The Spearman rank correlation coefficient is then rs = 1 – {6Σd²} / {n(n² – 1)} where n = number of pairs and d = (rank of x) – (rank of y). The significance of the correlation coefficient is determined using the t test with t = rs√(n – 2) / √(1 – rs²) on n – 2 degrees of freedom. There are tables that can be used to look up the significance of the rank correlation coefficient. The advantage of rank correlation is that comparisons can be carried out even if the actual values of the observations are not known; it suffices to know the ranks.
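The steps above can be sketched in Python with hypothetical tie-free scores; when there are no ties the shortcut formula agrees exactly with scipy's spearmanr (with ties the two diverge slightly, because scipy uses the general Pearson-on-ranks computation).

```python
import numpy as np
from scipy import stats

# Hypothetical tie-free paired scores
x = [86, 71, 77, 68, 91, 72, 83, 94, 70, 75]
y = [88, 77, 76, 64, 96, 72, 65, 90, 66, 80]

# Rank each sample, take the rank differences d, square and sum them
rx = stats.rankdata(x)
ry = stats.rankdata(y)
d = rx - ry
n = len(x)
rs = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
print(f"shortcut rs = {rs:.4f}")

# scipy computes the same coefficient (plus a p-value)
rho, p = stats.spearmanr(x, y)
print(f"scipy   rho = {rho:.4f}, p = {p:.4f}")
```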

 

3.0 THE SIGN TEST FOR ONE SAMPLE TESTS

The concept of the sign test is very simple to grasp. The test is based on the binomial proportion. If a value is picked at random from any distribution, the probability that it will be less than the median is 0.5, and the probability that it is more than the median is also 0.5. The sign test thus makes assumptions based only on the median and does not refer to any other population parameters. In a 1-sample test we want to test the null hypothesis H0: sample median = population median. The sign + is assigned to observations above the population median and the sign – to observations below it. If H0 is true, the number of pluses is expected to equal the number of minuses. Zero is assigned to any tie; ties are not counted. The p-value can be read off the appropriate tables depending on the number of pluses and minuses.
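The counting-and-binomial logic above can be sketched as follows, using hypothetical data and an exact two-sided binomial p-value in place of table lookup:

```python
from scipy import stats

# Hypothetical 1-sample data; H0: population median = 30
data = [32, 35, 28, 41, 36, 29, 33, 30, 38, 34, 37, 31]
m0 = 30
plus = sum(v > m0 for v in data)   # observations above the median
minus = sum(v < m0 for v in data)  # observations below the median
n = plus + minus                   # ties (values equal to m0) are dropped

# Under H0 the number of pluses is Binomial(n, 0.5); exact two-sided p-value
p = min(1.0, 2 * stats.binom.cdf(min(plus, minus), n, 0.5))
print(plus, minus, round(p, 4))
```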

 

4.0 TESTS FOR 2 SAMPLES

SIGNED RANK TEST FOR 2 PAIRED SAMPLES

This is a test of the hypothesis H0: sample median1 = sample median2 for paired observations. The differences between the first and second measurements are computed by simple subtraction. The sign of each difference is set aside for the moment and ranks are assigned to the absolute differences from the smallest to the largest. The positive or negative sign is then restored to each rank according to the sign of the original difference. The appropriate tables are then used to look up the p-value for the given sample size, sum of positive ranks, and sum of negative ranks. Alternatively the z test statistic can be used, defined as z = (wo – we)/sw where wo = the sum of positive ranks, we = n(n + 1)/4, and sw = √{(2n + 1)we/6} = √{n(n + 1)(2n + 1)/24}.
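In practice the ranking and table lookup are delegated to a library. A sketch with scipy's implementation of the Wilcoxon signed rank test, on hypothetical paired before/after measurements:

```python
from scipy import stats

# Hypothetical paired measurements; H0: median difference = 0
before = [125, 130, 118, 140, 136, 128, 122, 135, 131, 127]
after  = [120, 126, 119, 132, 130, 126, 119, 126, 124, 117]

# Signed rank test on the paired differences (exact for small n without ties)
stat, p = stats.wilcoxon(before, after)
print(f"W = {stat}, p = {p:.4f}")
```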

 

RANK SUM TEST FOR TWO INDEPENDENT SAMPLES

Both samples are combined and the observations are ordered from low to high. A rank is assigned to each observation while keeping track of the original group of each observation. The ranks of the group with the smaller size are added up. The rank sum is referred to the appropriate tables and the p-value is looked up. Alternatively the z test statistic based on the ranks of the smaller sample can be used, defined as z = (wo – we)/sw = (wo – we) / √{n1n2(n1 + n2 + 1)/12} where wo = the observed sum of ranks in the smaller sample and we = the expected sum of ranks = n1(n1 + n2 + 1)/2.
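The z statistic above can be computed directly and checked against scipy's ranksums, which implements the same normal approximation. The two samples are hypothetical:

```python
import numpy as np
from scipy import stats

a = [3.1, 2.8, 4.0, 3.6, 2.9, 3.3]       # smaller sample, n1 = 6
b = [3.8, 4.2, 4.5, 3.9, 4.8, 4.1, 4.4]  # n2 = 7

n1, n2 = len(a), len(b)
ranks = stats.rankdata(a + b)        # rank the pooled observations
wo = ranks[:n1].sum()                # observed rank sum of the smaller sample
we = n1 * (n1 + n2 + 1) / 2          # expected rank sum under H0
sw = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (wo - we) / sw
print(f"manual z = {z:.3f}")

zs, p = stats.ranksums(a, b)         # scipy's version of the same test
print(f"scipy  z = {zs:.3f}, p = {p:.4f}")
```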

 

5.0 TESTING 3 OR MORE SAMPLES

KRUSKAL-WALLIS

This is a 1-way test for 3 or more independent sample means. It is the non-parametric equivalent of 1-way ANOVA. The observations from the various groups are combined and arranged in order of magnitude from the lowest to the highest. Ranks are assigned before the observations are restored to their original groups. The ranks are summed in each group and the test statistic is constructed as H = [12/{N(N + 1)}] Σ(j=1 to k) Rj²/nj – 3(N + 1) where nj = number of observations in the jth group, N = total number of observations combined for all groups, Rj = sum of ranks in the jth group, and k = number of groups.
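The H formula above can be computed by hand and compared with scipy's kruskal, using three hypothetical tie-free groups (scipy applies a tie correction, so with ties the two values would differ slightly):

```python
import numpy as np
from scipy import stats

# Hypothetical independent groups, no tied values
g1 = [27, 31, 29, 35, 33]
g2 = [40, 44, 38, 47, 41]
g3 = [30, 36, 39, 34, 42]

groups = [g1, g2, g3]
pooled = np.concatenate(groups)
ranks = stats.rankdata(pooled)       # rank the pooled observations
N = len(pooled)
sizes = [len(g) for g in groups]
starts = np.cumsum([0] + sizes[:-1]) # index where each group's ranks begin
H = 12 / (N * (N + 1)) * sum(
    ranks[s:s + n].sum() ** 2 / n for s, n in zip(starts, sizes)
) - 3 * (N + 1)
print(f"manual H = {H:.3f}")

Hs, p = stats.kruskal(g1, g2, g3)    # scipy's Kruskal-Wallis test
print(f"scipy  H = {Hs:.3f}, p = {p:.4f}")
```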

 

FRIEDMAN

This is a 2-way test for 3 or more related (matched) sample means, the non-parametric equivalent of 2-way ANOVA. The observations are ranked within each block (for example, within each subject), the ranks are summed for each treatment, and the resulting chi-square statistic is referred to tables with k – 1 degrees of freedom.
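A minimal sketch with scipy's implementation of the Friedman test, assuming hypothetical matched data in which three treatments are measured on the same six subjects:

```python
from scipy import stats

# Hypothetical matched data: three treatments on the same 6 subjects;
# each list holds one treatment's measurements, position = subject
t1 = [7.0, 9.9, 8.5, 5.1, 10.3, 8.6]
t2 = [5.3, 5.7, 4.7, 3.5, 7.7, 6.2]
t3 = [4.9, 7.6, 5.5, 2.8, 8.4, 5.8]

# Friedman test: ranks within each subject, then a chi-square statistic
stat, p = stats.friedmanchisquare(t1, t2, t3)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```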

Professor Omar Hasan Kasule Sr. April 2001