Why are parametric assumptions important?

One of the most useful graphical tools for this purpose is the box plot: the minimum and maximum data values are represented by whiskers protruding from the box. These geometric representations can be qualitatively examined for symmetry and amount of variability.

Potential outlying data points can also be identified. A newer graphical tool is the violin plot, which encloses the data in a single shape for each comparison group and so allows normality and homogeneity of variance to be visualized at a glance. Simply examining the symmetry and size of the shapes will reveal whether the data conform to a normal distribution with homogeneous variability.
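As a rough illustration of this kind of graphical check outside of Prism, the sketch below draws side-by-side box and violin plots for two comparison groups with matplotlib; the group names and the simulated values are invented for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Two hypothetical comparison groups (simulated data, for illustration only)
control = rng.normal(loc=10.0, scale=2.0, size=40)
treated = rng.normal(loc=12.0, scale=2.0, size=40)

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(8, 4))

# Box plot: the box spans the interquartile range, the whiskers show the spread
ax_box.boxplot([control, treated])
ax_box.set_xticks([1, 2], labels=["Control", "Treated"])
ax_box.set_title("Box plot")

# Violin plot: a symmetric, single-humped shape of similar width in each
# group is consistent with normality and homogeneous variance
ax_violin.violinplot([control, treated], showmedians=True)
ax_violin.set_xticks([1, 2], labels=["Control", "Treated"])
ax_violin.set_title("Violin plot")

plt.tight_layout()
plt.show()
```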

A final graphical tool that is particularly useful in assessing the normality assumption is the quantile-quantile plot, also referred to as the QQ plot. By graphing the actual values of the data along the x-axis against the values predicted for hypothetical data that obey a perfect normal distribution along the y-axis, one can assess normality simply by observing how closely the points adhere to the diagonal line.

It is an elegant way to illustrate whether the data conform to normality and whether either comparison group has outliers or patterns of variability that defy the assumption of homogeneity of variance.
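For readers working outside Prism, here is a minimal sketch of the same idea using scipy; the measurements are simulated and purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=5.0, scale=1.5, size=50)  # hypothetical measurements

# probplot orders the sample and plots it against the quantiles expected
# under a normal distribution; points hugging the straight line support
# the normality assumption
stats.probplot(sample, dist="norm", plot=plt)
plt.title("Normal Q-Q plot")
plt.show()
```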

Exploring data graphically is an indispensable qualitative check of the normality assumption before applying parametric tests, but Prism also provides formal tests that assess how likely it is that the data come from a normal distribution.

Four formal methods for testing whether data follow a normal distribution are the Anderson-Darling, D'Agostino-Pearson, Shapiro-Wilk, and Kolmogorov-Smirnov tests. Each of these methods calculates a p value for each comparison group, evaluating how compatible the data are with a normal distribution. Applying these formal tests can be a reassuring way to confirm the normality assumption that was observed during graphical exploration of the data.
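As a rough non-Prism illustration, the sketch below runs two of these tests per group with scipy; the group names and simulated values are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical comparison groups (simulated for illustration)
groups = {
    "control": rng.normal(10.0, 2.0, size=30),
    "treated": rng.normal(12.0, 2.0, size=30),
}

for name, values in groups.items():
    # Shapiro-Wilk: small p values suggest a departure from normality
    w_stat, w_p = stats.shapiro(values)
    # D'Agostino-Pearson omnibus test (based on skewness and kurtosis)
    k2_stat, k2_p = stats.normaltest(values)
    print(f"{name}: Shapiro-Wilk p={w_p:.3f}, D'Agostino-Pearson p={k2_p:.3f}")
```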

The consideration of potential outliers is an important topic unto itself, but graphical exploration of the data is a good way to identify when further consideration is warranted. Data points that lie far from the rest of the group, an abnormal distance from the mean, can skew the symmetry of the distribution and call into question the assumption of normality. The choice of test also depends on the types of the input and outcome variables. If the input variable is nominal with two categories and the outcome is continuous, the required test is the t-test (Table 2). However, if the input variable is continuous, say a clinical score, and the outcome is nominal, say cured or not cured, logistic regression is the required analysis. A t-test in this case may help but would not give us what we require, namely the probability of a cure for a given value of the clinical score.
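To make the logistic regression case concrete, here is a minimal sketch using scikit-learn; the clinical scores and cure outcomes are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: a continuous clinical score and a nominal outcome
# (1 = cured, 0 = not cured)
scores = np.array([2.1, 3.4, 4.0, 4.8, 5.5, 6.1, 6.9, 7.4, 8.2, 9.0]).reshape(-1, 1)
cured = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(scores, cured)

# Unlike a t-test, the fitted model returns the probability of a cure
# for any given value of the clinical score
new_scores = np.array([[3.0], [5.0], [7.0]])
for s, p in zip(new_scores.ravel(), model.predict_proba(new_scores)[:, 1]):
    print(f"clinical score {s:.1f}: estimated probability of cure = {p:.2f}")
```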

As another example, suppose we have a cross-sectional study in which we ask a random sample of people whether they think their general practitioner is doing a good job, on a five-point scale, and we wish to ascertain whether women have a higher opinion of general practitioners than men have.

The input variable is gender, which is nominal. The outcome variable is the five-point ordinal scale. Each person's opinion is independent of the others', so we have independent data. Note, however, that if some people share a general practitioner and others do not, the data are not independent and a more sophisticated analysis is called for.
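The tables referred to above are not reproduced here, but a common choice for comparing an ordinal outcome between two independent groups is the Mann-Whitney U test; the sketch below applies it to invented five-point ratings.

```python
import numpy as np
from scipy import stats

# Hypothetical five-point ratings of general practitioners
# (1 = very poor, 5 = very good), one independent rating per respondent
women = np.array([4, 5, 3, 4, 5, 4, 3, 5, 4, 4])
men = np.array([3, 4, 2, 3, 4, 3, 3, 4, 2, 3])

# One-sided Mann-Whitney U test: do women tend to give higher ratings?
u_stat, p_value = stats.mannwhitneyu(women, men, alternative="greater")
print(f"U = {u_stat:.1f}, one-sided p = {p_value:.4f}")
```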

Note that these tables should be considered as guides only, and each case should be considered on its merits. Techniques do exist for modelling ordinal outcomes directly, but they require certain assumptions, and it is often easier to either dichotomise the outcome variable or treat it as continuous. Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn.

This is often the assumption that the population data are normally distributed. Table 3 shows the non-parametric equivalents of a number of parametric tests.
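Table 3 itself is not reproduced here, but the standard pairings run along the lines of the short listing below; any real analysis should of course follow the actual table.

```python
# Commonly cited parametric tests and their non-parametric equivalents
# (a generic list, not a reproduction of Table 3)
NONPARAMETRIC_EQUIVALENTS = {
    "One-sample / paired t-test": "Wilcoxon signed-rank test",
    "Independent two-sample t-test": "Mann-Whitney U test",
    "One-way ANOVA": "Kruskal-Wallis test",
    "Repeated-measures ANOVA": "Friedman test",
    "Pearson correlation": "Spearman rank correlation",
}

for parametric, nonparametric in NONPARAMETRIC_EQUIVALENTS.items():
    print(f"{parametric:32s} -> {nonparametric}")
```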

Non-parametric tests are valid for both non-normally distributed data and normally distributed data, so why not use them all the time? It would seem prudent to use non-parametric tests in all cases, which would save one the bother of testing for normality. Parametric tests are preferred, however, because when their assumptions are met they are more statistically powerful and lend themselves more readily to estimation of effect sizes and confidence intervals. This is also why it pays to plan an analysis in advance, which may include identifying the tests or analyses you need to run, the assumptions that need to be satisfied, and what data you need to collect and how much.
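The power advantage of a parametric test when the data really are normal is modest, but a quick simulation can illustrate it; the sample size, effect size, and number of replicates below are arbitrary choices for the sketch.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, effect, alpha, n_sim = 20, 0.8, 0.05, 2000

t_hits = mw_hits = 0
for _ in range(n_sim):
    # Normally distributed groups with a true difference in means
    a = rng.normal(0.0, 1.0, size=n)
    b = rng.normal(effect, 1.0, size=n)
    t_hits += stats.ttest_ind(a, b).pvalue < alpha
    mw_hits += stats.mannwhitneyu(a, b).pvalue < alpha

# Proportion of simulations in which each test detected the difference
print(f"t-test power       ~ {t_hits / n_sim:.2f}")
print(f"Mann-Whitney power ~ {mw_hits / n_sim:.2f}")
```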

So why do the parametric assumptions matter? In the first place, they place constraints on our interpretation of the results. If we really do have normality and homoscedasticity, and if we obtain a significant result, then the only sensible interpretation of a rejected null hypothesis is that the population means differ.

What could be neater? The second reason for the assumptions is that we use the characteristics of the populations from which we sample to draw inferences on the basis of the samples. By assuming normality and homoscedasticity, we know a great deal about our sampled populations, and we can use what we know to draw inferences. For example, in a standard t-test, homoscedasticity lets us pool the two sample variances to estimate the standard error of the difference between the means, and normality tells us that the resulting statistic follows a t distribution under the null hypothesis.
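As a sketch of how those population assumptions get used in practice, the code below computes the pooled standard error and t statistic by hand for two simulated samples and checks the result against scipy's ttest_ind; the data are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
a = rng.normal(10.0, 2.0, size=15)  # hypothetical group A
b = rng.normal(12.0, 2.0, size=15)  # hypothetical group B

na, nb = len(a), len(b)
# Homoscedasticity lets us pool the two sample variances...
pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
# ...to estimate the standard error of the difference between means
se_diff = np.sqrt(pooled_var * (1 / na + 1 / nb))
t_by_hand = (a.mean() - b.mean()) / se_diff

# Normality tells us this statistic follows a t distribution with
# na + nb - 2 degrees of freedom under the null hypothesis
p_by_hand = 2 * stats.t.sf(abs(t_by_hand), df=na + nb - 2)

result = stats.ttest_ind(a, b)  # equal_var=True by default
print(f"by hand: t = {t_by_hand:.3f}, p = {p_by_hand:.4f}")
print(f"scipy  : t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```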


