Why do scientists design experiments?
As we discussed at the beginning of this chapter, experimental design is commonly understood and informally implemented in everyday life. We often say that we are conducting an experiment when we try a new restaurant or date a new person. If you wanted trying a new restaurant to count as a true experiment, however, you would need to recruit a large sample, randomly assign participants to control and experimental groups, administer a pretest and posttest, and use clearly and objectively defined measures of restaurant satisfaction.
Social scientists use this level of rigor and control because they try to maximize the internal validity of their experiment. Internal validity is the confidence researchers have about whether their intervention produced variation in their dependent variable. Thus, experiments are attempts to establish causality between two variables—your treatment and its intended outcome. As we talked about in Chapter 7, nomothetic causal relationships must establish four criteria: covariation, plausibility, temporality, and nonspuriousness.
The logic and rigor of experimental designs allow causal relationships to be established. Experimenters can assess covariation in the dependent variable through pre- and posttests. The use of experimental and control conditions ensures that some people receive the intervention and others do not, providing variation in the independent variable.
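To make the logic concrete, here is a minimal sketch in Python of how such a restaurant experiment could be organised. Everything in it is invented for illustration (the group sizes, the 0-10 satisfaction scale, and the simulated scores); a real study would use a validated satisfaction measure and a proper statistical test of the group difference.

```python
import random

# Hypothetical illustration of the restaurant "true experiment" described above.
# All names and numbers are invented for the sake of the sketch.
random.seed(42)

participants = [f"diner_{i}" for i in range(40)]
random.shuffle(participants)                 # random assignment
experimental = participants[:20]             # eat at the new restaurant
control = participants[20:]                  # eat at their usual restaurant

def measure_satisfaction(group, treated):
    """Pretend pretest/posttest scores on a 0-10 satisfaction scale."""
    scores = {}
    for person in group:
        pre = random.uniform(4, 6)
        post = pre + (random.uniform(0.5, 2.0) if treated else random.uniform(-0.5, 0.5))
        scores[person] = (pre, post)
    return scores

exp_scores = measure_satisfaction(experimental, treated=True)
ctl_scores = measure_satisfaction(control, treated=False)

def mean_change(scores):
    return sum(post - pre for pre, post in scores.values()) / len(scores)

# Covariation: does satisfaction change more in the treated group than the control group?
print("Mean change (experimental):", round(mean_change(exp_scores), 2))
print("Mean change (control):     ", round(mean_change(ctl_scores), 2))
```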
Moreover, since the researcher controls when the intervention is administered, they can be assured that changes in the independent variable (the treatment) happened before changes in the dependent variable (the outcome).
In this way, experiments assure temporality. In our restaurant experiment, the assignment to experimental and control groups would show us that people varied in the restaurant they attended.
Additionally, experimenters will have a plausible reason why their intervention would cause changes in the dependent variable. Either theory or previous empirical evidence should indicate the potential for a causal relationship. Perhaps we discover a national poll that found pizza, the type of food our experimental restaurant served, is the most popular food in America. Perhaps this restaurant has good reviews on Yelp or Google.
This evidence would give us a plausible reason to establish our restaurant as causing satisfaction. One of the most important features of experiments is that they allow researchers to eliminate spurious variables. True experiments are usually conducted under strictly controlled laboratory conditions. The intervention must be given to each person in the same way, with a minimal number of other variables that might cause their posttest scores to change.
In our restaurant example, this level of control might prove difficult. We cannot control how many people are waiting for a table, whether participants saw someone famous there, or if there is bad weather. Any of these factors might cause a diner to be less satisfied with their meal. These spurious variables may cause changes in satisfaction that have nothing to do with the restaurant itself, which is an important problem in real-world research.
For this reason, experimenters use the laboratory environment to try to control as many aspects of the research process as possible. Researchers in large experiments often employ clinicians or other research staff to help them. Researchers train their staff members exhaustively, provide pre-scripted responses to common questions, and control the physical environment of the lab so each person who participates receives the exact same treatment.
Experimental researchers also document their procedures so others can review how well they controlled for spurious variables. Some of the most tightly controlled experiments are not conducted on people at all, but on animals. While this may seem strange, the biological systems of our mammalian relatives are similar enough to those of humans that causal inferences can be made from animal studies to humans.
It is certainly unethical to deliberately cause humans to become addicted to cocaine and measure them for weeks in a laboratory, but it is currently more ethically acceptable to do so with animals.
There are specific ethical processes for animal research, similar to an IRB review. In earlier addiction experiments, rats given access to a drug solution consumed it compulsively, often to the point of death. Researchers claimed that this behavior in rats explained how addiction worked in humans; however, psychologist Bruce Alexander was not so sure. He knew rats were social animals, and the procedure used in the previous experiments did not allow them to socialize. Instead, rats were isolated in small cages with only food, water, and metal walls.
To Alexander, social isolation was a spurious variable that was causing changes in addictive behavior that were not due to the drug itself. Alexander created an experiment of his own, known as Rat Park, in which rats were allowed to run freely in an interesting environment, socialize and mate with other rats, and, of course, drink from a solution that contained an addictive drug.
In this environment, rats did not become hopelessly addicted to drugs. In fact, they had little interest in the substance. This makes intuitive sense to me. If I were in solitary confinement for most of my life, the escape of an addictive drug would seem more tempting than if I were in my natural environment with friends, family, and activities. If the causal relationship is real, it should occur in all or at least most replications of the experiment.
One of the defining features of experiments is that they diligently report their procedures, which allows for easier replication. Recently, researchers at the Reproducibility Project have caused a significant controversy in social science fields like psychology (Open Science Collaboration, 2015). Despite close coordination with the original researchers, the Reproducibility Project found that nearly two-thirds of psychology experiments published in respected journals were not reproducible.
The implications of the Reproducibility Project are staggering, and social scientists are developing new ways to ensure researchers do not cherry-pick data or change their hypotheses simply to get published.
The conclusions Alexander drew from experimenting on rats were meant to generalize to the population of people with substance use disorders with whom I worked. Experiments seek to establish external validity, or the degree to which their conclusions generalize to larger populations and different situations.
Alexander contends that his conclusions about addiction and social isolation help us understand why people living in deprived, isolated environments are more likely to become addicted to drugs when compared to people living in more enriching environments.
Similarly, earlier rat researchers contended that their results showed these drugs to be instantly addictive, often to the point of death. Neither study will match up perfectly with real life. The real world is much more complicated than the experimental conditions in Rat Park, just as humans are more complex than rats. Social workers are especially attentive to how social context shapes social life. We are likely to point out that experiments are rather artificial.
How often do real-world social interactions occur in the lab? Experiments that are conducted in community settings may be less subject to artificiality, though their conditions are less easily controlled. This relationship demonstrates the tension between internal and external validity.
The more tightly researchers control the environment to ensure internal validity, the less they can claim external validity, that is, that their results apply to different populations and circumstances. Correspondingly, researchers whose settings closely resemble the real world will be less able to ensure internal validity, as there are many factors that could pollute the research process.
Consider a dog owner who wondered whether his dog, Rover, could learn French words. He started to say "Allons" instead of "Walkies". To his delight, Rover very quickly understood and came running. But the dog may respond to a total situation (after dinner, going to the door, coat on, the call) of which the word actually called is only a small part. A change in the call may not matter much to the dog. The owner could check by varying elements of the routine, for example by using a quite different word at walk time, or by saying "Allons" at an unusual time with none of the other cues present. The results of these and similar tests should indicate whether Rover is specifically responding to the word "Allons", or more likely to an overall situation he is well used to.
Notice that these tests do not tell us anything about a dog's ability to learn French words. They are only concerned with the specific case of responding to one French word. We will see later that extrapolating from the specific to the general is very important in scientific methodology.

Long-term success of a foreteller of the future
The Institute for Psychical Research conducted a study on the performance of well-known fortune-tellers. The most positive results involve Arnold Woodchuck who, at the start of each year, makes a series of ten predictions for the coming year in a national tabloid newspaper.
For example, one year he predicted a political crisis in Europe (the former Yugoslavia?). A spokesman for the Institute was 'optimistic' about future studies on Mr Woodchuck. The apparent observation is that Mr Woodchuck has got more predictions correct than would have been expected by chance.
The Institute's hypothesis would be that Mr Woodchuck has some kind of 'psychic powers'. Can we devise an alternative hypothesis? We are dealing here with probability. If we toss an unbiassed coin we get on average the same number of heads as tails. If we asked someone to predict the outcome of each toss, we would not be terribly surprised if, from a small number of trials, he got 4 out of 5 right.
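As a rough check on that intuition, the probability of such a guessing streak is easy to work out from the binomial distribution. The short calculation below is ours, not the Institute's:

```python
from math import comb

# Probability of guessing at least k of n fair coin tosses correctly by chance.
def prob_at_least(k, n, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(4, 5))   # about 0.19 - not surprising at all
print(prob_at_least(9, 10))  # about 0.011 - would start to raise eyebrows
```

Getting 4 out of 5 right happens almost one time in five by pure guessing, so on its own it is very weak evidence of anything; a much longer run of successes would be far more surprising.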
Is his 'coin' biased, is he cheating, or does he have psychic powers? The most likely explanation is the 'biased coin' one, i.e. that his predictions are not fifty-fifty propositions at all but events very likely to happen anyway. For example, almost invariably every year there is at least one 'political crisis' in Europe and a 'major human disaster' in Africa. Similarly, football managers have a short shelf-life.
Public sector employees such as nurses, railway signalmen or indeed University teachers have for years perceived themselves to be underpaid whilst their masters either cannot or will not respond appropriately. In contrast, the chances of England's winning the Rugby Union World Cup were over-stated by the English press - and this is a prediction that failed.
Again, the results of this investigation would be limited. They would probably show that the 'biased coin' explanation is the most likely. They would not show (a) whether Mr Woodchuck has some kind of psychic power, or (b) whether psychic powers are possible. Notice also that even a large deviation from an expected result can occur by chance in a small sample (for example, four correct calls in five tosses of a fair coin). This is very important in Biology, and is the basis of the use of statistical methods in biological analysis.
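A quick simulation makes the point about sample size. The snippet below (purely illustrative, with arbitrary numbers of repetitions) tosses a fair coin in samples of different sizes and records the largest deviation from the expected proportion of 0.5 seen in each case; small samples routinely stray a long way from expectation purely by chance.

```python
import random

# Illustrative simulation: how far does the observed proportion of heads
# stray from the expected 0.5 in small versus large samples of a fair coin?
random.seed(1)

def max_deviation(sample_size, trials=1000):
    worst = 0.0
    for _ in range(trials):
        heads = sum(random.random() < 0.5 for _ in range(sample_size))
        worst = max(worst, abs(heads / sample_size - 0.5))
    return worst

for n in (5, 20, 1000):
    print(f"n = {n:5d}: largest deviation from 0.5 seen = {max_deviation(n):.2f}")
```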
After reading this section you should be able to discriminate between good and bad experimental design. The design of a suitable experiment to test an hypothesis often requires some ingenuity and a suspicious nature. In modern biology, the experiment may involve very sophisticated equipment.
But there are a number of features common to all good experiments, and often absent from bad ones, whatever the technical details. In summary, these are as follows. Experiments should be capable of discriminating clearly between different hypotheses: it often turns out that two or more hypotheses give indistinguishable results when tested by poorly-designed experiments. Living material is notoriously variable, so experiments must usually be repeated enough times for the results to be analysed statistically.
Similarly, because of biological variability, we must be cautious of generalising our results either from individual creatures to others of the same species, or to other species. For instance, if our hypothesis is about mammals, it is inadequate simply to carry out our experiments on laboratory rats.
Similarly, it is dangerous to extrapolate from healthy students to elite athletes. The experiment must be well controlled. We must eliminate by proper checks the possibility that other factors in the overall test situation produce the effect we are observing, rather than the factor we are interested in. An example: Growth hormone is secreted in response to a number of agents, including the amino acid arginine.
This was shown by injecting volunteers with arginine. As a control, the investigators injected the volunteers with a saline solution. To their surprise, growth hormone was again secreted. The investigators then waved a syringe and needle in front of their volunteers, and found that this alone provoked growth hormone secretion too.
Growth hormone is now known to be secreted in response to stress as well as arginine. At a more technical level, we must be sure that our method of measurement is reproducible from day to day, between operators in the same laboratory, or between laboratories. Whilst we might be confident about a balance or a ruler, can we be as sure about, say, a method for measuring haemoglobin?
Do two groups of students measuring the same samples by the same methods produce the same results? Quality control helps here. Investigators can subconsciously 'fudge' their data if they know what result they want to find. The answer is to do the experiment 'blind', so that the investigators (and the subjects, if humans are being studied) do not know which treatment's effect they are observing. This can make the logistics of doing the experiment more complex: for example, when determining the haemoglobin concentration of male and female class members.
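One common way to organise such a blind measurement, sketched below with invented sample names and made-up haemoglobin values, is to have someone who is not doing the measuring replace the group labels with random codes and keep the key until all the results are recorded.

```python
import random

# Sketch of blind coding for the haemoglobin example: one person assigns
# random codes to the samples and keeps the key; the people doing the
# measurements see only the codes, never the group labels.
random.seed(7)

samples = ["male_01", "male_02", "female_01", "female_02"]

codes = list(range(100, 100 + len(samples)))
random.shuffle(codes)
key = {f"sample_{code}": name for code, name in zip(codes, samples)}  # kept secret

# The measurers record results against the anonymous codes only (fake values, g/L).
blind_results = {coded: round(random.uniform(120, 160), 1) for coded in key}

# After all measurements are in, the key is used to unblind the results.
unblinded = {key[coded]: value for coded, value in blind_results.items()}
print(unblinded)
```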
There is a story about a professor who devised a maze for measuring the intelligence of rats. One day he gave his technicians, who actually made the measurements, three groups of rats. He told them one group had been specially bred for intelligence, one for stupidity and the third was average. The technicians assessed the rats' intelligence and confirmed that the 'bright' group performed the best and the 'stupid' group the worst.
The point is, of course, that the professor had put animals into the three groups at random. They did not differ in intelligence.
Good experiments often, though not always, involve measuring something: a weight, say. When you make measurements, it is important you know both the accuracy and the precision of your measuring system. These two terms are not synonymous: 'accuracy' means the ability of the method to give an unbiassed answer on average, whereas 'precision' is an index of the method's reproducibility.
Ideally your method should be both accurate and precise, but sometimes one is more important than the other. For example, if you were looking for small changes with time in a quantity such as an athlete's haemoglobin concentration, you would need a precise measure of it rather more than an accurate one.
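The difference is easy to see with simulated measurements of a sample whose true value is known. The numbers below are invented: method A is unbiased but scattered (accurate, imprecise), while method B is highly reproducible but reads consistently high (precise, inaccurate).

```python
import random
import statistics

# Invented numbers to illustrate accuracy (bias) versus precision (scatter)
# for two hypothetical ways of measuring the same haemoglobin sample.
random.seed(3)
true_value = 150.0  # g/L, the "real" concentration of the sample

method_a = [random.gauss(150.0, 5.0) for _ in range(10)]  # accurate but imprecise
method_b = [random.gauss(155.0, 0.5) for _ in range(10)]  # precise but biased

for name, readings in (("A", method_a), ("B", method_b)):
    bias = statistics.mean(readings) - true_value   # accuracy: small bias is good
    spread = statistics.stdev(readings)             # precision: small spread is good
    print(f"Method {name}: bias = {bias:+.2f} g/L, spread = {spread:.2f} g/L")
```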
Accuracy and precision together help you to judge the reliability of your data. They also help you to judge to how many significant figures you should quote your results. For example, if you use a balance reading to the nearest gram, you should give the results to the nearest gram and not, say, to the nearest tenth of a gram. Some experiments are very difficult to do because it is not obvious what can be measured.
This is a real problem in animal behaviour: for example, there is no obvious unit or measure for 'emotional state'. It is usually necessary to isolate measurable components of behaviour.
Thus the speed at which a tiger paces up and down a cage can give some indication of the internal state of the animal but can never give a full picture of it. Many of these points are rather abstract, but they should become clearer when you think about the following examples. Example 1: Forty bean plants, growing in pots, were covered one afternoon by individual glass containers and left in the laboratory overnight. Next morning, the inside of the lid of each container was found to be covered in droplets of a fluid which proved to be water.
The water could have come from the plants, the soil, the pots, or the air in the jar. Control experiments should have been set up to test for these possibilities. Example 2: Is your supermarket's 'own brand' of washing powder as good as a nationally-advertised one?
Eric Triton bemoaned the fact that his wife Ariel insisted on washing his clothes with their local supermarket's own brand of powder. He was sure the well-known brand he saw performing miracles on television most evenings would do better. He therefore set out to prove as much. Mr Triton decided to compare the effectiveness of the two products on what his wife called 'difficult' dirt: grass stains on white linen handkerchiefs.
He followed the instructions on the packets exactly, weighing out the same amount of powder and using their washing machine's programme for white linens. Mr Triton was aware of the need for an index of 'cleanliness' and therefore devised a subjective scale, ranging from 10 ('whiter than white') to 0 (the starting level of dirtiness).
Mr Triton's belief was substantially confirmed. He scored the handkerchief cleaned by the national brand an impressive 8, whereas the own-brand powder only managed 7. Triumphantly, he reported the outcome to his wife. Mrs Triton, however, was unimpressed. She pointed out to her husband that there were several flaws in his experiment and convinced him that the outcome was 'not proven'. There is a story about an eminent Professor at Cambridge who gave a paper at a scientific meeting and was asked by a questioner, "What statistical test did you use to verify your results?"
The Professor replied: "I draw a histogram of my results, pin it to the notice board, then walk to the other end of the corridor. If I can still see a difference between the treatments then it's significant."
The relevance of this story lies in what it does not say! If an experiment is designed and executed properly - as we would expect of an eminent scientist - then the results often speak for themselves. For example, this might be true of experiments in which mutants are generated or genes inserted in an organism, giving a clear change of behaviour such as resistance to an antibiotic or expression of a new trait.
Such "all or nothing" effects seldom need to be backed by statistical tests, but they still need good experimental design. However, in many areas of biology we work with variable effects - differences in the growth rates of organisms, quantitative differences in antibiotic resistance or in size or in rates of biochemical reactions, etc.
Then we not only need statistical tests to analyse those differences but we also need good experimental design to ensure that we haven't biased our results in some way, without realising it. Good experimental design is the key to good science. But it's not as easy as it might seem. In many cases good experimental design involves having a clear idea about how we will analyse the results when we get them.
That's why statisticians often tell us to think about the statistical tests we will use before we start an experiment.

Three important steps in good experimental design
1. Define the objectives. Record precisely what question the experiment is intended to answer.
2. Devise a strategy. Record precisely how you can achieve the objective. This includes thinking about the size and structure of the experiment - how many treatments?
3. Set down all the operational details. How will the experiment be performed in practice? In what order will things be done? Should the treatments be randomised or follow a set structure? Can the experiment be done in a day? Will there be time for lunch?

If all this sounds trivial or obvious, then read on. It's not as easy as you think!

Example 1. Experiments that yield no useful results because we did not collect enough data.

Suppose that we want to test the results of a Mendelian genetic cross. We start with two parents of genotype AABB and aabb, where A and a represent the dominant and recessive alleles of one gene, and B and b represent the dominant and recessive alleles of another gene.
We know that all the F1 (first generation) progeny of these parents will have genotype AaBb and that their phenotype will display both dominant alleles.
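Assuming the standard Mendelian expectation of a 9:3:3:1 ratio of phenotypes in the F2 generation of such a cross, the sketch below (with invented counts) illustrates the point about sample size: the same relative departure from 9:3:3:1 is far too small to detect with 16 plants but is unmistakable with 320, which is why an under-sized experiment yields no useful result.

```python
# Sketch of a chi-square goodness-of-fit test against a 9:3:3:1 ratio.
# The observed counts are invented and have identical proportions in both
# data sets; only the total number of plants differs.

def chi_square(observed, ratio=(9, 3, 3, 1)):
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

small = [7, 4, 4, 1]          # 16 plants
large = [140, 80, 80, 20]     # 320 plants, same proportions

# Critical value for chi-square with 3 degrees of freedom at the 5% level is about 7.81.
for counts in (small, large):
    print(sum(counts), "plants -> chi-square =", round(chi_square(counts), 2))
```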