Introduction to Psychology Research Methods
What this Is and What this Isn't
What this is:
These notes were made by a student with a bad memory, in order
to attempt to retain highlights of the lectures.
What this isn't:
They are not a substitute for reading the text and taking a class.
Is Psychology a Science?
Psychology is the science of mind and behavior when it is studied using
the scientific method.
When one studies something by using
the scientific method of observation and test,
then one is doing science.
Science does not provide "The Truth", which is a matter of faith not science.
However, scientific research allows us to say:
"At this point,
the best evidence shows that XXX is what is most likely to be happening."
The goals of Psychology scientific research are to discover:
- What we do - the description.
- Why we do it - the explanation, including testable theories.
Lesson 1: How do you know what you know?
- Psychology research is how psychologists learn
- What people do and
- Why they do it.
- How do you know what you know?
- An expert told you.
But different experts say different things, or they may agree but still be in error.
- Common sense: knowledge based on personal experience.
But it can be affected by mood, and personal experience forms a biased sample.
Also, we might experience a "False Consensus" because we tend to believe that people have the same kinds of experiences
that we have, and tend to have the same opinions, and tend to do what we do.
- Common knowledge: information repeated so many times that people don't bother to think about it.
But common knowledge (such as proverbs and aphorisms) can be self-contradictory.
- If someone makes a claim about how the world operates, or what they can do,
you can tell them "Show me!".
- 1624: Francis Bacon made his prescription for knowing:
- Gather as much data as possible.
- Arrange the data in lists. Throw none of the data away.
That means you write it down instead of trusting your memory.
- Look for patterns in the data.
- 1660s: Isaac Newton developed the scientific method:
- Observe events carefully using procedures that anyone else could copy and use.
- Record your observations.
- Explain your observations. i.e. Develop your theory.
- Test your theory by predicting what is going to happen next.
- Repeat the above sequence of steps, revising your theory as you acquire more data.
Newton's Rules of Reasoning:
- Law of Parsimony: Admit only as many causes of natural things as are
(a) true and (b) sufficient
to explain their appearances.
- To the same natural effects assign the same causes.
- Extrapolation from experiments may be performed.
- "Propositions inferred by general induction from phenomena"
(i.e., theories obtained from experiments)
can be considered roughly true until other phenomena
(note: not theories)
occur.
Exercise:
Pick any statement that you argue with someone about.
Self-check on why you hold your opinion:
- You believe some expert.
- Common sense, based on recollections of personal experience.
- Common knowledge.
- The scientific method.
Lesson 2: Boundaries of science.
- Science is a set of beliefs about knowledge of the natural universe, including:
- there are limits to what we can find out.
- there are methods to acquiring knowledge,
and some of these are more accurate than others.
- The main goals of science:
- to describe events accurately.
- to explain events.
- Science:
- is a way of looking at the world.
- is not technology, not things that have been invented.
- Data and theory.
Either can come first.
- Scientists study measurable events, directly or indirectly:
"If you want to know about something, look at it."
In the scientific or empirical method, your conclusions and beliefs
are based on observation.
- Determinism: Everything is caused by something.
Every scientific theory assumes that events are determined by identifiable factors.
- A theory
is based on evidence (not speculation) collected in ways that safeguard
against bias and error. It is a set of ideas:
- whose purpose is to describe and predict events.
- used to explain events.
- about the relationships of variables.
A good theory is:
- accurate (seems to be working; has predictive validity).
- parsimonious (explains the phenomenon with fewer concepts
and assumptions than rival theories).
- fruitful (generative; stirs up new ideas).
- extensive (explains many different events).
- internally consistent.
- useful.
- unlikely to be wrong (even though it can never be proved true).
- Data
are events that can be measured.
The observed events are facts that do not change with time.
Their interpretation and the theory,
however, may change.
- Basic science emphasizes understanding the relationship among variables,
develops ideas that help us see how these variables are related,
and how the universe operates.
Basic psychological research
is designed to increase knowledge and understanding about behavior.
- Applied science emphasizes finding solutions to everyday problems.
Applied psychological research
is designed to increase our understanding of and solutions to specific and common problems:
- To discover what we do (description).
- To discover why we do it and describe this with testable theories (explanation).
- The two approaches interact. Kurt Lewin: "there is nothing as practical
as a good theory."
- Induction:
data leads to a theory;
collect data
by observing events; then create a theory
to explain the data.
Move from a part to a whole. Data from a particular study are used to develop a general theory.
Advantage: you go where the data
leads you and avoid having preconceptions.
Disadvantage: the resulting theory is limited to the current set of observations; you don't get a general principle.
- Deduction:
theory leads to data;
construct a theory
to predict events; then observe the data.
Advantage: you can develop general principles that apply to many sets of data.
Advantage: having a theory and doing deductions helps us avoid fishing expeditions.
Disadvantage: one can become too strongly attached to a theory.
- Experiments: some terms:
Dependent variable
Independent variable(s).
Exercise:
Pick any statement that you make or hear.
Self-check on whether it is:
- A fact: testable. Often the statement includes numbers or greater-less-than comparisons. Can be
confirmed or disconfirmed by careful observation.
- An opinion: value-laden; expresses how someone feels or what is right and wrong.
Lesson 3: Testable hypotheses.
- Science is a self-correcting way of looking at the world.
- Karl Popper said scientists should emphasize falsifiability.
To improve scientific knowledge:
- Assume that our goal is to make accurate predictions.
- All scientific knowledge is tentative.
We can't prove that a theory is correct, only that it is false.
While it's useful to obtain data that confirms a theory,
you make greater contributions in your field by picking apart a theory.
- We make progress when a theory is falsified because we eliminate false theories.
Each new theory is better because it is more inclusive and explains more evidence,
i.e. the data explained by the old theory plus the new data that eliminated the old theory.
- To be falsifiable, a theory must be testable; concepts must refer to measurable events.
Note that psychoanalysis includes theories that are not testable (such as the Oedipus complex).
A "universal" has no exceptions, so can't be tested.
"Language is universal": but it can't be if 50% of people with autism
don't develop language.
- Pseudo science:
A set of beliefs and evidence that masquerade as science, while violating at least one rule of
the scientific method.
e.g., ESP; its "shyness effect" says ESP is always there even when it cannot be demonstrated;
that's a universal claim, so it can't be measured scientifically and can't be falsified.
e.g., Creationism: claim that God created fossils and geological forms to make Earth
appear old.
This is not testable.
Supernatural powers are not testable.
They do not belong in biological science.
- A theory (set of beliefs) never becomes a fact (event).
- Testable
hypothesis:
a statement about the relationship of two (or more) measurable variables.
IV (Independent Variable): Manipulated by the experimenter;
causes things to happen.
DV (Dependent Variable): Alleged effect;
the response that the experimenter measures.
e.g. Hypothesis: The consumption of alcohol reduces tension.
You must create
operational definitions
of your concepts in terms of what you will measure.
Here, IV could be vodka consumed (0 or 2 oz) or beer consumed (pints or 6-packs).
DV
could be measured blood pressure, galvanic skin response (G.S.R., a quick and easy way of measuring physiological arousal), etc.
- Exercise: How would you code emotions, e.g. of people in a parade? Posture? Gestures? Vocalization? Multidimensional.
Lesson 4: Thinking errors; validity.
- Validity ~ Truthfulness.
- Concurrent Validity.
The score on one measure relates strongly to the score on a different measure.
When two different procedures lead to the same conclusion, we increase our confidence that what we are doing
is based on reality.
e.g. for depression, Beck's Depression Inventory plus clinical interviews by two psychiatrists.
Beware: results could match but they could both be wrong.
- Construct Validity.
You show you are measuring what you intend to measure.
The procedures are closely linked to reality.
- External Validity.
Allows findings to be applied to other populations.
- Internal Validity.
Essential: without this, you have nothing.
You show clear cause and effect by ruling out
confound variables.
- Predictive Validity.
Very important if the measurement helps you predict future behavior.
(Example: job-related standardized tests can be better than interviews at predicting job success.)
- Intervening variables:
- 1st event may be non-observable but produce an observable result.
(Example: dreams are not observable directly, but they change the observed EEG;
REM can be recognized and a subject woken and asked what they dreamed.)
- An abstract concept (such as gravity or memory) can link an
independent variable
with a
dependent variable.
- Three views of knowledge claims, after Deanna Kuhn (2002):
- Absolutist View. The world is what it appears to be:
"This is this."
- Multiplist View. Everyone has an opinion about the world.
All claims are equally valid:
"I don't care what anyone else says; I believe X."
- Evaluativist View. Different claims exist about the way the world is.
Claims can be judged by the evidence used to support them.
- Why we make mistakes on seeing the evidence:
- Routine errors: (a) miss a crucial piece of information or
(b) get overpowered by previous knowledge.
- Misinterpret random events:
(a) our brains automatically organize information.
(b) we desire control.
(Example: basketball free throws are randomly successful,
but people prefer to believe that confidence influences performance,
which biases them to pay attention to streaks.)
- Hindsight bias.
- Regression toward the mean.
- Confirmation bias leads to incomplete recall and records, and to excuses.
- "Necessary" does not equal "sufficient", to prove that a belief is correct:
- It is "necessary" for at least one case where
the behavior occurs, for a belief to be possibly correct.
- Multiple instances are needed to provide enough evidence that this belief is highly likely to be correct.
Lesson 5: Observations.
Types of observations:
- Narrative Recording.
- A continuous written record of a subject's behavior. Field notes.
- Good for:
- Preliminary study; general impression; often it's the first type of data collected.
- Determination of what other form of recording is most appropriate, especially
Frequency Recording,
Duration Recording, or
Time Sampling.
- Suggesting questions about specific behaviors.
- Suggesting ideas on operationalization of variables.
e.g. a narrative that suggests social withdrawal can lead to
an operationalization of "not responding to peers".
- Disadvantages:
- Requires continuous observation: but can't write while observing.
- Presence of observers can affect behavior.
- Doesn't quantify or count.
- If a camera is used (instead of an observer) the record shows only what the
camera points at and only when it is on and recording; tapes are tedious to review.
- Frequency Recording.
- Record how many times a discrete behavior occurs.
- Good for:
- Behavior that has obvious boundaries: a clear start and end.
- Behavior that lasts roughly the same amount of time for each occurrence.
- Disadvantages:
- Requires continuous observation: but can't write while observing; may miss occurrences while
writing. Though it's easier than
narrative recording.
- Difficult if behavior is very frequent or quick (e.g. tics).
- Difficult if behavior does not have a clear start and end
(e.g. thought processes; emotions).
- Difficult if behavior lasts for greatly different amounts of time (such
as 2 minutes to 20 minutes) for each occurrence.
- Difficult if more than about six behaviors must be recorded simultaneously.
- Duration Recording.
- Record the interval of time during which a behavior occurs.
Need a stopwatch and continuous observation.
- A subtype is
Latency Recording,
measuring the delay between a stimulus (e.g. a request)
and the beginning of the response, or the time between getting into bed and falling asleep
(5-10 minutes is normal).
- Good for:
- Behavior that has obvious boundaries: a clear start and end.
- Behavior that lasts for varying amounts of time for different occurrences.
- Disadvantages:
- Requires continuous observation: but can't write while observing; may miss occurrences while
writing.
- You need one or more stop watches.
- Difficult if behavior does not have a clear start and end
(e.g. thought processes; emotions).
- Time Sampling.
- Record the presence or absence of a behavior at serial points in time, e.g. 10-minute intervals.
The observer does not look all the time but only every 10 minutes.
- Good for:
- Busy people! It was invented to avoid the need for continuous observation.
- Behaviors that are not discrete: very flexible.
- Can be used for many different behaviors.
- Disadvantages:
- Less precise than
Frequency Recording and
Duration Recording,
as you get only a rough sample.
- Not good for behaviors that occur at a very low rate, e.g. stealing.
- The subject may learn the schedule and do the "desired"
behavior only at the observation time; to avoid this,
use a variable schedule (see the sketch below).
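A minimal sketch (Python; the interval values are illustrative, not from the lecture) of generating such a variable schedule:

    import random

    def variable_schedule(n_checks, mean_interval=10.0, jitter=5.0):
        """Observation times (in minutes) with randomized intervals."""
        t, times = 0.0, []
        for _ in range(n_checks):
            # Vary each interval so subjects cannot learn the schedule.
            t += random.uniform(mean_interval - jitter, mean_interval + jitter)
            times.append(round(t, 1))
        return times

    print(variable_schedule(6))  # e.g. [7.2, 16.9, 30.0, 44.1, 51.3, 63.8]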
Observable Definitions of Behavior:
- We want to see what people will do, and to find connections.
We need to avoid problems with observations, particularly:
- Subjective observations can be influenced by personal bias.
- General terms are hard to quantify.
- Therefore we establish an Objective Definition of Behavior that:
- Is limited to what we see and hear.
- Is so specific that anyone reading the definition will know
what the behavior looks and sounds like.
- Specifies examples and non-examples.
- Advantages of an Objective Definition include:
- Accuracy is increased because bias is decreased.
- Subject is more likely to be treated fairly.
- Communication is easier and more precise: other people know what you counted and what you didn't count.
Lesson 6: Problems with observations; indirect observation.
- Observe an effect (only an estimate) of a behavior.
Later, you could collect field data for physical traces that might supplement the observations, e.g.
- Frequency of public clocks might indicate cultural attitude toward punctuality.
- Students' use of text books.
or archival data:
- Galton (1872) used historical data to test whether public prayers for royalty correlated with royal longevity.
[Though this is flawed in that he did not look at the results of prayers not being given
for royals that were sick but unannounced as such.]
- Hirt (1982); Schlenker (1995): home-field advantage, especially when the audience is close,
as in basketball.
- Reactivity.
Changes occur in the subjects' behavior when the subjects know
they are being observed. To avoid this:
- Hide: be remote; blend in:
- 1-way mirror.
- Remote observer with binoculars.
- Hidden remote-control camera.
- Blend into crowd in restaurant, bar, train terminal.
- Habituate.
- Habituate people till they stop responding to the stimulus of your presence;
in effect they become bored with you observing them.
- Some people habituate quickly while others remain anxious too long.
- Can depend on the situation.
- Deceive.
Deceive them about what you are looking for.
[e.g. tell them that you want to see how they make decisions.
But you might be looking for how they deal with the frustration caused by a confederate.]
- Indirect Observation.
Measure an effect of the behavior, rather than the actual behavior.
e.g. look at # of speeding tickets, sales records, etc.
- Estimate Size of Reactivity Problem.
Measure other kinds of the behavior, as well as those that interest you.
If those behaviors seem "normal" then maybe there is minimal reactivity.
- Researcher Bias.
- Systematic errors in observations result from the researcher's bias.
- a.k.a. confirmation bias
- The observer notices what confirms her pet theory and ignores what does not.
- e.g. Rosenthal's experiment: gave each participant a rat to train; some participants were
told that their rat was from a strain that learns maze running fast;
the others were told that their rat was from a strain that learns maze running slowly;
researchers with high expectations trained their rats faster. BUT THE RATS WERE ALL FROM THE SAME STRAIN.
The difference was caused by each researcher's expectations and bias.
- Solutions:
- Use automatic recording equipment to eliminate human recording.
Good in cognitive psychology; not usable much in other areas.
- Have more than one person observe the same event.
Estimate the seriousness of the problem through inter-observer (i-o) reliability:
the degree to which independent observers observe the same result.
Divide the smaller number of observations (e.g. one observer sees 20 events)
by the larger (e.g. 22 events) => 20/22 = 90.9%, which is good.
Low i-o reliability (under 70%) can result from a poor definition of target behavior,
or fatigue, or boredom.
High % does not guarantee accuracy: both observers may have made the same mistake.
But if the i-o reliability is low, then the procedure is not reliable and thus cannot be accurate.
Rule-of-thumb: we expect 85% or better.
70% or lower makes us wonder what was wrong with the experiment.
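A minimal sketch (Python; the function name is mine) of the inter-observer reliability arithmetic above:

    def io_reliability(count_a, count_b):
        """Smaller observation count divided by the larger, as a percentage."""
        return 100.0 * min(count_a, count_b) / max(count_a, count_b)

    print(io_reliability(20, 22))  # about 90.9: above the 85% rule of thumb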
- Naturalistic observations.
- Memory is rarely precise; surveys are less accurate than naturalistic observations.
- Research on pace of life. Observers in 36 USA cities recorded four types of data
related to people being in a hurry:
- Time to walk 60 feet.
- How fast bank tellers make change for the same bill.
- How fast postal workers explain the difference between second and third class mail.
- What portion of randomly stopped people are wearing a watch.
The fastest paced were on the East coast (Boston, Buffalo, NYC) [dial "NERVOUS" for time]
and the slowest in the West [dial "POPCORN"] and South (Sacramento, Shreveport LA, San Jose).
- Compare with another pace-of-life study where naturalistic observations were made but
not in clearly similar circumstances:
- Speed of walking: fastest adults in Ireland; slowest adults in Brazil.
- Speed of kids walking through grocery stores.
- Also a study observing helping:
- Confederate with fake injury drops papers.
- Confederate asks passersby to change a dollar.
- Lost letter technique - who puts a found addressed envelope in the mail.
- What donations are made to the United Way.
Lesson 7: Experiments.
- Experiment.
"The manipulation of some environment in order to produce a specific comparison
while other aspects of the situation are held constant." [DD]
- Experimental Method.
"A method based on strict control in experimentation, for making valid inferences
concerning the relationships between one variable and another."
[Ray].
- Participants (preferred term) = subjects [old term and the generic term]
= people who participate in experiments.
- Design.
- Between-groups design.
- Divide a given population into groups. These groups are equal at the start
of the experiment.
- One of the groups receives the treatment.
- Then the researcher must determine if the groups are still equal, or if the treatment made a difference.
- Within-subjects designs.
- All of the participants go through all the steps of the procedure.
- At some points they get the therapy and at some points they do not.
- This saves time because you need fewer subjects,
but many hypotheses cannot be tested this way.
- Does it look like the therapy makes a difference?
- Design. You must have control:
- Manipulate one variable at a time.
- Hold the testing situation constant.
Ideally:
- one thing (the IV) is varied.
- All other variables that might influence behavior are held constant. These are the control variables.
- How not to do experiments - examples:
- Possible control variables are the individual pre-existing differences
of subjects.
- Experimental setting should be consistent and controlled.
Same light, noise, types of people present.
- Experimental procedure should be consistent and controlled,
with the same number of steps taking the same amount of time to both groups.
If some experiment can't be run with a required fixed variable (e.g., at the same time of day),
and you suspect that variable may affect the responses,
add that variable to your design and check for it.
- Individual pre-existing differences of subjects.
- Rats can be bred true by selective breeding. Can't do this with humans.
- Compensate by random assignment:
each subject has an equal chance of being assigned to any group.
e.g. flip a coin or use a random number table to assign to any group.
Behind this lurk the statistics to analyze the result.
How big a group is big enough for randomness to work?
In practice, 20-plus per cell.
If groups are as small as 15 or 16, statisticians become skeptical that there is
enough randomness.
17 or 18 in each cell is OK.
Note that a population may be heterogeneous (as at Cabrillo) or bimodal (as at Harvard
where they may be either smart or rich).
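A minimal sketch (Python's random module standing in for a coin flip or random number table; the names and group sizes are illustrative):

    import random

    subjects = ["S%02d" % i for i in range(1, 41)]    # 40 subjects, 20 per cell
    random.shuffle(subjects)                          # every ordering equally likely
    treatment, control = subjects[:20], subjects[20:]
    print(len(treatment), len(control))               # 20 20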
- Random assignment
increases the internal validity.
If we have equivalent groups then we are more likely to have a valid procedure.
You must have internal validity first. Then you consider
external validity.
- Random sampling (by contrast) can increase external validity.
Taking a random sample from the general population will generalize.
- We do not usually do random sampling. Instead we get an accumulation of evidence,
or else we discover that cross-cultural effects are substantial.
- Subject matching can be used if you can't get enough subjects (e.g. people
with Tourette's). Important subject characteristics are matched across the different groups.
It is not entirely random.
If I think there might be a characteristic that could contaminate my results, I will match.
e.g. 2 people with IQ of 100; randomly assign one of each matching pair to each of the two conditions.
If someone can't be matched, they can't be used.
e.g. excluding from a taste test people who object to cola has no effect on internal validity,
but it does affect external validity.
- Effect of increasing amount of control: internal validity rises,
giving you a greater chance to see cause and effect.
But the more artificial the situation is, the less the external validity.
Therefore you need to run more than one experiment. Replication in different situations can fill in the gaps.
You must have a control group or control condition.
- Portacaval Shunt Studies made in the 1940s were reviewed 20 years later. Although many studies were
"enthusiastic" about the shunt, those tended to be the uncontrolled studies:
Degree of control  | High enthusiasm | Moderate enthusiasm | No enthusiasm | Total
Uncontrolled       | 24              | 7                   | 1             | 32
Poorly controlled  | 10              | 3                   | 2             | 15
Well controlled    | 0               | 1                   | 3             | 4
(51 studies in total.)
Neglecting to include a control condition is a basic flaw in the design.
e.g. 1980 study claimed that having kids play with dolls that are anatomically correct
is a reliable way of diagnosing sexual abuse.
In fact, they neglected to test a control group. If they had,
they would have found that the behavior of non-abused kids (a control group) was the same as the test group.
- Assume the null hypothesis:
"Hypothesis that the differences between two or more population parameters are zero.
Used non-technically to refer to the condition that no difference exists between groups in an experiment."
[Ray.]
In statistics, stick with the
null hypothesis
until the results show it is extremely unlikely that nothing happened.
Probability of getting results by chance alone should be <5% (i.e., p<0.05).
If it's extremely unlikely there is no relationship, then reject the
null hypothesis.
With inferential statistics you are always working backwards.
Similarly, assume that a person before a jury simply had the bad luck to be accused.
Be extremely skeptical of claims of guilt:
"We are all supposed to assume that the person is here because of an innocent mistake."
[DD].
- Statistics allow you to test the significance of a difference, to test how likely it is to occur by chance.
When it's not 100%, you need to calculate probability. e.g.:
Bystander helping response when a girl drops papers as she walks by:
                                             | Alone | With others | Subtracted difference
This is what we measure                      | 70%   | 20%         | 50%
These are the perfect data known only to God | 66%   | 25%         | 41%
Anything is possible but few things are likely.
Test the
null hypothesis:
that the independent variable
had no effect on the dependent variable.
How likely is it that we could get this effect if the IV has no effect on the DV?
If there is less than a 5% chance (by convention) that this could happen at random,
we are certain enough.
e.g., Statistician Fisher (inventor of analysis of variance)
gives the example of testing whether a lady can tell if milk or tea is poured first into a cup: make 3 cups of each; arrange
randomly. As there are C(6,3) = 20 different combinations in which the cups could be presented, there is a
1-in-20 chance that random guesses match the presented cups (see the sketch below).
If she is right, simply repeat the test; if she is correct again, your results are very significant.
- Chi-square test.
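A minimal sketch (standard-library Python) of the tea-tasting arithmetic above:

    from math import comb

    arrangements = comb(6, 3)      # 20 ways to choose which 3 of the 6 cups are milk-first
    p_one_run = 1 / arrangements   # 0.05: chance of a perfect score by pure guessing
    p_two_runs = p_one_run ** 2    # 0.0025 if she is perfect in a repeated test
    print(arrangements, p_one_run, p_two_runs)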
Lesson 8: Writing reports.
- Use
APA format.
Lesson 9: Threats to validity.
- Treatment variance is the amount of change caused by our treatment.
Error variance is the noise.
Try to get Treatment variance >> Error variance.
- Reminder: Internal validity shows the relationship of IV and DV.
Construct validity means that we are observing what we intend to observe.
- Obscuring factors prevent you from seeing a relationship
between
independent variable
and
dependent variable.
With too much noise in the data, you err by missing a relationship (type II error).
Contrast with
confounding factors.
Causes:
- Variability through individual differences.
N too small.
Solution: use a larger sample and/or counterbalance;
or change to use matched-group design;
or change to use within-subjects design.
- DV
is not sensitive enough; the task is too hard or too easy;
results crowd into a floor or ceiling effect.
- Ineffective manipulation: range of values of IV is weak
or too small.
- Variability through the situation,
such as when there are distractions or
when the behavior requires a large effort.
- Measurement error, such as in self-report measures;
includes the unreliability of self reports.
- Confounding factors
[also confound (or confounding) variables]:
an irrelevant variable that varies with the IV.
It was not introduced by the experimenter, but it can bias the research:
a change occurs in the DV, but there is also a change in some other
factor besides the IV,
which might be interfering with the DV.
If any confounding factor is not taken into account, you can err by
claiming a relationship that does not exist (type I error).
Contrast with
obscuring factors.
Confound or cause of problem -> Solution:
- Lack of group equivalence -> Random assignment of subjects to groups.
- Changes within procedures, such as wear and tear on equipment
or tiring of experimenters -> Standardize procedures; use computers as much as possible.
- Experimenter effects or bias -> Double blind.
- Demand characteristics: features of the procedure give the subjects a hint
of the expected behavior -> Remove such cues.
- Attrition: loss of participants during the course of the study can bias the samples
-> Make efforts to avoid attrition; document what attrition occurs.
- Maturation: effects of growing up (for children) and aging are more
significant in longer-term studies -> Control group.
Lesson 10: External validity.
- Threats to external validity.
- Population.
If the group is homogeneous (e.g. middle-class, white, 19-20), it may not generalize.
Volunteers may behave differently from people who are paid.
- Experimental setting: mundane realism (to what extent do the events in the lab
look like events in everyday life) and experimental realism
(do the participants have a real experience, take it seriously, and get involved).
- Operational definition.
Stay within the limits of your definitions when drawing conclusions.
- The most common mistakes when interpreting results:
- Misuse of statistics.
e.g. the wrong test for the design or for the data.
- Ignoring lack of group equivalence.
- Overlooking confounding variables,
where things were not controlled for.
- Believing correlation shows causation.
- Generalizing beyond the bounds of the experiment.
- Ignoring alternative explanations.
Parsimony may help here. Or be aware if there
are two or more equally simple and good explanations.
- Assuming that statistical significance is the same as
practical significance.
Lesson 11: Within-subjects designs.
- Within-subjects design:
Experimental design such that each participant serves in every group and receives every
level (two or more levels) of the
independent variable.
For each different treatment, each individual's performance is measured through the
dependent variable.
- Advantages: each participant serves every group so the groups are all the same at the
start of the experiment; fewer participants are needed than for between-subjects design;
greater sensitivity to changes in treatment effects.
- Problems can arise due to order effects,
fatigue effects (seen as a confounding decrease in performance),
and practice effects (seen as a confounding increase in performance).
A poor choice if any experimental condition has a long-term effect on a subject.
Use reversal design to check for carry over: Condition A -> Condition B -> Condition A.
Does DV reverse so that its value is the same from each Condition A?
- To minimize the confounding effects of such problems, use
intragroup counterbalancing:
- Each condition occurs equally often.
- Each condition precedes and follows all other conditions the same number of times.
- Every possible sequence appears at each presentation of the treatment.
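A minimal sketch (Python; the condition labels and participant count are illustrative) of full counterbalancing by cycling participants through all possible orders:

    from itertools import permutations

    conditions = ["A", "B", "C"]
    orders = list(permutations(conditions))   # 3! = 6 possible sequences
    for participant in range(12):             # each sequence used equally often (twice)
        print(participant, orders[participant % len(orders)])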
Lesson 12: Factorial Designs
- Factorial design:
Two or more IVs are used in such a way that
all possible combinations of the IVs are included in the experiment.
- A 2*2 experiment has 2*2 = 4 combinations.
A 4*4 experiment (which has 2 factors each of 4 levels) has 4*4 = 16 combinations or conditions.
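A minimal sketch (Python; the IVs and levels are made up) of enumerating every combination in a factorial design:

    from itertools import product

    noise = ["quiet", "loud"]       # IV 1: two levels
    caffeine = ["none", "100 mg"]   # IV 2: two levels

    conditions = list(product(noise, caffeine))
    print(len(conditions))  # 2*2 = 4 combinations
    print(conditions)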
- Benefits:
- Better approximation to the real world.
- Can resolve conflicts.
- Most theories today predict interactions,
which a factorial design can investigate.
- Testing two variables in a factorial experiment can be better than testing each variable
in separate experiments because you have more control over setting and experimenter variables.
- Between groups: assign randomly to each combination.
E.g., Godden and Baddeley (1975)
tested learning and remembering in distinctive situations (wearing frog suits).
- Within subjects either:
- Counterbalance: half of the subjects, chosen randomly, experience condition A
then condition B; the other half experience B then A.
- Or randomize: the trials are quick and participants can flip quickly
among conditions, e.g. mental rotation.
- Mixed factorial design:
Reduces variability caused by individual differences.
Includes
within-subjects
and between-subjects components.
- Main effect:
The influence of one independent variable on the dependent variable.
In a graph, the distance between lines indicates the strength
of the main effect of one IV.
The slope of the lines indicates the main effect
of the other IV.
- Interaction effect:
The effect of IV1 on the DV depends on the effect of IV2.
In a graph, the difference of slopes indicates the interaction.
Lesson 13: Surveys and Sampling Procedures
- Difference between survey researcher and experimental researcher:
- Survey researcher bases general conclusions on one study.
- Experimental researcher bases general conclusions on many different studies.
- Surveys:
- Are best where:
(a) direct observation is difficult;
(b) you actually do want opinion rather than behavior;
(c) for exploratory research to see if variables are related;
(d) intervention would be unethical;
(e) you need correlation rather than causality data.
- Need care on how the subjects are selected, how they are asked,
and what they are asked. It's most important to get a representative sample
of the population.
Use random selection to get a
representative sample from a complete population
and avoid bias.
Cluster sampling. e.g. from a list of all counties in the USA,
randomly select a sample of 60 counties; this is the first cluster.
For those counties, get a list of all the high schools that have home rooms for
seniors; from that list randomly select 400; this is the second cluster.
Then give the survey to all students in the randomly selected homerooms.
- Need caution on interpretation because what people say they do can be different
from what they actually do.
- Need further caution because if the participants think too much they have been found
to make worse estimates than if they respond quickly.
- Adjust the survey sample size depending on the homogeneity of the population.
Nationwide, for a representative sample with confidence interval +/-3%,
1500 responses can be sufficient.
Ray (p.331) says that the size depends on how many people
are available to be in the survey and how homogeneous these people are. He gives this formula:
Sample size = (Confidence level * Variation in population / Desired precision)²
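A minimal sketch of Ray's formula (Python; the argument names are mine, and the example values are assumptions, using z = 1.96 for a 95% confidence level):

    def sample_size(confidence_level, variation, precision):
        """Ray's rule: the square of (confidence * variation / precision)."""
        return (confidence_level * variation / precision) ** 2

    print(round(sample_size(1.96, 0.5, 0.03)))  # about 1067 responses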
- Make stratified random samples where appropriate.
Define your strata by subgroups, e.g. gender or race.
Within each group get a random sample.
- Social desirability bias: tendency of someone being surveyed to say whatever makes them look good.
- Non-response bias: distortion of the results that occurs because many
randomly selected people did not do the survey.
They are replaced by others, and yet the sample is no longer random.
- Face-to-face bias: response can depend on reactions to the characteristics of the
person collecting the data or the perceived social stigma of the truthful response.
- Typical non-response rates:
85% for phone surveys (cheap, fast, unreliable).
70% for survey by mail.
20% face-to-face.
- Steps in questionnaire construction:
- Decide what information you need.
- Choose the type of questionnaire to use (phone, pencil and paper, etc).
- Draft the questionnaire with easy, non-threatening questions first.
If there are sensitive questions, save them for the end.
- Decide which questions are closed (true/false and multiple-choice are like this)
and which are open-ended (fill-in-the-blank and essays are like this).
- Quantify where possible. e.g. ask "How often do you do X?"
rather than "Do you do X?"
- For a valid survey, avoid loaded questions.
- Pilot test the first draft.
- Revise it.
- Give it to a few people in the population, to discover ambiguities, omissions, etc.
- Sampling procedures:
- Non-probability:
- Convenience: people available, e.g. on the street.
- Quota: sample a number of individuals of each desired type.
- Probability:
- Cluster: randomly select population units; enlist inhabitants.
- Multi-stage: randomly sample at different levels, e.g. sample to get a set of
colleges, then at each of those, sample students.
- Simple random: use a random number table.
Every member has an equal chance of selection.
- Stratified random: divide by feature (e.g. gender) and then select randomly.
- Systematic: choose every Nth person from a randomized list.
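A minimal sketch (Python; the toy population is made up) contrasting three of the probability procedures above:

    import random

    population = [{"id": i, "gender": random.choice("FM")} for i in range(1000)]

    # Simple random: every member has an equal chance of selection.
    simple = random.sample(population, 100)

    # Stratified random: divide by a feature, then sample within each stratum.
    strata = {"F": [], "M": []}
    for person in population:
        strata[person["gender"]].append(person)
    stratified = [p for s in strata.values() for p in random.sample(s, 50)]

    # Systematic: every Nth person from a randomized list.
    shuffled = random.sample(population, len(population))
    systematic = shuffled[::10]   # every 10th person -> 100 people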
Lesson 14: Bad Statistics
- Two common mistakes:
- Ignoring regression toward the mean.
Extreme values of any variable tend to be followed by values that are less extreme.
- Ignoring base rates.
e.g. more people are killed at traffic lights walking on a green light simply because far more people cross on green.
Also, a smaller hospital shows more fluctuation than a larger one,
so it has more days on which over 60% of births are boys.
- Bad statistics are reported by the press:
- Research suggests that 200K people exhibit stalker's traits.
This mutated to 200K celebrities are being stalked.
- Research suggests that 150K women have anorexia and this can result in death.
This mutated to 150K deaths per year.
But CDC shows 70 deaths/year and only 8.5K/year for ALL causes in females 15-24 years.
- Always ask about statistics in the press:
- Who created the statistic?
Is the source neutral or biased?
- Why was the statistic created?
To motivate people to action?
To persuade the citizens that the problem is under control?
- Was the phenomenon measured properly?
Do you get a clear definition of what was measured?
How loose are the definitions?
e.g. prostate cancer 5-year survival doubled, but
there is a 10% increase in deaths.
The discrepancy is due to earlier detection.
- Compared to what?
What is the average? the trend?
- Beware of large numbers.
- How bad statistics are defended:
- Attack the motive of the person.
- Talk about the Dark Figure,
which is the unknown number of unreported cases.
Avoid speculation on this unmeasurable and (by definition) unknown quantity.
Lesson 15: Ethics
- Risk.
You want a strong enough manipulation to find something but not strong enough to do harm.
Minimal risks: the risks encountered in this procedure are no more than are encountered in
everyday life.
- Informed consent.
Participants must be given information before the research begins.
Participants have the right to refuse.
They must always have an alternative.
Aftercare: make sure they are O.K. before they leave the lab.
- Debriefing.
Subjects must be told the purpose of the research after it's concluded
and deceptive procedures should be explained.
Deception must be revealed gently.
- Control groups: if treatment is clearly beneficial,
it may not be ethical to consign participants to a control group.
- Animals: Person in charge must have experience working with animals.
PI is responsible.
- Fraud: Misleading colleagues.
Lesson 16: Small-n Designs
Small-n:
- Small-n often called "single subject" design.
Each subject is a separate experiment.
- Reliability is established by replication rather than a significance test.
- Used by practitioners: to see if treatment makes a difference;
when subjects are difficult to get;
and to avoid ethical problem of withholding.
- Usually features very carefully controlled procedures and
many measurements of the DV.
- Always begin with a baseline measure of behavior, which should be reliable and stable.
- Advantage: Results are very useful for seeing the effects on one person studied.
Also: very flexible.
- The best predictor of future behavior is past behavior.
It can predict what a person will do.
- Disadvantage: Does not generalize; cannot be added to other studies.
- Do not do "AB" design (baseline plus intervention only), but reversal
"ABA" design, where the intervention is applied and later withdrawn.
- Or do multiple baseline design:
good for ethical concerns when you do not want to remove treatment (Tx).
Large-n:
- Subjects are grouped. Data are presented in group averages.
"This is how most people behave."
- Reliability among conditions is established by
determining if the means are significantly different, testing statistical significance.
- Used by practitioners to see if treatment makes a difference,
when subjects are difficult to get,
and to avoid ethical problem of withholding.
Other forms of replication:
- Systematic replication does show generalizability and reliability.
e.g. Pavlov's conditioning and Darley and Latane's helping behavior experiments.
- Conceptual replication uses different procedures to test the same
hypothesis. e.g. (1) Men on a scary bridge followed up by being more likely to phone
an attractive confederate; (2) men on exercise bikes rate erotic films as
more exciting.
Lesson 16 (part 2): Types of Faulty Argument
- Appeal to ignorance: e.g.,
Anything that has not been proved cannot be proved;
or anything that has not been proved false must be true.
- Begging the question.
Making a statement that has not been proved, and using that as if it were established.
- Ad hominem attack.
- Argument from authority.
"The argument is good because the person making the claim is good."
- Justify with aversive consequences.
"This claim must be true; if it isn't true then all kinds
of evils will happen." e.g.,
"This person must be convicted because otherwise many more
crimes of this sort will take place."
- Non sequitur: present two unrelated ideas as if they were related.
"Our country will win the war because God is great."
- Post hoc ergo propter hoc: After that, therefore because of that.
- False dichotomy.
Argument includes only two extremes and leaves out the
many in-between shades.
- Straw man.
Present an absurd argument, attributed to the opposition and knock it down.
"Psychologists claim that if we banned most TV violence there would be no more violence."
[In fact, they claim there would be a small but measurable reduction.]
Lesson 17: Quasi-experiments
Quasi research:
- Don't control variables as much as true experiments.
Therefore don't know cause and effect.
- Useful where manipulation of the IV would be unethical or impossible.
- Usually avoids random assignment of participants to different situations.
- If there is a control group, it might not be equivalent to the treatment group.
- One-group pretest-posttest, O-X-O.
O=observation.
X=treatment.
Fancy name for "before and after".
Note that this is NOT a
quasi experiment.
Fatally flawed because of:
- History.
Other things happen during the "X" period that could cause change.
- Practice effect due to testing.
- Maturation.
e.g. brain development with children including kids with autism.
- Non-equivalent control group design:
O-X-O and
O---O.
- Includes a control group.
- Get pretest and posttest for both groups.
- Can't assume that these people are equivalent on major factors
because there was no random assignment.
- Make pretest measure of several attributes related to your hypothesis.
- Can use self reports and direct observations.
- Simple interrupted time-series design: O-O-O-X-O-O-O.
- Make repeated measurements before and after an event.
- Use archival data for information on baseline
that occurred before some traumatic event.
- E.g., use this method to study changes prior to and after major changes in legislation.
- Time-series with non-equivalent control group design:
O-O-O-X-O-O-O and
O-O-O---O-O-O.
"X" might indicate a law (like a speed limit) going into effect in one state.
The other group (control group)
could be people of an adjacent state without the law change but with similar
economics, weather, etc.
- Subject variables:
= pre-existing characteristics of the participants (sex, age, introversion, ...):
- You can select and classify participants according to the variable(s).
- Problem:
In personality research, you observe changes in behavior.
You explain in terms of personality traits, e.g. openness.
But the alleged cause is never observed directly.
- You can show correlation of a test score and a behavior.
But the trait cannot be manipulated directly, nor observed,
nor shown to exist.
- You can show correlation of biology and behavior.
But you cannot show a cause, i.e. cannot have an experiment.
- You can manipulate the
subject variable
of Age:
- Longitudinal design. Within subjects.
Minimizes random differences between subjects.
But attrition can be high.
e.g. Kraut et al. (1998) in American Psychologist (53, pp. 1017-31)
found the "internet paradox", where a social technology reduced social involvement and
psychological well-being.
A small correlation (though not a causation) was shown between amount of Internet use
and amount of depression.
- Cross-sectional design.
Avoid attrition.
But the age may not be the only difference.
e.g. political orientation is formed between the ages of 17 and 25.
Glossary: Terms and Jargon
- actor-observer effect (n.)
- The tendency of a person to attribute her own behavior to external reasons but
that of others to internal causes.
- alpha level (n.)
- The probability of making a
Type I error.
Contrast with
beta level and
Type II error.
- aggression (n.)
- Behavior intended to harm or injure a person or object.
- ANOVA (n.)
- Analysis of variance, a technique of inferential statistics.
Use ANOVA to compare two (or more) groups in order to make a decision on whether
the dependent variable
is influenced by
the independent variable(s).
- archival research (n.)
- Research using previously collected records that were gathered for another purpose
than the purpose of the present study.
However:
- Data may be incomplete or padded (sales figures or expense accounts, for example).
- Missing or incomplete records.
- Change with time in what data are collected.
- Possible accidental correlation.
- basic research (n.)
- Improves our understanding of behavior.
- beta level (n.)
- The probability of making a
Type II error.
Contrast with
alpha level and
Type I error.
- between-subjects design (n.)
- An experimental design to establish comparisons between one group and another.
Participants of one group receive a different level of
the independent variable
or treatment than do participants in another group.
- bias (n.)
- Prejudice in the design, performance, analysis, or presentation of a research project.
- Systematic errors can result from researchers' expectations.
See also "confirmation bias"
- Minimize by (a) automatic recording equipment and (b) more than one observer.
- See also Researcher Bias.
- central limit theorem (n.)
- If a number of samples are drawn from a population at random,
then the means of the samples tend to be normally distributed.
- classical conditioning (n.)
- Learning through association: a neutral stimulus (conditioned stimulus) is
paired with a stimulus (unconditioned stimulus) that produces an emotional response.
- confederate (n.)
- An accomplice of an experimenter; a participant in a research experiment
assumes that each confederate is another participant or a bystander.
- conformity (n.)
- Acquiescence to perceived group pressure.
- confound (n.)
- A factor that systematically biases the research
but was not purposely introduced by the experimenter.
Techniques to minimize their influence can be built in, and include:
- Counterbalancing.
- Elimination of the confound, e.g. study a single gender.
- Equate the numbers of participants with different values of the confound,
e.g. an equal number of men and women in each group controls for possible gender differences.
- Match.
- Randomize the selection through use of a random number table.
When all else fails,
repeat the experiment under different circumstances that eliminate or alter suspected confounds.
See more at
confounding factor
and
confound variable.
- construct (n.)
- An abstract concept (such as gravity or memory) that cannot be observed directly;
it is linked to observable events through an operational definition.
- control group (n.)
- Experimental participants that do not receive the experimental treatment
(manipulation of the independent variable).
- covariation principle (n.)
- For something to be the cause of a particular behavior,
it must be present when the behavior occurs and absent when it does not.
[A principle of attribution theory.]
- convenience sample (n.)
- Researchers often generalize from a sample that is convenient rather
than truly random. Especially true of behavioral research.
- correlation coefficient (n.)
- From -1.00 to +1.00;
a statistical indicator of the direction and strength of the relationship
between two variables.
- correlation studies (n.)
- Research designed to examine the nature of the relationship between two or more
naturally occurring variables.
- counterbalancing (n.)
- Every possible sequence occurs in each presentation of the treatment.
All participants receive the tasks in the same counter-balanced order.
Each condition appears equally often, and it precedes and follows
the other conditions an equal number of times.
- debriefing (n.)
- A procedure at the conclusion of a research session, to give participants
complete information about the purpose of the study.
- deception (n.)
- A research technique where participants in a study are given false information.
- deduction (n.)
- Use a well-established theory (or general principles) to predict observations.
- dependent variable (n.)
- The variable whose value depends upon the value of the
independent variable,
which is manipulated by the experimenter.
An experimental variable (a variable that is measured in an experiment)
because it is believed to depend on the manipulated changes in
the independent variable(s).
- demand characteristics (n.)
- The overall effects of the situation on the behavior of a participant.
This can bias the result such as by preventing adequate influence of the
independent variable
on the
dependent variable.
- experimental methods (n.)
- Research that is designed to test cause-and-effect relationships between variables.
- experimental realism (n.)
- The extent to which an experiment involves the participants, feels to them like 'real life',
and lets them forget the observation inherent in the experiment.
- experimenter effects (n.)
- Unintended effects caused by an experimenter on the behavior of a participant.
- external attribution (n.)
- Explaining the cause of an event as due to factors (such as luck, other people,
or the situation) external to the subject.
- external validity (n.)
- The degree to which a study's findings can be generalized to subjects other than
those in the study.
- factorial design (n.)
- A design with more than one
independent variable,
where each
IV is presented at every level of the other
IV.
- false consensus bias (n.)
- The tendency to exaggerate how commonly one's own characteristics and opinions
appear in the general population.
- falsification (n.)
- The philosophical position that the goal of science is to falsify proposed hypotheses.
- foot-in-the-door technique (n.)
- A two-step compliance technique where the influencer makes a small request
and secures compliance, and then later follows this with a larger, less desirable request.
- fundamental attribution error (n.)
- The tendency to explain the behavior of others
through internal attributions rather than external attributions.
- Heuristic (n.)
- A mental shortcut that reduces time taken to reach a decision by replacing complex judgment sequences
by approximations and other simple rules of thumb.
- Hypothesis (n.)
- A specific proposition or expectation derived from a theory
about the nature of things.
- Independent Variable (n.)
- An experimental variable that is manipulated by an investigator.
The variable whose value is defined by the experimenter, and thus is outside of the
control of the participants in the experiment.
Compare with
dependent variable.
- induction (n.)
- Observe events, then create a theory ('a statement of general principles') that explains them
and helps us understand the specific events.
- inferential statistics (n.)
- Statistics used to infer the characteristics of a population from a sample.
Measures how often the result could occur by chance alone if the
null hypothesis
is true.
From a given sample of scores, the statistician infers parameters
related to the set of all possible scores from which that sample was drawn.
E.g., t-test.
E.g., inferential statistics variance = sum-of-squares divided by (N-1).
Compare with descriptive statistics, where variance = sum-of-squares divided by (N).
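A minimal sketch (standard-library Python) of the two variance formulas:

    import statistics

    scores = [4, 8, 6, 5, 7]
    print(statistics.variance(scores))   # inferential: sum-of-squares / (N - 1) = 2.5
    print(statistics.pvariance(scores))  # descriptive: sum-of-squares / N = 2.0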
- internal validity (n.)
- The procedure shows a clear effect of the
independent variable
on the
dependent variable.
- intervening variable (n.)
- Used by researchers to relate
independent variable
and
dependent variable.
- laboratory experiment (n.)
- An experiment in a controlled environment.
- low-ball technique (n.)
- A two-step compliance strategy: the influencer secures agreement with a
request by understating its true cost.
- meta-analysis (n.)
- A statistical technique that combines data from several empirical studies,
in order to estimate more reliably the overall size of the effect of interest.
If the studies use different independent variables, the effect size for each
study is required.
- null hypothesis (n.)
- Assumption that the population mean is equal to the sample mean.
- operational definition (n.)
- A definition that presents a
construct
in terms of observable operations that can be measured and used.
- objective definition of behavior (n.)
- A definition that is limited to what you see and hear;
it's specific;
and includes examples and non-examples.
- subject variable (n.)
- Pre-existing characteristics of the participants (sex, age, introversion, ...).
- quasi-experiment (n.)
- When the experimenters cannot control variables as much as in a true experiment,
we do not know the cause and effect.
This is used where manipulation of the IV
would be unethical or impossible,
and/or lack of control prevents definitive statements about cause and effect,
and/or it is difficult to control for potential
confound variables.
More information is at Lesson 17: Quasi-experiments.
- The experimenter attempts to isolate a causal influence by selection of situations
instead of manipulation of variables.
e.g., we select cases in which 'X' actually does vary,
such as a person's choice of crossing a ravine on a high bridge rather than a
low one (Dutton and Aron, 1974).
- random assignment (n.)
- Placement of research participants into experimental conditions in a manner
which guarantees that all have an equal chance of being exposed to each level
of the independent variable.
- random selection (n.)
- Each member of the population has an equal chance of being chosen for the sample.
- reliability (n.)
- The degree to which an experiment is consistent and reproducible.
Contrast with validity.
- representative sample (n.)
- A subset of a population that closely matches the
overall characteristics of the population with respect to the distribution
of males and females, racial and ethnic groups, and so on.
- self-fulfilling prophecy (n.)
- An expectation that something will happen which, as a result, leads a researcher
to search for confirmation of that expectation.
- significant difference (n.)
- A difference between the results for experimental groups or
conditions that would have occurred by chance less often than a specified criterion.
In psychology, the usual criterion is a probability, p, of less than 5% (p < .05).
- statistics (n.)
-
                       | Population | Sample
    Mean               | μ          | M
    Variance           | σ²         | S²
    Standard deviation | σ          | S
- t-test (n.)
- Statistical test: t = association / (lack of association).
If r² = variance accounted for, then
    t = r * sqrt(df) / sqrt(1 - r²)
where df = degrees of freedom.
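A worked sketch of the formula above (Python; the r and df values are illustrative):

    from math import sqrt

    def t_from_r(r, df):
        """t computed from a correlation r and its degrees of freedom."""
        return r * sqrt(df) / sqrt(1 - r ** 2)

    print(round(t_from_r(0.5, 18), 3))  # 2.449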
- testable hypothesis (n.)
- Refers to measurable events.
- that's-not-all strategy (n.)
- A two-step compliance technique: the influencer makes a large request
then immediately offers an attraction (particularly a discount or a free bonus)
before the initial request can be refused.
- theory (n.)
- A statement of general principles that helps us understand specific events.
- treatment group (n.)
- Experimental participants exposed to nonzero levels of the
independent variable.
- Type I error (n.)
- The error of rejecting the null hypothesis when it is TRUE.
Contrast with
Type II error.
- Type II error (n.)
- The error of failing to reject the null hypothesis when it is FALSE.
Contrast with
Type I error.
- validity (n.)
- The degree to which our ideas and research
are accurate, true, and capable of support.
See also external validity
and internal validity.
Contrast with reliability.
Books
- Mook
- Contents:
1. Testing Our Ideas
Making Friends With Statistics (MFWS): A Look Ahead
2. Theory and Data in Psychology
MFWS: Statistical Significance
3. Data
MFWS: Frequency Distributions
4. Observation and Description I
MFWS: Descriptive Statistics
5. Observation and Description II: Some Technical Problems
MFWS: Scatterplots and Correlations
6. Experiments with One Independent Variable
MFWS: Analysis of Variance and the t Statistic
7. Experimental Control I: Obscuring Factors
MFWS: More about Significance Testing
8. Experimental Control II: Confound Variables
MFWS: Statistical Control
9. Experiments with More than One Independent Variable
MFWS: Factorial Analysis of Variance and More about Interactions
10. Single-Subject and "Small-N" Experiments
MFWS: Signal-Detection Theory
11. Quasi Experiment.
MFWS: The Chi-Square Test
12. The Reliability and Generality of Findings
MFWS: Meta-analysis
13. Ethical Considerations in Research
14. Research Psychology, Pop Psychology, and Intuitive Psychology
MFWS: When is a Problem a Statistical Problem?
Appendix A: Random Numbers and How to Use Them
Appendix B: Statistical Tables
Appendix C: How to Report Research
- Ray
- Methods: Towards a Science of Behavior and Experience (1997)
by William J. Ray.