Results and Discussion
The statistical tests performed on the data from the two versions of the
questionnaire yielded six significant results, on questions/statements one, two,
three, five, eleven, and fourteen. The p-values for the t-tests of these
questions/statements all fell below the conventional threshold of 0.05 needed in
order to be considered significant (see Table 1). This means that, if only
chance variation were at work, differences this large would be expected less
than five percent of the time; the differences are therefore most likely due to
differences in the questions/statements themselves.
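
As a minimal sketch of this decision rule, the p-values reported in Table 1 can be compared against the 0.05 threshold. The dictionary layout and the function name are illustrative choices, not part of the original analysis; question fifteen is left out because the table lists it as N/A.

```python
# p-values copied from Table 1 (question 15 is listed there as N/A).
P_VALUES = {
    "1": 0.04, "2": 0.003, "3": 0.0001, "4": 0.84, "5": 0.02,
    "6 to 7": 0.83, "7 to 6": 0.73, "8": 0.23, "9": 0.10, "10": 0.67,
    "11": 0.01, "12": 0.42, "13": 0.11, "14": 0.01,
}

ALPHA = 0.05  # conventional significance threshold used in the text


def significant_questions(p_values, alpha=ALPHA):
    """Return the question labels whose p-value falls below alpha."""
    return [label for label, p in p_values.items() if p < alpha]


print(significant_questions(P_VALUES))
```

The labels returned are the same six questions/statements named in the paragraph above.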
In regard to question one, the difference between the two versions is the scale
on which subjects were asked to respond. In one case the scale was zero to
twenty, whereas in the other the scale was negative ten to positive ten (see
Appendices A & B). We predicted that the responses on the zero to twenty scale
would be more moderate than the responses on the negative ten to ten scale,
because negative ten sounds more extreme in its negative connotations than zero
does. This idea was supported: the mean for questionnaire 1 (with the zero to
twenty scale) was 11.45, while the mean for questionnaire 2 was 8.64 (the value
was originally -1.36 but was adjusted by +10 so that it could be compared on
the same scale as questionnaire 1; see Table 1 for more data). A t-test yielded
significant results (t (123) = 2.12, p = 0.04). This indicates that people
tended to answer that Native Americans were treated worse on questionnaire 2,
with the negative ten to ten scale.
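
The +10 adjustment can be illustrated with a short sketch. Adding a constant to every response moves the mean by that constant and leaves the standard deviation unchanged, which is why the shifted data can be compared directly with the zero to twenty data. The response list below is hypothetical and is not the study's data; only the -1.36 mean comes from the text.

```python
from statistics import mean, stdev

SHIFT = 10  # maps the -10..+10 scale onto the 0..20 scale


def rescale(responses, shift=SHIFT):
    """Add a constant to every response so both forms share one scale."""
    return [r + shift for r in responses]


# HYPOTHETICAL responses on the -10..+10 scale (not the study's data).
raw = [-8, -5, -3, 0, 2, 6]
shifted = rescale(raw)

print(mean(raw), mean(shifted))    # the mean moves up by exactly 10
print(stdev(raw), stdev(shifted))  # the spread is unchanged
print(round(-1.36 + SHIFT, 2))     # reported mean after adjustment: 8.64
```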
Statement number two was identical on both questionnaires except that one was
the negative form of the other. Therefore all the data obtained from
questionnaire 1 (see Appendix A) were inverted; that is to say, all “1’s” were
converted to “5’s”, all “2’s” to “4’s”, all “4’s” to “2’s”, and all “5’s” to
“1’s”, while “3’s” were not altered. This was done so that the two data sets
could be compared statistically. We were not sure what effect this wording
difference would have, but we predicted the significant difference that made
itself apparent (t (125) = -2.67, p = 0.003; see Table 1). It turned out that
on the positive form of the statement (in questionnaire 2; see Appendix B),
students tended to say that teachers should be required to report such
information (the mean was 3.48), while on the negative form students tended to
say this less (the inverted mean was 2.81; the original mean was 3.19, and
6.00 - 3.19 = 2.81). This might be because in the positive version the students
were not considering the teacher’s need to respect statements made in
confidence, but rather the danger the student posed to himself and others.
Conversely, in the negative version, the right of the teacher “not” to have to
do something was most likely given more importance. If someone were given a
choice between being required to do something and not being allowed to do the
opposite of that something, they would probably tend to choose being required
to do something. A requirement does not feel like a loss of freedom in the way
a prohibition does; they are not prevented from doing something. A requirement
would therefore seem less negative than a prohibition.
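
The inversion described for statement two is ordinary reverse-scoring of a five-point item: each response is subtracted from 6 (the scale minimum plus the scale maximum), so the inverted mean equals 6 minus the original mean. A small sketch, with a function name chosen here for illustration:

```python
SCALE_MIN, SCALE_MAX = 1, 5  # five-point scale described in the text


def reverse_score(response, lo=SCALE_MIN, hi=SCALE_MAX):
    """Flip a response so that 1 <-> 5, 2 <-> 4, and 3 stays 3."""
    if not lo <= response <= hi:
        raise ValueError(f"response {response} is outside {lo}..{hi}")
    return lo + hi - response


print([reverse_score(r) for r in [1, 2, 3, 4, 5]])  # [5, 4, 3, 2, 1]
print(round(reverse_score(3.19), 2))                # 2.81
```

Because the transformation is linear, it can be applied to the original mean of 3.19 directly, which gives the inverted mean of 2.81 reported in Table 1.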
The wording of statement three on questionnaire 2 (see Appendix B) was meant to
take advantage of the emotions of the students. Questionnaire 1 states that the
death penalty “involves cardiac arrest and inhibition of diaphragm function,”
while questionnaire 2 states that lethal injection involves the administration
of “doses of poisons causing the heart to stop beating and violently
suffocating the prisoner by preventing his/her diaphragm from functioning.”
While both statements relay the same general information (that lethal injection
inhibits heart and diaphragm function), the statement on questionnaire 2 also
appeals to the emotions of the student and to any empathy felt for death row
inmates by using words such as “poisons” and “violently suffocating,” both of
which carry more negative connotations than medical jargon such as “cardiac
arrest” and other words such as “inhibition.” A t-test yielded highly
significant results (t (125) = 4.61, p = 0.0001), meaning that differences this
large between the two data sets (i.e. questionnaires 1 and 2) would arise from
chance variation in the subject responses only about 0.01% of the time.
From the beginning, statement five was one we felt would yield significant
results. Rowland Hall - St. Mark’s is a very liberal school, and George W. Bush
is not particularly well-liked by the students. Thus, we expected the statement
regarding “President Bush’s” handling of events following September 11, 2001 to
be rated significantly lower than the statement regarding “the federal
government’s” handling of those events. These expected results were produced.
The mean on questionnaire 1, concerning “the federal government,” was 3.28,
while on questionnaire 2 it was only 2.80 (see Table 1), and the t-test
produced significant results (t (126) = 2.40, p = 0.02). Thus, even though
President Bush was more or less representative of “the federal government” in
this case, his handling of the situation was rated lower.
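
For readers who want to see where a t-value such as the one for statement five comes from, the sketch below recomputes it from the means and standard deviations in Table 1. It rests on two assumptions that the text does not state: that an equal-variance (pooled) independent-samples t-test was used, and that each form had 64 respondents, a split inferred from the reported 126 degrees of freedom.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent-samples t statistic (equal variances assumed) and its df."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Means and SDs for statement five, taken from Table 1.
# ASSUMPTION: 64 respondents per form, inferred from df = 126; the text
# does not report the group sizes.
t, df = pooled_t(3.28, 1.11, 64, 2.80, 1.15, 64)
print(f"t({df}) = {t:.2f}")  # t(126) = 2.40
```

Under these assumptions the sketch reproduces the reported t (126) = 2.40.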
The major difference between the two versions of statement eleven was statement
length. In questionnaire 1 (see Appendix A) the statement was quite short,
while in questionnaire 2 the statement relayed the same basic information but
added the phrase “even though this money helps spur on the economy of not only
the United States, but the rest of the world as well.” The intent of this
longer statement was to obtain responses that were higher in numerical value;
unfortunately, these expected results were not found. The mean for
questionnaire 1 was 4.05, while the mean for questionnaire 2 was only 3.55. A
t-test yielded significant results, but in the opposite direction from what we
predicted (t (126) = 2.63, p = 0.01). Thus, although our expectation was
incorrect, the difference between the responses was meaningful nonetheless. The
reason is most likely the connotation that the extra phrase in the second
questionnaire produces. By saying that this money “spur(s) on the economy of
. . . the rest of the world,” the statement casts the United States as an
entity in a more positive light. Thus, instead of thinking of how individual
Americans spend too much on pointless items, the reader is led to think that
this money helps people in poorer countries by stimulating their economies.
That is to say, this more positive image somewhat displaces the negative image
produced by the beginning of the statement, “Americans spend too much money on
useless trivial items, luxuries, which many of the people throughout the world
do without.”
By definition, “anthropology” is the study of “the origins of humans.” The only
difference between the two versions of statement fourteen was whether it asked
how interesting “anthropology” was or how interesting “the origins of humans”
was (see Appendices A & B). We predicted that “anthropology” would be rated as
significantly less “interest(ing),” as people either did not know what it was
or associated it with negative connotations (such as being a nerd). The means
as well as the t-test supported this hypothesis directly: the mean on
questionnaire 1 was only 3.11, while it was 3.67 on questionnaire 2
(t (125) = -2.62, p = 0.01).
In evaluating questions four, six, seven, eight, nine, ten, twelve, and
thirteen of our questionnaires, we found that the differences in the responses
given on the two questionnaires were not significant. Each of these questions
had a p-value greater than 0.05 (see Table 1).
The difference between our two questionnaires regarding question four involves
the length of the question and the information given within it (see Appendices
A and B). The p-value equaled 0.84, which means that a difference at least this
large would be expected about eighty-four percent of the time from chance
variation alone. The mean responses for the two versions of the question were
almost identical (see Table 1).
The differences between the two questionnaires regarding questions/statements
six and seven involve question order (see Appendices A and B). Question six on
questionnaire 1 was the same as question seven on questionnaire 2, and
statement seven on questionnaire 1 was the same as statement six on
questionnaire 2. Thus two t-tests had to be completed, one comparing the
statement “My school, __________________, is a very tough/challenging school
that prepares me well for the rigors that lay ahead in college” and the other
comparing the question “How would you describe your experiences at your
school?” We expected that this order change would influence the students’ views
of the school at that moment. By asking how “challenging” the school was first,
we expected the students to rate their overall “experiences” at the school
lower, because the academic rather than the social, athletic, etc. side of the
school would be foremost in their minds. By asking about the students’
“experiences” first, we expected that students would not be as inclined to
consider academic factors. Unfortunately, this was not the case: the means for
both questions/statements were almost exactly the same, and both t-tests
yielded non-significant results (for question six on questionnaire 1 and
question seven on questionnaire 2: t (126) = -0.22, p = 0.83; for statement
seven on questionnaire 1 and statement six on questionnaire 2: t (126) = -0.35,
p = 0.73).
In question eight, the difference between the two versions was a reference to
an event versus a reference to a time: one referred to the Columbine shootings
and the other to shootings in the year 1999 (see Appendices A and B). The
p-value equaled 0.23, which means that a difference at least this large would
be expected about twenty-three percent of the time from chance variation alone.
Whichever reference was used, the means on the two questionnaires were close
(see Table 1).
Question nine involved a difference in answer possibilities. In questionnaire 1
(Appendix A), readers were given answer options such as “1, 2, 3”, whereas in
questionnaire 2 (Appendix B) they were given answer options such as “disagree,
no opinion, agree”. The differences between the two questionnaires produced a
p-value of 0.10, which means that a difference at least this large would be
expected about ten percent of the time from chance variation alone. Because
this question offered only three answer options, the standard deviations for
these results are relatively small (see Table 1).
In regard to question ten, the difference between the two statements was that
one referred to a specific drug while the other was ambiguous, saying only
“certain drugs.” Many people have very strong convictions for or against
“cannabis” in particular, while they may have less intense feelings about other
drugs. We tried to tap into these feelings by referring to the legalization of
“cannabis” on one questionnaire and of “certain drugs” on the other (see
Appendices A and B). Unfortunately, the p-value for this question was 0.67,
meaning that a difference at least this large would be expected about
sixty-seven percent of the time from chance variation alone (see Table 1), so
the difference could not be considered significant.
Regarding question twelve, the difference between the two versions was that one
referred to President Bush, whereas the other was much more general,
substituting America for the President (see Appendices A and B). The question
was framed poorly, as the two versions essentially asked the same question. The
p-value was 0.42, meaning that a difference at least this large would be
expected about forty-two percent of the time from chance variation alone. The
means for the two versions were similar, showing almost no difference in
answers (see Table 1).
Question thirteen involved relative word strength. Both versions asked about
the legality of cloning. The version on questionnaire 1 (see Appendix A) used
the phrase “should be forbidden,” whereas the version on questionnaire 2 (see
Appendix B) used the phrase “should not be allowed.” We predicted that people
would be less likely to agree with the statement “should be forbidden” because
of the strength and connotations of the word “forbidden.” “Forbidden” is
usually seen as more negative than “not be allowed,” though the two have the
same meaning. The means appeared to support this hypothesis, as the mean for
questionnaire 1 was 3.19 and for questionnaire 2 it was only 2.80, a difference
of 0.39. But as promising as this difference looks, the t-test performed on
these data sets produced a p-value of 0.11, meaning the difference was not
statistically significant.
Pertaining to question fifteen, the difference between the two questionnaires
had to do with answer order. The answers for the question were identical, but
the order in which they were given was reversed. For example, one questionnaire
listed “Winston Churchill” and then “Dwight D. Eisenhower,” and the other
listed them in the opposite order (see Appendices A and B). Unfortunately, the
data we obtained from the questionnaires did not show any significant
difference: the p-value we obtained from the chi-squared test was greater than
0.05.
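
The counts behind the chi-squared test are not reported, so the sketch below only illustrates the kind of calculation involved. It assumes a question with two answer options and uses hypothetical counts, and it computes Pearson's statistic without a continuity correction; the function name is likewise an illustrative choice.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value (df = 1) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # With one degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# HYPOTHETICAL counts, not the study's data: rows are the two
# questionnaires, columns are how many respondents chose each answer.
counts = [[14, 50],
          [11, 53]]
stat, p = chi_squared_2x2(counts)
print(f"chi-squared = {stat:.2f}, p = {p:.2f}")
```

With counts this similar across the two questionnaires, the p-value is well above 0.05, which is the pattern the paragraph above describes.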
In conclusion, although the majority of our results were not significant, we
were quite pleased to obtain six data sets that were in fact meaningful. This
supports our hypothesis that wording effects are real, at least for the types
of wording effects for which we obtained significant results. Some of our
results may not have come out as expected for a number of reasons. Rowland Hall
- St. Mark's students tend not to be fond of psychology surveys. Even though
the survey was disguised as a history survey, some did find out what it really
was. Thus, some may not have filled out the survey to the best of their
abilities or may have filled in answers at random. Another reason may have been
the nature of the questions we wrote. The wording effect may not have been
strong enough, or the difference between the questions may have interfered with
the wording effect being tested (i.e. it brought in some other influence not
foreseen or wanted, though in some cases this turned out to help the
significance of our results). To remedy these problems, a larger survey would
need to be constructed, and the questions/statements would need to be studied
in great depth to reveal any possible unintended consequences. As a whole,
notwithstanding the non-significant results, we believe we can call our
experiment a general success. We set out to confirm, at least to some degree,
the validity of wording effects and did so effectively.
Table 1. Means, Standard Deviations, and t-test values for the first 14
questions and/or statements; for questions and/or statements from Form A,
refer to Appendix A; for questions and/or statements from Form B, refer to
Appendix B

| Question # | Form A Mean (SD) | Form B Mean (SD) | t-value (df) | p-value |
| 1*         | 11.45 (7.16)     | 8.64 (7.65)      | +2.12 (123)  | 0.04    |
| 2*         | 2.81 (1.19)      | 3.48 (1.35)      | -2.67 (125)  | 0.003   |
| 3*         | 3.72 (1.23)      | 2.63 (1.43)      | +4.61 (125)  | 0.0001  |
| 4          | 4.31 (1.08)      | 4.27 (1.15)      | +0.20 (126)  | 0.84    |
| 5*         | 3.28 (1.11)      | 2.80 (1.15)      | +2.40 (126)  | 0.02    |
| 6 to 7     | 3.78 (0.63)      | 4.50 (0.74)      | -0.22 (125)  | 0.83    |
| 7 to 6     | 4.45 (0.87)      | 3.81 (0.90)      | -0.35 (125)  | 0.73    |
| 8          | 2.91 (1.02)      | 2.67 (1.18)      | +1.22 (123)  | 0.23    |
| 9          | 2.16 (0.51)      | 2.33 (0.65)      | -1.63 (124)  | 0.10    |
| 10         | 3.22 (1.42)      | 3.33 (1.45)      | -0.43 (126)  | 0.67    |
| 11*        | 4.05 (1.10)      | 3.55 (1.05)      | +2.63 (126)  | 0.01    |
| 12         | 3.88 (0.90)      | 4.02 (1.05)      | -0.81 (126)  | 0.42    |
| 13         | 3.19 (1.37)      | 2.80 (1.39)      | +1.60 (126)  | 0.11    |
| 14*        | 3.11 (1.18)      | 3.67 (1.23)      | -2.62 (125)  | 0.01    |
| 15         | 1.78 (0.42)      | 1.83 (0.39)      | N/A          | N/A     |

*Significant Results