Another Analysis of the "Risk-Taking" Experiment

By Marc Leotard

Part of the continuing dialog generated by Simon Szykman's "risk-taking" experiment whose results were initially presented in the Spring 1999 Retreat issue of the zine. See also Simon's followup article in the Fall 1999 Movement issue, as well as Brahm Dorst's related article which appears in this issue.
Simon acknowledged the limitations of his analysis and offered to make the data of the experiment available to others for better analysis. Marc Leotard took Simon up on his offer. Here are the results of Marc's analysis of the data, which he notes include an attempt to be "as 'user-friendly' as possible for people unaccustomed with statistical reasoning."

The first thing to look at, when confronted with numerical data, is the overall pattern of relationships. These, as Simon surmised, can be obtained through correlations when the data are "measures" (ranked numbers), or through various association measures when the data are "categorical" (e.g. qualitative features like nationality or colour). Here, I simply computed correlations. Strictly speaking, I should have used "rank correlations", because our data are ordinal and do not measure any objective quantity: "4" is more than "3", but we cannot be sure that the step from "3" to "4" means as much as the step from "2" to "3". Yet, since all results turned out negative (no significant association, as we will see), I didn't bother to refine the analysis.
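To make the distinction between ordinary and rank correlations concrete, here is a minimal Python sketch of both coefficients. It is not the author's code, and the answer lists are invented for illustration. The rank version simply replaces each answer by its rank (tied answers share the average rank) before computing an ordinary correlation, so only the ordering of "1" to "5" matters, not the spacing between them. (Python 3.12's `statistics.correlation` also accepts `method="ranked"`.)

```python
import math


def pearson(x, y):
    """Ordinary (Pearson) correlation: treats 1-5 answers as equally spaced numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Replace each value by its rank (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Rank (Spearman) correlation: only the ordering of the answers matters."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented 1-5 answers, for illustration only (not the survey data).
threat_france = [5, 4, 4, 3, 2, 1, 2, 3]
threat_turkey = [1, 2, 1, 3, 4, 5, 4, 2]
print(f"Pearson:  {pearson(threat_france, threat_turkey):.3f}")
print(f"Spearman: {spearman(threat_france, threat_turkey):.3f}")
```

On a monotone but unevenly spaced relation the two disagree: the rank correlation is exactly 1, while the ordinary one is pulled down by the uneven spacing.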

Now, most of the correlations are not significant. This means that the observed values could easily arise from purely random data, so we cannot draw any conclusion in favour of an association. Of course, we have to take into account the subdivision of answers into two groups, namely Map-1 (Italian units poised against France) and Map-2 (against Turkey). In each group, there is a predictably negative correlation between the perceived threats to France and to Turkey (-0.722 and -0.701; the difference is not significant). Also in both groups, the intention (of the French player) to defend against Italy is moderately correlated with the perceived threat to France (0.463 and 0.435) and negatively correlated with the perceived threat to Turkey (-0.431 and -0.405).
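A sketch of the two tests behind statements like these, assuming the usual textbook choices (the article does not say which tests were used): a t statistic for whether a single correlation differs from zero, and Fisher's z-transform for whether the two groups' correlations differ from each other. The group sizes in the usage lines are hypothetical, since the article does not report how many respondents saw each map. For groups of a few hundred, the t distribution is close enough to the normal that `normal_two_sided_p` gives a reasonable approximate p-value for the t statistic too.

```python
import math


def normal_two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def correlation_t(r, n):
    """t statistic for H0: true correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def compare_correlations(r1, n1, r2, n2):
    """Fisher z test for equality of two correlations from independent groups.

    Returns (z, two-sided p-value)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    return z, normal_two_sided_p(z)


n1 = n2 = 220  # hypothetical group sizes, not reported in the article
print("t for r = 0.463:", correlation_t(0.463, n1))
print("France/Turkey r, Map 1 vs Map 2:", compare_correlations(-0.722, n1, -0.701, n2))
```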

Perceived expertise and playstyle are not correlated with any other response, nor significantly with each other (0.153). Responses on these two items are clearly nonlinear: people mostly answered levels 2 or 4 for playstyle, and Simon showed kinks in the behaviour of "experts", probably because a player's real expertise level biases his perception (or truthful reporting) of it. So I also ran a test for independence between categorical (i.e., unordered) response items, which was negative as well: the category chosen for expertise is not significantly linked to the one chosen for playstyle. Again, this means that the survey result could well have arisen from purely random data (the odds are better than 1 in 10); yet this does not prove the effect to be non-existent: the data do show a small tendency for those who view themselves as experts to also think of themselves as risk-takers. We simply cannot rule out that this tendency is just a sampling effect.
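The article does not name its independence test; the standard choice for two categorical items is Pearson's chi-square test on a contingency table, sketched below in plain Python as an assumption rather than the author's actual procedure. The chi-square tail probability is computed from a series expansion of the regularised incomplete gamma function, so no third-party library is needed. The counts are invented, not Simon's data.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable with `dof` degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    # Series for the regularised lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total) and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_independence(table):
    """Pearson chi-square test of independence; returns (statistic, dof, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two items were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


table = [  # rows: Novice, Middle, Expert; columns: cautious, balanced, risk-taking (invented)
    [30, 25, 20],
    [28, 30, 25],
    [15, 18, 22],
]
stat, dof, p = chi2_independence(table)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p:.3f}")
```

A p-value above 0.10 here would correspond to the article's "more than 1 in 10 odds".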

The next step in data analysis, and the main reason for the survey in the first place, is to ask: "Is there a difference in perception according to which map was shown?" To answer this, I tested whether the two groups exhibited significant differences in the statistical distribution of their answers. To keep it simple, two statistics are classically computed and compared: the average response and the variance (i.e. dispersion) of responses in each group. [Note that in a more refined analysis, one can also compare the shapes of the distributions, or use responses as categories rather than ordered measures. The first technique requires more data than we have (and also some sort of scientific model, which we obviously lack, as this is just an exploratory analysis). The second is less precise than an ordered analysis in this case.]

For all three responses (perceived threat to France, perceived threat to Turkey, and intention to defend), neither measure differs significantly from one group to the other (p-values for variances: 0.45, 0.44 and 0.29; one-sided p-values for means: 0.13, 0.27 and 0.41). This means that two random groups shown the same map would have a 13% to 45% chance of ending up with differences at least as large as those observed in the survey. In other words, we have no good reason to doubt that the map shown has no effect on the factors a subject uses to analyse the proposed game situation.
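The interpretation above (how often would two groups shown the same map differ this much by chance?) is exactly what a permutation test computes directly. The sketch below is an illustrative substitute, not necessarily the test the author ran: it repeatedly shuffles the pooled answers into two groups of the original sizes and counts how often the shuffled difference is at least as large as the observed one. The answer lists and the helper names are invented for illustration.

```python
import random
import statistics


def permutation_pvalue(group1, group2, stat, n_perm=5000, seed=0):
    """One-sided p-value: share of random relabellings of the pooled answers
    whose statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = stat(group1, group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend both groups saw the same map
        if stat(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    # The +1 counts the observed labelling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


def mean_diff(a, b):
    """Mean of group a minus mean of group b (one-sided comparison)."""
    return statistics.fmean(a) - statistics.fmean(b)


def variance_gap(a, b):
    """Absolute difference between the two sample variances (two-sided)."""
    return abs(statistics.variance(a) - statistics.variance(b))


# Invented 1-5 "threat to France" answers, for illustration only.
map1 = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
map2 = [3, 3, 2, 4, 3, 3, 2, 3, 4, 2]
print("means:", permutation_pvalue(map1, map2, mean_diff))
print("variances:", permutation_pvalue(map1, map2, variance_gap))
```

The shuffle makes the "no map effect" assumption explicit, at the cost of some computation; the classical t and F tests reach similar answers from distributional formulas.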

Is that it? Of course, I am being very cautious in the words I use to explain the statistical conclusions. They only pertain to this experiment, with this game situation, this way of placing units and this way of measuring perceptions. One drawback of statistics is that they rarely prove anything; they just put weights on the chance that such-and-such a theory may be true. What is said here is that the theory of 'no map effect' has a reasonable claim to being true. What is not calculated is the chance of the alternative theory, probably because no clear alternative has been drawn up, other than the negation of the first.

Now, I also ran a few tests on the effect of the items "Expertise" and "Playstyle". A real influence can sometimes be hidden in the data by an unaccounted-for factor. For example, what if "aggressive" players tended to view Map 1 as more threatening than Map 2, while "defensive" players viewed it the other way round? (Of course, my example is contrived, but you see the point: mixing them all in one group blurs the image and makes the effect statistically insignificant.)
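The masking effect described above can be shown with a few invented numbers (purely hypothetical, chosen to make the point): within each playstyle the map matters a lot, but the two effects cancel exactly once the groups are pooled.

```python
from statistics import fmean

# Hypothetical threat ratings (1-5), keyed by (playstyle, map shown).
ratings = {
    ("aggressive", "map1"): [5, 4, 5, 4],
    ("aggressive", "map2"): [2, 3, 2, 3],
    ("defensive", "map1"): [2, 3, 2, 3],
    ("defensive", "map2"): [5, 4, 5, 4],
}


def mean_gap(map1, map2):
    """Mean rating on Map 1 minus mean rating on Map 2."""
    return fmean(map1) - fmean(map2)


for style in ("aggressive", "defensive"):
    print(style, mean_gap(ratings[(style, "map1")], ratings[(style, "map2")]))
# aggressive 2.0
# defensive -2.0

pooled1 = ratings[("aggressive", "map1")] + ratings[("defensive", "map1")]
pooled2 = ratings[("aggressive", "map2")] + ratings[("defensive", "map2")]
print("pooled", mean_gap(pooled1, pooled2))  # pooled 0.0
```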

To begin with, do the data allow us to predict a player's response from which map he was shown and his self-proclaimed expertise and playstyle levels? Running a simple (simplistic, to be more correct) linear regression on the data shows the linear relation to be utterly nonexistent and nonpredictive (R-squared 0.01, i.e. a multiple correlation coefficient of 0.10). Only because expertise level was the one predictor of the three that came out even marginally significant (p = 0.047) did I investigate its impact on responses further.
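A bare-bones sketch of such a regression, assuming the model rating = b0 + b1 * (shown Map 1) + b2 * expertise + b3 * playstyle; the article does not spell out its exact specification. It solves the least-squares normal equations by Gauss-Jordan elimination and reports R-squared, whose square root is the multiple correlation coefficient. The rows are invented.

```python
import math


def ols_fit(X, y):
    """Least-squares coefficients for y ~ X, where X already includes an intercept column."""
    k = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted linear model."""
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


survey = [  # (shown Map 1?, expertise 1-5, playstyle 1-5, rating 1-5), invented
    (1, 2, 4, 3), (0, 3, 2, 4), (1, 4, 4, 2), (0, 1, 2, 3),
    (1, 3, 3, 4), (0, 5, 4, 3), (1, 1, 2, 2), (0, 4, 3, 3),
]
X = [[1.0, m, e, s] for m, e, s, _ in survey]  # leading 1.0 = intercept
y = [r for *_, r in survey]
beta = ols_fit(X, y)
r2 = r_squared(X, y, beta)
print("coefficients:", [round(b, 3) for b in beta])
print(f"R-squared {r2:.3f}, multiple R {math.sqrt(r2):.3f}")
```

Significance tests for the individual coefficients would additionally need their standard errors, which a statistics package computes from the same normal equations.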

To prevent biases due to insufficient data, I reduced the number of expertise categories to three: Novice (levels 1 and 2, 165 responses), Middle (level 3, 173 responses) and Expert (levels 4 and 5, 108 responses). For each of these levels, I checked whether there was any significant difference between Map 1 and Map 2 for our three responses. None was significant (p-values: Novice 0.17, 0.13, 0.62; Middle 0.43, 0.64, 0.37; Expert 0.24, 0.43, 0.30; the standard level of significance in the social sciences is p < 0.05). Reinforcing our belief in 'no effect', two of the results even point in the unexpected direction (e.g. 'Middles' on average see more threat to Turkey on Map 1 than on Map 2!).
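Collapsing the five expertise levels into three bands and comparing the maps within each band might look like this in Python. The helper names are hypothetical and the sample responses invented; each band's Map 1 and Map 2 ratings would then go through whatever two-sample test is used for the whole sample.

```python
from collections import defaultdict
from statistics import fmean


def collapse_expertise(level):
    """Hypothetical helper: map the 1-5 expertise answer onto three bands."""
    if level <= 2:
        return "Novice"
    if level == 3:
        return "Middle"
    return "Expert"


def means_by_band(responses):
    """responses: iterable of (expertise 1-5, map label, rating).

    Returns {(band, map label): mean rating}."""
    buckets = defaultdict(list)
    for expertise, map_label, rating in responses:
        buckets[(collapse_expertise(expertise), map_label)].append(rating)
    return {key: fmean(values) for key, values in buckets.items()}


# Invented responses, for illustration only.
sample = [(1, "map1", 4), (2, "map2", 3), (3, "map1", 2), (3, "map2", 4),
          (4, "map1", 3), (5, "map2", 3), (2, "map1", 5), (4, "map2", 2)]
for (band, map_label), mean in sorted(means_by_band(sample).items()):
    print(band, map_label, mean)
```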

Finally, an analysis of variance shows the expertise level to have no bearing on the way threats are perceived either. That is, regardless of how the units are poised, experts, intermediates and novices do not perceive the game situation (significantly) differently.
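A one-way analysis of variance compares the spread between the expertise groups' mean ratings with the spread of answers within each group. A minimal sketch of the F statistic follows, using invented data; turning F into a p-value needs the F distribution, for example SciPy's `scipy.stats.f.sf(F, df_between, df_within)`, or `scipy.stats.f_oneway` for the whole test.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    group_means = [fmean(g) for g in groups]
    k, n = len(groups), len(values)
    # Spread of the group means around the grand mean...
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # ...versus spread of individual answers around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented threat ratings for novices, intermediates and experts.
novice = [3, 4, 2, 3, 4]
middle = [3, 3, 4, 2, 3]
expert = [4, 3, 3, 2, 4]
print(one_way_anova_f(novice, middle, expert))
```

An F close to 1 means the group means differ about as much as chance alone would produce.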

Conclusion

Negative results do not mean an experiment is a failure. On the contrary, they are an advance in knowledge. Now we know that placing units this or that particular way (or at least, the way Map 1 and Map 2 differed) is not worth the effort: even if there were a small effect on a small fraction of players, it is not significant enough for you to build a successful strategy on it.

Accordingly, I think the game Diplomacy has been improved by Simon's survey. Thank you, Simon, Manus, and all who participated.

Marc Leotard