REFLECTIONS ON THE FIELD

 

A Note on the Death Threshold in Coding Civil War Events

 

Nicholas Sambanis
World Bank; Yale University

 

In the quantitative literature on civil war, a question of central importance is how to define a civil war event.  Following the coding guidelines of the influential Correlates of War (COW) Project, most political scientists treat war as a “distinct phenomenon, differentiated from other types of conflict.”[1]  A civil war is usually defined as an armed conflict that takes place within the boundaries of an internationally recognized state and involves the state as a major participant.  The most important feature that distinguishes civil war from other types of civil violence is the magnitude of the violence.  Most data projects code a war event if it causes more than 1,000 battle deaths.[2]

In this brief research note, I address the important question of the death threshold in coding civil war events.  I consider a number of controversial questions:  Should a civil war be defined on the basis of an annual or overall number of deaths?  Should the death threshold be defined in absolute or relative terms (i.e. in per capita terms)?  If an absolute threshold is used, what should that threshold be?  And, should we focus only on battle-deaths, or total numbers of deaths, including civilian deaths?

 

Measuring Deaths to Code a War

The death threshold has significant implications for our analyses of the determinants of civil war onset, duration, and termination.  The start- and end-dates of war events and their duration depend entirely on our coding rules.  There is no consensus in the literature on whether and why we should use an absolute threshold of deaths as the main criterion in our coding of war events.

Our conceptual confusion dates back to perceived inconsistencies in the Correlates of War (COW) data sets.  Researchers who have been using COW civil war data are unclear on the definition of a war event.[3]  Some studies cite COW in coding a war event if 1,000 battle-related deaths occur in the entire war.[4]  Others argue that COW uses an annual death threshold of 1,000 deaths to code the onset and duration of a civil war.[5] Gleditsch et al. (2001, 12) write that “the COW requires 1,000 battle-deaths in a single year to qualify as a war.” It would have been easy to resolve this confusion if it were not reinforced by the codebook of the COW civil and international wars data sets.  The codebook suggests that COW coders have used an annual death threshold for coding.[6]  However, researchers (including myself) who have tried to access the data on annual deaths have been unsuccessful.  To date, there are doubts as to the accuracy and consistency of the coding of annual civil war deaths in the COW Project.

An important question then is whether the COW and other data sets have coded the start of war events at the first year during which the level of violence surpasses 1,000 deaths.  Or, is it the case that, in the absence of an accurate annual death threshold, the COW codes the start of the war event at the year during which cumulative deaths since the start of the political conflict surpass the 1,000 threshold?  This uncertainty in coding creates obvious problems.  If the 1,000 deaths do not occur in the first year of the conflict, then what is the threshold of violence used to code the onset of the war event and how is war differentiated from other types of civil violence?  If 1,000 deaths must occur during the entire life of a conflict, then how would we know when to code the end of the conflict?  The difficulty is clear, as in any given period ongoing wars are right-censored and any low-intensity armed conflict could presumably cause 1,000 deaths at some future point.
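To make the difference between the two readings concrete, the following is a minimal sketch that applies an annual rule and a cumulative rule to the same series of yearly deaths.  The function names and the death figures are hypothetical and are not drawn from the COW or any other data set; the sketch also assumes that "reaching" the threshold means 1,000 deaths or more.

```python
from itertools import accumulate
from typing import Optional, Sequence

THRESHOLD = 1_000  # the absolute death threshold discussed in the text


def onset_annual(years: Sequence[int], deaths: Sequence[int]) -> Optional[int]:
    """First year in which deaths in that single year reach the threshold."""
    for year, d in zip(years, deaths):
        if d >= THRESHOLD:
            return year
    return None


def onset_cumulative(years: Sequence[int], deaths: Sequence[int]) -> Optional[int]:
    """First year in which deaths summed since the conflict began reach the threshold."""
    for year, total in zip(years, accumulate(deaths)):
        if total >= THRESHOLD:
            return year
    return None


# Hypothetical conflict with invented figures.
years = [1990, 1991, 1992, 1993, 1994]
deaths = [300, 400, 500, 1200, 200]

print(onset_annual(years, deaths))      # 1993
print(onset_cumulative(years, deaths))  # 1992

# A hypothetical low-intensity conflict never qualifies under the annual rule,
# but eventually qualifies under the cumulative rule if it lasts long enough.
low_years = [2000, 2001, 2002, 2003, 2004]
low_deaths = [200, 200, 200, 200, 200]
print(onset_annual(low_years, low_deaths))      # None
print(onset_cumulative(low_years, low_deaths))  # 2004
```

The second example illustrates the right-censoring problem: under a cumulative rule, whether the conflict counts as a war depends on how long we have been observing it.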

Despite the relative clarity that a 1,000 annual death threshold offers, I would argue that it is not realistic or constructive (given the paucity of data) to constrain ourselves by such a stringent criterion.  Violence in civil insurgencies and guerrilla warfare occurs in peaks and troughs and need not be sustained at the level of 1,000 deaths annually for the war to be considered ongoing.  Using an annual death threshold may cause analysts to code several war starts (onsets) in what is essentially the same long war between the same parties and without a formal understanding that the war ended for a substantial period of time.  As a step towards addressing these constraints, I would recommend a definition of civil war that requires that all of the following conditions be met: (a) 1,000 deaths occur during the first year of the coded war event; (b) violence is organized and involves the government or a party closely affiliated with the government; (c) violence is ongoing (at least at the intermediate-level) after the first year and ends in either a treaty or a 2-3 year cease-fire; and (d) if a shorter break in the fighting occurs but the fighting resumes between most of the same parties over many of the same issues, then we still code an ongoing war.
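The contrast between a strict annual rule and the conditions proposed above can be sketched in code.  This is only an illustration under stated assumptions: the input is a yearly death series for a single conflict that already satisfies condition (b); the "intermediate" level of violence is not defined in this note, so the value of 100 deaths is a hypothetical placeholder; a lull of two consecutive quiet years stands in for the lower bound of the 2-3 year cease-fire; and all figures are invented.

```python
from typing import Dict, List, Optional, Tuple


def strict_annual_onsets(deaths: Dict[int, int], threshold: int = 1_000) -> List[int]:
    """Onsets under a strict annual rule: a year at or above the threshold
    that starts the series or follows a year below the threshold."""
    onsets = []
    previous_at_war = False
    for year in sorted(deaths):
        at_war = deaths[year] >= threshold
        if at_war and not previous_at_war:
            onsets.append(year)
        previous_at_war = at_war
    return onsets


def proposed_episodes(
    deaths: Dict[int, int],
    onset_threshold: int = 1_000,
    intermediate_level: int = 100,  # hypothetical value, not given in the text
    lull_years: int = 2,            # assumed lower bound of the 2-3 year cease-fire
    treaty_year: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (start, end) years of war episodes; assumes consecutive years."""
    episodes = []
    start = None
    last_active = None
    quiet = 0
    for year in sorted(deaths):
        d = deaths[year]
        if start is None:
            # Condition (a): the first coded year must reach the onset threshold.
            if d >= onset_threshold:
                start, last_active, quiet = year, year, 0
            continue
        # Conditions (c) and (d): the war continues through short breaks.
        if d >= intermediate_level:
            last_active, quiet = year, 0
        else:
            quiet += 1
        if year == treaty_year or quiet >= lull_years:
            episodes.append((start, last_active))
            start = None
    if start is not None:
        # Still ongoing at the end of the series (right-censored).
        episodes.append((start, last_active))
    return episodes


# Hypothetical series with peaks and troughs.
series = {1980: 1500, 1981: 600, 1982: 300, 1983: 1100, 1984: 50,
          1985: 400, 1986: 20, 1987: 10, 1988: 0}

print(strict_annual_onsets(series))  # [1980, 1983]
print(proposed_episodes(series))     # [(1980, 1985)]
```

With these invented figures the strict annual rule codes two onsets, while the proposed conditions code a single war running from 1980 to 1985.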

A related concern is often expressed with reference to the arbitrariness of the 1,000 death threshold.  Why have we chosen 1,000 and not 800 or 1,200 deaths?  Why did the coding at the start of the COW adopt an absolute level of deaths as opposed to a per capita measure of deaths?  It is true that there is a scale effect that is associated with most people's idea of a war -- thus a very small conflict in a very small country might jump over a per capita measure of deaths and be classified as a war, whereas in most ordinary senses of the term, it may not really be a war. 

I propose the following with reference to the question of the size of the absolute measure of deaths:  As long as an absolute threshold is used, we should consider replacing the 1,000 threshold with a range of 500 to 1,500 deaths for the first year of coding and a threshold of 1,500 or more for the duration of the war.  Looking at currently available data on deaths, we see that the distribution of total deaths in post-1945 civil wars is highly skewed.  The mean value of total deaths in a population of 123 civil wars since 1945 is 86,020 and the standard deviation is 248,519.[7]  The skewness and kurtosis statistics for that variable are 5.114319 and 32.70138, respectively.  The median number of battle deaths is 13,000 for the duration of the war and there are 14 cases of civil war that have produced fewer than 1,500 battle deaths and 6 cases that, by some accounts, have caused between 500 and 1,000 deaths.  If more than 10% of our population of civil wars is within this narrow range of casualties and nearly half of our cases have caused fewer than 13,000 deaths, we should be careful about disqualifying cases on the basis of several hundred deaths and we might consider using a range of deaths as our criterion rather than an absolute number.
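The shares implied by these counts are easy to verify.  The short sketch below uses only the figures reported above and assumes that the 123 civil wars are the appropriate denominator for both counts; the variable names are illustrative.

```python
# Figures reported in the text.
n_wars = 123                 # population of civil wars since 1945
n_under_1500 = 14            # wars with fewer than 1,500 battle deaths
n_between_500_and_1000 = 6   # wars with between 500 and 1,000 deaths
mean_total_deaths = 86_020
sd_total_deaths = 248_519

share_under_1500 = 100 * n_under_1500 / n_wars
share_500_to_1000 = 100 * n_between_500_and_1000 / n_wars
# Ratio of standard deviation to mean, a simple indicator of dispersion.
coefficient_of_variation = sd_total_deaths / mean_total_deaths

print(round(share_under_1500, 1))          # 11.4
print(round(share_500_to_1000, 1))         # 4.9
print(round(coefficient_of_variation, 2))  # 2.89
```

About 11 percent of the cases fall below 1,500 battle deaths, which is consistent with the "more than 10%" figure, and the standard deviation is nearly three times the mean.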

 

War as a Distinct Phenomenon vs. Studying War Escalation

The question remains whether we should also use a relative (per capita) measure of deaths to code a war event.  The argument in favor of such a measure would be that it would allow us to distinguish events of civil violence that are significant within the context of a country's size.  Thus, this question is related to the concept of civil war as a phenomenon distinct from other types of civil violence (riots, coups, pogroms).  In response to this question, I note that, if the rule I described above is applied across countries, then events generating as few as 500 fatalities per year in countries as populous as China or India would also be coded as war events (as long as the total number of fatalities over a number of years reaches the overall threshold of 1,500).  Thus, we would be able to capture small events of violence that may be dropped by a 1,000 annual death rule while (hopefully) not over-sampling events from small countries.

The present threshold of 1,000 deaths may actually be leading us to over-sample conflicts in large countries.  Note, for example, that a variable measuring population size has one of the largest marginal effects in a statistical model of civil war onset in Collier and Hoeffler's model (2000).  The same result is reported in one of my own papers with reference to ethnic war.[8]  Collier and Hoeffler (2000) argue that a large effect of population size on the likelihood of civil war is theoretically consistent: if we combined the population of the entire world in a single country, we would be maximizing the risk of civil war in that country (since civil war is defined as an armed conflict between the state and any one organized group).  However, the large effect of population size in these models may also be due to a selection effect: by coding war events in terms of a 1,000 death threshold, we make it easier to pick cases of violent conflict in more populous countries (perhaps also in more densely populated countries), where the likelihood of an armed conflict causing 1,000 casualties is larger, ceteris paribus.  This problem would only be magnified by a stricter rule that requires an annual death threshold of 1,000.
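The selection effect can be illustrated with a simple calculation.  Suppose, purely for illustration, that conflicts of equal intensity kill the same share of the population in every country.  The sketch below computes the smallest population at which such a conflict reaches an absolute threshold; the rate of 5 deaths per 100,000 inhabitants and the two population figures are hypothetical.

```python
import math


def min_population_for_threshold(rate_per_100k: float, threshold: int = 1_000) -> int:
    """Smallest population at which a conflict of the given per capita
    intensity produces at least `threshold` deaths."""
    return math.ceil(threshold * 100_000 / rate_per_100k)


def crosses_threshold(population: int, rate_per_100k: float, threshold: int = 1_000) -> bool:
    """True if the implied number of deaths reaches the absolute threshold."""
    return population * rate_per_100k / 100_000 >= threshold


rate = 5  # hypothetical intensity: 5 deaths per 100,000 inhabitants

print(min_population_for_threshold(rate))   # 20000000
print(crosses_threshold(2_000_000, rate))   # False (100 deaths)
print(crosses_threshold(50_000_000, rate))  # True (2,500 deaths)
```

Under this assumption, a conflict of identical relative intensity is coded as a war in the larger country but not in the smaller one, which is the pattern that could inflate the estimated effect of population size.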

A solution to this problem would be to generate a per-capita death measure to use in conjunction with the 1,000 threshold.  There is a sense in which war refers to physical destruction of a certain magnitude, so the absolute death threshold is intuitively appealing.  However, for many small countries a per capita death measure would increase the coded duration of the war or add a larger number of war events.  It is easy to identify cases that do not qualify as wars in the COW data set, but which could easily qualify with a per capita death measure.  An example is Cyprus in 1963-1964, which is excluded from the COW list, but which in terms of per capita deaths has been a case of greater human suffering than other COW civil war events.  Some of the cases coded as “minor armed conflicts” in the Uppsala-PRIO data sets (coded annually as of 1989) might also be included in such a dataset.
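A per capita measure used alongside the absolute threshold might be sketched as follows.  The per capita cutoff of 10 deaths per 100,000 inhabitants is a hypothetical value chosen only for illustration, as are the two cases and their figures; how the two criteria should be combined (either one sufficing, or both required) is left open here, so the sketch simply reports both.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Conflict:
    name: str
    deaths: int
    population: int


def deaths_per_100k(conflict: Conflict) -> float:
    """Deaths per 100,000 inhabitants."""
    return 100_000 * conflict.deaths / conflict.population


def classify(
    conflict: Conflict,
    absolute_threshold: int = 1_000,
    per_capita_threshold: float = 10.0,  # hypothetical cutoff per 100,000
) -> Dict[str, bool]:
    """Report whether the conflict qualifies under each criterion."""
    return {
        "absolute": conflict.deaths >= absolute_threshold,
        "per_capita": deaths_per_100k(conflict) >= per_capita_threshold,
    }


# Invented cases: a small state and a very populous state.
small = Conflict("small state", deaths=400, population=600_000)
large = Conflict("large state", deaths=1_200, population=900_000_000)

print(round(deaths_per_100k(small), 2), classify(small))
print(round(deaths_per_100k(large), 2), classify(large))
```

The small-state case fails the absolute threshold but passes the per capita one, and the large-state case does the reverse, which is why the two measures would need to be used together rather than as substitutes.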

Creating a per capita measure of deaths is difficult and labor-intensive.  One would have to expend many resources to start coding levels of civil violence in all countries from scratch.  Clearly, what should not be done is an ex post analysis—i.e. we should not look among war cases identified in our data sets for periods of lower-level violence that might qualify as wars.  This could only bias our data.  Given the resource constraints under which we are all operating, we might want to consider creating a per capita death measure from scratch for a short period (e.g. from 1980 onwards).  This is manageable and it could allow us to conduct out-of-sample tests of statistical results that we have identified using an absolute threshold of deaths for the longer period.

In creating a relative deaths measure, one would have to look at all the cases carefully, year-by-year.  Thus, it would also be possible to create an annual death measure for the more limited period (post-1980 as mentioned above).  A second advantage of this would be that it would allow us to study conflict escalation.  The absence of such data currently makes it impossible to study conflict intensity properly in the civil war literature.  Improvements in that line of research would tell us more about whether civil war is a phenomenon distinct from other forms of civil violence or whether it has the same basic causes as these other forms and is best understood as an escalation of more common forms of violence.

 

Battle Deaths vs. Total Deaths

Finally, I close with brief comments on the sort of violence that we should be focusing on in our studies of civil war.  The standard definitions found in the COW were borrowed from the literature on international war, where most battles are fought between regular armies.  Thus, COW measures battle-deaths in coding war events, both civil and international.  Some scholars have suggested that we should focus on total death figures rather than simply battle-deaths (Sarkees and Singer, 2001, p. 12).  In joint work with Michael Doyle, I have also argued that battle deaths typically do not reveal the extent of human suffering that takes place in civil wars.  Indeed, most civil wars affect civilians more than militaries and civilians are often the target of violence.  Moreover, human displacement is as big a problem in civil war as casualties.  Consider that the variance and skewness of the number of battle deaths are larger than the respective statistics for the total number of deaths by factors of three and two, respectively.[9]  The median and mean values of total deaths are at least double those of the battle deaths variable.  Both theoretical and empirical work on civil wars demonstrate that civilians are the ones who suffer the most during such wars.[10]  Moreover, the motives for violence in civil war are often very complicated.  Patterns of violence are particular to the degree of government and/or rebel control of different territories, which causes a high degree of variability in the intensity and scope of violence during civil war.[11]  If civilians are the targets of most violence in civil war, especially wars fought between rival insurgent groups, then battle deaths should be a less effective measure of the magnitude of the armed conflict and we should focus on total deaths.  A similar argument might also be made to include deaths that result from famine and communicable diseases whose onset was a direct consequence of the war.

 

Conclusion

The quantitative political science literature on civil wars is maturing.  We are now at the stage where there is a substantial body of evidence that we can test against various measures for theoretically relevant variables.  One of our main tasks for the short term should be to refine and improve our definition and coding of civil war events so that we can achieve a measure of confidence in our shared understanding of the phenomenon we are studying.  The ideas summarized in this note on how to code war events are a first step towards initiating a debate on establishing a standard definition of civil war that will make it easier for scholars to compare and contrast their results from empirical studies.  The proposals for a per capita death index and for coding total rather than battle deaths may influence the results of our analyses of the global incidence of civil war and of the intensity of these wars.  Data development should be the focus of the next wave of effort in the sub-field.

 

 

 



[1]   Meredith Reid Sarkees and J. David Singer, "The Correlates of War Datasets: The Totality of War," Paper prepared for the 42nd Annual Convention of the International Studies Association, Chicago, IL, 20–24 February 2001, p. 17.

[2]  Nils Petter Gleditsch, Håvard Strand, Mikael Eriksson, Margareta Sollenberg & Peter Wallensteen, “Armed Conflict 1945–99: A New Dataset,” Paper prepared for session WB08 ‘New Data on Armed Conflict,’ 42nd Annual Convention of the International Studies Association, Chicago, IL, 20–24 February 2001.

[3] The key definitions and data can be found at: J. David Singer and Melvin Small, 1994, Correlates of War Project: International and Civil War Data, 1816-1992 [Computer file (April)] (Study #9905) (Ann Arbor, MI: Inter-University Consortium for Political and Social Research [distributor]).

[4] See, e.g., Nils Petter Gleditsch et al. (2001, 12).  See, also, James Fearon and David Laitin, 2000, "Ethnicity, Insurgency, and Civil War," Manuscript (Stanford University, December).

[5] Paul Collier and Anke Hoeffler, 2000, “Greed and Grievance in Civil War,” World Bank Policy Research Paper 2355 (May).

[6] Sarkees and Singer (2001, p. 9) confirm this.

[7] The data are from Michael Doyle and Nicholas Sambanis, 2000, “International Peacebuilding: A Theoretical and Quantitative Analysis,” American Political Science Review (December) 94:4.

[8] Nicholas Sambanis, “Civil War: Do Ethnic and Non-Ethnic Wars Have the Same Causes?,” Journal of Conflict Resolution  (June 2001).

[9] This is based on cross-sectional data presented in Doyle and Sambanis (2000).

[10] For a theoretical study that shows that militaries are far safer than civilian groups in civil wars, see Jean Paul Azam (2001).

[11] See Stathis Kalyvas, 2000, “The Logic of Violence in Civil War,” manuscript (University of Chicago).


Return to the June 2001 CP Newsletter.
