Data Hosting Program
A major practical issue in collecting and releasing updated cross-national and cross-time data sets such as those collected by the Correlates of War (COW) Project has been the resources needed to maintain a large number of data sets. As a result, COW has implemented a distributed system of data set hosting based on the notion of “coordinated decentralization.” The goal is for each COW data set to obtain a semi-permanent “home” and “host,” that is, an institution and an individual who will agree to maintain a data set and the related documentation for at least two major updates of a data set. The care given to a data set by its host follows a set of guidelines designed to ensure consistency with COW standards. The Director(s) and the COW Advisory Board are responsible for monitoring data sets and hosts.
Guidelines
There are several guidelines for the adoption of a data set by an institution:
- The Director(s), with the approval of a majority of the COW Advisory Board, are responsible for appointing data hosts.
- The Director(s), with the approval of a majority of the COW Advisory Board, reserve the right to remove a data host if they believe there are significant problems with data collection or updating, or if a data host is not following the stated guidelines.
- The host agrees to submit a yearly status report on the data set to the Director(s).
- The host agrees to comply with the standards set by the COW project for data collection procedures, coding rules, structure and format of the data set, and documentation procedures.
- The host of a COW data set takes responsibility for revising and routinely updating the data set and documentation, and for keeping and maintaining the records and resources upon which the data set is based. The host will keep track of reported errors and questions and will release revised versions of each data set at regular intervals. A data host is expected to provide a yearly updated data set that addresses and corrects any minor errors of which they are aware. A data host is expected to produce a major update to a data set once every five years. Every major update to a data set should expand the temporal domain to include more recent observations than were covered in the previous version of the data set, and it should be accompanied by either a new publication of record or a manuscript describing the updates and new descriptive statistics to be posted on the COW website.
- The host shall maintain all available documentation associated with the data sets, and that documentation should be accessible to the Director(s), Associate Director, and Advisory Board.
- Data set hosts must be experienced with the collection of quantitative data sets and should have experience with the data set in question. Sufficient institutional resources should be available to support the hosting, possibly including relevant computer resources and research support.
- The host agrees to serve as the primary contact person and deal with substantive questions concerning both the current and previous versions of a data set. The host’s email address will be listed on the COW website as the contact for data set questions.
- COW data sets will be released only through the COW website (not by individual hosts), and only after the data are final. Procedures for data set review are described on the website. The purpose of this rule is to avoid a proliferation of partial, unofficial, or inconsistent data sets through the research community.
- The host agrees not to publish any analytical results based on the resulting updated COW data set before the data are officially released by the COW project. Exceptions may be made for descriptive papers at conferences and for dissertations, but it must be noted that such results represent analysis based on work in progress and on a possibly incomplete data set, and cannot be said to use official COW data. The purpose of this rule is to avoid a proliferation of non-replicable or frequently revised results through the research community.
- When a major revision or update of a data set is complete, the host agrees to compose and publish an “article of record” concerning the new data set. The article of record could be in a peer-reviewed journal, but a manuscript published on the COW website that describes the update and includes a piece of analysis demonstrating how the data could be used is sufficient. We expect all scholars who use the resulting data set to cite this article of record and to clearly state the data set version used for analysis.
- A subset of data hosts will serve on the COW Advisory Board on a rotating basis.
- Hosts are expected to work with the Director(s) to secure external grant funds for data collection and updating.
Data Hosting Standards
Data set hosts must agree to the following basic standards of data collection and data set management before agreeing to host a data set.
- Data collection procedures must be carefully documented, and actual data collection must follow these procedures. Methods used in the prior data collection (where documentation identifies those procedures) and coding rules for the prior data set must be followed where possible to ensure cross-time reliability of the data. Theoretical and substantive issues, including problems in coding particular cases, must be clearly noted in the documentation.
- Units of analysis must remain consistent with the current version of the data set. If they are changed to reflect better ways of structuring data sets, the changes must be fully documented and old data converted to the new format.
- To the extent possible, new variables will only be made available in new versions of data sets if coded for the entire set of states and years.
- Data sources must be clearly identified. Documentation and/or the data set should contain information allowing identification of the source of each newly collected data point. Archival material (e.g. copies of pages from source materials) will be given to the central COW office for permanent archiving.
- Each data set released will have a unique version number to maintain a chronological and developmental record of each data set.
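Two of the standards above lend themselves to a mechanical check: that each newly collected data point has an identified source, and that each release carries a version number later than any earlier one. The following is a minimal sketch only, not COW's actual tooling. The column names (`ccode`, `year`, `milex`, `milex_source`), the sample values, and the `v4.1`-style version labels are all assumptions made for illustration.

```python
import csv
import io

def unsourced_points(rows, variable, source_column):
    """Keys of rows where `variable` is coded but the source cell is empty."""
    return [
        (r["ccode"], r["year"])
        for r in rows
        if (r.get(variable) or "").strip()
        and not (r.get(source_column) or "").strip()
    ]

def parse_version(label):
    """Turn a label such as 'v4.1' into a comparable tuple, here (4, 1).

    The 'v<major>.<minor>' format is an assumption for this sketch.
    """
    return tuple(int(part) for part in label.lstrip("v").split("."))

def is_new_version(candidate, released):
    """True if `candidate` is later than every previously released version."""
    return all(parse_version(candidate) > parse_version(old) for old in released)

# Hypothetical sample data: the second row has a value but no source.
SAMPLE = """ccode,year,milex,milex_source
2,2014,100,Source A p. 12
2,2015,110,
20,2014,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
gaps = unsourced_points(rows, "milex", "milex_source")
print("Data points without a source:", gaps)
print("v4.1 is new:", is_new_version("v4.1", ["v3.0", "v4.0"]))
```

Storing the source in a column next to the variable is only one option. The standard also allows source information to live in the documentation, in which case a check like this would not apply.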
Data Hosting Review Procedures
The criteria for release of an updated data set by COW include basic data standards of internal data set consistency, comparability to and compatibility with existing data sets, and high-quality data. These criteria can be met by carefully following the coding rules defined at the beginning of a project and by working with the Director(s) and Associate Director to ensure consistency of the data set format and structure.
When a host believes that a data set is ready for release as an updated COW data set, he or she will submit the data to the COW Director(s) and Associate Director. A series of checks will then be undertaken before a version number is assigned and the updated data released.
- A series of automated checks will be conducted to ensure that all countries and years have been included in the data set (where a data set is cross-national and cross-time), that all data points are unique (no duplicate records or values), and that country codes and data points included in the data set match the Correlates of War National System Membership lists.
- Variable names and value codes will be examined for uniqueness, descriptive accuracy, and consistency. For example, whenever possible variable names must match names from prior data sets, and must accurately describe the variables’ content. Dummy variables will be coded as 0=no and 1=yes. Missing value codes will be consistent and clearly described in the documentation. Historically, COW data sets identified missing observations using the value “-9”. All COW data sets published after January 1, 2025, should identify missing observations with an empty cell (“”). Names and categories deemed unique will be checked for uniqueness.
- A review of procedures will be done to ensure that coding rules have been followed.
- Spot checks of individual data points collected by the individual host will be conducted to verify data values and source identification.
- Documentation will be reviewed, and source lists will be examined to ensure that every new data point can be traced to a point of origin.
- The format of the data set (e.g., the unit of analysis, such as country-year or dyad-year, and the file type, such as .csv or flat text) will be examined and made consistent with other data sets.
- In case of problems, the data set may be updated by COW or may be returned to the host for further work.
- The COW Advisory Board may be routinely consulted on issues of data set structure, coding rules, case coding, and other issues that arise in the course of data set review.
- In the case of disagreement between the host and COW about the release status of the data set, whether such disagreements concern issues of format or substantive coding decisions, the COW Advisory Board is available for consultation and problem resolution.
- The target for final data set release is no more than six months after a candidate final release data set is submitted.
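The automated checks described in the first two items of this list could be sketched roughly as follows. This is an illustration under stated assumptions, not the checks COW actually runs. The column names (`ccode`, `year`, `ally`), the sample rows, and the two-entry membership list are hypothetical. The code also assumes that “-9” never occurs as a legitimate value, which may not hold for every variable.

```python
import csv
import io
from collections import Counter

# Hypothetical membership list: (ccode, first_year, last_year).
# Illustrative entries only; a real check would load the official
# Correlates of War National System Membership list.
MEMBERSHIP = [(2, 1816, 2016), (20, 1920, 2016)]

def expected_country_years(membership, start, end):
    """All (ccode, year) pairs the membership list implies for start..end."""
    return {
        (ccode, year)
        for ccode, first, last in membership
        for year in range(max(first, start), min(last, end) + 1)
    }

def check_country_years(rows, membership, start, end):
    """Return (missing, unexpected, duplicates) for a country-year data set."""
    keys = [(int(r["ccode"]), int(r["year"])) for r in rows]
    counts = Counter(keys)
    expected = expected_country_years(membership, start, end)
    missing = sorted(expected - set(keys))
    unexpected = sorted(set(keys) - expected)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return missing, unexpected, duplicates

def check_values(rows, dummy_columns):
    """Flag legacy -9 missing codes and dummy values other than 0, 1, or empty."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for name, value in row.items():
            cell = (value or "").strip()
            if cell == "-9":
                problems.append((line_no, name, "legacy missing code -9"))
            elif name in dummy_columns and cell not in {"0", "1", ""}:
                problems.append((line_no, name, "dummy not coded 0/1"))
    return problems

# Hypothetical sample with one duplicate, one gap, one unknown code,
# one legacy missing code, and one badly coded dummy.
SAMPLE = """ccode,year,ally
2,2014,1
2,2015,-9
2,2015,0
20,2014,2
999,2014,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
missing, unexpected, duplicates = check_country_years(rows, MEMBERSHIP, 2014, 2015)
problems = check_values(rows, {"ally"})
print("Missing country-years:", missing)
print("Not in membership list:", unexpected)
print("Duplicate records:", duplicates)
print("Value problems:", problems)
```

Reporting problems as lists, rather than stopping at the first failure, fits the review process described above: the full set of issues can be returned to the host in one pass.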