This article outlines some basic Data Management (DM) processes that make Data Archiving (DA) more straightforward and facilitate future secondary data analysis (SDA). Prevention and Early Intervention (PEI) projects can be logistically challenging and costly, with limited resources available for exhaustive data analysis. It is therefore prudent to archive the data so that subsequent researchers can analyse it and extend the research beyond the scope of the original PEI project. It is becoming common practice for funding organisations (e.g. the Education Endowment Foundation, 2017) to request relevant provision and planning for DA (Van den Eynden, Corti, Woollard, Bishop and Horton, 2011).
This short commentary is written by researchers who have experience of both conducting PEI projects and preparing datasets for archiving. Examples of previously archived projects from the Irish Social Science Data Archive (ISSDA, 2017) are included to highlight some pitfalls of the DM process and to provide tips for more efficient DA.
When a data analyst works with a secondary dataset, the relevant information about the data must be available. Variable names must be appropriate and follow a logical convention, and data labels should be relevant, consistent and accompanied by precise values. If this essential information is missing, the DA process becomes labour-intensive and requires the researcher or archivist to rework the data file before archiving.
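The kind of check described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure used by any archive: the codebook structure, the field names (`label`, `categorical`, `value_labels`) and the example entries are all assumptions invented for this sketch.

```python
from typing import Dict, List


def find_metadata_gaps(codebook: Dict[str, dict]) -> Dict[str, List[str]]:
    """Return, for each variable with a problem, the metadata fields that are missing."""
    gaps: Dict[str, List[str]] = {}
    for name, meta in codebook.items():
        missing = []
        # A variable label that is absent or blank counts as missing.
        if not (meta.get("label") or "").strip():
            missing.append("label")
        # Categorical variables also need labels for their coded values.
        if meta.get("categorical", False) and not meta.get("value_labels"):
            missing.append("value_labels")
        if missing:
            gaps[name] = missing
    return gaps


# Hypothetical codebook entries, invented for illustration only.
example_codebook = {
    "ITEM_1": {
        "label": "Placeholder description of item 1",
        "categorical": True,
        "value_labels": {1: "placeholder label A", 2: "placeholder label B"},
    },
    "var7": {"label": "", "categorical": True, "value_labels": {}},
}

print(find_metadata_gaps(example_codebook))
```

Running a check of this kind before deposit shows which variables would otherwise have to be documented by hand from the original questionnaires.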
The Mate-Tricks dataset, from a randomised controlled trial evaluation of an afterschool programme (O’Hare et al., 2012), contained over 800 variables, so a logical and consistent variable naming convention was implemented, with appropriate data labels and explicit data values. Variable names are often in a format that makes sense to the person who set up the data file, but they may not be explicit enough for a subsequent user to understand what they represent. Example variables in the Mate-Tricks dataset are CPT_TEI_1 through to CPT_TEI_75. CPT refers to Child Post Test (the time point of measurement), TEI denotes the Trait Emotional Intelligence questionnaire (the measure used), and 1 to 75 is the item number on the measure. When a standard variable naming convention is not used, the variables have to be renamed, which requires additional work. If data labels and values are missing, the data archivist will also need to refer to the measures or questionnaires in order to modify the variable characteristics in the data file.
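As a rough illustration of why a consistent convention pays off, the sketch below parses names of the form CPT_TEI_1 into their three components and builds a descriptive label from them. It is an assumption-laden example rather than the procedure used for Mate-Tricks: the function names, the regular expression and the lookup tables are hypothetical, and only the CPT and TEI codes come from the text above.

```python
import re
from typing import NamedTuple

# Only these two codes are documented in the text; any further codes
# would have to come from the project's own codebook.
TIME_POINTS = {"CPT": "Child Post Test"}
MEASURES = {"TEI": "Trait Emotional Intelligence questionnaire"}

# Assumed pattern: TIMEPOINT_MEASURE_ITEM, e.g. CPT_TEI_12
NAME_PATTERN = re.compile(r"^([A-Z]+)_([A-Z]+)_([1-9][0-9]*)$")


class ParsedName(NamedTuple):
    time_point: str
    measure: str
    item: int


def parse_variable_name(name: str) -> ParsedName:
    """Split a name such as 'CPT_TEI_12' into time point, measure and item."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    time_code, measure_code, item = match.groups()
    if time_code not in TIME_POINTS or measure_code not in MEASURES:
        raise ValueError(f"{name!r} uses an undocumented code")
    return ParsedName(TIME_POINTS[time_code], MEASURES[measure_code], int(item))


def make_label(name: str) -> str:
    """Build a human-readable variable label from a conforming name."""
    parsed = parse_variable_name(name)
    return f"{parsed.time_point}: {parsed.measure}, item {parsed.item}"


def tei_variable_names(n_items: int = 75) -> list[str]:
    """Generate CPT_TEI_1 ... CPT_TEI_<n_items>."""
    return [f"CPT_TEI_{i}" for i in range(1, n_items + 1)]


if __name__ == "__main__":
    for name in tei_variable_names()[:3]:
        print(name, "->", make_label(name))
```

Because every name encodes the same three pieces of information in the same order, labels can be generated or verified mechanically. Names that do not fit the pattern raise an error and can be flagged for renaming.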
These basic DM procedures are implemented easily, and when done as a matter of course they allow for straightforward DA and efficient navigation of the variables for the secondary data analyst.
Investing a small amount of time in the basic DM procedures described above during the data preparation stage makes the DA process less time-consuming and the archived data more useful for SDA. Consequently, the data will help extend research and can provide direct credit to the researcher as a research output in its own right (Van den Eynden et al., 2011).