Preparing for Life (PFL) is a prevention and early intervention programme which aims to improve the life outcomes of disadvantaged children in Dublin, Ireland. PFL was designed and implemented by the Northside Partnership and was subject to an extensive evaluation conducted by the UCD Geary Institute for Public Policy between 2008 and 2015 using a randomised controlled trial design. The evaluation found that the PFL programme had a significant impact on children’s skills by raising cognitive ability, reducing behavioural problems, and improving health. Please see Doyle (2017) and Doyle and PFL Evaluation Team (2016) for a description of the final results. The programme was one of 52 programmes funded by The Atlantic Philanthropies and the Department of Children and Youth Affairs as part of the Prevention and Early Intervention Initiative. In mid-2017 almost all the quantitative data collected as part of the PFL evaluation were placed in the Irish Social Science Data Archive (ISSDA). The decision to archive the data was made prospectively during the design of the study. The aim of this article is to describe the motivation for archiving the PFL data and the processes involved in prospectively designing, collecting, and storing data destined for a national archive.

The PFL study

The goal of the PFL programme was to reduce social inequalities in children’s skills by working with parents from pregnancy until school entry. Families were recruited during pregnancy and randomly assigned to a high (n=115) or low (n=118) treatment group. The high treatment group received 1) bi-monthly home visits from a trained mentor to support parenting and child development using Tip Sheets, 2) baby massage classes to support reciprocal communication, and 3) the Triple P Positive Parenting Program to support positive, effective parenting practices. Both groups also received developmental toys, access to preschool and public health workshops, and a support worker. A ‘services as usual’ comparison group (n=99) from another community was also recruited.

The impact evaluation investigated the impact of the programme at frequent time points (baseline, 6, 12, 18, 24, 36, 48, and 51 months) using parent-report interviews, observations, and direct assessments. Families also gave consent to access their maternity and child hospital records, and teachers completed online surveys about the children’s school readiness skills. Qualitative interviews with PFL mothers, fathers, children, and the PFL staff were also conducted.

Motivation for archiving the PFL data

Unlike most studies of early intervention programmes, the evaluation of PFL was led by a group of economists. Traditionally economists, particularly those who study human development, conduct secondary analysis of publicly available cohort or registry data. Thus the decision to archive the PFL data was driven by a strong belief among the investigators that any data collected as part of this publicly funded study should be made available as a public good to be used by other researchers. The decision was also influenced by the location of ISSDA, which, at the time, was housed within the UCD Geary Institute. The investigators, and in particular Professor Colm Harmon, the then Institute director, had in-depth knowledge about the value of archiving and disseminating quantitative social science data. Within economic journals in particular, authors were increasingly required to make their datasets and code publicly available.

The decision was also motivated by one member of the study’s advisory group, Professor James Heckman, who was seeking access to data from some of the landmark U.S. early intervention studies. Prior to this, almost all evaluation data were privately held, often by the researchers who conducted the original study. By making these historical data available, the data could be reanalysed and reinterpreted using new methods and different theoretical perspectives. In particular, Heckman and his team at the University of Chicago accessed data from the Perry Preschool Program, the Carolina Abecedarian Program, and the Nurse Family Partnership Program. As a result, several new papers emerged offering new insights into these important studies (e.g. Heckman, Moon, Pinto, Savelyev, and Yavitz, 2010; Heckman, Pinto, and Savelyev, 2013; Gertler et al., 2014; Campbell et al., 2014).

Thus, the resolution to archive the PFL data, which we believed would be another landmark study in the early intervention field, was embedded into the design of the study from its inception.

Impact of archiving the PFL data on study design

The decision to archive the data had a number of implications for the study design which can be broadly grouped into four main categories: consent and ethics process, survey content, data quality and protection, and data documentation.

The first step in prospectively archiving the PFL data was to design an information and consent form which would provide the PFL participants with the necessary information to make an informed decision about joining the study. The form included consent both to join the PFL programme and to take part in the evaluation. As we were seeking consent to deposit the evaluation data into ISSDA, the form included a detailed section describing what would happen to the participant’s data when the study ended. The information sheet explained that an anonymised dataset would be placed in ISSDA and could be used by other researchers. We reiterated that this dataset would not contain any personal details and that all names would be replaced by numbers to ensure that no-one could identify any individual responses. The consent form then explicitly asked participants whether or not they would permit an anonymised version of their data to be used in other research studies and publications. Of the 332 participants recruited, only one did not consent to their data being used. While archiving social science data is slowly becoming standard practice, when we applied for ethical approval to conduct the PFL study in 2007, there was little precedent for prospectively archiving data. Despite this, none of the three ethics committees from whom permission was sought (UCD Human Research Ethics Committee, Rotunda Hospital’s ethics committee, and National Maternity Hospital’s ethics committee) raised any concerns with this aspect of the proposal.
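The promise that names would be replaced by numbers can be illustrated with a minimal sketch. It is not the procedure used by the PFL team: the function name, the field names (`name`, `participant_id`), and the random assignment of identifiers are all assumptions made for illustration. Removing names is also only one part of anonymisation, since indirect identifiers may need separate treatment.

```python
import random


def pseudonymise(records, name_field="name", seed=None):
    """Replace a name field with an arbitrary numeric identifier.

    Hypothetical helper for illustration only. Returns the cleaned
    records plus the name-to-number key, which would have to be stored
    securely and kept out of any archived dataset.
    """
    rng = random.Random(seed)
    names = sorted({record[name_field] for record in records})
    ids = list(range(1, len(names) + 1))
    rng.shuffle(ids)  # identifiers carry no information about the name
    key = dict(zip(names, ids))

    cleaned = []
    for record in records:
        # Copy every field except the name, then attach the numeric ID.
        row = {k: v for k, v in record.items() if k != name_field}
        row["participant_id"] = key[record[name_field]]
        cleaned.append(row)
    return cleaned, key


# Made-up example records, not PFL data.
example = [
    {"name": "Parent A", "wave": 6, "score": 12},
    {"name": "Parent B", "wave": 6, "score": 15},
    {"name": "Parent A", "wave": 12, "score": 14},
]
cleaned_example, private_key = pseudonymise(example, seed=1)
```

The same participant receives the same number at every wave, so the cleaned records can still be linked over time without revealing who the participant is.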

In terms of survey content, to ensure the usefulness of the PFL collection as a panel dataset, the same instruments were used in multiple waves to allow future researchers to model changes over time in child and parent outcomes. We also ensured that each survey could be utilised as a stand-alone dataset to facilitate cross-sectional analysis. As archive users may wish to compare the PFL data to other national and international datasets, we also included commonly used instruments in the field, such as the Child Behavior Checklist and the Home Observation for Measurement of the Environment scores.

In terms of data quality and protection, as these data would be a publicly available resource, the highest possible standards were maintained throughout the study to ensure that quality data were collected and stored appropriately. As the data would eventually be archived in electronic format, all the research interviews were conducted using tablet laptops to record responses directly. This served to reduce administrative burden, as well as increase the reliability of the data by minimising data entry errors. To guarantee data protection, we developed a PFL Data Management and Protection Protocol, alongside a Data Confidentiality Agreement, which everyone involved in the study signed. The Protocol detailed the security procedures to be followed regarding the collection, storage, and analysis of the data. As the study was conducted over an extended period of time, with over 30 researchers working on the project, a PFL Research Training Manual was developed to facilitate staff turnover and preserve institutional memory.

In terms of data documentation, archived data require clear and detailed documentation to ensure that researchers not involved in the original study can effectively re-use the data. Therefore, a number of standardised procedures were put in place to capture key information at each data collection wave. These included maintaining detailed codebooks and instrument descriptions, using a standardised variable naming convention, and recording information on the sample population, attrition, and missing data. After each research assessment was complete, we produced an evaluation report documenting this information, alongside the impact results for that assessment point. These procedures helped to ensure that the data archival process conducted at the end of the study was less onerous. Please see Wong (2017) in this edition for details on the practical steps involved in preparing the PFL data for archival.
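To make the idea of a standardised naming convention and a codebook concrete, the sketch below checks variable names against a pattern and records basic documentation for each variable. The convention shown (`w<months>_<instrument>_<item>`, for example `w06_cbcl_q01`), the function name, and the codebook fields are hypothetical and are not the convention used in the PFL collection.

```python
import re

# Hypothetical convention: wave in months, instrument code, item code.
NAME_PATTERN = re.compile(r"^w(\d{2})_([a-z]+)_([a-z0-9]+)$")


def build_codebook(variables):
    """Build codebook entries and list names that break the convention.

    `variables` maps a variable name to a (label, values) pair, where
    missing responses are stored as None.
    """
    codebook = []
    nonconforming = []
    for name, (label, values) in variables.items():
        match = NAME_PATTERN.match(name)
        if match is None:
            nonconforming.append(name)
            continue
        codebook.append({
            "name": name,
            "wave_months": int(match.group(1)),
            "instrument": match.group(2),
            "label": label,
            "n": len(values),
            "n_missing": sum(value is None for value in values),
        })
    return codebook, nonconforming


# Made-up example variables, not PFL data.
example_variables = {
    "w06_cbcl_q01": ("Behaviour checklist, item 1", [0, 1, None, 2]),
    "w12_cbcl_q01": ("Behaviour checklist, item 1", [1, 1, 0, None]),
    "ChildScore": ("Unlabelled legacy variable", [3, 4, 5, 6]),
}
example_codebook, example_bad_names = build_codebook(example_variables)
```

Running a check of this kind at each wave, rather than once at the end of the study, is one way to keep the documentation in step with the data as they are collected.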

Potential uses of the PFL data

The ISSDA website provides detailed information on all the PFL quantitative data that are available for analysis. Broadly, these include the eight research interviews conducted with families and the directly assessed measures of children’s cognitive development. The birth and hospital records are not available due to the sensitive nature of these data, and the qualitative data will be archived in the Irish Qualitative Data Archive (IQDA).

The entire quantitative collection includes over 14,000 variables collected from approximately 300 families. Thus, as well as providing detailed information about the effectiveness of the PFL programme, the data also provide comprehensive information on a population that is often under-represented in social surveys. The archived data have many potential uses. For example, they could be used to reproduce the impact results derived in the original evaluation. The issue of reproducibility of RCTs has received much attention in recent years in both the medical and social sciences (see special editions of Science, December 2011 and the American Economic Review, May 2017) and it is argued that making research data publicly available may help to reduce the dissemination of incorrect results, as well as prevent scientific fraud.

Regarding the PFL data, archive users could test the sensitivity of the original results to different statistical methods. It is also possible for archive users to examine outcomes that were not the primary focus of the original evaluation. For example, while academic papers have been published on the impact of the programme on child outcomes (e.g. Doyle, Harmon, Heckman, Logue, and Moon, 2017; Doyle, Fitzpatrick, Rawdon, and Lovett, 2015; Doyle, Delaney, O’Farrelly, Fitzpatrick, and Daly, 2017), less research has been published on parent outcomes. There is also potential to examine the mechanisms underlying the treatment effects, and the longitudinal nature of the data could be exploited by modelling changes in outcomes over time. Finally, the presence of such a large amount of data on child development, health, parenting, social support systems, childcare, and service use, as well as detailed socio-economic profiles, allows a thorough investigation of the lives of disadvantaged families in Ireland during a period of economic recession and recovery.

Conclusion

The archival of the PFL data capitalises on the substantial time and financial investment made in its original collection. In an era when the replicability of scientific studies is frequently questioned, the PFL data offer a unique opportunity to test the reproducibility of, as well as extend, the original results, which will ultimately increase the scientific integrity and rigour of the PFL study.