Professor Anderson leads the interdisciplinary Future Proof Computing Group. He has three main research interests: Data Preservation, the History of Computing, and Para-consistent Reasoning, a way of overcoming the inability of computers, when based on classical logic, to deal properly with inconsistent data.
David Anderson is Project Quality Manager for the E-ARK project, a multinational big data research project that aims to improve the methods and technologies of digital archiving, in order to achieve consistency on a Europe-wide scale.
Archives provide an indispensable component of the digital ecosystem by safeguarding information and enabling access to it. Harmonisation of currently fragmented archival approaches is required to provide the economies of scale necessary for general adoption of end-to-end solutions. There is a critical need for an overarching methodology addressing business and operational issues, and technical solutions for ingest, preservation and re-use.
In co-operation with commercial systems providers, E-ARK will create and pilot a pan-European methodology for electronic document archiving that synthesises existing national and international best practices and will keep records and databases authentic and usable over time.
The methodology will be implemented in an open pilot in various national contexts, using existing, near-to-market tools, and services developed by the partners. This will allow memory institutions and their clients (public- and private-sector) to assess, in an operational context, the suitability of those state-of-the-art technologies.
Our objective is to provide a single, scalable, robust approach capable of meeting the needs of diverse organisations, public and private, large and small, and able to support complex data types. E-ARK will demonstrate the potential benefits for public administrations, public agencies, public services, citizens and business by providing simple, efficient access to the workflows for the three main activities of an archive - acquiring, preserving and enabling re-use of information.
The practices developed within the project will reduce the risk of information loss due to unsuitable approaches to keeping and archiving of records. The project will be public facing, providing a fully operational archival service, and access to information for its users. The project results will be generic and scalable in order to build an archival infrastructure across the EU and in environments where different legal systems and records management traditions apply. E-ARK will provide new types of access for business users.
E-ARK will pilot an end-to-end OAIS-compliant e-archival service covering ingest, vendor-neutral archiving, and reuse of structured and unstructured data, thus covering both databases and records, addressing the needs of data subjects, owners and users. The pilot and methodology will also focus on the essential pre-ingest phase of data export and normalisation in source systems. The pilot will integrate tools currently in use in partner organisations, and provide a framework for providers of these and similar tools that ensures compatibility and interoperability. A core component of the project is the integration platform, which uses the existing ESSArch Preservation Platform (EPP) application as an Archival Information System. EPP is already deployed in production at the National Archives of Norway and Sweden. In order to achieve scalability, E-ARK will adopt a data management and storage layer for this tool on top of the proven open-source Cloudera CDH4 distribution of Apache Hadoop, enabling storage and computational power to be seamlessly added to the system.
The pilot will run in several national archives, each of which will provide data to run in the pilot instance by agreement from an associated government data owner (e.g. national or regional / federal).
Project partner the DLM Forum, comprising 22 national archives and associated commercial and technical providers, is well placed to sustain the outputs of our project. Using the open Apache licensing model, commercial suppliers will be able to incorporate the project outputs (particularly the open interfaces for pre-ingest, ingest, archival, access and re-use) into their own systems, enhancing their longevity. National archives running E-ARK pilot instances will serve as exemplars for others wanting to adopt the new open e-archiving system.
In addition, project partner the Digital Preservation Coalition will promote best practices in this area, as will our dedicated government institution partners.