Digital Preservation Framework Re-Release Part 1: What is the Digital Preservation Framework?

The NARA Digital Preservation Unit is very excited to announce that the most recent release of the Digital Preservation Framework is now available on GitHub. This release includes a major overhaul of the Risk Matrix, with new and updated questions about how we evaluate risk to file formats, as well as a new report on the file format extensions present in our holdings.

This blog post is the first in a four-part series about this major re-release. In this post, I’ll introduce the Framework to anyone unfamiliar with it. In the next post, I’ll discuss what exactly has changed in this release. The third post will get into the nitty-gritty of our process for revising the Risk Matrix. The final post will discuss some interesting findings and takeaways from this whole process.

We were lucky to work with the Public Affairs team at NARA to put together this article about the updates to the Framework; it’s a great high-level overview of this project and digital preservation at NARA. And if you want even more information about how we assess file format risk at NARA, you can check out our paper, co-written with our Library of Congress colleagues, that we presented at the International Conference on Digital Preservation in September.

What is the Digital Preservation Framework?

The Digital Preservation Framework (or “Framework”) is a resource that we use at NARA to assess the risks posed to file formats and plan for their long-term preservation. 

The Digital Preservation Unit is constantly adding new formats to the Framework and performing maintenance and updates on the information already present. We release these updates on a quarterly basis on GitHub and as linked open data on archives.gov.

[Screenshot of the NARA Digital Preservation GitHub repository. Caption: The Digital Preservation Framework on GitHub, October 8, 2024.]

The Framework consists of three main parts:

Risk Matrix

NARA uses the Risk Matrix to measure the preservation risk of digital file formats in our holdings and to assess formats we anticipate receiving in the future. By answering questions related to the ability to preserve and sustain a file format, we identify relative risk levels.

The Risk Matrix is structured as a series of twenty-seven questions about each file format, organized into eight categories relating to risk and sustainability:

  1. Disclosure
  2. Adoption
  3. Transparency
  4. Self-Documentation
  5. External Hardware Dependencies
  6. External Software Dependencies
  7. Impact of Patents
  8. Technical Protection Mechanism

The answers to all the questions have been assigned numeric values, which are used to calculate an overall numeric Risk Rating and a general Risk Level (Low Risk, Moderate Risk, and High Risk). The final questions in the Risk Matrix represent how NARA prioritizes formats in our holdings for preservation actions; these questions do not factor into the Risk Rating or Risk Level.
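To make the scoring idea concrete, here is a minimal sketch in Python. It is not NARA's actual calculation: the post does not say how the answer values are combined, which direction the scale runs, or where the cut-offs between levels fall. The sketch assumes a simple sum, assumes that a higher total means higher risk, and uses invented thresholds, question wording, and values. The only details taken from the description above are that answers carry numeric values and that the prioritization questions are left out of the rating.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the real thresholds are not stated in this post.
MODERATE_THRESHOLD = 10
HIGH_THRESHOLD = 20


@dataclass(frozen=True)
class Answer:
    question: str                 # illustrative wording, not the real question text
    value: int                    # numeric value assigned to the chosen answer
    prioritization: bool = False  # True for the final prioritization questions


def risk_rating(answers):
    """Sum the answer values, skipping the prioritization questions.

    Summing is an assumption; the real Risk Matrix may combine values differently.
    """
    return sum(a.value for a in answers if not a.prioritization)


def risk_level(rating):
    """Map a numeric rating to a level, assuming higher totals mean higher risk."""
    if rating >= HIGH_THRESHOLD:
        return "High Risk"
    if rating >= MODERATE_THRESHOLD:
        return "Moderate Risk"
    return "Low Risk"


answers = [
    Answer("Disclosure: is the specification public?", 2),
    Answer("Adoption: how widely is the format used?", 5),
    Answer("External Hardware Dependencies: is special hardware needed?", 8),
    Answer("Prioritization: is this format a priority in our holdings?", 9,
           prioritization=True),
]

rating = risk_rating(answers)
print(rating, risk_level(rating))
```

The prioritization answer is carried alongside the others but never reaches the total, which mirrors the separation between rating risk and deciding what to act on first.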

The Risk Matrix is available as a spreadsheet that can be downloaded from GitHub.

File Format Preservation Action Plans

For each of the formats present in the Risk Matrix, NARA also creates File Format Preservation Action Plans (or “Plans”). One of NARA’s digital preservation strategies is to perform format migrations on high-risk formats (as identified in the Risk Matrix) in our holdings. The purpose of the Plans is to identify NARA’s preservation actions that will support long-term preservation (including, in some cases, taking no action). 

The Plans collate specifications, standards, and documentation related to the format, propose preservation migration actions to be taken by NARA, and identify tools to be used at NARA for processing and preservation actions. 

These Plans are not exhaustive or universally applicable; the preservation actions and tools used by NARA are often specific to our context. The tools that we use are governed by the regulations around procurement in the U.S. federal government. The actions we propose are often determined both by our capacity to perform actions at scale and by the need to manage the potential risks of transforming file formats (which could result in the loss of information). Our hope in making this information publicly available is that it can serve as a model for other organizations that perform similar preservation activities; we also strive to be transparent about the preservation actions that might be taken on records in our custody.

Like the Risk Matrix, the File Format Preservation Action Plans are available as a spreadsheet on GitHub; they are also available as linked open data on archives.gov.
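For readers who want to work with the spreadsheets programmatically, the sketch below shows one way to pull out the high-risk formats that would be candidates for migration. It assumes the Risk Matrix has been exported to CSV, and the column names ("Format Name", "Risk Level") and the sample rows are invented for illustration; check the actual spreadsheet on GitHub for its real layout.

```python
import csv
import io

# Invented sample rows; the real spreadsheet's columns and values may differ.
SAMPLE_CSV = """Format Name,Risk Level
Format A,Low Risk
Format B,High Risk
Format C,Moderate Risk
Format D,High Risk
"""


def migration_candidates(csv_text, level="High Risk"):
    """Return the names of formats whose Risk Level matches `level`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Format Name"] for row in reader if row["Risk Level"] == level]


print(migration_candidates(SAMPLE_CSV))
```

Passing a different `level` (for example "Moderate Risk") returns the formats at that level instead.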

Record Category Preservation Action Plans

Finally, we have a series of 16 Record Category Preservation Action Plans. These documents are meant to capture the Significant Properties that should be retained, if possible, in any format migration for formats that fall into these broad categories:

  • Digital Audio
  • Digital Design and Vector Graphics
  • Digital Still Image
  • Email
  • Geospatial
  • Moving Image: Digital Cinema
  • Moving Image: Digital Video
  • Navigational Charts
  • Presentation and Publishing
  • Software and Code
  • Structured Data: Calendars
  • Structured Data: Databases
  • Structured Data: Generic
  • Structured Data: Spreadsheets
  • Textual and Word Processing
  • Web Records

The Record Category Preservation Action Plans are available as PDF files on GitHub.

Up next

In my next post, I’ll talk about what exactly has changed in this new release of the Framework, and why we made these changes.

2 thoughts on “Digital Preservation Framework Re-Release Part 1: What is the Digital Preservation Framework?”

  1. Thank you for starting this blog in general and writing this great intro into your digital preservation framework in particular. I applaud you for not only the stellar work around this but also for being so transparent about this! These resources are incredibly valuable for the wider community.
    Two quick questions:
    1. Are the template files for the preservation action plans available somewhere as well? I’ll also add that question to the GitHub issues list, as it might very well be me just not being able to find them.
    2. Will you be talking about the human resources that went into the initial creation and ongoing maintenance of this work in a future post? I would find it very interesting to hear who was involved, and with how many FTE resources, in the initial setup but also in the ongoing watch and update component of this.

    Again, congrats to the whole team on both the blog and the work!
    Micky

    1. Hi Micky,

      Thanks for your questions!

      1. We do not currently release the template version of the category plans, but I think it’s a good idea. I’ll take it back to my team to discuss.

      2. I’ll be talking a LOT about process and the kind of work that went into updating the Risk Matrix in a post next week, but I’m also generally interested in talking more about the labor that goes into maintenance when we’re not doing a big overhaul like this one. So I’m glad to know that this might be of interest for a future post!
