Digital Preservation Framework Updates, January – March 2026

The Electronic Records Archives Division of the Office of the Chief Operating Officer is happy to report on updates to the Digital Preservation Framework for this past quarter (January to March 2026). The Framework is available through GitHub, and the updated File Format Preservation Action Plans are also available as Linked Open Data.

Five new formats were added this quarter. The research for these entries was conducted by Cooper Clarke, Archivist in the Transparency and Access Program of Research Services, while working on a project with the Electronic Records Archives Division.

  • SQLite 3 Database (NF00886)
  • ChemStation Gas Chromatography/Mass Spectrum (GC/MS) Version 2 Data File (NF00887)
  • Apple Safari Web Archive (NF00888)
  • Microsoft Office Binder Document for Windows 95 (NF00889)
  • Microsoft Office Binder Document for Windows 97-2000 (NF00890)

According to Wikipedia, Gas Chromatography/Mass Spectrometry (GC/MS) “is an analytical method that combines the features of gas-chromatography and mass spectrometry to identify different substances within a test sample. Applications of GC–MS include drug detection, fire investigation, environmental analysis, explosives investigation, food and flavor analysis, and identification of unknown samples, including that of material samples obtained from planet Mars during probe missions as early as the 1970s.”

The ChemStation Gas Chromatography/Mass Spectrum (GC/MS) Version 2 Data File format stores raw GC/MS data collected from Agilent ChemStation instruments in a proprietary binary format. The files are organized in a folder with the suffix “.D” and are often paired with .sav files, which store the instrument tuning information, and .res files, which store analysis results. The first few lines of the file may be in plain text and contain information about the analysis, including the timestamp. Due to the proprietary nature of the format, NARA recommends converting to NetCDF and retaining the original.

Screenshot of the NIST mass spectral library being used to match and identify a compound. The NIST database interacts with ChemStation software (and many other mass spectrometer file formats) to compare the captured spectra in the .ms data file to its large database of known compounds.

In addition to adding this GC/MS format to the Digital Preservation Framework, Cooper’s research included obtaining various sample files in order to develop a file signature. A file signature, also called a magic number, is a distinctive sequence of bytes within a file that is used to identify the file's format. File signatures identify a file format more reliably than a file extension (which can easily be modified), and they are important in digital preservation for understanding what formats we’re receiving and how best to manage them over time. The file signature for ChemStation Gas Chromatography/Mass Spectrum (GC/MS) Version 2 Data File has been submitted in GitHub for inclusion in the PRONOM Technical Registry so that it may be of use to the broader digital preservation community.
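To illustrate how signature-based identification works in general, here is a minimal Python sketch. It is not the tooling used for this research, and it does not reproduce the ChemStation signature, which this post does not spell out. Instead it uses the publicly documented header of SQLite 3 databases (one of the formats added this quarter), which begin with the 16 bytes "SQLite format 3" followed by a null byte. The names SIGNATURES, identify_bytes, and identify_file are hypothetical and chosen only for this example.

```python
from typing import Optional

# Documented SQLite 3 header: the 16-byte string at offset 0 of every database file.
SQLITE3_MAGIC = b"SQLite format 3\x00"

# Hypothetical lookup table for this sketch: format name -> (byte offset, expected bytes).
# A real registry such as PRONOM holds far richer signature definitions.
SIGNATURES = {
    "SQLite 3 Database": (0, SQLITE3_MAGIC),
}


def identify_bytes(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches the given bytes, else None."""
    for name, (offset, magic) in SIGNATURES.items():
        if data[offset:offset + len(magic)] == magic:
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the start of a file and identify it by content, ignoring its extension."""
    with open(path, "rb") as fh:
        head = fh.read(64)  # enough bytes to cover the signatures in this sketch
    return identify_bytes(head)
```

Because the check looks only at the bytes, a SQLite database renamed to "report.txt" would still be identified as a SQLite 3 Database, while a text file renamed to "data.sqlite" would not match.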

The change log in GitHub lists all updates for the quarter.
