msData

msAI module for importing mass spectrometry data into dataframes.

Features
  • Extraction of data from MS files (mzML, TBD…)

  • Creation of in-memory data structures for spectra and peak values

  • Building a set of MS data files

Todo
  • Change MSfile to dataclass

  • Change properties to attributes

  • Modify public / private

  • Create types for peaks and spectra dataframes

msAI.msData.logger = <Logger msAI.msData (DEBUG)>

Module logger.

class msAI.msData.MSfile

Bases: object

Interface class for accessing data from an MS file stored in any of the supported file types.

Subclass implementations provide support for the various file types and override the init method to set values. The peaks and spectra properties hold data structured in dataframes.
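The subclassing pattern described above can be sketched roughly as follows. This is a minimal illustration and not the msAI source: MSfileSketch and InMemoryFile are hypothetical names, only two of the documented attributes are shown, and the dict-based input is an assumption made to keep the example self-contained.

```python
from typing import Optional


class MSfileSketch:
    """Stand-in for the interface class: read-only properties over private attributes."""

    def __init__(self) -> None:
        # Placeholders only; subclasses are expected to supply real values.
        self._run_id: Optional[str] = None
        self._spectrum_count: Optional[int] = None

    @property
    def run_id(self) -> Optional[str]:
        return self._run_id

    @property
    def spectrum_count(self) -> Optional[int]:
        return self._spectrum_count


class InMemoryFile(MSfileSketch):
    """Hypothetical subclass that takes its values from a plain dict."""

    def __init__(self, record: dict) -> None:
        # The superclass initializer is skipped on purpose: every attribute
        # it would set is assigned here instead.
        self._run_id = record["run_id"]
        self._spectrum_count = len(record["spectra"])


ms_file = InMemoryFile({"run_id": "run_001", "spectra": [[100.0, 200.0], [150.0]]})
print(ms_file.run_id, ms_file.spectrum_count)
```

Callers only touch the properties, so code written against the interface does not need to know which file type backs an instance.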

__init__()

Initializes an instance of MSfile class.

Subclasses do not need to call this superclass initializer, because they provide values for all attributes initialized here.

_run_id: str = None
_run_date: str = None
_ms_file_version: str = None
_spectrum_count: int = None
_peak_count: int = None
_tic_sum: float = None
_peaks: DF = None
_spectra: DF = None

property run_id

Get the sample’s run ID as specified in its MS data file.

property run_date

Get the date the sample was run as specified in its MS data file.

property ms_file_version

Get the data format version in which the sample was originally saved, as specified in its MS file.

Note: Currently, this is equivalent to mzML version number.

property spectrum_count

Get the number of MS spectra from a sample run.

This value is calculated from the number of spectra imported, rather than from MS file metadata.

property peak_count

Get the total number of MS peaks across all MS spectra in the sample run.

property tic_sum

Get the sum of the total ion current (TIC) over all spectra in the sample run.

property peaks

Get a dataframe of all peaks in an MS file.

Dataframe structure
First Index Level: spec_id
Second Index Level: peak_number
Columns: rt, mz, i
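
A dataframe with this layout can be sketched in pandas as shown below. The values are made up for illustration, and the use of integer spec_id values is an assumption; the actual identifier type and units come from the MS file.

```python
import pandas as pd

# Illustrative values only: two spectra, with two peaks and one peak respectively.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0)],
    names=["spec_id", "peak_number"],
)
peaks = pd.DataFrame(
    {
        "rt": [0.5, 0.5, 1.0],        # retention time of the parent spectrum
        "mz": [100.1, 250.3, 180.2],  # mass-to-charge ratio of each peak
        "i": [1200.0, 300.0, 950.0],  # intensity of each peak
    },
    index=index,
)

# All peaks of spectrum 0; the result is indexed by peak_number.
spectrum_0 = peaks.loc[0]

# A single value addressed by (spec_id, peak_number) and column.
second_peak_mz = peaks.loc[(0, 1), "mz"]
```

The two-level index keeps every peak tied to its spectrum while still allowing whole-spectrum selection with a single key.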

property spectra

Get a dataframe of all spectra in an MS file.

Dataframe structure
Index: spec_id
Columns: rt, peak_count, tic, ms_lvl, filters
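
As a rough sketch, a spectra dataframe of this shape also makes the file-level summary values easy to derive. The data below is invented, the filters strings are placeholders, and computing peak_count and tic_sum as column sums is an assumption about how such totals could be obtained, not a statement of how msAI computes them.

```python
import pandas as pd

# Illustrative values only: one row per spectrum, indexed by spec_id.
spectra = pd.DataFrame(
    {
        "rt": [0.5, 1.0],
        "peak_count": [2, 1],
        "tic": [1500.0, 950.0],
        "ms_lvl": [1, 1],
        "filters": ["placeholder filter A", "placeholder filter B"],
    },
    index=pd.Index([0, 1], name="spec_id"),
)

# Number of spectra imported (one row each).
spectrum_count = len(spectra)

# Assumed derivations: totals taken as sums over the per-spectrum columns.
peak_count = int(spectra["peak_count"].sum())
tic_sum = float(spectra["tic"].sum())
```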

class msAI.msData.MZMLfile(mzml_file_path: str)

Bases: msAI.msData.MSfile

Class to access MS data stored in an mzML file.

__init__(mzml_file_path: str)

Initializes an instance of MZMLfile class.

Parameters

mzml_file_path – A string representation of the path to the mzML data file. Path can be relative or absolute.

_run_id = None
_run_date = None
_ms_file_version = None
_spectrum_count = None
_tic_sum = None
_peak_count = None

_create_spectrum_peaks_df(spectrum)

Creates a dataframe of all the peaks for a single spectrum in an mzML file.

_create_spectrum_df(spectrum)

Creates a dataframe of all the spectra in an mzML file.

_create_dfs()

Creates spectra and peaks dataframes for an mzML file.

This method sets the following attributes:
  • self._peaks

  • self._spectra
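
One way the per-spectrum frames could be assembled into the two dataframes is sketched below. It is not the msAI implementation: raw_spectra is a hypothetical stand-in for parsed mzML spectra, the helper names are invented, tic is assumed to be the sum of peak intensities, and the filters column is left out for brevity.

```python
import pandas as pd

# Hypothetical stand-in for parsed spectra; a real reader would supply these.
raw_spectra = [
    {"spec_id": 0, "rt": 0.5, "ms_lvl": 1, "mz": [100.1, 250.3], "i": [1200.0, 300.0]},
    {"spec_id": 1, "rt": 1.0, "ms_lvl": 1, "mz": [180.2], "i": [950.0]},
]


def spectrum_peaks_df(spectrum: dict) -> pd.DataFrame:
    """One row per peak of a single spectrum, indexed by peak_number."""
    df = pd.DataFrame({"rt": spectrum["rt"], "mz": spectrum["mz"], "i": spectrum["i"]})
    df.index.name = "peak_number"
    return df


def spectrum_row(spectrum: dict) -> dict:
    """Summary values for a single spectrum (tic assumed to be the intensity sum)."""
    return {
        "spec_id": spectrum["spec_id"],
        "rt": spectrum["rt"],
        "peak_count": len(spectrum["mz"]),
        "tic": sum(spectrum["i"]),
        "ms_lvl": spectrum["ms_lvl"],
    }


# Stack the per-spectrum peak frames; keys become the first index level.
peaks = pd.concat(
    [spectrum_peaks_df(s) for s in raw_spectra],
    keys=[s["spec_id"] for s in raw_spectra],
    names=["spec_id", "peak_number"],
)

# One summary row per spectrum, indexed by spec_id.
spectra = pd.DataFrame([spectrum_row(s) for s in raw_spectra]).set_index("spec_id")
```

Building small frames per spectrum and concatenating once at the end avoids growing a dataframe row by row.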

_peaks = None
_spectra = None

property ms_file_version

Get the data format version in which the sample was originally saved, as specified in its MS file.

Note: Currently, this is equivalent to mzML version number.

property peak_count

Get the total number of MS peaks across all MS spectra in the sample run.

property peaks

Get a dataframe of all peaks in an MS file.

Dataframe structure
First Index Level: spec_id
Second Index Level: peak_number
Columns: rt, mz, i

property run_date

Get the date the sample was run as specified in its MS data file.

property run_id

Get the sample’s run ID as specified in its MS data file.

property spectra

Get a dataframe of all spectra in an MS file.

Dataframe structure
Index: spec_id
Columns: rt, peak_count, tic, ms_lvl, filters

property spectrum_count

Get the number of MS spectra from a sample run.

This value is calculated from the number of spectra imported, rather than from MS file metadata.

property tic_sum

Get the sum of the total ion current (TIC) over all spectra in the sample run.

class msAI.msData.MSfileSet(dir_path: str, data_type: str = 'all', recursive: bool = True)

Bases: object

Class to create a set of MS files from a data directory.

Creating a set enables a large number of data files to be viewed and manipulated as a dataframe, without loading their entire contents into memory.

By default, the contents of subdirectories are included recursively. However, an error is raised if any included filenames are duplicated. A set can include any MSfile type (mzML, msAIr, or a mix). By default, any data file with one of these extensions is included. Alternatively, a single type may be specified.

mzML_exts: ClassVar[List[str]] = ['mzML', 'mzml', 'MZML']

File extensions considered to be mzML files.

msAIr_exts: ClassVar[List[str]] = ['msAIr', 'msair', 'MSAIR']

File extensions considered to be msAIr files.

__init__(dir_path: str, data_type: str = 'all', recursive: bool = True)

Initializes an instance of MSfileSet class.

Parameters
  • dir_path – A string representation of the path to the data directory. Path can be relative or absolute.

  • data_type – (all, mzML, msAIr) The type of MS files to include in the set. By default, all types are included.

  • recursive – A boolean indicating if files in subdirectories are included in the set. Defaults to True.

Raises

MSfileSetInitError – For duplicated filenames.
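
The selection rules described for this class (extension filter, optional recursion, duplicate check) could be sketched with the standard library as follows. This is an illustration under stated assumptions, not the msAI code: collect_files and DuplicateNameError are hypothetical names, and treating the filename without its extension as the duplicate key is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Dict

MZML_EXTS = ["mzML", "mzml", "MZML"]
MSAIR_EXTS = ["msAIr", "msair", "MSAIR"]


class DuplicateNameError(Exception):
    """Hypothetical stand-in for MSfileSetInitError."""


def collect_files(dir_path: str, data_type: str = "all", recursive: bool = True) -> Dict[str, Path]:
    """Map each file name (without extension) to its path, rejecting duplicates."""
    if data_type == "all":
        exts = MZML_EXTS + MSAIR_EXTS
    elif data_type == "mzML":
        exts = MZML_EXTS
    elif data_type == "msAIr":
        exts = MSAIR_EXTS
    else:
        raise ValueError(f"unknown data_type: {data_type!r}")

    root = Path(dir_path)
    candidates = root.rglob("*") if recursive else root.glob("*")
    found: Dict[str, Path] = {}
    for path in sorted(candidates):
        # Skip directories and files whose extension is not in the accepted list.
        if not path.is_file() or path.suffix[1:] not in exts:
            continue
        if path.stem in found:
            raise DuplicateNameError(f"duplicated filename: {path.stem}")
        found[path.stem] = path
    return found


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.mzML").touch()
    (root / "sub" / "b.msAIr").touch()
    (root / "notes.txt").touch()

    all_names = sorted(collect_files(tmp))
    top_level_names = sorted(collect_files(tmp, recursive=False))

    # A second file named "a" in a subdirectory triggers the duplicate check.
    (root / "sub" / "a.mzML").touch()
    try:
        collect_files(tmp)
        duplicate_detected = False
    except DuplicateNameError:
        duplicate_detected = True
```

Only file names and paths are inspected here, which matches the stated goal of describing many files without loading their contents.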

property df

Get a dataframe of MS files.

Dataframe structure
Index: name (from filename)
Columns: type, size_MB, path
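
A dataframe with this structure might be built and queried as in the sketch below. The paths and sizes are invented, taking type from the file extension is an assumption, and size_MB is assumed here to mean bytes divided by 1,000,000.

```python
from pathlib import PurePosixPath

import pandas as pd

# Hypothetical file listing: (path, size in bytes).
records = [
    ("data/run_a.mzML", 2_500_000),
    ("data/batch2/run_b.msAIr", 750_000),
]

rows = [
    {
        "name": PurePosixPath(path).stem,        # filename without extension
        "type": PurePosixPath(path).suffix[1:],  # extension without the dot
        "size_MB": size / 1_000_000,             # assumed definition of MB
        "path": path,
    }
    for path, size in records
]
df = pd.DataFrame(rows).set_index("name")

# Example query: only the mzML files in the set.
mzml_only = df[df["type"] == "mzML"]
```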