msData
msAI module for importing mass spectrometry data into dataframes.
- Features
Extraction of data from MS files (mzML, TBD…)
Creation of in-memory data structures for spectra / peaks values
Building a set of MS data files
- Todo
Change MSfile to dataclass
Change properties to attributes
Modify public / private
Create types for peaks and spectra dataframes
-
msAI.msData.logger = <Logger msAI.msData (DEBUG)>
Module logger.
-
class msAI.msData.MSfile
Bases: object
Interface class for accessing data from an MS file stored in various file types.
Subclass implementations provide support for the various file types and override the init method to set values. The peaks and spectra properties hold data structured in dataframes.
-
__init__()
Initializes an instance of the MSfile class.
Subclasses need not call this superclass initializer, because they provide values for all attributes initialized here.
-
_run_id: str = None
-
_run_date: str = None
-
_ms_file_version: str = None
-
_spectrum_count: int = None
-
_peak_count: int = None
-
_tic_sum: float = None
-
_peaks: DF = None
-
_spectra: DF = None
-
property ms_file_version
Get the data format version in which the sample was originally saved, as specified in its MS file.
Note: Currently, this is equivalent to the mzML version number.
-
property spectrum_count
Get the number of MS spectra from a sample run.
This value is calculated from the number of spectra imported, rather than from MS file metadata.
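The interface pattern described for MSfile can be sketched as follows. This is a minimal illustration, not the module's actual code: `MSfileSketch` and `InMemoryFile` are hypothetical names, and only two of the documented attributes are shown. The subclass sets every private attribute itself, which is why it does not need to call the superclass initializer.

```python
class MSfileSketch:
    """Hypothetical stand-in for the MSfile interface class."""

    def __init__(self):
        # Defaults only; subclasses are expected to provide real values.
        self._run_id = None
        self._spectrum_count = None

    @property
    def run_id(self):
        """Read-only access to the run ID."""
        return self._run_id

    @property
    def spectrum_count(self):
        """Read-only access to the number of spectra."""
        return self._spectrum_count


class InMemoryFile(MSfileSketch):
    """Hypothetical subclass that fills in all attributes in its own init."""

    def __init__(self, run_id, spectra):
        # Every attribute is set here, so super().__init__() is skipped.
        self._run_id = run_id
        # Counted from the spectra actually supplied, not from metadata.
        self._spectrum_count = len(spectra)


ms_file = InMemoryFile("run-1", [[], []])
```

Because the values are exposed through properties without setters, callers can read `ms_file.run_id` but cannot assign to it.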
-
-
class msAI.msData.MZMLfile(mzml_file_path: str)
Bases: msAI.msData.MSfile
Class to access MS data stored in an mzML file.
-
__init__(mzml_file_path: str)
Initializes an instance of the MZMLfile class.
- Parameters
mzml_file_path – A string representation of the path to the mzML data file. Path can be relative or absolute.
-
_run_id = None
-
_run_date = None
-
_ms_file_version = None
-
_spectrum_count = None
-
_tic_sum = None
-
_peak_count = None
-
_create_spectrum_peaks_df(spectrum)
Creates a dataframe of all the peaks for a single spectrum in an mzML file.
-
_create_dfs()
Creates spectra and peaks dataframes for an mzML file.
- This method sets the following attributes:
self._peaks
self._spectra
-
_peaks = None
-
_spectra = None
-
property ms_file_version
Get the data format version in which the sample was originally saved, as specified in its MS file.
Note: Currently, this is equivalent to the mzML version number.
-
property peak_count
Get the total number of MS peaks from all MS spectra in the sample run.
-
property peaks
Get a dataframe of all peaks in an MS file.
- Dataframe structure
- First Index Level: spec_id
- Second Index Level: peak_number
- Columns: rt, mz, i
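A dataframe with this layout can be sketched with pandas as shown below. The sample values are invented for illustration, and the construction route (building a flat frame, then calling `set_index`) is an assumption rather than the way msData builds it. Only the index levels and column names come from the description above.

```python
import pandas as pd

# Invented peak records: (spec_id, peak_number, rt, mz, i).
records = [
    (0, 0, 0.5, 100.1, 1200.0),
    (0, 1, 0.5, 150.2, 800.0),
    (1, 0, 1.0, 100.1, 950.0),
]

peaks = pd.DataFrame(records, columns=["spec_id", "peak_number", "rt", "mz", "i"])
# Two-level index: spectrum first, then peak within the spectrum.
peaks = peaks.set_index(["spec_id", "peak_number"])

# Select every peak belonging to spectrum 0.
spectrum_0_peaks = peaks.loc[0]

# Sum the intensities of each spectrum using the first index level.
intensity_per_spectrum = peaks.groupby(level="spec_id")["i"].sum()
```

The two-level index makes it cheap to pull out one spectrum's peaks or to aggregate per spectrum without a separate join against the spectra dataframe.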
-
property run_date
Get the date the sample was run, as specified in its MS data file.
-
property run_id
Get the sample’s run ID, as specified in its MS data file.
-
property spectra
Get a dataframe of all spectra in an MS file.
- Dataframe structure
- Index: spec_id
- Columns: rt, peak_count, tic, ms_lvl, filters
-
property spectrum_count
Get the number of MS spectra from a sample run.
This value is calculated from the number of spectra imported, rather than from MS file metadata.
-
property tic_sum
Get the total ion current sum of all spectra in the sample run.
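How the three summary values relate to the imported data can be sketched in plain Python. The spectra below are invented, and computing each spectrum's total ion current as the sum of its peak intensities is an assumption made for illustration; the documentation does not state whether msData derives it this way or reads it from the file.

```python
# Invented data: each spectrum is a list of (mz, intensity) peaks.
spectra = [
    [(100.1, 1200.0), (150.2, 800.0)],
    [(100.1, 950.0)],
]

# Counted from what was imported, not from file metadata.
spectrum_count = len(spectra)

# Total number of peaks across all spectra.
peak_count = sum(len(peaks) for peaks in spectra)

# Assumed definition: a spectrum's TIC is the sum of its peak intensities.
tic_per_spectrum = [sum(intensity for _, intensity in peaks) for peaks in spectra]

# Sum of the per-spectrum TIC values over the whole run.
tic_sum = sum(tic_per_spectrum)
```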
-
-
class msAI.msData.MSfileSet(dir_path: str, data_type: str = 'all', recursive: bool = True)
Bases: object
Class to create a set of MS files from a data directory.
Creating a set enables a large number of data files to be viewed / manipulated as a dataframe, without loading their entire contents into memory.
By default, the contents of subdirectories are included recursively. However, an error is raised if included filenames are duplicated.
A set can include any MSfile type (mzML, msAIr, or a mix). By default, any data file matching these extensions is included. Alternatively, a single type may be specified.
-
mzML_exts: ClassVar[List[str]] = ['mzML', 'mzml', 'MZML']
File extensions considered to be mzML files.
-
msAIr_exts: ClassVar[List[str]] = ['msAIr', 'msair', 'MSAIR']
File extensions considered to be msAIr files.
-
__init__(dir_path: str, data_type: str = 'all', recursive: bool = True)
Initializes an instance of the MSfileSet class.
- Parameters
dir_path – A string representation of the path to the data directory. Path can be relative or absolute.
data_type – (all, mzML, msAIr) The type of MS files to include in the set. By default, all types are included.
recursive – A boolean indicating if files in subdirectories are included in the set. Defaults to True.
- Raises
MSfileSetInitError – For duplicated filenames.
-
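The file selection rules described for MSfileSet can be sketched with the standard library alone. This is an illustration under stated assumptions, not the class's implementation: `collect_ms_files` and `DuplicateFilenameError` are hypothetical names (the latter stands in for MSfileSetInitError), and treating two files as duplicates when their full file names match is a guess at what "duplicated filenames" means.

```python
from collections import Counter
from pathlib import Path

# Extension lists as documented for MSfileSet.
MZML_EXTS = ["mzML", "mzml", "MZML"]
MSAIR_EXTS = ["msAIr", "msair", "MSAIR"]


class DuplicateFilenameError(Exception):
    """Hypothetical stand-in for MSfileSetInitError."""


def collect_ms_files(dir_path, data_type="all", recursive=True):
    """Return the sorted paths of matching MS files under dir_path."""
    if data_type == "all":
        exts = MZML_EXTS + MSAIR_EXTS
    elif data_type == "mzML":
        exts = MZML_EXTS
    elif data_type == "msAIr":
        exts = MSAIR_EXTS
    else:
        raise ValueError(f"unknown data_type: {data_type!r}")

    root = Path(dir_path)
    # rglob walks subdirectories; glob stays in the top-level directory.
    candidates = root.rglob("*") if recursive else root.glob("*")
    files = sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lstrip(".") in exts
    )

    # Assumption: a duplicate is the same file name in two directories.
    name_counts = Counter(p.name for p in files)
    duplicates = sorted(name for name, n in name_counts.items() if n > 1)
    if duplicates:
        raise DuplicateFilenameError(f"duplicated filenames: {duplicates}")
    return files
```

Checking for duplicates only after filtering means that a name repeated among ignored files, such as text notes, does not block creation of the set.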