samples

msAI module to create a unified set of MS data samples paired with any additional metadata.

Features
  • Creation of a sample set from a directory of MS data files

  • Pairing of MS data and sample metadata

  • Extraction of sample metadata from csv files

  • Saving / loading data (serialization, compression, checksum)

Todo
  • init_ms mp logging calls

  • Create an MSfile subclass for msAIr files, and move loading to that class

msAI.samples.logger = <Logger msAI.samples (DEBUG)>

Module logger.

class msAI.samples.SampleSet(ms_file_set, *sample_metadata, metadata_inner_merge=False, init_ms=False)[source]

Bases: object

Class to create a dataframe of a set of SampleRuns created from a MSfileSet and paired with 0 or more SampleMetadata.

SampleMetadata objects provide a dataframe with a matching index to MSfileSet.

A dataframe is created from a set of MS data files (MSfileSet) and joined with matching SampleMetadata. By default (metadata_inner_merge=False), all files in the passed MSfileSet are included, even if no matching metadata is found. Passing metadata_inner_merge=True includes only the MS files that have matching metadata in every SampleMetadata passed.
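The difference between the two merge modes can be illustrated with a minimal sketch. It uses plain dictionaries in place of the dataframes that SampleSet builds; the function name join_metadata and the sample data are hypothetical and not part of msAI.

```python
def join_metadata(file_rows, *metadata_tables, inner_merge=False):
    """Join per-file rows with zero or more metadata tables keyed by sample name.

    Hypothetical stand-in for the dataframe join described above.
    """
    joined = {}
    for name, row in file_rows.items():
        # Inner merge: skip files that lack an entry in any metadata table.
        if inner_merge and not all(name in table for table in metadata_tables):
            continue
        merged = dict(row)
        for table in metadata_tables:
            # Default merge: files without metadata keep only their own columns.
            merged.update(table.get(name, {}))
        joined[name] = merged
    return joined


files = {"s1": {"type": "mzML"}, "s2": {"type": "mzML"}}
clinical = {"s1": {"age": 54}}

left = join_metadata(files, clinical)                     # keeps s1 and s2
inner = join_metadata(files, clinical, inner_merge=True)  # keeps s1 only
```

In a real dataframe join, the missing metadata columns for s2 would typically appear as NaN rather than being absent.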

SampleRun objects are created for each MS file when the SampleSet is created, but MS data is not initialized until called.

__init__(ms_file_set, *sample_metadata, metadata_inner_merge=False, init_ms=False)[source]

Initialize self. See help(type(self)) for accurate signature.

static _set_run_metadata(sample_name, run, metadata)[source]

Adds metadata to SampleRuns, if possible.

Samples with missing metadata are logged.

_create_sampleruns()[source]

Creates SampleRuns for all samples in the SampleSet.

Runs as a multi- or single-process operation according to MP_SUPPORT.
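The dispatch pattern used by this and the other MP_SUPPORT-controlled methods might look roughly like the sketch below. It is not msAI's implementation: the flag value, the function names create_all and _create_one, and the dictionary standing in for a SampleRun are all assumptions. The worker is a module-level function because multiprocessing must be able to pickle it, which is presumably why the documented _mpf helpers are static methods.

```python
from multiprocessing import Pool

MP_SUPPORT = False  # hypothetical flag value; msAI defines its own


def _create_one(path):
    """Worker: build a lightweight record for one file (stand-in for a SampleRun)."""
    return {"file_path": path}


def create_all(paths, use_mp=MP_SUPPORT):
    """Dispatch to a multi- or single-process implementation."""
    if use_mp:
        # Each path is handed to a worker process; results keep input order.
        with Pool() as pool:
            return pool.map(_create_one, paths)
    # Single-process fallback: same worker, plain loop.
    return [_create_one(p) for p in paths]


runs = create_all(["a.mzML", "b.mzML"])
```

On platforms that start worker processes with spawn, a call with use_mp=True should be placed under an `if __name__ == "__main__":` guard.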

_create_sampleruns_sp()[source]

Single-process creation of SampleRuns for all samples in the SampleSet.

static _create_samplerun_mpf(row)[source]

Multiprocessing function to create a single SampleRun for a row/sample in the SampleSet.

_create_sampleruns_mp()[source]

Multiprocess creation of SampleRuns for all samples in the SampleSet.

_init_all_ms_sp()[source]

Single-process initialization of MS data for all samples in the SampleSet.

static _init_ms_mpf(row)[source]

Multiprocessing function to initialize the MS data of a single SampleRun (a row of a SampleSet).

_init_all_ms_mp()[source]

Multiprocess initialization of MS data for all samples in the SampleSet.

_save_all_ms_sp(dir_path)[source]

Single-process save of MS data for all samples in the SampleSet.

static _save_ms_mpf(dir_path, row)[source]

Multiprocessing function to save the MS data of a single SampleRun (a row of a SampleSet).

_save_all_ms_mp(dir_path)[source]

Multiprocess save of MS data for all samples in the SampleSet.

property df[source]

Get a dataframe of sample runs paired with sample metadata.

Index: name (from filename)

Columns: type, size_MB, path, (metadata…), run (python object)

init_all_ms()[source]

Initializes MS data for all samples in the SampleSet.

Runs as a multi- or single-process operation according to MP_SUPPORT.

save_all_ms(dir_path)[source]

Saves MS data for all samples in the set as .msAIr files (in dir_path) and adds each file's hash value to the metadata (msAIr_hash).

Runs as a multi- or single-process operation according to MP_SUPPORT.

save_metadata(dir_path, filename)[source]

Saves all metadata for a SampleSet as a .msAIm file.

This enables faster loading when recreating a sample set, and verification of msAIr_hash values.

Contents include all metadata passed at SampleSet creation plus msAIr hash values (if created). MSfile data and SampleRuns are not included, as data paths may change.

Data is serialized with pickle and compressed via bzip2. A sha256 hash is returned.
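The save pipeline described here (pickle, then bzip2, then a sha256 hash) can be sketched with the standard library alone. The helper name save_compressed is hypothetical, and hashing the compressed bytes as written to disk is an assumption; the source does not say which bytes msAI hashes.

```python
import bz2
import hashlib
import pickle
import tempfile
from pathlib import Path


def save_compressed(obj, dir_path, filename):
    """Pickle obj, compress with bzip2, write to disk, return a sha256 hex digest."""
    payload = bz2.compress(pickle.dumps(obj))
    Path(dir_path, filename).write_bytes(payload)
    # Assumption: the hash covers the compressed bytes exactly as stored on disk.
    return hashlib.sha256(payload).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    metadata = {"s1": {"age": 54, "msAIr_hash": None}}
    digest = save_compressed(metadata, tmp, "example.msAIm")
    # Reading the file back reverses the two steps: decompress, then unpickle.
    restored = pickle.loads(bz2.decompress(Path(tmp, "example.msAIm").read_bytes()))
```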

class msAI.samples.SampleRun(file_path)[source]

Bases: object

Holds data from a MS analysis run of a sample and any additional metadata.

A SampleRun instance is created with a path reference that is later used to create an MSfile, or to load MS data from a previously saved SampleRun. This allows a cheap view of the data to exist without importing it all into memory. A very large number of SampleRun instances can be created, and their MS data initialized only when needed.

Typically, SampleRun instances are not manually created, but instead arise from a SampleSet.

Data is extracted from a supported MS file type or loaded from a previous msAIr save. File type is determined by file extension (.mzML or .msAIr). A sha256 hash may be provided for a .msAIr file, in which case it is verified during init_ms().
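The deferred-loading idea and the extension-based dispatch can be sketched as follows. LazyRun is a hypothetical stand-in, not the SampleRun class: its loaders return placeholder strings instead of parsing files, and returning None from ms before initialization is an assumption about behaviour the source does not spell out.

```python
from pathlib import Path


class LazyRun:
    """Hypothetical stand-in for SampleRun: holds a path, loads data on demand."""

    # Assumption: one loader per supported extension; these return placeholders.
    _LOADERS = {
        ".mzML": lambda path: f"parsed {path}",
        ".msAIr": lambda path: f"unpickled {path}",
    }

    def __init__(self, file_path):
        self.file_path = file_path
        self._ms = None  # nothing is read until init_ms() is called

    def init_ms(self):
        """Pick a loader from the file extension and load the data."""
        suffix = Path(self.file_path).suffix
        try:
            loader = self._LOADERS[suffix]
        except KeyError:
            raise ValueError(f"Unsupported file type: {suffix!r}") from None
        self._ms = loader(self.file_path)

    @property
    def ms(self):
        return self._ms


run = LazyRun("data/sample1.mzML")
before = run.ms   # None: only the path is held so far
run.init_ms()
after = run.ms    # placeholder standing in for the loaded MS data
```

Because construction only stores a string, creating many such objects is cheap; the cost of reading a file is paid per instance, when init_ms() is called.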

_ms: msAI.msData.MSfile = None

MS data from an MSfile or a msAIr save.

_metadata: NewType.<locals>.new_type = None

The metadata as a Series.

__init__(file_path)[source]

Initializes an instance of the SampleRun class.

Parameters

file_path – A string representation of the path to the MS file. Path can be relative or absolute.

file_path: str = None

A string representation of the path to the MS file.

property ms[source]

Access to MS data of a sample run.

property metadata[source]

Access to sample metadata.

property msAIr_hash[source]

Hash value of the SampleRun.

This value was generated when the SampleRun was saved, and re-associated from SampleSet metadata.

save(dir_path, filename)[source]

Saves a SampleRun's MS data as a .msAIr file for fast loading later.

Data is serialized with pickle and compressed via bzip2. A sha256 hash is returned.

init_ms()[source]

Initializes MS data from the .mzML or .msAIr file at the SampleRun’s file_path.

A .msAIr file is first verified against its sha256 hash, if one was provided. The data is then decompressed via bzip2 and deserialized with pickle.
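A minimal sketch of the verify-then-load order for a .msAIr payload is shown below, using only the standard library. The function name load_verified is hypothetical, and it assumes the hash was computed over the compressed bytes; msAI's own loader may differ.

```python
import bz2
import hashlib
import pickle


def load_verified(payload, expected_hash=None):
    """Check payload against a sha256 hex digest (if given), then decompress and unpickle."""
    if expected_hash is not None:
        actual = hashlib.sha256(payload).hexdigest()
        if actual != expected_hash:
            raise ValueError("sha256 mismatch: file may be corrupted or altered")
    # Only unpickle data from trusted sources: pickle can execute arbitrary code.
    return pickle.loads(bz2.decompress(payload))


# Build an in-memory payload the same way a save would, then load it back.
payload = bz2.compress(pickle.dumps({"mz": [100.1, 200.2]}))
good_hash = hashlib.sha256(payload).hexdigest()
data = load_verified(payload, good_hash)
```

Checking the hash before deserializing means a corrupted or substituted file is rejected before any of its contents are unpickled.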