samples
msAI module to create a unified set of MS data samples paired with any additional metadata.
- Features
Creation of a sample set from a directory of MS data files
Pairing of MS data and sample metadata
Extraction of sample metadata from csv files
Saving / loading data (serialization, compression, checksum)
- Todo
init_ms mp logging calls
Create a msFile subclass for msAIr files, and move loading to that class
-
msAI.samples.logger = <Logger msAI.samples (DEBUG)>
Module logger.
-
class msAI.samples.SampleSet(ms_file_set, *sample_metadata, metadata_inner_merge=False, init_ms=False)
Bases: object
Class to create a dataframe of a set of SampleRuns created from a MSfileSet and paired with 0 or more SampleMetadata.
SampleMetadata objects provide a dataframe with a matching index to MSfileSet.
A dataframe is created from a set of MS data files (MSfileSet) and joined with matching SampleMetadata. By default (metadata_inner_merge=False), all files in the passed MSfileSet are included, even if no matching metadata is found. Passing metadata_inner_merge=True includes only the MS files that have matching metadata in every SampleMetadata passed.
SampleRun objects are created for each MS file when the SampleSet is created, but MS data is not initialized until called.
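The difference between the two merge modes can be illustrated with plain pandas. This is only a sketch of the join semantics described above, not the module's actual code: the `pair` function, the sample names, and the `group` column are hypothetical, and it assumes the file table and each metadata table share an index of sample names.

```python
import pandas as pd

# Hypothetical stand-in for the MSfileSet dataframe, indexed by sample name.
files = pd.DataFrame(
    {"type": ["mzML", "mzML", "mzML"], "size_MB": [10.2, 11.5, 9.8]},
    index=pd.Index(["s1", "s2", "s3"], name="name"),
)

# Hypothetical stand-in for one SampleMetadata dataframe; "s2" has no entry.
clinical = pd.DataFrame(
    {"group": ["case", "control"]},
    index=pd.Index(["s1", "s3"], name="name"),
)


def pair(files, *metadata, metadata_inner_merge=False):
    """Join each metadata table onto the file table by index."""
    how = "inner" if metadata_inner_merge else "left"
    df = files
    for md in metadata:
        df = df.join(md, how=how)
    return df


left = pair(files, clinical)                              # keeps s1, s2, s3
inner = pair(files, clinical, metadata_inner_merge=True)  # keeps s1, s3 only
```

With the default left join, the unmatched sample stays in the table with missing values in the metadata columns; with the inner join it is dropped.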
-
__init__(ms_file_set, *sample_metadata, metadata_inner_merge=False, init_ms=False)
Initialize self. See help(type(self)) for accurate signature.
-
static _set_run_metadata(sample_name, run, metadata)
Adds metadata to SampleRuns, if possible.
Samples with missing metadata are logged.
-
_create_sampleruns()
Creates SampleRuns for all samples in the SampleSet.
Multi or single process according to MP_SUPPORT.
-
_create_sampleruns_sp()
Single-process creation of SampleRuns for all samples in the SampleSet.
-
static _create_samplerun_mpf(row)
Multiprocessing function to create a single SampleRun for a row/sample in the SampleSet.
-
_create_sampleruns_mp()
Multiprocess creation of SampleRuns for all samples in the SampleSet.
-
_init_all_ms_sp()
Single-process initialization of MS data for all samples in the SampleSet.
-
static _init_ms_mpf(row)
Multiprocessing function to initialize the MS data of a single SampleRun (a row of a SampleSet).
-
static _save_ms_mpf(dir_path, row)
Multiprocessing function to save the MS data of a single SampleRun (a row of a SampleSet).
-
property df
Get a dataframe of sample runs paired with sample metadata.
Index: name (from filename)
Columns: type, size_MB, path, (metadata…), run (python object)
-
init_all_ms()
Initializes MS data for all samples in the SampleSet.
Multi or single process according to MP_SUPPORT.
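The "multi or single process" switch can be sketched with the standard library as follows. This is an illustration under stated assumptions rather than msAI's implementation: `MP_SUPPORT` is treated here as a simple module-level boolean, and `_work` and `process_all` are hypothetical names. The per-sample function is defined at module level because worker processes need to be able to pickle it, which may be why the module exposes its `*_mpf` helpers as static functions.

```python
from multiprocessing import Pool

# Assumption: a module-level flag; the real value comes from msAI's own configuration.
MP_SUPPORT = False


def _work(path):
    """Hypothetical per-sample task; module-level so worker processes can pickle it."""
    return path.upper()


def process_all(paths, mp_support=MP_SUPPORT):
    """Apply _work to every path, in a process pool or in a plain loop."""
    if mp_support:
        # One worker per CPU by default; map preserves input order.
        with Pool() as pool:
            return pool.map(_work, paths)
    return [_work(p) for p in paths]


if __name__ == "__main__":
    print(process_all(["a.mzML", "b.mzML"]))
```

Both branches return results in input order, so callers do not need to know which path was taken.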
-
save_all_ms(dir_path)
Saves MS data for all samples in the set as .msAIr files (in dir_path) and adds a hash value to the metadata (msAIr_hash).
Multi or single process according to MP_SUPPORT.
-
save_metadata(dir_path, filename)
Saves all metadata for a SampleSet as a .msAIm file.
This enables faster loading when recreating a sample set, and verification of msAIr_hash values.
Contents will include all metadata passed at SampleSet creation + msAIr hash values (if created). MSfile data and SampleRuns are not included, as data paths may change.
Data is serialized with pickle and compressed via bzip2. A sha256 hash is returned.
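The pickle, bzip2, and sha256 steps can be combined as in the following minimal sketch. It uses only the standard library; `save_compressed` and `load_verified` are hypothetical helper names, and hashing the compressed bytes (rather than the uncompressed pickle) is an assumption, since the documentation does not say which is hashed.

```python
import bz2
import hashlib
import pickle
from pathlib import Path


def save_compressed(obj, path):
    """Pickle obj, compress with bzip2, write to path, and return a sha256 hex digest."""
    data = bz2.compress(pickle.dumps(obj))
    Path(path).write_bytes(data)
    # Assumption: the digest covers the bytes as stored on disk.
    return hashlib.sha256(data).hexdigest()


def load_verified(path, expected_hash=None):
    """Read path, optionally check its sha256 digest, then decompress and unpickle."""
    data = Path(path).read_bytes()
    if expected_hash is not None:
        actual = hashlib.sha256(data).hexdigest()
        if actual != expected_hash:
            raise ValueError(f"sha256 mismatch for {path}")
    return pickle.loads(bz2.decompress(data))
```

Checking the digest before unpickling matters because pickle can execute arbitrary code on load, so files should only be loaded from trusted sources.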
-
-
class msAI.samples.SampleRun(file_path)
Bases: object
Holds data from a MS analysis run of a sample and any additional metadata.
A SampleRun instance is created with a path reference that is used to create a future MSfile, or load MS data from a previously saved SampleRun. This allows a cheap view of this data to exist without importing it all into memory. A very large number of SampleRun instances can be created and their MS data initialized when needed.
Typically, SampleRun instances are not manually created, but instead arise from a SampleSet.
Data is extracted from a supported MS file type or loaded from a previous msAIr save. File type is determined by file extension (.mzML .msAIr). A sha256 hash may be provided for a .msAIr file which will be verified during init_ms().
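The deferred-loading pattern described here can be sketched as below. `LazyRun` and the `loaders` mapping are hypothetical and only illustrate the idea of holding a path until the data is needed and choosing a loader by file extension; they are not the real SampleRun API.

```python
from pathlib import Path


class LazyRun:
    """Hypothetical stand-in for SampleRun: stores a path, loads data on demand."""

    def __init__(self, file_path):
        self.file_path = file_path
        self._ms = None  # nothing is read from disk at construction time

    def init_ms(self, loaders):
        """Load the data once, picking a loader by file extension."""
        if self._ms is None:
            suffix = Path(self.file_path).suffix
            if suffix not in loaders:
                raise ValueError(f"unsupported file type: {suffix!r}")
            self._ms = loaders[suffix](self.file_path)
        return self._ms


# Assumed loaders: placeholders that would parse .mzML or read a .msAIr save.
loaders = {
    ".mzML": lambda p: f"parsed {p}",
    ".msAIr": lambda p: f"restored {p}",
}

run = LazyRun("data/sample1.mzML")   # cheap: only the path is stored
ms = run.init_ms(loaders)            # data is loaded here, on first use
```

Because each instance holds only a string until `init_ms` is called, creating many of them costs little memory.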
-
_ms: msAI.msData.MSfile = None
MS data from a MSfile or a msAIr save.
-
__init__(file_path)
Initializes an instance of the SampleRun class.
- Parameters
file_path – A string representation of the path to the MS file. Path can be relative or absolute.
-
file_path: str = None
A string representation of the path to the MS file.
-
property msAIr_hash
Hash value of the SampleRun.
This value was generated when the SampleRun was saved, and re-associated from SampleSet metadata.
-