metadata

msAI module for importing sample metadata into dataframes.

Features
  • Extraction of metadata from various file types

  • Importing metadata into a dataframe

  • Verification of metadata usability

  • Auto indexing of metadata

Todo
  • Move .msAIm saving to this module

  • Refactor auto indexing

  • Add anomaly detection

  • Add additional file types: TBD…

msAI.metadata.logger: logging.Logger = <Logger msAI.metadata (DEBUG)>

Module logger.

class msAI.metadata.SampleMetadata(file_path: str, auto_index: bool = True)

Bases: object

Imports sample metadata from a supported file type into a dataframe and assigns an index.

Supported file types: .csv, .msAIm, TBD… (A .msAIm file can be created from a previous SampleSet).

Content from the metadata file is initially imported into a dataframe with a default numerical index. By default, metadata labels and values are then analyzed and, if possible, a new index is assigned from an existing column. This index is used by SampleSet to match this metadata with corresponding MS data in MSfileSet.

Requirements to auto index metadata imported into a dataframe:
  • Dataframe has 1 or more rows

  • Dataframe has 2 or more columns

  • Exactly one column satisfies both of the following:

    • All column values are unique

    • All entries/rows have a value for this column
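The requirements above can be sketched with pandas as follows. This is only an illustration of the stated rules, not msAI's actual implementation; the helper name find_index_candidate and the example column names are hypothetical.

```python
from typing import Optional

import pandas as pd


def find_index_candidate(df: pd.DataFrame) -> Optional[str]:
    """Return the single column that qualifies as an index, or None.

    Hypothetical helper for illustration; not part of msAI.
    """
    # Requirement: at least 1 row and at least 2 columns.
    if df.shape[0] < 1 or df.shape[1] < 2:
        return None

    # A column qualifies when every row has a value and all values are unique.
    candidates = [
        col for col in df.columns
        if df[col].notna().all() and df[col].is_unique
    ]

    # Requirement: one and only one column qualifies.
    return candidates[0] if len(candidates) == 1 else None


example = pd.DataFrame({
    "sample_id": ["s1", "s2", "s3"],
    "group": ["control", "treated", "treated"],
})
index_col = find_index_candidate(example)
if index_col is not None:
    indexed = example.set_index(index_col)
```

In this sketch, a dataframe with two fully populated, unique columns yields no candidate, because the choice of index would be ambiguous.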

__init__(file_path: str, auto_index: bool = True)

Initializes an instance of SampleMetadata class.

Parameters
  • file_path – A string representation of the path to the metadata file. Path can be relative or absolute.

  • auto_index – A boolean indicating if the metadata should be automatically indexed. Default is True.

Raises

MetadataInitError – If the file has an invalid type/extension.
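A file-type check of this kind might look like the sketch below. It assumes the extension is compared case-sensitively against the two file types named in this documentation. The MetadataInitError class defined here is a local stand-in, and check_extension is a hypothetical helper, not msAI's code.

```python
from pathlib import Path

# File types named in this documentation; further types are still TBD.
SUPPORTED_EXTENSIONS = {".csv", ".msAIm"}


class MetadataInitError(Exception):
    """Local stand-in for msAI's exception of the same name."""


def check_extension(file_path: str) -> str:
    """Return the file extension, or raise if it is not supported.

    Hypothetical helper; case-sensitive matching is an assumption.
    """
    ext = Path(file_path).suffix
    if ext not in SUPPORTED_EXTENSIONS:
        raise MetadataInitError(f"Unsupported metadata file type: {ext!r}")
    return ext
```

Relative and absolute paths are handled the same way, since only the final suffix is inspected.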

file_path: str = None

A string representation of the path to the metadata file.

_hf: DF = None

High fidelity copy of imported data.

Leave this original data untouched for future reference if needed.
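The idea of keeping an untouched copy next to a working dataframe can be illustrated with plain pandas. This is a sketch under assumptions: the CSV content and the variable names raw and working are made up for the example and do not reflect msAI's internals.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a metadata file.
csv_text = "sample_id,group\ns1,control\ns2,treated\n"

# Initial import: pandas assigns a default numerical (range) index.
raw = pd.read_csv(io.StringIO(csv_text))

# Work on a copy so the imported data stays as it was read.
working = raw.copy()
working = working.set_index("sample_id")
```

After this runs, raw still has its default index and both columns, while working is indexed by sample_id.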

df: MetaDF = None

The metadata dataframe.

__repr__()

Returns a string representation of the metadata dataframe.

_verify_import()

Verifies the imported metadata is usable.

Ensures at least one metadata entry/row and at least two metadata labels/columns exist.

Raises

MetadataVerifyError – If no metadata entries or fewer than two metadata labels are found.
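The check described above amounts to inspecting the dataframe's shape. The sketch below assumes a pandas dataframe. MetadataVerifyError is defined locally as a stand-in, and verify_import is a hypothetical free function rather than the actual method.

```python
import pandas as pd


class MetadataVerifyError(Exception):
    """Local stand-in for msAI's exception of the same name."""


def verify_import(df: pd.DataFrame) -> None:
    """Raise if the dataframe lacks rows or has fewer than two columns."""
    rows, cols = df.shape
    if rows < 1:
        raise MetadataVerifyError("No metadata entries found")
    if cols < 2:
        raise MetadataVerifyError("Not enough metadata labels found")
```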

_auto_index()

Attempts to identify and set the dataframe index from a metadata label/column.

This index is used to match metadata to SampleRun.

describe()

Prints a summary of metadata contents.

set_index(new_index: str)

Manually sets the metadata dataframe index to an existing label/column.

This index is used to match metadata to SampleRun.

Parameters

new_index – The name of the metadata label/column to use as the index.
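In pandas terms, setting the index to an existing column corresponds to DataFrame.set_index. The sketch below shows that operation on made-up data. The wrapper set_index_checked and the column names are hypothetical, and how msAI handles an unknown label is not stated here.

```python
import pandas as pd

# Made-up metadata for illustration.
metadata = pd.DataFrame({
    "run_name": ["run_01", "run_02"],
    "dose_mg": [1.0, 2.5],
})


def set_index_checked(df: pd.DataFrame, new_index: str) -> pd.DataFrame:
    """Return a copy of df indexed by an existing column.

    Hypothetical helper; raising KeyError for an unknown label is an assumption.
    """
    if new_index not in df.columns:
        raise KeyError(f"No metadata label named {new_index!r}")
    return df.set_index(new_index)


by_run = set_index_checked(metadata, "run_name")
```

The chosen column becomes the index and is removed from the regular columns, which is the default behavior of DataFrame.set_index.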