miscUtils
Miscellaneous utilities used by msAI.
- Todo
Add type info for funcs passed as arguments
-
msAI.miscUtils.logger = <Logger msAI.miscUtils (DEBUG)>
Module logger.
-
class msAI.miscUtils.FileGrabber
Bases: object
Functions to grab files.
-
static multi_extensions(directory: str, *extensions: str, recursive: bool = True) → Iterable[pathlib.Path]
Creates an iterator of path objects for all files in a directory that match the passed extensions.
Use str(path_obj) to get the path string in the platform's native format. Subdirectories are searched recursively by default.
- Parameters
directory – A string representation of the path to the directory. Path can be relative or absolute.
extensions – One or more file extensions, given as strings without the leading dot (.).
recursive – A boolean indicating whether files in subdirectories are included. Defaults to True.
- Returns
An iterator of path objects to all files found.
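The behaviour described above can be approximated with pathlib alone. The following is a minimal sketch, not the msAI implementation: the name grab_by_extensions is hypothetical, and it compares lowercased suffixes instead of choosing glob patterns by path type.

```python
from pathlib import Path
from typing import Iterator


def grab_by_extensions(directory: str, *extensions: str,
                       recursive: bool = True) -> Iterator[Path]:
    """Yield files under `directory` whose suffix matches one of `extensions`.

    Hypothetical stand-in for FileGrabber.multi_extensions. Matching is
    case insensitive on every platform, which is an assumption of this sketch.
    """
    # Extensions are passed without the leading dot, so add it here.
    wanted = {"." + ext.lower() for ext in extensions}
    root = Path(directory)
    # rglob walks subdirectories; glob stays in the top-level directory.
    candidates = root.rglob("*") if recursive else root.glob("*")
    for path in candidates:
        if path.is_file() and path.suffix.lower() in wanted:
            yield path
```

Because the function is a generator, wrap the call in list(...) if the paths are needed more than once.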
-
static path_type(directory: str = '.') → str
Gets the path type of a directory.
The path type is identified by the class of the Path object created. This test determines which glob patterns to apply, based on path case sensitivity: Windows paths are case insensitive, while POSIX paths are case sensitive.
- Parameters
directory – A string representation of the path to the directory. Path can be relative or absolute. Defaults to current directory.
- Returns
A string, either 'posix' or 'windows', indicating the path type.
- Raises
MiscUtilsError – For unknown path type.
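As an illustration of identifying the path type from the class of the Path object, here is a small sketch. The function name is hypothetical, and ValueError stands in for MiscUtilsError, whose definition is not shown here.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def path_type_sketch(directory: str = ".") -> str:
    """Return 'posix' or 'windows' based on the concrete Path class."""
    # Path() instantiates PosixPath or WindowsPath depending on the platform.
    path = Path(directory)
    if isinstance(path, PurePosixPath):
        return "posix"
    if isinstance(path, PureWindowsPath):
        return "windows"
    # Stand-in for MiscUtilsError (assumption: any exception type would do here).
    raise ValueError(f"Unknown path type: {type(path).__name__}")
```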
-
class msAI.miscUtils.Sizer
Bases: object
Functions to measure memory / storage sizes.
-
static obj_mb(obj: object) → float
Measures the memory size of a Python object in MB.
- Parameters
obj – The Python object to measure.
- Returns
The Python object’s size in memory, in MB.
-
static print_obj_mb(obj: object)
Prints the memory size of a Python object in MB, to 4 decimal places.
- Parameters
obj – The Python object to measure.
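How Sizer measures an object is not stated on this page. The sketch below assumes sys.getsizeof, which reports only the shallow size of an object (contained objects are not followed), and assumes 1 MB = 1024 ** 2 bytes. Both function names are hypothetical.

```python
import sys

BYTES_PER_MB = 1024 ** 2  # assumption: binary megabytes


def obj_mb_sketch(obj: object) -> float:
    """Shallow in-memory size of `obj` in MB, as reported by sys.getsizeof."""
    return sys.getsizeof(obj) / BYTES_PER_MB


def format_obj_mb(obj: object) -> str:
    """Format the size to 4 decimal places, as print_obj_mb is described to do."""
    return f"{obj_mb_sketch(obj):.4f} MB"


if __name__ == "__main__":
    print(format_obj_mb(bytes(1024 * 1024)))
```

For nested containers, a deep measurement would need to walk the object graph, so results from this sketch may be far smaller than the total memory actually held.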
-
class msAI.miscUtils.Saver
Bases: object
Functions to save / load, serialize, and compress files and objects.
-
static save_obj(obj: object, file: str) → str
Saves a Python object to the given path / filename.
Data is serialized with pickle and compressed with bzip2. A sha256 hash is also calculated.
- Parameters
obj – The Python object to save.
file – A string representation of the path to the file to save. Path can be relative or absolute.
- Returns
A sha256 hash as a string.
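The pickle, bzip2, and sha256 steps can be sketched with the standard library as follows. The function name is hypothetical, and hashing the compressed bytes exactly as written to disk is an assumption chosen to agree with the file-based hash functions documented in this class.

```python
import bz2
import hashlib
import pickle


def save_obj_sketch(obj: object, file: str) -> str:
    """Pickle `obj`, compress it with bzip2, write it to `file`, return a sha256 hex digest."""
    # Serialize first, then compress the serialized bytes.
    data = bz2.compress(pickle.dumps(obj))
    with open(file, "wb") as handle:
        handle.write(data)
    # Assumption: the hash covers the bytes stored on disk.
    return hashlib.sha256(data).hexdigest()
```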
-
static get_hash(file: str) → str
Calculates the sha256 hash of a file.
- Parameters
file – A string representation of the path to the file to calculate a hash for. Path can be relative or absolute.
- Returns
A sha256 hash as a string.
-
static verify_hash(file: str, test_hash: str) → bool
Verifies the sha256 hash of a file.
- Parameters
file – A string representation of the path to the file to calculate and compare hash value for. Path can be relative or absolute.
test_hash – A sha256 hash as a string to test against.
- Returns
A boolean indicating whether the hash value is verified. True means the calculated hash matches the test hash.
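Calculating and verifying a file hash might look like the sketch below. The names file_sha256 and verify_sha256 are hypothetical; reading in chunks and comparing case-insensitively are choices of this sketch, not documented msAI behaviour.

```python
import hashlib
import hmac


def file_sha256(file: str, chunk_size: int = 1 << 20) -> str:
    """Return the sha256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(file, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(file: str, test_hash: str) -> bool:
    """Return True when the file's digest matches `test_hash`."""
    return hmac.compare_digest(file_sha256(file), test_hash.lower())
```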
-
static load_obj(file: str, test_hash: Optional[str] = None) → Tuple[object, Optional[bool]]
Loads a previously saved object.
The file will be tested against a sha256 hash, if provided. Data is decompressed via bzip2 and deserialized with pickle.
- Parameters
file – A string representation of the path to the file to load the object from. Path can be relative or absolute.
test_hash – A sha256 hash as a string to test against.
- Returns
A tuple of the object and an optional boolean indicating if the hash of the saved file was verified.
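A loading routine matching this description could be sketched as below. The function name is hypothetical. What msAI does on a hash mismatch is not stated here; this sketch chooses to skip deserialization and return (None, False), since unpickling data from an untrusted or corrupted file can execute arbitrary code.

```python
import bz2
import hashlib
import pickle
from typing import Optional, Tuple


def load_obj_sketch(file: str,
                    test_hash: Optional[str] = None) -> Tuple[object, Optional[bool]]:
    """Load a bzip2-compressed pickle, optionally checking a sha256 hex digest first."""
    with open(file, "rb") as handle:
        data = handle.read()

    verified: Optional[bool] = None  # None means no hash was supplied
    if test_hash is not None:
        verified = hashlib.sha256(data).hexdigest() == test_hash.lower()
        if not verified:
            # Design choice of this sketch: never unpickle unverified data.
            return None, False

    return pickle.loads(bz2.decompress(data)), verified
```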
-
class msAI.miscUtils.MultiTaskDF
Bases: object
Functions to parallelize work on dataframes through multiprocessing.
-
static _partition_by_rows(df_in: pandas.core.frame.DataFrame, subset_func) → pandas.core.frame.DataFrame
Partitions a dataframe into row-wise subsets and assigns a worker to each to apply a function.
Creates a process pool with a number of workers equal to the CPU count (by default), and splits the dataframe df_in into as many subsets as there are workers. Each worker applies subset_func to one dataframe subset in parallel.
- Parameters
df_in – The input dataframe.
subset_func – A partial object containing the function to apply to each dataframe subset. Its call is completed with a dataframe subset after the dataframe is split.
- Returns
A dataframe formed by concatenating all subset results.
-
static _run_on_subset_rows(func, df_subset: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Applies a function to each row in a dataframe subset.
Rows are passed to func as Series objects whose index is the dataframe’s columns.
- Parameters
func – The function to apply to each row in df_subset. This function must be a static method and return the row, reflecting the results. Additional arguments can be passed with a partial object by the caller.
df_subset – A dataframe subset; a single worker applies func to all of its rows.
- Returns
A dataframe reflecting the changes from the applied func.
-
static parallelize_on_rows(df: pandas.core.frame.DataFrame, func) → pandas.core.frame.DataFrame
Applies a function to rows in a dataframe in parallel.
- Parameters
df – The input dataframe.
func – The function to apply to each row in df. This function must be a static method and return the row, reflecting the results. Additional arguments can be passed with a partial object by the caller.
- Returns
A new dataframe reflecting the changes from the applied func.
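The split / apply / concatenate pattern described for this class can be sketched as below. This is an illustration under assumptions, not the msAI code: all function names are hypothetical, add_total is an invented row function, and the row function is a module-level function so that the process pool can pickle it.

```python
import multiprocessing as mp
from functools import partial

import numpy as np
import pandas as pd


def add_total(row: pd.Series, scale: float = 1.0) -> pd.Series:
    """Hypothetical row function: returns the row with a new 'total' entry."""
    out = row.copy()
    out["total"] = (row["a"] + row["b"]) * scale
    return out


def split_rows(df: pd.DataFrame, parts: int) -> list:
    """Split `df` into up to `parts` contiguous row subsets, skipping empty ones."""
    bounds = np.linspace(0, len(df), parts + 1, dtype=int)
    return [df.iloc[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:]) if hi > lo]


def run_on_subset_rows(func, df_subset: pd.DataFrame) -> pd.DataFrame:
    """Apply `func` to each row of one subset; rows arrive as Series."""
    return df_subset.apply(func, axis=1)


def parallelize_on_rows_sketch(df: pd.DataFrame, func, workers=None) -> pd.DataFrame:
    """Apply `func` to every row of `df` using a pool of worker processes."""
    workers = workers or mp.cpu_count()
    subsets = split_rows(df, workers)
    # The partial fixes `func`; each worker supplies its own dataframe subset.
    subset_func = partial(run_on_subset_rows, func)
    with mp.Pool(workers) as pool:
        results = pool.map(subset_func, subsets)
    return pd.concat(results)
```

Extra arguments are bound by the caller, for example parallelize_on_rows_sketch(df, partial(add_total, scale=2.0)). Under the spawn start method, the call should be made from inside an if __name__ == "__main__": guard.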
-
class msAI.miscUtils.EnvInfo
Bases: object
Functions to get info about the environment running Python.
-
static mp_method() → str
Gets a string describing the start method used by the multiprocessing module to create new processes.
- Defaults are set according to OS type:
POSIX = ‘fork’
Windows = ‘spawn’
Use this function to check the start method and switch to single processing if necessary. Certain functions will fail under the spawn start method.
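A check of this kind can be built on multiprocessing.get_start_method(). The sketch below uses hypothetical names, and treating only ‘fork’ as safe for multiprocessing is an assumption made for illustration. The default also depends on the Python version; macOS, for instance, has defaulted to ‘spawn’ since Python 3.8.

```python
import multiprocessing as mp


def mp_method_sketch() -> str:
    """Return the start method multiprocessing uses for new processes."""
    # Note: calling this without allow_none=True fixes the default method
    # for the rest of the process if none has been set yet.
    return mp.get_start_method()


def use_multiprocessing() -> bool:
    """Assumption of this sketch: only parallelize under the 'fork' start method."""
    return mp_method_sketch() == "fork"


if __name__ == "__main__":
    mode = "multi" if use_multiprocessing() else "single"
    print(f"start method: {mp_method_sketch()} -> {mode} processing")
```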