mispr.gaussian.utilities package

Submodules

mispr.gaussian.utilities.db_utilities module

Define db utility functions.

mispr.gaussian.utilities.db_utilities.get_db(input_db=None)[source]

Helper function to create a GaussianCalcDb instance from a file or a dict.

Parameters:
input_db : str or dict, optional

Path to db file or a dict containing db info.

Returns:

GaussianCalcDb.

mispr.gaussian.utilities.dbdoc module

Define functions for cleaning up JSON documents.

mispr.gaussian.utilities.dbdoc.add_solvent_to_prop_dict(prop_dict, solvent_gaussian_inputs, solvent_properties)[source]

Add solvent properties to a property dictionary (e.g. BDE, BE, etc.).

Parameters:
prop_dict : dict

Property dictionary.

solvent_gaussian_inputs : str

Gaussian input parameters corresponding to the implicit solvent model used in the Gaussian calculations, e.g. “(Solvent=TetraHydroFuran)”.

solvent_properties : dict

Additional solvent input parameters used in the Gaussian calculations; e.g., {“EPS”:12}.

Returns:

Property dictionary with solvent properties added.

Return type:

dict

mispr.gaussian.utilities.files module

Define utility functions for handling files and paths.

mispr.gaussian.utilities.files.bibtex_parser(bib_file, working_dir)[source]

Parse a BibTeX file and return a dictionary of the entries.

Parameters:
bib_file : str

Relative or absolute path to the bibtex file.

working_dir : str

Name of the working directory where the bibtex file is located if bib_file path is relative; else None.

Returns:

Dictionary of the entries in the bibtex file.

Return type:

dict

mispr.gaussian.utilities.files.recursive_relative_to_absolute_path(operand, working_dir)[source]

Recursively convert relative paths to absolute paths.

Parameters:
operand : str, list, dict

File, list of files, or a dictionary where the values are the files; the file(s) path can be relative or absolute.

working_dir : str

Name of the working directory where the file(s) is/are located if operand path is relative; else None.

Returns:

File, list of files, or dict where the values are the absolute paths.

Return type:

str or list or dict
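
The recursion described above can be sketched in plain Python. This is an illustrative re-implementation, not the library's code: the helper name to_absolute is hypothetical, and both leaving a relative path unchanged when working_dir is None and passing other types through untouched are assumptions.

```python
import os


def to_absolute(operand, working_dir):
    """Hypothetical helper: resolve str, list, or dict values against working_dir."""
    if isinstance(operand, str):
        # Absolute paths are kept; relative ones are joined to working_dir.
        if os.path.isabs(operand) or working_dir is None:
            return operand
        return os.path.join(working_dir, operand)
    if isinstance(operand, list):
        return [to_absolute(item, working_dir) for item in operand]
    if isinstance(operand, dict):
        return {key: to_absolute(value, working_dir) for key, value in operand.items()}
    return operand  # other types are passed through untouched (assumption)


base = os.path.abspath("calcs")
paths = to_absolute(
    {"opt": "opt.com", "refs": ["a.bib", os.path.join(base, "b.bib")]}, base
)
```

Only the dictionary values are converted; the keys are left as they are, which matches the description of the operand parameter.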

mispr.gaussian.utilities.fw_utilities module

Define utility functions for modifying workflow settings. Based on atomate powerups.

mispr.gaussian.utilities.fw_utilities.add_common_mods(workflow, fw_mods=None)[source]

Wrapper function to add common modifications to a workflow.

Parameters:
workflow : Workflow

The workflow to modify.

fw_mods : dict, optional

A dictionary of modifications to be applied to the workflow; supported ones are CONTROL_WORKER, MODIFY_QUEUE_PARAMETERS, REPLACE_RUNTASK, and RUN_FAKE_GAUSSIAN (see the docstring of each function for more details); values of the dictionary are the inputs to the corresponding function.

Returns:

The modified workflow.

Return type:

Workflow
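
As a rough illustration, an fw_mods dictionary might look like the sketch below. It assumes that each value is a dictionary of keyword arguments for the matching function in this module; the worker name, category, walltime, and queue values are placeholders, not defaults of the library.

```python
# Keys are the supported modifications; each value is assumed to be a dict of
# keyword arguments for the matching function documented in this module.
fw_mods = {
    "CONTROL_WORKER": {"fworker": "cluster_a", "category": "gaussian"},
    "MODIFY_QUEUE_PARAMETERS": {
        "ntasks_per_node": 24,
        "walltime": "12:00:00",
        "queue": "normal",
    },
    "REPLACE_RUNTASK": {"operation": "use_custodian"},
}
# RUN_FAKE_GAUSSIAN is left out of this sketch because it targets testing.

# With MISPR installed, the dictionary would be passed as:
# workflow = add_common_mods(workflow, fw_mods=fw_mods)
```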

mispr.gaussian.utilities.fw_utilities.control_worker(workflow, firework_substring=None, task_substring=None, fworker=None, category=None)[source]

Modify the Firework’s fworker name and category in a workflow. Can be used when running workflows on multiple workers at the same time to specify which worker/machine to use.

Parameters:
workflow : Workflow

The workflow to control.

firework_substring : str, optional

A substring to search for in the Firework names to exclude certain fireworks.

task_substring : str, optional

A substring to search for in the Firetask names to exclude certain Firetasks.

fworker : str, optional

The name of the fworker to use for the Firework; should be consistent with the one specified in the FireWorker (my_fworker.yaml file).

category : str, optional

The category to be assigned for the Firework; should be consistent with the one specified in the FireWorker (my_fworker.yaml file).

Returns:

The modified workflow with the specified fworker and/or category.

Return type:

Workflow

mispr.gaussian.utilities.fw_utilities.get_list_fireworks_and_tasks(workflow, firework_substring=None, task_substring=None)[source]

Return a list of (firework_index, task_index) tuples for all fireworks and tasks in a workflow.

Parameters:
workflow : Workflow

The workflow to search.

firework_substring : str, optional

A substring to search for in the Firework names to exclude certain fireworks.

task_substring : str, optional

A substring to search for in the Firetask names to exclude certain Firetasks.

Returns:

A list of (firework_index, task_index) tuples.

Return type:

list

mispr.gaussian.utilities.fw_utilities.modify_queue_parameters(workflow, ntasks_per_node=None, walltime=None, queue=None, pre_rocket=None, other_parameters=None, firework_substring=None, task_substring=None)[source]

Modify the default Firework’s queue parameters in a workflow. Default ones are specified in the my_qadapter.yaml file. Helpful when different workflows require different computational resources (e.g. number of CPUs, memory, etc.).

Parameters:
workflow : Workflow

The workflow to modify.

ntasks_per_node : int, optional

The number of tasks to run on each node.

walltime : str, optional

The walltime for the job.

queue : str, optional

The queue/partition to run the job on.

pre_rocket : str, optional

The pre-rocket command to run before the job.

other_parameters : dict, optional

Other parameters to be added to the queueadapter.

firework_substring : str, optional

A substring to search for in the Firework names to exclude certain fireworks.

task_substring : str, optional

A substring to search for in the Firetask names to exclude certain Firetasks.

Returns:

The modified workflow with the specified queue parameters.

Return type:

Workflow

mispr.gaussian.utilities.fw_utilities.replace_runtask(workflow, firework_substring=None, operation='remove_custodian', additional_params=None)[source]

Replace all tasks with RunGaussian (e.g. RunGaussianDirect) with RunGaussianCustodian or vice versa.

Parameters:
workflow : Workflow

The workflow to modify.

firework_substring : str, optional

A substring to search for in the Firework names to exclude certain fireworks.

operation : str, optional

The operation to perform on the Firetask; supported ones are remove_custodian and use_custodian.

additional_params : dict, optional

Additional parameters to be added to the new Firetask that are not included in the original Firetask; refer to the corresponding Firetask documentation for supported parameters.

Returns:

The workflow with the replaced run Firetasks.

Return type:

Workflow

mispr.gaussian.utilities.fw_utilities.run_fake_gaussian(workflow, ref_dirs, input_files=None, tolerance=None)[source]

Replace all tasks with RunGaussian (i.e. RunGaussianDirect, RunGaussianCustodian) with RunGaussianFake that runs a fake Gaussian job. We do not actually run Gaussian but copy existing inputs and outputs. Useful for testing purposes.

Parameters:
workflow : Workflow

The workflow to modify.

ref_dirs : list

A list of directories containing the reference calculations for the fake Gaussian job (e.g. [‘home/opt’, ‘home/freq’]).

input_files : list, optional

A list of input files for the fake Gaussian job; order should match that in ref_dirs; e.g. [“opt.com”, “freq.com”].

tolerance : float, optional

The tolerance for the comparison of the provided input file with the existing one.

Returns:

The workflow with the replaced run Firetasks.

Return type:

Workflow

mispr.gaussian.utilities.gout module

Define functions for processing different Gaussian output formats.

mispr.gaussian.utilities.gout.process_run(operation_type, run, input_file=None, **kwargs)[source]

Process a Gaussian run and return a dictionary of the results. Used for creating db documents and/or json files.

Parameters:
operation_type : str

Type of operation to be performed; supported ones are:

  1. get_from_gout: Get data from a GaussianOutput object as defined in pymatgen.io.gaussian.

  2. get_from_gout_file: Get data from a Gaussian output file.

  3. get_from_run_dict: Get data from a Gaussian output dictionary.

  4. get_from_run_id: Retrieve data from the database using a run id, e.g. “5e3737d9da0b1cbbd5d556f7”.

  5. get_from_run_query: Retrieve data from the database using query criteria, e.g.

    {"smiles": "COCCOC", "type": "freq", "functional": "B3LYP",
    "basis": "6-31+G*", "phase": "gas", ...}
    

run : GaussianOutput, str, dict

The actual Gaussian run; type depends on the operation_type.

input_file : str, optional

The input file for the run; used for adding Gaussian input parameters to the final Gaussian dictionary; if not specified, will get these parameters from the run itself, but in this case, input_parameters usually specified at the end of the Gaussian input file will not be saved since they are not easily retrieved from the Gaussian output file.

kwargs : keyword arguments

Additional keyword arguments for the operation: namely, working_dir and db.

Returns:

Cleaned up Gaussian output dictionary.

Return type:

dict

mispr.gaussian.utilities.inputs module

Define functions for handling Gaussian inputs.

mispr.gaussian.utilities.inputs.handle_gaussian_inputs(gaussian_inputs, solvent_gaussian_inputs=None, solvent_properties=None)[source]

Wrapper function to clean up or modify the Gaussian input parameters for one or more jobs in a workflow. Checks for implicit solvent parameters and adds missing keywords for a given job.

Parameters:
gaussian_inputs : dict

Dictionary of dictionaries of Gaussian inputs, e.g.

{"opt": {opt_gaussian_inputs}, "freq": {freq_gaussian_inputs}}

solvent_gaussian_inputs : str, optional

String of Gaussian inputs for the solvent, e.g.

"(Solvent=Generic, Read)"

solvent_properties : dict, optional

Dictionary of solvent properties, e.g.

{"Eps": 4.33, "EpsInf": 1.69}

Returns:

Dictionary of dictionaries of reformatted Gaussian inputs.

Return type:

dict

mispr.gaussian.utilities.metadata module

Define functions for creating db schema.

mispr.gaussian.utilities.metadata.get_chem_schema(mol)[source]

Return a dictionary of chemical schema for a given molecule to use in building db documents or json file.

Parameters:
mol : Molecule

Molecule object.

Returns:

Chemical schema.

Return type:

dict

mispr.gaussian.utilities.metadata.get_job_name(mol, name)[source]

Append a molecule label to the name of a workflow for easy monitoring and identification.

Parameters:
mol : Molecule or str

If a Molecule is provided, the appended label will be the molecular formula; otherwise the label will be the provided string.

name : str

Original name of the workflow.

Returns:

Job name with molecule label.

Return type:

str

mispr.gaussian.utilities.metadata.get_mol_formula(mol)[source]

Get the alphabetical molecular formula for a molecule.

Parameters:
mol : Molecule

Molecule object.

Returns:

Alphabetical molecular formula.

Return type:

str
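
For illustration, an alphabetical formula can be built from a list of element symbols as sketched below. The helper name alphabetical_formula is hypothetical and works on plain strings rather than a Molecule; the exact string format returned by get_mol_formula (spacing, handling of a count of 1) is not specified here, so the format in this sketch is an assumption.

```python
from collections import Counter


def alphabetical_formula(symbols):
    """Hypothetical helper: build a formula with elements sorted A to Z."""
    counts = Counter(symbols)
    # A count of 1 is written out explicitly to keep the sketch simple.
    return "".join(f"{element}{counts[element]}" for element in sorted(counts))


# Dimethoxyethane (SMILES COCCOC): 4 carbon, 10 hydrogen, and 2 oxygen atoms.
formula = alphabetical_formula(["C"] * 4 + ["H"] * 10 + ["O"] * 2)
print(formula)  # C4H10O2
```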

mispr.gaussian.utilities.misc module

Define miscellaneous functions useful in many of the mispr levels.

mispr.gaussian.utilities.misc.pass_gout_dict(fw_spec, key)[source]

Helper function used in the Gaussian Fireworks to pass Gaussian output dictionaries from one task to the other, while checking that the criteria for starting the following task are met (e.g. normal termination of the previous job, lack of imaginary frequencies, etc.).

Parameters:
fw_spec : dict

Firework spec dictionary.

key : str

Unique key for the Gaussian output dictionary in fw_spec.

Returns:

Gaussian output dictionary.

Return type:

dict

mispr.gaussian.utilities.misc.recursive_compare_dicts(dict1, dict2, dict1_name, dict2_name, path='')[source]

Recursively compare two dictionaries and return the differences.

Parameters:
dict1 : dict

First dictionary to compare.

dict2 : dict

Second dictionary to compare.

dict1_name : str

Name of the first dictionary (for messages on the differences).

dict2_name : str

Name of the second dictionary (for messages on the differences).

path : str, optional

Used internally to keep track of the keys in nested dicts; meant to be “” for the top level.

Returns:

Differences between the two dictionaries (if any).

Return type:

str
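
The idea of a recursive comparison that reports differences as text can be sketched as follows. This is a minimal stand-alone version, not the library's implementation: the function name compare_dicts and the wording of the messages are assumptions, and only the parameter roles follow the description above.

```python
def compare_dicts(dict1, dict2, dict1_name, dict2_name, path=""):
    """Hypothetical sketch: return a text report of differences, "" if none."""
    report = ""
    for key in dict1:
        key_path = f"{path}[{key}]"
        if key not in dict2:
            report += f"Key {dict1_name}{key_path} not in {dict2_name}\n"
        elif isinstance(dict1[key], dict) and isinstance(dict2[key], dict):
            # Descend into nested dictionaries, extending the key path.
            report += compare_dicts(
                dict1[key], dict2[key], dict1_name, dict2_name, key_path
            )
        elif dict1[key] != dict2[key]:
            report += (
                f"Value of {dict1_name}{key_path} ({dict1[key]}) differs from "
                f"{dict2_name}{key_path} ({dict2[key]})\n"
            )
    for key in dict2:
        if key not in dict1:
            report += f"Key {dict2_name}{path}[{key}] not in {dict1_name}\n"
    return report


old = {"functional": "B3LYP", "route": {"opt": {}, "scf": "tight"}}
new = {"functional": "B3LYP", "route": {"opt": {}, "scf": "qc"}, "basis": "6-31+G*"}
diff = compare_dicts(old, new, "old", "new")
```

The path argument starts empty and grows by one bracketed key per nesting level, which is why callers are expected to leave it at its default.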

mispr.gaussian.utilities.misc.recursive_signature_remove(d)[source]

Recursively remove the signature “@” from a dictionary (e.g. those in the name of a module). Used when processing Gaussian runs before saving them to the db.

Parameters:
d : dict

Dictionary to remove the signature from.

Returns:

Dictionary with the signature removed.

Return type:

dict
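
A minimal sketch of this kind of cleanup is shown below. It assumes that the signature refers to keys beginning with “@”, such as the “@module” and “@class” entries that serialized objects often carry; the helper name strip_signature_keys is hypothetical, and the sketch does not descend into lists.

```python
def strip_signature_keys(d):
    """Hypothetical sketch: drop every key that starts with "@", at any depth."""
    if isinstance(d, dict):
        return {
            key: strip_signature_keys(value)
            for key, value in d.items()
            if not key.startswith("@")
        }
    return d  # non-dict values are returned unchanged


run = {
    "@module": "pymatgen.io.gaussian",
    "@class": "GaussianOutput",
    "input": {"@module": "pymatgen.io.gaussian", "functional": "B3LYP"},
}
cleaned = strip_signature_keys(run)
print(cleaned)  # {'input': {'functional': 'B3LYP'}}
```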

mispr.gaussian.utilities.mol module

Define functions for processing molecules.

mispr.gaussian.utilities.mol.get_bond_order_str(mol)[source]

Find bond order as a string (“U”: unspecified, “S”: single, “D”: double, “T”: triple, “A”: aromatic) by iterating over bonds of a molecule. First convert pymatgen mol to openbabel mol to use openbabel in finding bond order.

Parameters:
mol : Molecule

pymatgen Molecule object.

Returns:

Dictionary of bond orders with keys as tuples of atom indexes forming the bond and values as bond order.

Return type:

dict

mispr.gaussian.utilities.mol.label_atoms(mol)[source]

Get the SMILES representation of a molecule and label the atoms that appear in the SMILES string with the atom indexes as they appear in the molecule.

Helpful to know the atom indexes in the molecule without having to visualize it.

Parameters:
mol : Molecule

The molecule to be labeled.

Returns:

SMILES string followed by atom indexes.

Return type:

str

mispr.gaussian.utilities.mol.perform_local_opt(mol, force_field='uff', steps=200)[source]

Perform a local optimization on the molecule using OpenBabel.

Parameters:
mol : Molecule

The molecule to be optimized.

force_field : str, optional

The force field to be used for the optimization; options include gaff, ghemical, mmff94, mmff94s, and uff; defaults to uff.

steps : int

The number of steps to be performed in the local optimization; defaults to 200.

Returns:

The optimized molecule.

Return type:

Molecule

mispr.gaussian.utilities.mol.process_mol(operation_type, mol, local_opt=False, **kwargs)[source]

Process a molecule. Used for handling different molecule formats provided to Gaussian workflows.

Parameters:
operation_type : str

Operation to perform for the molecule to process the input structure format. Supported commands:

  1. get_from_mol: If the input is a pymatgen Molecule object.

  2. get_from_file: If the input is any file format supported by OpenBabel and pymatgen.

  3. get_from_gout_file: If the input is a Gaussian output file.

  4. get_from_str: If the input is a string.

  5. get_from_mol_db: If the input is an InChI representation of the molecule to be used to query the database.

  6. get_from_gout: If the input is a pymatgen GaussianOutput object.

  7. get_from_run_dict: If the input is a GaussianOutput dictionary.

  8. get_from_run_id: If the input is a MongoDB document ID to be used to query the database.

  9. get_from_run_query: If the input is a dictionary with criteria to search the database: e.g.

    {'inchi': inchi,
     'type': type,
     'functional': func, ...}
    
  10. get_from_pubchem: If the input is a common name for the molecule to be used in searching the PubChem database.

  11. derive_molecule: Used for deriving a molecule by attaching a functional group at a site and the corresponding mol input should be a dictionary, e.g.

    {'operation_type': <mol_operation_type for the base structure>,
     'mol': <base_mol>,
     'func_grp': func_group_name, ...}
    
  12. link_molecules: Used for linking two structures by forming a bond at specific sites and the corresponding mol input should be a dictionary, e.g.

    {'operation_type': ['get_from_file', 'get_from_mol_db'],
     'mol': ['mol1.xyz', 'mol_inchi'],
     'index': [3, 5],
     'bond_order': 1}
    

mol : Molecule, str, GaussianOutput, dict

Source of the structure, e.g. a file path if operation_type is get_from_file, an InChI string if operation_type is get_from_mol_db, etc.

local_opt : bool, optional

Whether to perform local optimization on the input structure using OpenBabel; defaults to False.

**kwargs

Keyword arguments:

  1. working_dir.

  2. db.

  3. str_type (format of string if operation_type = get_from_str, e.g. smi or any other format supported by OpenBabel).

  4. force_field (force field to use for local optimization if local_opt is True): gaff, ghemical, mmff94, mmff94s, and uff.

  5. steps (number of steps for local optimization if local_opt is True).

  6. charge.

  7. abbreviation (abbreviation to be used for the molecule when downloading it from the PubChem database; defaults to mol).

Returns:

pymatgen Molecule object.

Return type:

Molecule

mispr.gaussian.utilities.rdkit module

Define functions for processing RDKit molecules.

mispr.gaussian.utilities.rdkit.calc_energy(rdkit_mol, maxIters=200)[source]

Perform local optimization on an RDKit Mol object and calculate its energy using UFF.

Parameters:
rdkit_mol : Mol

RDKit Mol object.

maxIters : int, optional

Maximum number of iterations to perform.

Returns:

Energy of the molecule.

Return type:

float

mispr.gaussian.utilities.rdkit.draw_rdkit_mol(rdkit_mol, filename='mol.png', working_dir=None)[source]

Draw the 2D structure of a molecule and save it to a file.

Parameters:
rdkit_mol : Mol

RDKit Mol object.

filename : str, optional

Name of the file to save the image to; defaults to “mol.png”.

working_dir : str, optional

Directory to save the image to; defaults to current working directory.

mispr.gaussian.utilities.rdkit.draw_rdkit_mol_with_highlighted_bonds(rdkit_mol, bonds, filename='mol.png', colors=None, working_dir=None)[source]

Draw the 2D structure of a molecule and highlight the bonds specified by the user.

Parameters:
rdkit_mol : Mol

RDKit Mol object.

bonds : list

List of tuples of indexes of atoms forming a bond to highlight; e.g. [(3, 11), (5, 13)] to highlight the bonds between sites 3 and 11 and sites 5 and 13.

filename : str, optional

Name of the file to save the image to; defaults to “mol.png”.

colors : list, optional

List of colors to use for highlighting the bonds; colors should be provided in rgb format, e.g. (0.0, 0.0, 0.0) for black; if not provided or number of colors provided is less than number of bonds to highlight, will randomly generate colors.

working_dir : str, optional

Directory to save the image to; defaults to current working directory.
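
The fallback to random colors can be pictured with the sketch below. It is not the library's code: the helper name pad_colors and its seed argument are hypothetical, and keeping the supplied colors while generating only the missing ones is an assumption, since the description does not say whether all colors are regenerated in that case.

```python
import random


def pad_colors(bonds, colors=None, seed=None):
    """Hypothetical helper: return one RGB tuple per bond, keeping given colors."""
    rng = random.Random(seed)
    colors = list(colors or [])
    while len(colors) < len(bonds):
        # Each channel is a float in [0, 1], matching the rgb format above.
        colors.append((rng.random(), rng.random(), rng.random()))
    return colors


bonds = [(3, 11), (5, 13)]
colors = pad_colors(bonds, colors=[(0.0, 0.0, 0.0)], seed=0)
```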

mispr.gaussian.utilities.rdkit.get_rdkit_mol(mol, sanitize=True, remove_h=False)[source]

Convert a pymatgen Molecule object to an RDKit Mol object. Uses RDKit (<http://rdkit.org>) to perform the conversion. Accounts for aromaticity.

Parameters:
mol : Molecule

pymatgen Molecule object.

sanitize : bool, optional

Whether to sanitize the molecule.

remove_h : bool, optional

Whether to remove hydrogens.

Returns:

RDKit Mol object.

Return type:

Mol

Module contents