fragmenstein.victor package

class fragmenstein.Victor

Bases: _VictorUtils, _VictorValidate, _VictorCombine, _VictorPlace

Victor (after Dr Victor Frankenstein) is a class that uses both Monster (makes blended compounds) and Igor (energy minimises). This master reanimator keeps a .journal (logging, class attribute).

The constructor sets the protein details, while place or combine do the analyses. These are the attributes of the class or instance (the constructor arguments are listed further below):

Variables:
  • apo_pdbblock (str) – The apo protein template PDB block (inputted)

  • atomnames (Union[None, List, Dict]) – an optional dictionary that gets used by Params.from_smiles to assign atom names (inputted)

  • category (None/str) – MPro only.

  • constrained_atoms (int) – number of atoms constrained (dynamic)

  • constraint (Constraints) – constraints object from rdkit_to_params

  • constraint_function_type (str) – name of constraint function. Best not to change.

  • covalent_definitions (list) – definitions of warheads (advanced)

  • covalent_resi (str) – the residue index (PDB) for the covalent attachment (int or int + chain) or reference residue

  • covalent_resn (str) – the three-letter residue name for the covalent attachment (reference residue). For now can only be ‘CYS’ (or anything else if not covalent)

  • energy_score (dict) – dict of splits of scores

  • error_msg (str) – error message if an error of the type error_to_catch was raised and caught

  • error_to_catch (tuple) – catch error_to_catch.

  • extra_constraint (str) – extra constraint text

  • hits (List[Chem.Mol])

  • igor (Igor) – igor object

  • is_covalent (bool/None)

  • joining_cutoff (float) – max distance between joining mol

  • monster_random_seed (int) – random seed for rdkit embedding

  • journal (Logger) – log

  • ligand_resi (str) – the residue index (PDB) for the ligand.

  • ligand_resn (str) – the residue name for the ligand.

  • long_name (str) – name for files

  • merging_mode (str)

  • minimized_mol (Mol)

  • minimized_pdbblock (str)

  • modifications (dict)

  • mol (Mol)

  • monster (Monster)

  • monster_average_position (bool)

  • monster_mmff_minisation (bool)

  • monster_throw_on_discard (bool)

  • mrmsd (mRMSD)

  • params (Params)

  • pose_fx (function)

  • possible_definitions (list)

  • quick_reanimation (bool)

  • reference_mol (NoneType)

  • smiles (str)

  • tick (float)

  • tock (float)

  • unbound_pose (Pose)

  • unconstrained_heavy_atoms (int)

  • unminimized_pdbblock (str)

  • warhead_definitions (list)

  • warhead_harmonisation (str)

  • work_path (str) – class attr. where to save stuff

warhead_definitions and covalent_definitions are class attributes that can be modified beforehand to allow a new attachment. covalent_definitions is a list of dictionaries with the keys ‘residue’, ‘smiles’ and ‘atomnames’, which are needed for making the constraint file: smiles describes two atoms and the connection, and atomnames gives the name of each. Cysteine is {'residue': 'CYS', 'smiles': '*SC', 'atomnames': ['CONN3', 'SG', 'CB']}. warhead_definitions is a list of dictionaries with the keys ‘name’ (name of the warhead for humans), ‘covalent’ (the SMILES of the warhead, where the zeroth atom is the one attached to the rest), ‘noncovalent’ (the unreacted warhead), ‘covalent_atomnames’ and ‘noncovalent_atomnames’ (lists of atom names). The atom names are not needed by the code itself, but allow lazy tweaks and analysis downstream (say, typing in PyMOL: show sphere, name CX).
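As an illustration of the two structures described above, the following minimal sketch builds one covalent definition and one warhead definition as plain dictionaries and checks that the expected keys are present. It uses the atomnames key shown in the covalent_definitions class attribute listing and does not import fragmenstein. The warhead entry (its name, SMILES and atom names) and the helper missing_keys are hypothetical illustrations, not values or functions taken from the library.

```python
from typing import Dict, Iterable, List

# Keys described in the text for each kind of definition.
COVALENT_KEYS = {'residue', 'smiles', 'atomnames'}
WARHEAD_KEYS = {'name', 'covalent', 'noncovalent',
                'covalent_atomnames', 'noncovalent_atomnames'}

# The cysteine entry as listed for the covalent_definitions class attribute.
cysteine: Dict[str, object] = {'residue': 'CYS',
                               'smiles': '*SC',
                               'atomnames': ['CONN3', 'SG', 'CB']}

# Hypothetical warhead entry: SMILES and atom names are made up for illustration.
# One atom name per atom in the corresponding SMILES.
example_warhead: Dict[str, object] = {
    'name': 'example',
    'covalent': 'C(=O)CC*',
    'noncovalent': 'C(=O)C=C',
    'covalent_atomnames': ['CZ', 'OZ', 'CY', 'CX', 'CONN1'],
    'noncovalent_atomnames': ['CZ', 'OZ', 'CY', 'CX'],
}


def missing_keys(definition: Dict[str, object], required: Iterable[str]) -> List[str]:
    """Return the required keys absent from a definition (hypothetical helper)."""
    return sorted(set(required) - set(definition))


print(missing_keys(cysteine, COVALENT_KEYS))
print(missing_keys(example_warhead, WARHEAD_KEYS))
```

With fragmenstein installed, dictionaries of this shape would presumably be appended to the two class attributes before a Victor instance is created, since the text states they can be modified beforehand.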

MMFF_score(mol: Mol | None = None, delta: bool = False) float
class Monster(hits: List[Mol], average_position: bool = False, joining_cutoff: float = 5, random_seed: int | None = None)

Bases: _MonsterFF, _MonsterPlace, _MonsterCombine

This creates a stitched together monster. For initialisation, whether for placing or combining, it needs a list of hits (rdkit.Chem.Mol).

Note, the hits have to be 3D embedded as present in the protein —it would defeat the point otherwise! For a helper method to extract them from crystal structures see Victor.extract_mol.

The calculations are done by either place or combine.

## Place

>>> monster.place(mol)

Given an RDKit molecule and a series of hits, it makes a spatially stitched together version of the initial molecule based on the hits. The aim is to place the followup compound on the hits as faithfully as possible, regardless of the screaming forcefields.

  • .mol_options are the possible equiprobable alternatives.

  • .positioned_mol is the desired output (rdkit.Chem.Mol object)

  • .initial_mol is the input (rdkit.Chem.Mol object), this is None in a .combine call.

  • .modifications['scaffold'] is the combined version of the hits (rdkit.Chem.Mol object).

  • .modifications['chimera'] is the combined version of the hits, but with differing atoms made to match the followup (rdkit.Chem.Mol object).

.get_positional_mapping, which works also as a class method, creates a dictionary of mol_A atom index to mol_B atom index based on distance (cutoff 2Å) and not MCS.

The code works in two broad steps: first a scaffold is made, which is the combination of the hits (by position), then the followup is placed. It is not embedded with the constrained embedding functionality of RDKit, as this requires the reference molecule to have a valid geometry, which these absolutely do not have. Novel side chains are added by aligning an optimised conformer against the closest 3-4 reference atoms. Note that .initial_mol is not touched. .positioned_mol may have lost some custom properties, but the atom indices are the same.

If an atom in a Chem.Mol object is provided via attachment argument and the molecule contains a dummy atom. Namely element R in mol file or * in string.

## Combine

>>> monster.combine(keep_all=True, collapse_rings=True, joining_cutoff=5)

Combines the hits by merging and linking. collapse_rings argument results in rings being collapsed before merging to avoid oddities. The last step within the call is fixing any oddities of impossible chemistry via the call rectify. This uses the separate class Rectifier to fix it.

## Attributes

Common input derived

Variables:
  • hits (list)

  • throw_on_discard (bool) – filled by keep_all

Common derived

Variables:
  • matched (List[str]) – (dynamic) accepted hit names

  • unmatched (List[str]) – discarded hit names

  • journal (Logger) – The “journal” is the log of Dr Victor Frankenstein (see Victor for more)

  • modifications (dict) – copies of the mols along the way

  • mol_options (list) – equally valid alternatives to self.positioned_mol

place specific:

Variables:
  • positioned_mol (Mol)

  • attachment (NoneType)

  • initial_mol (NoneType)

  • average_position (bool)

  • num_common (int) – (dynamic) number of atoms in common between follow-up and hits

  • percent_common (float) – (dynamic) percentage of atoms of follow-up that are present in the hits

combine specific:

Variables:
  • joining_cutoff (int) – how distant (in Å) is too much?

  • atoms_in_bridge_cutoff (int) – how many bridge atoms can be deleted? (0 = preserves norbornane, 1 = preserves adamantane)

Class attributes best ignored:

Variables:
  • closeness_weights (list) – list of functions to penalise closeness (ignore for most applications)

  • dummy (Mol) – The virtual atom where the target attaches, by default *. Best not overridden.

  • dummy_symbol (str) – The symbol of the virtual atom where the target attaches, by default *. Best not overridden.

  • matching_modes (list)

MMFF_score(mol: Mol | None = None, delta: bool = False, mode: str = 'MMFF') float

Merck force field. Chosen over Universal for no reason at all.

Parameters:
  • mol (Chem.Mol optional. If absent extracts from pose.) – ligand

  • delta (bool) – report difference from unbound (minimized)

  • mode (str) – ‘MMFF’ or ‘UFF’

Returns:

kcal/mol

Return type:

float

Warning:

This was moved out of Igor. Victor has the method for calling it with igor.mol_from_pose

__init__(hits: List[Mol], average_position: bool = False, joining_cutoff: float = 5, random_seed: int | None = None)

Initialisation starts Monster, but it does not do any mergers or placements. This is changed in revision 0.6 (previously mol was specified for the latter)

Parameters:
  • hits

  • average_position

  • joining_cutoff – joining cutoff used in “full” mode

  • random_seed – A random seed for rdkit embedding calculations during placement

atoms_in_bridge_cutoff = 2
by_expansion(primary_name: str | None = None, min_mode_index: int = 0) Mol

Get the maps. Find the map with the most atoms covered. Use that map as the base map for the other maps.

closeness_weights = [(<function _MonsterCommunal._closest__is_warhead_marked>, nan), (<function _MonsterCommunal._closest__is_fullbonded>, 1.0), (<function _MonsterCommunal._closest__is_ring_atom>, 0.5)]
collapse_mols(mols: List[Mol])
collapse_ring(mol: Mol) Mol

Collapses a ring(s) into a single dummy atom(s). Stores data as JSON in the atom.

Parameters:

mol

Returns:

combine(keep_all: bool = True, collapse_rings: bool = True, joining_cutoff: int = 5)

Merge/links the hits. (Main entrypoint)

Parameters:
  • keep_all

  • collapse_rings

  • joining_cutoff

Returns:

convert_origins_to_custom_map(mol: Mol | None = None, forbiddance=True) Dict[str, Dict[int, int]]

The origins stored in the followup differ in format from the custom_map. The former is a list of lists of hit_name+atom_index, while the latter is a dictionary of hit_name to dictionary of hit atom indices to _intended_ followup index. This method converts the former to the latter. If forbiddance is True, non mapping atoms are marked with negatives.

If mol is None, then self.positioned_mol is used.
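The conversion described above can be sketched in plain Python. This is only an illustration under stated assumptions: the origins are taken to be strings of the form 'hit_name.atom_index', and with forbiddance a followup atom that maps to nothing in a hit is recorded under a unique negative key. Both conventions, and the function name origins_to_custom_map, are assumptions for this sketch and may differ from what the library does.

```python
from typing import Dict, List


def origins_to_custom_map(origins: List[List[str]],
                          hit_names: List[str],
                          forbiddance: bool = True) -> Dict[str, Dict[int, int]]:
    """origins[i] lists the hit atoms that followup atom i derives from,
    each written as 'hit_name.atom_index' (assumed format)."""
    custom_map: Dict[str, Dict[int, int]] = {name: {} for name in hit_names}
    for followup_idx, sources in enumerate(origins):
        seen = set()
        for source in sources:
            hit_name, hit_idx = source.rsplit('.', 1)
            custom_map.setdefault(hit_name, {})[int(hit_idx)] = followup_idx
            seen.add(hit_name)
        if forbiddance:
            # Assumed convention: a negative key marks a followup atom
            # that has no counterpart in that hit.
            for name in hit_names:
                if name not in seen:
                    custom_map[name][-1 - followup_idx] = followup_idx
    return custom_map


origins = [['hitA.0'], ['hitA.1', 'hitB.3'], []]
print(origins_to_custom_map(origins, ['hitA', 'hitB'], forbiddance=False))
print(origins_to_custom_map(origins, ['hitA', 'hitB'], forbiddance=True))
```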

Returns:

custom_map: Dict[str, Dict[int, int]]
cutoff = 2
draw_nicely(mol, show=True, **kwargs) MolDraw2DSVG

Draw with atom indices for Jupyter notebooks.

Parameters:
  • mol

  • kwargs – Key value pairs get fed into PrepareAndDrawMolecule.

Returns:

dummy = <rdkit.Chem.rdchem.Mol object>

The virtual atom where the targets attaches

dummy_symbol = '*'
expand_custom_map(custom_map: Dict[str, Dict[int, int]], addend: Dict[str, Dict[int, int]]) Dict[str, Dict[int, int]]
expand_ring(mol: Mol) Mol

Undoes collapse ring

Parameters:

mol – untouched.

Returns:

classmethod extract_atoms(protein: Mol, keepers: List[int], expand_aromatics: bool = True) Mol

Extract the given atom indices (keepers) from protein. Expanding to full aromatic ring and copying conformers

extract_from_neighborhood(system: Mol) Mol

Given a system of a neighbourhood + ligand extract everything that is not marked IsNeighborhood.

fix_custom_map(custom_map: Dict[str, Sequence[Tuple[int, int]] | Dict[int, int]]) Dict[str, Dict[int, int]]

This is duplicated in SpecialCompareAtoms, but will be deprecated in favour of this one.

Make sure it is Dict[str, Dict[int, int]].

There is a bit of confusion about the custom map. Converts the custom map from dict of lists of 2-element tuples to dict of dicts.
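A minimal sketch of that normalisation, written without fragmenstein, is shown here. The function name normalise_custom_map is hypothetical; the real method may perform extra checks.

```python
from typing import Dict, Sequence, Tuple, Union

CustomMapInput = Dict[str, Union[Sequence[Tuple[int, int]], Dict[int, int]]]


def normalise_custom_map(custom_map: CustomMapInput) -> Dict[str, Dict[int, int]]:
    """Convert each hit's mapping to a dict of hit index -> followup index."""
    fixed: Dict[str, Dict[int, int]] = {}
    for hit_name, mapping in custom_map.items():
        if isinstance(mapping, dict):
            pairs = mapping.items()
        else:
            # A sequence of 2-element tuples: (hit index, followup index).
            pairs = mapping
        fixed[hit_name] = {int(hit_idx): int(followup_idx)
                           for hit_idx, followup_idx in pairs}
    return fixed


print(normalise_custom_map({'hitA': [(0, 1), (2, 3)], 'hitB': {4: 5}}))
```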

fix_hits(hits: List[Mol]) List[Mol]

Adds the _Name property if needed, asserts everything is a Chem.Mol, and calls store_positions.

full_blending() None

a single scaffold is made (except for .unmatched)

get_atom_map_fromProp(mol)
classmethod get_best_scoring(mols: List[RWMol]) Mol

Sorts molecules by how well they score with the Merck force field.

classmethod get_close_indices(query: Mol, target: Mol, cutoff: float = 5.0) List[int]

Given an RDKit Chem.Mol query, get the atom indices of target that are within cutoff Å.

get_color_origins() Dict[str, Dict[int, str]]

Get the color of the origin for the hits and followup, as seen in show_comparison.

classmethod get_combined_rmsd(followup_moved: Mol, followup_placed: Mol | None = None, hits: List[Mol] | None = None) float

Deprecated. The inbuilt RMSD calculations in RDKit align the two molecules; this does not align them. This deals with the case of multiple hits. For Euclidean distance, the square root of the sum of the squared differences in each coordinate is taken. For a regular RMSD, the still-squared distance is averaged before taking the root. Here the average is done across all the atom pairs between each hit and the followup. Therefore, atoms in the followup that derive from multiple hit atoms in the blended molecule are scored multiple times.

When called as a classmethod, followup_placed and hits must be provided; when called as an instance method they are optional.
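The averaging described above can be sketched with plain coordinate tuples rather than RDKit molecules. This is an illustration only: the function name combined_rmsd and the input layout (each hit given as its coordinates plus a hit index to followup index mapping) are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def combined_rmsd(followup: Sequence[Coord],
                  hits: List[Tuple[Sequence[Coord], Dict[int, int]]]) -> float:
    """Unaligned RMSD over every (hit atom, followup atom) pair of every hit."""
    squared_distances = []
    for hit_coords, mapping in hits:
        for hit_idx, followup_idx in mapping.items():
            a, b = hit_coords[hit_idx], followup[followup_idx]
            # Squared distance: kept squared until after averaging.
            squared_distances.append(sum((p - q) ** 2 for p, q in zip(a, b)))
    if not squared_distances:
        return float('nan')
    return math.sqrt(sum(squared_distances) / len(squared_distances))


followup = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
hit_a = ([(0.0, 0.0, 1.0)], {0: 0})
hit_b = ([(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)], {0: 0, 1: 1})
# Followup atom 0 is mapped by both hits, so it is scored twice.
print(combined_rmsd(followup, [hit_a, hit_b]))
```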

Parameters:
  • followup_moved – followup compound moved by Igor or similar

  • followup_placed – followup compound as placed by Monster

  • hits – list of hits.

Returns:

combined RMSD

get_hit_by_name(name: str) Mol

Given the name of a hit (as defined in the _Name property), return the hit. Do note that fix_hits will have been called, so the name may have been assigned.

get_largest_fragment(mol)
get_legend(show_positioned_mol: bool = True) str
get_mcs_mapping(hit, followup, min_mode_index: int = 0) Tuple[Dict[int, int], dict]

This is a weird method. It does a strict MCS match. And then it uses laxer searches and finds the case where a lax search includes the strict search.

Parameters:
  • hit – query molecule

  • followup – target/ref molecule

  • min_mode_index – the lowest index to try (opt. speed reasons)

Returns:

mapping and mode

get_mcs_mappings(hit: Mol, followup: Mol, min_mode_index: int = 0, custom_map: Dict[str, Dict[int, int]] | None = None) Tuple[List[Dict[int, int]], ExtendedFMCSMode]

This is a curious method. It does a strict MCS match. And then it uses laxer searches and finds the case where a lax search includes the strict search.

Parameters:
  • hit – query molecule

  • followup – target/ref molecule

  • min_mode_index – the lowest index to try (opt. speed reasons)

  • custom_map – is the user defined hit name to list of tuples of hit and followup index pairs

Returns:

mappings and mode

get_neighborhood(apo_block: str, cutoff: float, mol: Mol | None = None, addHs=True) Mol

Get the neighborhood of the protein from the apo_block around the cutoff of the mol. Note: The atoms will have a prop IsNeighborhood which is used after it is combined.

classmethod get_pair_rmsd(molA, molB, mapping: List[Tuple[int, int]]) float
classmethod get_positional_mapping(mol_A: Mol, mol_B: Mol, dummy_w_dummy=True) Dict[int, int]

Returns a map to convert overlapping atoms of A onto B. Cutoff 2 Å (see class attr.)
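A distance-based mapping of this kind can be sketched with plain coordinates, as follows. This is a simplified illustration, not the library's implementation: the function name positional_mapping is hypothetical, each atom of A simply takes its nearest atom of B within the cutoff, and neither dummy atoms nor conflicts (two atoms of A choosing the same atom of B) are handled.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def positional_mapping(coords_a: Sequence[Coord],
                       coords_b: Sequence[Coord],
                       cutoff: float = 2.0) -> Dict[int, int]:
    """Map each atom index of A to the nearest atom index of B closer than cutoff."""
    mapping: Dict[int, int] = {}
    for i, a in enumerate(coords_a):
        best_j, best_distance = None, cutoff
        for j, b in enumerate(coords_b):
            distance = math.dist(a, b)
            if distance < best_distance:
                best_j, best_distance = j, distance
        if best_j is not None:
            mapping[i] = best_j
    return mapping


mol_a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
mol_b = [(0.5, 0.0, 0.0), (5.0, 1.5, 0.0), (20.0, 0.0, 0.0)]
print(positional_mapping(mol_a, mol_b))
```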

Parameters:
  • mol_A – first molecule (Chem.Mol) will form keys

  • mol_B – second molecule (Chem.Mol) will form values

  • dummy_w_dummy – match */R with */R.

Returns:

dictionary mol A atom idx -> mol B atom idx.

guess_origins(mol: Mol = None, hits: List[Mol] | None = None)

Given a positioned mol guess its origins…

Parameters:

mol

Returns:

static inspect_amide_torsions(mol)

The most noticeable torsions are the amide ones. This is to describe what is happening.

join_neighboring_mols(mol_A: Mol, mol_B: Mol)

Joins two molecules by first calling _find_closest, which finds the closest atoms and does all the thinking, and then calling _join_atoms.

Parameters:
  • mol_A

  • mol_B

Returns:

journal = <Logger Fragmenstein (DEBUG)>
keep_copies(mols: List[Mol], label=None)
keep_copy(mol: Mol, label=None)
property linker_atom_zahl

Getter for linker_atom_zahl. To change it, set the linker_element class attribute.

linker_element = 'O'
make_chimera(template: Mol, min_mode_index=0) Mol

This is to avoid extreme corner cases. E.g. here the MCS is ringMatchesRingOnly=True and AtomCompare.CompareAny, while for the positioning this is not the case.

Called by full and partial blending modes.

Returns:

make_ideal_mol(mol: Mol | None = None, ff_minimise: bool = False) Mol
make_pse(filename='test.pse', extra_mols: Iterable[Mol] | None = None)

This is specifically for debugging the full fragment merging mode. For general use, please use the Victor method make_pse.

Parameters:

filename

Returns:

property matched: List[str]

This is the counter to unmatched. It’s dynamic as you never know…

Returns:

matching_modes = [{'atomCompare': rdkit.Chem.rdFMCS.AtomCompare.CompareAny, 'bondCompare': rdkit.Chem.rdFMCS.BondCompare.CompareAny, 'ringCompare': rdkit.Chem.rdFMCS.RingCompare.PermissiveRingFusion, 'ringMatchesRingOnly': False}, {'atomCompare': rdkit.Chem.rdFMCS.AtomCompare.CompareAny, 'bondCompare': rdkit.Chem.rdFMCS.BondCompare.CompareOrder, 'ringCompare': rdkit.Chem.rdFMCS.RingCompare.PermissiveRingFusion, 'ringMatchesRingOnly': False}, {'atomCompare': rdkit.Chem.rdFMCS.AtomCompare.CompareElements, 'bondCompare': rdkit.Chem.rdFMCS.BondCompare.CompareOrder, 'ringCompare': rdkit.Chem.rdFMCS.RingCompare.PermissiveRingFusion, 'ringMatchesRingOnly': False}, {'atomCompare': rdkit.Chem.rdFMCS.AtomCompare.CompareAny, 'bondCompare': rdkit.Chem.rdFMCS.BondCompare.CompareAny, 'ringCompare': rdkit.Chem.rdFMCS.RingCompare.PermissiveRingFusion, 'ringMatchesRingOnly': True}, {'atomCompare': rdkit.Chem.rdFMCS.AtomCompare.CompareAny, 'bondCompare': rdkit.Chem.rdFMCS.BondCompare.CompareOrder, 'ringCompare': rdkit.Chem.rdFMCS.RingCompare.PermissiveRingFusion, 'ringMatchesRingOnly': True}, {'atomCompare': rdkit.Chem.rdFMCS.AtomCompare.CompareElements, 'bondCompare': rdkit.Chem.rdFMCS.BondCompare.CompareOrder, 'ringCompare': rdkit.Chem.rdFMCS.RingCompare.PermissiveRingFusion, 'ringMatchesRingOnly': True}]
max_from_mol(mol: Mol = None)
merge_pair(scaffold: Mol, fragmentanda: Mol, mapping: Optional = None) Mol

To specify attachments use .merge. To understand what is going on see .categorize

Parameters:
  • scaffold – mol to be added to.

  • fragmentanda – mol to be fragmented

  • mapping – see get_positional_mapping. Optional in _pre_fragment_pairs

Returns:

merge_pairing_lists(nonunique_pairs: List[Tuple[Atom, Atom]], ringcore_first=True) List[Tuple[Atom, Atom]]
mmff_minimize(mol: Mol | None = None, neighborhood: Mol | None = None, ff_max_displacement: float = 0.0, ff_constraint: int = 10, ff_max_iterations: int = 200, ff_cutoff: float = 100.0, allow_lax: bool = True, prevent_cis: bool = True) MinizationOutcome

Minimises a mol, or self.positioned_mol if not provided, with MMFF constrained to ff_max_displacement Å. Gets called by Victor if the flag .monster_mmff_minimisation is true during PDB template construction.

Parameters:
  • mol – Molecule to minimise. If None, self.positioned_mol is used.

  • neighborhood – Protein neighbourhood (ignored if None)

  • ff_max_displacement – Distance threshold (Å) for atomic positions mapped to hits for MMFF constraints. If NaN, fixed point constraints (no movement) are used. This is passed as maxDispl to MMFFAddPositionConstraint.

  • ff_constraint – Force constant for MMFF constraints.

  • ff_cutoff – kcal/mol diff value to consider a failed minimisation.

  • allow_lax – If True and the minimisation fails, the constraints are halved and the minimisation is rerun.

Returns:

MinizationOutcome

Note that most methods calling this via Victor now use its .settings['ff_max_displacement'] and .settings['ff_constraint'] and do not use the defaults.

no_blending(broad=False) None

no merging is done. The hits are mapped individually. Not great for small fragments.

property num_common: int
offset(mol: Mol)

This is to prevent clashes. The numbers of the ori indices stored in collapsed rings are offset by the class variable (_collapsed_ring_offset) multiples of 100. (autoincrements to avoid dramas)

Parameters:

mol

Returns:

origin_from_mol(mol: Mol = None)

These values are stored from Monster for scaffold, chimera and positioned_mol. See make_chimera or place_from_map for more info on _Origin.

Parameters:

mol – Chem.Mol

Returns:

origin list for each atom

partial_blending() None

multiple possible scaffolds for placement and best is chosen

partially_blend_hits(hits: List[Mol] | None = None) List[Mol]

This is the partial merge algorithm, wherein the hits are attempted to be combined. If a combination is bad, the hits will not be combined. It returns a list of possible options. These will have the atoms changed too.

Parameters:
  • hits

  • distance

Returns:

property percent_common: int
pick_best() Tuple[Mol, int]

Method for partial merging for placement

Returns:

unrefined_scaffold, mode_index

place(mol: Mol, attachment: Mol | None = None, custom_map: Dict[str, Dict[int, int]] | None = None, merging_mode: str = 'expansion', enforce_warhead_mapping: bool = True, primary_name=None)

Positions a given mol based on the hits (main entrypoint). It accepts the argument merging_mode: the default is “expansion” (it was previously “permissive_none”), which call .by_expansion and .no_blending(broad=True) respectively. Also accepted are “off” (does nothing except fill the attribute initial_mol), “full” (.full_blending()), “partial” (.partial_blending()) and “none” (.no_blending(), but less thorough).
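The correspondence between merging_mode values and the methods named above can be summarised as a lookup table. The sketch below only restates that correspondence; the table MERGING_MODES, the function resolve_merging_mode and the ValueError raised for unknown modes are hypothetical and are not taken from the library.

```python
from typing import Any, Dict, Optional, Tuple

# merging_mode -> (method name, keyword arguments), as listed in the description.
MERGING_MODES: Dict[str, Tuple[Optional[str], Dict[str, Any]]] = {
    'expansion': ('by_expansion', {}),
    'permissive_none': ('no_blending', {'broad': True}),
    'full': ('full_blending', {}),
    'partial': ('partial_blending', {}),
    'none': ('no_blending', {}),
    'off': (None, {}),  # does nothing except fill initial_mol
}


def resolve_merging_mode(merging_mode: str = 'expansion') -> Tuple[Optional[str], Dict[str, Any]]:
    """Return the method name and keyword arguments for a merging mode."""
    try:
        return MERGING_MODES[merging_mode]
    except KeyError:
        # Assumed behaviour for the sketch: reject unknown modes loudly.
        raise ValueError(f'Unknown merging_mode {merging_mode!r}; '
                         f'expected one of {sorted(MERGING_MODES)}') from None


for mode in MERGING_MODES:
    print(mode, resolve_merging_mode(mode))
```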

Parameters:
  • mol

  • attachment – This the SG of the cysteine if covalent

  • custom_map – Dict of hit_name to Dict of hit_idx to followup_idx

  • merging_mode

  • primary_name – override the name of the primary hit if merging_mode is ‘expansion’

Returns:

place_from_map(target_mol: Mol, template_mol: Mol, atom_map: Dict | None = None, random_seed=None) Mol

This method places the atoms with known mapping and places the ‘uniques’ (novel) via an aligned mol (the ‘sextant’). This sextant business is a workaround for the fact that only minimised molecules can use the partial embedding function of RDKit.

The template molecule may be actually two or more fragments, as happens for the no blending mode. In RDKit, the fragments within a “molecule” are not connected, but have sequential atom indices.

Parameters:
  • target_mol – target mol

  • template_mol – the template/scaffold to place the mol

  • atom_map – something that get_mcs_mapping would return.

Returns:

place_smiles(smiles: str, long_name: str | None = None, **kwargs)
post_ff_addition_step(mol: Mol, ff: rdkit.ForceField)

This is an empty method for user-created subclasses to add their own constraints to the MMFF minimisation.

posthoc_refine(scaffold: Mol, indices: List[int] | None = None) Mol

Given a scaffold and a list of indices, refine the scaffold.

Parameters:
  • scaffold

  • indices – if absent, use all atoms

Returns:

pretweak() None

What if the fragments were prealigned slightly? Really bad things happen. Nothing currently uses this without user intervention.

Returns:

propagate_alternatives(fewer: List[Mol]) int

Given the alt atoms stored in the Chem.Atom property _AltSymbol, try those.

rectify()
static renumber_followup_custom_map(original_mol: Mol, new_mol: Mol, custom_map: Dict[str, Dict[int, int]]) Dict[str, Dict[int, int]]

Given a followup original_mol and a copy with its atom indices changed, return a new map for the copy.

sample_new_conformation(random_seed=None)

This method is intended for Multivictor. It generates a new conformation based on different random seeds.

save_commonality(filename: str | None = None)

Saves an SVG of the followup fragmenstein monster with the common atoms with the chimeric scaffold highlighted.

Parameters:

filename – optional filename to save it as. Otherwise returns a Draw.MolDraw2DSVG object.

Returns:

save_temp(mol)

This is a silly debug-by-print debug method. drop it in where you want to spy on stuff.

static score_mol(mol: Mol) float

Scores a mol without minimising

show(to_display: bool = False, show_positioned_mol: bool = True, viewer_mode='rdkit')

generates a NGLWidget or a py3Dmol.view depending on viewer_mode, which is set by default to what’s installed. With the compounds and the merger if present. To override the colours: The colours will be those in the Chem.Mol’s property _color if present.

to_display if True will display the legend and the viewer. (IPython.display.display will show them too)

Returns -> nv.NGLWidget

show_comparison(*args, **kwargs)

Show the atom provenance of the followup molecule.

Parameters:

  • nothing -> uses self.origin_from_mol

  • hit2followup -> uses the hit2followup dict

simply_merge_hits(hits: List[Mol] | None = None, linked: bool = True) Mol

Recursively stick the hits together and average the positions. This is the monster of automerging, full-merging mapping and partial merging mapping. The latter however uses partially_blend_hits first. The hits are not ring-collapsed and -expanded herein.

Parameters:
  • hits – optionally give a hit list, else uses the attribute .hits.

  • linked – if true the molecules are joined, else they are placed in the same molecule as disconnected fragments.

Returns:

the rdkit.Chem.Mol object that will fill .scaffold

stdev_from_mol(mol: Mol = None)

these values are stored from Monster for scaffold, chimera and positioned_mol

Parameters:

mol – Chem.Mol

Returns:

stdev list for each atom

store_origin_colors_atomically()

Store the color of the origin in the mol as the private property _color.

store_positions(mol: Mol) Mol

Saves positional data as _x, _y, _z and majorly _ori_i, the original index. The latter gets used by _get_new_index.

Parameters:

mol

Returns:

strict_matching_mode = {'atomCompare': rdkit.Chem.rdFMCS.AtomCompare.CompareElements, 'bondCompare': rdkit.Chem.rdFMCS.BondCompare.CompareOrder, 'matchChiralTag': True, 'ringCompare': rdkit.Chem.rdFMCS.RingCompare.StrictRingFusion, 'ringMatchesRingOnly': True}
throw_on_discard = False
to_3Dmol(show_positioned_mol: bool = True, *args, **kwargs) py3Dmol.view

User: you probably want to use show() instead… unless you have both ngl and py3Dmol installed.

This is called by both Monster.to_3Dmol and Victor.to_3Dmol

The args and kwargs do nothing

to_nglview(show_positioned_mol: False, *args, **kwargs) MolNGLWidget

User: you probably want to use show() instead… unless you have both ngl and py3Dmol installed.

This is called by both Monster.to_nglview and Victor.to_nglview The color can be dictated by the optional private property _color, which may have been assigned by walton.color_in() or the user.

The args and kwargs do nothing

Returns:

transfer_ring_data(donor: Atom, acceptor: Atom)

Transfer the info if a ringcore atom.

Parameters:
  • donor

  • acceptor

Returns:

__init__(hits: List[Mol], pdb_filename: None | str = None, pdb_block: None | str = None, ligand_resn: str = 'LIG', ligand_resi: int | str | None = None, covalent_resn: str = 'CYS', covalent_resi: int | str | None = None, extra_protein_constraint: str = None, pose_fx: Callable | None = None, monster_random_seed: int | None = None, **settings) object

Initialise Victor in order to allow either combinations (merging/linking without a given aimed for molecule) or placements (using a given aimed for molecule).

Parameters:
  • hits – list of rdkit molecules

  • pdb_filename – file of apo structure

  • pdb_block – alternative for above: a string of apo structure

  • ligand_resn – 3 letter code of your choice

  • ligand_resi – Rosetta-style pose (int) or pdb (str)

  • covalent_resn – only CYS accepted. If smiles has no * it is ignored

  • covalent_resi – Rosetta-style pose (int) or pdb (str)

  • extra_protein_constraint – multiline string of constraints relevant to the protein

  • pose_fx – a function to call with pose to tweak or change something before minimising.

  • monster_random_seed – a random seed for RDKit embedding

  • settings – Not used in base version of Victor

Some arguments can be defined externally:

# These are the default settings for fragmenstein.
# To override please define $FRAGMENSTEIN_SETTINGS as a yaml file.
# or pass an environment variable _prior_ to import
# e.g. ff_constraint becomes $FRAGMENSTEIN_FF_CONSTRAINT.
# These will have priority over the defaults.
# Note there are no safeguards against typos.
# For the command line interface, see fragmenstein/_cli_defaults.py

# General settings
work_path: output
monster_average_position: false
monster_throw_on_discard: false
ff_minisation: true

# During the RDKit minimisation, how much leeway to give an atom before it gets penalised.
ff_max_displacement: 0.1

# During the RDKit minimisation, how much to penalise an atom that is too far from its ideal position.
ff_constraint: 5.

# During the RDKit minimisation, how many iterations to run.
ff_max_iterations: 200

# During the RDKit minimisation, use the neighbourhood to constrain the molecule.
ff_use_neighborhood: true
ff_neighborhood: 6.0
ff_allow_lax: true
ff_prevent_cis: true

# For Wictor, weird things happen if True
ff_minimise_ideal: false

# OpenMM settings
mm_restraint_k: 1000.0
mm_tolerance: 10.0 # mmu.kilocalorie_per_mole / (mmu.nano * mmu.meter)
mm_max_iterations: 0 # 0 is infinite
mm_mobile_radius: 8.0 # mmu.angstrom
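The precedence described in these comments (an environment variable named $FRAGMENSTEIN_ plus the upper-cased key overrides the default) can be sketched as below. This is not the library's loader: the function resolve_settings, the helper _cast and the way strings are cast to the type of the default are assumptions for illustration, and only a few of the defaults are reproduced.

```python
import os
from typing import Any, Dict, Mapping, Optional

# A subset of the defaults listed above.
DEFAULTS: Dict[str, Any] = {
    'work_path': 'output',
    'ff_max_displacement': 0.1,
    'ff_constraint': 5.0,
    'ff_max_iterations': 200,
    'ff_use_neighborhood': True,
}


def _cast(raw: str, default: Any) -> Any:
    """Cast an environment string to the type of the default (assumed behaviour)."""
    if isinstance(default, bool):
        return raw.strip().lower() in ('1', 'true', 'yes')
    return type(default)(raw)


def resolve_settings(defaults: Mapping[str, Any] = DEFAULTS,
                     environ: Optional[Mapping[str, str]] = None,
                     prefix: str = 'FRAGMENSTEIN_') -> Dict[str, Any]:
    """Overlay environment variables such as FRAGMENSTEIN_FF_CONSTRAINT on the defaults."""
    environ = os.environ if environ is None else environ
    settings = dict(defaults)
    for key, default in defaults.items():
        raw = environ.get(prefix + key.upper())
        if raw is not None:
            settings[key] = _cast(raw, default)
    return settings


print(resolve_settings(environ={'FRAGMENSTEIN_FF_CONSTRAINT': '10'}))
```

As the comments note, the real mechanism reads the environment prior to import, so in practice the variable would be exported before fragmenstein is imported.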

classmethod add_constraint_to_warhead(name: str, constraint: str)

Add a constraint (multiline is fine) to a warhead definition. This will be added and run by Igor’s minimiser.

Parameters:
  • name

  • constraint

Returns:

None

add_extra_constraint(new_constraint: str = None)
calculate_score()
classmethod capture_logs()
classmethod capture_rdkit_log()

RDKit emits a few warnings and errors. This brings them in line with the logger.

classmethod capture_rosetta_log()

Rosetta normally prints to stdout. This captures the messages into journal. It technically simply passes the handlers of journal to that of the Rosetta logger. For alternatives, https://github.com/matteoferla/pyrosetta_scripts/tree/main/init_helper

checkpoint()
classmethod closest_hit(pdb_filenames: List[str], target_resi: int, target_chain: str, target_atomname: str, ligand_resn='LIG') str

This classmethod helps choose which PDB file to use, based on which one has its ligand closest to a given atom.

Parameters:
  • pdb_filenames

  • target_resi

  • target_chain

  • target_atomname

  • ligand_resn

Returns:

combine(long_name: str | None = None, atomnames: Dict[int, str] | None = None, warhead_harmonisation: str = 'first', joining_cutoff=5.0, extra_ligand_constraint: str = None)

Combines the hits without a template.

If the class attribute monster_throw_on_discard is True, it will raise an exception if it cannot.

The cutoff distance is controlled by class attribute monster_joining_cutoff. At present this just adds a hydrocarbon chain, no fancy checking for planarity.

The hits are collapsed, merged, expanded and bonded by proximity. In self.monster.expand_ring(..., bonded_as_original=False), changing the argument to True might work, but most likely won’t.

warhead_harmonisation fixes the warhead in the hits to be homogeneous.

  • keep. Don’t do anything

  • none. strip warheads

  • first. Use first warhead

  • warhead name. Use this warhead.

Parameters:
  • long_name

  • atomnames – an optional dictionary that gets used by Params.from_smiles

  • warhead_harmonisation – keep | strip | first | chloracetimide | nitrile …

  • joining_cutoff

  • extra_ligand_constraint

Returns:

property constrained_atoms: int

Do note that the whole Origins list contains hydrogens. So do not divide by len!

constraint_function_type = 'FLAT_HARMONIC'
classmethod copy_names(acceptor_mol: Mol, donor_mol: Mol)

Copy names from donor to acceptor by finding the MCS. Does it properly and uses PDBResidueInfo.

Parameters:
  • acceptor_mol – needs atomnames

  • donor_mol – has atomnames

Returns:

covalent_definitions = [{'atomnames': ['CONN3', 'SG', 'CB'], 'residue': 'CYS', 'smiles': '*SC'}]
classmethod distance_hits(pdb_filenames: List[str], target_resi: int, target_chain: str, target_atomname: str, ligand_resn='LIG') List[float]

See closest_hit for info.

Parameters:
  • pdb_filenames

  • target_resi

  • target_chain

  • target_atomname

  • ligand_resn

Returns:

dock() Mol

The docking is done by igor.dock(). This basically does that, extracts the ligand, saves etc.

Returns:

draw_nicely(*args, **kwargs)
classmethod enable_logfile(filename='reanimation.log', level=20, captured: bool = True) None

The journal is output to a file. Running it twice can be used to change level.

Parameters:
  • filename – file to write.

  • level – logging level

  • captured – capture rdkit and pyrosetta?

Returns:

None

classmethod enable_stdout(level=20, captured: bool = True) None

The cls.journal is output to the terminal. Running it twice can be used to change level.

Parameters:
  • level – logging level

  • captured – capture rdkit and pyrosetta?

Returns:

None
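The level-changing behaviour described for enable_logfile and enable_stdout can be pictured with the standard logging module. The sketch below is not Fragmenstein's code: the function enable_stdout_sketch, the logger name and the marker attribute are hypothetical, and it assumes that a repeated call replaces the earlier handler instead of adding a second one. The default level=20 corresponds to logging.INFO.

```python
import logging
import sys


def enable_stdout_sketch(logger: logging.Logger, level: int = logging.INFO) -> None:
    """Attach one stdout handler; a repeated call swaps the old handler for a new one."""
    for old in list(logger.handlers):
        # Drop the handler left by an earlier call so messages are not duplicated
        if getattr(old, "_sketch_stdout", False):
            logger.removeHandler(old)
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(level)
    handler._sketch_stdout = True  # hypothetical marker used only by this sketch
    logger.addHandler(handler)


journal = logging.getLogger("sketch_journal")  # stand-in for Victor.journal
journal.setLevel(logging.DEBUG)
enable_stdout_sketch(journal)                 # level 20, i.e. INFO
enable_stdout_sketch(journal, logging.DEBUG)  # second call lowers the threshold
```

After the second call the logger still has a single handler, now at DEBUG level.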

error_to_catch = ()
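A tuple of exception classes can be handed directly to an except clause, which is presumably how error_to_catch and error_msg work together. The following is a minimal sketch under that assumption; run_step and the result dictionary are hypothetical and not part of Fragmenstein. With the default empty tuple nothing is caught, so every exception propagates.

```python
from typing import Callable, Dict, Tuple, Type


def run_step(step: Callable[[], float],
             error_to_catch: Tuple[Type[BaseException], ...] = ()) -> Dict[str, object]:
    """Run `step`, recording the message of any exception listed in `error_to_catch`."""
    result: Dict[str, object] = {"value": None, "error_msg": ""}
    try:
        result["value"] = step()
    except error_to_catch as error:
        # Only the listed exception types are handled here; all others propagate
        result["error_msg"] = f"{error.__class__.__name__}: {error}"
    return result


def failing_step() -> float:
    raise ValueError("no overlap with hits")


caught = run_step(failing_step, error_to_catch=(ValueError,))
```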
classmethod extract_mol(name: str, filepath: str | None = None, block: str | None = None, smiles: str | None = None, ligand_resn: str = 'LIG', removeHs: bool = False, proximityBonding: bool = False, throw_on_error: bool = False) Mol

Extracts the ligand with the three-letter residue name ligand_resn from the PDB file filepath or from the PDB block block. Corrects the bond order with the SMILES, if given. If there is a covalent bond with another residue, the bond is kept as a */R. If the SMILES provided lacks the * element, the SMILES will be converted (if a warhead is matched), making the bond order correction okay.

Parameters:
  • name (str) – name of ligand

  • filepath (str) – PDB file

  • smiles (str) – SMILES

  • ligand_resn (str) – three-letter PDB residue name of the ligand

  • removeHs (bool) – Do you trust the hydrogens in the PDB file?

  • throw_on_error (bool) – If an error occurs in the template step, raise error.

Returns:

rdkit Chem object

Return type:

Chem.Mol

classmethod extract_mols(folder: str, smilesdex: Dict[str, str], ligand_resn: str = 'LIG', regex_name: str | None = None, proximityBonding: bool = False, throw_on_error: bool = False) Dict[str, Mol]

A key requirement for Monster is a separate mol file for the parent hits.

However, the hits are often available only as PDB files; this method converts them. igor.mol_from_pose() is similar but works on a pose. _fix_minimized() calls mol_from_pose and copy_bonds_by_atomnames, which does not destroy pdbinfo. The latter is glitchy. Use combine_for_bondorder.

See extract_mol for single.

Parameters:

folder – folder with pdbs

Returns:

classmethod find_closest_to_ligand(pdb: Mol, ligand_resn: str) Tuple[Atom, Atom]

Find the closest atom to the ligand. Warning: requires the protein to be loaded as an rdkit.Chem.Mol.

Parameters:
  • pdb – a rdkit Chem.Mol object

  • ligand_resn – 3 letter code

Returns:

tuple of non-ligand atom and ligand atom
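The search amounts to a closest-pair scan between ligand and non-ligand atoms. The sketch below works on plain tuples rather than an rdkit.Chem.Mol, so the AtomRecord type, the closest_pair function and the coordinates are hypothetical stand-ins for illustration only.

```python
import math
from typing import List, NamedTuple, Tuple


class AtomRecord(NamedTuple):
    resn: str                        # residue name, e.g. 'LIG' or 'CYS'
    name: str                        # atom name
    xyz: Tuple[float, float, float]  # Cartesian coordinates


def closest_pair(atoms: List[AtomRecord],
                 ligand_resn: str = "LIG") -> Tuple[AtomRecord, AtomRecord]:
    """Return (non-ligand atom, ligand atom) with the smallest separation."""
    ligand = [a for a in atoms if a.resn == ligand_resn]
    others = [a for a in atoms if a.resn != ligand_resn]
    if not ligand or not others:
        raise ValueError("need at least one ligand and one non-ligand atom")
    # Brute-force scan over every non-ligand/ligand combination
    return min(((o, l) for o in others for l in ligand),
               key=lambda pair: math.dist(pair[0].xyz, pair[1].xyz))


atoms = [
    AtomRecord("CYS", "SG", (0.0, 0.0, 0.0)),
    AtomRecord("CYS", "CB", (1.8, 0.0, 0.0)),
    AtomRecord("LIG", "C1", (0.0, 0.0, 1.8)),
    AtomRecord("LIG", "O1", (5.0, 5.0, 5.0)),
]
protein_atom, ligand_atom = closest_pair(atoms)
```

The tuple order follows the documented return value: non-ligand atom first, ligand atom second.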

classmethod from_files(folder: str) _VictorUtils

This creates an instance from the output files. It is likely to be unstable, assumes the checkpoints were not altered, and is basically for analysis only.

Parameters:

folder – path

Returns:

classmethod get_isomers(mol: Mol) List[Mol]

For placement operations in particular, it is important to differentiate the isomers, which therefore requires multiple Victor calls.

classmethod get_isomers_smiles(smiles: str) List[str]

Same as get_isomers, but with smiles.

get_plip_interactions()

Optional, but useful to have. And highly experimental! Get the interactions from PLIP.

classmethod get_warhead_definition(warhead_name: str)
classmethod guess_warhead(smiles: str) Tuple[str, str]

Works backwards by guessing what the warhead is. Normally there would be better data handling, so no guessing would be needed.

harmonize_warheads(hits, warhead_harmonisation, covalent_form=True)

Harmonises and marks the atoms with _Warhead Prop.

Parameters:
  • hits

  • warhead_harmonisation

  • covalent_form

Returns:

classmethod inventorize_warheads(hits: List[Mol], covalent_form: bool = True) List[str]

Get the warhead types of the list of hits

Parameters:
  • hits

  • covalent_form – Are the hits already covalent (with *)

Returns:

list of non-covalent, chloroacetamide, etc.

journal = <Logger Fragmenstein (DEBUG)>
classmethod make_all_warhead_combinations(smiles: str, warhead_name: str, canonical=True) dict | None

Convert an unreacted warhead to a reacted one in the SMILES

Parameters:
  • smiles – unreacted SMILES

  • warhead_name – name in the definitions

  • canonical – the SMILES canonical? (makes sense…)

Returns:

dictionary of SMILES

make_coordinate_constraints_for_combination()

See also cls.make_coordinate_constraints_for_placement. This operates based on atom.HasProp('_Novel'), not origins!

make_coordinate_constraints_for_placement(mol: Mol | None = None, origins: List[List[str]] | None = None, std: List[float] | None = None, mx: List[float] | None = None) str

See also make_coordinate_constraints_for_combination in combine. This is the normal function and uses the origin data, while the other constrains based on lack of novel attribute.

Parameters:
  • mol – self.monster.positioned_mol if omitted

  • origins – list of lists of names of the hit atoms used as original positions; self.monster.origin_from_mol(self.monster.positioned_mol) if omitted

  • std – list of standard deviations; self.monster.stdev_from_mol(mol) (with mol defaulting to self.monster.positioned_mol) if omitted

  • mx – list of maximum Euclidean distances; self.monster.max_from_mol(self.monster.positioned_mol) if omitted

Returns:

classmethod make_covalent(smiles: str, warhead_name: str | None = None) str | None

Convert an unreacted warhead to a reacted one in the SMILES

Parameters:
  • smiles – unreacted SMILES

  • warhead_name – name in the definitions. If unspecified, it will try to guess (less preferable)

Returns:

SMILES

make_output_folder()
make_pse(filename: str = 'combo.pse', extra_mols: Mol | None = None)

Save a pse in the relevant folder. This is the Victor one.

make_steps_pse(filename: str = 'step.pse')

Saves the steps in a pse file. For a more exhaustive file, use make_pse in Monster.

migrate_sw_origins(row: Series) Dict[str, Dict[int, int]]

Given a Victor object and a SmallWorld search result row, return the “custom_map”

monster_average_position = False
monster_mmff_minisation = True
monster_throw_on_discard = False
place(smiles: str, long_name: str = 'ligand', merging_mode='expansion', atomnames: Dict[int, str] | None = None, custom_map: Dict[str, Dict[int, int]] | None = None, extra_ligand_constraint: str = None)

Places a followup (smiles) into the protein based upon the hits. Do note that while Monster’s place accepts a mol and its place_smiles accepts a SMILES, Victor’s place accepts only a SMILES.

Parameters:
  • smiles – smiles of followup, optionally covalent (_e.g._ *CC(=O)CCC)

  • long_name – gets used for filenames so will get corrected

  • merging_mode

  • atomnames – an optional dictionary that gets used by Params.from_smiles

  • custom_map – see Monster.place and Monster.renumber_followup_custom_map

  • extra_ligand_constraint

Returns:

pose_mod_step()

This method is intended to make inherited modifications easier.

possible_definitions = [{'covalent': 'S[Au]*', 'covalent_atomnames': ['SY', 'AUX', 'CONN1'], 'name': 'aurothiol', 'noncovalent': 'S[Au]P(CC)(CC)CC', 'noncovalent_atomnames': ['SY', 'AUX', 'PL', 'CL1', 'CL2', 'CL3', 'CL4', 'CL5', 'CL6']}, {'covalent': 'C(O)*', 'covalent_atomnames': ['CX', 'OX', 'CONN1'], 'name': 'aldehyde', 'noncovalent': '[C:H1]=O', 'noncovalent_atomnames': ['CX', 'OX']}]
post_igor_step()

This method is intended to make inherited modifications easier.

post_monster_step()

This method is intended to make inherited modifications easier.

post_params_step()

This method is intended to make inherited modifications easier.

pre_igor_step()

This method is intended to make inherited modifications easier.
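These empty hook methods let a subclass inject behaviour at a fixed point of the pipeline without rewriting the pipeline itself. The sketch below shows the pattern on a stand-in class: PipelineSketch, its run method and the placement of the two hooks around a placeholder step are assumptions for illustration, not Victor's actual implementation.

```python
from typing import List


class PipelineSketch:
    """Stand-in for a class with overridable hook methods (not Victor itself)."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def pre_igor_step(self) -> None:
        """Hook: does nothing unless a subclass overrides it."""

    def post_igor_step(self) -> None:
        """Hook: does nothing unless a subclass overrides it."""

    def run(self) -> List[str]:
        self.pre_igor_step()
        self.events.append("minimise")  # placeholder for the real work
        self.post_igor_step()
        return self.events


class CheckedPipeline(PipelineSketch):
    def pre_igor_step(self) -> None:
        # Custom behaviour slotted in before the placeholder minimisation
        self.events.append("custom check before minimisation")


events = CheckedPipeline().run()
```

Only the overridden hook changes the run; the untouched hooks remain no-ops.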

property preminimized_mol: Mol

This cached property is the preminimised molecule. It extracts the neighbourhood (monster.get_neighborhood) and minimises the molecule (monster.mmff_minimize)

This property is used by the methods that plonk the molecule into the structure. No “positioning” as intended by Monster is done.

property preminimized_undummied_mol: Mol

See preminimized_mol. This strips the dummy atoms from the preminimised molecule.

quick_reanimate() float

Corrects small deviations from what the forcefield likes. Generally it flattens buckled rings and nothing more. reanimate is the normal procedure.

Returns:

quick_reanimation = False
reanimate() float

Calls Igor recursively until the ddG is negative or zero. igor.minimize does a good job; this is just to get everything as a normal molecule.

Returns:

ddG (kcal/mol)
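The stopping rule can be sketched as repeating the minimisation until the reported ddG is no longer positive. This is only an illustration, written as a loop rather than recursion: reanimate_sketch, the minimise callable, the max_cycles cap and the scores are all hypothetical, and the real method calls Igor rather than a plain function.

```python
from typing import Callable


def reanimate_sketch(minimise: Callable[[int], float], max_cycles: int = 10) -> float:
    """Repeat `minimise` until it returns a ddG <= 0 or the cycle cap is reached."""
    ddG = float("inf")
    for cycle in range(1, max_cycles + 1):
        ddG = minimise(cycle)
        if ddG <= 0:
            break  # negative or zero ddG: accept this result
    return ddG


# Made-up ddG values standing in for successive minimisation rounds
fake_scores = iter([4.2, 1.1, -3.5, -9.0])
ddG = reanimate_sketch(lambda cycle: next(fake_scores))
```

The cap on cycles is a design choice of the sketch to guarantee termination when the score never drops to zero or below.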

reanimate_n_store()
remove_other_hetatms = True
show_comparison(*args, **kwargs)
classmethod slack_me(msg: str) bool

Send message to a slack webhook

Parameters:

msg – Can be dirty and unicode-y.

Returns:

did it work?

Return type:

bool

classmethod slugify(name: str)
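Since long_name ends up in file names, it has to be reduced to filesystem-safe characters. The rule below (every run of characters other than letters, digits, underscore or hyphen becomes a single hyphen) is an assumption made for illustration; the exact substitution used by Victor.slugify is not documented here, and slugify_sketch is a hypothetical name.

```python
import re


def slugify_sketch(name: str) -> str:
    """Replace runs of filename-unsafe characters with one hyphen (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9_-]+", "-", name).strip("-")


slug = slugify_sketch("x0104 + x0305 (merger #1)")
```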
summarize()
to_3Dmol(print_legend: bool = False) py3Dmol.view
to_nglview(print_legend: bool = False) MolNGLWidget

Generates an NGLWidget (IPython.display.display will show it) with the compounds and the merged compound, if present. The colours will be those in the Chem.Mol’s property _color, if present; set that property to override them.

Returns -> nv.NGLWidget subclass

static to_simple_smiles(mol: Mol) str
property unconstrained_heavy_atoms: int
uses_pyrosetta = True
validate(reference_mol: Mol) Dict[str, float]

Get how well the results compare. Alternatively, do a docking with victor.dock() (-> Chem.Mol)

Parameters:

reference_mol – Crystal structure mol

Returns:

warhead_definitions = [{'constraint': 'AtomPair  H  143A  OZ  1B HARMONIC 2.1 0.2\n', 'covalent': 'C(=O)CC*', 'covalent_atomnames': ['CZ', 'OZ', 'CY', 'CX', 'CONN1'], 'name': 'acrylamide', 'noncovalent': 'C(=O)C=C', 'noncovalent_atomnames': ['CZ', 'OZ', 'CY', 'CX']}, {'constraint': 'AtomPair  H  145A  OY  1B HARMONIC 2.1 0.2\n', 'covalent': 'C(=O)C*', 'covalent_atomnames': ['CY', 'OY', 'CX', 'CONN1'], 'name': 'chloroacetamide', 'noncovalent': 'C(=O)C[Cl]', 'noncovalent_atomnames': ['CY', 'OY', 'CX', 'CLX']}, {'constraint': 'AtomPair  H  145A  NX  1B HARMONIC 2.1 0.2\n', 'covalent': 'C(=N)*', 'covalent_atomnames': ['CX', 'NX', 'CONN1'], 'name': 'nitrile', 'noncovalent': 'C(#N)', 'noncovalent_atomnames': ['CX', 'NX']}, {'constraint': 'AtomPair  H  143A  OZ1 1B HARMONIC 2.1 0.2\n', 'covalent': 'S(=O)(=O)CC*', 'covalent_atomnames': ['SZ', 'OZ1', 'OZ2', 'CY', 'CX', 'CONN1'], 'name': 'vinylsulfonamide', 'noncovalent': 'S(=O)(=O)C=C', 'noncovalent_atomnames': ['SZ', 'OZ1', 'OZ2', 'CY', 'CX']}, {'constraint': 'AtomPair  H  145A  CY  1B HARMONIC 2.1 0.2\n', 'covalent': 'C(=C)*', 'covalent_atomnames': ['CX', 'CY', 'CONN1'], 'name': 'bromoalkyne', 'noncovalent': 'C#C[Br]', 'noncovalent_atomnames': ['CX', 'CY', 'BRX']}]
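Each entry of warhead_definitions is a plain dictionary, so looking one up by name is a simple scan. The sketch copies two entries from the list above (leaving out the constraint key for brevity); find_warhead is a hypothetical helper and may differ from what get_warhead_definition actually does.

```python
from typing import Dict, List

# Two entries copied from warhead_definitions, without the 'constraint' key
WARHEADS: List[Dict[str, object]] = [
    {"name": "chloroacetamide",
     "covalent": "C(=O)C*",
     "covalent_atomnames": ["CY", "OY", "CX", "CONN1"],
     "noncovalent": "C(=O)C[Cl]",
     "noncovalent_atomnames": ["CY", "OY", "CX", "CLX"]},
    {"name": "nitrile",
     "covalent": "C(=N)*",
     "covalent_atomnames": ["CX", "NX", "CONN1"],
     "noncovalent": "C(#N)",
     "noncovalent_atomnames": ["CX", "NX"]},
]


def find_warhead(name: str) -> Dict[str, object]:
    """Return the definition whose 'name' matches, or raise ValueError."""
    for definition in WARHEADS:
        if definition["name"] == name:
            return definition
    raise ValueError(f"{name} is not a known warhead")


nitrile = find_warhead("nitrile")
```

In each definition the covalent SMILES carries the * attachment point, matched by the CONN1 entry in covalent_atomnames.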
work_path = 'output'
class fragmenstein.victor.MinimalPDBParser(block: str, remove_water=False, remove_other_hetatms=False, ligname='LIG')[source]

Bases: object

This purpose-built PDB parser simply fixes the serial numbers. The reason is that writing a custom 50-line class is easier than adding Biopython or another non-builtin package as a requirement. Importing the PDB into RDKit is inadvisable.

__init__(block: str, remove_water=False, remove_other_hetatms=False, ligname='LIG')[source]
append(other: MinimalPDBParser)[source]

Add a second parser data to it. But only its coordinates and connections.

get_atomname(entry: str) str[source]
get_chain(entry: str) str[source]
get_max_serial() int[source]
get_residue_index(entry: str) int[source]
get_residue_name(entry: str) int[source]
get_serial(entry: str) int[source]
has_residue_index(index: int, chain: str)[source]
has_residue_name(name: str)[source]

residue name, resn 3-letters

offset_connections(offset: int) None[source]
offset_serials(offset: int) None[source]
parse(block: str) None[source]
set_serial(entry: str, value: int) None[source]
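The accessor methods listed for MinimalPDBParser map naturally onto the fixed-column layout of PDB ATOM/HETATM records. The functions below are a minimal sketch of that idea using string slicing. They are written from the PDB format's column definitions, not copied from Fragmenstein; make_atom_line and the example record are made up, and unlike the documented set_serial (which returns None) the sketch returns a new string.

```python
def make_atom_line(serial: int, name: str, resn: str, chain: str,
                   resi: int, x: float, y: float, z: float) -> str:
    """Build a column-aligned ATOM record (up to the coordinates only)."""
    return (f"ATOM  {serial:>5} {name:<4} {resn:>3} {chain}{resi:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def get_serial(entry: str) -> int:
    return int(entry[6:11])            # columns 7-11


def get_atomname(entry: str) -> str:
    return entry[12:16].strip()        # columns 13-16


def get_residue_name(entry: str) -> str:
    return entry[17:20].strip()        # columns 18-20


def get_chain(entry: str) -> str:
    return entry[21]                   # column 22


def get_residue_index(entry: str) -> int:
    return int(entry[22:26])           # columns 23-26


def set_serial(entry: str, value: int) -> str:
    """Return a copy of the record with the serial field overwritten."""
    return f"{entry[:6]}{value:>5}{entry[11:]}"


entry = make_atom_line(7, " SG", "CYS", "A", 145, 10.0, 11.5, -3.25)
shifted = set_serial(entry, get_serial(entry) + 1000)  # e.g. offsetting serials
```

Offsetting serials in this way is what allows the atoms of a second block to be appended without clashing with the serial numbers of the first.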