fragmenstein.m_rmsd submodule

Module contents

Combined RMSD

class fragmenstein.m_rmsd.mRMSD(followup: Mol, hits: Sequence[Mol], mappings: List[List[Tuple[int, int]]])[source]

Bases: object

The molecules are not superposed (‘aligned’) before the RMSD is calculated, and the value is in Å.

This RMSD is calculated differently from the usual one. The built-in RMSD calculation in RDKit (Chem.rdMolAlign.GetBestRMS) aligns the two molecules, whereas this one does not align/superpose them. It also deals with the case of multiple hits. As a comparison, for a Euclidean distance the square root of the sum of the squared differences in each coordinate is taken, while for a regular RMSD the still-squared distances are averaged before taking the root. Here the average is taken across all the atom pairs between each hit and the followup. Therefore, atoms in the followup that derive from multiple atoms in the blended molecule are scored multiple times.

\[\sqrt{\frac{\sum_{i}^{N_{\rm{hits}}} \left( \sum_{i}^{n} (q_{i,\rm{x}} - h_{i,\rm{x}})^2 + (q_{i,\rm{y}} - h_{i,\rm{y}})^2 + (q_{i,\rm{z}} - h_{i,\rm{z}})^2 \right)}{n\cdot m}}\]
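The averaging described above can be illustrated with a minimal pure-Python sketch. It is not the class's implementation: `combined_rmsd` and `squared_distance` are hypothetical names, coordinates are passed as plain tuples rather than as RDKit molecules, and the denominator is assumed to be the total number of mapped atom pairs, following the prose description.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def squared_distance(a: Coord, b: Coord) -> float:
    """Sum of the squared differences in x, y and z (no root taken)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def combined_rmsd(followup: Sequence[Coord],
                  hits: Sequence[Sequence[Coord]],
                  mappings: Sequence[List[Tuple[int, int]]]) -> float:
    """Hypothetical helper: RMSD over every (followup atom, hit atom) pair.

    No superposition is done. ``mappings[k]`` holds the
    (followup index, hit index) pairs for ``hits[k]``.
    """
    total = 0.0
    n_pairs = 0
    for hit, mapping in zip(hits, mappings):
        for followup_idx, hit_idx in mapping:
            total += squared_distance(followup[followup_idx], hit[hit_idx])
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no mapped atom pairs")
    # Assumption: the average runs over all pairs, so an atom mapped
    # to two hits contributes twice.
    return math.sqrt(total / n_pairs)


followup = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
hit_a = [(0.0, 0.0, 1.0)]
hit_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
# Followup atom 0 maps to an atom in both hits, so it is scored twice.
mappings = [[(0, 0)], [(0, 0), (1, 1)]]
value = combined_rmsd(followup, [hit_a, hit_b], mappings)
```

In this toy case the three pairs have squared distances 1, 1 and 0 Å², so the result is the square root of 2/3.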

HIT_BASED = 0
IDENTITY_BASED = 2
XYZ_BASED = 1
__init__(followup: Mol, hits: Sequence[Mol], mappings: List[List[Tuple[int, int]]])[source]

This is not meant to be called directly. mappings is a list of length len(hits); each item is a list of tuples of atom indices that go from followup to hit.

The hit _Name must match that in the origin! See the corrected output of monster.origin_from_mol() or cls.get_origins(to-be-scored-mol, annotated).

Parameters:
  • followup – the followup compound

  • hits – the fragment hits

  • mappings – a complicated affair…

calculate_msd(molA, molB, mapping) float[source]

The mean squared deviation (MSD), i.e. the RMSD without the final square root.

Parameters:
  • molA

  • molB

  • mapping – lists of tuples of atom idx that go from molA to molB

Returns:

the MSD (the RMSD before the root is taken)

calculate_rmsd(molA, molB, mapping) float[source]
classmethod copy_all_possible_origins(annotated: Mol, target: Mol) Tuple[List[Mol], List[List[int]]][source]

Monster leaves a note of what it did: the atom property _Origin is a JSON list of entries of the form mol _Name, dot, atom index. The atom order seems to be maintained, but I don't trust it. Also, dummy atoms are stripped.

Parameters:
  • annotated

  • target

Returns:

a list of mols and a list of origins (each itself a list)
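The _Origin note described above can be decoded with a few lines of standard-library Python. This is a minimal sketch, assuming each entry is a string of the form `<mol _Name>.<atom index>`; `parse_origin` is a hypothetical helper, not part of the class.

```python
import json
from typing import List, Tuple


def parse_origin(origin_json: str) -> List[Tuple[str, int]]:
    """Hypothetical helper: decode one atom's _Origin value.

    Assumes a JSON list of '<mol name>.<atom index>' strings.
    """
    parsed = []
    for entry in json.loads(origin_json):
        # Split on the last dot only, since a mol name may itself contain dots.
        name, idx = entry.rsplit('.', 1)
        parsed.append((name, int(idx)))
    return parsed


# Made-up example values, not taken from a real Monster run.
parsed = parse_origin('["hit-A.3", "hit.B.7"]')
no_origin = parse_origin('[]')
```

Splitting on the last dot is a design choice for the sketch: it keeps names such as `hit.B` intact while still recovering the trailing atom index.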

classmethod copy_origins(annotated: Mol, target: Mol)[source]

> This is no longer used by from_other_annotated_mols.

Monster leaves a note of what it did: the atom property _Origin is a JSON list of entries of the form mol _Name, dot, atom index. The atom order seems to be maintained, but I don't trust it. Also, dummy atoms are stripped.

Parameters:
  • annotated

  • target

Returns:

a list of origins

classmethod from_annotated_mols(annotated_followup: Mol, hits: Sequence[Mol] | None = None) mRMSD[source]

Monster leaves a note of what it did: the atom property _Origin is a JSON list of entries of the form mol _Name, dot, atom index. This classmethod accepts a followup that has this annotation.

Parameters:
  • annotated_followup

  • hits

Returns:

classmethod from_internal_xyz(annotated_followup)[source]

This is an alternative for when the atoms have the properties _x, _y and _z.

Parameters:

annotated_followup

Returns:

classmethod from_other_annotated_mols(followup: Mol, hits: Sequence[Mol], annotated: Mol) mRMSD[source]

The two molecules are the same (atom names included) but the followup lacks annotations.

classmethod from_unannotated_mols(moved_followup: Mol, hits: Sequence[Mol], placed_followup: Mol) mRMSD[source]

Mapping is done by positional overlap between placed_followup and hits. This mapping is then applied to moved_followup. The mapping is not between placed and moved; rather, the former acts as a go-between.

Parameters:
  • moved_followup – The mol to be scored

  • hits – the hits to score against

  • placed_followup – the mol to determine how to score

Returns:

classmethod from_unannotated_same_mols(moved_followup: Mol, placed_followup: Mol) mRMSD[source]

Mapping is not done by positional overlap between placed_followup and moved_followup, but by the shape overlap that yields the lowest RMSD. This means that every atom matches regardless of hits.

Parameters:
  • moved_followup – The mol to be scored

  • placed_followup – the mol to determine how to score

Returns:

static generate_overlap_mapping(mol_a: Mol, mol_b: Mol) List[Tuple[int, int]][source]
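A mapping by positional overlap, as mentioned for from_unannotated_mols, can be sketched as follows. This is only an illustration under stated assumptions: `overlap_mapping` is a hypothetical function working on plain coordinate tuples, and the 0.5 Å cutoff and nearest-atom rule are guesses rather than the behaviour of generate_overlap_mapping itself.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def overlap_mapping(coords_a: Sequence[Coord],
                    coords_b: Sequence[Coord],
                    cutoff: float = 0.5) -> List[Tuple[int, int]]:
    """Hypothetical helper: pair each atom of A with its nearest atom of B.

    A pair is kept only if the two atoms lie within ``cutoff`` Å
    (an assumed value) of each other.
    """
    mapping = []
    for idx_a, a in enumerate(coords_a):
        best_idx, best_dist = None, cutoff
        for idx_b, b in enumerate(coords_b):
            d = math.dist(a, b)  # Euclidean distance, Python 3.8+
            if d <= best_dist:
                best_idx, best_dist = idx_b, d
        if best_idx is not None:
            mapping.append((idx_a, best_idx))
    return mapping


coords_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 5.0, 5.0)]
coords_b = [(1.4, 0.0, 0.0), (0.1, 0.0, 0.0)]
mapping = overlap_mapping(coords_a, coords_b)
```

The third atom of `coords_a` has no neighbour within the cutoff, so it is left out of the mapping.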
classmethod is_origin_annotated(mol: Mol) bool[source]

This checks the atom-level annotation, not the molecule-level one.

classmethod is_xyz_annotated(mol: Mol) bool[source]
classmethod migrate_origin(mol: Mol, tag='_Origin') Mol[source]

The origin list may have been saved as a molecule property rather than as an atom property, for example when the molecule was saved as a mol.

Parameters:
  • mol – mol to fix

  • tag – name of prop

Returns:

the same mol

classmethod mock()[source]
classmethod overannotate(mol: Mol, hits: List[Mol], priority=('mol_origin', 'atom_origin', 'xyz'))[source]

Unfortunately, in an attempt to make users happy, I have to make the code more complicated. There are three different annotation systems going on. The first is the property ‘_Origin’ in the Chem.Mol. The second is the property ‘_Origin’ in the Chem.Atom. The third is the properties ‘_x’, ‘_y’, ‘_z’ in the Chem.Atom.

This method will take the first priority that is available and propagate it to the other two.

Parameters:
  • mol – the annotated molecule

  • hits – the hits that are used to annotate the molecule

  • priority – the priority of the annotation, a sequence of up to four strings: ‘mol_origin’, ‘atom_origin’, ‘xyz’, ‘surrender’
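The "first available priority" rule can be sketched in plain Python. The names here are hypothetical: `first_available` and the `available` dictionary stand in for whatever checks the class performs (such as is_origin_annotated and is_xyz_annotated), and the handling of ‘surrender’ is deliberately left out because its behaviour is not described above.

```python
from typing import Dict, Optional, Sequence


def first_available(available: Dict[str, bool],
                    priority: Sequence[str] = ('mol_origin', 'atom_origin', 'xyz')
                    ) -> Optional[str]:
    """Hypothetical helper: return the first annotation system that is present.

    ``available`` maps a system name to whether the molecule carries it.
    """
    for system in priority:
        if available.get(system, False):
            return system
    return None  # nothing to propagate from


# The molecule-level origin is missing, so the atom-level origin wins.
choice = first_available({'mol_origin': False, 'atom_origin': True, 'xyz': True})
# A caller-supplied order changes which system is used.
xyz_first = first_available({'atom_origin': True, 'xyz': True},
                            priority=('xyz', 'atom_origin'))
```

Once a system is chosen, the method described above would copy its information to the other two; that propagation step is not shown here.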