Fragmenstein — full merger

This documenation may be out of date.

First the inspiration hits are merged regardless of horrid torsions and bond lengths via one of three modes
Then the followup compound is mapped on (with appropriate atom changes) and novel parts added
Then the compound is minimised in the protein with constraints to the mapped atoms.
.scaffold is the combined version of the hits (rdkit.Chem.Mol object).
.chimera is the combined version of the hits, but with differing atoms made to match the followup (rdkit.Chem.Mol object).
.positioned_mol is the desired output (rdkit.Chem.Mol object)

Note, the hits have to be in the same coordinate reference system —i.e. extracted from crystal structures in bond form.

.get_positional_mapping, which works also as a class method, creates a dictionary of mol_A atom index to mol_B atom index based on distance (cutoff 2Å) and not MCS.

The code works in two broad steps, first a scaffold is made, which is the combination of the hits (by position). Then the followup is placed. It is not embedded with constraint embed, which requires the reference molecule to have a valid geometry. .scaffold and .chimera and .positioned_mol absolutely do not have this. Novel side chains are added by superposing an optimised conformer against the closest 3-4 reference atoms. Note that .initial_mol is not touched. .positioned_mol may have lost some custom properties, but the atom idices are the same.

Algorithm

The following steps are done:

.simply_merge_hits(): merges the hits, the output rdkit.Chem.Mol object added as .scaffold.
.make_chimera(): makes the atomic elements in .scaffold match those in the followup, the output rdkit.Chem.Mol object added as .chimera.
.place_from_map() followup is places like the scaffold.

Merger

The merger of the hits does not use MCS. Instead all atoms are matched uniquely based on cartesian position with a cutoff of 2 Å (changeable). The method get_positional_mapping works as a class method, so can be used for other stuff. Here are the atoms in x0692 that map to x0305 and vice versa.

With maximum common substructure, a benzene ring can be mapped 6-ways, here none of that ambiguity is present although different issues arise. Then the next step is fragment the second molecule to get the bits to add.

The first fragment when added results in:

Ditto for the second:

And ditto again for a second fragment (x1249):

Elemental changes

During the elemental change, valence is taken into account resulting in appropriate positive charge. This step is needed to avoid weird matches with the followup.

Example

hits = [Chem.MolFromMolFile(f'../Mpro/Mpro-{i}_0/Mpro-{i}_0.mol') for i in ('x0692', 'x0305', 'x1249')]
followup = Chem.MolFromSmiles('CCNc1nc(CCS)c(C#N)cc1CN1C(CCS)CN(C(C)=O)CC1')
monster = Fragmenstein(followup, hits)
#monster = Fragmenstein(followup, hits, draw=True) for verbosity in a Jupyter notebook
monster.make_pse('test.pse')

display(monster.scaffold)
display(monster.chimera) # merger of hits but with atoms made to match the to-be-aligned mol
display(monster.positioned_mol) # followup aligned

# further alignments... not correct way of tho
monster.initial_mol = new_mol
aligned = monster.place_from_map(new_mol)

Issues to be aware of

See also work in progress

Here is an example with a few issues.

Non-overlapping fragments

Non-overlapping fragments are discarded. Ideally they should be joined using the followup compound as a template if possible. In the pictured case the SMILES submitted may not have been what was intended and should not be used connect the fragments.

More than 4 templates

When there are too many templates with large spread, these aren’t merged resulting in a spiderweb scaffold. This results in a non-unique mapping.