Molecule properties
This file is for discussion of standards
SDFile
As of February 2021, Monster stores information in the property fields of the molecule. These are not written to MOL files, but they are written to SDF files.
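import tempfile
from rdkit import Chem

def _MolToSDBlock(mol: Chem.Mol) -> str:
    # SDWriter expects a file to write to, so hand it a throwaway temporary file
    with tempfile.NamedTemporaryFile(suffix='.sdf') as fh:
        w = Chem.SDWriter(fh.name)
        return w.GetText(mol)

# monkey-patch an SD counterpart of Chem.MolToMolBlock
Chem.MolToSDBlock = _MolToSDBlock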
mol = Chem.MolFromSmiles('*CCC')
mol.SetProp('_Name','Something')
mol.SetProp('_Hide','Nothing')
mol.SetProp('Show','Everything')
print('Mol file has no properties')
print(Chem.MolToMolBlock(mol))
print('SD file has non-private properties')
print(Chem.MolToSDBlock(mol))
Binary
A pickled Chem.Mol does not require sanitization, and a Chem.Mol object can be passed to a subprocess.
However, pickled Chem.Mol objects lose their properties,
hence the PropertyMol workaround (link).
Even so, the properties assigned to a Mol before its conversion to a PropertyMol are lost.
A better solution is therefore to use the ToBinary method of a Mol instance.
bstr = mol.ToBinary(propertyFlags=0b00010111)
# the file has to be opened for writing in binary mode
with open('out.b', 'wb') as fh:
    fh.write(bstr)
The binary string can be passed to a Chem.Mol:
Chem.Mol(bstr)
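For completeness, reading the file back might look like the sketch below. It assumes a temporary directory so that nothing is left behind, and the atom property `label` is made up purely for illustration.

```python
import os
import tempfile
from rdkit import Chem

mol = Chem.MolFromSmiles('CCO')
mol.SetProp('_Name', 'Something')
mol.GetAtomWithIdx(0).SetProp('label', 'first carbon')  # illustrative atom property

# Same flags as above: mol, atom and bond properties, private ones included
bstr = mol.ToBinary(propertyFlags=0b00010111)

path = os.path.join(tempfile.mkdtemp(), 'out.b')
with open(path, 'wb') as fh:
    fh.write(bstr)

# Read the bytes back and rebuild the molecule from them
with open(path, 'rb') as fh:
    restored = Chem.Mol(fh.read())

print(restored.GetProp('_Name'))
print(restored.GetAtomWithIdx(0).GetProp('label'))
```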
I could not find any documentation for the exact definitions of the property flags argument, but empirically I have figured out:
0b00000001 -> mol
0b00000010 -> atom
0b00000100 -> bond
0b00010001 -> mol, inc. private
I am guessing that it’s not a full octet of bits, but simply 5 bits. 0b1000 probably means "include computed".
I’d say this is probably the best way to store them, albeit non-standard.
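RDKit also exposes a `Chem.PropertyPickleOptions` enum whose member names appear to correspond to these bits. The sketch below prints the values so that the guesses above can be checked on the installed version rather than taken on trust, and then builds the same flag combination from names instead of a binary literal. That this enum is what `propertyFlags` expects is an assumption based on the matching names.

```python
from rdkit import Chem

opts = Chem.PropertyPickleOptions

# Print each named flag next to its value in binary notation
for name in ('MolProps', 'AtomProps', 'BondProps',
             'PrivateProps', 'ComputedProps', 'AllProps'):
    print(f'{name:>13} = {int(getattr(opts, name)):#b}')

# mol + atom + bond properties, private ones included
flags = (int(opts.MolProps) | int(opts.AtomProps)
         | int(opts.BondProps) | int(opts.PrivateProps))

mol = Chem.MolFromSmiles('CCO')
mol.SetProp('_Name', 'Something')
restored = Chem.Mol(mol.ToBinary(propertyFlags=flags))
print(restored.HasProp('_Name'))
```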
Here is proof of what I am saying:
from multiprocessing import Pool, cpu_count
from rdkit import Chem
from typing import *
# Make data
mols: List[Chem.Mol] = []
for smiles in ('CC', 'CCCC', 'COC'):
    mol = Chem.MolFromSmiles(smiles)
    mol.SetProp('_Name', 'Foo')
    mols.append(mol)
# define functions
def fun(mol: Chem.Mol) -> bool:
    # receives a pickled Chem.Mol
    return mol.HasProp('_Name')

def fun2(binary: bytes) -> bool:
    # receives the bytes from ToBinary and rebuilds the Chem.Mol
    mol = Chem.Mol(binary)
    return mol.HasProp('_Name')

def binarize(mol: Chem.Mol) -> bytes:
    return mol.ToBinary(propertyFlags=0b00010111)
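# define pool and test
# (the guard is needed on platforms that spawn rather than fork worker processes)
if __name__ == '__main__':
    with Pool(2, maxtasksperchild=1) as pool:
        print('regular', pool.map(fun, mols))
        print('binary', pool.map(fun2, map(binarize, mols)))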
This will show that a Chem.Mol passed directly to a worker has no properties, whereas one rebuilt from the binary string keeps them.
Custom
Alternatively, one could have a custom extractor to save the properties as JSON.
def get_properties(mol: Chem.Mol) -> Dict[str, Union[dict, List[dict]]]:
    # one dict for the molecule, plus one dict per atom and per bond
    data = dict(mol=mol.GetPropsAsDict(includePrivate=True, includeComputed=False),
                atoms=[atom.GetPropsAsDict(includePrivate=True, includeComputed=False) for atom in mol.GetAtoms()],
                bonds=[bond.GetPropsAsDict(includePrivate=True, includeComputed=False) for bond in mol.GetBonds()])
    return data
def set_properties(mol: Chem.Mol, data: Dict[str, Union[dict, List[dict]]]) -> None:
    def _assign(obj, k, v):
        if isinstance(v, str):
            obj.SetProp(k, v)
        elif isinstance(v, bool):
            # bool has to be tested before int, because bool is a subclass of int
            obj.SetBoolProp(k, v)
        elif isinstance(v, int):
            obj.SetIntProp(k, v)
        elif isinstance(v, float):
            obj.SetDoubleProp(k, v)
        elif v is None:
            pass
        else:
            obj.SetProp(k, str(v))

    for k, v in data['mol'].items():
        _assign(mol, k, v)
    for atom, atomdata in zip(mol.GetAtoms(), data['atoms']):
        for k, v in atomdata.items():
            _assign(atom, k, v)
    for bond, bonddata in zip(mol.GetBonds(), data['bonds']):
        for k, v in bonddata.items():
            _assign(bond, k, v)
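A round trip through JSON could then look roughly like the sketch below. To stay self-contained it uses compact stand-ins for the two functions above, restricted to molecule-level properties; `props_to_json` and `props_from_json` are hypothetical names, and `default=str` is an assumption to cope with any value that JSON cannot encode.

```python
import json
from rdkit import Chem

def props_to_json(mol: Chem.Mol) -> str:
    # hypothetical helper: molecule-level properties only
    props = mol.GetPropsAsDict(includePrivate=True, includeComputed=False)
    return json.dumps(props, default=str)  # fall back to str for odd values

def props_from_json(mol: Chem.Mol, text: str) -> None:
    # hypothetical helper: restore each value with the matching typed setter
    for key, value in json.loads(text).items():
        if isinstance(value, bool):      # bool before int: bool subclasses int
            mol.SetBoolProp(key, value)
        elif isinstance(value, int):
            mol.SetIntProp(key, value)
        elif isinstance(value, float):
            mol.SetDoubleProp(key, value)
        elif value is not None:
            mol.SetProp(key, str(value))

source = Chem.MolFromSmiles('CCO')
source.SetProp('_Name', 'Something')
source.SetProp('Show', 'Everything')
source.SetIntProp('count', 3)
source.SetDoubleProp('score', 1.5)

text = props_to_json(source)

# Apply the stored properties to a freshly parsed copy of the molecule
target = Chem.MolFromSmiles('CCO')
props_from_json(target, text)
print(target.GetPropsAsDict(includePrivate=True, includeComputed=False))
```

Unlike the binary string, the JSON stores only the properties, so the molecule itself still has to be saved separately (for example as SMILES or a MOL block).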