Generalised potentials
In Rosetta there are three atom types, the primary ones, used by the ref2015 scorefunction and others,
the CHARMM (MM) atom types, used nearly for reference only, and the generalised potential atom types
for the beta_genpot scorefunction.
The first atom types are protein specific, so in the generation of residue types (params files) one may encounter some problematic cases, for example a nitrogen in a tryptophan will be used as a proxy for a nitrile nitrogen. The generalised potentials are aimed at addressing these.
The topology file (params) generation module used by Fragmenstein is rdkit-to-params, which can assign generalised potential atom types.
from rdkit_to_params import Params
# acetonitrile
params = Params.from_smiles('CC#N', generic=True, name='CCN')
print(params.ATOM[2].rtype) # ``NG1`` vs. ``NtrR``
Whereas it is mostly robust, there are some corner cases where it can be problematic. For example, let’s assume we have a covalently bound ligand in a protein with a non-canonical amino acid. These will independently cause issues when read from a PDB file, which is the primary was Fragmenstein operates (primarily to allow inter-operability).
Startup:
import pyrosetta_help as ph
import pyrosetta
extra_options= ph.make_option_string(no_optH=False,
ex1=True,
ex2=True,
gen_potential=True,
score=dict(weights='beta_genpot_cart'),
#mute='all',
ignore_unrecognized_res=False, # raise error if unrecognized
load_PDB_components=False,
ignore_waters=False)
# weights can be one of ['beta_genpot', 'beta_genpot_cart', 'beta_genpot_cst', 'beta_genpot_soft']
pyrosetta.init(extra_options=extra_options)
# no idea why I get this, but it is not an issue:
# core.init.score_function_corrections: [ WARNING ] Flag -beta_nov16 is set but either -weights are also specified or this is being used with the genpot score function. Not changing input weights file!
# core.init.score_function_corrections: [ WARNING ] Flag -gen_potential is set but -weights are also specified. Not changing input weights file!
scorefxn = pyrosetta.get_fa_scorefxn()
assert scorefxn.get_name() == 'beta_genpot_cart'
NB. One can only declare a beta_genpot scorefunction if the option gen_potential is true and at first instantiation
of pyrosetta. This will not work:
pyrosetta.init()
pyrosetta.rosetta.basic.options.set_boolean_option('corrections:gen_potential', True)
pyrosetta.create_score_function('beta_genpot_cart')
A simple case of a protein and a ligand works fine, either constructed or read from file.
# ## Prep
pose = pyrosetta.pose_from_sequence('ELVISISALIVE')
ph.make_alpha_helical(pose)
codeine_smiles = 'CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)OC)O[C@H]3[C@H](doc_C=C4)O'
params = Params.from_smiles(codeine_smiles, generic=True, name='COD')
codeine_pose = params.to_pose()
pyrosetta.rosetta.core.pose.append_pose_to_pose(pose, codeine_pose, new_chain=True)
# ## minimise
cycles=3
relax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn, cycles)
movemap = pyrosetta.MoveMap()
movemap.set_chi(True)
movemap.set_bb(True)
movemap.set_jump(True)
relax.set_movemap(movemap)
relax.minimize_bond_angles(True)
relax.minimize_bond_lengths(True)
relax.apply(pose)
# ## round trip
params.dump('COD.gp.params')
pose.dump_pdb('test.pdb')
qose = ph.pose_from_file('test.pdb', ['COD.gp.params'])
relax.apply(qose)
# ## view
import nglview as nv
view = nv.show_rosetta(pose)
view.add_representation('hyperball')
view.download_image('codeine.png')
view
However, things get hairy with more complex cases: non-canonical amino acids and/or covalent ligands read from file.
# ## peptide with NCAA
pyrrolysine_smiles = 'O=C(NCCCC[C@@H](doc_C(=O)*)N*)[C@@H]1/N=C\C[C@H]1C'
params = Params.from_smiles(pyrrolysine_smiles, generic=True, name='PYR')
params.BACKBONE_AA.append('LYS') # as BACKBONE_AA is a singleton it overrides
params.IO_STRING.data[0].name1 = 'O' # equivalent to "appending" a new value
params.dump(f'PYR.gp.params')
pose = params.to_polymeric_pose(sequence='ACROSSLINK') # the alanine is to avoid an N-terminal patch
ph.make_alpha_helical(pose)
# ## covalent ligand
# the product of the nucleophilic addition of thiol to maleimide is a succinimide,
# the amine reactive succinimide is N-hydroxylsuccinimide (NHS)
xmethylsuccinimide_smiles = 'C1=C(*)C(=O)N(C)C1=O'
params = Params.from_smiles(xmethylsuccinimide_smiles, generic=True, name='XMS')
params.rename_atom(' C2 ', ' CX ')
params.dump(f'XMS.gp.params')
xmethylsuccinimide_pose = params.to_pose()
# ## merge and link
pyrosetta.rosetta.core.pose.append_pose_to_pose(pose, xmethylsuccinimide_pose, new_chain=True)
pyrosetta.rosetta.protocols.simple_moves.MutateResidue(target=2, new_res='CYS:S-conjugated').apply(pose)
pose.conformation().declare_chemical_bond(seqpos1=2, atom_name1=' SG ', seqpos2=11, atom_name2=' CX ')
assert 'LINK' in ph.get_pdbstr(pose)
print(f'The ligand {pose.residue(11).type().name()} has {pose.residue(11).connect_map_size()} connections '
f'which are correctly defined: {not pose.residue(11).connect_map(1).incomplete()}\n'
f'The residue {pose.residue(2).type().name()} has {pose.residue(2).connect_map_size()} connections')
This works.
The ligand XMS has 1 connections which are correctly defined: True
The residue CYS:S-conjugated has 3 connections
The cysteine is CYS:S-conjugated as opposed to SidechainConjugation because it just does not work otherwise
even in standard potentials mode.
However, on minimisation things go south:
relax.apply(pose)
[ ERROR ] Issue getting stub for atom atomno= 26 rsd= 4 -- possibly due to degenerate/colinear atoms
which results in the RunTimeError
Cannot create normalized xyzVector from vector of length() zero. Error occurred when trying to normalize the vector between points A and B. A=[1.506000,-5.628818,-21.391746], B=[1.506000,-5.628818,-21.391746].
Residue 4 is the NCAA. So let’s make it a lysine:
pyrosetta.rosetta.protocols.simple_moves.MutateResidue(target=4, new_res='LYS').apply(pose)
relax.apply(pose)
Nope:
ERROR: Error in core::scoring::methods::RamaPreProEnergy::residue_pair_energy(): The RamaPrePro term is incompatible with cyclic dipeptides (as is most of the rest of Rosetta).
Note that if the covalent bond is not there, it minimises fine most of the time with pyrrolysine. Also, reading from file does not work:
pose = ph.pose_from_file('test2.pdb', ['PYR.gp.params', 'XMS.gp.params'])
[ WARNING ] Patch SidechainConjugation implies it can apply to residue type CYZ, but actually applying it fails.
This non–S-conjugated pose will give the error:
ERROR: No structure mods allowed during scoring!
When any of these operations are done with ref2015 scorefunction (and parameterised with generic=False) it works fine.