# Command line interface

Fragmenstein is at its strongest as a Python module, but it also has a command line interface.
The terminal command `fragmenstein` is the main entry point, and has the following subcommands:

* `utils`: A few utilities, such as minimising a PDB file.
* `monster`: stitching (by combining/placing) compounds regardless of protein, yielding monster molecules.
* `victor`: stitching (by combining/placing) compounds in a protein.
* `laboratory`: combinatorial operations.
* `pipeline`: The full pipeline, which does the whole thing.

These in turn have further subcommands, described in the sections below.

In addition to the command line arguments, there are environment variables that can be set (see the "Environment variables" section below).

## Utils

Example:

```bash
fragmenstein utils minimize -t apo_protein.pdb -o minimised.pdb;
fragmenstein utils minimize -t apo_protein.pdb -o minimised.pdb -first;  # only keep first chain
fragmenstein utils minimize -t apo_protein.pdb -o minimised.pdb -ed map.ccp4 -cw 10 -c 15;  # electron density map
```

Usage:

    usage: fragmenstein utils minimize [-h] -t TEMPLATE [-o OUTPUT] [-v] [-ed ELECTRON_DENSITY] [-cw CONSTRAINT_WEIGHT] [-c CYCLES] [-cf CONSTRAINT_FILE] [-first FIRST_CHAIN_ONLY]
    
    options:
      -h, --help            show this help message and exit
      -t TEMPLATE, --template TEMPLATE
                            Template PDB file
      -o OUTPUT, --output OUTPUT
                            output PDB folder
      -v, --verbose         verbose
      -ed ELECTRON_DENSITY, --electron_density ELECTRON_DENSITY
                            electron density map
      -cw CONSTRAINT_WEIGHT, --constraint_weight CONSTRAINT_WEIGHT
                            constraint weight
      -c CYCLES, --cycles CYCLES
                            number of cycles
      -cf CONSTRAINT_FILE, --constraint_file CONSTRAINT_FILE
                            constraint file
      -first FIRST_CHAIN_ONLY, --first_chain_only FIRST_CHAIN_ONLY
                            only keep first chain

## Monster combine

See [workings.md](doc_01_workings.html) for details.
As stated elsewhere, it is crucial that the parent hits are 3D embedded and in the same reference frame,
namely extracted from superposed structures.

Example:

```bash
fragmenstein monster combine -i mol1.mol mol2.mol >> combo.mol;
```

Usage:

    usage: fragmenstein monster combine [-h] [-v] -i HITS [HITS ...]
    
    options:
      -h, --help            show this help message and exit
      -v, --verbose         verbose
      -i HITS [HITS ...], --hits HITS [HITS ...]
                            hit mol files

## Monster place

See [workings.md](doc_01_workings.html) for details.

Example:

```bash
fragmenstein monster place -i mol1.mol mol2.mol -s 'Cn1cnc2c1c(=O)n(C)c(=O)n2C' >> placed.mol;
```

Usage:

    usage: fragmenstein monster place [-h] -s SMILES [-v] -i HITS [HITS ...] [-n NAME]
    
    options:
      -h, --help            show this help message and exit
      -s SMILES, --smiles SMILES
      -v, --verbose         verbose
      -i HITS [HITS ...], --hits HITS [HITS ...]
                            hit mol files
      -n NAME, --name NAME  output name of molecule

## Victor combine

    
See [workings.md](doc_01_workings.html) and [victor.md](extra_victor.html) for details.

As stated elsewhere, the template PDB (the receptor in docking parlance) needs to lack a ligand in the site of interest.

Example:

```bash
fragmenstein victor combine -i mol1.mol mol2.mol -t protein.pdb -o output_folder >> combo.mol;
```

Usage:

    usage: fragmenstein victor combine [-h] [-v] -i HITS [HITS ...] [-o OUTPUT] -t TEMPLATE
    
    options:
      -h, --help            show this help message and exit
      -v, --verbose         verbose
      -i HITS [HITS ...], --hits HITS [HITS ...]
                            hit mol files
      -o OUTPUT, --output OUTPUT
                            output root folder
      -t TEMPLATE, --template TEMPLATE
                            template PDB file

## Victor place

    
See [workings.md](doc_01_workings.html) and [victor.md](extra_victor.html) for details.

Example:

```bash
fragmenstein victor place -i mol1.mol mol2.mol -s 'Cn1cnc2c1c(=O)n(C)c(=O)n2C' -t protein.pdb -o output_folder >> placed.mol;
```

Usage:

    usage: fragmenstein victor place [-h] -s SMILES [-v] -i HITS [HITS ...] [-o OUTPUT] [-n NAME] -t TEMPLATE
    
    options:
      -h, --help            show this help message and exit
      -s SMILES, --smiles SMILES
      -v, --verbose         verbose
      -i HITS [HITS ...], --hits HITS [HITS ...]
                            hit mol files
      -o OUTPUT, --output OUTPUT
                            output root folder
      -n NAME, --name NAME  output name of molecule
      -t TEMPLATE, --template TEMPLATE
                            template PDB file


## Laboratory and pipeline subcommands

See [workings.md](doc_01_workings.html) and [pipeline.md](extra_pipeline.html) for details.

For laboratory and pipeline subcommands, the number of cores can be set.
When not specified, the number of cores of the machine is used, which is NOT necessarily the number of cores available to the job.
If more cores are requested than are available, each task runs slower and is more likely to hit the timeout,
so when running on a shared node in Slurm, set `-c $SLURM_CPUS_ON_NODE`.
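
As a minimal sketch of the advice above, the core count can be taken from the Slurm allocation when present and from the operating system otherwise. The variable name `N_CORES` is just a choice for this example, and `getconf _NPROCESSORS_ONLN` reports the cores of the whole machine, so the fallback only mirrors the default behaviour.

```bash
# Use the Slurm allocation if inside a job, else the machine's online core count.
N_CORES="${SLURM_CPUS_ON_NODE:-$(getconf _NPROCESSORS_ONLN)}"
echo "cores to pass to fragmenstein: ${N_CORES}"
```

The value can then be passed as `-c "$N_CORES"` to the laboratory subcommands or as `--n_cores "$N_CORES"` to the pipeline.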

### Laboratory combine
Example: 

```bash
fragmenstein laboratory combine -i hits.sdf ....;
```

Usage:

    usage: fragmenstein laboratory combine [-h] [-v] [-o OUTPUT] -t TEMPLATE -i INPUT [-d OUT_TABLE] [-s SDF_OUTFILE] [-c CORES] [-p RUN_PLIP] [--victor VICTOR]
    
    options:
      -h, --help            show this help message and exit
      -v, --verbose         verbose
      -o OUTPUT, --output OUTPUT
                            output root folder
      -t TEMPLATE, --template TEMPLATE
                            template PDB file
      -i INPUT, --input INPUT
                            input sdf file
      -d OUT_TABLE, --out-table OUT_TABLE
                            table output file
      -s SDF_OUTFILE, --sdf-outfile SDF_OUTFILE
                            sdf output file
      -c CORES, --cores CORES
                            number of cores to use
      -p RUN_PLIP, --run-plip RUN_PLIP
                            Run PLIP?
      --victor VICTOR       Which victor to use: Victor, OpenVictor or Wictor

### Laboratory place

Example:

```bash
...
```

Usage:

    usage: fragmenstein laboratory place [-h] [-v] [-o OUTPUT] -t TEMPLATE -i INPUT [-d OUT_TABLE] [-s SDF_OUTFILE] [-c CORES] [-p RUN_PLIP] [--victor VICTOR] -f IN_TABLE
    
    options:
      -h, --help            show this help message and exit
      -v, --verbose         verbose
      -o OUTPUT, --output OUTPUT
                            output root folder
      -t TEMPLATE, --template TEMPLATE
                            template PDB file
      -i INPUT, --input INPUT
                            input sdf file
      -d OUT_TABLE, --out-table OUT_TABLE
                            table output file
      -s SDF_OUTFILE, --sdf-outfile SDF_OUTFILE
                            sdf output file
      -c CORES, --cores CORES
                            number of cores to use
      -p RUN_PLIP, --run-plip RUN_PLIP
                            Run PLIP?
      --victor VICTOR       Which victor to use: Victor, OpenVictor or Wictor
      -f IN_TABLE, --in-table IN_TABLE
                            CSV table input file, requires `name`, `smiles` and space-separated-`hit_names`
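
As an illustration of the `--in-table` format described in the help text above, the following writes a one-row CSV. The file name `placements.csv`, the compound name `caffeine` and the hit names `hitA` and `hitB` are placeholders invented for this sketch, and it is assumed that the hit names correspond to molecule names in the input SDF.

```bash
# Write a minimal placement table: one row per compound to place.
# hit_names holds the parent hits, separated by spaces within the single CSV field.
cat > placements.csv <<'EOF'
name,smiles,hit_names
caffeine,Cn1cnc2c1c(=O)n(C)c(=O)n2C,hitA hitB
EOF
```

The table would then be passed with `-f placements.csv` alongside the other `fragmenstein laboratory place` arguments.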

### Pipeline

Example:

```bash
fragmenstein pipeline \
                      --template template.min.pdb \
                      --input hits.sdf \
                      --n_cores $SLURM_CPUS_ON_NODE \
                      --suffix _frag \
                      --max_tasks 5000 \
                      --sw_databases REAL-Database-22Q1.smi.anon \
                      --combination_size 2 \
                      --workfolder /tmp/frag \
                      --timeout 600;
```

Usage:

    usage: fragmenstein pipeline [-h] -t TEMPLATE -i INPUT [-o OUTPUT] [-r RANKING] [-c CUTOFF] [-q QUICK] [-d SW_DIST] [-l SW_LENGTH] [-b SW_DATABASES [SW_DATABASES ...]] [-s SUFFIX]
                             [--workfolder WORKFOLDER] [--victor VICTOR] [-n N_CORES] [-m COMBINATION_SIZE] [-k TOP_MERGERS] [-e TIMEOUT] [-x MAX_TASKS] [-z BLACKLIST] [-j WEIGHTS] [-v]

    options:
      -h, --help            show this help message and exit
      -t TEMPLATE, --template TEMPLATE
                            Template PDB file
      -i INPUT, --input INPUT
                            Hits SDF file
      -o OUTPUT, --output OUTPUT
                            Output folder
      -r RANKING, --ranking RANKING
                            Ranking method
      -c CUTOFF, --cutoff CUTOFF
                            Joining cutoff
      -q QUICK, --quick QUICK
                            Quick reanimation
      -d SW_DIST, --sw_dist SW_DIST
                            SmallWorld distance
      -l SW_LENGTH, --sw_length SW_LENGTH
                            SmallWorld length
      -b SW_DATABASES [SW_DATABASES ...], --sw_databases SW_DATABASES [SW_DATABASES ...]
                            SmallWorld databases. Accepts multiple
      -s SUFFIX, --suffix SUFFIX
                            Suffix for output files
      --workfolder WORKFOLDER
                            Location to put the temp files
      --victor VICTOR       Which victor to use: Victor, OpenVictor or Wictor
      -n N_CORES, --n_cores N_CORES
                            Number of cores
      -m COMBINATION_SIZE, --combination_size COMBINATION_SIZE
                            Number of hits to combine in one step
      -k TOP_MERGERS, --top_mergers TOP_MERGERS
                            Max number of mergers to followup up on
      -e TIMEOUT, --timeout TIMEOUT
                            Timeout for each merger
      -x MAX_TASKS, --max_tasks MAX_TASKS
                            Max number of combinations to try in a batch
      -z BLACKLIST, --blacklist BLACKLIST
                            Blacklist file
      -j WEIGHTS, --weights WEIGHTS
                            JSON weights file
      -v, --verbose         verbose


This does the following:

* places the reference hits against themselves and gets the PLIP interactions,
* combines the hits in the given combination size, while skipping blacklisted named compounds,
* searches [SmallWorld](https://sw.docking.org) for analogues of the top N mergers,
* places them, and
* ranks them based on a customisable multiobjective function, which takes into account the PLIP interactions
     along with the number of novel atoms (increase in risk & novelty).
 
This in effect reflects the pipeline I commonly use.

![pipeline](_static/pipeline-01.png)

The values for the pipeline command are:

* `template`: The template, a polished PDB. The template must not contain a ligand in the site of interest,
                although Fragmenstein accepts ligands elsewhere (e.g. metals, cofactors etc.).
                It is best to use a PyRosetta minimised template (i.e. one that has been through the wringer already).
* `input`: The hits in SDF format. These need to have unique names.
* `output`: The output folder
* `suffix`: The suffix for the output files. Note that due to `max_tasks` there will be multiple sequential files for some steps.
* `quick`: Does not reattempt "reanimation" if it failed as the constraints are relaxed more and more the more deviation happens.
* `blacklist`: A file with one line for each molecule name to skip (say `hitA–hitZ`)
* `cutoff`: The joining cutoff in Ångström after which linkages will not be attempted (default is 5Å)
* `sw_databases`: See SmallWorld or the [SmallWorld API in Python](https://github.com/matteoferla/Python_SmallWorld_API)
    for what datasets are available (e.g. 'Enamine-BB-Stock-Mar2022.smi.anon').
* `sw_length`: How many analogues for each query to keep
* `sw_dist`: The distance cutoff for the SmallWorld search
* `max_tasks`: To avoid memory issues, the pipeline runs the tasks in batches of at most `max_tasks`
    and processes each batch before starting the next. To disable this, use `--max_tasks 0`.
* `weights`: This is a JSON file that controls the ranking
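
Because the hits need unique names, it can be worth checking the SDF before a long run. The following is a rough sketch and not part of Fragmenstein: the function name `sdf_duplicate_names` is made up here, and it assumes a plain-text SDF with Unix line endings in which each record starts with its title line and ends with `$$$$`.

```bash
# Print every molecule title that occurs more than once in an SDF file.
# The title is the first line of the file and each line following a `$$$$` delimiter.
sdf_duplicate_names() {
  awk 'NR == 1 || prev == "$$$$" { print } { prev = $0 }' "$1" | sort | uniq -d
}
```

Calling `sdf_duplicate_names hits.sdf` prints nothing when all names are unique.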

This will minimise the first chain only, stripping waters and heterogens.

## Environment variables

To avoid having too many arguments, some default values can be set via environment variables.

A YAML file of settings can be specified via `$FRAGMENSTEIN_SETTINGS`.

The defaults can be seen in [settings.py](fragmenstein/settings.py).

In the YAML the setting names are lowercase, while as environment variables they are uppercase and prefixed with `FRAGMENSTEIN_`.
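
The naming rule can be sketched as follows, using `monster_throw_on_discard` as the example setting name. The shell variable names `key` and `env_name` are arbitrary choices for this illustration.

```bash
# Derive the environment variable name from a lowercase YAML setting name.
key='monster_throw_on_discard'
env_name="FRAGMENSTEIN_$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')"
echo "$env_name"  # FRAGMENSTEIN_MONSTER_THROW_ON_DISCARD
```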

Confusingly, for legacy reasons, the Victor argument `.monster_throw_on_discard` controls `Monster.throw_on_discard`,
so `$FRAGMENSTEIN_MONSTER_THROW_ON_DISCARD` will not have an effect on a `fragmenstein monster` run.

## Example slurm script

```bash
#!/bin/bash

# Some notes for myself on usage
# submit via:
# export EXPERIMENT='_foo'; export TEMPLATE='protein.pdb'; export HITS='hits.sdf';
# export COMBO=2; export SLACK_WEBHOOK='';
# sbatch /mtn/someshared-drive/frag.slurm.sh;

#SBATCH --job-name=fragmenstein
#SBATCH --chdir=/mtn/someshared-drive/mferla
#SBATCH --output=/mtn/someshared-drive/logs/slurm-error_%x_%j.log
#SBATCH --error=/mtn/someshared-drive/logs/slurm-error_%x_%j.log
#SBATCH --clusters=clustername-this-is-cluster-specific
#SBATCH --partition=this-is-kind-of-the-group-and-is-cluster-specific
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --export=NONE,SLACK_WEBHOOK,COMBO,EXPERIMENT,TEMPLATE,HITS

# -------------------------------------------------------
# Some slurm fluff for logs

export SUBMITTER_HOST=$HOST
export HOST=$( hostname )
export USER=${USER:-$(users)}
export DATA=/mtn/someshared-drive
export HOME=$DATA/$USER; # homeless
source /etc/os-release;

echo "Running $SLURM_JOB_NAME ($SLURM_JOB_ID) as $USER in $HOST which runs $PRETTY_NAME submitted from $SUBMITTER_HOST"
echo "Request had cpus=$SLURM_JOB_CPUS_PER_NODE mem=$SLURM_MEM_PER_NODE tasks=$SLURM_NTASKS jobID=$SLURM_JOB_ID partition=$SLURM_JOB_PARTITION jobName=$SLURM_JOB_NAME"
echo "Started at $SLURM_JOB_START_TIME"
echo "job_pid=$SLURM_TASK_PID job_gid=$SLURM_JOB_GID topology_addr=$SLURM_TOPOLOGY_ADDR home=$HOME cwd=$PWD"

# -------------------------------------------------------
# Some conda fluff
# assuming you have an appropriate conda installed in `$HOME/conda-slurm`
source $HOME/conda-slurm/etc/profile.d/conda.sh
conda activate base;  # or whatever

pwd;
export EXPERIMENT=${EXPERIMENT:-'fragduo'}
export COMBO=${COMBO:-2}
export TEMPLATE=${TEMPLATE:-'template.pdb'}
export HITS=${HITS:-'hits.sdf'}
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$CONDA_PREFIX;
#export N_CORES=$(cat /proc/cpuinfo | grep processor | wc -l);
export N_CORES=$SLURM_CPUS_ON_NODE;

nice -19 fragmenstein pipeline \
                      --template $TEMPLATE \
                      --input $HITS \
                      --n_cores $(($N_CORES - 1)) \
                      --suffix _$EXPERIMENT \
                      --max_tasks 5000 \
                      --sw_databases REAL-Database-22Q1.smi.anon \
                      --combination_size $COMBO \
                      --workfolder /tmp/$EXPERIMENT \
                      --timeout 600;

rm -rf /tmp/$EXPERIMENT;

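# notify Slack only if a webhook was provided (an empty URL would make curl fail)
if [ -n "$SLACK_WEBHOOK" ]; then
  curl -X POST -H 'Content-type: application/json' --data '{"text":"'$EXPERIMENT' x2 complete"}' "$SLACK_WEBHOOK"
fi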
# -------------------------------------------------------

echo 'complete'

exit 0

```