Tag Archives: Malaria

Ring Closing at 35˚C (HASJK-2-11)

3/28/2017

Reference: http://malaria.ourexperiment.org/triazolopyrazine_se/14086/Synthesis_of_5chloro34difluoromethoxyphenyl124triazolo43apyrazine_AEW_2056.html

Scheme

Smiles: ClC1=C(C)N=CC2=NN=C(C3=CC=C(OC(F)F)C=C3)N21

 (Scheme is not showing up in ELN. See attachment)

3/28/17-Ring Closing Take #2

The reaction is going to be run with more PhI(OAc)2.

| Reagent | mass (g) | mols | Molecular Weight (g/mol) | Volume (mL) | Density (g/mL) |
| --- | --- | --- | --- | --- | --- |
| Product Dimethyl Benzaldehyde Rxn | 0.2 | 0.00064 | 312.70 | | |
| CH2Cl2 | 8.48 | 0.101 | 84.930 | ~7.0 | 1.325 |
| PhI(OAc)2 | 0.4123 | 0.00128 | 322.10 | | |
| Product Dimethyl PhI(OAc)2 Rxn | | | 310.060 | | |

Reaction began at 9:15am. The solid reagent was yellow. The reagent dissolved easily in DCM and the solution turned a translucent yellow. When PhI(OAc)2 was added there was no change to the solution.
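
The mole quantities in the reagent table follow from mass divided by molecular weight. The sketch below is only an illustrative check using the masses and molecular weights recorded in the table; the variable names are my own, and "equivalents" is taken relative to the starting material (the "Product Dimethyl Benzaldehyde Rxn" row).

```python
# Sketch: recompute moles and equivalents from the masses and molecular
# weights recorded in the reagent table (values copied from the table).

def moles(mass_g, mw_g_per_mol):
    """Amount of substance in mol from mass (g) and molecular weight (g/mol)."""
    return mass_g / mw_g_per_mol

substrate_mol = moles(0.2, 312.70)    # starting material row of the table
oxidant_mol = moles(0.4123, 322.10)   # PhI(OAc)2 row of the table

# Equivalents are expressed relative to the starting material.
equivalents = oxidant_mol / substrate_mol

print(f"starting material: {substrate_mol * 1000:.2f} mmol")
print(f"PhI(OAc)2: {oxidant_mol * 1000:.2f} mmol")
print(f"equivalents of PhI(OAc)2: {equivalents:.2f}")
```

With these inputs the oxidant works out to roughly two equivalents, which is consistent with the note that this run uses more PhI(OAc)2.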

3/30/2017

  1. Reaction was stopped at 8:30 on 3/30/17. The solution was translucent yellow and had a viscosity similar to water.

  2. Added saturated aqueous sodium hydrogen carbonate solution (10 mL).

  3. White solid formed in the aqueous layer.

  4. Drained the organic layer.

  5. Washed the aqueous layer twice with 10 mL DCM.

  6. Combined the organic layers and washed twice with 10 mL of saturated aqueous sodium hydrogen carbonate.

  7. Organic layer was then dried over sodium sulfate.

4/1/17

  9. Sodium sulfate was removed by gravity filtration. The organic layer was yellow.

  10. Organic layer was then rotovapped to afford a yellow oil that quickly became crystalline.

 

Open = opened ring

Phi = PhI(OAc)2

Co = co-spot

Closed = closed-ring product

Silica gel plate

50/50 hexane to ethanol


Room Temperature Ring Closing (HASJK-2-10)

Reaction was run on 2/17/2017.

Reference

Scheme

Smiles: ClC1=C(C)N=CC2=NN=C(C3=CC=C(OC(F)F)C=C3)N21

 (Scheme is not showing up in ELN. See attachment)

Table of Reagents (Ring-closing)

         

| Reagent | mass (g) | mols | Molecular Weight (g/mol) | Volume (mL) | Density (g/mL) |
| --- | --- | --- | --- | --- | --- |
| Product Dimethyl Benzaldehyde Rxn | 0.2 | 0.00064 | 312.70 | | |
| CH2Cl2 | 8.48 | 0.101 | 84.930 | ~6.4 | 1.325 |
| PhI(OAc)2 | 0.2237 | 0.00069 | 322.10 | | |
| Product Dimethyl PhI(OAc)2 Rxn | | | 310.060 | | |

The product from the last step (0.2 g, 0.64 mmol), (E)-2-chloro-6-(2-(4-(difluoromethoxy)benzylidene)hydrazineyl)-3,5-monomethylpyrazine, was combined with CH2Cl2 (6.4 mL) while stirring in a round-bottom flask at room temperature. PhI(OAc)2 (0.221 g, 0.32 mmol, 0.5 equiv) was then added under argon. The solution was left to stir overnight. At the beginning of stirring, the solution was pale yellow. Stirring began at 9:15am.

2/21/17- Ring Closing of the MonoMethyl work up

Stirring was turned off after 119 hours. Reaction progress was checked by TLC to assess completion: silica gel plate, 50/50 EtOAc/hexane, UV active. See TLC.

Silica gel plate

S = starting material

Phi = PhI(OAc)2

Closed = closed-ring product

Co = co-spot

Active under UV light.



OSM modelling project negative results

INTRODUCTION

This post presents the negative results that have led to the final design of my semi-supervised project submission. While the following experiments appear disjointed individually, they were intended to be combined into a generative classification paradigm with the following experimental design:

  1. Train baseline multitask classification models for classifying Ion Regulation activity (One shot machine learning)
  2. Train a generative model to sample new Series 4 molecules (VAE)
  3. Rank the best synthetic Series 4 molecules based on approximate Ligand Lipophilicity Efficiency (aLLE). aLLE is calculated from pEC50 - cLogP. The pEC50 values would be calculated from a regression model with the assumption that pEC50 values are correlated with Ion Regulation Activity (PGN). Select the best ranking synthetic Series 4 molecules
  4. Classify the synthetic Series 4 compounds and compile these results as a separate task to support training a final classification model
  5. Train a final multitask classification model incorporating the additional synthetic data
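
The ranking in step 3 can be sketched as follows. This is a minimal illustration of aLLE = pEC50 - cLogP, not the code used in the project: the candidate names, EC50 values and cLogP values are hypothetical, and in practice the pEC50 would come from the regression model and the cLogP from a cheminformatics toolkit.

```python
import math

def pec50_from_ec50_nm(ec50_nm):
    """Convert an EC50 in nanomolar to pEC50 = -log10(EC50 in mol/L)."""
    return -math.log10(ec50_nm * 1e-9)

def alle(pec50, clogp):
    """Approximate Ligand Lipophilicity Efficiency: pEC50 - cLogP."""
    return pec50 - clogp

# Hypothetical candidates: (name, predicted EC50 in nM, cLogP).
candidates = [("mol_a", 100.0, 3.0), ("mol_b", 10.0, 4.5), ("mol_c", 1000.0, 1.5)]

# Rank from highest to lowest aLLE.
ranked = sorted(
    ((name, alle(pec50_from_ec50_nm(ec50), clogp)) for name, ec50, clogp in candidates),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: aLLE = {score:.2f}")
```

Note that the most potent hypothetical molecule (mol_b) ranks last here because its higher lipophilicity outweighs the potency gain.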

This approach aimed to utilise all the available data to expand the applicability domain of any classification model to be more predictive of unseen compounds.

NEGATIVE RESULTS

One-shot models from DeepChem were implemented using the competition and semi-supervised datasets to predict the Ion Regulation activity of the molecules in the test set. While a ROC AUC of about 0.7 was initially achieved with these models, they do not output predictions for individual molecules, which is a requirement for the competition. Since my entry into this competition was fairly late (and well after when it should have finished), there was no time to analyse the open source code to make it work. As such, development of these models was discontinued because they were too opaque to analyse.

Multiclass prediction was not well supported by DeepChem at the time, so OneVsAll and OneVsOne hacks were devised to enable multiclass classification with multitask binary classification models. These hacks were not deployed due to the limited amount of data available for the “Slightly active” class and the discontinuation of the one-shot models that could have utilised this limited amount of data to make useful predictions.
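
For readers unfamiliar with the idea, the sketch below shows one way multiclass labels can be recast as binary tasks for a multitask model. It is an illustration under my own assumptions, not the hack that was actually written: the function names are hypothetical, and the use of None to mark molecules that a pairwise task should ignore is just one possible masking convention.

```python
from itertools import combinations

CLASSES = ["Inactive", "Slightly active", "Active"]

def one_vs_all(labels, classes=CLASSES):
    """One binary task per class: 1 if the molecule is in that class, else 0."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def one_vs_one(labels, classes=CLASSES):
    """One binary task per class pair; molecules outside the pair get None
    so that a multitask model can mask them out of that task."""
    tasks = {}
    for a, b in combinations(classes, 2):
        tasks[(a, b)] = [1 if y == a else 0 if y == b else None for y in labels]
    return tasks

labels = ["Active", "Inactive", "Slightly active", "Active"]  # hypothetical
ova = one_vs_all(labels)
ovo = one_vs_one(labels)
print(ova["Active"])
print(ovo[("Inactive", "Active")])
```

Each key of either dictionary would become one task column in a multitask binary classifier.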

A Variational Autoencoder (VAE) was trained on ~18,000 SMILES structures from a combined Nature, Novartis, and OSM anti-malarial screening dataset with the aim of sampling additional Series 4 structures (triazolopyrazines) for the semi-supervised classification model. Unfortunately, this model only achieved 70% accuracy which is less than the 95% accuracy found in the literature for drug-like molecules [1]. This factor, coupled with the lack of time, meant the VAE was not used for sampling any molecules.

A Progressive Neural Network model for predicting EC50 values was trained. Since a previous analysis found EC50 values to correlate with Ion Regulation Activity, it was hypothesised this model could aid in the selection of additional Series 4 molecules sampled by the VAE. This model addressed the overfitting found in prior results by only utilising molecules annotated with Ion Regulation activity, resulting in similar, and sane, internal and external validation error metrics. While this model performed marginally better than other models with 0.64 MAE, the discontinuation of the VAE made this model redundant. 

REFERENCES

[1] Automatic chemical design using a data-driven continuous representation of molecules https://arxiv.org/abs/1610.02415

Summary Report – Chiral Resolution of MMV669844 (AEW 313)

At the start of the Series 4 campaign, OSM inherited data for MMV669844, an enantioenriched compound. However, no ee data were obtained.

David Edwards and Mark Butler from The University of Queensland performed a chiral resolution and the full report is attached here: 

Purification of AEW313-1 Report 20March17.docx


Ion Regulation Assay Classification

I developed a gradient boosting model (using xgboost) to predict actives and non-actives for the PfATP4 ion regulation assay. I sampled the data to include only compounds in the vicinity of the OSM S4 compounds.

A network of compounds was useful for analysing this data set. I calculated a matrix of Tanimoto similarities based on ECFP4 fingerprints and used a threshold of 0.28, following the approach described in (Zahoránszky-Kőhalmi et al. 2016) [1]. Here is what it looks like, with nodes colored by origin:
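
As a rough illustration of how such a network is built, the sketch below links two compounds whenever their Tanimoto similarity reaches the threshold. It is written in Python rather than R, uses small hypothetical sets of on-bit indices in place of real ECFP4 fingerprints, and assumes the threshold is inclusive; none of this is taken from the original analysis code.

```python
from itertools import combinations

def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0

def similarity_edges(fingerprints, threshold=0.28):
    """Return (id_a, id_b, similarity) for every pair at or above the threshold."""
    edges = []
    for (id_a, fp_a), (id_b, fp_b) in combinations(fingerprints.items(), 2):
        sim = tanimoto(fp_a, fp_b)
        if sim >= threshold:
            edges.append((id_a, id_b, sim))
    return edges

# Hypothetical on-bit sets standing in for ECFP4 fingerprints.
fps = {
    "cmpd_1": {1, 5, 9, 12},
    "cmpd_2": {1, 5, 9, 40},
    "cmpd_3": {7, 8, 100},
}
edges = similarity_edges(fps)
print(edges)
```

The resulting edge list can be handed to any graph library for layout and colouring by compound origin.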

OSM Series 4 Competition, compounds network by origin

 

OSM Series 4 Competition, nodes colored by Ion Regulation Assay column

I selected 156 compounds in the neighbourhood of OSM Series 4 based on the network above and performed 2 rounds of 10-fold cross validation using the R package caret and grid search for parameters. The results of the best model are shown below:  

The final values used for the Gradient Boosting Model were nrounds = 50, max_depth = 1, eta = 0.3, gamma = 0, colsample_bytree = 0.6 and min_child_weight = 1.

Confusion Matrix and Statistics:

 

                      Reference
Prediction Inactive Partial Active
  Inactive       12       1      0
  Partial         0       1      0
  Active          1       2     18

Overall Statistics

               Accuracy : 0.8857         
                 95% CI : (0.7326, 0.968)
    No Information Rate : 0.5143         
    P-Value [Acc > NIR] : 3.724e-06      
                                         
                  Kappa : 0.7923         
 Mcnemar's Test P-Value : 0.2615

  Statistics by Class:

                     Class: Inactive Class: Partial Class: Active
Sensitivity                   0.9231        0.25000        1.0000
Specificity                   0.9545        1.00000        0.8235
Pos Pred Value                0.9231        1.00000        0.8571
Neg Pred Value                0.9545        0.91176        1.0000
Prevalence                    0.3714        0.11429        0.5143
Detection Rate                0.3429        0.02857        0.5143
Detection Prevalence          0.3714        0.02857        0.6000
Balanced Accuracy             0.9388        0.62500        0.9118
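
The accuracy and kappa values reported above can be reproduced directly from the confusion matrix. The following is a small Python sketch (the original analysis used R and caret); the counts are copied from the matrix above and the variable names are my own.

```python
# Rows = predicted class, columns = reference class (Inactive, Partial, Active),
# copied from the confusion matrix above.
matrix = [
    [12, 1, 0],
    [0, 1, 0],
    [1, 2, 18],
]

n = sum(sum(row) for row in matrix)
observed = sum(matrix[i][i] for i in range(3)) / n  # overall accuracy

row_totals = [sum(row) for row in matrix]
col_totals = [sum(matrix[r][c] for r in range(3)) for c in range(3)]
# Agreement expected by chance, from the marginal totals.
expected = sum(row_totals[i] * col_totals[i] for i in range(3)) / n**2

# Cohen's kappa: agreement beyond chance, scaled by the maximum possible.
kappa = (observed - expected) / (1 - expected)
print(f"accuracy = {observed:.4f}, kappa = {kappa:.4f}")
```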

 

 

[1] Zahoránszky-Kőhalmi, Gergely, Cristian G. Bologa, and Tudor I. Oprea. 2016. “Impact of Similarity Threshold on the Topology of Molecular Similarity Networks and Clustering Outcomes.” Journal of Cheminformatics 8 (1): 16. doi:10.1186/s13321-016-0127-5.

Final PfATP4 Ion Regulation Activity Assay Classification Model Submission

INTRODUCTION

A classification model for the PfATP4 Ion Regulation Assay was experimentally selected from various neural network architectures, sampling strategies, and featurisations.  

GOAL 

This project aimed to create a classification model for the PfATP4 Ion Regulation Assay that would be predictive for the Series 4 OSM compounds within the provided dataset, as well as those in the unseen validation dataset.

DATASET PREPARATION

The provided OSM Competition dataset contained 478 structures annotated with Ion Regulation Activity data after curation.

While the dataset featured three classes (Active, Slightly Active, and Inactive), only seven molecules were found for the Slightly Active class. These molecules were removed as there were too few for accurate modelling.

The remaining molecules were divided into training and testing datasets based on their Ion Regulation Testset designation. This resulted in a Training dataset with 442 molecules and Test dataset with 29 molecules.

Additional datasets were composed from screening data in the literature [1][2]. A criterion of less than 2 µM XC50 activity and at least 75% growth inhibition of either wild-type or drug-resistant Plasmodium falciparum strains was used to select a 5723-molecule subset from [1] and 5693 molecules from [2]. These molecules were initially assigned a dummy class; in subsequent modelling they were either given a predicted putative class (Nature) or left unlabelled.
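
The selection criterion can be expressed as a simple filter. The sketch below is illustrative only: the record field names (xc50_um, inhibition_wt_pct, inhibition_resistant_pct) and the example rows are hypothetical, and "either strain" is interpreted as the better of the two inhibition values meeting the cut-off.

```python
def passes_filter(record, max_xc50_um=2.0, min_inhibition_pct=75.0):
    """True if the compound meets the potency and growth-inhibition criterion."""
    potent = record["xc50_um"] < max_xc50_um
    # "Either strain" is read here as: the better of the two inhibition values.
    best_inhibition = max(
        record["inhibition_wt_pct"], record["inhibition_resistant_pct"]
    )
    return potent and best_inhibition >= min_inhibition_pct

# Hypothetical screening records.
screen = [
    {"id": "hit_1", "xc50_um": 0.5, "inhibition_wt_pct": 90.0, "inhibition_resistant_pct": 60.0},
    {"id": "hit_2", "xc50_um": 3.0, "inhibition_wt_pct": 95.0, "inhibition_resistant_pct": 95.0},
    {"id": "hit_3", "xc50_um": 1.9, "inhibition_wt_pct": 50.0, "inhibition_resistant_pct": 70.0},
]
selected = [r["id"] for r in screen if passes_filter(r)]
print(selected)
```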

MODELLING METHODOLOGY

A semi-supervised machine learning paradigm adapted from the machine learning algorithms implemented in the DeepChem project [3] was used to construct QSAR models from both the labelled and unlabelled datasets. All molecules were featurised either by graph convolutional techniques or with 1024-bit ECFP4 descriptors. An 80/10/10 train/test/internal validation split of the Training dataset was used for model construction and internal validation before testing on the external validation dataset.
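
A minimal sketch of an 80/10/10 split is given below. It assumes a simple seeded random shuffle, which may differ from the splitter actually used (DeepChem offers several), and the function name is hypothetical.

```python
import random

def split_80_10_10(items, seed=0):
    """Shuffle and split into train / test / internal validation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(0.8 * len(shuffled))
    n_test = int(0.1 * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, valid

# Indices standing in for the 442 molecules of the Training dataset.
train, test, valid = split_80_10_10(range(442))
print(len(train), len(test), len(valid))
```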

RESULTS

The following results present the performance for the Bypass Multitask Neural Network classification model with ECFP4 descriptors as ranked by ROC AUC in both internal and external validation datasets.

Classification Matrix:

| Actual Class \ Predicted Class | Positive | Negative |
| --- | --- | --- |
| Positive | 16 | 0 |
| Negative | 5 | 8 |

Performance Statistics

Measure Performance
Sensitivity 1.00
Specificity 0.615
Balanced Accuracy 0.808
Precision 0.762
Correctly Classified 24
Incorrectly Classified 5
Accuracy 0.828
ROC AUC 0.784

Individual OSM compound results

OSM ID Actual IR Class Predicted IR Class Probability
OSM-S-201 0 Active 0.978
OSM-S-366 0 Inactive 0.225
OSM-S-175 1 Active 0.903
OSM-S-218 1 Active 0.943
OSM-S-272 1 Active 0.535
OSM-S-279 1 Active 0.931
OSM-S-293 1 Inactive 0.001
OSM-S-353 1 Active 0.754
OSM-S-376 1 Active 0.966
OSM-S-378 1 Active 0.972
OSM-S-379 1 Active 0.954
OSM-S-389 1 Active 0.988
OSM-S-390 1 Active 0.986
OSM-S-363 0 Inactive 0.323
OSM-S-364 0 Active 0.663
OSM-S-372 0 Inactive 0.183
OSM-S-373 0 Active 0.832
OSM-S-374 0 Active 0.895
OSM-S-375 0 Inactive 0.489
OSM-S-382 0 Inactive 0.000
OSM-S-386 0 Active 0.983
OSM-S-387 0 Inactive 0.017
OSM-S-388 0 Inactive 0.000
OSM-S-369 1 Active 0.814
OSM-S-370 1 Active 0.914
OSM-S-371 1 Active 0.957
OSM-S-383 1 Active 0.943
OSM-S-384 1 Active 0.890
OSM-S-385 1 Active 0.994

COMMENTS AND CONCLUSIONS 

This model trades off specificity for greater positive prediction power with perfect sensitivity observed for this testset. 

REFERENCES

[1] Gamo F-J, Sanz LM, Vidal J, de Cozar C, Alvarez E, Lavandera J-L, et al. (2010). Thousands of chemical starting points for antimalarial lead identification. Nature 465: 305-310.

[2] Plouffe D, Brinker A, McNamara C, Henson K, Kato N, Kuhen K, et al. (2008). In silico activity profiling reveals the mechanism of action of antimalarials discovered in a high-throughput screen. Proceedings of the National Academy of Sciences of the United States of America 105: 9059-9064.

[3] https://github.com/deepchem/deepchem


LINK TO THE FILES

https://drive.google.com/drive/folders/0BwLj2vvUHycXUEZkSUpzRjREd0U?usp=sharing 

 

 

Model exploitation: proposal for World’s First Crowd Sourced Drug Design Campaign

If our predictive model for PfATP4 Ion Regulation Activity proves useful once the OSM hidden test set is published, anybody will be able to exploit it fully after Molomics makes it available in Lead Designer, an Android app that gives quick and easy access to molecular properties important in drug discovery.
Lead Designer lets users sketch new molecules with a fully automated touchpad drawing mechanism. For each molecule, the PfATP4 Ion Regulation Activity class and its associated prediction confidence are calculated on the fly. In this way, everyone willing to participate in the OSM project, especially medicinal and synthetic chemists, can form design hypotheses for new active compounds and check in real time whether these compounds have a high chance of being active (according to the provided prediction model). Each user can save interesting molecules to the cloud and later access them from different devices through their own account.
If the current proposal is of interest, especially to medicinal and synthetic chemists involved in the project, Lead Designer could be used for the design of new active compounds of OSM Series 4. All the molecules designed for the project through Lead Designer are automatically collected on the cloud and then provided to the OSM consortium for possible synthesis and testing. As Lead Designer can involve an arbitrarily large number of participants spread around the globe, this project could become the World's First Crowd Sourced Drug Design Campaign, which could also be of interest for publication purposes.
Please let us know whether you would be interested in this proposal.

Final Results and Classifier Description.

A neural network meta classifier has a predictive score of AUC = 0.89 on the test molecules.

 

The Meta Classifier

  • Each predictive model based on fingerprints or another SMILES-based description vector, such as DRAGON, brings a certain amount of predictive power to the task of assessing likely molecular activity against PfATP4.

  • The meta classifier combines the predictive power of each model in an optimal way to produce a more predictive composite model.

  • It does this by taking as its input the probability maps (the outputs) of other classifiers.

  • The two models chosen as inputs to the meta model are:

    1. A neural network model that uses the [DRAGON](http://www.vcclab.org/lab/edragon/) molecular descriptors to estimate molecular PfATP4 ion regulation activity directly. This model had modest predictive power of AUC=0.77.
    2. A logistic classifier that uses Morgan fingerprints (mol radius = 5) to predict the EC50 <= 500 nM class. This model has a predictive power of AUC=0.93 for the test molecules.
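
The stacking idea can be illustrated with a deliberately simplified combiner. The sketch below is not the OSM-QSAR implementation: it replaces the neural network meta classifier with a single logistic unit trained by plain gradient descent, and the base-classifier probabilities and labels are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta(prob_pairs, labels, lr=0.5, steps=5000):
    """Fit weights (w1, w2, bias) of a logistic combiner by gradient descent."""
    w1 = w2 = b = 0.0
    n = len(labels)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for (p1, p2), y in zip(prob_pairs, labels):
            err = sigmoid(w1 * p1 + w2 * p2 + b) - y  # gradient of log loss
            g1 += err * p1
            g2 += err * p2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def predict_meta(params, p1, p2):
    """Combined probability of the ACTIVE class from two base probabilities."""
    w1, w2, b = params
    return sigmoid(w1 * p1 + w2 * p2 + b)

# Hypothetical outputs of the two base classifiers for six molecules.
base_probs = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.95), (0.2, 0.3), (0.3, 0.1), (0.4, 0.2)]
labels = [1, 1, 1, 0, 0, 0]

params = fit_meta(base_probs, labels)
meta_probs = [predict_meta(params, p1, p2) for p1, p2 in base_probs]
print([round(p, 3) for p in meta_probs])
```

The point of the design is that the combiner sees only the base models' outputs, so it learns how much to trust each model rather than re-learning the chemistry.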


Detailed Results.

 

AUC 0.89

Confusion Matrix

 

true/predict ACTIVE INACTIVE
ACTIVE 17 1
INACTIVE 7 10


Molecule Classification

 

ID Actual_Class Pred_Class Prob_ACTIVE
OSM-S-272 ACTIVE ACTIVE 0.5870
OSM-S-366 INACTIVE ACTIVE 0.5868
OSM-S-378 ACTIVE ACTIVE 0.5854
OSM-S-389 ACTIVE ACTIVE 0.5846
OSM-S-390 ACTIVE ACTIVE 0.5835
OSM-S-353 ACTIVE ACTIVE 0.5831
OSM-S-175 ACTIVE ACTIVE 0.5830
OSM-S-376 ACTIVE ACTIVE 0.5828
OSM-S-383 ACTIVE ACTIVE 0.5820
OSM-S-369 ACTIVE ACTIVE 0.5819
OSM-S-218 ACTIVE ACTIVE 0.5819
OSM-S-370 ACTIVE ACTIVE 0.5810
OSM-S-380 ACTIVE ACTIVE 0.5808
OSM-S-293 ACTIVE ACTIVE 0.5804
OSM-S-385 ACTIVE ACTIVE 0.5797
OSM-S-384 ACTIVE ACTIVE 0.5795
OSM-S-279 ACTIVE ACTIVE 0.5794
OSM-S-368 INACTIVE ACTIVE 0.5790
OSM-S-386 INACTIVE ACTIVE 0.5784
OSM-S-363 INACTIVE ACTIVE 0.5772
OSM-S-367 INACTIVE ACTIVE 0.5713
OSM-S-373 INACTIVE ACTIVE 0.5703
OSM-S-204 INACTIVE ACTIVE 0.5698
OSM-S-379 ACTIVE ACTIVE 0.5689
OSM-S-201 INACTIVE INACTIVE 0.4121
OSM-S-374 INACTIVE INACTIVE 0.3391
OSM-S-254 INACTIVE INACTIVE 0.2554
OSM-S-372 INACTIVE INACTIVE 0.2453
OSM-S-371 ACTIVE INACTIVE 0.1007
OSM-S-278 INACTIVE INACTIVE 0.0772
OSM-S-375 INACTIVE INACTIVE 0.0584
OSM-S-364 INACTIVE INACTIVE 0.0189
OSM-S-382 INACTIVE INACTIVE 0.0172
OSM-S-387 INACTIVE INACTIVE 0.0004
OSM-S-388 INACTIVE INACTIVE 0.0004

 


The Classification Software.

The Meta Classifier runs on Linux and Windows under Python 2.7 and 3.5 (Mac untested):

  1. Download the entire directory tree from Google Drive [here](https://drive.google.com/drive/folders/0B0Rfx1fjhlsaZU1MenhlYVc5TVU). You can also download the software from GitHub [here](https://github.com/kellerberrin/OSM-QSAR). However, the Google Drive version already has the required directory tree.

  2. Make sure you have activated the python anaconda environment as described in "readme.md".

Then go to the directory where you copied the software and simply execute the prepared batch files:

On Windows:

text code:
c>osm_comp

On Linux:

text code:
$ chmod 777 ./osm_comp
$ ./osm_comp

You can also execute the meta model directly from the command line (use --help for flag descriptions). The --clean flag is optional; it removes previous results from the model directory:

text code:
python OSM_QSAR.py --classify osm --load ION_META --epoch 40 --train 0 [--clean]

You can also classify the molecules proposed by @spadavec in issue #486 (looks like some strong leads here Vito)  by changing the input data file (--data OSMData4MMP.csv):

text code:
python OSM_QSAR.py --classify osm --load ION_META --epoch 40 --train 0 --data OSMData4MMP.csv

The classification results are found in "./Work/osm/test" and "./Work/osm/train". The statistics files contain 3 classifications. The first two are the classifier results that feed into the meta classifier. 

If you want to explore further, you could train a neural network to classify molecules for EC50 <= 500 nM potency with the Morgan (mol=5) fingerprint using the following command:

text code:
python OSM_QSAR.py --classify bin_m --train 500 --check 25 --depend EC50_500 --indep MORGAN2048_5
 
This trains the neural network for 500 epochs and checkpoints (saves) the neural network every 25 epochs. The results for each checkpoint are concatenated and will be in the directories "./Work/bin_d/test" and "./Work/bin_d/train".
 

PfATP4 Ion Regulation Activity classification model

We developed several PfATP4 Ion Regulation Activity classification models using different strategies for modeling set sampling, different machine learning methods and different descriptors. Here we report the best performing one.

Data and approach 

The total set of 455 compounds with experimental PfATP4 Ion Regulation Activity was submitted to Molomics' standard chemical structure curation protocol, similar to the one described by Fourches et al. [1]. A curated set of 445 different molecules was obtained.

For model development, validation and exploitation we followed an internal protocol based on QSAR best practices as defined in the literature [2,3]. The final curated set was split into:

  • a modeling set containing 150 compounds that was subsequently split for internal validation into multiple randomly chosen, response-stratified training and test sets. The internal validation used a 10-fold cross-validation procedure.

  • an external validation set containing 295 compounds.

The OSM competition set consists of 35 compounds obtained from the original data file provided by the OSM consortium for this competition. The 35 compounds are those where the Ion Regulation Test Set column is equal to “A,B”, “B” or “C”. Predictions for these compounds were extracted from the test and external validation sets.

The molecules were described with 23 non-highly-correlated (property-based) molecular descriptors and ECFC4 structural fingerprints hashed into 1024-byte vectors. The machine learning technique used to build the model was an ensemble (Random Forest-like) decision-tree model. The best resulting model uses 15 trees (average tree depth = 15.3; average number of nodes = 47.9).

Results

Results were analyzed using standard assessment metrics generally used in virtual screening, reported for three compound sets: the OSM competition, internal validation and external validation sets.

  • confusion matrix (counting correctly and incorrectly classified molecules)

  • accuracy = (TP+TN)/N

  • sensitivity of active molecules. Sensitivity = TP/(TP+FN)

  • specificity of active molecules. Specificity = TN/(TN+FP)

  • balanced accuracy of active molecules. This is very important when compound activity is distributed in heavily unbalanced classes, as in the case of OSM. Balanced accuracy = (sensitivity+specificity)/2

  • precision of active molecules. Precision = TP/(TP+FP)

  • Area Under the Curve (AUC) of active molecules

Here TP, TN, FP and FN are True Positives, True Negatives, False Positives and False Negatives, respectively, and N is the total number of molecules. Active molecules are those with Ion Regulation Activity class = 1.
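
As a worked illustration of these definitions, the sketch below applies them to the three-class confusion matrix reported for the OSM competition compounds in this post, treating Active as the positive class and everything else as negative. The variable names are my own. The overall accuracy is computed as the fraction of correct three-class predictions, which is what reproduces the reported value.

```python
# Rows = experimental class, columns = predicted class,
# in the order Inactive (0), Active (1), Partial (0.5).
matrix = [
    [10, 2, 1],
    [3, 15, 0],
    [1, 3, 0],
]
ACTIVE = 1  # index of the Active class

n = sum(map(sum, matrix))
tp = matrix[ACTIVE][ACTIVE]
fn = sum(matrix[ACTIVE]) - tp                 # actives predicted as something else
fp = sum(row[ACTIVE] for row in matrix) - tp  # non-actives predicted active
tn = n - tp - fn - fp

accuracy = sum(matrix[i][i] for i in range(3)) / n  # over all three classes
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
balanced_accuracy = (sensitivity + specificity) / 2
precision = tp / (tp + fp)

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f} balanced={balanced_accuracy:.3f} "
      f"precision={precision:.3f}")
```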

 

 

OSM competition compounds general results

Confusion matrix:

 

| Experimental class \ Predicted class | Inactive (0) | Active (1) | Partial (0.5) |
| --- | --- | --- | --- |
| Inactive (0) | 10 | 2 | 1 |
| Active (1) | 3 | 15 | 0 |
| Partial (0.5) | 1 | 3 | 0 |


Assessment metrics:

| Assessment metric | Value |
| --- | --- |
| Correctly classified | 25 |
| Incorrectly classified | 10 |
| Accuracy | 0.714 |
| Sensitivity of actives | 0.833 |
| Specificity of actives | 0.706 |
| Balanced accuracy of actives | 0.770 |
| Precision of actives | 0.75 |
| AUC | 0.810 |


 

OSM competition compounds individual results

Here we report the individual prediction class for each OSM competition test compound and the class prediction probability for the 3 model classes (i.e. 0, 0.5 and 1).  

| Molecule_ID | Ion Regulation Activity class | Prediction (Ion Regulation Activity class) | P(class=0.0) | P(class=1.0) | P(class=0.5) |
| --- | --- | --- | --- | --- | --- |
| OSM-S-218 | 1 | 1 | 0 | 1 | 0 |
| OSM-S-378 | 1 | 1 | 0 | 1 | 0 |
| OSM-S-373 | 0 | 0 | 0.933 | 0.067 | 0 |
| OSM-S-372 | 0 | 0 | 0.867 | 0.133 | 0 |
| OSM-S-390 | 1 | 1 | 0.067 | 0.867 | 0.067 |
| OSM-S-370 | 1 | 1 | 0.2 | 0.8 | 0 |
| OSM-S-254 | 0.5 | 0 | 0.733 | 0.2 | 0.067 |
| OSM-S-385 | 1 | 1 | 0.267 | 0.733 | 0 |
| OSM-S-375 | 0 | 0 | 0.667 | 0.267 | 0.067 |
| OSM-S-388 | 0 | 0 | 0.667 | 0.333 | 0 |
| OSM-S-382 | 0 | 0 | 0.667 | 0.333 | 0 |
| OSM-S-387 | 0 | 0 | 0.667 | 0.333 | 0 |
| OSM-S-278 | 0.5 | 1 | 0.333 | 0.667 | 0 |
| OSM-S-389 | 1 | 1 | 0.267 | 0.6 | 0.133 |
| OSM-S-374 | 0 | 1 | 0.4 | 0.6 | 0 |
| OSM-S-204 | 0.5 | 1 | 0.333 | 0.6 | 0.067 |
| OSM-S-279 | 1 | 1 | 0.267 | 0.533 | 0.2 |
| OSM-S-383 | 1 | 1 | 0.4 | 0.533 | 0.067 |
| OSM-S-371 | 1 | 0 | 0.533 | 0.4 | 0.067 |
| OSM-S-201 | 0 | 0.5 | 0.267 | 0.2 | 0.533 |
| OSM-S-379 | 1 | 1 | 0.4 | 0.533 | 0.067 |
| OSM-S-369 | 1 | 1 | 0.333 | 0.533 | 0.133 |
| OSM-S-175 | 1 | 1 | 0.4 | 0.533 | 0.067 |
| OSM-S-272 | 1 | 1 | 0.467 | 0.533 | 0 |
| OSM-S-380 | 1 | 0 | 0.533 | 0.467 | 0 |
| OSM-S-363 | 0 | 0 | 0.533 | 0.4 | 0.067 |
| OSM-S-353 | 1 | 1 | 0.467 | 0.533 | 0 |
| OSM-S-376 | 1 | 1 | 0.133 | 0.533 | 0.333 |
| OSM-S-364 | 0 | 0 | 0.533 | 0.4 | 0.067 |
| OSM-S-384 | 1 | 1 | 0.467 | 0.533 | 0 |
| OSM-S-368 | 0.5 | 1 | 0.4 | 0.533 | 0.067 |
| OSM-S-386 | 0 | 0 | 0.533 | 0.467 | 0 |
| OSM-S-366 | 0 | 1 | 0.333 | 0.533 | 0.133 |
| OSM-S-367 | 0 | 0 | 0.467 | 0.467 | 0.067 |
| OSM-S-293 | 1 | 0 | 0.467 | 0.467 | 0.067 |



Internal validation compounds general results

Confusion matrix:

 

| Experimental class \ Predicted class | Inactive (0) | Active (1) | Partial (0.5) |
| --- | --- | --- | --- |
| Inactive (0) | 107 | 3 | 1 |
| Active (1) | 13 | 22 | 0 |
| Partial (0.5) | 1 | 3 | 0 |

 

Assessment metrics:

| Assessment metric | Value |
| --- | --- |
| Correctly classified | 129 |
| Incorrectly classified | 21 |
| Accuracy | 0.860 |
| Sensitivity of actives | 0.629 |
| Specificity of actives | 0.948 |
| Balanced accuracy of actives | 0.788 |
| Precision of actives | 0.786 |
| AUC | 0.860 |

 



External validation compounds general results

 Confusion matrix:

 

| Experimental class \ Predicted class | Inactive (0) | Active (1) | Partial (0.5) |
| --- | --- | --- | --- |
| Inactive (0) | 272 | 3 | 0 |
| Active (1) | 12 | 8 | 0 |
| Partial (0.5) | 0 | 0 | 0 |

 

Assessment metrics:

| Assessment metric | Value |
| --- | --- |
| Correctly classified | 280 |
| Incorrectly classified | 15 |
| Accuracy | 0.949 |
| Sensitivity of actives | 0.400 |
| Specificity of actives | 0.989 |
| Balanced accuracy of actives | 0.695 |
| Precision of actives | 0.727 |
| AUC | 0.835 |

 

Model statistical significance

In order to assess the statistical significance of the model performance, we developed 100 similar models using bootstrapped sampling of the modeling set and 100 response-permuted models in which the compound response (i.e. the Ion Regulation Activity class) was randomly permuted across all the compounds. The balanced accuracy distribution of the 100 bootstrapped models is shown in figure 3, while that of the Y-randomized models is shown in figure 4. In both cases the balanced accuracy is calculated for active molecules (i.e. Ion Regulation Activity = 1).

Distribution of balanced accuracy for active molecules in bootstrapped samples

Figure 3



Distribution of balanced accuracy for active molecules in response-randomized samples

Figure 4 

Figures 3 and 4 show that the balanced accuracy distributions of the two sets of experiments do not overlap at all. This suggests that the model's performance is statistically significant rather than a chance result.
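
The logic of the response-permutation check can be sketched in a few lines. This is a simplification under stated assumptions, not the procedure used above: instead of retraining a model on each permuted response, it keeps one fixed set of predictions and permutes the true labels, and the data (20 actives, 20 inactives, 4 errors in each class) are hypothetical.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)

rng = random.Random(0)  # seeded for reproducibility

# Hypothetical data: 20 actives, 20 inactives, predictions wrong for 4 of each.
y_true = [1] * 20 + [0] * 20
y_pred = [1] * 16 + [0] * 4 + [0] * 16 + [1] * 4

observed = balanced_accuracy(y_true, y_pred)

# Null distribution: balanced accuracy after randomly permuting the responses.
null_scores = []
for _ in range(200):
    permuted = y_true[:]
    rng.shuffle(permuted)
    null_scores.append(balanced_accuracy(permuted, y_pred))

null_mean = sum(null_scores) / len(null_scores)
print(f"observed = {observed:.2f}, mean under permutation = {null_mean:.2f}")
```

If the observed score sits well outside the permuted distribution, as in figures 3 and 4, the performance is unlikely to be a chance result.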

 

References

[1] Denis Fourches, Eugene Muratov, Alexander Tropsha. “Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research.” J. Chem. Inf. Model. 2010, 50, 1189-1204.

[2] Alexander Tropsha. “Best Practices for QSAR Model Development, Validation, and Exploitation.” Mol. Inf. 2010, 29(6-7), 476-488.

[3] Lennart Eriksson, Joanna Jaworska, Andrew P Worth, Mark T D Cronin, Robert M McDowell, Paola Gramatica. “Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of Classification- and Regression-Based QSARs.” Environ. Health Perspect. 2003, 111(10), 1361-1375.


Latest submission

Small tweak in the weighting of the scoring function. Increase weighting of molecules that are similar to the reference (highest affinity) ligand.

 

OSM-S363, 5.8

OSM-S364, 6.3

OSM-S365, 6.4 

OSM-S368, 5.6

OSM-S369, 5.6

OSM-S370, 6.0

OSM-S371, 6.0

OSM-S372, 5.6

OSM-S373, 5.8

OSM-S374, 5.8

OSM-S375, 6.3