Tag Archives: Malaria

Parameter Sweep for ECFP Depth and #bits

Next, a parameter sweep over the ECFP settings was performed for both the LR and SVM methods, considering the following:

 

EC Depth = [4,5,6]

EC #Bits = [1024, 2048, 4096]

 

Again, 125 train/test splits were performed, and the distribution of MCC values was calculated.
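As a minimal sketch of the sweep layout (not the notebook's actual script), the grid above can be enumerated as follows. The label format method_depth_bits is assumed here to match the labels used when reporting results; the variable names are illustrative.

```python
from itertools import product

methods = ["svm", "lr"]
depths = [4, 5, 6]            # EC depth values listed above
n_bits = [1024, 2048, 4096]   # fingerprint lengths listed above
n_splits = 125                # train/test splits per configuration

# One label per (method, depth, bits) combination, e.g. "svm_4_2048".
configs = [f"{m}_{d}_{b}" for m, d, b in product(methods, depths, n_bits)]

# Total number of models fitted across the whole sweep.
n_fits = len(configs) * n_splits
print(len(configs), n_fits)
```

Each configuration then gets its own distribution of 125 MCC values.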

 

https://imgur.com/p5UAUFx

 

Here, the means of the MCC values for each model were calculated. 

 

Method_EC Depth_#Bits MeanMCCValue
svm_6_1024 0.594463
lr_6_1024 0.594463
svm_5_1024 0.642099
lr_5_1024 0.642099
svm_4_1024 0.648357
lr_4_1024 0.648357
lr_6_2048 0.653864
svm_6_2048 0.653864
lr_5_2048 0.658755
svm_5_2048 0.658755
svm_6_4096 0.658897
lr_6_4096 0.658897
svm_5_4096 0.667104
lr_5_4096 0.667104
svm_4_4096 0.667121
lr_4_4096 0.667121
lr_4_2048 0.673339
svm_4_2048 0.673339

 

The best methods were SVM and LR at depth 4, with 2048 bits. This gave an average MCC value of 0.67 +/- 0.1 for both. 
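The ranking can be reproduced directly from the table of means. The sketch below keys the values by (depth, bits) only, since the table reports the same mean for SVM and LR at every setting; the dictionary name is illustrative.

```python
# Mean MCC per (EC depth, #bits), copied from the table above.
# SVM and LR share the same mean at every setting, so one entry covers both.
mean_mcc = {
    (6, 1024): 0.594463,
    (5, 1024): 0.642099,
    (4, 1024): 0.648357,
    (6, 2048): 0.653864,
    (5, 2048): 0.658755,
    (6, 4096): 0.658897,
    (5, 4096): 0.667104,
    (4, 4096): 0.667121,
    (4, 2048): 0.673339,
}

# Pick the setting with the highest mean MCC.
best_key = max(mean_mcc, key=mean_mcc.get)
best_value = mean_mcc[best_key]
print(best_key, round(best_value, 2))
```

Note that the gap between the top few settings is much smaller than the reported spread of +/- 0.1, so the ranking among them should be read with caution.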

Initial Model Search

Picking Initial Methods

With the goal of classifying potential S4 compounds, an initial search for a well-suited classification method was undertaken. All compounds in the database that met the following criteria were used in the model search:

* Compounds with SMILES strings

* Compounds with an Ion Activity value of either 0 or 1

This resulted in 575 compounds being used. Next, the following sklearn models were used (with default settings) to classify compounds as either 0 or 1 (Ion Activity Assay), with RDKit ECFP4 (2048-bit) fingerprints as the inputs: KNN, Linear SVM, Random Forest, Naive Bayes, Decision Trees, and Logistic Regression. To determine which model was the most accurate, an 80/20 train/test split was repeated 125 times. In each iteration, a model was built for each method and the Matthews correlation coefficient (MCC) was calculated as a measure of model accuracy that remains informative when the two classes are imbalanced. The distributions of these MCC scores for each model were then compared.
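For reference, the MCC for a binary classifier is computed from the confusion-matrix counts as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function below is a plain-Python sketch of that formula with made-up labels; the name mcc_binary is illustrative, and in practice sklearn's matthews_corrcoef would normally be used instead.

```python
import math

def mcc_binary(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example (not real assay data): 3 TP, 3 TN, 1 FP, 1 FN.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
score = mcc_binary(y_true, y_pred)
print(score)
```

An MCC of 1 means perfect prediction, 0 is no better than chance, and -1 is total disagreement.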

https://imgur.com/y8uHYSi

Treating the MCC values as distributions, the two-sample Kolmogorov-Smirnov test was used to calculate a p-value for the similarity of the distributions for each pair of methods:

Method A, Method B, p-value
mcc_knn,mcc_knn,1.0
mcc_knn,mcc_svm,0.13700610573284444
mcc_knn,mcc_rf,0.007449442574861611
mcc_knn,mcc_nb,8.296026497590731e-38
mcc_knn,mcc_dt,3.5280572995108e-21
mcc_knn,mcc_lr,0.987342261870452
mcc_svm,mcc_knn,0.13700610573284444
mcc_svm,mcc_svm,1.0
mcc_svm,mcc_rf,4.409900257709484e-05
mcc_svm,mcc_nb,8.500551823859001e-41
mcc_svm,mcc_dt,3.535605015038742e-25
mcc_svm,mcc_lr,0.18293778552780215
mcc_rf,mcc_knn,0.007449442574861611
mcc_rf,mcc_svm,4.409900257709484e-05
mcc_rf,mcc_rf,1.0
mcc_rf,mcc_nb,1.677584074335309e-31
mcc_rf,mcc_dt,1.4852492791766038e-16
mcc_rf,mcc_lr,0.03647438799031367
mcc_nb,mcc_knn,8.296026497590731e-38
mcc_nb,mcc_svm,8.500551823859001e-41
mcc_nb,mcc_rf,1.677584074335309e-31
mcc_nb,mcc_nb,1.0
mcc_nb,mcc_dt,1.096954798445088e-14
mcc_nb,mcc_lr,8.296026497590731e-38
mcc_dt,mcc_knn,3.5280572995108e-21
mcc_dt,mcc_svm,3.535605015038742e-25
mcc_dt,mcc_rf,1.4852492791766038e-16
mcc_dt,mcc_nb,1.096954798445088e-14
mcc_dt,mcc_dt,1.0
mcc_dt,mcc_lr,1.2305157079847292e-20
mcc_lr,mcc_knn,0.987342261870452
mcc_lr,mcc_svm,0.18293778552780215
mcc_lr,mcc_rf,0.03647438799031367
mcc_lr,mcc_nb,8.296026497590731e-38
mcc_lr,mcc_dt,1.2305157079847292e-20
mcc_lr,mcc_lr,1.0
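To illustrate what the pairwise comparison measures, here is a plain-Python sketch of the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) with an asymptotic p-value. The p-value uses a textbook large-sample approximation with a small-sample correction, which is an assumption of this sketch; it will not reproduce the exact values in the table, and a library routine such as scipy.stats.ks_2samp is the usual choice.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def ks_pvalue_asymptotic(d, n, m):
    """Approximate p-value from the Kolmogorov series (an approximation)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.05:
        return 1.0  # distributions are indistinguishable at this sample size
    total = 0.0
    for j in range(1, 101):
        term = 2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-16:
            break
    return min(max(total, 0.0), 1.0)

# Toy samples (not the real MCC values).
same = ks_statistic([0.6, 0.65, 0.7], [0.6, 0.65, 0.7])
apart = ks_statistic([0.6, 0.65, 0.7], [0.1, 0.2, 0.3])
print(same, apart, ks_pvalue_asymptotic(apart, 125, 125))
```

A method compared with itself gives a statistic of 0 and a p-value of 1.0, matching the diagonal entries in the table.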

The Linear SVM and Logistic Regression methods were best, with average MCC values of 0.67 +/- 0.11 and 0.64 +/- 0.11 respectively. Their MCC distributions were significantly different (p < 0.05) from those of Random Forest, Naive Bayes and Decision Trees, but not from KNN (p = 0.14 and 0.99 respectively) or from one another (p = 0.18).

Moving forward, we will use Linear SVMs and LR as our base methods, and carry out some light parameter searching to determine if we can improve the performance.

 

 

Cleaning data for series 4 comp.

 

8-9-2019

Data Processing

For the purpose of supplying data for building the ML model, the data set for ION Regulation DATA was downloaded from http://tinyurl.com/OSM-Series4CompData as a .csv on Friday August 9, 2019. 

Ran the attached Python script to keep the Potency vs Parasite (uMol), Ion Regulation Activity, Ion Regulation Test Set and Smiles columns. All data rows containing NaNs were dropped.

The attached output file contains the relevant data to be used in our model building.
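The attached script is not reproduced here. As a rough sketch of the same two steps (keep four columns, drop incomplete rows), something like the following would work using only the standard library. The sample rows and the "OSM ID" column are made up for illustration, and treating empty cells and the text "NaN" as missing is an assumption.

```python
import csv
import io

KEEP = [
    "Potency vs Parasite (uMol)",
    "Ion Regulation Activity",
    "Ion Regulation Test Set",
    "Smiles",
]

def clean_rows(csv_text):
    """Keep only the KEEP columns and drop rows with any missing value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        kept = {col: (row.get(col) or "").strip() for col in KEEP}
        # A cell counts as missing if it is empty or the literal text "nan".
        if all(v and v.lower() != "nan" for v in kept.values()):
            cleaned.append(kept)
    return cleaned

# Hypothetical sample data, not taken from the real spreadsheet.
sample = """OSM ID,Smiles,Potency vs Parasite (uMol),Ion Regulation Activity,Ion Regulation Test Set
X-1,CCO,0.5,1,A
X-2,,1.2,0,B
X-3,c1ccccc1,NaN,1,A
"""
rows = clean_rows(sample)
print(rows)
```

With pandas, the equivalent would be selecting the four columns and calling dropna() on the result.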

Synthesis of 2-Chloro-3-(4-chlorophenyl)-[1,2,4]triazolo[4,3-a]pyrazine

2-chloro-3-(4-chlorophenyl)-[1,2,4]triazolo[4,3-a]pyrazine was synthesised from the previously synthesised 2-chloro-6-(2-hydrazinyl)(4-chlorobenzylidene)pyrazine intermediate.

Reaction Scheme

Reaction 2 scheme.png

Risk assessment

4chlorobenzaldehyde reaction 2 risk assessment.pdf

Reagents

                                                                                     mass/g      moles/mmol       equivalents

2-chloro-6-(2-hydrazinyl)(4-chlorobenzylidene)pyrazine       0.0201       0.062                   1.0

diacetoxyiodobenzene                                                            0.0310       0.096                   1.5

Procedure

A round-bottomed flask was charged with 2-chloro-6-(2-hydrazinyl)(4-chlorobenzylidene)pyrazine (0.0212 g, 0.067 mmol), diacetoxyiodobenzene (0.0250 g, 0.078 mmol) and dichloromethane (10 mL). The resulting solution was stirred at room temperature and monitored by TLC (7:3 ethyl acetate:light petroleum ether) until completion. The product was then isolated by column chromatography using the same solvent system (Rf = 0.4) and dried via vacuum filtration, yielding a white powder (0.0181 g, 85.7%). The product was characterised by 1H and 13C NMR, infrared and melting point analysis.

Analytical Data

4chlorobenzaldehyde reaction2 H NMR spectrum.pdf

4chlorobenzaldehyde reaction2 IR spectrum.pdf

4chlorobenzaldehyde reaction2 Mass spectrum.pdf

4chlorobenzaldehyde reaction2 C NMR spectrum.pdf

Mpt = 261-265 °C

InChI keys

2-chloro-6-(2-hydrazinyl)(4-chlorobenzylidene)pyrazine IREXIIXDQDGVMX-UHFFFAOYSA-N

diacetoxyiodobenzene  ZBIKORITPGTTGI-UHFFFAOYSA-N

2-chloro-3-(4-chlorophenyl)-[1,2,4]triazolo[4,3-a]pyrazine VCWFDFYRPOTNNX-UHFFFAOYSA-N

Synthesis of 2-Chloro-6-(2-hydrazinyl)(4-chlorobenzylidene)pyrazine.

2-chloro-6-(2-hydrazinyl)(4-chlorobenzylidene)pyrazine was synthesised from 2-chloro-6-hydrazinopyrazine and 4-chlorobenzaldehyde, as an intermediate in the synthesis of 2-chloro-3-(4-chlorophenyl)-[1,2,4]triazolo[4,3-a]pyrazine.

Reaction scheme

Reaction 1 scheme.png

Risk Assessment

4chlorobenzaldehyde reaction1 risk assessment.pdf

Reagents

                                                     mass/g           moles/mmol         equivalents

2-chloro-6-hydrazinopyrazine           0.049              0.339                   1.0

4-chlorobenzaldehyde                      0.056             0.364                    1.1

Procedure

A round-bottomed flask was charged with 2-chloro-6-hydrazinopyrazine (0.0490 g, 0.339 mmol), 4-chlorobenzaldehyde (0.0560 g, 0.399 mmol) and ethanol (20 mL). The resulting solution was heated under reflux and monitored by TLC (3:7 ethyl acetate:light petroleum ether) until completion (Rf = 0.35). The product was then isolated and dried via vacuum filtration, yielding a light yellow powder (0.0436 g, 44.5%). The product was characterised by 1H and 13C NMR, infrared and melting point analysis.

Analytical data

4chlorobenzaldehyde reaction1 C NMR spectrum.pdf

4chlorobenzaldehyde reaction1 Mass spectrum.pdf

4chlorobenzaldehyde reaction1 H NMR spectrum.pdf

Mpt = 215-225 °C

InChI Key

2-chloro-6-hydrazinopyrazine  FEDQSVIJHNBUHH-UHFFFAOYSA-N

4-chlorobenzaldehyde  AVPYQKSLYISFPO-UHFFFAOYSA-N

2-chloro-6-(2-hydrazinyl)(4-chlorobenzylidene)pyrazine  IREXIIXDQDGVMX-UHFFFAOYSA-N

Synthesis of 5-chloro-3-(2-hydroxyphenyl)-[1,2,4]triazolo[4,3-a]pyrazine

5-chloro-3-(2-hydroxyphenyl)-[1,2,4]triazolo[4,3-a]pyrazine was synthesised using previously synthesised intermediate 2-chloro-6-(2-hydrazinyl)(2-hydroxybenzylidene)pyrazine.

Reagent Mass/g Moles/mmol Equivalents
2-chloro-6-(2-hydrazinyl)(2-hydroxybenzylidene)pyrazine 0.020 0.080 1.00
diacetoxyiodobenzene 0.0290 0.090 1.13

Yield: 54.3%

Procedure

2-chloro-6-(2-hydrazinyl)(2-hydroxybenzylidene)pyrazine (20 mg, 0.080 mmol, 1 eq.) and diacetoxyiodobenzene (29 mg, 0.090 mmol, 1.13 eq.) were dissolved in DCM (10 mL) and left stirring overnight under an atmosphere of nitrogen. The reaction was monitored by TLC using a variety of solvent systems (1:1 PET ether:EtOAc, 1:3 PET ether:EtOAc, 100% EtOAc, 20% MeOH in EtOAc and 1% acetic acid in EtOAc); however, the spots did not move off the baseline, so the product was not suitable for purification by column chromatography. The solvent was removed in vacuo to give a dark orange powder, which was analysed by mass spec and 1H NMR. Both NMR and mass spec showed that the intended product was not formed.

Analytical Data 

mp 261-264 °C.

InChI Key:

2-chloro-6-(2-hydrazinyl)(2-hydroxybenzylidene)pyrazine: BTVQPPGRWZUSCA-LHHJGKSTSA-N

PIDA: ZBIKORITPGTTGI-UHFFFAOYSA-N

5-chloro-3-(2-hydroxyphenyl)-[1,2,4]triazolo[4,3-a]pyrazine: GLKVTIUQXSULMV-UHFFFAOYSA-N

 

Synthesis of 2-chloro-6-(2-hydrazinyl)(2-hydroxybenzylidene)pyrazine

2-chloro-6-(2-hydrazinyl)(2-hydroxybenzylidene)pyrazine was synthesised as an intermediate compound in the overall synthesis of 5-chloro-3-(2-hydroxyphenyl)-[1,2,4]triazolo[4,3-a]pyrazine.

Reagent Mass/g Moles/mmol Equivalents
2-chloro-6-hydrazinopyrazine  0.0523 0.360 1.00
2-hydroxybenzaldehyde 0.0560 0.460 1.28

Yield: 86.1%

Procedure

2-chloro-6-hydrazinopyrazine (52.3 mg, 0.36 mmol, 1.0 eq.) and 2-hydroxybenzaldehyde (56.0 mg, 0.46 mmol, 1.28 eq.) were added to a round-bottom flask and dissolved in ethanol (10 mL). The resulting solution was heated to 80 °C under reflux for 75 minutes. The reaction was monitored using TLC with a solvent system of 7: 3 PET ether: EtOAc. Once the reaction had gone to completion, the solution was recrystallised. The minimum amount of boiling water needed to precipitate out the product was added, and the mixture was then cooled to allow the product to crash out. The crystals were filtered under vacuum to give a light-yellow powder. The product was analysed by mass spec, IR, 1H and 13C NMR.

Analytical Data 

mp 228 – 231 °C (from EtOH); Rf 0.44; νmax/cm⁻¹ 3157 (NH), 1625 (CN), 1563 (NH), 1417 (OH); δH (300 MHz, DMSO-d6): 11.52 (s, 1H, NH), 10.22 (s, 1H, OH), 8.44 (s, 1H, pyrazine C), 8.37 (s, 1H, pyrazine C), 8.03 (s, 1H, N=CH), 7.75-7.72 (d, 1H, J = 9.0 Hz, Ar), 7.25-7.20 (t, 1H, J = 9.0 Hz, Ar), 6.91-6.87 (m, 2H, Ar); δC (75 MHz, DMSO-d6): 156.4 (pyrazine C), 152.3 (pyrazine C), 145.9 (C=N), 141.2 (COH), 132.4 (CCl), 130.9, 129.0 (Ar), 126.9 (Ar), 120.5 (Ar), 119.7 (Ar), 116.4 (Ar); m/z 249.0538 ([M+H]+).
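As a quick consistency check on the mass spec value, the expected [M+H]+ m/z can be estimated from monoisotopic masses. The sketch below assumes the molecular formula C11H9ClN4O, which is inferred from the name 2-chloro-6-(2-hydrazinyl)(2-hydroxybenzylidene)pyrazine rather than stated in the notebook, and uses the 35Cl isotope.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "Cl": 34.9688527,  # 35Cl
}
ELECTRON = 0.00054858

# Assumed formula of the neutral hydrazone: C11H9ClN4O.
formula = {"C": 11, "H": 9, "Cl": 1, "N": 4, "O": 1}

neutral = sum(MONO[el] * count for el, count in formula.items())
# Protonation adds one hydrogen atom and removes one electron.
mz_calc = neutral + MONO["H"] - ELECTRON

observed = 249.0538
ppm_error = (observed - mz_calc) / mz_calc * 1e6
print(round(mz_calc, 4), round(ppm_error, 2))
```

Under these assumptions the calculated value agrees with the observed m/z to well within 1 ppm.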

InChI Key:

2-chloro-6-hydrazinopyrazine: FEDQSVIJHNBUHH-UHFFFAOYSA-N

2-hydroxybenzaldehyde: SMQUZDBALVYZAC-UHFFFAOYSA-N

2-chloro-6-(2-hydrazinyl)(2-hydroxybenzylidene)pyrazine: BTVQPPGRWZUSCA-LHHJGKSTSA-N

Synthesis of 5-chloro-3-(4-carboxyphenyl)-[1,2,4]triazolo[4,3-a]pyrazine

Reagent Mass/g Moles/mmol Equivalents
diacetoxyiodobenzene  0.023 0.0714 0.9
2-chloro-6-(2-hydrazinyl(4-carboxybenzylidene))pyrazine 0.022 0.0795 1.0

Yield: 40%

Procedure

PIDA (0.023 g, 0.0714 mmol, 0.9 eq.) was added to a stirred solution of the intermediate pyrazine I1 (0.022 g, 0.0795 mmol, 1 eq.) in DCM (10 mL, 0.157 mol, 2 eq.) at rt under N2. The resulting mixture was stirred at rt for 150 min. Then, the solvent was evaporated under reduced pressure to give the crude product. Purification by flash column chromatography on silica with 80:19:1 EtOAc-PET ether-acetic acid as eluent gave P1 (0.0087 g, 40%) as a light yellow powder.

Analytical Data 

mp 269 – 274 °C; RF (80:19:1 EtOAc-PET ether-Acetic acid) 0.26; 1H NMR (300 MHz, CDCl3)

Synthesis of 2-chloro-6-(2-hydrazinyl)(4-carboxybenzylidene)pyrazine

2-chloro-6-(2-hydrazinyl)(4-carboxybenzylidene)pyrazine was synthesised as an intermediate compound in the overall synthesis of 5-chloro-3-(4-carboxyphenyl)-[1,2,4]triazolo[4,3-a]pyrazine.

Reagent Mass/g Moles/mmol Equivalents
2-chloro-6-hydrazinopyrazine  0.0512 0.354 1.00
4-carboxybenzaldehyde 0.052 0.346 0.98

Yield: 70% 

Procedure

2-chloro-6-hydrazinopyrazine (0.0512 g, 0.354 mmol, 1 eq.) was added to a stirred solution of 4-carboxybenzaldehyde (0.052 g, 0.346 mmol, 0.98 eq.) in EtOH (10 mL) at rt in air. The resulting mixture was stirred and heated at reflux for 75 min. Then, the reaction mixture was allowed to cool to rt. The solution was then dried (MgSO4) and evaporated under reduced pressure to give the crude product. Purification by flash column chromatography on silica with 70:30 PET ether-EtOAc as eluent gave I1 (0.0672 g, 70%) as a yellow crystalline solid.
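The quantities in this procedure can be cross-checked with a short calculation. The sketch below assumes molecular formulae inferred from the compound names (C4H5ClN4 for the hydrazine, C8H6O3 for the aldehyde, C12H9ClN4O2 for I1) and standard average atomic masses, and bases the yield on the aldehyde as the limiting reagent (0.98 eq.).

```python
# Average atomic masses (g/mol).
ATOMIC = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Cl": 35.45}

def molar_mass(formula):
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC[el] * n for el, n in formula.items())

# Formulae assumed from the compound names, not stated in the notebook.
hydrazine = {"C": 4, "H": 5, "Cl": 1, "N": 4}           # 2-chloro-6-hydrazinopyrazine
aldehyde = {"C": 8, "H": 6, "O": 3}                     # 4-carboxybenzaldehyde
product = {"C": 12, "H": 9, "Cl": 1, "N": 4, "O": 2}    # I1 (condensation, loss of H2O)

# Masses in mg divided by g/mol give mmol directly.
mmol_hydrazine = 51.2 / molar_mass(hydrazine)
mmol_aldehyde = 52.0 / molar_mass(aldehyde)
equivalents = mmol_aldehyde / mmol_hydrazine

# Theoretical yield is set by the limiting reagent (the aldehyde here).
limiting_mmol = min(mmol_hydrazine, mmol_aldehyde)
theoretical_mg = limiting_mmol * molar_mass(product)
yield_pct = 67.2 / theoretical_mg * 100
print(round(mmol_hydrazine, 3), round(mmol_aldehyde, 3),
      round(equivalents, 2), round(yield_pct))
```

Under these assumptions the calculated amounts, equivalents and yield agree with the figures quoted in the procedure.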

Analytical Data 

mp 229 – 231 °C; RF (70:30 PET ether-EtOAc) 0.28; 1H NMR (300 MHz, DMSO-d6)

Gastric Acid Stability Test on 5-chloro-3-(2,4-dimethoxyphenyl)-[1,2,4]triazolo[4,3-a]pyrazine


5-chloro-3-(2,4-dimethoxyphenyl)-[1,2,4]triazolo[4,3-a]pyrazine was added to a gastric acid mimic to test its stability.

Risk Assessment:

Stomach acid risk assessment.pdf

Procedure:

A 50 mL standard solution of DCl (0.075 mL, 7 mol dm-3, 0.525 mmol) and KCl (0.1863 g, 2.50 mmol) was made up with D2O. This solution was tested using litmus paper to ensure that it was approximately pH 2. Each pyrazine product was added to the solution (ca. 2 mL), and a 1H NMR spectrum was taken immediately.
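The target of approximately pH 2 follows from the quantities above. The sketch below is a back-of-envelope estimate that assumes DCl is fully dissociated and ignores activity coefficients and the H/D isotope effect on the acidity scale, so it gives only an approximate value.

```python
import math

# Quantities taken from the procedure above.
dcl_volume_ml = 0.075
dcl_conc_mol_per_l = 7.0
kcl_mass_g = 0.1863
total_volume_ml = 50.0

KCL_MOLAR_MASS = 39.098 + 35.45  # g/mol, K + Cl

# mL x mol/L gives mmol.
acid_mmol = dcl_volume_ml * dcl_conc_mol_per_l
kcl_mmol = kcl_mass_g / KCL_MOLAR_MASS * 1000

# mmol / mL gives mol/L.
acid_conc = acid_mmol / total_volume_ml
kcl_conc = kcl_mmol / total_volume_ml

# Strong-acid approximation: acidity set directly by the acid concentration.
approx_ph = -math.log10(acid_conc)
print(round(acid_mmol, 3), round(kcl_mmol, 2), round(approx_ph, 2))
```

The estimate comes out just under 2, consistent with the litmus paper check.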

Analytical Data: 

Run 1: 

P5 Acid Test run 1.pdf

Run 2:

P5 Acid Test run 1.pdf

Write up:

This product was only partially soluble in the solution. Run 1 was taken 5 minutes after adding the product to the solution and run 2 was taken 3 hours afterwards. As there was no change between the two 1H NMR spectra, the product remained undegraded.

InChI Key:

5-chloro-3-(2,4-dimethoxyphenyl)-[1,2,4]triazolo[4,3-a]pyrazine: LPODABXRBQELBM-UHFFFAOYSA-N