Tag Archives: Malaria

Implementing Multitask Models to Improve Testset Performance


Multitask machine learning algorithms train and predict on more than one output. These models have been found to achieve higher prediction performance than single task models, especially in domains where data is limited. This competition features a small dataset, so the utilisation of all available relevant data is crucial to produce a useful model for the unseen validation chemicals. Previous data analysis found the included ChemBL EC50 data to be non-linearly correlated with the OSM EC50 data, so it is hypothesised that non-linear multitask modelling methodologies will feature higher performance than single task models.


This experiment aims to implement multitask models using the OSM and ChemBL EC50 data in the provided competition dataset and compare their testset prediction performance to single task models.


The ChemBL EC50 data was extracted from a previous analysis as the Mean_AltEC50 and appended to the training dataset. The numerical Mean_AltEC50 values were stored in an adjacent column to the OSM EC50 values. 


Multitask variants of the Progressive Neural Network (DT-PGN), Deep Neural Network (DT-DNN), and Graph Convolution (DT-GraphConv) machine learning algorithms modelled both tasks in the training dataset, while a Progressive Neural Network modelling only OSM EC50 was chosen as the representative single task model (ST-PGN). The ST-PGN, DT-PGN and DT-DNN models mapped 1024-bit ECFP fingerprints to their respective endpoints, while DT-GraphConv mapped graph featurizations of each molecule to its endpoints. An 80/10/10 training/test/validation split of the dataset was used to train and evaluate each model. All model hyperparameters were optimised for the best prediction performance on the held out validation molecules, which consist of the 37 molecules in the combined OSM Testset.
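The 80/10/10 split described above can be sketched in plain Python as follows. This is an illustration only: the function name, the fixed seed and the placeholder molecule identifiers are assumptions, and the original work may well have used the splitter provided by its modelling library instead.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle a copy of `items` and return (train, test, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = len(shuffled) * 8 // 10      # integer arithmetic avoids float rounding
    n_test = len(shuffled) // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, validation


# Placeholder identifiers; real input would be the molecules of the training dataset.
molecule_ids = [f"mol-{i}" for i in range(100)]
train, test, validation = split_80_10_10(molecule_ids)
print(len(train), len(test), len(validation))
```

Because the remainder is assigned to the last subset, dataset sizes that are not multiples of ten give a slightly larger validation split.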

Hyperparameter ST-PGN DT-PGN DT-DNN
Layers 2 2 2
Layer dimensions (1000, 500) (1500, 1500) (1500, 1500)
Dropouts per layer (0.15, 0.1) (0.1, 0.1) (0.1, 0.1)
Number of epochs 100 100 100
Optimizer Adam Adam Adam
Batch size 32 32
Penalty 0.0001 0.001 0.0001
Learning rate 0.001 0.001 0.001

DT-GraphConv architecture/hyperparameters:

  • Total Layers: 10
  • Layer Configuration: 2x(Convolutional, Normalization, Pooling)
  • Number of epochs: 100
  • Optimizer: Adam
  • Batch size: 128
  • Learning rate: 0.001 


The multitask DT-PGN and DT-DNN models featured higher external testset performance than the single task ST-PGN model, while the multitask DT-GraphConv model featured lower external testset performance than ST-PGN. Raw predictions for each Testset molecule are in the attached spreadsheet.

There is a substantial prediction performance difference between the Internal Validation and External Testset for all models.

Model  Training (MAE) Internal Validation (MAE) External Testset BC (MAE)
ST-PGN 0.680365333 8.040726457 2.957931574
DT-PGN 0.77677925 6.45179363 2.527557844
DT-GraphConv 1.57414035 5.825520611 3.824818007
DT-DNN 1.026544839 5.363081693 2.791228748


  • Multitask models perform better than their singletask counterparts for OSM EC50 prediction.
  • Multitask Graph Convolutional models continue to underperform compared to previous findings.
  • The substantial performance difference between the Internal and External Validation datasets may indicate the molecules in the external testset are not well represented in the training dataset. Future experiments should substitute the training/test/validation splitting of the training dataset with a K-fold cross validation methodology in order to maximise the usage of chemicals in the training set.
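The K-fold cross validation suggested in the last point can be sketched as below. It is a minimal, library-free illustration: the function name and the example sizes are assumptions, and the indices should be shuffled beforehand if the dataset is ordered.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, held_out_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, held_out
        start += size


# Example with hypothetical sizes: every sample is held out exactly once.
for train_idx, held_out_idx in k_fold_indices(10, 3):
    print(len(train_idx), held_out_idx)
```

Each molecule is used for training in k - 1 folds and for evaluation in one, so no part of the small dataset is permanently set aside.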

Assessing the correlation between OSM and ChemBL EC50 values (Data Analysis 1)


The OSM competition spreadsheet contains a column labelled "Alternative EC50 from Chembl (uM)". While it is currently unclear how these values were acquired, their presence in the spreadsheet allows for a brief analysis to determine if they correlate to the desired modelling target, "Potency vs Parasite (uMol)". A correlation between these two activities could enable multitask regression modelling which could feature enhanced performance for the Test datasets.


Determine the correlation between OSM and ChemBL EC50 values within the provided competition dataset.


Since multiple ChemBL EC50 values may be present within a single cell, all 359 ChemBL EC50 values were extracted from the competition dataset and converted from text to columns in Microsoft Excel. This resulted in the formation of multiple columns containing values for each row (OSM molecule). These values were averaged in order to consolidate the multiple values to a single representative value in a new column called "Mean_AltEC50". Potency vs Parasite (uMol) EC50 values were then carefully inserted adjacent to their corresponding ChemBL EC50 data. 
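The same consolidation step could also be done outside Excel. The sketch below assumes that the multiple values in a cell are separated by commas, semicolons or whitespace, which is a guess about the spreadsheet layout; the function name is hypothetical.

```python
import re
from statistics import mean


def mean_alt_ec50(cell):
    """Average all numeric values found in one spreadsheet cell; None if there are none."""
    if cell is None:
        return None
    tokens = [t for t in re.split(r"[,;\s]+", cell.strip()) if t]
    values = []
    for token in tokens:
        try:
            values.append(float(token))
        except ValueError:
            continue  # skip non-numeric text such as "ND"
    return mean(values) if values else None


# Made-up cell contents, for illustration only.
print(mean_alt_ec50("0.5, 1.5"))
print(mean_alt_ec50(""))
```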


The OSM and ChemBL EC50 values were graphed with a scatterplot in Microsoft Excel. Linear, logarithmic, power, and exponential trendlines were fitted to this data. The R^2 values were used as a measure of the correlation between the OSM and ChemBL EC50 values.
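For readers who want to reproduce a power trendline without Excel, the sketch below fits y = a * x^b by ordinary least squares on the log-transformed values and reports the R^2 of that straight-line fit. It assumes that the spreadsheet fits its power trendline on the same log-log scale, and it requires all values to be positive; the function name and the example data are hypothetical.

```python
import math


def fit_power_trendline(xs, ys):
    """Fit y = a * x**b via least squares on (ln x, ln y); return (a, b, r2)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    syy = sum((v - mean_y) ** 2 for v in ly)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                         # slope on the log-log scale
    a = math.exp(mean_y - b * mean_x)     # intercept converted back to a multiplier
    r2 = (sxy * sxy) / (sxx * syy)        # squared Pearson correlation of the logs
    return a, b, r2


# Synthetic data following y = 3 * x**0.5 exactly, so the fit should be near perfect.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 0.5 for x in xs]
print(fit_power_trendline(xs, ys))
```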


The logarithmic, exponential, and linear trendlines display a poor correlation between OSM and ChemBL EC50 values of less than 0.1 R^2. However, the power trendline features a better correlation with 0.186 R^2. 

Trendline Type R^2
Logarithmic 0.02357

The correlation between ChemBL and OSM EC50 values is non-linear. As such, this correlation could be utilised by multitask neural network models to potentially enhance their predictive performance compared to single task models. The performance of dual task models compared to single task models should be investigated in a follow up experiment.

Future analyses should generate some form of identification that is compatible with Excel's VLOOKUP function instead of relying on sorting the entire dataset.

In Silico Model Prediction of Testset B and C EC50 values (Single Task Modelling Part 2)


To assess the prediction performance of the Progressive Neural Network model on the held out "B" and "C" test sets.


The molecules labelled with "B" and "C" Ion regulation Test Set were combined to create a single, 37 molecule Test Dataset. An additional class was also created by transforming the associated "Potency vs Parasite (uMol)" values for these molecules by log10(EC50 + 1).


A Progressive Neural Network model was constructed using the datasets described in Part 1 and with the hyperparameters listed below. This model was used to predict the 37 log10(EC50 + 1) transformed "Potency vs Parasite (uMol)" values in the test set. The log10(x + 1) transformation was then reversed for all predictions to enable comparison with the true "Potency vs Parasite (uMol)" values of the test set.
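The transformation and its reversal amount to the pair of functions below. This is only a sketch of the arithmetic; the function names are hypothetical.

```python
import math


def to_log_scale(ec50_um):
    """Forward transform applied to the modelling target: log10(EC50 + 1)."""
    return math.log10(ec50_um + 1.0)


def from_log_scale(value):
    """Inverse transform applied to model predictions: 10**value - 1."""
    return 10.0 ** value - 1.0


# Round trip on one value taken from the test set table.
transformed = to_log_scale(8.1938)
print(transformed, from_log_scale(transformed))
```

Adding 1 before taking the logarithm keeps the transformed values non-negative for any EC50 of zero or more.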

Progressive neural network hyperparameters:

  • Layers: 2
  • Layer dimensions: 1000
  • Number of epochs: 50
  • Dropouts per layer: 0.25
  • Optimizer: Adam
  • Batch size: 100
  • Loss: Root Mean Square Error


The calculated Root Mean Squared Error for the Progressive neural network model for assessing the combined test set was 4.1340 uMol. 

The true and predicted Potency vs Parasite (uMol) values are displayed below. 

OSM Code Ion Regulation Test Set PotencyuMol PGN_ST_predictions
OSM-S-367 A,B 8.1938 2.404632188
OSM-S-380 A,B 0.11 3.151741719
OSM-S-175 B 0.3475 7.300927026
OSM-S-201 B 4.5956 7.719285267
OSM-S-204 B 0.9018 5.808532719
OSM-S-218 B 0.1105 0.366073106
OSM-S-254 B 0.7744 1.420859794
OSM-S-272 B 0.1078 0.68316042
OSM-S-278 B 4.2154 5.616926461
OSM-S-279 B 0.314275 2.844591687
OSM-S-293 B 0.13 0.987342693
OSM-S-353 B 0.1137 1.776545003
OSM-S-366 B 0.4349 1.629969458
OSM-S-376 B 0.5767 1.354778073
OSM-S-377 B 0.01668 0.153477093
OSM-S-378 B 10 2.057914063
OSM-S-379 B 0.3292 2.85889783
OSM-S-381 B 0.02432 0.957832692
OSM-S-389 B 0.1408 2.532740452
OSM-S-390 B 0.074 1.758208853
OSM-S-363 C 10 2.540437817
OSM-S-364 C 10 0.619554596
OSM-S-368 C 2.239 1.436717336
OSM-S-369 C 0.251 0.985902954
OSM-S-370 C 1.995 3.386147645
OSM-S-371 C 0.372 4.859706705
OSM-S-372 C 10 7.033399774
OSM-S-373 C 10 14.63480484
OSM-S-374 C 10 9.961609289
OSM-S-375 C 10 1.156782548
OSM-S-382 C 10 10.05324279
OSM-S-383 C 0.135 1.212617192
OSM-S-384 C 0.928 1.202246959
OSM-S-385 C 8.586 2.308562764
OSM-S-386 C 4.801 4.868107745
OSM-S-387 C 10 1.059725732
OSM-S-388 C 10 14.21659369
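The error measures used in these posts can be computed from a table like the one above as sketched below. Only the first five rows are copied in to keep the example short, so the printed values will not match the RMSE reported for all 37 molecules; the function names are hypothetical.

```python
import math


def rmse(true_values, predictions):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def mae(true_values, predictions):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(true_values, predictions)) / len(true_values)


# First five rows of the table above: (PotencyuMol, PGN_ST_predictions).
rows = [
    (8.1938, 2.404632188),
    (0.11, 3.151741719),
    (0.3475, 7.300927026),
    (4.5956, 7.719285267),
    (0.9018, 5.808532719),
]
true_values = [r[0] for r in rows]
predictions = [r[1] for r in rows]
print(rmse(true_values, predictions), mae(true_values, predictions))
```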


Many EC50 predictions from this initial modelling effort are in the wrong order of magnitude compared to the actual assay result, which indicates that the RMSE needs to be reduced well below 4 uM in order to produce a predictive model. This could be achieved by further model tuning (at the risk of overfitting the Testset), multitask/transfer learning of related assay activities to make better use of the limited data, and dataset augmentation to expand the applicability domain of the in silico models and enhance prediction performance for the Series 4 compounds of the Test Sets.


In Silico Model Training for predicting whole cell EC50 values against Pfal (Single Task Modelling Part 1)


To determine the capability of each neural network architecture for modelling and predicting whole cell EC50 values for the molecules within the provided OSM competition dataset.


All molecules with missing or "ND" values in the "Potency vs Parasite (uMol)" column were removed. Additionally, all molecules labelled "B" or "C" in the "Ion regulation Test Set" column were also removed. Two missing SMILES structures were recovered from the "Vaidya" source article, while the remaining three molecules featuring "Potency vs Parasite (uMol)" values with missing SMILES structures were removed. This resulted in a training dataset with 566 structures. An additional continuous class was created by transforming the "Potency vs Parasite (uMol)" values for all molecules by log10(EC50 + 1).
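The filtering rules above can be written as a single predicate, sketched below for rows read as dictionaries of strings (for example from a CSV export). The "SMILES" column name and the example rows are assumptions made for illustration; the other column names follow the spreadsheet headers quoted in these posts.

```python
def keep_for_training(row):
    """Return True if a spreadsheet row passes the training-set filters."""
    potency = (row.get("Potency vs Parasite (uMol)") or "").strip()
    if potency == "" or potency.upper() == "ND":
        return False  # no usable potency value
    label = row.get("Ion Regulation Test Set") or ""
    test_sets = {part.strip() for part in label.split(",")}
    if test_sets & {"B", "C"}:
        return False  # held out for the B/C test sets (this also covers "A,B")
    return bool((row.get("SMILES") or "").strip())  # structure is required


# Hypothetical rows showing each rule.
rows = [
    {"Potency vs Parasite (uMol)": "0.5", "Ion Regulation Test Set": "", "SMILES": "CCO"},
    {"Potency vs Parasite (uMol)": "ND", "Ion Regulation Test Set": "", "SMILES": "CCO"},
    {"Potency vs Parasite (uMol)": "1.2", "Ion Regulation Test Set": "A,B", "SMILES": "CCO"},
    {"Potency vs Parasite (uMol)": "3.0", "Ion Regulation Test Set": "A", "SMILES": ""},
]
print([keep_for_training(r) for r in rows])
```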


Three distinct artificial neural network algorithms were used to model the training dataset: the shallow Multitask neural network modified for single task modelling, the state-of-the-art Progressive neural network, and the Graph Convolutional neural network. 1024-bit ECFP4 fingerprints were used to featurise the SMILES structures for the Multitask and Progressive neural networks, while the SMILES structures were converted to graphs for the Graph Convolutional neural network. An 80%/10%/10% training/testing/validation split of randomly selected molecules was used to construct and assess each model. A brief hyperparameter search was conducted for each algorithm to find the best model performance.


All modelling algorithms were capable of fitting the training split, with high Training Pearson R2 values. Each model displayed lower performance on the unseen validation dataset. The Progressive neural network algorithm featured the best performance, followed by the conventional single task neural network, while the Graph Convolutional neural network featured lower than expected performance. This discrepancy could be due to the longer run times required for the Graph Convolutional model, which did not accommodate as many hyperparameter tuning iterations as the other two models.

Model Training Pearson R2 Validation Pearson R2
Single task TF 0.93310395181090899 0.62007326931407103
Progressive NN 0.8814511879515059 0.64175851611886581
Graph-conv 0.8764779532937661 0.57656288012980794


While each neural network architecture displays similar performance for the training and validation datasets, the Progressive neural network model features marginally higher performance than the conventional neural network model. As such, it should be used to assess the currently held out "B" and "C" datasets. The Graph-convolutional model featured the lowest performance overall which was not expected, since this architecture was found to consistently feature the best performance in previous studies. This may be attributed to the relatively small size of the dataset in this project compared to previous studies. Additionally, the validation performance of each model is substantially lower than the training performance, which indicates further model refinement is necessary. 

Pyrimethamine synthesis: Status at the end of 2016



The synthesis of Daraprim by Sydney Grammar School students, with the assistance of their teachers, Dr Malcolm Binns and Dr Erin Sheridan, was completed in three steps as shown in the diagram above.

Significant investigation was undertaken to

  • elucidate the optimal conditions to produce the keto-nitrile Compound 2 in Step A
  • determine why Step D and related one-step reactions would not work when Compound 2 was the starting material
  • methylate Compound 2 to form Compound 4

The reports for the larger scale syntheses from phenylacetonitrile (Compound 1) through to Daraprim via Steps A, B and C are itemized below. The isolation of pure samples of the phenylketonitrile (Compound 2) and the enol ethers (Compounds 3 and 4) has provided NMR spectra for compounds that are not usually isolated and characterised in commercial literature.

Discussion of Grammar syntheses

2-(4-chlorophenyl)-3-oxopentanenitrile. (Compound 2). Keto, enol and enolate forms.

At the onset of this project it was understood that the phenylketonitrile, Compound 2, might exist in both keto and enol forms. It was also suggested by Thomas McDonald that the red colour of the reaction mix, which disappeared on acidification, may be due to the production of the corresponding enolate (enoxide). Our investigation of the NMR spectra of Compound 2 in CDCl3 indicated that it was 100% keto-form, displaying the characteristic δ4.66 shift for the benzylic proton, whilst in d6DMSO Compound 2 was 100% enol-form, displaying evidence for E and Z isomerism and no signal at δ4.66. The spectrum submitted by Thomas McDonald for Compound 2 had also been run in d6DMSO and displayed the characteristics of the enol-form.

We found that Compound 2 was surprisingly acidic, presumably as a result of the benzylic carbanion being stabilized by phenyl, nitrile and carbonyl groups. Addition of excess triethylamine to a solution of Compound 2 in ethanol immediately resulted in the generation of the enolate, as evidenced by the immediate appearance of a very polar spot in the TLC. Intermediate levels of triethylamine resulted in a smear on the TLC running between the protonated and deprotonated species. The enolate was isolated as its triethylammonium salt by direct reaction between Compound 2 and triethylamine, and a confirmatory NMR spectrum in d6DMSO was obtained.

A very polar by-product was formed alongside Compound 2. It has tentatively been identified as 4-chlorophenylbenzoate until further analysis becomes available. It is surmised that it is generated by the addition of molecular oxygen to the phenylacetonitrile anion to form the peroxide anion, which then undergoes further reaction. If this is the case, the Sydney Grammar decision to work with larger quantities (~20 g) of material would have mitigated the problem of oxygen contamination somewhat. It is noted that the Vasiliki Theologia Chioti group were working on the synthesis of Compound 2 using much smaller (part gram) quantities of materials and were not able to identify Compound 2 in any of their reaction fractions after purification, perhaps as a result of complete reaction of their carbanion with oxygen.

Another advantage of using larger quantities of materials in the synthesis was that the exotherm of the reaction took the temperature quickly from room temperature to 40°C or so, accelerating the desired process and reducing the solubility of oxygen in the reaction mix (stirring of the reaction mix was stopped as soon as the potassium tert-butoxide had dissolved). To make the reaction more school friendly and foolproof, reaction quantities of all reagents were based on the consumption of a whole bottle of potassium tert-butoxide. This enabled the rapid addition of the potassium tert-butoxide directly from the bottle (safe and efficient handling) and did not leave any residue in the bottle to "expire" during storage.

Finally, in the earlier reaction workups by Thomas McDonald and Sydney Grammar, the highly basic reaction mix was quenched with water, extracted with a solvent, acidified and extracted again. Later recognition of the acidic nature of Compound 2 led to the more efficient immediate quenching of the reaction mix with an excess of hydrochloric acid.


2-(4-chlorophenyl)-3-methoxy-2-pentenenitrile. Compound 4

In spite of indications that Compound 4 could be synthesised from Compound 2 and trimethyl orthoformate in an acidic aprotic environment, this was found not to be the case in our hands. Neither milder alkylating conditions using acidic silica nor more aggressive conditions using concentrated sulfuric acid were found to be effective.  

The alternative reaction system, Step E, pictured above with methanol as the alkoxylating agent and trimethyl orthoformate acting as a dehydrating agent, worked to some extent, but the reaction yield was never more than about 50% according to TLC analysis. TLC analysis of the reaction mix indicated that two new products were present. Isolation and subsequent NMR analysis of these products suggested they were the E and Z isomers of the desired Compound 4. One of the isomers tended to crystallise out of the Compound 4 oil on standing.

As much of the reaction solvent (methanol) was lost during the overnight reaction in the hot water bath (presumably due to competing reactions between the methanol and the trimethyl orthoformate), the reaction scheme was considered inconvenient for larger scale reactions and an alternative enol ether was sought as an intermediate in the synthetic pathway.


2-(4-chlorophenyl)-3-(2-methylpropoxy)-2-pentenenitrile. Compound 3

This reaction started well, with water formed in the reaction effectively removed by condensing in the condenser of the Dean Stark apparatus and dropping into the reservoir. As the reaction proceeded, the water still condensed in the condenser but did not drop into the reservoir; rather, it reincorporated into the refluxing solvent and was returned to the reaction mix. As a result the reaction never went to completion. Perhaps varying the toluene and isobutanol content would further improve the yield.

Removal of unreacted Compound 2 was effected by adding triethylamine to the products of the reaction and then stirring in some silica to remove the very polar triethylammonium salt of Compound 2.

Separate E and Z isomers were apparent in the TLC analysis and the NMR spectra.



The use of DMSO as the reaction solvent in Step C has many advantages. The guanidine hydrochloride, sodium methoxide and resultant sodium chloride (and guanidine) were all soluble in the DMSO. Alternative syntheses in ethanol typically filtered out the sodium chloride precipitate before proceeding. Additionally, the DMSO is supposed to accelerate the isomerisation and aromatization phase of the reaction after the initial addition of the guanidine to Compound 3, a significant rate limiting step in ethanol based reactions.

After standing overnight at room temperature, much of the Daraprim had crystallised from solution even though there was still starting material present according to TLC analysis. The yield of the reaction could be improved by holding the reaction mix at an elevated temperature overnight to push the reaction towards completion. The isolation of the Daraprim could be improved by filtering the solution containing Daraprim crystals under reduced pressure. The School chemistry laboratory did not have sufficient vacuum available to filter the majority of the highly viscous mixture, so a laborious and messy workup was required.

Guanidine carbonate is not as soluble in DMSO as guanidine hydrochloride; however, it may have been suitable as an alternative. It may be that guanidine carbonate would react with Compound 3 without the need for the sodium methoxide base, but there was insufficient time to trial this system.

Step D, a one-step reaction from Compound 2 to Daraprim, was discontinued in part due to the acidic nature of Compound 2 creating the previously discussed unwanted side reactions when triethylamine/guanidine hydrochloride systems were investigated, and in part due to the relative insolubility of the guanidine carbonate system making it difficult to identify whether the relatively insoluble Daraprim had formed. It may well be that a review of the previously dismissed guanidine carbonate reaction products alongside an authentic Daraprim sample would show that the attempted one-step reactions were more successful than initially thought.


Summary prepared by M.R.Binns on behalf of the Sydney Grammar School team.

PfATP4 pre-modeling activity data analysis

A unique id was assigned to any compound using the IDs available in the data file with the following priority order: OSM Code, MMV, Internal OSM, PubChem, Chembl, Commercial, Other.
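The priority rule can be expressed as a simple lookup over the ID columns, sketched below. The column names are taken from the list above, but their exact spelling in the data file, as well as the function name and the example record, are assumptions.

```python
# Priority order as described in the text; first non-empty value wins.
ID_PRIORITY = ["OSM Code", "MMV", "Internal OSM", "PubChem", "Chembl", "Commercial", "Other"]


def unique_id(record):
    """Return the highest-priority non-empty identifier of a record, or None."""
    for column in ID_PRIORITY:
        value = (record.get(column) or "").strip()
        if value:
            return value
    return None


# Hypothetical record without an OSM Code: the MMV identifier is used.
print(unique_id({"OSM Code": "", "MMV": "MMV-EXAMPLE", "PubChem": "12345"}))
```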

The PvsP pEC50, corresponding to -log10(EC50(M)) was calculated from the provided Potency vs Parasite EC50 (uM).
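Since the potency is provided in uM, the conversion reduces to pEC50 = 6 - log10(EC50 in uM). A minimal sketch of this arithmetic follows; the function name is hypothetical.

```python
import math


def pec50_from_um(ec50_um):
    """Convert an EC50 in micromolar to pEC50 = -log10(EC50 in molar)."""
    # 1 uM = 1e-6 M, so -log10(ec50_um * 1e-6) = 6 - log10(ec50_um).
    return 6.0 - math.log10(ec50_um)


# 1 uM corresponds to pEC50 6; lower EC50 values give higher pEC50 values.
for ec50 in (10.0, 1.0, 0.1):
    print(ec50, pec50_from_um(ec50))
```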

In the provided data file are present:

  1. 601 compounds with PvsP values, of which 569 have quantitative values and can be used for regression modeling, and 32 have a qualitative value (meaning they have an associated potency qualifier) and cannot be used in regression modeling.

  2. 455 compounds with IRA values; this activity is mainly binary, discriminating active (activity = 1) from inactive (activity = 0) compounds. Nevertheless, 5 molecules are tagged as slightly active (activity = 0.5).

  3. 370 compounds with both quantitative PvsP and IRA values.

Using the 370 compounds of point 3 the correlation between PvsP pEC50 and IRA was analyzed. Results are reported in the following table and figure:

Table 1 (columns: Ion Regulation Activity (IRA), PvsP pEC50 mean – 1 St. Dev., PvsP pEC50 mean, PvsP pEC50 mean + 1 St. Dev.)


Potency vs Parasite - Ion Regulation Activity correlation for all available compounds

Figure 1

As can be seen from the table and figure, if all the molecules are considered it seems that there is no correlation between IRA class (X-axis of figure) and PvsP pEC50 (Y-axis).
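The per-class summary reported in the table (mean and mean ± 1 standard deviation of PvsP pEC50 for each IRA class) can be computed as sketched below. The use of the sample standard deviation is an assumption, as the text does not say which estimator was used, and the example values are placeholders rather than data from the file.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_ira(pairs):
    """Map each IRA class to (mean - SD, mean, mean + SD) of its pEC50 values."""
    groups = defaultdict(list)
    for ira_class, pec50 in pairs:
        groups[ira_class].append(pec50)
    summary = {}
    for ira_class, values in sorted(groups.items()):
        m = mean(values)
        sd = stdev(values) if len(values) > 1 else 0.0  # SD undefined for one value
        summary[ira_class] = (m - sd, m, m + sd)
    return summary


# Placeholder (IRA class, pEC50) pairs, for illustration only.
example = [(0, 5.0), (0, 7.0), (0.5, 6.2), (1, 6.0), (1, 6.4)]
for ira_class, (low, mid, high) in summarise_by_ira(example).items():
    print(ira_class, round(low, 2), round(mid, 2), round(high, 2))
```

Overlapping mean ± SD ranges between the classes would correspond to the absence of a clear trend described in the text.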

The same analysis was repeated using only the compounds of Open Source Malaria series 4 (OSM-S4), whose origin column in the original file is tagged as “OSM S4”. This is a much smaller set, containing 32 compounds with both quantitative PvsP and IRA values.

Table 2 (columns: Ion Regulation Activity (IRA), PvsP pEC50 mean – 1 St. Dev., PvsP pEC50 mean, PvsP pEC50 mean + 1 St. Dev.)



Potency vs Parasite - Ion Regulation Activity correlation for OSM-S4 compounds

Figure 2

As can be seen from the table and figure, if we consider only OSM-S4 compounds it seems there is a correlation between IRA class (X-axis of figure) and PvsP pEC50 (Y-axis).

The cause of the difference between the 2 trends (all compounds versus only the OSM-S4 series) is unknown. The correlation seen between the OSM-S4 IRA class (X-axis of figure 2) and PvsP pEC50 (Y-axis) may suggest that the two screens give similar results for specific chemotype series. Still, we believe that both activities should be predicted independently and that new molecules should preferentially be selected for synthesis on the basis of positive results in both models.

We plan to develop different models based on the previously described observation.


Within the Open Source Malaria (OSM) project, the need arose to develop a predictive model for PfATP4 (a sodium pump found in the membrane of the malaria parasite). A modeling competition was launched to promote the modeling of this target, which is the putative target for the lead series.

The provided data file contains 2 activity types:

  1. Potency vs Parasite (uM): the potency of compounds in whole-cell assays expressed as EC50 (uM). From now on referred to as PvsP.

  2. Ion Regulation Activity: PfATP4 target assay activity determining which compounds block the malaria parasite ion pumps. It is a binary measure discriminating active from inactive compounds. From now on referred to as IRA.

The objective of the competition is to develop a computational model that predicts which molecules will block the malaria parasite's ion pump, PfATP4, especially focusing on OSM series 4 (OSM-S4) compounds.

Synthesis of (E)-4-(2-((2-(6-chloropyrazin-2-yl)hydrazono)methyl)-phenyl)morpholine (FS-04-01); Cyclisation Step.






FS-04-01 Scheme.jpg







FS-03-01 0.3138 g 0.9875 mmol
PIDA 0.3185 g 0.9888 mmol



50 mL




25 mL






PIDA (0.3185 g, 0.99 mmol) was added to a stirring mixture of FS-03-01 (0.3138 g, 0.99 mmol) in DCM (20 mL). The mixture was left to stir at room temperature overnight.

The resultant mixture was separated with NaHCO3 (~25 mL) in deionised water.

The organic phase was extracted and the aqueous phase was washed with DCM (3x10 mL).

The crude yield was 96.28%




The crude product was purified via column chromatography, using a 1:9 EtOAc to petroleum ether eluent.  The polarity was increased incrementally, every 500 mL, by 5% - with the final elutions being at 35% EtOAc. From 25% EtOAc onwards, hexane was used instead of petroleum ether.

The final yield was 24.7% (0.0772 g), with four distinct compounds being eluted: FS-04-01-1, FS-04-01-2, FS-04-01-3 and FS-04-01-4. The product was assumed to be FS-04-01-2, with a yield of 10.26% (0.032 g).


NMR Analysis:


FS-04-01-1 and FS-04-01-4 appeared to contain starting material and solvent. 

FS-04-01-2 and FS-04-01-3 seemed the most promising, with FS-04-01-2 being analysed in depth via 1H NMR, 13C NMR, mass spectrometry and IR.

FS-04-01- Carbon.jpg
FS-04-01- DEPT.jpg




FS-04-01-2 MS.pdf
FS-04-01-2 MS (1).pdf
FS-04-01-2 MS (Infused Neg).pdf


The results were inconclusive (at present), though a small quantity of the intended final product was seen in the mass spec.




OSM: Synthesis of 5-chloro-3-(4-chlorophenyl)-[1,2,4]triazolo[4,3-a]pyrazine (JU 8-2), http://malaria.ourexperiment.org/triazolopyrazine_se/9986/, Date accessed: 12/11/16.

Synthesis of (E)-4-(2-((2-(6-chloropyrazin-2-yl)hydrazono)methyl)-phenyl)morpholine (FS-03-01); Condensation Step.






FS-03-01 scheme.jpg







2-chloro-6-hydrazinylpyrazine 0.1562 g 1.05 mmol
2-morpholinylbenzaldehyde 0.2 g 1.05 mmol
Ethanol 4 mL 67 mmol





2-chloro-6-hydrazinylpyrazine (0.1562 g, 1.05 mmol) was added to a stirring solution of 2-morpholinylbenzaldehyde (0.2 g, 1.05 mmol) in ethanol (4.0 mL, 67 mmol).

The mixture was left to stir. After ca. 2 hours, a TLC (1:1 pet. ether to EtOAc) of the mixture against diluted 2-morpholinylbenzaldehyde showed the reaction had reached completion.




The product was rotary evaporated to give a yield of 93.95% (0.3138g, 0.9875 mmol)


NMR analysis:






OSM: Synthesis of (E)-2-chloro-6-(2-(naphthalen-2-ylmethylene)hydrazinyl)pyrazine (TY 2-1).

http://malaria.ourexperiment.org/triazolopyrazine_se/9268/Synthesis_of_E2chloro62naphthalen2ylmethylenehydrazinylpyrazine_TY_21.html , Date accessed: 10/11/16.

Scaled-up – Amination of 2-fluorobenzaldehyde, via SNAr, with DMF to produce 2-morpholinylbenzaldehyde.






FS-02-02 Scheme.jpg







2-fluorobenzaldehyde 2.1 mL 20 mmol
Morpholine 3.46 mL 40 mmol
K2CO3 4.1 g 29.7 mmol
DMF 20 mL 259 mmol





To a stirring mixture of K2CO3 (4.1 g, 29.7 mmol) in DMF (20 mL), 2-fluorobenzaldehyde (2.1 mL, 20 mmol) and morpholine (2.6 mL, 30 mmol) were added. The mixture was heated to ~100 °C, condensing under air, and left overnight.

A mini work-up and TLC (1:1 EtOAc to petroleum ether) was done, showing the reaction was incomplete.

To force the reaction to completion, a further 10 mmol of morpholine was added to the reaction mixture, bringing the overall reactant equivalents to 1:2 with morpholine in excess.


The reaction didn’t reach completion after 3 hours so it was left for a further week.




Diethyl ether (~70 mL) in a 1:1 solution of water in brine (15 mL/15 mL) was used to dilute and bring about separation.


The organic phase was washed once more with a solution of diethyl ether and water in brine (10 mL diethyl ether and 20 mL (1:1) water and brine).


The aqueous phase was washed with diethyl ether (~20 mL).


The combined organic phases were washed with brine (20 mL) and the extracted organic phase was evaporated in vacuo to give the crude product (3.9g).

The pure product was eluted via column chromatography with an eluent of EtOAc to petroleum ether (1:9).

125 fractions were eluted in total, with the product showing, via TLC, in fractions 61-113.


2-morpholinylbenzaldehyde was successfully synthesised to give a yellow, crystalline, solid with a yield of 37.7% (1.44 g).
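The reported yield can be checked from the isolated mass and the amount of the limiting reagent (20 mmol of 2-fluorobenzaldehyde), as sketched below. The molar mass of 191.23 g/mol is calculated here from the formula C11H13NO2 and is not quoted in the original post; the function name is hypothetical.

```python
def percent_yield(product_mass_g, molar_mass_g_mol, limiting_mmol):
    """Percent yield from isolated mass, product molar mass and limiting reagent amount."""
    product_mmol = product_mass_g / molar_mass_g_mol * 1000.0
    return 100.0 * product_mmol / limiting_mmol


# 2-morpholinylbenzaldehyde, C11H13NO2 (molar mass computed from standard atomic masses).
MOLAR_MASS_PRODUCT = 191.23  # g/mol

print(round(percent_yield(1.44, MOLAR_MASS_PRODUCT, 20.0), 1))  # → 37.7
```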


FS-02-02 product.jpg


NMR Analysis:


1H NMR (400 MHz, CDCl3) δ 10.3 (1H, s, CHO), 7.83 – 7.81 (1H, d, J 7.7 Hz, HAr), 7.69 – 7.53 (1H, m, HAr), 7.17 – 7.11 (2H, m, HAr), 3.92 – 3.89 (4H, t, CH2OCH2), 3.10 – 3.08 (4H, t, CH2NCH2).


FS-02-02 NMR Proton crude.JPG


FS-02-02 MS.pdf






X. Xia, X. Shu, K. Ji, et al., Journal of Organic Chemistry, 2010, 75(9), 2893 – 2902.