Prediction of the thermal decomposition of organic peroxides by validated QSPR models

Organic peroxides are unstable chemicals which can easily decompose and may lead to explosion. Such a process can be characterized by physico-chemical parameters such as heat and temperature of decomposition, whose determination is crucial to manage related hazards. These thermal stability properties are also required within many regulatory frameworks related to chemicals in order to assess their hazardous properties. In this work, new quantitative structure-property relationships (QSPR) models were developed to predict accurately the thermal stability of organic peroxides from their molecular structure respecting the OECD guidelines for regulatory acceptability of QSPRs. Based on the acquisition of 38 reference experimental data using DSC (differential scanning calorimetry) apparatus in homogenous experimental conditions, multi-linear models were derived for the prediction of the decomposition heat and the onset temperature using different types of molecular descriptors. Models were tested by internal and external validation tests and their applicability domains were defined and analyzed. Being rigorously validated, they presented the best performances in terms of fitting, robustness and predictive power and the descriptors used in these models were linked to the peroxide bond whose breaking represents the main decomposition mechanism of organic peroxides.


Introduction
Organic peroxides are reactive compounds, containing the -O-O-bond [1][2], that can be formed naturally by auto-oxidation with oxygen in certain solvents, such as diethylether. They can lead to highly explosive peroxidic residues, requiring specific safety precautions, like addition of oxidation inhibitors to prevent the formation of undesirable organic peroxides [3][4][5][6]. Organic peroxides are more or less stable due to a relatively low -O-O-bond energy (20-50 kcal/mol) [7]. Since they generate instable radicals during their decomposition, organic peroxides are commonly used as catalyst and as radical polymerization initiators. The decomposition of organic peroxides can nevertheless be dangerous and can lead to serious effects [8][9][10]. To reduce the risk of incidents and of accidents, their hazards are intensively studied. In order to avoid the accidents by the customers and users, the organic peroxide producers provide information concerning the properties of commercial organic peroxides and give recommendations for safe handling and use of organic peroxides [11][12]. General documents also describe the characteristics of organic peroxides and the safety rules to apply to safely handle them at laboratory scale [3].
Concerning the regulations, organic peroxides belong to a dedicated division (Division 5.2), as described in the UN Dangerous Goods Transportation Recommendations [13], or in GHS (Globally Harmonized System of classification and labelling of chemicals) [14] with 7 different classes (types A to G) related to their hazardous potential and leading to different amounts authorized for transport.
The decomposition of organic peroxides can be triggered and accelerated by heat, mechanical shock or friction and by various contaminants [15][16]. To produce, transport and provide safely the numerous organic peroxides, the industry generally commercializes them in low concentration diluted in variable solvents.
In order to improve and homogenize the knowledge of the marketed chemicals, the European Union regulation REACH (Registration, Evaluation, Authorization and Restriction of Chemicals) [17][18] requires the evaluation of physico-chemical, toxicological and ecotoxicological properties for all chemicals produced or imported by more than one ton by year in Europe. To help industry to meet the requirements of REACH regulation as far as the chemical safety assessment is concerned, ECHA technical guidances have been published [19][20] and a general testing strategy for physico-chemical properties was proposed to consider the order of testing.
As far as organic peroxides are concerned, even if they belong to a dedicated regulated division or class, the knowledge of their explosive properties, as defined by the UN recommended tests [21], is of great importance. As entry data, their thermal stability (represented by the energy and temperature of decomposition) is a key property that is considered as a pre-selection criterion to identify substances that could undergo explosive reactions. So, it is used in the complex procedure of classification of explosives and organic peroxides [21]. Indeed, the UN regulation indicates that there is no need to perform this complex procedure when the decomposition heat (corresponding to the amount of energy released during the decomposition) is lower than 500 J/g. Measured by calorimetric analyses, notably by differential scanning calorimetry (DSC), the decomposition heat is estimated with measurement uncertainties of about 5-10% [22,23] and less than 5°C for the onset temperature [23].
For safety reasons but also for technical reasons, experimental tests can be difficult to implement for unstable substances like organic peroxides. As a consequence, the development of methods used for the prediction of data can be of great help at the research and development step and can help to accelerate and fulfil the next registration deadlines. As a simple prediction tool of reactivity hazards, the CHETAH software based on Benson's group contribution method was developed by ASTM [24]. Considering only six organic peroxides, Mohan et al. [25] demonstrated some correlations between CHETAH criteria (oxygen balance, the maximum decomposition heat, the difference between heat of combustion and decomposition heat) and explosive properties. In one recent study, Sato et al. [26] showed that there is a mutual correlation between CHETAH criteria and the explosibility of self-reactive substances except for organic peroxides and azo compounds. Nevertheless, it has to be noticed that the CHETAH software provides the maximum decomposition heat (considering that the available oxygen first oxidizes hydrogen to water and then carbon to carbon dioxide) and not the actual experimental decomposition heat.
Considering the REACH regulatory framework, Lewis et al. [27] advocated the use of powerful computer-aided ab initio techniques to generate predictions of key properties of broad classes of chemicals, without resorting to costly experimentation and potentially hazardous testing. Among these alternative methods to experimental testing, Quantitative Structure-Activity/Property Relationships (QSAR/QSPR) were clearly recommended in REACH and in technical guidances [19,20] to obtain information data. Indeed, they represent powerful tools of prediction [28] used more and more for physico-chemical applications [29][30]. Their applicability in an industrial context was also recently demonstrated by Patlewicz et al. [31]. To support the development and use of QSPRs, OECD drawn up 5 principles for the validation of QSAR/QSPR models for regulatory purpose [32]. Some recent reviews [33][34] list the existing predictive models developed for the relevant properties of chemicals in the context of REACH. In particular, Dearden et al. [34] focused on the validation of models dedicated to physico-chemical properties according to OECD principles to favor the use of predicted property values in submissions to the European Chemicals Agency (ECHA).
Considering the prediction of thermal stability, some models exist for different families of compounds like nitroaromatic compounds [35][36][37][38][39], nitramines [40][41][42], ionic liquids [43] or polymers [44]. Nevertheless, to our knowledge, only one reference exists [45] to predict the reactivity hazards of organic peroxides using the QSPR approach. In this study, a limited database of 16 organic peroxides was used to derive models with no validation set, neither definition of applicability domain. Therefore, the robustness and the predictivity of these models must be at least validated and may be improved.
Consequently, the aim of this paper was to develop new robust QSPR models respecting OECD principles dedicated to the prediction of the heat and temperature of decomposition.
To achieve this goal and considering that no large database exists in literature for organic peroxides (only 16 and 9 DSC data from the works of Lu et al. [45] and Ando et al. [22] respectively), an experimental database of 38 organic peroxides was built for these properties obtained in homogenous experimental conditions using DSC. The amount and quality of experimental data allowed to performing both internal and external validation methods to ensure the good performances of the multi-linear regression (MLR) models developed and then reach more robust and accurate models than the ones proposed by Lu et al..
Besides, the approach combined QSPR methodology with quantum chemical descriptors obtained with density functional theory (DFT) calculations. Indeed, a better chemical interpretation of the developed models can be expected using this type of descriptors as already demonstrated in previous works for nitro compounds [46][47][48]. To our knowledge, this work leads to the first completely validated QSPR models (including the definition of applicability domains) dedicated to the prediction of the thermal stability of organic peroxides.

Construction of the database
As mentioned above, the first work carried out on the prediction of thermal stability of organic peroxides [45] was based on only 16 samples. The concentration of the peroxides used ranged between 34% and 98% wt [45], with no indication of the dilution solvent or of the possible contaminants or co-products present in the samples.
In this study, to limit or even avoid any effect of the poor purity of organic peroxide samples, the origin of the organic peroxide samples was taken into account with great care. The following rules of organic peroxides selection were followed: -choice among the commercial organic peroxides, with a perfect knowledge of their composition; -choice of the highest purities and concentrations available for the organic peroxides; -elimination of organic peroxides available as mixtures.
For example, the 2-butanone peroxide also known in industries as the methyl ethyl ketone peroxide (MEKPO) was discarded since several isomers exist [49].
Using these criteria, 38 organic peroxide samples were selected (see Table 1 for the different kind of organic peroxides families and Table 2 for the list and concentrations), 31 provided by Arkema and 7 by Akzo Nobel. The concentrations were close to 97-99% (weight of organic peroxide divided by weigh of samples) exception given for some of them whose containing inert solvents (water, organic solvent or an inert inorganic). The size of the dataset was notably limited to organic peroxides available with no transportation, handling and storage issues.

Experimental dataset and methods
The experimental data were collected from calorimetric tests completed on the 38 selected organic peroxides. These tests were carried out on two different calorimeters with crosschecking, indicating that the results obtained were not equipment-dependent, and were within the precision limits of the DSC method recommended in ASTM E537-2 [23]. The calorimetric tests were performed using DSC131 from SETARAM and with DSC821e from Mettler-Toledo, with a scanning rate of 5 K/min from ambient temperature to 300°C. A few milligram sample was introduced, at ambient temperature, into a closed stainless steel crucible. The DSC vessels were previously washed, passivated, rinsed and dried. Each sample was tested three times to establish good reproducibility. The onset temperature was defined as the extrapolated onset temperature (see Figure 1) following ASTM E537-2 [23].
In order to consider concentration effects, three organic peroxide samples (one dialkyl peroxide and two peroxyesters) were diluted with an inert solvent (type A solvent according to actual regulation with boiling point close to 175°C) to obtain different concentrations from 7% up to 99% wt (see Figure 2).

Partitioning of the dataset
The considered dataset of 38 organic peroxides was divided into a training set, containing two thirds of the molecules of the dataset and a validation set constituted by the remaining molecules. This partition enabled both sets to be of sufficient size with similar distributions to allow a robust development and an external validation of models. The partitioning of the dataset was also visually inspected to ensure that the validation set covered at best the chemical diversity of the domain of applicability of the model, i.e. of the molecules in the training set. The range of the property values was also checked in order to prevent that no data was out of the general range of values in the dataset and could influence too much the correlation. For example, the decomposition heat of the 2,5-dimethyl-2,5-dihydroperoxy hexane was too high compared to the values of other peroxides. So, this compound was considered as outlier and removed for the prediction of this property (see Figure S1). For the onset temperature, the 3,3,5,7,7-pentamethyl-1,2,4-trioxepane was also considered as outlier and removed for the same reason (see Figure S2) coupled by the fact that such type of cyclic organic peroxide was not well represented in terms of chemical structures in the dataset. So, in both cases, a training set of 25 compounds was used for the development and the internal validation of models, while a validation set of 12 compounds was considered for external validation to evaluate their predictive power.

Molecular structure calculation
A preliminary conformation analysis was carried out with Scigress software [50] using the Conflex algorithm [51,52] and the MM3 force field [53]. The most stable conformer was optimized again with Gaussian09 software [54] using DFT calculations with the PBE0 [55] functional and the 6-31+G(d,p) basis set. This functional provides results close to those provided by the most-common B3LYP approach, but being parameter-free, is more physically sound [55]. Vibrational frequencies were then computed to ensure that all final stable conformations exhibited no imaginary frequency.

Molecular descriptor calculation
Each molecular structure was then characterized by a series of descriptors (constitutional, topological, geometric and quantum chemical) [56,57]. Most of them were calculated using the CodessaPro software [58] but additional descriptors were taken into account considering the specific case of organic peroxides and their reactivity. As proposed by Lu [45], the concentration of the organic peroxide was also chosen as a descriptor potentially influencing the temperature of decomposition of organic peroxides. Other descriptors were considered as explained in a previous work [35], notably the oxygen balance [21,59] which is an empirical descriptor well known to evaluate hazards related to energetic materials. Some more specific descriptors to the peroxide functional group were computed such as the number of peroxide bonds (n OO ), the OOR angle, the charge on oxygen atoms (Q OO ) or the distance between these atoms (d OO ) to better describe the decomposition process. As the cleavage of the -O-O-bond is considered as the first step in the decomposition process of organic peroxides [7,60], the dissociation energy (E disso ) of the peroxide bond was also calculated. When several peroxide functional groups were present in the molecule, the one presenting the lowest dissociation energy was considered for the calculation of all descriptors related to the -O-O-bond.
Moreover, descriptors arising from conceptual DFT [61,62], already successfully used in QSPR models for the prediction of the decomposition heat of explosives [35,36,46] or organic peroxides [45], were considered. In particular, descriptors related to the peroxide bond properties were developed and used in this work such as the local Fukui function f [61][62][63]: where + Other descriptors include hardness: where IP and EA are the ionization potential and the electron affinity and local softness:

Model building and performances
All models were developed on the training set based on multi-linear regressions (MLR) using the Best Multi Linear Regression (BMLR) [57] approach as implemented in CodessaPro program. The final model was chosen as the best compromise between correlation and number of descriptors as explained in previous works [35].
To evaluate the performances of models, a series of internal and external validations were computed. The goodness of fit was measured by the determination coefficient (R²), the mean absolute error (MAE) and the root mean square error (RMSE) between predicted and experimental values. The Q² coefficients issued from both leave-one-out (LOO) and leavemany-out (LMO) cross validations measured the robustness of the model, i.e. the dependence of the fitting of the model to any molecule(s) of the training set via the Q 2 LOO , Q 2 5CV , Q 2 10CV , Q 2 7CV coefficients (for LOO, 5-fold, 10-fold and 7-fold cross validations, respectively). Robust models are expected to present a low difference between Q² and R² coefficients. It has to be noticed that cross validation does not measure the predictivity of models because the set of molecules excluded at each step of the cross validation procedure and then used to calculate Q 2 has already been used for the building of the model [64,65]. Besides, to ensure that models did not correspond to chance correlations, a Y-scrambling test was realized on the training set. Random permutations of experimental property values were performed (1000 iterations) and new models were recalculated [66]. To evaluate the impact of randomization, average and standard deviation in R² random coefficients were calculated (R 2 YS and SD YS ). Low R 2 YS values are expected to avoid chance correlation. To go further, Rücker proposed that the difference between R² of the original model and R 2 YS should be roughly superior or equal to 2.3 SD YS to ensure a statistical significance at a 1% level and superior or equal to 3 SD YS at a 0.1% level [67].
The predictive power of models was measured on the validation set based on the same coefficients (R 2 ext , RMSE ext , MAE ext ) as those used for the fitting. Several additional external validation coefficients proposed in literature were calculated as: Q² F1 proposed by Tropsha [68] and the OECD guidance document [32], Q² F2 defined by Schuurman [69], Q² F3 by Consonni [70] and CCC by Lin [71,72].
Considering that a QSPR model only offers reliable predictions for compounds similar to those belonging to the training data set, the applicability domain [73,74] required by the 3 rd OECD point, was determined based on the descriptors included in the model. Euclidean distance method available in Ambit discovery software [75] was used with a 95% threshold, i.e. the domain was calculated to contain 95% of the molecules of the training set. Then, the predictivity inside the applicability domain was calculated based on the molecules of the validation set belonging to this domain. Corresponding coefficients for external validation were denoted R² in , RMSE in , MAE in, Q² F1in , Q² F2in , Q² F3in and CCC in .

DSC results
The reactivity data of organic peroxides are shown in Tables 2 and 3. The ranges of detected onset temperature (53-180°C) and decomposition heat (441-2622 J/g) cover most of the commercial grade of organic peroxides. One example of DSC curve is given in Figure 1. The concentration effect on the decomposition heat measured by unit mass of tested sample is linear in the range of the tested concentrations for the three tested organic peroxides ( Figure   2). That confirmed that the solvent did not contribute to the decomposition energy and that the models can be simplified from the concentration effect to only take into account the pure organic peroxides.

Use of existing models
QSPR models were developed by Lu and Mannan [45] for the heat and temperature of decomposition from a dataset of 16 organic peroxides. Even if these models presented good performances in terms of fitting (R²), they failed in robustness (Q²), in particular for the prediction of the onset temperature using MLR (Q²=0.108). Due to the low number of organic peroxides available in this study, models were not validated with an external set of molecules and their applicability domains were not defined. In a first step, 24 organic peroxides from our experimental database, different from the 16 molecules of the training set used by Lu, were computed on these MLR models (for both heat and temperature of decomposition) to perform an external validation and estimate their predictive power. The proposed model for the decomposition heat was not predictive considering R² ext =0.06, RMSE ext =250 kcal/mol and MAE ext =149 kcal/mol. The model developed for the prediction of the onset temperature was slightly better with R² ext equal to 0.67 but was still not predictive with RMSE ext =67°C and MAE ext =57°C. In the following, QSPR models were developed for these two reactivity properties from our experimental database of 38 organic peroxides allowing a complete validation.  (Table 4).

Multi-linear regressions for the decomposition heat
In a second step, a QSPR model was developed for the decomposition heat divided by the concentration of the organic peroxide (ΔH/C) in order to consider the influence of this parameter. Indeed, C may be more considered as a dilution effect on ΔH (as shown in Figure   2) rather than a linear term in the MLR model contributing in the same way as molecular descriptors. A very interesting model was obtained with 4 parameters (see Table S1): ΔH/C = 54 1 K -990 n OO + 12934 d OO + 2631 Q OO -19371 (5) Where 1 K is the order 1 Kier shape index (t-test= 12.7), n OO is the number of peroxide bonds (t-test=-14.9), d OO is the distance between the oxygen atoms of the peroxide bond (t-test=4.5) and Q OO is the average Mulliken charges of these two O atoms (t-test=7.8).
The Kier shape index of first order is a topological index that encodes the number of atoms in the molecule and the relative degree of cyclicity [57,76]. The three other descriptors are linked to the peroxide bond. The most important descriptor (considering the t-test values) is the number of peroxide bonds in the molecule, confirming the accepted mechanism of decomposition of organic peroxides starting by an homolytic cleavage of the -O-O-bond [7,60].
The predicted decomposition heat divided by the concentration using Eq. (5) as a function of experimental values was plotted in Figure 3 (see Table 2 for calculated property values). As illustrated in Table 4, the model was characterized by remarkable correlation (R 2 =0.97, RMSE=99 J/g) and robustness (Q 2 LOO =Q 2 10CV =Q 2 7CV =0.94 and Q 2 5CV =0.95). It did not result from chance correlation, since the models issuing from Y-randomization presented low correlations (R 2 YS =0.17, SD YS = 0.10) as shown in Figure 4. Besides, the criterion of Rücker [68] was respected: R 2 −R 2 YS (0.80) was higher than 2.3 SD YS (0.23). Finally, the predictivity for the 12 molecules of the validation set was very good (R² ext =0.81, RMSE ext =301 J/g, Q² F1 =Q² F2 =0.74, Q² F3 =0.77 and CCC=0.82). For this model, no molecule of the validation set was out of the applicability domain implying that the same performances were obtained considering the applicability domain of the model. This model was fully validated and presented better performances than the one previously proposed [45]. Indeed, if the coefficients of determination were quite similar (R 2 =0.97 for eq. 5 and R 2 =0.92 for Lu), this new model presents better robustness (Q²=0.77 versus Q²=-0.81 respectively).

Multi-linear regressions for the onset temperature
A MLR model was also developed for the prediction of the onset temperature of decomposition. A three-parameter model (see Table S2) was found to be the best compromise between correlation and number of descriptors among the 8 equations including up to 9 descriptors sorted out by the BMLR method (equation 6) with R²=0.84 and Q²=0.77 ( Figure   5).
T onset =144 F -OO + 29 n OO -20 gap + 194 (6) Where n OO is the number of peroxide bonds (t-test=3.6), F -OO is the average local Fukui function on O atoms of the peroxide bond (t-test=7.4), gap is the eV energy difference between the LUMO and HOMO orbitals (t-test=-4.4). This gap, a positive number, is an indicator of the stability of the molecule: the larger the gap is, the more stable is the molecule. F -OO is the Fukui function (quantum chemical descriptor) that characterizes the reactivity and more specifically the electrophilic attack on the peroxide bond. It's worth to note that the dissociation energy did not appear in the model indicating that even if the first step of the decomposition is the cleavage of the homolytic peroxide bond, the decomposition process is also strongly influenced by the reactivity of radicals formed [7,60]. Nevertheless, two descriptors were linked to the peroxide bond: F -OO and n OO confirming again the role of the peroxide bond in the decomposition process. The former was already present in the model for the prediction of the decomposition heat divided by the concentration (eq.5) but for equation 6, the most important descriptor was F -OO considering the value of the t-test, indicating the importance of quantum chemical descriptors in the prediction of this property.
It is noteworthy that the concentration was not included in the multi-linear equation (as proposed by Lu [45]), thus indicating that, contrary to the decomposition heat, it does not affect the onset temperature.

Conclusion
In this study, the largest experimental database including thermal properties for 38 organic peroxides obtained from homogeneous measurements was built to develop and validate the first QSPR models for the prediction of the heat and the onset temperature of decomposition.
Considering more than 300 descriptors including quantum chemical ones calculated with the density functional theory, two MLR models presenting high performances were constructed according to all OECD principles for the validation of QSAR/QSPR models for regulatory use. The number and quality of experimental data (principle 1) allowed validating them on a series of internal and external tests (principle 4). A four-parameter model was obtained for the prediction of the decomposition heat divided by the peroxide concentration reaching high performances in terms of fitting, robustness and predictivity (R²=0.97, Q²=0.94 and R² ext =0.81). A three-parameter model presenting also good performances was sorted out (R²=0.84 and Q²=0.77, R² ext =0.80) for the prediction of the onset temperature of decomposition. Algorithms are unambiguous (principle 2) and their applicability domains are clearly defined (principle 3). Finally, descriptors included in the models were chemically pertinent (principle 5) such as the number of peroxide bonds in the molecule, the distance between the oxygen of this bond and the charge of these atoms, confirming that the homolytic cleavage of the -O-O-bond is critical in the mechanism of decomposition of organic peroxides. The pertinence of quantum descriptors was also demonstrated to reach high performances in terms of predictivity. These models can now be used to fulfil requirements of REACH regulation or as a first screening tool indicating predicted data related to reactivity hazards in view of development of new organic peroxides.