
## Development and validation of a neural network for NAFLD diagnosis

### Abstract

Non-Alcoholic Fatty Liver Disease (NAFLD) affects about 20–30% of the adult population in developed countries and is an increasingly important cause of hepatocellular carcinoma. Liver ultrasound (US) is widely used as a noninvasive method to diagnose NAFLD. However, the intensive use of US is not cost-effective and increases the burden on the healthcare system. Electronic medical records facilitate large-scale epidemiological studies, but existing NAFLD scores often require clinical and anthropometric parameters that may not be captured in those databases. Our goal was to develop and validate a simple Neural Network (NN)-based web app that could be used to predict NAFLD and, in particular, its absence. The study included 2970 subjects; training and testing of the neural network using a train-test split approach was done on 2869 of them. From another population consisting of 2301 subjects, a further 100 subjects were randomly extracted to test the web app. A search was made for the best parameters for the NN, which was then exported for incorporation into a local web app. The percentage accuracy, area under the ROC curve, confusion matrix, Positive (PPV) and Negative Predictive Value (NPV), precision, recall and F1-score were verified. Explainability (XAI) was then analyzed to understand the diagnostic reasoning of the NN. Finally, the specificity and sensitivity of the NN in the local web app were checked. The NN achieved an accuracy during testing of 77.0%, with an area under the ROC curve of 0.82. In the web app, the NN achieved good results, with a specificity of 1.00 and a sensitivity of 0.73. The described approach can be used to support NAFLD diagnosis, reducing healthcare costs. The NN-based web app is easy to apply, and the required parameters are easily found in healthcare databases.

### Introduction

Non-alcoholic fatty liver disease (NAFLD) is the leading cause of chronic liver disease in Western countries. This condition increases the risk of cardiovascular disease, type 2 diabetes mellitus and chronic kidney disease, and leads to increased mortality1,2. The condition is estimated to affect about 20–30% of the adult population in developed countries3. NAFLD is defined as an accumulation of triglycerides in the hepatocytes (> 5% of liver volume) of patients with low alcohol intake (< 20 g/day in women or < 30 g/day in men), diagnosed once viral infections or other specific liver diseases have been excluded4. NAFLD is becoming more common among adults between 40 and 60 years of age, but the disease is also seen in children5. A meta-analysis published in 2016 reported an average prevalence of 23.71% in Europe6. Population-based studies conducted in our geographical area (district of Bari, Apulia Region, Italy) estimated a prevalence of NAFLD of around 30%; males and the elderly are most commonly affected7.

NAFLD is strongly associated with the metabolic syndrome and is considered its hepatic manifestation8. It can manifest as pure fatty liver disease (hepatic steatosis) or as non-alcoholic steatohepatitis (NASH), an evolution of the former in which steatosis is associated with inflammation, hepatocellular damage and fibrogenic activation that can lead to cirrhosis and the onset of hepatocarcinoma9. In general, it has been established that early diagnosis of cirrhosis and elimination of the cause can stop further liver damage, increase the chances of transplant success and also reduce mortality rates10. According to recent EASL-EASD-EASO guidelines11, the gold standard for identifying steatosis in individual patients is magnetic resonance imaging (MRI), although ultrasound scanning (US) is considered a good alternative, being more widely available and cheaper than MRI. In addition, for large-scale screening studies, serum biomarkers and steatosis score indices have been preferred because their easy availability and low cost have a substantial impact on the feasibility of screening. One of the best validated indexes is the Fatty Liver Index (FLI)12, although other anthropometric indices or measurements work together with FLI in predicting NAFLD risk13.

In recent years, due to the increasing prevalence of NAFLD, there has been a research trend towards identifying low-cost diagnostic methods, and Machine Learning has been acknowledged as a valuable tool. Machine Learning (ML) is a branch of artificial intelligence aimed at enabling machines to operate using intelligent "learning" algorithms14. Using the data sets supplied, the machine processes them through algorithms that allow it to develop its own logic in order to perform the required function or task. Machine Learning has already been used as a support tool for the diagnosis of different diseases and for risk quantification, such as cardiovascular risk in patients with diabetes mellitus15,16, ischemic heart disease17 and tumors18.

Nowadays, NAFLD diagnosis is made by performing ultrasound19 and MRI with lipid content quantification20. In addition, some biochemical and/or anthropometric parameters, alone or in combination, are used to perform the diagnosis21,22. This means referring patients to more specialized health centers, with a consequent burden on the healthcare system23. Many studies have used ML for the diagnosis of NAFLD, but they were predominantly focused on identifying particular aspects of NAFLD such as quantification of lipid content, staging and fibrosis24,25,26,27, rather than simply ascertaining the absence of disease, for example in a large cohort of subjects, thereby avoiding the use of non-invasive diagnostics for screening and monitoring NAFLD.

As imaging technologies such as ultrasound, magnetic resonance imaging (MRI), transient elastography (TE), and computed tomography (CT) are expensive and time consuming, they are generally impractical for most serial assessments28 or when large-scale population studies are considered. In addition to high cost, other limitations of imaging-based diagnosis of liver damage such as operator dependence, lower sensitivity and range, radiation exposure and limited availability need to be considered29. Moreover, ML-based models have also been used to classify liver diseases into distinct categories with ~ 80% accuracy30,31, highlighting that biomarker-based diagnostic methods meet the requirements for diagnosis32.

Our purpose, then, was to develop a simple web app that permits diagnosis of the absence of NAFLD with high accuracy, reducing waiting lists and costs for the National Health System, as most studies on NAFLD diagnosis are based on images or laboratory parameters that are not always available26,33.

The aim of our study was to develop and validate a simple Neural Network (NN) using easily available laboratory parameters identified in our previous study34, in order to build a web app incorporating the NN, trained to identify subjects at greater risk of NAFLD to be scheduled for ultrasound assessment. We also checked the performance of the trained NN by analyzing Explainability (XAI)35, to evaluate its reliability and ease of use, and validated the results on a randomly selected sample extracted from a population-based study.

In the first part of this paper, the population under study and the variables and formula on which the AVI parameter is built are described. Next, a first analysis with the t-SNE36 technique was performed, and then we switched to an NN-based approach, searching for the optimal parameters with which to build the NN. Subsequently, the NN performance and XAI are evaluated. Finally, we illustrate the development of a simple local web app tested on a population sample.

### Population

The subjects included in the study were drawn from two different cohort studies conducted at the Laboratory of Epidemiology and Biostatistics of the National Institute of Gastroenterology, Research Hospital "Saverio de Bellis" (Castellana Grotte, Bari, Italy). Subjects participating in the MICOL study and the NUTRIHEP study were eligible. Details on the MICOL and NUTRIHEP study populations have been published elsewhere7,13,37. The MICOL study is an ongoing randomized study of subjects drawn from the electoral list of Castellana Grotte (aged ≥ 30 years) in 1985 and followed up in 1992, 2005–2006 and 2013–2016. The study included a total of 2970 out of 3000 selected subjects; 56.5% were male. By 1985, 2472 subjects had been enrolled. In 2005–2006, 1697 of the original cohort were still present. In 2005–2006 this cohort was supplemented with a randomized sample of 1273 subjects (PANEL study) aged between 30 and 50 years, to compensate for cohort aging38,39. All subjects gave prior informed written consent to participate.

All procedures were performed in accordance with the ethical standards of the institutional research committee (IRCCS Saverio de Bellis approval for research and the ethics committee approvals for the MICOL study: DDG-CE-347/1984; DDG-CE-453/1991; DDG-CE-589/2004; DDG-CE 782/2013) and with the Helsinki Declaration of 1964. The NUTRIHEP study was conducted at the National Institute of Gastroenterology Saverio de Bellis (Castellana Grotte, Bari, Italy) in collaboration with 12 General Practitioners (GPs) operating in Putignano (Bari, Italy). The study period was from July 2005 to January 2007. By means of systematic random sampling of 1 in every 5 records, a sample of the general population aged ≥ 18 years had been obtained from the General Practitioners' lists. We used records from a census design, because no significant difference was found between the age-sex distribution of the general population of Putignano and the subjects listed in the general practitioners' registers. Therefore, 2550 subjects were invited to participate in the survey, and 2301 (90%) agreed. NUTRIHEP subjects were followed up in 2015–2017, and 951 of them were included. All subjects provided written informed consent according to the 1964 Helsinki Declaration.

The subjects participating in the MICOL and NUTRIHEP studies underwent anthropometric measurements, blood sampling and hepatic ultrasound. They were weighed wearing underwear, on a SECA electronic scale; weight was approximated to the nearest 0.1 kg. Height was measured with a SECA wall stadiometer, approximated to the nearest 1 cm. Blood pressure (BP) measurements were performed following international guidelines40; the mean of 3 BP measurements was calculated.

### Data acquisition and pre-processing

The initial database for the MICOL III trial contained 2970 subjects. The sample was reduced to 2869 because, for 101 subjects, data were missing for at least one of the values among Waist Circumference (WC), Hip Circumference (HC) (the variables used to build AVI), Gamma-Glutamyl Transferase (GGT) and Glucose. These 2869 subjects constituted the new database used for training and testing the NN using a train-test split approach. From the NUTRIHEP database, initially composed of 2301 subjects, we randomly extracted a further 100 subjects to constitute the validation sample.

#### Variables used

The variables used to develop the NN were: Sex, Age, Gamma-Glutamyl Transferase (GGT), Glucose, Abdominal Volume Index (AVI)41 and NAFLD condition.

We have previously highlighted that the best model to detect the NAFLD condition is based on the above variables. These variables were identified starting from a sample of 27 variables and exploiting a subset selection approach in order to identify the model with fewer variables and better performance34. Table 1 shows the formula employed to build the AVI index.

AVI is the only compound index used; its formula is easy to compute, and the component variables, being simple anthropometric measurements, are easily available.
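Table 1 is not reproduced in this extract. As an illustration, a minimal sketch of the AVI computation, assuming the commonly cited Guerrero-Romero formula with circumferences in centimetres:

```python
def abdominal_volume_index(waist_cm: float, hip_cm: float) -> float:
    """Abdominal Volume Index (AVI) from waist and hip circumference in cm.

    Assumed formula (Guerrero-Romero & Rodriguez-Moran):
    AVI = [2 * waist^2 + 0.7 * (waist - hip)^2] / 1000
    """
    return (2 * waist_cm ** 2 + 0.7 * (waist_cm - hip_cm) ** 2) / 1000

# Example: waist 95 cm, hip 100 cm
print(round(abdominal_volume_index(95, 100), 2))  # 18.07
```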

The array composed of Sex, Age, Gamma-Glutamyl Transferase (GGT), Glucose and Abdominal Volume Index (AVI) represents the X of our algorithm, and the NAFLD condition the Y.
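A sketch of how X and Y can be assembled; a tiny synthetic array stands in for the real database here, and the column order is an assumption for illustration:

```python
import numpy as np

# Synthetic stand-in rows (columns: Sex, Age, GGT, Glucose, AVI, NAFLD);
# in the study the rows would instead be read from the CSV export.
data = np.array([
    [1, 52, 35.0, 98.0, 18.1, 1],
    [0, 44, 18.0, 85.0, 14.2, 0],
    [1, 61, 42.0, 110.0, 21.5, 1],
    [0, 38, 15.0, 79.0, 12.9, 0],
])

X = data[:, :5]  # feature matrix: Sex, Age, GGT, Glucose, AVI
y = data[:, 5]   # target vector: NAFLD condition (0 = absent, 1 = present)

print(X.shape, y.shape)  # (4, 5) (4,)
```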

NAFLD diagnosis was performed using an ultrasound scanner Hitachi H21 Vision (Hitachi Medical Corporation, Tokyo, Japan). Examination of the visible liver parenchyma was performed with a 3.5 MHz transducer.

### Data exploration

Data were explored using t-Distributed Stochastic Neighbor Embedding (t-SNE)36, an unsupervised, nonlinear technique used primarily for exploration and visualization of high-dimensional data; its output shows how the data are organized in the high-dimensional space. In this case the technique did not perform optimally, failing to clearly discriminate the two classes, 0 (No NAFLD) and 1 (NAFLD). Figure 1 shows the data displayed with t-SNE.
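A minimal sketch of such an exploration with scikit-learn's t-SNE implementation; the data here are a synthetic stand-in for the five features, and the perplexity is an illustrative default-range choice:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for the 5 features (Sex, Age, GGT, Glucose, AVI).
X = rng.normal(size=(200, 5))

# Project into 2-D for visual inspection; each row of `emb` can then be
# scattered and colored by NAFLD class, as in Figure 1.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (200, 2)
```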

### Hyperparameter tuning for the neural network

Initially, a NN was created using the open-source Python library "scikit-learn"42.

For interaction with the CSV file containing the database, the Python library "numpy" (np)43 was used.

The NN is an MLPClassifier (Multi-Layer Perceptron Classifier)42, a supervised machine learning algorithm44. The first fundamental step was to split the database using "train_test_split" (a function present in scikit-learn) in order to divide the sample into two subsets: 80% of the data for NN training and the remaining 20% for testing.
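The split step can be sketched as follows (synthetic stand-in data; the fixed random state is an illustrative choice that makes the split reproducible):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))     # stand-in feature matrix
y = rng.integers(0, 2, size=100)  # stand-in NAFLD labels

# 80/20 split as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))  # 80 20
```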

GridSearchCV42, included in the scikit-learn library, was used to search for the optimal parameters for the NN.

We performed NN optimization over the following parameters:

• Activation function: searched among (‘identity’, ‘logistic’, ‘tanh’, ‘relu’)

• Solver type: searched among the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm ('lbfgs')45, Stochastic Gradient Descent ('sgd')46 and Adam ('adam')47. "lbfgs" is an optimizer in the family of quasi-Newton methods48. We selected "lbfgs" because, for small data sets, it can converge faster and achieve better performance.

• Learning rate: searched among (‘constant’, ‘invscaling’, ‘adaptive’)

• Maximum number of iterations: searched in a defined range of values ('max_iter': [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]). The solver iterates until convergence (determined by 'tol') or until this maximum number of iterations is reached.

• The alpha value: searched in a set of defined values ('alpha': 10.0 ** -np.arange(0, 10)); this is the L2 regularization penalty parameter49.

• The hidden layer size ('hidden_layer_sizes': np.arange(0, 20)), searched in a range from 0 to 20.

• The 'random_state' value: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], searched in the range from 0 to 10 to ensure the results were replicable.
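The search described above can be sketched as follows; the grid here is deliberately abbreviated to keep the example fast, as noted in the comments, and the data are a synthetic stand-in:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))     # stand-in for Sex, Age, GGT, Glucose, AVI
y = rng.integers(0, 2, size=120)  # stand-in NAFLD labels

# Abbreviated grid; the full search described above also sweeps solver,
# learning_rate, max_iter (1000..9000), alpha = 10.0 ** -np.arange(0, 10),
# hidden_layer_sizes and random_state 0..10.
param_grid = {
    "activation": ["logistic", "relu"],
    "alpha": [1.0, 0.1],
    "hidden_layer_sizes": [(10,), (19,)],
}
search = GridSearchCV(
    MLPClassifier(solver="lbfgs", max_iter=2000, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```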

MLPClassifier performs iterative training: at each step the partial derivatives50 of the loss function50 with respect to the model parameters are calculated and used to update the parameters. A regularization term can also be added to the loss function to shrink the model parameters and prevent overfitting. The values obtained at the end of the NN optimization were:

• activation: 'logistic'

• alpha: 1.0

• hidden_layer_sizes: 19

• learning_rate: 'constant'

• max_iter: 9000

• random_state: 10

• solver: 'lbfgs'

### Training session and neural network test

The algorithm was trained using the NAFLD condition as the target variable and the Sex, Age, GGT, Glucose and AVI values as features.

The dataset used for training and testing the algorithm comprised the MICOL subjects, subdivided into training and test subsets: 80% of the dataset was dedicated to the training phase, while the remaining 20% was used in the model testing phase. The output reported the accuracy during training and testing, the area under the ROC curve (AUC)51,52 in the training and testing phases, the Confusion Matrix53 and the Precision, Recall and F1-score values in the testing phase.
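A minimal sketch of this training and evaluation pipeline, using the tuned hyperparameter values reported above on synthetic stand-in data (so the printed scores are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 5-feature data standing in for Sex, Age, GGT, Glucose, AVI.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# MLP configured with the tuned values reported in the text.
clf = MLPClassifier(activation="logistic", alpha=1.0, hidden_layer_sizes=(19,),
                    learning_rate="constant", max_iter=9000,
                    random_state=10, solver="lbfgs")
clf.fit(X_tr, y_tr)

y_pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, y_pred))
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
print(confusion_matrix(y_te, y_pred))
print(classification_report(y_te, y_pred))
```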

### Results

Participants' characteristics and the performance of the AVI index in MICOL subjects are shown in Table 2. NAFLD prevalence was 31.7%, the condition being, as expected, more prevalent among men. Subjects with NAFLD were slightly older, with increased levels of Glucose and GGT.

Table 3 shows participants' characteristics and the performance of the AVI index in the NUTRIHEP study. In the original study, NAFLD prevalence was 24.3% and, as expected, the condition was more prevalent among men.

### Neural network performance analysis

The first parameter considered to evaluate the performance of the NN was the accuracy defined as54:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \times 100$$

More specifically, the accuracy of a model is calculated with the following formula54:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$$

where TP = True Positive, TN = True Negative, FP = False Positive and FN = False Negative.

Accuracy was measured during both the NN training and the testing phase.

Another performance index that we considered was the ROC curve52. The area under the ROC curve (AUC, "Area Under the Curve") is a measure of accuracy and indicates the diagnostic power of the test.

In Figs. 2 and 3 the ROC curves with the AUC value obtained during the training phase and testing phase are shown.

In addition to the accuracy and ROC curve values, we evaluated the confusion matrix to verify the reliability of the NN. Figure 4 shows the confusion matrix values in the test phase.

In addition, the Positive (PPV) (0.57) and Negative (NPV) (0.86) predictive values were calculated. It is worth noting that the NN is able to identify subjects without the condition with very high precision.
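PPV and NPV follow directly from confusion-matrix counts. A small sketch (the counts below are illustrative only; the study's actual counts are in the confusion matrix of Figure 4):

```python
def ppv_npv(tp: int, fp: int, tn: int, fn: int) -> tuple:
    """Positive and Negative Predictive Value from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # of all positive predictions, fraction truly positive
    npv = tn / (tn + fn)  # of all negative predictions, fraction truly negative
    return ppv, npv

# Illustrative counts chosen to show the computation, not the paper's data.
ppv, npv = ppv_npv(tp=40, fp=30, tn=180, fn=30)
print(round(ppv, 2), round(npv, 2))  # 0.57 0.86
```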

Table 4 shows the Accuracy and AUC values obtained during training and testing of the NN.

The AUC and accuracy values obtained (both for the training and the test phase) show that the implemented NN does not present overfitting or underfitting problems, because the two ROC curves and the corresponding accuracy values differ only very slightly. Additionally, in order to validate the performance of the NN, the precision, recall and F1-score values during the test phase were evaluated. Table 5 shows the Precision, Recall and F1-score values for No NAFLD and NAFLD subjects, together with the macro average and weighted average, during the test phase.

### Evaluating Explainability using SHAP

After verifying the behavior of the NN by comparing the various indices considered, we proceeded with the analysis of Explainability (XAI) using LIME55 and the SHAP56 Python library, to check for any inconsistencies. We initially performed a relevance analysis of the features in order to verify whether the anthropometric and biochemical variables considered gave a real and consistent contribution to the diagnosis of NAFLD. Figures 4 and 5 show the contribution of each feature used in the diagnosis of NAFLD within the NN during training and testing.

Figures 5 and 6 show that AVI, GGT and Age, as already highlighted in previous studies34, are more important than Sex and Glucose in the diagnosis of this pathology, but combining them all together leads to a good diagnostic result for NAFLD.

In Figs. 7 and 8 we report the previous graph seen in another way; more specifically, we can read:

• Feature importance: variables ranked in descending order of importance.

• Impact: horizontal position shows whether the effect of that value is associated with a higher or lower prediction.

• Value: color shows whether that variable is high or low for that observation; red denotes a high value and blue a low value. Correlation: of each feature with the pathology being examined.

### Evaluating Explainability using LIME

Subsequently, exploiting the LIME library, we verified how the NN reasoned in order to reach a diagnosis, checking both the diagnosis of a "sick subject" and that of a "healthy subject".

Figure 9 shows which features had a greater impact on a diagnosis of disease present and which on a diagnosis of disease absent, with the relative final diagnosis. For subjects diagnosed as sick, the features that contributed most to directing the NN toward that diagnosis were AVI, Age and the GGT value, demonstrating that the NN performs sound reasoning.

Figure 10 shows the features that contribute to the identification of healthy subjects: the NN took into consideration GGT and Glucose values that are standard from the point of view of clinical diagnosis, together with a low value of the AVI index.

Also in the diagnosis of healthy subjects, the NN reasoned soundly, directing the diagnosis correctly.

### Export of the trained algorithm and incorporation into the web app

After the NN training and testing and the XAI analysis, we exported the trained model, making it possible to avoid repeating the training every time a new prediction is required. The model export was done using the Python "pickle" module57, which generates a file with the extension ".pkl". This file is then loaded by another Python program, which can be used to make new predictions. Another important function implemented is a web application written in HTML58, CSS59 and JavaScript60. This web application interfaces with the trained NN to test it on new data, different from those used to train the original NN. The interface between the web application and the trained algorithm was implemented through the Python "flask" library61. A Flask object receives a request from the web and displays the HTML file that allows it to interface with the NN.
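The export/reload round trip can be sketched as follows (a small model trained on synthetic data stands in for the real NN, and the file name is illustrative):

```python
import pickle

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = rng.integers(0, 2, size=50)
clf = MLPClassifier(hidden_layer_sizes=(19,), solver="lbfgs",
                    max_iter=2000, random_state=10).fit(X, y)

# Serialise the trained model so it can be reloaded without retraining.
with open("nafld_nn.pkl", "wb") as f:
    pickle.dump(clf, f)

with open("nafld_nn.pkl", "rb") as f:
    model = pickle.load(f)

# The reloaded model predicts exactly as the original classifier.
print(model.predict(X[:1]))
```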

The user can fill in the form present on the web page and, after clicking the submit button, the Flask object receives the request, extracts the input, runs it through the model and finally displays the HTML page with the result of the prediction.

The HTML page includes various fields in which to enter the variables, and a submit button to pass the input data to the NN that performs the prediction. At the end of the prediction, the HTML page displays the NAFLD status as one of two strings: "NAFLD Detected" or "No NAFLD Detected".

The web app also includes automatic calculation of the AVI parameter from the hip and waist circumference values, using code implemented in JavaScript.
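A minimal sketch of such a Flask endpoint, with the AVI computation done server-side so the example is self-contained (the route, form field names and the inline-trained stand-in model are all illustrative assumptions; in the real app the exported .pkl is loaded and AVI is computed in JavaScript):

```python
import numpy as np
from flask import Flask, request
from sklearn.neural_network import MLPClassifier

# A tiny model trained on synthetic data stands in for the exported NN.
rng = np.random.default_rng(0)
model = MLPClassifier(hidden_layer_sizes=(19,), solver="lbfgs",
                      max_iter=2000, random_state=10)
model.fit(rng.normal(size=(50, 5)), rng.integers(0, 2, size=50))

app = Flask(__name__)

def avi(waist_cm: float, hip_cm: float) -> float:
    # Assumed AVI formula (the web app does this client-side in JavaScript).
    return (2 * waist_cm ** 2 + 0.7 * (waist_cm - hip_cm) ** 2) / 1000

@app.route("/predict", methods=["POST"])
def predict():
    f = request.form
    x = [[float(f["sex"]), float(f["age"]), float(f["ggt"]),
          float(f["glucose"]), avi(float(f["wc"]), float(f["hc"]))]]
    return "NAFLD Detected" if model.predict(x)[0] == 1 else "No NAFLD Detected"

# Exercise the endpoint without starting a server:
resp = app.test_client().post("/predict", data={
    "sex": "1", "age": "55", "ggt": "40", "glucose": "105",
    "wc": "98", "hc": "102"})
print(resp.get_data(as_text=True))
```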

### Test of the web app on a sample of subjects with known NAFLD

To test the web app, the database previously formed by random extraction of 100 subjects participating in the NUTRIHEP study was used. The web app was passed the following data: Age, Sex, GGT, Glucose, WC and HC.

After the input of the parameters and clicking the submit button, the values were sent to the NN. The web app feedback, related to the NAFLD status, was then saved in a dataset used for comparison with the true NAFLD condition, already known to us.

Using the saved dataset, we could calculate the accuracy, sensitivity and specificity of the web app.
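These metrics follow directly from the confusion-matrix counts; a small sketch using the counts reported in this section (50 affected subjects with 18 false negatives, 50 healthy subjects with no false positives) for illustration:

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Counts as described in this section of the paper.
sens, spec = sensitivity_specificity(tp=32, fn=18, tn=50, fp=0)
print(round(sens, 2), round(spec, 2))  # 0.64 1.0
```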

In the sample considered, there were 50 subjects affected by NAFLD and 50 healthy subjects. The NN correctly identified all the healthy subjects but made 18 errors, all false negatives. From this result we calculated the Specificity and Sensitivity of the NN.

It is important to point out that many of the subjects considered healthy by the NN had normal anthropometric and biochemical values, so it is possible that they were affected by mild NAFLD while still presenting values within the normal range62.

Table 6 shows the sensitivity and specificity values for the NN in the web app.

### Discussion

In this study, a NN to support NAFLD diagnosis has been developed on a model made up of easily available variables, as already highlighted in our previous work34.

In particular, in this work we trained a NN to identify patients at risk of NAFLD and developed a local web app for use as a tool in epidemiological studies and screening. The aim was to make a prior identification of healthy patients in order to ensure that only subjects really needing it are sent on for ultrasound examination.

Today, alternative, less expensive methods of diagnosis compared to traditional tools (MRI, Ultrasound) are very important in the diagnosis of NAFLD. The reorganization of the National Health System requires close consideration of aspects related to performance together with factors related to the reduction of costs and waiting times. The objective of our study was to create a NN implementing an intuitive and easy application to support medical decisions during the diagnostic phase using simpler and cheaper tools, thus reducing both costs and waiting times related to the use of instrumental methods. We highlight that it would thereby be possible to use simple computers to make a diagnosis of NAFLD, resulting in a faster diagnosis and thus preventing disease evolution and the resulting serious consequences.

Several prediction models for NAFLD have been developed in the literature to distinguish healthy subjects from subjects with NAFLD. These existing NAFLD prediction models have employed clinical and laboratory parameters; however, some parameters are not always routinely measured or retrievable in health databases63,64. This limits the use of these models in large-scale epidemiologic studies and health database research. Specifically, comparing the AUC of the NN (0.821) with traditional methods, we verified that its performance in terms of AUC is superior to LAP65 (0.79), the Hepatic Steatosis Index66 (0.81), SteatoTest67 (0.79), APRI68 (0.60) and the NAFLD fibrosis score69 (0.82). Considering studies exploiting AI techniques: one approach using the LWA (learning by abstraction) method classifies liver ultrasound images as normal or abnormal, and does not classify the data unless it is confident of an accurate prediction. Features were extracted from ROIs within 99 ultrasound images and were used to train NN, SVM and LWA classifiers with fivefold cross-validation; the proposed LWA method outperformed the other classifiers with an AUROC of 0.7870. In a second study, the prediction ability of particle swarm optimization (PSO), GA, MReg (multilinear regression) and alternative decision tree (ADT) algorithms was compared using medical data from 39,567 patients. Using uniform random sampling, the dataset was divided into training (22,690 patients) and test (16,877 patients) sets, and the four algorithms were applied for classification using tenfold cross-validation; the ADT model had an AUROC between 0.73 and 0.7671. In another study, the factors provided by the 2005 updated ATP III clinical criteria for metabolic syndrome (MetS), along with age and gender, were used to create a NAFLD prediction model. After preprocessing, data from 40,637 patients were divided into 66% and 34% for training and testing sets, respectively; classification was performed by the J48 algorithm using hold-out cross-validation, and an AUROC of 0.731 was achieved72. Our NN also performed better than these approaches.

From the described results, it can be seen that the NN, using AVI plus Glucose plus GGT plus Sex plus Age, produced few prediction errors in the test phase, although the accuracy percentage was not very high. However, the 18% error (18 of 100 subjects) in the web app test phase may be open to doubt, since it is possible that these subjects were developing NAFLD and were thus merely diagnosed in advance.

It has been demonstrated that ML algorithms applying common anthropometric parameters and other variables perform well in identifying NAFLD and can be a valid alternative to the classic indexes73,74.

Moreover, the NN was able to correctly identify all the subjects without NAFLD, as evidenced by the high NPV (0.86). This NPV satisfies our objective of detecting subjects without NAFLD, avoiding referral for more expensive diagnostic procedures.

This type of study highlights the fact that a NN can be used to find high-risk NAFLD subjects to send on for US. In this way, 82.6% of unnecessary US tests could be avoided (calculated as the ratio of the total number of subjects in the web app test set to the total number of subjects in the web app test set plus the number of false predictions).

In addition to lightening the waiting lists, our aim was to develop a machine learning algorithm allowing savings by eliminating a number of US examinations that would otherwise be prescribed. The NN developed is therefore useful to exclude NAFLD and may be considered a valid diagnostic support in the context of epidemiological studies, not a replacement for diagnostic tools.

In conclusion, the NN can be considered a valid support for medical decision making in regard to health policies, in the context of epidemiological studies and screening.

### Study limitations

There are several limitations to this work. The most significant is that this study was conducted in a single center and so has a rather limited sample size; deep learning models in other fields have included millions of samples. Another problem is that the NN is strongly linked to identification of the NAFLD condition only in a Mediterranean population with the characteristics on which it was trained. A further limitation is the low sensitivity of the NAFLD diagnostic methodology, as ultrasound fails to detect a fatty liver content below about 25%75. However, both databases were drawn from population-based studies, and subjects were selected from electoral lists or from physicians' lists. Moreover, participating subjects did not seek medical assistance and participated on a voluntary basis. Therefore, US was the only NAFLD diagnostic procedure that could be proposed to participants, since biopsy or H-MRS would obviously be unethical.

### Future developments

In the future, the NN-based web app can be improved by using a SQL database in which to save the entered data, and by providing feedback to the app (correct or wrong prediction) in order to continue its training and make it flexible enough to be used on any kind of population. This could be done by leveraging a document classification system76 to retrieve data from electronic medical records and then building an open dataset77 in order to improve the web app with more heterogeneous data.

### Conclusion

The application of ML to the diagnosis of NAFLD is an efficient approach to identify healthy subjects. The model we propose can be exploited to target only those subjects who have a real need for further investigation, thus reducing waiting lists, costs and the time required for instrumental examinations. In this research we predicted the risk of developing NAFLD in individuals using biochemical and anthropometric variables in a NN. The rationale behind our approach has two parts: first, training, evaluating performance and validating the result in assessing NAFLD risk in an individual; second, developing a local web app that incorporates the previously evaluated NN and comparing its performance, thereby applying a rapid and non-invasive methodology and demonstrating that the proposed technique is suitable for optimal discrimination in NAFLD risk assessment. It is worth noting that through XAI it is possible to identify the factors that contribute to a given diagnosis. This helps physicians make informed choices about their patients' management and improve the health conditions of the subjects.

### Abbreviations

US: Ultrasound scan

NAFLD: Non-alcoholic fatty liver disease

WC: Waist circumference

HIP: Hips

FLI: Fatty liver index

ML: Machine learning

AVI: Abdominal volume index

GGT: Gamma-glutamyl transferase

NASH: Non-alcoholic steatohepatitis

NN: Neural network

MRI: Magnetic resonance imaging

BP: Blood pressure

TP: True positive

TN: True negative

FP: False positive

FN: False negative

HTML: HyperText markup language

AUC: Area under the curve

PPV: Positive predictive value

NPV: Negative predictive value

### References

1. Fazel, Y., Koenig, A. B., Sayiner, M., Goodman, Z. D. & Younossi, Z. M. Epidemiology and natural history of non-alcoholic fatty liver disease. Metabolism 65, 1017–1025 (2016).

2. Levene, A. P. & Goldin, R. D. The epidemiology, pathogenesis and histopathology of fatty liver disease. Histopathology 61, 141–152 (2012).

3. Preiss, D. & Sattar, N. Non-alcoholic fatty liver disease: An overview of prevalence, diagnosis, pathogenesis and treatment considerations. Clin. Sci. (Lond.) 115, 141–150 (2008).

4. Neuschwander-Tetri, B. A. & Caldwell, S. H. Nonalcoholic steatohepatitis: Summary of an AASLD single topic conference. Hepatology 37, 1202–1219 (2003).

5. Zelber-Sagi, S., Ratziu, V. & Oren, R. Nutrition and physical activity in NAFLD: An overview of the epidemiological evidence. World J. Gastroenterol. 17, 3377–3389 (2011).

6. Younossi, Z. M. et al. Global epidemiology of nonalcoholic fatty liver disease: Meta-analytic assessment of prevalence, incidence, and outcomes. Hepatology 64, 73–84 (2016).

7. Cozzolongo, R. et al. Epidemiology of HCV infection in the general population: A survey in a southern Italian town. Am. J. Gastroenterol. 104, 2740–2746 (2009).

8. Marchesini, G., Marzocchi, R., Agostini, F. & Bugianesi, E. Nonalcoholic fatty liver disease and the metabolic syndrome. Curr. Opin. Lipidol. 16, 421–427 (2005).

9. Ratziu, V., Bellentani, S., Cortez-Pinto, H., Day, C. & Marchesini, G. A position statement on NAFLD/NASH based on the EASL 2009 special conference. J. Hepatol. 53, 372–384 (2010).

10. Schuppan, D. & Afdhal, N. H. Liver cirrhosis. Lancet 371, 838–851 (2008).

11. Mahana, D. et al. Antibiotic perturbation of the murine gut microbiome enhances the adiposity, insulin resistance, and liver disease associated with high-fat diet. Genome Med. 8, 1–20 (2016).

12. Bedogni, G. et al. The fatty liver index: A simple and accurate predictor of hepatic steatosis in the general population. BMC Gastroenterol. 6, 33 (2006).

13. Procino, F. et al. Reducing NAFLD-screening time: A comparative study of eight diagnostic methods offering an alternative to ultrasound scans. Liver Int. 39, 187–196 (2019).

14. Mohammed, M., Khan, M. B. & Bashier, E. B. M. Machine Learning: Algorithms and Applications (CRC Press, 2016).

15. Napoli, C., Benincasa, G., Schiano, C. & Salvatore, M. Differential epigenetic factors in the prediction of cardiovascular risk in diabetic patients. Eur. Heart J. Cardiovasc. Pharmacother. 6, 239–247 (2020).

16. Dagliati, A. et al. Machine learning methods to predict diabetes complications. J. Diabetes Sci. Technol. 12, 295–302 (2018).

17. Kukar, M., Kononenko, I., Groselj, C., Kralj, K. & Fettich, J. Analysing and improving the diagnosis of ischaemic heart disease with machine learning. Artif. Intell. Med. 16, 25–50 (1999).

18. Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V. & Fotiadis, D. I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 13, 8–17 (2015).

19. Ferraioli, G. & Monteiro, L. B. S. Ultrasound-based techniques for the diagnosis of liver steatosis. World J. Gastroenterol. 25, 6053 (2019).

20. Schaapman, J. J., Tushuizen, M. E., Coenraad, M. J. & Lamb, H. J. Multiparametric MRI in patients with nonalcoholic fatty liver disease. J. Magn. Reson. Imaging 53, 1623–1631 (2021).

21. Papatheodoridi, M. & Cholongitas, E. Diagnosis of non-alcoholic fatty liver disease (NAFLD): Current concepts. Curr. Pharm. Des. 24, 4574–4586 (2018).

22. Stachowska, E., Portincasa, P., Jamioł-Milc, D., Maciejewska-Markiewicz, D. & Skonieczna-Żydecka, K. The relationship between prebiotic supplementation and anthropometric and biochemical parameters in patients with NAFLD: A systematic review and meta-analysis of randomized controlled trials. Nutrients 12, 3460 (2020).

23. Cotter, T. G. et al. Nonalcoholic fatty liver disease: Impact on healthcare resource utilization, liver transplantation and mortality in a large, integrated healthcare system. J. Gastroenterol. 55, 722–730 (2020).

24. Jiang, T. et al. Application of computer tongue image analysis technology in the diagnosis of NAFLD. Comput. Biol. Med. 135, 104622 (2021).

25. Taylor-Weiner, A. et al. A machine learning approach enables quantitative measurement of liver histology and disease monitoring in NASH. Hepatology 74, 133–147 (2021).

26. Feng, G. et al. Machine learning algorithm outperforms fibrosis markers in predicting significant fibrosis in biopsy-confirmed NAFLD. J. Hepatobiliary Pancreat. Sci. 28, 593–603 (2021).

27. Qu, H. et al. Training of computational algorithms to predict NAFLD activity score and fibrosis stage from liver histopathology slides. Comput. Methods Prog. Biomed. 207, 106153 (2021).

28. Schwenzer, N. F. et al. Non-invasive assessment and quantification of liver steatosis by ultrasound, computed tomography and magnetic resonance. J. Hepatol. 51, 433–445 (2009).

29. Calès, P. et al. Reproducibility of blood tests of liver fibrosis in clinical practice. Clin. Biochem. 41, 10–18 (2008).

30. Fatima, M. & Pasha, M. Survey of machine learning algorithms for disease diagnostic. J. Intell. Learn. Syst. Appl. 9, 1 (2017).

31. Vijayarani, S. & Dhayanand, S. Liver disease prediction using SVM and Naïve Bayes algorithms. Int. J. Sci. Eng. Technol. Res. (IJSETR) 4, 816–820 (2015).

32. Hadizadeh, F., Faghihimani, E. & Adibi, P. Nonalcoholic fatty liver disease: Diagnostic biomarkers. World J. Gastrointest. Pathophysiol. 8, 11 (2017).

33. Das, A., Connell, M. & Khetarpal, S. Digital image analysis of ultrasound images using machine learning to diagnose pediatric nonalcoholic fatty liver disease. Clin. Imaging 77, 62–68 (2021).

34. Sorino, P. et al. Selecting the best machine learning algorithm to support the diagnosis of non-alcoholic fatty liver disease: A meta learner study. PLoS ONE 15, e0240867 (2020).

35. Arrieta, A. B. et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020).

36. Linderman, G. C. & Steinerberger, S. Clustering with t-SNE, provably. SIAM J. Math. Data Sci. 1, 313–332 (2019).

37. Osella, A. R. et al. Overweight and obesity in southern Italy: Their association with social and life-style characteristics and their effect on levels of biologic markers. Rev. Fac. Cien. Med. Univ. Nac. Cordoba 71, 113–124 (2014).

38. Osella, A. R., Misciagna, G., Leone, A., Di Leo, A. & Fiore, G. Epidemiology of hepatitis C virus infection in an area of southern Italy. J. Hepatol. 27, 30–35 (1997).

39. Misciagna, G. et al. Epidemiology of cholelithiasis in southern Italy. Part II: Risk factors. Eur. J. Gastroenterol. Hepatol. 8, 585–593 (1996).

40. Sever, P. New hypertension guidelines from the National Institute for Health and Clinical Excellence and the British Hypertension Society. J. Renin-Angiotensin-Aldosterone Syst. 7, 61–63 (2006).

41. Guerrero-Romero, F. & Rodríguez-Morán, M. Abdominal volume index. An anthropometry-based index for estimation of obesity is strongly related to impaired glucose tolerance and type 2 diabetes mellitus. Arch. Med. Res. 34, 428–432 (2003).

42. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

43. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

44. Cunningham, P., Cord, M. & Delany, S. J. Supervised learning. In Machine Learning Techniques for Multimedia 21–49 (Springer, 2008).

45. Saputro, D. R. S. & Widyaningsih, P. Limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method for the parameter estimation on geographically weighted ordinal logistic regression model (GWOLR). In AIP Conference Proceedings Vol. 1868, 040009 (AIP Publishing LLC, 2017).

46. Bottou, L. Large-Scale Machine Learning with Stochastic Gradient Descent 177–186 (Physica-Verlag HD, 2010).

47. Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).

48. Grippo, L. & Sciandrone, M. Metodi quasi-Newton. In Metodi di ottimizzazione non vincolata 289–323 (Springer, 2011).

49. Ng, A. Y. Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning 78 (2004).

50. Ashcroft, M. Advanced Machine Learning: Training Basic Neural Networks.

51. ROC curve. In Encyclopedia of Machine Learning (eds Sammut, C. & Webb, G. I.) 875 (Springer, 2010).

52. Melo, F. Area under the ROC curve. In Encyclopedia of Systems Biology (eds Dubitzky, W., Wolkenhauer, O., Cho, K.-H. & Yokota, H.) 38–39 (Springer, 2013).

53. Ting, K. M. Confusion matrix. In Encyclopedia of Machine Learning and Data Mining (eds Sammut, C. & Webb, G. I.) 260 (Springer, 2017).

54. Biswas, A. K., Noman, N. & Sikder, A. R. Machine learning approach to predict protein phosphorylation sites by incorporating evolutionary information. BMC Bioinform. 11, 273 (2010).

55. Samuel, T. S. B. Comparing the Explainability of Different Crop Disease Identification Models Using LIME (2021).

56. Bugaj, M., Wrobel, K. & Iwaniec, J. Model explainability using SHAP values for LightGBM predictions. In 2021 IEEE XVIIth International Conference on the Perspective Technologies and Methods in MEMS Design (MEMSTECH) 102–106 (IEEE, 2021).

57. Rossum, G. V. The Python Library Reference: Release 3.6.4 (2018).

58. Patel, K. Incremental journey for World Wide Web: Introduced with web 1.0 to recent web 5.0: A survey paper. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3, 1–9 (2013).

59. Duckett, J. HTML & CSS: Design and Build Websites (Wiley, 2011).

60. Flanagan, D. & Novak, G. M. JavaScript: The Definitive Guide (American Institute of Physics, 1998).

61. Grinberg, M. Flask Web Development: Developing Web Applications with Python (O'Reilly Media Inc, 2018).

62. Kumar, R. & Mohan, S. Non-alcoholic fatty liver disease in lean subjects: Characteristics and implications. J. Clin. Transl. Hepatol. 5, 216–223 (2017).

63. Kwok, R. et al.

Source: https://www.nature.com/articles/s41598-021-99400-y

## The Sequential model

Author: fchollet
Date created: 2020/04/12
Description: Complete guide to the Sequential model.


### When to use a Sequential model

A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

Schematically, a Sequential model bundles a linear stack of layers into a single callable, and calling it is equivalent to calling each of its layers in sequence on the input.
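The guide's code for this equivalence was lost in extraction; a minimal sketch along its lines (the layer sizes are illustrative):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Define a Sequential model with three Dense layers
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu", name="layer1"),
        layers.Dense(3, activation="relu", name="layer2"),
        layers.Dense(4, name="layer3"),
    ]
)
# Call the model on a test input
x = tf.ones((3, 3))
y = model(x)

# The equivalent "function": three standalone layers called in sequence
layer1 = layers.Dense(2, activation="relu")
layer2 = layers.Dense(3, activation="relu")
layer3 = layers.Dense(4)
y2 = layer3(layer2(layer1(x)))
```

Both calls produce an output of shape `(3, 4)`; the Sequential model simply packages the composition.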

A Sequential model is not appropriate when:

• Your model has multiple inputs or multiple outputs
• Any of your layers has multiple inputs or multiple outputs
• You need to do layer sharing
• You want non-linear topology (e.g. a residual connection, a multi-branch model)

### Creating a Sequential model

You can create a Sequential model by passing a list of layers to the Sequential constructor. Its layers are accessible via the `layers` attribute, and you can also build a Sequential model incrementally via the `add()` method. Note that there's also a corresponding `pop()` method to remove layers: a Sequential model behaves very much like a list of layers.
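The original snippets are missing here; a short sketch of the three ways of manipulating a Sequential model just described (layer sizes illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Build a Sequential model by passing a list of layers to the constructor
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)
print(len(model.layers))  # layers are exposed via the `layers` attribute

# The same model, built incrementally with add()
model2 = keras.Sequential()
model2.add(layers.Dense(2, activation="relu"))
model2.add(layers.Dense(3, activation="relu"))
model2.add(layers.Dense(4))

# pop() removes the last layer, list-style
model2.pop()
print(len(model2.layers))
```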

Also note that the Sequential constructor accepts a `name` argument, just like any layer or model in Keras. This is useful to annotate TensorBoard graphs with semantically meaningful names.

### Specifying the input shape in advance

Generally, all layers in Keras need to know the shape of their inputs in order to create their weights, so a freshly created layer initially has no weights.

A layer creates its weights the first time it is called on an input, since the shape of the weights depends on the shape of the inputs.

Naturally, this also applies to Sequential models. When you instantiate a Sequential model without an input shape, it isn't "built": it has no weights (and calling `model.weights` results in an error stating just this). The weights are created when the model first sees some input data.

Once a model is "built", you can call its `summary()` method to display its contents.

However, it can be very useful when building a Sequential model incrementally to be able to display the summary of the model so far, including the current output shape. In this case, you should start your model by passing an `Input` object to it, so that it knows its input shape from the start.

Note that the `Input` object is not displayed as part of `model.layers`, since it isn't a layer.

A simple alternative is to just pass an `input_shape` argument to your first layer.

Models built with a predefined input shape like this always have weights (even before seeing any data) and always have a defined output shape.
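A minimal sketch of the weight-creation behavior described above, using an `Input` object to build the model up front (shapes are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

# A freshly created layer has no weights yet...
layer = layers.Dense(3)
assert layer.weights == []

# ...but declaring the input shape up front with an Input object
# builds the weights immediately and fixes the output shape.
model = keras.Sequential(
    [
        keras.Input(shape=(4,)),
        layers.Dense(2),
    ]
)
model.summary()  # works before the model has seen any data
```

The single `Dense` layer contributes two weight variables (kernel and bias), so `model.weights` has length 2 even before any data is passed in.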

In general, it's a recommended best practice to always specify the input shape of a Sequential model in advance if you know what it is.

### A common debugging workflow: `add()` + `summary()`

When building a new Sequential architecture, it's useful to incrementally stack layers with `add()` and frequently print model summaries. For instance, this lets you monitor how a stack of `Conv2D` and `MaxPooling2D` layers is downsampling image feature maps.
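A sketch of this workflow; the particular convolution/pooling stack below is illustrative, not the guide's exact code:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(keras.Input(shape=(250, 250, 3)))  # 250x250 RGB images
model.add(layers.Conv2D(32, 5, strides=2, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(3))
model.summary()  # inspect how far the feature maps have been downsampled so far

# At this point the 250x250 input has shrunk to 40x40 feature maps
out = model(tf.ones((1, 250, 250, 3)))
```

After each `add()` you can call `summary()` again and decide whether another block is needed.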

Very practical, right?

### What to do once you have a model

Once a Sequential model has been built, it behaves like a Functional API model. This means that every layer has an `input` and an `output` attribute. These attributes can be used to do neat things, like quickly creating a model that extracts the outputs of all intermediate layers in a Sequential model.

A similar approach can extract features from just one layer.
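The examples themselves are missing from this copy; a sketch of both feature-extraction patterns (the architecture and layer name are illustrative):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

initial_model = keras.Sequential(
    [
        keras.Input(shape=(250, 250, 3)),
        layers.Conv2D(32, 5, strides=2, activation="relu"),
        layers.Conv2D(32, 3, activation="relu", name="my_intermediate_layer"),
        layers.Conv2D(32, 3, activation="relu"),
    ]
)

# A model that returns the activations of every intermediate layer
feature_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=[layer.output for layer in initial_model.layers],
)

# ...or of one named layer only
single_layer_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=initial_model.get_layer(name="my_intermediate_layer").output,
)
features = single_layer_extractor(tf.ones((1, 250, 250, 3)))
```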

### Transfer learning with a Sequential model

Transfer learning consists of freezing the bottom layers in a model and only training the top layers. If you aren't familiar with it, make sure to read our guide to transfer learning.

Here are two common transfer learning blueprints involving Sequential models.

First, let's say that you have a Sequential model and you want to freeze all layers except the last one. In this case, you would simply iterate over `model.layers` and set `layer.trainable = False` on each layer except the last one.

Another common blueprint is to use a Sequential model to stack a pre-trained model and some freshly initialized classification layers.
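A sketch of both blueprints; the layer sizes are illustrative, and `weights=None` is used here only to avoid downloading the ImageNet weights that would be used in practice:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Blueprint 1: freeze every layer except the last one
model = keras.Sequential(
    [
        keras.Input(shape=(784,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(10),
    ]
)
for layer in model.layers[:-1]:
    layer.trainable = False  # only the last Dense layer will be updated

# Blueprint 2: stack a frozen pre-trained base and a fresh classifier head
base_model = keras.applications.Xception(
    weights=None,  # use 'imagenet' in practice
    include_top=False,
    pooling="avg",
)
base_model.trainable = False
model2 = keras.Sequential([base_model, layers.Dense(1000)])
```

Only the unfrozen layers (the last `Dense` in blueprint 1, the classifier head in blueprint 2) receive gradient updates during `fit()`.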

If you do transfer learning, you will probably find yourself frequently using these two patterns.

To find out more about building models in Keras, see:

Source: https://keras.io/guides/sequential_model/

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (see papers for details and citations).

### Install

SHAP can be installed from either PyPI or conda-forge:

`pip install shap` or `conda install -c conda-forge shap`

### Tree ensemble example (XGBoost/LightGBM/CatBoost/scikit-learn/pyspark models)

While SHAP can explain the output of any machine learning model, we have developed a high-speed exact algorithm for tree ensemble methods (see our Nature MI paper). Fast C++ implementations are supported for XGBoost, LightGBM, CatBoost, scikit-learn and pyspark tree models:

```python
import xgboost
import shap

# train an XGBoost model
X, y = shap.datasets.boston()
model = xgboost.XGBRegressor().fit(X, y)

# explain the model's predictions using SHAP
# (same syntax works for LightGBM, CatBoost, scikit-learn, transformers, Spark, etc.)
explainer = shap.Explainer(model)
shap_values = explainer(X)

# visualize the first prediction's explanation
shap.plots.waterfall(shap_values[0])
```

The above explanation shows features each contributing to push the model output from the base value (the average model output over the training dataset we passed) to the model output. Features pushing the prediction higher are shown in red, those pushing the prediction lower are in blue. Another way to visualize the same explanation is to use a force plot (these are introduced in our Nature BME paper):

```python
# visualize the first prediction's explanation with a force plot
shap.plots.force(shap_values[0])
```

If we take many force plot explanations such as the one shown above, rotate them 90 degrees, and then stack them horizontally, we can see explanations for an entire dataset (in the notebook this plot is interactive):

```python
# visualize all the training set predictions
shap.plots.force(shap_values)
```

To understand how a single feature affects the output of the model, we can plot the SHAP value of that feature vs. the value of the feature for all the examples in a dataset. Since SHAP values represent a feature's responsibility for a change in the model output, the plot below represents the change in predicted house price as RM (the average number of rooms per house in an area) changes. Vertical dispersion at a single value of RM represents interaction effects with other features. To help reveal these interactions we can color by another feature. If we pass the whole explanation tensor to the `color` argument, the scatter plot will pick the best feature to color by. In this case it picks RAD (index of accessibility to radial highways), since that highlights that the average number of rooms per house has less impact on home price for areas with a high RAD value.

```python
# create a dependence scatter plot to show the effect of a single feature across the whole dataset
shap.plots.scatter(shap_values[:, "RM"], color=shap_values)
```

To get an overview of which features are most important for a model we can plot the SHAP values of every feature for every sample. The plot below sorts features by the sum of SHAP value magnitudes over all samples, and uses SHAP values to show the distribution of the impacts each feature has on the model output. The color represents the feature value (red high, blue low). This reveals for example that a high LSTAT (% lower status of the population) lowers the predicted home price.

```python
# summarize the effects of all the features
shap.plots.beeswarm(shap_values)
```

We can also just take the mean absolute value of the SHAP values for each feature to get a standard bar plot (produces stacked bars for multi-class outputs):

```python
shap.plots.bar(shap_values)
```

### Natural language example (transformers)

SHAP has specific support for natural language models like those in the Hugging Face transformers library. By adding coalitional rules to traditional Shapley values we can form games that explain large modern NLP models using very few function evaluations. Using this functionality is as simple as passing a supported transformers pipeline to SHAP:

```python
import transformers
import shap

# load a transformers pipeline model
model = transformers.pipeline('sentiment-analysis', return_all_scores=True)

# explain the model on a sample input
explainer = shap.Explainer(model)
shap_values = explainer(["What a great movie! ...if you have no taste."])

# visualize the first prediction's explanation for the POSITIVE output class
shap.plots.text(shap_values[0, :, "POSITIVE"])
```

### Deep learning example with DeepExplainer (TensorFlow/Keras models)

Deep SHAP is a high-speed approximation algorithm for SHAP values in deep learning models that builds on a connection with DeepLIFT described in the SHAP NIPS paper. The implementation here differs from the original DeepLIFT by using a distribution of background samples instead of a single reference value, and using Shapley equations to linearize components such as max, softmax, products, divisions, etc. Note that some of these enhancements have also been since integrated into DeepLIFT. TensorFlow models and Keras models using the TensorFlow backend are supported (there is also preliminary support for PyTorch):

```python
# ...include code from https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py
import shap
import numpy as np

# select a set of background examples to take an expectation over
background = x_train[np.random.choice(x_train.shape[0], 100, replace=False)]

# explain predictions of the model on four images
e = shap.DeepExplainer(model, background)
# ...or pass tensors directly
# e = shap.DeepExplainer((model.layers[0].input, model.layers[-1].output), background)
shap_values = e.shap_values(x_test[1:5])

# plot the feature attributions
shap.image_plot(shap_values, -x_test[1:5])
```

The plot above explains ten outputs (digits 0-9) for four different images. Red pixels increase the model's output while blue pixels decrease the output. The input images are shown on the left, and as nearly transparent grayscale backings behind each of the explanations. The sum of the SHAP values equals the difference between the expected model output (averaged over the background dataset) and the current model output. Note that for the 'zero' image the blank middle is important, while for the 'four' image the lack of a connection on top makes it a four instead of a nine.

### Deep learning example with GradientExplainer (TensorFlow/Keras/PyTorch models)

Expected gradients combines ideas from Integrated Gradients, SHAP, and SmoothGrad into a single expected value equation. This allows an entire dataset to be used as the background distribution (as opposed to a single reference value) and allows local smoothing. If we approximate the model with a linear function between each background data sample and the current input to be explained, and we assume the input features are independent then expected gradients will compute approximate SHAP values. In the example below we have explained how the 7th intermediate layer of the VGG16 ImageNet model impacts the output probabilities.

```python
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
import keras.backend as K
import numpy as np
import json
import shap

# load pre-trained model and choose two images to explain
model = VGG16(weights='imagenet', include_top=True)
X, y = shap.datasets.imagenet50()
to_explain = X[[39, 41]]

# load the ImageNet class names
url = "https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json"
fname = shap.datasets.cache(url)
with open(fname) as f:
    class_names = json.load(f)

# explain how the input to the 7th layer of the model explains the top two classes
def map2layer(x, layer):
    feed_dict = dict(zip([model.layers[0].input], [preprocess_input(x.copy())]))
    return K.get_session().run(model.layers[layer].input, feed_dict)

e = shap.GradientExplainer(
    (model.layers[7].input, model.layers[-1].output),
    map2layer(X, 7),
    local_smoothing=0  # std dev of smoothing noise
)
shap_values, indexes = e.shap_values(map2layer(to_explain, 7), ranked_outputs=2)

# get the names for the classes
index_names = np.vectorize(lambda x: class_names[str(x)][1])(indexes)

# plot the explanations
shap.image_plot(shap_values, to_explain, index_names)
```

Predictions for two input images are explained in the plot above. Red pixels represent positive SHAP values that increase the probability of the class, while blue pixels represent negative SHAP values that reduce the probability of the class. By using `ranked_outputs=2` we explain only the two most likely classes for each input (this spares us from explaining all 1,000 classes).

### Model agnostic example with KernelExplainer (explains any function)

Kernel SHAP uses a specially-weighted local linear regression to estimate SHAP values for any model. Below is a simple example for explaining a multi-class SVM on the classic iris dataset.

```python
import sklearn
import shap
from sklearn.model_selection import train_test_split

# print the JS visualization code to the notebook
shap.initjs()

# train a SVM classifier
X_train, X_test, Y_train, Y_test = train_test_split(*shap.datasets.iris(), test_size=0.2, random_state=0)
svm = sklearn.svm.SVC(kernel='rbf', probability=True)
svm.fit(X_train, Y_train)

# use Kernel SHAP to explain test set predictions
explainer = shap.KernelExplainer(svm.predict_proba, X_train, link="logit")
shap_values = explainer.shap_values(X_test, nsamples=100)

# plot the SHAP values for the Setosa output of the first instance
shap.force_plot(explainer.expected_value[0], shap_values[0][0, :], X_test.iloc[0, :], link="logit")
```

The above explanation shows four features each contributing to push the model output from the base value (the average model output over the training dataset we passed) towards zero. If there were any features pushing the class label higher they would be shown in red.

If we take many explanations such as the one shown above, rotate them 90 degrees, and then stack them horizontally, we can see explanations for an entire dataset. This is exactly what we do below for all the examples in the iris test set:

```python
# plot the SHAP values for the Setosa output of all instances
shap.force_plot(explainer.expected_value[0], shap_values[0], X_test, link="logit")
```

### SHAP Interaction Values

SHAP interaction values are a generalization of SHAP values to higher order interactions. Fast exact computation of pairwise interactions is implemented for tree models with `shap_interaction_values`. This returns a matrix for every prediction, where the main effects are on the diagonal and the interaction effects are off-diagonal. These values often reveal interesting hidden relationships, such as how the increased risk of death peaks for men at age 60 (see the NHANES notebook for details).

### Sample notebooks

The notebooks below demonstrate different use cases for SHAP. Look inside the notebooks directory of the repository if you want to try playing with the original notebooks yourself.

### TreeExplainer

An implementation of Tree SHAP, a fast and exact algorithm to compute SHAP values for trees and ensembles of trees.

### DeepExplainer

An implementation of Deep SHAP, a faster (but only approximate) algorithm to compute SHAP values for deep learning models that is based on connections between SHAP and the DeepLIFT algorithm.

### GradientExplainer

An implementation of expected gradients to approximate SHAP values for deep learning models. It is based on connections between SHAP and the Integrated Gradients algorithm. GradientExplainer is slower than DeepExplainer and makes different approximation assumptions.

### LinearExplainer

For a linear model with independent features we can analytically compute the exact SHAP values. We can also account for feature correlation if we are willing to estimate the feature covariance matrix. LinearExplainer supports both of these options.

### KernelExplainer

An implementation of Kernel SHAP, a model agnostic method to estimate SHAP values for any model. Because it makes no assumptions about the model type, KernelExplainer is slower than the other model-type-specific algorithms.

• Census income classification with scikit-learn - Using the standard adult census income dataset, this notebook trains a k-nearest neighbors classifier using scikit-learn and then explains predictions using `shap.KernelExplainer`.

• ImageNet VGG16 Model with Keras - Explain the classic VGG16 convolutional neural network's predictions for an image. This works by applying the model agnostic Kernel SHAP method to a super-pixel segmented image.

• Iris classification - A basic demonstration using the popular iris species dataset. It explains predictions from six different models in scikit-learn using `shap.KernelExplainer`.

### Documentation notebooks

These notebooks comprehensively demonstrate how to use specific functions and objects.

### Methods Unified by SHAP

1. LIME: Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why should i trust you?: Explaining the predictions of any classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.

2. Shapley sampling values: Strumbelj, Erik, and Igor Kononenko. "Explaining prediction models and individual predictions with feature contributions." Knowledge and information systems 41.3 (2014): 647-665.

3. DeepLIFT: Shrikumar, Avanti, Peyton Greenside, and Anshul Kundaje. "Learning important features through propagating activation differences." arXiv preprint arXiv:1704.02685 (2017).

4. QII: Datta, Anupam, Shayak Sen, and Yair Zick. "Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems." Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 2016.

5. Layer-wise relevance propagation: Bach, Sebastian, et al. "On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation." PloS one 10.7 (2015): e0130140.

6. Shapley regression values: Lipovetsky, Stan, and Michael Conklin. "Analysis of regression in game theory approach." Applied Stochastic Models in Business and Industry 17.4 (2001): 319-330.

7. Tree interpreter: Saabas, Ando. Interpreting random forests. http://blog.datadive.net/interpreting-random-forests/

### Citations

The algorithms and visualizations used in this package came primarily out of research in Su-In Lee's lab at the University of Washington, and Microsoft Research. If you use SHAP in your research we would appreciate a citation to the appropriate paper(s):

Source: https://github.com/slundberg/shap

## Machine learning enabling prediction of the bond dissociation enthalpy of hypervalent iodine from SMILES

### Abstract

Machine learning to create models on the basis of big data enables predictions from new input data. Many tasks formerly performed by humans can now be achieved by machine learning algorithms in various fields, including scientific areas. Hypervalent iodine compounds (HVIs) have long been applied as useful reactive molecules. The bond dissociation enthalpy (BDE) value is an important indicator of reactivity and stability. Experimentally measuring the BDE value of HVIs is difficult, however, and the value has been estimated by quantum calculations, especially density functional theory (DFT) calculations. Although DFT calculations can access the BDE value with high accuracy, the process is highly time-consuming. Thus, we aimed to reduce the time for predicting the BDE by applying machine learning. We calculated the BDE of more than 1000 HVIs using DFT calculations, and performed machine learning. Converting SMILES strings to Avalon fingerprints and learning using a traditional Elastic Net made it possible to predict the BDE value with high accuracy. Furthermore, an applicability domain search revealed that the learning model could accurately predict the BDE even for uncovered inputs that were not completely included in the training data.

### Introduction

Organic chemistry enables the synthesis of various molecules by continuously breaking and forming molecular bonds. Bond dissociation enthalpy (BDE) is an indicator of the strength of a chemical bond and is an essential consideration in the design of chemical reactions and reactive molecules. Heat energy or light energy in adequate quantities can be used to break a chemical bond homolytically. Therefore, BDE is commonly estimated on the basis of thermal measurements1, kinetics2, and electrochemistry3,4. In recent years, advances in computers, quantum chemistry, and density functional theory (DFT) calculations have provided remarkably more accurate methodologies for predicting BDE5,6,7. In silico methods can be used to estimate the BDE even for pinpoint chemical bonds of complicated or imaginary molecules, enabling the design of reactive molecules and the estimation of the stability of functional molecules before they are synthesised. Even with advanced computer technology, however, the calculation costs of the DFT method remain enormous: the calculation time increases exponentially with the total number of electrons in a molecule. Therefore, obtaining the BDE values of hundreds or thousands of molecules at once by quantum computations remains challenging.

Hypervalent iodine (HVI) compounds, which bear more than eight valence electrons on iodine, are reactive molecules used as oxidants or alkylating agents in organic synthesis8,9,10,11,12,13,14,15. Heterolytic or homolytic cleavage of the weak three-centre four-electron (3c–4e) bond of an HVI drives the chemical reaction. The BDE of the 3c–4e bond of an HVI is therefore an essential parameter that has been calculated by the DFT method on demand16,17,18,19. We previously reported the BDE values of 3c–4e bonds in various HVIs on the basis of DFT calculations19: we first determined the optimal functional and basis sets for reproducing the 3c–4e bond in silico and then calculated the BDE values of 206 HVIs. While this database is helpful for chemists, it is still necessary to calculate the BDE for HVIs that are not in the database.

Machine learning is currently attracting attention worldwide: analysing and learning from a population using statistical methods enables immediate prediction of results from new inputs. In organic chemistry, machine learning has been applied to predicting synthetic pathways and reactivity and to optimising reaction conditions20,21,22,23,24,25,26. In 2020, Yu and co-workers reported a machine learning model for predicting the BDE of carbonyl groups27. Their model accurately predicts the BDE of the C=O bond from the bond length and bond angle of the relevant site as inputs. Three-dimensional molecular information is required for the input data, however, so time-consuming DFT calculations remain unavoidable. We considered that an ideal, highly useful method of BDE prediction for chemists should not require quantum computations to prepare the input data. Therefore, we decided to use only structural formula information, in the form of SMILES strings, to predict the BDE of HVIs by machine learning (Fig. 1).

### Methods

We first performed DFT calculations to enlarge the data set. The DFT calculations were performed using Gaussian16 with the MN1528 functional and the SDD29,30 (for I and Se) and cc-pVTZ31 (for the other elements) basis sets19. Structure optimisations were carried out with an ultrafine grid at 298.15 K in the gas phase. Harmonic vibrational frequencies were computed at the same level of theory to confirm that no imaginary frequency was present for each optimised structure. The BDE was calculated from the enthalpy (H) of each species at 298 K according to the following formula:

$$BDE=y={H}_{radical A}^{298}+ {H}_{radical B}^{298}- {H}_{AB}^{298}$$
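In code, the formula above is a simple difference of enthalpies followed by a unit conversion. A minimal sketch (the enthalpy values used below are made-up illustrations, not numbers from the paper's data set):

```python
# BDE = H(radical A) + H(radical B) - H(A-B), all enthalpies at 298 K.
HARTREE_TO_KCAL = 627.509  # conversion from hartree to kcal/mol

def bde_kcal(h_radical_a: float, h_radical_b: float, h_ab: float) -> float:
    """Bond dissociation enthalpy in kcal/mol from enthalpies in hartree."""
    return (h_radical_a + h_radical_b - h_ab) * HARTREE_TO_KCAL

# Hypothetical enthalpies (hartree), chosen only for illustration:
print(round(bde_kcal(-230.100, -115.050, -345.230), 1))  # 50.2
```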

In addition to the BDE data of the 206 HVIs reported previously, we newly calculated 510 HVIs by DFT (Fig. 2). Various combinations of iodine-containing backbones and leaving groups were calculated to increase the diversity of the data set. A total of 330 cyclic HVIs were calculated: 105 molecules combining 35 types of leaving groups with three common cyclic HVI skeletons, and 225 molecules combining 75 types of cyclic HVI skeletons with 3 common leaving groups. In addition, 180 acyclic HVIs were calculated: 13 symmetric HVIs, 101 asymmetric HVIs, and 66 HVIs combining 33 HVI skeletons with 2 leaving groups. The 716 HVIs were randomly divided into 75% training and 25% test data sets. In each machine learning run, the training data set was first subjected to a grid search with k-fold cross-validation to optimise the hyperparameters (see supplementary information for details). For machine learning, three types of structural formulas were converted to SMILES: the HVI (neutral), the leaving group (radical), and the HVI skeleton (radical). Fingerprints were then generated using RDKit (version 2019.09.3)32: Morgan33 (circular, r = 2, 3, or 4), Topological (RDKFingerprint)32, MACCS34, and Avalon35. For each fingerprint, learning from the training data set was performed with optimised hyperparameters using Elastic Net (EN)36, support vector regression (SVR)37, Neural Network (NN)38, Random Forest (RF)39, and LightGBM (LGBM)40. The accuracy of the BDE prediction was evaluated against the test data set using the mean absolute error (MAE) and the coefficient of determination (R2):

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i,\mathrm{DFT}}-y_{i,\mathrm{ML}}\right|$$

$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_{i,\mathrm{DFT}}-y_{i,\mathrm{ML}}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i,\mathrm{DFT}}-\bar{y}_{\mathrm{DFT}}\right)^{2}}$$
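Both metrics are straightforward to compute directly from the two formulas above. A self-contained sketch with illustrative numbers (not values from the paper):

```python
def mae(y_dft, y_ml):
    """Mean absolute error between DFT reference values and ML predictions."""
    return sum(abs(a - b) for a, b in zip(y_dft, y_ml)) / len(y_dft)

def r2(y_dft, y_ml):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_dft = sum(y_dft) / len(y_dft)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_dft, y_ml))
    ss_tot = sum((a - mean_dft) ** 2 for a in y_dft)
    return 1 - ss_res / ss_tot

y_dft = [40.0, 55.0, 62.0, 48.0]  # hypothetical DFT BDEs (kcal/mol)
y_ml = [41.5, 53.0, 63.0, 47.0]   # hypothetical ML predictions
print(round(mae(y_dft, y_ml), 2))  # 1.38
print(round(r2(y_dft, y_ml), 3))   # 0.969
```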

Training and testing were repeated 10 times (random states 0–9), and accuracy was evaluated as the average over the runs.
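The SMILES-to-fingerprint featurisation described above uses RDKit's Avalon, Morgan, Topological, and MACCS generators. As a dependency-free toy illustration of the underlying idea only (hashing local substructure-like patterns into a fixed-length bit vector; this hashes character n-grams of the SMILES string and is not the actual Avalon or Morgan algorithm):

```python
import hashlib

def toy_fingerprint(smiles: str, n_bits: int = 64, ngram: int = 3) -> list[int]:
    """Hash character n-grams of a SMILES string into a fixed-length bit
    vector -- a toy stand-in for structural fingerprints such as Avalon."""
    bits = [0] * n_bits
    for i in range(len(smiles) - ngram + 1):
        digest = hashlib.md5(smiles[i:i + ngram].encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits

fp1 = toy_fingerprint("c1ccccc1I(OC(C)=O)OC(C)=O")  # a PIDA-like SMILES
fp2 = toy_fingerprint("c1ccccc1I(OC(C)=O)OC(C)=O")
assert fp1 == fp2  # deterministic: identical strings map to identical bits
```

Real fingerprints operate on the molecular graph rather than the raw string, which is why two different SMILES spellings of the same molecule still map to the same fingerprint there but not in this toy.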

### Results and discussion

As a result of the grid search, we used both the "relu" and "logistic" activation functions for the NN (see supplementary information for the detailed grid search results). The Avalon fingerprint, which encodes various features such as atom, bond, and ring information, enabled the most accurate prediction, with R2 = 0.964 (Fig. 3a) and MAE = 1.58 kcal/mol (Fig. 3b) using EN; SVR and the NNs also gave high scores. For the Morgan fingerprint, which encodes each atom's neighbourhood, accuracy decreased as the number of recognised atoms increased; r = 2 (recognising first- and second-neighbour atoms) with the EN method gave the highest accuracy, similar to Avalon. The Topological fingerprint, which encodes atoms and bond types, gave a high R2 of 0.931 and a small MAE of 2.41 kcal/mol using the SVR method, but it was inferior to the Avalon and Morgan fingerprints. The MACCS fingerprint, which counts 166 specific substructures, yielded the worst results, although its R2 of 0.905 and MAE of 3.16 kcal/mol with the NN (relu) method are still acceptable. EN and SVR tended to give good results except with the MACCS fingerprint; in contrast, RF and LGBM, which are decision-tree learning models, predicted the BDE with low accuracy for all fingerprints.
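Elastic Net, the best-performing learner here, minimises squared error plus a blend of L1 and L2 penalties. As a hedged illustration of why the L1 part shrinks and zeroes coefficients (toy data; single feature, no intercept; scikit-learn's alpha/l1_ratio parameterisation assumed), the one-dimensional fit has a closed form via soft-thresholding:

```python
def soft_threshold(z: float, gamma: float) -> float:
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0), the L1 proximal operator."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net_1d(x, y, alpha=0.1, l1_ratio=0.5):
    """Closed-form Elastic Net fit y ~ w*x for one feature, no intercept,
    minimising 1/(2n)*RSS + alpha*l1_ratio*|w| + 0.5*alpha*(1-l1_ratio)*w^2."""
    n = len(x)
    xy = sum(a * b for a, b in zip(x, y)) / n
    xx = sum(a * a for a in x) / n
    return soft_threshold(xy, alpha * l1_ratio) / (xx + alpha * (1 - l1_ratio))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]  # roughly y = 2x with small noise
w = elastic_net_1d(x, y, alpha=0.1, l1_ratio=0.5)
print(round(w, 3))  # slightly below 2: both penalties shrink the slope
```

With many correlated fingerprint bits, this shrinkage is what lets EN generalise where unregularised regression would overfit.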

Next, we investigated the applicability domain (AD) of these machine learning models41. Verifying the AD of a learning model is essential for examining overfitting of the training data and the applicable range for uncovered inputs. For the AD search, the BDEs of 561 additional HVIs were calculated by DFT and classified into four groups (Fig. 4):

• group A: both the leaving group and the HVI skeleton were included in the training data
• group B: the leaving group was included but the HVI skeleton was not
• group C: the leaving group was not included but the HVI skeleton was
• group D: neither the leaving group nor the HVI skeleton was included

All HVIs shown in Fig. 2 were used as training data, and the decision-tree models, which had proven inappropriate, were excluded from this analysis.
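The grouping above amounts to set membership on the (leaving group, skeleton) decomposition of each molecule. A sketch with hypothetical identifiers (the labels below are illustrative stand-ins, not the paper's molecules):

```python
# Classify a new HVI into applicability-domain groups A-D based on whether
# its leaving group and skeleton were seen during training.
train_leaving_groups = {"OAc", "N3", "CF3"}        # hypothetical labels
train_skeletons = {"benziodoxole", "PhI"}          # hypothetical labels

def ad_group(leaving_group: str, skeleton: str) -> str:
    lg_known = leaving_group in train_leaving_groups
    sk_known = skeleton in train_skeletons
    if lg_known and sk_known:
        return "A"  # both components seen in training
    if lg_known:
        return "B"  # new skeleton only
    if sk_known:
        return "C"  # new leaving group only
    return "D"      # both components unseen

print(ad_group("OAc", "PhI"))           # A
print(ad_group("OAc", "benziodazole"))  # B
print(ad_group("OTs", "PhI"))           # C
print(ad_group("OTs", "benziodazole"))  # D
```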

The investigation of the AD with group A (Fig. 5aA,bA) demonstrated that the Avalon fingerprint maintained high accuracy, with R2 = 0.932 and MAE = 2.47 kcal/mol using the EN method (Fig. 6A); SVR and NN_r also gave R2 = 0.920 and 0.920, with MAE = 2.70 and 2.89 kcal/mol, respectively. The Morgan (r = 2) fingerprint had slightly lower accuracy, with R2 = 0.911 and MAE = 2.79 kcal/mol using the EN method. For the Topological and MACCS fingerprints, in contrast, the R2 value fell below 0.7 and the minimum MAE was 5.44 kcal/mol (Topological, EN), a significant decrease in accuracy from the test data in Fig. 3; overfitting of the training data therefore occurred with the Topological and MACCS fingerprints.

With the molecules of group B (Fig. 5aB,bB), which contains new HVI skeletons, accuracy decreased slightly, but the R2 values of the Avalon (Fig. 6B) and Morgan (r = 2) fingerprints remained high at 0.880 and 0.863, respectively. In group C (Fig. 5aC,bC), which contains new leaving groups, the Avalon fingerprint still predicted with adequate accuracy (R2 = 0.828 with the EN method, Fig. 6C), whereas the Morgan (r = 2) fingerprint gave R2 = 0.532 and MAE = 8.00 kcal/mol, much worse than in groups A and B, indicating that prediction for uncovered leaving groups was not applicable. We reasoned that because HVI skeletons share the R–I–R' bond pattern, the Morgan fingerprint could recognise their structure well, whereas the leaving groups were difficult to learn accurately because of their divergent structures. Finally, we verified the AD with group D (Fig. 5aD,bD), a completely new data set, and found that the Avalon fingerprint predicted the BDE with R2 = 0.759 and MAE = 5.97 kcal/mol (Fig. 6D). Because the Avalon fingerprint considers a larger variety of features and/or generates a fingerprint with a larger number of bits than the MACCS, Topological, or Morgan fingerprints, it could evaluate molecular similarity appropriately and predict uncovered data with higher accuracy than the other fingerprints.

We finally compared the computation times of the DFT and machine learning methods for calculating the BDEs of the 561 HVIs of groups A–D. In our computational environment, the DFT method required 4272 core-days, roughly 12 years on a single core; machine learning, in contrast, completed the 561 predictions from SMILES strings within 3 s, an overwhelming difference in speed.

### Conclusions

We constructed a BDE prediction model for HVIs that works from SMILES strings alone, with no quantum computation needed to prepare the input data. Avalon fingerprint generation combined with Elastic Net learning predicted the BDE with high accuracy (MAE = 1.58 kcal/mol). The model also exhibited a wide applicability domain, predicting with an MAE of 5.97 kcal/mol even for completely uncovered inputs. With this model, predicted BDE values for HVIs can be obtained at remarkable speed compared with modern quantum calculations. We anticipate that machine learning will be adopted by many organic chemists to facilitate the molecular and reaction design of HVIs.

### Data availability

Computational details, including the grid search results, the DFT geometries and energies of the HVIs, and the list of SMILES strings with their BDE(DFT) values, are provided in the Supplementary Information.

### References

1. Szwarc, M. The estimation of bond-dissociation energies by pyrolytic methods. Chem. Rev. 47, 75–173. https://doi.org/10.1021/cr60146a002 (1950).
2. Kerr, J. A. Bond dissociation energies by kinetic methods. Chem. Rev. 66, 465–500 (1966).
3. Fu, Y. et al. Quantum-chemical predictions of redox potentials of organic anions in dimethyl sulfoxide and reevaluation of bond dissociation enthalpies measured by the electrochemical methods. J. Phys. Chem. A 110, 5874–5886. https://doi.org/10.1021/jp055682x (2006).
4. Okajima, M. et al. Generation of diarylcarbenium ion pools via electrochemical C–H bond dissociation. Bull. Chem. Soc. Jpn. 82, 594–599. https://doi.org/10.1246/bcsj.82.594 (2009).
5. Feng, Y., Liu, L., Wang, J.-T., Huang, H. & Guo, Q.-X. Assessment of experimental bond dissociation energies using composite ab initio methods and evaluation of the performances of density functional methods in the calculation of bond dissociation energies. J. Chem. Inf. Comput. Sci. 43, 2005–2013. https://doi.org/10.1021/ci034033k (2003).
6. Yao, X.-Q., Hou, X.-J., Jiao, H., Xiang, H.-W. & Li, Y.-W. Accurate calculations of bond dissociation enthalpies with density functional methods. J. Phys. Chem. A 107, 9991–9996. https://doi.org/10.1021/jp0361125 (2003).
7. Kim, S. et al. Computational study of bond dissociation enthalpies for a large range of native and modified lignins. J. Phys. Chem. Lett. 2, 2846–2852. https://doi.org/10.1021/jz201182w (2011).
8. Kita, Y., Tohma, H., Kikuchi, K., Inagaki, M. & Yakura, T. Hypervalent iodine oxidation of N-acyltyramines: Synthesis of quinol ethers, spirohexadienones, and hexahydroindol-6-ones. J. Org. Chem. 56, 435–438. https://doi.org/10.1021/jo00001a082 (1991).
9. Kita, Y. et al. Hypervalent iodine-induced nucleophilic substitution of para-substituted phenol ethers. Generation of cation radicals as reactive intermediates. J. Am. Chem. Soc. 116, 3684–3691. https://doi.org/10.1021/ja00088a003 (1994).
10. Zhdankin, V. V. et al. Preparation, X-ray crystal structure, and chemistry of stable azidoiodinanes—derivatives of benziodoxole. J. Am. Chem. Soc. 118, 5192–5197. https://doi.org/10.1021/ja954119x (1996).
11. Kieltsch, I., Eisenberger, P. & Togni, A. Mild electrophilic trifluoromethylation of carbon- and sulfur-centered nucleophiles by a hypervalent iodine(III)-CF3 reagent. Angew. Chem. Int. Ed. 46, 754–757. https://doi.org/10.1002/anie.200603497 (2007).
12. Phipps, R. J. & Gaunt, M. J. A meta-selective copper-catalyzed C-H bond arylation. Science 323, 1593–1597. https://doi.org/10.1126/science.1169975 (2009).
13. Brand, J. P. & Waser, J. Direct alkynylation of thiophenes: Cooperative activation of TIPS-EBX with gold and Broensted acids. Angew. Chem. Int. Ed. 49, 7304–7307. https://doi.org/10.1002/anie.201003179 (2010).
14. Matsumoto, K., Nakajima, M. & Nemoto, T. Visible light-induced direct S0 → Tn transition of benzophenone promotes C(sp3)-H alkynylation of ethers and amides. J. Org. Chem. 85, 11802–11811. https://doi.org/10.1021/acs.joc.0c01573 (2020).
15. Nakajima, M. et al. A direct S0 → Tn transition in the photoreaction of heavy-atom-containing molecules. Angew. Chem. Int. Ed. 59, 6847–6852. https://doi.org/10.1002/anie.201915181 (2020).
16. Konnick, M. M. et al. Selective CH functionalization of methane, ethane, and propane by a perfluoroarene iodine(III) complex. Angew. Chem. Int. Ed. 53, 10490–10494. https://doi.org/10.1002/anie.201406185 (2014).
17. Li, M., Wang, Y., Xue, X.-S. & Cheng, J.-P. A systematic assessment of trifluoromethyl radical donor abilities of electrophilic trifluoromethylating reagents. Asian J. Org. Chem. 6, 235–240. https://doi.org/10.1002/ajoc.201600539 (2017).
18. Yang, J.-D., Li, M. & Xue, X.-S. Computational I(III)-X BDEs for benziodoxol(on)e-based hypervalent iodine reagents: Implications for their functional group transfer abilities. Chin. J. Chem. 37, 359–363. https://doi.org/10.1002/cjoc.201800549 (2019).
19. Matsumoto, K., Nakajima, M. & Nemoto, T. Determination of the best functional and basis sets for optimization of the structure of hypervalent iodines and calculation of their first and second bond dissociation enthalpies. J. Phys. Org. Chem. https://doi.org/10.1002/poc.3961 (2019).
20. Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476. https://doi.org/10.1021/acscentsci.8b00357 (2018).
21. Walker, E. et al. Learning to predict reaction conditions: Relationships between solvent, molecular structure, and catalyst. J. Chem. Inf. Model. 59, 3645–3654. https://doi.org/10.1021/acs.jcim.9b00313 (2019).
22. Fu, Z. et al. Optimizing chemical reaction conditions using deep learning: A case study for the Suzuki-Miyaura cross-coupling reaction. Org. Chem. Front. 7, 2269–2277. https://doi.org/10.1039/d0qo00544d (2020).
23. Kondo, M. et al. Exploration of flow reaction conditions using machine-learning for enantioselective organocatalyzed Rauhut-Currier and [3+2] annulation sequence. Chem. Commun. 56, 1259–1262. https://doi.org/10.1039/c9cc08526b (2020).
24. Jorner, K., Tomberg, A., Bauer, C., Skold, C. & Norrby, P.-O. Organic reactivity from mechanism to machine learning. Nat. Rev. Chem. 5, 240–255. https://doi.org/10.1038/s41570-021-00260-x (2021).
25. Kim, H. W. et al. Reaction condition optimization for non-oxidative conversion of methane using artificial intelligence. React. Chem. Eng. 6, 235–243. https://doi.org/10.1039/d0re00378f (2021).
26. Matsubara, S. Digitization of organic synthesis—how synthetic organic chemists use AI technology. Chem. Lett. 50, 475–481. https://doi.org/10.1246/cl.200802 (2021).
27. Yu, H. et al. Using machine learning to predict the dissociation energy of organic carbonyls. J. Phys. Chem. A 124, 3844–3850. https://doi.org/10.1021/acs.jpca.0c01280 (2020).
28. Yu, H. S., He, X., Li, S. L. & Truhlar, D. G. MN15: A Kohn-Sham global-hybrid exchange-correlation density functional with broad accuracy for multi-reference and single-reference systems and noncovalent interactions. Chem. Sci. 7, 5032–5051. https://doi.org/10.1039/c6sc00705h (2016).
29. Dolg, M., Wedig, U., Stoll, H. & Preuss, H. Energy-adjusted ab initio pseudopotentials for the first row transition elements. J. Chem. Phys. 86, 866–872. https://doi.org/10.1063/1.452288 (1987).
30. Andrae, D., Haeussermann, U., Dolg, M., Stoll, H. & Preuss, H. Energy-adjusted ab initio pseudopotentials for the second and third row transition elements. Theor. Chim. Acta 77, 123–141. https://doi.org/10.1007/bf01114537 (1990).
31. Dunning, T. H. Jr. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 90, 1007–1023. https://doi.org/10.1063/1.456153 (1989).
32. RDKit: Open-Source Cheminformatics Software. https://www.rdkit.org/.
33. Morgan, H. L. Generation of a unique machine description for chemical structures—a technique developed at Chemical Abstracts Service. J. Chem. Doc. 5, 107–113. https://doi.org/10.1021/c160017a018 (1965).
34. Durant, J. L., Leland, B. A., Henry, D. R. & Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42, 1273–1280. https://doi.org/10.1021/ci010132r (2002).
35. Gedeck, P., Rohde, B. & Bartels, C. QSAR—how good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets. J. Chem. Inf. Model. 46, 1924–1936. https://doi.org/10.1021/ci050413p (2006).
36. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67, 301–320 (2005).
37. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
38. Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. 79, 2554–2558 (1982).
39. Ho, T. K. Random decision forests. In Proceedings of 3rd International Conference on Document Analysis and Recognition, 278–282 (IEEE, 1995).
40. Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30, 3146–3154 (2017).
41. Tetko, I. V. et al. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection. J. Chem. Inf. Model. 48, 1733–1746 (2008).

### Acknowledgements

This work was supported by the Institute of Global Prominent Research, Chiba University. Numerical calculations were carried out in the SR24000 computer at the Institute of Management and Information Technologies, Chiba University.

### Affiliations

1. Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, Japan

Masaya Nakajima & Tetsuhiro Nemoto

### Contributions

M.N. conceived this research and performed all calculations. All authors discussed and co-wrote the manuscript.

### Corresponding authors

Correspondence to Masaya Nakajima or Tetsuhiro Nemoto.

### Competing interests

The authors declare no competing interests.

### Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Nakajima, M. & Nemoto, T. Machine learning enabling prediction of the bond dissociation enthalpy of hypervalent iodine from SMILES. Sci. Rep. 11, 20207 (2021). https://doi.org/10.1038/s41598-021-99369-8


Source: https://www.nature.com/articles/s41598-021-99369-8

## Metrics

A metric is a function that is used to judge the performance of your model.

Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model. Note that you may use any loss function as a metric.

### Usage with `compile()` & `fit()`

The `compile()` method takes a `metrics` argument, which is a list of metrics.

Metric values are displayed during `fit()` and logged to the `History` object returned by `fit()`. They are also returned by `model.evaluate()`.

Note that the best way to monitor your metrics during training is via TensorBoard.

To track metrics under a specific name, you can pass the `name` argument to the metric constructor.

All built-in metrics may also be passed via their string identifier (in this case, default constructor argument values are used, including a default metric name).

### Standalone usage

Unlike losses, metrics are stateful. You update their state using the `update_state()` method, and you query the scalar metric result using the `result()` method.

The internal state can be cleared via `reset_state()`.

In a simple custom training loop, you call `update_state()` on the metric after each batch, read `result()` at the end of the epoch, and then call `reset_state()` before the next epoch.

### As simple callables (stateless)

Much like loss functions, any callable with the signature `metric_fn(y_true, y_pred)` that returns an array of losses (one per sample in the input batch) can be passed to `compile()` as a metric. Note that sample weighting is automatically supported for any such metric.

Here's a simple example:
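(The Keras code example that originally followed here did not survive extraction. As a dependency-free stand-in using plain Python lists rather than actual Keras/TensorFlow tensors, a stateless metric is just a callable returning one value per sample:)

```python
def my_metric_fn(y_true, y_pred):
    """Per-sample squared error -- any callable with this (y_true, y_pred)
    signature can serve as a stateless metric. Plain-Python stand-in only."""
    return [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

batch_values = my_metric_fn([0.0, 1.0, 1.0], [0.0, 1.0, 0.0])
print(batch_values)                              # [0.0, 0.0, 1.0]
print(sum(batch_values) / len(batch_values))     # per-batch average
```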

In this case, the scalar metric value you are tracking during training and evaluation is the average of the per-batch metric values for all batches seen during a given epoch (or during a given call to `model.evaluate()`).

### As subclasses of `Metric` (stateful)

Not all metrics can be expressed via stateless callables, because metrics are evaluated for each batch during training and evaluation, but in some cases the average of the per-batch values is not what you are interested in.

Let's say that you want to compute AUC over a given evaluation dataset: the average of the per-batch AUC values isn't the same as the AUC over the entire dataset.

For such metrics, you're going to want to subclass the `Metric` class, which can maintain state across batches. It's easy:

• Create the state variables in `__init__()`
• Update the variables given `y_true` and `y_pred` in `update_state()`
• Return the metric result in `result()`
• Clear the state in `reset_state()`

Here's a simple example computing binary true positives:
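(The original Keras code block was lost in extraction. The same four-method pattern can be sketched in plain Python; the real Keras `Metric` base class additionally handles state variables as weights, sample weighting, and graph execution:)

```python
class BinaryTruePositives:
    """Stateful metric following the Keras Metric pattern:
    state in __init__, update_state(), result(), reset_state()."""
    def __init__(self):
        self.true_positives = 0

    def update_state(self, y_true, y_pred):
        # Count samples where the label is 1 and the prediction, thresholded
        # at 0.5, is also 1.
        self.true_positives += sum(
            1 for t, p in zip(y_true, y_pred) if t == 1 and p >= 0.5
        )

    def result(self):
        return self.true_positives

    def reset_state(self):
        self.true_positives = 0

m = BinaryTruePositives()
m.update_state([1, 0, 1, 1], [0.9, 0.8, 0.2, 0.6])  # batch 1: 2 true positives
m.update_state([1, 1, 0, 0], [0.7, 0.4, 0.1, 0.0])  # batch 2: 1 true positive
print(m.result())  # 3
m.reset_state()
print(m.result())  # 0
```

Because the count accumulates across `update_state()` calls, the final `result()` reflects the whole dataset, not a per-batch average, which is exactly what stateless callables cannot express.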

### The `add_metric()` API

When writing the forward pass of a custom layer or a subclassed model, you may sometimes want to log certain quantities on the fly, as metrics. In such cases, you can use the `add_metric()` method.

Let's say you want to log, as a metric, the mean of the activations of a Dense-like custom layer. You can call `self.add_metric()` on that quantity inside the layer's `call()` method.

The quantity will then be tracked under the name "activation_mean". The value tracked will be the average of the per-batch metric values.

See the `add_metric()` documentation for more details.

Source: https://keras.io/api/metrics/