Neuroet: a simple artificial neural network for scientists

 

Erik Tribou

Peter A. Noble, Ph.D.

Civil and Environmental Engineering

University of Washington

Seattle, Washington 98195

 

Email: panoble@washington.edu

Phone: (206)-685-7583

Web: http://noble.ce.washington.edu

 

Copyright © 2004 University of Washington


 


If you use this software or the documentation for your research, please include the following reference:

 

Noble, P.A. and E. Tribou. 2006. Neuroet: an easy-to-use artificial neural network for ecological and biological modeling. Ecological Modelling 203, 87-98.

 

Manuscripts that have used Neuroet:

 

Pozhitkov, A., B. Chernov, G. Yershov, and P.A. Noble. 2005. Evaluation of gel-pad oligonucleotide microarray technology using artificial neural networks. Appl. Environ. Microbiol. 71: 8663-8676.

 

Pozhitkov, A., P. A. Noble, T. Domazet-Loso, P. Staehler, M. Beier and D. Tautz. 2006. Tests of rRNA hybridization to microarrays suggest that hybridization characteristics of oligonucleotide probes for species discrimination cannot be predicted. Nucleic Acids Research 34, e66.

 

Pozhitkov, A., R.G. Stedtfeld, S.A. Hashsham and P. A. Noble. 2007. Revision of the nonequilibrium dissociation and stringent washing approaches for identification of mixed nucleic acid targets by microarrays. Nucleic Acids Research 35, e70.

 


 

Table of Contents.

 

1. Preface.

2. Hot Start.

3. Introduction.

            3.1 Fundamentals of artificial neural networks.

                        3.1.1 Training methods.

                        3.1.2 Architecture of NNs.

                        3.1.3 Other relevant information.

4. Assembling the data for neural network analyses.

5. Pre-analyzing your data using Neuroet.

6. Determining relationships among input and output variables.

6.1 Optimizing the number of hidden neurons.

6.2 Training the NN.

6.3 Testing the NN.

6.4 Measuring the predictive importance of input variables.

6.5 Extracting equations from trained NNs.

7. Fundamentals of scaling methods, transfer functions, and training methods.

            7.1 Scaling.

            7.2 Transfer functions.

            7.3 Training methods.

8. Testing Neuroet.

8.1 Determining the effects of increasing number of hidden neurons on NN prediction.

8.2 Determining the effects of training method on NN prediction.

8.3 Examining the effects of data complexity on NN prediction.

9. Benchmark testing of Neuroet and "best" approaches.

9.1 Background on the Donoho-Johnstone NN benchmarks.

9.2 Optimizing the number of hidden neurons using data containing different amounts of noise.

9.3 Results of Blocks, Bumps, Heavisine and Doppler analyses using Neuroet.

9.3.1 Blocks.

9.3.2 Bumps.

9.3.3 Heavisine.

9.3.4 Doppler.

10. Summary.

11. References

Appendix 1. Instructions for starting up Neuroet

 


1. Preface.

 

The idea for developing a simple neural network package to assist scientists in analyzing data occurred to me several years ago, when I stumbled upon a paper by Bertone et al. (1996).  They used artificial neural networks (NNs) to analyze marine bacteria based on fatty acid compositions.  I was impressed by their "new" approach and tried to develop a simple NN for analyzing rRNA banding patterns in polyacrylamide gels.  My adventure was met with limited success.  My first NN program was made in a spreadsheet (i.e., MS Excel), and after numerous computer crashes, a more reliable and faster approach was needed.  I purchased Rao and Rao's book (C++ Neural Networks and Fuzzy Logic) and taught myself C++.  Through writing and testing computer programs, and reading the scientific literature, I learned the fundamentals and pitfalls of neural network computing.  This documentation and software package represents the sum of several years of "playing" with artificial NNs.  My objective for this project was to develop a straightforward and convenient approach for analyzing data using artificial NNs.

 

Neuroet was developed by Erik Tribou and myself.  Subtle changes in the Neuroet source code and packaging for the web were implemented by Travis Krick (University of Washington).  Numerous people have aided in the development of Neuroet.  I thank Drs. Jim Morris (University of South Carolina), Chet Rakocinski (University of Southern Mississippi), and Alex Pozhitkov (University of Washington), who beta-tested early versions of Neuroet.  Thanks also to Dr. Seana K. Davidson (University of Washington) for providing the "pretty" brain image.

 

Funds for this project were generously provided by the National Science Foundation (DEB 9802342, DEB 0088879, DEB 0196203), the National Institutes of Health (U01 DE 14955-02), and the Environmental Protection Agency (CEER-GOM-R-82945801).

 

If you have any problems, questions or comments about Neuroet, I would be delighted if you contacted me.

 

Peter A. Noble, Ph.D.

Research Assistant Professor

 

201 More Hall

Civil and Environmental Engineering

University of Washington

Seattle, WA 98195

Phone: (206)-685-7583

Email: panoble@washington.edu

 


 

2. Hot Start.

 

This section provides basic information for people who prefer not to read detailed instructions and know the fundamentals of neural computing.

 

2.1  Assemble input and output files (tab delimited), ensuring that both files have the same number of rows and that each column has a column heading.  Note that extra spaces tend to cause problems, so use underscores where spaces would otherwise be necessary (e.g. "_").

 

2.2  To determine the optimal number of hidden neurons:

 

2.2.1    Open the Preference window; set the Input Scaling Method tab to "Standardized μ=0 σ=1"; check the Include Bias Term; set the Number of Hidden Neurons to 2; set the Target Scaling Method tab to "Standardized μ=0 σ=1"; check the Include Bias Term; set the Data Set Proportions in Test and Validation data sets to 10%; click the OK button.

 

2.2.2    Click the Optimize Hidden Neurons button; browse for the input file by clicking the Choose button; browse for the output file by clicking the Choose button; click the Training Method button until you find "Levenberg-Marquardt"; slide Max Hidden Neurons to the number of input variables; click the Optimize button; the program will request a file name; type in "First_Optimized_NN"; watch the NN brain pulsate (it is thinking).

 

2.2.3    When the NN brain stops pulsating, a file called "First_Optimized_NN" will appear on your Desktop.  Open the file in MS Excel or any word processor.  Three columns with the following headings will appear: Number (of hidden neurons), SBC score, and AICc score.  The optimal number of hidden neurons is the one with the lowest score and should be in the first row (depending on whether the file is sorted by SBC or AICc).

 

2.3  To train a neural network:

 

2.3.1    Open the Preference window and set the Number of Hidden Neurons to the optimal number (determined in 2.2.3); click the OK button.

 

2.3.2    Click the Train tab; browse for the input file by clicking the Choose button; browse for the output file by clicking the Choose button; click the Training Method tab until you find "Levenberg-Marquardt"; set the Threshold (R2) button to 1.0; do not click the Keep Best Generalization button; click the Train Network button; the program will request a file name; type in "First_NN"; watch the NN brain pulsate (it is thinking again).

 

2.3.3    When the NN brain stops pulsating, a folder called "First_NN" will appear on your Desktop.  The folder contains seven files: weights.txt (contains the weights), biases.txt (contains the biases), data2X.txt and data2Y.txt (used for Sensitivity Analysis), dataOut.txt (contains the raw and scaled data, and the proportions of the data used for training, testing and validating the neural network), prediction.txt (contains predicted and actual output variables and the R-squared value), and NN.ser (this file can be used by Neuroet to test new input variables).  Open prediction.txt in MS Excel or any text editor.  The R-squared value is at the bottom of the file.  This tells us how well the NN learned.  The weights and biases can be used to construct the equation defining the relationship between the input variables and the output variable (discussed below).

 

2.4 To measure the importance of input variables for making predictions of outputs:

 

2.4.1    Click the Measure Importance of Inputs tab; browse for the input file by clicking the Choose button; browse for the output file by clicking the Choose button; click the Training Method tab until you find "Levenberg-Marquardt"; set the Runs/combination slide to 15; set Mode to 1; set the button beside the Mode button to "In"; click the Test Inputs button; the program will request a file name; type in "Leave-one-in.txt"; watch the NN brain pulsate (it is thinking yet again!).

 

2.4.2    When the NN brain stops pulsating, a file called "Leave-one-in.txt" will appear on your Desktop.  Open the file in MS Excel or any word processor.  Three columns with the following headings will appear: Order of Input Variables, SBC score, and AICc score.  The input variables with the lowest SBC and AICc scores are the most important for predicting the output variable.

 

2.4.3    To determine the predictive importance of combinations of input variables, repeat task 2.4.1 but set Mode to 2 or more.

 

2.5  To predict outputs using input variables (trained neural network required):

 

2.5.1    Assemble a new input file (tab delimited), ensuring that each column has a heading and that the order of the columns is identical to that used to train the NN.  Note that extra spaces (in the wrong place) can sometimes cause problems.  Click the Run the Neural Net button; browse for the NN.ser file that was created when you trained the neural network; browse for the new input file on your desktop by clicking the Choose button; click the Run Network button; the program will request a file name; type in "Output_Predictions.txt"; watch the NN brain pulsate.

 

2.5.2    When the NN brain stops pulsating, a file called "Output_Predictions.txt" will appear on your Desktop.  Open the file in MS Excel or a word processor.  The predictions for the input file will appear in the first column.


3. Introduction

 

Neuroet is a "simple" artificial neural network (NN) package designed to assist scientists in determining relationships among variables in complex data sets.  We use the word "simple" to describe Neuroet because the package was purposefully designed to be straightforward and uncomplicated.  Scientists who are familiar with neural network principles should be able to rapidly analyze their data by following the Hot Start procedures.

 

Neuroet has some important features that are not typically found in commercially available NN software.  For example, Neuroet can automatically determine the optimal number of hidden neurons needed to train a NN.  By knowing the optimal number of hidden neurons, one can prevent under- or over-fitting of the data, which often leads to incorrect predictions and/or poor results.  Benchmark testing of Neuroet with other NNs (see Section 9) demonstrated that our approach to optimizing the number of hidden neurons was highly accurate and consistent.  An important feature of Neuroet is that it can determine the predictive importance of combinations of input variables within a complex data set.  By doing so, scientists can recognize relevant relationships among variables that would be difficult to recognize using conventional linear-based statistical approaches (e.g., principal component analysis).  We demonstrate the procedure for 'measuring the predictive importance of input variables' using example data sets and provide detailed interpretations of the results.  Another useful feature of Neuroet is the weights and biases files that are automatically generated after training a NN.  These files can be used to build equations relating input variables to output variables.  This document explains how to build equations by walking the user through an example.  Lastly, this document provides information on the 'best' approaches to analyze data using NNs.  Given the many settings and approaches available, it is often difficult to decide which are 'best' for analyzing data using a NN.  We provide example data sets and interpret the results produced by Neuroet with the goal of guiding scientists through the process of setting up the preference file, loading and submitting data files, and correctly interpreting the results.

 

The Neuroet documentation was written for individuals who have some familiarity with basic statistics and who might or might not be familiar with NNs.  We have also written this document with an eye to scientists who are curious about NN approaches and are interested in learning what NNs can do and how they do it.  Since much of the NN literature is oriented towards computer and bioinformatic scientists, we have minimized the use of jargon, preferring instead to present Neuroet under the rubric of "learning by examples".  We have provided data files that can be downloaded from an FTP site so that scientists can perform the same experiments.  In the end, we hope that scientists will use Neuroet for their own research.

 

Neuroet can be downloaded at http://noble.ce.washington.edu under "Data Analysis Tools For Download".  The program is free for academic and non-profit organizations.  Neuroet can be purchased through the University of Washington (Digital Ventures, Box 354990, 4311 11th Ave NE, Suite 500, Seattle WA 98105-4608, Web site: http://depts.washington.edu/ventures).  Downloadable files to demonstrate Neuroet capabilities are available at ftp://neuroet@128.95.45.41.  The password is network.

 

3.1 Fundamentals of artificial neural networks.

 

Artificial neural networks (NNs) are computer software programs that consist of networks of neurons.  Neurons receive information from inputs or other neurons, make independent computations, and pass on their outputs to other neurons in the network.  A neuron is typically composed of weights (which receive information), a bias term, and a transfer function.  The value of an output from a single neuron can be calculated by summing the products of each weight and its input, adding the bias term to the final sum, and then computing the output by putting the final sum into a transfer function (e.g., Log-Sigmoid function, 1/(1+exp(-sum))).

 

Calculating the output of a single neuron.  Consider a single neuron that receives two inputs (input #1 has a value of 0.3 and input #2 has a value of 0.2).  The neuron has two weights (wgt #1 is connected to input #1 and has a value of 0.7, wgt #2 is connected to input #2 and has a value of -0.8), a bias term (bias has a value of -0.5), and a transfer function (Log-Sigmoid).  The equation defining the value of the output for the single neuron is:

 

Output = Log-Sigmoid ((input #1 x wgt#1) + (input #2 x wgt#2)+bias)

 

Incorporating values of the inputs, weights, and bias into the equation, we obtain an output value of 0.39:

 

Output = 1/(1+exp(-1 x ((0.3 x 0.7)+(0.2 x -0.8)+(-0.5))))
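
The calculation above can be checked with a few lines of code.  The following is a minimal sketch in Python, not part of Neuroet; the function names log_sigmoid and neuron_output are illustrative assumptions, and the numbers are the ones substituted into the equation above.

```python
import math


def log_sigmoid(total):
    """Log-Sigmoid transfer function: 1 / (1 + exp(-total))."""
    return 1.0 / (1.0 + math.exp(-total))


def neuron_output(inputs, weights, bias):
    """Sum each input times its weight, add the bias, then apply Log-Sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return log_sigmoid(total)


# Values substituted into the equation above (hypothetical variable names).
output = neuron_output(inputs=[0.3, 0.2], weights=[0.7, -0.8], bias=-0.5)
print(round(output, 2))  # prints 0.39
```

The weighted sum is 0.21 - 0.16 - 0.5 = -0.45, and the Log-Sigmoid of -0.45 is about 0.389, which rounds to the 0.39 reported in the text.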

 

When NNs are trained, the error between outputs and target values is calculated, and then the values of weights and biases are adjusted by an error function.  By adjusting the weights and biases, and recalculating the output values (forward propagation), the error between output values and target values is minimized.  The process of adjusting the weights and biases is referred to as error back-propagation (Bishop, 1995; Rumelhart et al. 1986).  In theory, error back-propagation (followed by forward propagation) continues until the global error minimum is attained.  In practice, NNs sometimes get 'stuck' in local error minima.  A variety of training methods can be used to help NNs reach the global error minimum.  The adjusted weights can then be used to recognize patterns and to provide information on the functional relations between the inputs and outputs.

 

Calculating the error and adjusting the weights and biases.  If the target in the previous example has a value of 0.50, the error between the target value and the output would then be equal to 0.11 (i.e., 0.50 - 0.39 = 0.11).  If we adjust wgt #1 to 0.8 and wgt #2 to -0.7, the value of the output will then be equal to 0.40 and the error falls to 0.10.  By adjusting the weights, the prediction of the output has improved by a value of 0.01.
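
The effect of the weight adjustment can be reproduced as follows.  This is an illustrative Python sketch under the same assumptions as the output equation (inputs of 0.3 and 0.2, a bias of -0.5, and a Log-Sigmoid transfer function); the variable names are hypothetical and the weights are changed by hand rather than by a training method.

```python
import math


def neuron_output(inputs, weights, bias):
    """Weighted sum plus bias, passed through the Log-Sigmoid function."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


target = 0.50
inputs = [0.3, 0.2]  # inputs as substituted into the output equation
bias = -0.5

before = neuron_output(inputs, [0.7, -0.8], bias)  # original weights
after = neuron_output(inputs, [0.8, -0.7], bias)   # hand-adjusted weights

error_before = target - before
error_after = target - after
print(round(before, 2), round(after, 2))              # prints 0.39 0.4
print(round(error_before, 2), round(error_after, 2))  # prints 0.11 0.1
```

A training method automates this step: it chooses the size and direction of each adjustment so that the error shrinks over many iterations.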

 

3.1.1 Training methods.  Training methods such as Conjugate Gradient or Levenberg-Marquardt are sophisticated algorithms that calculate the best way to adjust weights and biases so that the global error between outputs and the targets is minimized after several iterations.  In our simple example (above), the NN learned that the desired output was greater than 0.39.  Continued readjustment of the weights and biases will reduce the error between the output and the target, and the NN will eventually learn that the target value is about 0.50.  The obvious problem with this approach (readjusting the weights and bias until it learns the target) is that the NN will eventually memorize the target data.  NNs can be prevented from memorizing the data by stopping training early.  There are a variety of methods to do this and the reader is encouraged to read the following articles and books for additional information: Basheer and Hajmeer (2000), Bishop (1995), Principe et al. (2000), Hagan et al. (1996), and Reed and Marks (1999).

 

Neuroet stops training when there is no change in the error over a specified period of time, or after a specified number of iterations.  We can determine if Neuroet has memorized the data by comparing the predictions using portions of the data not used for training.  Fixed portions of the data set are randomly picked and set aside for testing and validating the NN.  If the R-squared values between predicted and actual output values are similar for training, testing and validation data sets, then the NN has not memorized the data.
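
The check described above can be sketched in Python.  This is an illustration only, not Neuroet's implementation: the R-squared is computed here as the squared Pearson correlation between actual and predicted values (the formula Neuroet uses is not stated), and the function names, the random seed, and the tolerance of 0.10 for "similar" are all assumptions.

```python
import random


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


def split_records(n_records, test_fraction=0.10, validation_fraction=0.10, seed=0):
    """Randomly assign record indices to training, test and validation sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = round(n_records * test_fraction)
    n_val = round(n_records * validation_fraction)
    test = indices[:n_test]
    validation = indices[n_test:n_test + n_val]
    training = indices[n_test + n_val:]
    return training, test, validation


def looks_memorized(r2_train, r2_test, r2_validation, tolerance=0.10):
    """Flag a NN whose training R-squared exceeds the others by more than tolerance."""
    return (r2_train - min(r2_test, r2_validation)) > tolerance


training, test, validation = split_records(100)
print(len(training), len(test), len(validation))  # prints 80 10 10
print(looks_memorized(0.99, 0.60, 0.58))          # prints True
```

The tolerance is a judgment call; what matters is that the comparison uses records the NN never saw during training.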

 

3.1.2 Architecture of NNs.  The architecture of NNs is determined, in part, by the number of input and output variables.  Neurons that receive input variables from a data set are referred to as hidden neurons.  Neurons that receive outputs from hidden neurons are referred to as output neurons.  In this version of Neuroet, the number of output neurons (and variables) is limited to one, the number of hidden neurons is limited to 100, and the number of input variables is limited to 256.  Given that the number of input variables is determined by the data set being trained, and the output is limited to one, users must specify the number of hidden neurons to define the final architecture of a NN.  The number of hidden neurons can be determined by using optimization procedures (Sections 6.1 and 9.2).  It is essential that the number of hidden neurons be optimized in order to prevent under- or over-fitting of the data that often leads to incorrect predictions and/or poor results. 
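
The architecture described above (input variables feeding hidden neurons, which feed a single output neuron) can be sketched as a forward pass.  This Python sketch is an illustration, not Neuroet code: the choice of a hyperbolic tangent for the hidden neurons and a linear output neuron is an assumption made for the example, and the function names and argument layout are hypothetical.

```python
import math


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer (hyperbolic tangent) feeding one linear output neuron.

    hidden_weights holds one list of weights per hidden neuron,
    with one weight per input variable.
    """
    hidden = [
        math.tanh(sum(x * w for x, w in zip(inputs, weights)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(h * w for h, w in zip(hidden, output_weights)) + output_bias


def count_parameters(n_inputs, n_hidden):
    """Number of weights and biases for one output neuron with bias terms."""
    return n_hidden * (n_inputs + 1) + (n_hidden + 1)


# A NN with 13 input variables and 4 hidden neurons.
print(count_parameters(13, 4))  # prints 61
```

The parameter count grows quickly with the number of hidden neurons, which is one reason why adding hidden neurons makes over-fitting more likely.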

 

3.1.3 Other relevant information.  Information on the fundamentals of scaling, transfer functions, and training methods is found in Section 7.0.  Experiments aimed at testing Neuroet on the effects of increased number of hidden neurons, the effects of different training methods, and the effects of data complexity on NN prediction are found in Section 8.0.  Benchmark testing of Neuroet is presented in Section 9.0.

 


4. Assembling the data for neural network analyses. 

 

Figure 1 shows the orientation of the data files relative to Neuroet.  Neuroet requires two data files: one file containing the input variables that will be used to predict output variables and the other containing the corresponding output variables.  File x contains many rows of tab-delimited columns of data, with each column having a separate heading (e.g. Input#1) and each row comprising a single record.  While file x serves as the input file for Neuroet, file y serves as the file containing output data that the NN will use to learn the patterns.  Note that file y also contains a heading for the column and that each record in file x has a corresponding record in file y.  Neuroet analyzes the data one record at a time, so you will receive an error message if a record is missing in either file.

 

Hint. Extra spaces or returns at the bottom of either file x or y will result in an error message.  Ensure that both files have exactly the same number of records by viewing them using a text file editor or word processing application.
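
The checks in the hint can also be automated before the files are loaded into Neuroet.  The following Python sketch is an assumption about what is worth checking (matching record counts, blank lines, and rows whose column count differs from the heading); it is not a Neuroet utility, the function names load and check_pair are hypothetical, and it assumes each file has at least a heading line.

```python
def load(path):
    """Return (headings, records, blank_line_count) for a tab-delimited file."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    blank_lines = sum(1 for line in lines if not line.strip())
    rows = [line.split("\t") for line in lines if line.strip()]
    return rows[0], rows[1:], blank_lines


def check_pair(input_path, output_path):
    """Return a list of problems found in a pair of input/output files."""
    problems = []
    x_head, x_rows, x_blank = load(input_path)
    y_head, y_rows, y_blank = load(output_path)
    if len(x_rows) != len(y_rows):
        problems.append(f"record counts differ: {len(x_rows)} vs {len(y_rows)}")
    for name, head, rows, blank in (
        ("input", x_head, x_rows, x_blank),
        ("output", y_head, y_rows, y_blank),
    ):
        if blank:
            problems.append(f"{name} file contains {blank} blank line(s)")
        for number, row in enumerate(rows, start=1):
            if len(row) != len(head):
                problems.append(
                    f"{name} file, record {number}: "
                    f"expected {len(head)} columns, found {len(row)}"
                )
    return problems
```

Calling check_pair("data2x.txt", "data2y.txt") returns an empty list when the two files are consistent, and otherwise a list of messages describing each problem.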

 

 

Figure 1. Neuroet establishes the relationship of input data in file x to output in file y.

 

5. Pre-analyzing your data using Neuroet. 

 

Prior to training your data using the NN, it is useful to conduct a preliminary analysis of the data to examine the frequency distribution of variables and determine if there are significant relationships among the variables.  The overall distribution of variables and correlations among variables might affect how Neuroet recognizes patterns in the data.  For example, data that are not normally distributed (e.g. skewed) might have to be transformed to improve Neuroet's ability to find patterns.  Similarly, identifying variables that are strongly correlated to one another might warrant removal of one variable, since keeping both might bias the analysis.

 

Skewness and kurtosis measure the nature and amount of departure from normality.  Skewness refers to the asymmetry of the distribution.  For example, a skewed distribution means that one tail of the distribution is more drawn out than the other.  In such cases, the calculated average and median will not coincide (Sokal and Rohlf, 1981).  The Skewness report reveals the skewness of each column, with a negative value indicating that the distribution of the data is skewed to the left and a positive value indicating that the distribution is skewed to the right.  Kurtosis refers to the "peakedness" of the distribution.  There are two types of kurtosis: platykurtosis and leptokurtosis.  Platykurtosis occurs when fewer data points occur at the mean and at the tails than in a normal distribution (e.g. it resembles a water drop on wax paper).  Leptokurtosis occurs when more data points occur at the mean and at the tails than in a normal distribution.  The Kurtosis report reveals the kurtosis of each column, with negative values indicating platykurtosis and positive values indicating leptokurtosis.
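
The sign conventions described above can be illustrated with a moment-based calculation.  This Python sketch is an assumption about the formulas: it uses the simple population moments (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2 - 3), whereas Neuroet may apply sample-size corrections, so its reported values could differ slightly.  The function names are hypothetical.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Negative: skewed to the left.  Positive: skewed to the right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Negative: platykurtosis.  Positive: leptokurtosis.  Zero for a normal curve."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 10]

print(skewness(symmetric) == 0)      # prints True
print(kurtosis(symmetric) < 0)       # prints True (platykurtosis)
print(skewness(right_tailed) > 0)    # prints True (skewed to the right)
```

The evenly spread values in the first list give a flat, platykurtic shape, while the single large value in the second list draws out the right tail.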

 

We have provided a downloadable file (e.g., data1.txt) to demonstrate the pre-analyzing procedure.  To start the analysis, click Pre-Analyze Data (command-D) in the drop-down menu (under File) and an interactive panel will appear (Figure 2).  Browse for the example file by clicking the Choose button.  Link the file to Neuroet by clicking the Open button, then clicking the Run button.  Neuroet will request a file name for saving the results.  Either provide a file name or use the default name.  Click the Save button.  When the analysis is completed, you will receive a Notification of Completion page that states "All jobs are finished".  Click the OK button.

 

 

Figure 2. Pre-analyze panel.

 

The results of the analysis are shown in Table 1.  All variables in Table 1 are skewed to the right and show leptokurtosis, which will have implications for selecting the appropriate scaling method to train the NN.  For example, scaling the data to have a mean of zero and a variance of one might help training NNs to recognize patterns in the data.  Note that none of the variables were significantly correlated to one another, indicating that it is not necessary to remove variables from the data set.
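
Scaling a variable to a mean of zero and a variance of one, and checking the correlation between two variables, can be sketched as follows.  This Python sketch is illustrative only: it uses the sample standard deviation (n - 1 in the denominator), which may or may not match Neuroet's choice, and the function names are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale a column to mean 0 and standard deviation 1 (sample s, n - 1)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - mean) / s for v in values]


def correlation(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


scaled = standardize([2.0, 4.0, 6.0, 8.0])
print(round(abs(statistics.mean(scaled)), 6), round(statistics.stdev(scaled), 6))  # prints 0.0 1.0
```

A correlation close to 1 or -1 between two input variables suggests that one of them carries little additional information.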

 


 

 

 


6. Determining relationships between input and output variables. 

 

Neuroet provides simple analytical tools to (i) make predictions, (ii) determine equations between input and output variables, and (iii) determine the predictive importance of input variables (Figure 3).  In cases i and ii, it is necessary to optimize the number of hidden neurons before training; otherwise, the NN might not be able to find patterns in the data.  Too many hidden neurons affect NN performance by over-fitting the data, resulting in poor predictions for test and validation data sets.  Similarly, too few hidden neurons affect NN performance by under-fitting the data.

 

 

Figure 3.  Neuroet provides (i) information on the predictability of training, test, and validation data sets, and (ii) weights and biases, so that scientists can build equations defining the relationships between input and output variables.  Neuroet can also identify the input variable, or combinations of input variables, that significantly contribute(s) to predicting the value of output variables.

 

6.1 Optimizing the number of hidden neurons.  The optimal number of hidden neurons depends on: (i) the number of input and output variables, (ii) the number of training records, (iii) the amount of noise in the output variables, (iv) the complexity of the relationship between input and output variables, and (v) the type of transfer functions.  Since there is no "rule of thumb" to determine the optimal number of hidden neurons (Sarle, 1999), we approached this problem by repeatedly training NNs (15 x) with fixed numbers of hidden neurons and identifying the optimal number of hidden neurons from those that yielded the lowest generalization estimator score.  We tested our approach using the Donoho-Johnstone Benchmarks for NNs (Sarle, 1999; Donoho and Johnstone, 1994; Donoho et al. 1995) and discovered that our approach yielded very consistent results (see Section 9, Benchmark testing of Neuroet and "best" approaches).

 

Two generalization estimators were used:  Schwarz's Bayesian criterion (SBC) and the corrected Akaike's Information Criterion (AICc).  These estimators were calculated by determining the sum of squares errors (SSE) for each of the 15 NNs, discarding NNs that had SSE lower than the 25th percentile (rounded up), calculating the estimators using the remaining NNs, and taking the median value.  NNs with low SSE were discarded because we assumed that the NN did not reach the global error minimum.  These calculations were conducted for NNs with different numbers of hidden neurons.  The median generalization estimator scores were ranked from lowest to highest, and the NNs with the lowest score were taken to have the optimal number of hidden neurons.

 


The following equations were used to calculate the SBC (Schwarz, 1978) and AICc score (Hurvich and Tsai, 1989):

 

SBC = (n) log (SSE/n) + (p) log (n)

 

AICc = (n) log (SSE/n) + (n + p) / [1 - (p + 2) / n]

 

where n represents the number of training cases and p represents the number of weights and biases.
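
The two equations above translate directly into code.  This is a minimal Python sketch rather than Neuroet's implementation; it assumes that "log" is the natural logarithm, the function names are hypothetical, and the SSE values in the example are invented for illustration.

```python
import math


def sbc(sse, n, p):
    """SBC = n log(SSE/n) + p log(n)."""
    return n * math.log(sse / n) + p * math.log(n)


def aicc(sse, n, p):
    """AICc = n log(SSE/n) + (n + p) / [1 - (p + 2) / n]; requires n > p + 2."""
    if n <= p + 2:
        raise ValueError("AICc is undefined unless n > p + 2")
    return n * math.log(sse / n) + (n + p) / (1.0 - (p + 2) / n)


def count_parameters(n_inputs, n_hidden):
    """Weights and biases for one hidden layer, one output neuron, bias terms."""
    return n_hidden * (n_inputs + 1) + (n_hidden + 1)


# Hypothetical SSE values for NNs with 2, 4 and 8 hidden neurons,
# trained on 200 records with 13 input variables.
n_records, n_inputs = 200, 13
sse_by_hidden = {2: 60.0, 4: 40.0, 8: 38.0}
scores = {
    h: sbc(sse, n_records, count_parameters(n_inputs, h))
    for h, sse in sse_by_hidden.items()
}
best = min(scores, key=scores.get)  # hidden neuron count with the lowest SBC
```

Both estimators reward a low SSE and penalize a large p, so a NN with more hidden neurons is preferred only when it reduces the error enough to offset the extra weights and biases.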

 

We have provided two files (e.g. data2x.txt and data2y.txt) to demonstrate how to optimize the number of hidden neurons in a NN.  To start the analysis, you must first set the preference panel (Figure 4).  This can be accomplished by clicking the Preferences tab in the main menu (command-;).  When the preference panel is visible, set the Input Scaling Method tab to "Standardized μ=0 σ=1"; set the Transfer Function tab to Hyperbolic Tangent; check Include Bias Term; set Number of Hidden Neurons to 13; set the Target Scaling Method tab to "Standardized μ=0 σ=1"; set the Transfer Function tab to Pure Linear; check Include Bias Term; set Data Set Proportions in Test and Validation data sets to 10%; click the OK button.

 

Once the preferences have been selected, return to the main panel, click the Optimize Hidden Neurons tab and an interactive panel will appear (Figure 5).  Browse for the inputs file (i.e. data2x.txt) by clicking the Choose button.  Link the file to Neuroet by clicking the Open button.  Browse for the outputs file (i.e. data2y.txt) by clicking the Choose button.  Link the file to Neuroet by clicking the Open button.  Click the Training Method tab until you find "Levenberg-Marquardt".  Set the maximum number of hidden neurons to the number of variables in data2x.txt (e.g., 13).  Set the Optimization Method to SBC.  Click the Optimize button.  Neuroet will request a file name to save the results.  Either provide a new file name or use the default file name.  Click the Save button.  The Neuroet brain will start pulsating, indicating that it is working on the job.  Figure 6 shows the job in the Task Queue.  The progress of the job can be monitored by double-clicking the job in the Task Queue (Figure 7).

 

The optimum number of hidden neurons was determined to be 4, since NNs trained with 4 hidden neurons yielded the lowest generalization estimator scores (Table 2).  Note that when this experiment was repeated using the same data set, 5 hidden neurons yielded the lowest generalization estimator scores.  Subtle differences in the optimal number of hidden neurons estimated by Neuroet occur frequently and are discussed in detail in Section 9, Benchmark testing of Neuroet and "best" approaches.

 

6.2 Training the NN.  After the optimum number of hidden neurons has been determined, we can train, test, and validate the data. 

 

 

 

 

Figure 4. Preferences panel.

 

To start the analysis, first check the preferences.  This can be accomplished by clicking the Preferences tab in the main menu (command-;).  When the preference panel is visible, set the number of hidden neurons to 4, and then click the OK button.  Set the Data set proportion of both the test and validation data sets to 10%.  Ensure that all the other preference values are the same as those used to optimize the number of hidden neurons.  Click the Train Network tab.  Browse for the Inputs File (i.e. data2x.txt) by clicking the Choose button.  Link the file to Neuroet by clicking the Open button.  Browse for the outputs file (i.e. data2y.txt) by clicking the Choose button.  Link the file to Neuroet by clicking the Open button.  Click the Training Method tab until you find "Levenberg-Marquardt".  Set the Threshold to 1.0.  Click the Keep Best Generalizations button.  Click the Train Network button.  Neuroet will request a file name to save the results.  Either provide a new file name or use the default file name.  Click the Save button.  The Neuroet brain will start pulsating, indicating that it is working on the job.

 

 

Figure 5. Panel to optimize the hidden neurons.

 

Ten file folders and three files will be produced (Figure 8).  File dataOut.txt contains the raw data files, and the scaled data for training, testing, and validation data.  The files data2Y.txt and data2X.txt contain the scaled data that are formatted for sensitivity analysis (see http://noble.ce.washington.edu, 'Tools for Data Analysis', Sensitivity analysis).  Folders 1 to 10 represent individual trained NNs.

 

 

Figure 6. The Task Queue.

 

 

Figure 7. Panel from task queue showing Neuroet is working on run 7 of hidden neuron number 1.

 

 

 


 


 

Figure 8. Results from training a NN.  Double-clicking Run1 opens Figure 9.

 

 

Figure 9.  Each folder contains 11 subfolders that represent the trained NN at different iterations before training was stopped.  The Results.txt file contains a summary of the R-squared values of actual versus predicted outputs for training, testing and validation data sets.  NN #1 represents the state of the NN at eleven iterations before the training was stopped, while Final Run represents the state of the NN when training stopped.  Within each subfolder there are four files: weights.txt, prediction.txt, biases.txt and NN.ser.  The weights.txt and biases.txt files are used to build the equation defining the relationship between input variables and output variables (see Section 6.5).  The prediction.txt file contains the predicted and actual outputs.  The NN.ser file is a snapshot of the NN and can be used to predict output values using new data.

 

Table 3 summarizes the R-squared values for iterations that occurred just before training stopped.  The results of the iterations were ranked by training R-squared value.  For this data set, training data (e.g. 0.86) provided better predictions for output variables than testing (e.g. 0.78) and validation (e.g. 0.77) data sets.  Note that the values for training, testing and validation data sets changed with different NNs because the weights and biases were set to random numbers in the first iteration.  The "Last training" results indicate the R-squared value of the training file when training was stopped.

 


 

 


The prediction.txt file contains the predicted and actual value for each record in the data set.  Figure 10 provides the relationship between predicted and actual values using data from the last iteration.  Note that the NN explained approximately 85% of the variability in the data.

 

Figure 10.  Predicted and actual values of a trained NN (data from the final run of Run 1).

 

6.3 Testing the NN.  Once the NN has been trained, it is possible to predict the value of outputs using data sets that were not used in training.  We have provided two files (e.g. data3x.txt and data3y.txt) to demonstrate how to test trained NNs.  File data3x.txt contains input data.  Note that all columns are in the same order as in the training file.  File data3y.txt contains the actual values of the output.  This file can be used to assess the accuracy of NN predictions.

 

To start the analysis, check that the preferences have been set correctly.  This can be accompl