CrossValidation

java.lang.Object
- com.emphysic.myriad.core.ml.CrossValidation

Direct Known Subclasses:

MonteCarloCV
```
public abstract class CrossValidation
extends java.lang.Object
```
CrossValidation - splits data into training and test sets and evaluates machine learning models.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`CrossValidation.Data` Partition of input data
`static class`	`CrossValidation.TrainTestSubsets` Helper class to partition input data

Field Summary

Fields
Modifier and Type	Field and Description
`protected java.util.Random`	`random` RNG
`protected double[][]`	`X` Original samples
`protected int[]`	`y` Original sample labels

Constructor Summary

Constructors
Constructor and Description
`CrossValidation()` Creates a new empty cross-validator.
`CrossValidation(CrossValidation.Data data)` Creates a new cross-validator.
`CrossValidation(double[][] samples, int[] labels)` Creates a new cross-validator.

Method Summary

Modifier and Type	Method and Description
`static CrossValidation.Data`	`balanceUp(CrossValidation.Data orig)` Balances imbalanced data with the SMOTE (Synthetic Minority Over-sampling TEchnique) as per https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html .
`static CrossValidation.Data`	`balanceUp(CrossValidation.Data orig, smile.math.distance.Distance<double[]> distanceMetric, int nn)` Balances imbalanced data with the SMOTE (Synthetic Minority Over-sampling TEchnique) as per https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html .
`static CrossValidation.Data`	`balanceUp(CrossValidation.Data orig, int nn)` Balances imbalanced data with the SMOTE (Synthetic Minority Over-sampling TEchnique) as per https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html .
`java.util.Map<MLROIFinder,java.lang.Double>`	`eval(int numRounds, MLROIFinder[] models)` Evaluates the specified models for mean accuracy (ratio of true results / total results).
`java.util.Map<MLROIFinder,java.lang.Double>`	`eval(int numRounds, MLROIFinder[] models, boolean upSample)` Evaluates the specified models for mean accuracy (ratio of true results / total results).
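`java.util.Map<MLROIFinder,java.util.List<java.lang.Double>>`	`evalModels(int numRounds, MLROIFinder[] models)` Evaluates the specified models for accuracy, i.e. the proportion of true results (both true positives and true negatives) in the overall results.
`java.util.Map<MLROIFinder,java.util.List<java.lang.Double>>`	`evalModels(int numRounds, MLROIFinder[] models, boolean upSample)` Evaluates the specified models for accuracy, i.e. the proportion of true results (both true positives and true negatives) in the overall results.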
`java.util.Map.Entry<MLROIFinder,java.lang.Double>`	`findBestModel(int numRounds, MLROIFinder[] models)` Evaluates the specified models and returns the most accurate (ratio of true results / total results) model.
`java.util.Map.Entry<MLROIFinder,java.lang.Double>`	`findBestModel(int numRounds, MLROIFinder[] models, boolean upSample)` Evaluates the specified models and returns the most accurate (ratio of true results / total results) model.
`static java.lang.Integer`	`findMajorityLabel(CrossValidation.Data data)` Finds the majority label in a sample set, i.e. the label for which the most observations are found.
`static java.lang.Integer`	`findMajorityLabel(java.util.Map<java.lang.Integer,java.lang.Integer> labelFreqs)` Finds the majority label in a sample set, i.e. the label for which the most observations are found.
`static java.lang.Integer`	`findMinorityLabel(CrossValidation.Data data)` Finds the minority label in a sample set, i.e. the label for which the fewest observations are found.
`static java.lang.Integer`	`findMinorityLabel(java.util.Map<java.lang.Integer,java.lang.Integer> labelFreqs)` Finds the minority label in a sample set, i.e. the label for which the fewest observations are found.
`abstract CrossValidation.TrainTestSubsets`	`getTrainTestSubset()` Partitions the input data into a training set and a test set.
`static java.util.Map<java.lang.Integer,java.lang.Integer>`	`labelFrequencies(CrossValidation.Data data)` Tallies the frequencies of each label in a dataset.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - random
```
protected java.util.Random random
```
    RNG
  - X
```
protected double[][] X
```
    Original samples
  - y
```
protected int[] y
```
    Original sample labels
- Constructor Detail
  - CrossValidation
```
public CrossValidation(double[][] samples,
                       int[] labels)
```
    Creates a new cross-validator.
    
    Parameters:
    
    samples - N samples of M features
    
    labels - N labels
  - CrossValidation
```
public CrossValidation(CrossValidation.Data data)
```
    Creates a new cross-validator.
    
    Parameters:
    
    data - samples and labels to use
  - CrossValidation
```
public CrossValidation()
```
    Creates a new empty cross-validator.
- Method Detail
  - getTrainTestSubset
```
public abstract CrossValidation.TrainTestSubsets getTrainTestSubset()
```
    Partitions the input data into a training set and a test set.
    
    Returns:
    
    input data partitioned into a training and a test set
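
    As an illustration only, one way a subclass might partition the data is a random (Monte Carlo style) split of sample indices. This is a minimal sketch under stated assumptions: `SplitSketch`, `split` and the `testFraction` parameter are hypothetical names, not part of this class, and the actual behaviour of `MonteCarloCV` is not documented here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: not the library's TrainTestSubsets class.
class SplitSketch {
    /** Returns {trainIndices, testIndices} for n samples, holding out about testFraction of them. */
    static int[][] split(int n, double testFraction, Random rng) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            idx.add(i);
        }
        // Shuffle the sample indices so the partition is random.
        Collections.shuffle(idx, rng);
        int nTest = (int) Math.round(n * testFraction);
        int[] test = new int[nTest];
        int[] train = new int[n - nTest];
        for (int i = 0; i < n; i++) {
            if (i < nTest) {
                test[i] = idx.get(i);
            } else {
                train[i - nTest] = idx.get(i);
            }
        }
        return new int[][] {train, test};
    }
}
```

    The returned index arrays can then be used to pick rows out of the sample matrix and the label array, so each sample lands in exactly one of the two sets.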
  - labelFrequencies
```
public static java.util.Map<java.lang.Integer,java.lang.Integer> labelFrequencies(CrossValidation.Data data)
```
    Tallies the frequencies of each label in a dataset.
    
    Parameters:
    
    data - data to tally
    
    Returns:
    
    Map where keys are each label encountered and values are total number of samples in data observed with that label.
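
    The tally described above can be sketched as follows. The sketch works on a plain `int[]` of labels rather than a `CrossValidation.Data` instance, and `LabelTallySketch` is a hypothetical name used only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in operating on a bare label array.
class LabelTallySketch {
    /** Maps each label encountered to the number of samples carrying that label. */
    static Map<Integer, Integer> labelFrequencies(int[] labels) {
        Map<Integer, Integer> freqs = new TreeMap<>();
        for (int label : labels) {
            // Insert 1 for a new label, otherwise add 1 to the running count.
            freqs.merge(label, 1, Integer::sum);
        }
        return freqs;
    }
}
```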
  - findMinorityLabel
```
public static java.lang.Integer findMinorityLabel(java.util.Map<java.lang.Integer,java.lang.Integer> labelFreqs)
```
    Finds the minority label in a sample set, i.e. the label for which the fewest observations are found.
    
    Parameters:
    
    labelFreqs - label frequencies map
    
    Returns:
    
    label with the fewest samples
  - findMinorityLabel
```
public static java.lang.Integer findMinorityLabel(CrossValidation.Data data)
```
    Finds the minority label in a sample set, i.e. the label for which the fewest observations are found.
    
    Parameters:
    
    data - data to examine
    
    Returns:
    
    label with the fewest samples
  - findMajorityLabel
```
public static java.lang.Integer findMajorityLabel(java.util.Map<java.lang.Integer,java.lang.Integer> labelFreqs)
```
    Finds the majority label in a sample set, i.e. the label for which the most observations are found.
    
    Parameters:
    
    labelFreqs - label frequencies map
    
    Returns:
    
    label with the most samples
  - findMajorityLabel
```
public static java.lang.Integer findMajorityLabel(CrossValidation.Data data)
```
    Finds the majority label in a sample set, i.e. the label for which the most observations are found.
    
    Parameters:
    
    data - data to examine
    
    Returns:
    
    label for which the most samples in data were found
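
    The minority and majority lookups above amount to finding the smallest and largest counts in a label frequency map. A minimal sketch, assuming that ties go to whichever label is encountered first and that an empty map yields `null` (neither behaviour is documented for this class); `LabelExtremesSketch` is a hypothetical name.

```java
import java.util.Map;

// Hypothetical helper; tie-breaking and empty-map handling are assumptions.
class LabelExtremesSketch {
    /** Label with the smallest count, or null if the map is empty. */
    static Integer findMinorityLabel(Map<Integer, Integer> labelFreqs) {
        Integer label = null;
        int fewest = Integer.MAX_VALUE;
        for (Map.Entry<Integer, Integer> e : labelFreqs.entrySet()) {
            if (label == null || e.getValue() < fewest) {
                label = e.getKey();
                fewest = e.getValue();
            }
        }
        return label;
    }

    /** Label with the largest count, or null if the map is empty. */
    static Integer findMajorityLabel(Map<Integer, Integer> labelFreqs) {
        Integer label = null;
        int most = Integer.MIN_VALUE;
        for (Map.Entry<Integer, Integer> e : labelFreqs.entrySet()) {
            if (label == null || e.getValue() > most) {
                label = e.getKey();
                most = e.getValue();
            }
        }
        return label;
    }
}
```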
  - balanceUp
```
public static CrossValidation.Data balanceUp(CrossValidation.Data orig,
                                             smile.math.distance.Distance<double[]> distanceMetric,
                                             int nn)
```
    Balances imbalanced data with the SMOTE (Synthetic Minority Over-sampling TEchnique) as per https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html . The data are examined and the majority and minority labels are identified as the label found for the most and fewest samples respectively. The ratio of these sample subsets determines how many synthetic samples are generated.
    
    Parameters:
    
    orig - original (possibly imbalanced) data
    
    distanceMetric - distance metric to use for finding nearest neighbors
    
    nn - number of nearest neighbors to find
    
    Returns:
    
    new data consisting of both the original and synthetic data
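
    The core SMOTE step generates each synthetic sample by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below shows that single step with Euclidean distance hard-coded in place of a `smile.math.distance.Distance` instance. `SmoteSketch` and `synthesize` are hypothetical names, and the sketch does not reproduce how this class decides how many synthetic samples to create.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

// Hypothetical illustration of one SMOTE interpolation step.
class SmoteSketch {
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }

    /** Builds one synthetic sample from minority[i] and one of its nn nearest minority neighbors. */
    static double[] synthesize(double[][] minority, int i, int nn, Random rng) {
        if (minority.length < 2) {
            throw new IllegalArgumentException("need at least two minority samples");
        }
        final double[] x = minority[i];
        // Candidate neighbors: every minority sample except sample i itself.
        Integer[] others = new Integer[minority.length - 1];
        int pos = 0;
        for (int j = 0; j < minority.length; j++) {
            if (j != i) {
                others[pos++] = j;
            }
        }
        // Sort candidates by distance to x, nearest first.
        Arrays.sort(others, Comparator.comparingDouble((Integer j) -> euclidean(x, minority[j])));
        int k = Math.min(nn, others.length);
        double[] neighbor = minority[others[rng.nextInt(k)]];
        // Pick a random point on the line segment between x and the chosen neighbor.
        double gap = rng.nextDouble();
        double[] synthetic = new double[x.length];
        for (int d = 0; d < x.length; d++) {
            synthetic[d] = x[d] + gap * (neighbor[d] - x[d]);
        }
        return synthetic;
    }
}
```

    Because the new point always lies between two existing minority samples, the synthetic data stay inside the region the minority class already occupies instead of duplicating samples exactly.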
  - balanceUp
```
public static CrossValidation.Data balanceUp(CrossValidation.Data orig,
                                             int nn)
```
    Balances imbalanced data with the SMOTE (Synthetic Minority Over-sampling TEchnique) as per https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html . The data are examined and the majority and minority labels are identified as the label found for the most and fewest samples respectively. The ratio of these sample subsets determines how many synthetic samples are generated. Neighbors are determined with simple linear (Euclidean) distance.
    
    Parameters:
    
    orig - data to balance
    
    nn - number of nearest neighbors to use
    
    Returns:
    
    new data consisting of both the original and synthetic data
  - balanceUp
```
public static CrossValidation.Data balanceUp(CrossValidation.Data orig)
```
    Balances imbalanced data with the SMOTE (Synthetic Minority Over-sampling TEchnique) as per https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html . The data are examined and the majority and minority labels are identified as the label found for the most and fewest samples respectively. The ratio of these sample subsets determines how many synthetic samples are generated. Neighbors are determined with simple linear (Euclidean) distance, and as per the original paper five nearest neighbors are chosen.
    
    Parameters:
    
    orig - data to balance
    
    Returns:
    
    new data consisting of both the original and synthetic data
  - evalModels
```
public java.util.Map<MLROIFinder,java.util.List<java.lang.Double>> evalModels(int numRounds,
                                                                              MLROIFinder[] models,
                                                                              boolean upSample)
                                                                       throws java.lang.Exception
```
    Evaluates the specified models for accuracy, i.e. the proportion of true results (both true positives and true negatives) in the overall results.
    
    Parameters:
    
    numRounds - number of rounds of testing
    
    models - models to test
    
    upSample - if true, upsample training data to try to balance minority / majority classes
    
    Returns:
    
    the accuracy measurements for each model for each of the rounds of testing
    
    Throws:
    
    java.lang.Exception - if an error occurs training the models
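
    The accuracy measure defined above, true results divided by total results, can be sketched for a single round as follows. `AccuracySketch` is a hypothetical name, and the sketch compares plain label arrays rather than calling an `MLROIFinder`.

```java
// Hypothetical helper computing accuracy for one round of testing.
class AccuracySketch {
    /** Fraction of predictions that match the actual labels. */
    static double accuracy(int[] actual, int[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("label arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            // A match is a true positive or a true negative.
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }
}
```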
  - evalModels
```
public java.util.Map<MLROIFinder,java.util.List<java.lang.Double>> evalModels(int numRounds,
                                                                              MLROIFinder[] models)
                                                                       throws java.lang.Exception
```
    Evaluates the specified models for accuracy, i.e. the proportion of true results (both true positives and true negatives) in the overall results.
    
    Parameters:
    
    numRounds - number of rounds of testing
    
    models - models to test
    
    Returns:
    
    the accuracy measurements for each model for each of the rounds of testing
    
    Throws:
    
    java.lang.Exception - if an error occurs training the models
  - eval
```
public java.util.Map<MLROIFinder,java.lang.Double> eval(int numRounds,
                                                        MLROIFinder[] models,
                                                        boolean upSample)
                                                 throws java.lang.Exception
```
    Evaluates the specified models for mean accuracy (ratio of true results / total results).
    
    Parameters:
    
    numRounds - number of rounds of testing
    
    models - models to test
    
    upSample - if true, upsample training data to try to balance minority / majority classes
    
    Returns:
    
    average (mean) accuracy for each model
    
    Throws:
    
    java.lang.Exception - if an error occurs
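
    The mean accuracy is the per-round accuracies of each model averaged over all rounds. A minimal sketch, assuming the per-round values are already available in a map such as the one `evalModels` returns; `MeanAccuracySketch` is a hypothetical name, the key type is left generic in place of `MLROIFinder`, and returning NaN for a model with no rounds is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; keys stand in for MLROIFinder instances.
class MeanAccuracySketch {
    /** Averages each model's per-round accuracies; a model with no rounds maps to NaN. */
    static <K> Map<K, Double> meanAccuracies(Map<K, List<Double>> perRound) {
        Map<K, Double> means = new LinkedHashMap<>();
        for (Map.Entry<K, List<Double>> e : perRound.entrySet()) {
            List<Double> scores = e.getValue();
            double sum = 0;
            for (double score : scores) {
                sum += score;
            }
            means.put(e.getKey(), scores.isEmpty() ? Double.NaN : sum / scores.size());
        }
        return means;
    }
}
```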
  - eval
```
public java.util.Map<MLROIFinder,java.lang.Double> eval(int numRounds,
                                                        MLROIFinder[] models)
                                                 throws java.lang.Exception
```
    Evaluates the specified models for mean accuracy (ratio of true results / total results).
    
    Parameters:
    
    numRounds - number of rounds of testing
    
    models - models to test
    
    Returns:
    
    average (mean) accuracy for each model
    
    Throws:
    
    java.lang.Exception - if an error occurs
  - findBestModel
```
public java.util.Map.Entry<MLROIFinder,java.lang.Double> findBestModel(int numRounds,
                                                                       MLROIFinder[] models,
                                                                       boolean upSample)
                                                                throws java.lang.Exception
```
    Evaluates the specified models and returns the most accurate (ratio of true results / total results) model.
    
    Parameters:
    
    numRounds - number of rounds of testing
    
    models - models to test
    
    upSample - if true, upsample training data to try to balance minority / majority classes
    
    Returns:
    
    the most accurate model found and its mean accuracy score.
    
    Throws:
    
    java.lang.Exception - if an error occurs
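
    Selecting the most accurate model reduces to finding the map entry with the highest mean accuracy. A minimal sketch, assuming ties go to the first entry encountered and an empty map yields `null` (neither is documented for this class); `BestModelSketch` is a hypothetical name and the key type is left generic in place of `MLROIFinder`.

```java
import java.util.Map;

// Hypothetical helper; tie-breaking and empty-map handling are assumptions.
class BestModelSketch {
    /** Entry with the highest mean accuracy, or null if the map is empty. */
    static <K> Map.Entry<K, Double> findBest(Map<K, Double> meanAccuracies) {
        Map.Entry<K, Double> top = null;
        for (Map.Entry<K, Double> e : meanAccuracies.entrySet()) {
            if (top == null || e.getValue() > top.getValue()) {
                top = e;
            }
        }
        return top;
    }
}
```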
  - findBestModel
```
public java.util.Map.Entry<MLROIFinder,java.lang.Double> findBestModel(int numRounds,
                                                                       MLROIFinder[] models)
                                                                throws java.lang.Exception
```
    Evaluates the specified models and returns the most accurate (ratio of true results / total results) model.
    
    Parameters:
    
    numRounds - number of rounds of testing
    
    models - models to test
    
    Returns:
    
    the most accurate model found and its mean accuracy score.
    
    Throws:
    
    java.lang.Exception - if an error occurs
