Training Algorithms
Training algorithms are used to select the optimal parameters of a model.
-
Hierarchical Clustering with Neuron Optimization
Hierarchical Clustering with Neuron Optimization takes as input feature vectors, corresponding class labels, and the desired number of patterns, and outputs a model.
Each pattern in a model consists of a centroid, its class label, and its area of influence (AIF). Each centroid is calculated as an average of objects in the cluster, each class label is the label of the majority class, and each AIF is the distance between the centroid and the farthest object in that cluster.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
number_of_neurons (int) – the maximum number of output clusters (neurons) desired
linkage_method (str) – options are average, complete, ward, and single (default is average)
centroid_calculation (str) – options are robust, mean, and median (default is robust)
flip (int) – default is 1
cluster_method (str) – options are DLHC, DHC, and kmeans (default is DLHC)
aif_method (str) – options are min, max, robust, mean, median (default is max)
singleton_aif (int) – default is 0
min_number_of_dominant_vector (int) – used for pruning; defines the minimum number of vectors for the dominant class in a cluster
max_number_of_weak_vector (int) – used for pruning; defines the maximum number of vectors for the weak class in a cluster
- Returns
one or more models
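Example
A minimal pipeline sketch following the set_training_algorithm pattern used elsewhere in this section; the parameter values are illustrative, and the PME classifier settings shown are assumptions based on its documented options.
>>> client.pipeline.set_training_algorithm("Hierarchical Clustering with Neuron Optimization",
...                                        params={"number_of_neurons": 32,
...                                                "linkage_method": "average",
...                                                "centroid_calculation": "robust",
...                                                "cluster_method": "DLHC",
...                                                "aif_method": "max"})
>>> client.pipeline.set_classifier("PME", params={"classification_mode": "RBF", "distance_mode": "L1"})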
-
Load Model PME
Load Model PME takes as input feature vectors, corresponding class labels, and a neuron array to use for classification. The neuron array is loaded and classification is performed.
Note: This training algorithm does not perform optimizations on the provided neurons.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
neuron_array (list) – A list of neurons to load into the hardware simulator.
class_map (dict) – class map for converting labels to neuron categories.
- Returns
a set of models
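Example
A hedged sketch of loading a previously trained neuron array. The neuron dictionary keys (Category, AIF, Vector) and all values shown here are hypothetical; in practice the array should come from a previously trained PME model rather than be written by hand.
>>> neuron_array = [{"Category": 1, "AIF": 120, "Vector": [12, 45, 33, 9]},   # hypothetical neurons
...                 {"Category": 2, "AIF": 96, "Vector": [60, 12, 41, 88]}]
>>> client.pipeline.set_training_algorithm("Load Model PME",
...                                        params={"neuron_array": neuron_array,
...                                                "class_map": {"walking": 1, "running": 2}})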
-
Load Model TensorFlow Lite for Microcontrollers
Provides the ability to upload a TensorFlow Lite flatbuffer to use as the final classifier step in a pipeline.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
model_parameters (dict) – The flatbuffer object of your TensorFlow micro model
class_map (dict) – class map for converting labels to output
estimator_type (str) – defines if this estimator performs regression or classification, defaults to classification
threshold (float) – if no values are greater than the threshold, classify as Unknown
train_history (dict) – training history for this model
model_json (dict) – expects the model json file from the tensorflow api tf_model.to_json()
Example
SensiML provides the ability to train and bring your own NN architecture to use as the classifier for your pipeline. This example starts from the point where you have created features using the SensiML Toolkit.
>>> x_train, x_test, x_validate, y_train, y_test, y_validate, class_map = \
...     client.pipeline.features_to_tensor(fv_t, test=0.0, validate=.1)
TensorFlow Lite Micro only supports a subset of the full TensorFlow operations. For a full list of available functions, see all_ops_resolver.cc. Use the Keras TensorFlow API to create the NN graph.
>>> from tensorflow.keras import layers
>>> import tensorflow as tf
>>> tf_model = tf.keras.Sequential()
>>> tf_model.add(layers.Dense(12, activation='relu', kernel_regularizer='l1', input_shape=(x_train.shape[1],)))
>>> tf_model.add(layers.Dropout(0.1))
>>> tf_model.add(layers.Dense(8, activation='relu'))
>>> tf_model.add(layers.Dropout(0.1))
>>> tf_model.add(layers.Dense(y_train.shape[1], activation='softmax'))
>>> tf_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
>>> tf_model.summary()
>>> train_history = {'loss':[], 'val_loss':[], 'accuracy':[], 'val_accuracy':[]}
Train the TensorFlow Model
>>> epochs = 100
>>> batch_size = 32
>>> data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
>>> shuffle_ds = data.shuffle(buffer_size=x_train.shape[0], reshuffle_each_iteration=True).batch(batch_size)
>>> # the dataset is already batched, so batch_size is not passed to fit()
>>> history = tf_model.fit(shuffle_ds, epochs=epochs, validation_data=(x_validate, y_validate), verbose=0)
>>> for key in train_history:
...     train_history[key].extend(history.history[key])
>>> import sensiml.tensorflow.utils as sml_tf
>>> sml_tf.plot_training_results(tf_model, train_history, x_train, y_train, x_validate, y_validate)
Quantize the TensorFlow Model
The representative_dataset_generator() function is necessary to provide statistical information about your dataset in order to quantize the model weights appropriately. The TFLiteConverter is used to convert a TensorFlow model into a TensorFlow Lite model. The TensorFlow Lite model is stored as a flatbuffer, which allows us to easily store and access it on embedded systems.
Quantizing the model allows TensorFlow Lite Micro to take advantage of specialized instructions on Cortex-M class processors using the CMSIS-NN DSP library, which gives another large boost in performance.
Quantizing the model can also reduce its size by up to 4x, as 4-byte floats are converted to 1-byte integers in a number of places within the model.
>>> import numpy as np
>>> def representative_dataset_generator():
...     for value in x_validate:
...         # Each scalar value must be inside of a 2D array that is wrapped in a list
...         yield [np.array(value, dtype=np.float32, ndmin=2)]
>>> converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
>>> converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
>>> converter.representative_dataset = representative_dataset_generator
>>> tflite_model_quant = converter.convert()
Uploading Trained TF Lite model to SensiML
>>> class_map_tmp = {k: v + 1 for k, v in class_map.items()}  # increment by 1 as 0 corresponds to unknown
>>> client.pipeline.set_training_algorithm("Load Model TensorFlow Lite for Microcontrollers",
...                                        params={"model_parameters": {
...                                                    'tflite': sml_tf.convert_tf_lite(tflite_model_quant)},
...                                                "class_map": class_map_tmp,
...                                                "estimator_type": "classification",
...                                                "threshold": 0.0,
...                                                "train_history": train_history,
...                                                "model_json": tf_model.to_json()})
>>> client.pipeline.set_validation_method("Recall", params={})
>>> client.pipeline.set_classifier("TensorFlow Lite for Microcontrollers", params={})
>>> client.pipeline.set_tvo()
>>> results, stats = client.pipeline.execute()
>>> results.summarize()
-
Neuron Optimization
Neuron Optimization performs an optimized grid search over KNN/RBF and the number of neurons for the parameters of Hierarchical Clustering with Neuron Optimization. It takes as input feature vectors and corresponding class labels, and outputs a model.
Each pattern in a model consists of a centroid, its class label, and its area of influence (AIF). Each centroid is calculated as an average of objects in the cluster, each class label is the label of the majority class, and each AIF is the distance between the centroid and the farthest object in that cluster.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
neuron_range (list) – the range of maximum neurons to search over, specified as [Min, Max]
linkage_method (str) – options are average, complete, ward, and single (default is average)
centroid_calculation (str) – options are robust, mean, and median (default is robust)
flip (int) – default is 1
cluster_method (str) – options are DLHC, DHC, and kmeans (default is DLHC)
aif_method (str) – options are min, max, robust, mean, median (default is max)
singleton_aif (int) – default is 0
min_number_of_dominant_vector (int) – used for pruning; defines the minimum number of vectors for the dominant class in a cluster
max_number_of_weak_vector (int) – used for pruning; defines the maximum number of vectors for the weak class in a cluster
- Returns
one or more models
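Example
A sketch of a typical invocation; the neuron_range and other values are illustrative only.
>>> client.pipeline.set_training_algorithm("Neuron Optimization",
...                                        params={"neuron_range": [5, 30],
...                                                "linkage_method": "average",
...                                                "centroid_calculation": "robust"})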
-
Bonsai Tree Optimizer
Train a Bonsai Tree classifier using backpropagation.
For detailed information, see the ICML 2017 paper.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
epochs (int) – The number of training epochs to iterate over
batch_size (int) – The size of batches to use during training
learning_rate (float) – The learning rate for training optimization
project_dimensions (int) – The number of dimensions to project the input feature space into
sigma (float) – tunable hyperparameter
reg_W (float) – regularization for W matrix
reg_V (float) – regularization for V matrix
reg_Theta (float) – regularization for Theta matrix
reg_Z (float) – regularization for Z matrix
sparse_V (float) – sparsity factor for V matrix
sparse_Theta (float) – sparsity factor for Theta matrix
sparse_W (float) – sparsity factor for W matrix
sparse_Z (float) – sparsity factor for Z matrix
- Returns
model parameters for a bonsai tree classifier
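Example
A sketch under the assumption that a Bonsai classifier step accompanies this trainer; the hyperparameter values are illustrative, not tuned.
>>> client.pipeline.set_training_algorithm("Bonsai Tree Optimizer",
...                                        params={"epochs": 100,
...                                                "batch_size": 32,
...                                                "learning_rate": 0.01,
...                                                "project_dimensions": 10,
...                                                "sigma": 1.0})
>>> client.pipeline.set_classifier("Bonsai", params={})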
-
Train Fully Connected Neural Network
Provides the ability to train a fully connected neural network model to use as the final classifier step in a pipeline.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
class_map (dict) – optional, class map for converting labels to output
estimator_type (str) – defines if this estimator performs regression or classification, defaults to classification
threshold (float) – if no values are greater than the threshold, classify as Unknown
dense_layers (list) – The size of each dense layer
drop_out (float) – The amount of dropout to use after each dense layer
batch_normalization (bool) – Use batch normalization
final_activation (str) – the final activation to use
iteration (int) – the maximum number of optimization attempts
batch_size (int) – the number of training examples utilized in one iteration
metrics (str) – the metric to use for reporting results
learning_rate (float) – a tuning parameter that determines the step size at each iteration while moving toward the minimum of the loss function
loss_function (str) – a function that determines how far the predicted values deviate from the actual values in the training data
tensorflow_optimizer (str) – the optimization algorithm used to minimize the loss function
Example
SensiML provides the ability to train an NN architecture to use as the classifier for your pipeline. TensorFlow Lite Micro only supports a subset of the full TensorFlow operations; for a full list of available functions, see all_ops_resolver.cc. Use the Keras TensorFlow API to create the NN graph.
>>> client.project = 'Activity_Detection'
>>> client.pipeline = 'tf_p1'
>>> client.pipeline.stop_pipeline()
>>> sensors = ['GyroscopeX', 'GyroscopeY', 'GyroscopeZ', 'AccelerometerX', 'AccelerometerY', 'AccelerometerZ']
>>> client.pipeline.reset()
>>> client.pipeline.set_input_query("Q1")
>>> client.pipeline.add_transform("Windowing", params={"window_size":200, "delta":200, "train_delta":0})
>>> client.pipeline.add_feature_generator([ {'name':'MFCC', 'params':{"columns":sensors,"sample_rate":100, "cepstra_count":1}} ])
>>> client.pipeline.add_transform("Min Max Scale")
>>> client.pipeline.set_validation_method("Recall", params={})
>>> client.pipeline.set_training_algorithm("Train Fully Connected Neural Network",
...                                        params={"estimator_type": "classification",
...                                                "class_map": None,
...                                                "threshold": 0.0,
...                                                "dense_layers": [64, 32, 16, 8],
...                                                "drop_out": 0.1,
...                                                "iterations": 5,
...                                                "learning_rate": 0.01,
...                                                "batch_size": 64,
...                                                "loss_function": "categorical_crossentropy",
...                                                "tensorflow_optimizer": "adam",
...                                                "batch_normalization": True,
...                                                "final_activation": "softmax"})
>>> client.pipeline.set_classifier("TensorFlow Lite for Microcontrollers")
>>> client.pipeline.set_tvo({'validation_seed':None})
>>> results, stats = client.pipeline.execute()
>>> results.summarize()
-
xGBoost
Train an ensemble of boosted tree classifiers using the xGBoost training algorithm.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
max_depth (int) – The max depth to allow a decision tree to reach
n_estimators (int) – The number of decision trees to build.
- Returns
a trained model
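Example
A usage sketch; the "Boosted Tree Ensemble" classifier name is an assumption modeled on the classifier steps shown in the other examples, and the parameter values are illustrative.
>>> client.pipeline.set_training_algorithm("xGBoost",
...                                        params={"max_depth": 4,
...                                                "n_estimators": 25})
>>> client.pipeline.set_classifier("Boosted Tree Ensemble", params={})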
-
L1 Lasso
Linear Model trained with L1 prior as regularizer (aka the Lasso).
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
Technically, the Lasso model optimizes the same objective function as the Elastic Net with l1_ratio=1.0 (no L2 penalty). See the Scikit Learn linear_model.Lasso documentation (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) for more information.
- Parameters
alpha (float, default=1.0) – Constant that multiplies the L1 term, controlling regularization strength. alpha must be a non-negative float, i.e. in [0, inf). When alpha = 0, the objective is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised; instead, you should use the LinearRegression object.
fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
max_iter (int, default=1000) – The maximum number of iterations.
tol (float, default=1e-4) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
positive (bool, default=False) – When set to True, forces the coefficients to be positive.
random_state (int, RandomState instance, default=None) – The seed of the pseudo random number generator that selects a random feature to update. Used when selection == 'random'. Pass an int for reproducible output across multiple function calls.
- Returns
Trained linear regression model
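Example
Since this algorithm wraps scikit-learn's Lasso, the underlying estimator can be sketched directly in scikit-learn; the data below is synthetic and purely illustrative.
>>> import numpy as np
>>> from sklearn.linear_model import Lasso
>>> X = np.random.rand(100, 5)                                 # 100 samples, 5 features
>>> y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * np.random.randn(100)
>>> model = Lasso(alpha=0.1, fit_intercept=True, max_iter=1000, tol=1e-4)
>>> model.fit(X, y)
>>> model.coef_                                                # L1 penalty drives weak coefficients to exactly zero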
-
L2 Ridge
Linear least squares with l2 regularization.
Minimizes the objective function:
||y - Xw||^2_2 + alpha * ||w||^2_2
This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape (n_samples, n_targets)).
Read more at Scikit Learn linear_model.Ridge https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html
- Parameters
alpha (float, default=1.0) – Constant that multiplies the L2 term, controlling regularization strength. alpha must be a non-negative float, i.e. in [0, inf). When alpha = 0, the objective is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Ridge object is not advised; instead, you should use the LinearRegression object.
fit_intercept (bool, default=True) – Whether to fit the intercept for this model. If set to False, no intercept will be used in calculations (i.e. X and y are expected to be centered).
max_iter (int, default=None) – Maximum number of iterations for the conjugate gradient solver. For the ‘sparse_cg’ and ‘lsqr’ solvers, the default value is determined by scipy.sparse.linalg. For the ‘sag’ solver, the default value is 1000. For the ‘lbfgs’ solver, the default value is 15000.
tol (float, default=1e-4) –
The precision of the solution (coef_) is determined by tol which specifies a different convergence criterion for each solver:
’svd’: tol has no impact.
’cholesky’: tol has no impact.
’sparse_cg’: norm of residuals smaller than tol.
’lsqr’: tol is set as atol and btol of scipy.sparse.linalg.lsqr, which control the norm of the residual vector in terms of the norms of matrix and coefficients.
’sag’ and ‘saga’: relative change of coef smaller than tol.
’lbfgs’: maximum of the absolute (projected) gradient, max|residual|, smaller than tol.
solver ({'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs'}, default='auto') –
Solver to use in the computational routines:
’auto’ chooses the solver automatically based on the type of data.
’svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than ‘cholesky’ at the cost of being slower.
’cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.
’sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).
’lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
’sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
’lbfgs’ uses L-BFGS-B algorithm implemented in scipy.optimize.minimize. It can be used only when positive is True.
All solvers except ‘svd’ support both dense and sparse data. However, only ‘lsqr’, ‘sag’, ‘sparse_cg’, and ‘lbfgs’ support sparse input when fit_intercept is True.
New in version 0.17: Stochastic Average Gradient descent solver.
New in version 0.19: SAGA solver.
positive (bool, default=False) – When set to True, forces the coefficients to be positive. Only the ‘lbfgs’ solver is supported in this case.
random_state (int, RandomState instance, default=None) – Used when solver == ‘sag’ or ‘saga’ to shuffle the data.
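Example
The equivalent scikit-learn sketch, reusing the synthetic data pattern from the Lasso example above.
>>> import numpy as np
>>> from sklearn.linear_model import Ridge
>>> X = np.random.rand(100, 5)
>>> y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * np.random.randn(100)
>>> model = Ridge(alpha=1.0, solver='auto')
>>> model.fit(X, y)
>>> model.coef_                                                # L2 penalty shrinks coefficients but rarely zeros them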
-
Ordinary Least Squares
Ordinary least squares Linear Regression.
Fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.
- Parameters
fit_intercept (bool) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered). default=True
positive (bool) – When set to True, forces the coefficients to be positive. This option is only supported for dense arrays. default=False
- Returns
Trained linear regression model
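Example
The equivalent scikit-learn call, with X and y as in the Lasso and Ridge sketches above.
>>> from sklearn.linear_model import LinearRegression
>>> model = LinearRegression(fit_intercept=True, positive=False)
>>> model.fit(X, y)
>>> model.intercept_, model.coef_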
-
RBF with Neuron Allocation Limit
The Train and Prune algorithm takes as input feature vectors, corresponding class labels, and maximum desired number of neurons, and outputs a model.
The training vectors are partitioned into subsets (chunks) and presented to the PME classifier which places neurons and determines areas of influence (AIFs). After each subset is learned, the neurons that fired the most on the validation set are retained and the others are removed (pruned) from the model. After a defined number of train and prune cycles, the complete retained set of neurons is then re-learned, which results in larger neuron AIFs. Train/prune/re-learn cycles continue to run on all of the remaining chunks, keeping the total number of neurons within the limit while giving preference to neurons that fire frequently.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
chunk_size (int) – the number of training vectors in each chunk
inverse_relearn_frequency (int) – the number of chunks to train and prune between each re-learn phase
max_neurons (int) – the maximum allowed number of neurons
aggressive_neuron_creation (bool) – flag for placing neurons even if they are within the influence field of another neuron of the same category (default is False)
- Returns
a model
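Example
A sketch of a typical invocation; the chunking values are illustrative and should be sized to your training set.
>>> client.pipeline.set_training_algorithm("RBF with Neuron Allocation Limit",
...                                        params={"chunk_size": 20,
...                                                "inverse_relearn_frequency": 5,
...                                                "max_neurons": 64,
...                                                "aggressive_neuron_creation": False})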
-
Random Forest
Train an ensemble of decision tree classifiers using the random forest training algorithm. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control overfitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
max_depth (int) – The max depth to allow a decision tree to reach
n_estimators (int) – The number of decision trees to build.
- Returns
a set of models
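Example
A usage sketch; the "Decision Tree Ensemble" classifier name is an assumption modeled on the classifier steps in the other examples, and the parameter values are illustrative.
>>> client.pipeline.set_training_algorithm("Random Forest",
...                                        params={"max_depth": 5,
...                                                "n_estimators": 10})
>>> client.pipeline.set_classifier("Decision Tree Ensemble", params={})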
-
Train Temporal Convolutional Neural Network
Implements a temporal convolutional neural network, consisting of several temporal blocks with various dilations.
A Temporal Convolutional Neural Network (TCN) is designed for sequential data like time series or text. TCNs use “temporal blocks” with varying dilation rates to capture different time scales. Smaller rates focus on short-term patterns, while larger rates capture long-term dependencies. This diversity enables TCNs to model complex temporal relationships effectively, making them useful for tasks like speech recognition and forecasting.
Note: To build a TCN model, the pipeline should include a feature cascade block with a cascade number larger than 1 (see the sketch after this parameter list).
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
class_map (dict) – optional, class map for converting labels to output
estimator_type (str) – defines if this estimator performs regression or classification, defaults to classification
threshold (float) – if no values are greater than the threshold, classify as Unknown
dense_layers (list) – The size of each dense layer
drop_out (float) – The amount of dropout to use after each dense layer
batch_normalization (bool) – Use batch normalization
final_activation (str) – the final activation to use
iteration (int) – the maximum number of optimization attempts
batch_size (int) – the number of training examples utilized in one iteration
metrics (str) – the metric to use for reporting results
learning_rate (float) – a tuning parameter that determines the step size at each iteration while moving toward the minimum of the loss function
loss_function (str) – a function that determines how far the predicted values deviate from the actual values in the training data
tensorflow_optimizer (str) – the optimization algorithm used to minimize the loss function
number_of_temporal_blocks (int) – Number of Temporal Blocks
number_of_temporal_layers (int) – Number of Temporal Layers within each block
number_of_convolutional_filters (int) – Number of Convolutional filters in each layer
kernel_size (int) – Size of the convolutional filters
residual_block (boolean) – Implementing residual blocks
initial_dilation_rate (int) – The dilation rate of the first temporal block, which is a power of two. The dilation rate of each subsequent block is twice as large as that of the preceding block.
number_of_latest_temporal_features (int) – Number of the most relevant temporal components generated by the last temporal layer, i.e. how many of the most recent successive temporal features to be used with the fully connected component of the network to generate classifications. Set this to zero to use the entire temporal range of the output tensor.
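Example
A pipeline sketch. The "Feature Cascade" transform name and its num_cascades parameter are assumptions used to satisfy the cascade requirement noted above, and all hyperparameter values are illustrative.
>>> client.pipeline.add_transform("Feature Cascade", params={"num_cascades": 4})
>>> client.pipeline.set_training_algorithm("Train Temporal Convolutional Neural Network",
...                                        params={"dense_layers": [32, 16],
...                                                "number_of_temporal_blocks": 3,
...                                                "number_of_temporal_layers": 2,
...                                                "number_of_convolutional_filters": 16,
...                                                "kernel_size": 3,
...                                                "residual_block": True,
...                                                "initial_dilation_rate": 1,
...                                                "number_of_latest_temporal_features": 0})
>>> client.pipeline.set_classifier("TensorFlow Lite for Microcontrollers")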
-
RBF with Neuron Allocation Optimization
RBF with Neuron Allocation Optimization takes as input feature vectors, corresponding class labels, and desired number of iterations (or trials), and outputs a set of models. For each iteration the input vectors are randomly shuffled and presented to the PME classifier which either places the pattern as a neuron or does not. When a neuron is placed, an area of influence (AIF) is determined based on the neuron’s proximity to other neurons in the model and their respective classes.
- Parameters
input_data (DataFrame) – input feature vectors with a label column
label_column (str) – the name of the column in input_data containing labels
number_of_iterations (int) – the number of times to shuffle the training set
turbo (boolean) – a flag that when True runs through the set of unplaced feature vectors repeatedly until no new neurons are placed (default is True)
number_of_neurons (int) – the maximum allowed number of neurons; when the model reaches this number, the algorithm will stop training
aggressive_neuron_creation (bool) – flag for placing neurons even if they are within the influence field of another neuron of the same category (default is False)
ranking_metric (str) – Method to score models by when evaluating best candidate. Options: [f1_score, sensitivity, accuracy]
- Returns
a set of models
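Example
A sketch of a typical invocation; the PME classifier settings are assumptions based on its documented options, and the parameter values are illustrative.
>>> client.pipeline.set_training_algorithm("RBF with Neuron Allocation Optimization",
...                                        params={"number_of_iterations": 5,
...                                                "turbo": True,
...                                                "number_of_neurons": 128,
...                                                "ranking_metric": "f1_score"})
>>> client.pipeline.set_classifier("PME", params={"classification_mode": "RBF", "distance_mode": "L1"})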