Training Algorithms

Training algorithms are used to select the optimal parameters of a model.

Copyright 2017-2024 SensiML Corporation

This file is part of SensiML™ Piccolo AI™.

SensiML Piccolo AI is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

SensiML Piccolo AI is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with SensiML Piccolo AI. If not, see <https://www.gnu.org/licenses/>.

Hierarchical Clustering with Neuron Optimization

Hierarchical Clustering with Neuron Optimization takes as input feature vectors, corresponding class labels, and desired number of patterns, and outputs a model.

Each pattern in a model consists of a centroid, its class label, and its area of influence (AIF). Each centroid is calculated as an average of objects in the cluster, each class label is the label of the majority class, and each AIF is the distance between the centroid and the farthest object in that cluster.
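
The three parts of a pattern described above can be illustrated with a minimal sketch. It assumes Euclidean distance and a plain mean for the centroid; the classifier's configured distance mode and the `centroid_calculation` and `aif_method` options may behave differently. The function name `summarize_cluster` and the sample data are hypothetical.

```python
import math
from collections import Counter


def summarize_cluster(vectors, labels):
    """Return (centroid, majority_label, aif) for one cluster."""
    n = len(vectors)
    dims = len(vectors[0])
    # Centroid: per-dimension mean of the cluster members.
    centroid = [sum(v[d] for v in vectors) / n for d in range(dims)]
    # Class label: the most common label among the members.
    majority_label = Counter(labels).most_common(1)[0][0]
    # AIF: distance from the centroid to the farthest member
    # (Euclidean distance is an assumption made for this sketch).
    aif = max(math.dist(centroid, v) for v in vectors)
    return centroid, majority_label, aif


# Hypothetical cluster of four 2-D feature vectors.
vectors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
labels = ["walk", "walk", "run", "walk"]
centroid, label, aif = summarize_cluster(vectors, labels)
```

Here the centroid is the center of the rectangle, the label is the majority class, and the AIF is the distance from the center to a corner.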

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • number_of_neurons (int) – the maximum number of output clusters (neurons) desired

  • linkage_method (str) – options are average, complete, ward, and single (default is average)

  • centroid_calculation (str) – options are robust, mean, and median (default is robust)

  • flip (int) – default is 1

  • cluster_method (str) – options are DLHC, DHC, and kmeans (default is DLHC)

  • aif_method (str) – options are min, max, robust, mean, median (default is max)

  • singleton_aif (int) – default is 0

  • min_number_of_dominant_vector (int) – Used for pruning. Defines the minimum number of vectors for the dominant class in a cluster.

  • max_number_of_weak_vector (int) – Used for pruning. Defines the maximum number of vectors for a weak class in a cluster.

Returns

one or more models

Load Model PME

Load Model PME takes as input feature vectors, corresponding class labels, and a neuron array to use for classification. The neuron array is loaded and classification is performed.

Note: This training algorithm does not perform optimizations on the provided neurons.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • neuron_array (list) – A list of neurons to load into the hardware simulator.

  • class_map (dict) – class map for converting labels to neuron categories.

Returns

a set of models

Load Model TensorFlow Lite for Microcontrollers

Provides the ability to upload a TensorFlow Lite flatbuffer to use as the final classifier step in a pipeline.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • model_parameters (dict) – The flatbuffer object of your TensorFlow Lite Micro model

  • class_map (dict) – class map for converting labels to output

  • estimator_type (str) – defines if this estimator performs regression or classification, defaults to classification

  • threshold (float) – if no values are greater than the threshold, classify as Unknown

  • train_history (dict) – training history for this model

  • model_json (dict) – expects the model json file from the tensorflow api tf_model.to_json()
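
The `threshold` behavior described above can be sketched as follows. This is an illustration only, not the library's implementation: the function name `classify` is hypothetical, and the convention that class indices start at 1 with 0 meaning Unknown is taken from the class map offset used in this page's upload example.

```python
def classify(scores, threshold=0.0):
    """Return the 1-based index of the winning class, or 0 for Unknown."""
    # Index of the largest output score.
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] > threshold:
        return best + 1  # class indices start at 1
    return 0  # no score exceeded the threshold: Unknown


confident = classify([0.1, 0.7, 0.2], threshold=0.5)
uncertain = classify([0.4, 0.35, 0.25], threshold=0.5)
```

With a threshold of 0.5, the first input is assigned class 2 and the second is reported as Unknown (0).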

Example

SensiML provides the ability to train and bring your own NN architecture to use as the classifier for your pipeline. This example starts from the point where you have created features using the SensiML Toolkit.

>>> x_train, x_test, x_validate, y_train, y_test, y_validate, class_map = \
>>>     client.pipeline.features_to_tensor(fv_t, test=0.0, validate=.1)

TensorFlow Lite Micro only supports a subset of the full TensorFlow functions. For a full list of available functions, see all_ops_resolver.cc. Use the Keras TensorFlow API to create the NN graph.

>>> from tensorflow.keras import layers
>>> import tensorflow as tf
>>> tf_model = tf.keras.Sequential()
>>> tf_model.add(layers.Dense(12, activation='relu',kernel_regularizer='l1', input_shape=(x_train.shape[1],)))
>>> tf_model.add(layers.Dropout(0.1))
>>> tf_model.add(layers.Dense(8, activation='relu'))
>>> tf_model.add(layers.Dropout(0.1))
>>> tf_model.add(layers.Dense(y_train.shape[1], activation='softmax'))
>>> tf_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
>>> tf_model.summary()
>>> train_history = {'loss':[], 'val_loss':[], 'accuracy':[], 'val_accuracy':[]}

Train the TensorFlow Model

>>> epochs=100
>>> batch_size=32
>>> data  = tf.data.Dataset.from_tensor_slices((x_train, y_train))
>>> shuffle_ds = data.shuffle(buffer_size=x_train.shape[0], reshuffle_each_iteration=True).batch(batch_size)
>>> # The dataset is already batched, so batch_size is not passed to fit()
>>> history = tf_model.fit(shuffle_ds, epochs=epochs, validation_data=(x_validate, y_validate), verbose=0)
>>> for key in train_history:
>>>     train_history[key].extend(history.history[key])
>>> import sensiml.tensorflow.utils as sml_tf
>>> sml_tf.plot_training_results(tf_model, train_history, x_train, y_train, x_validate, y_validate)

Quantize the TensorFlow Model

  • The `representative_dataset_generator()` function is necessary to provide statistical information about your dataset in order to quantize the model weights appropriately.

  • The TFLiteConverter is used to convert a TensorFlow model into a TensorFlow Lite model. The TensorFlow Lite model is stored as a flatbuffer, which allows us to easily store and access it on embedded systems.

  • Quantizing the model allows TensorFlow Lite Micro to take advantage of specialized instructions on Cortex-M class processors using the CMSIS-NN DSP library, which gives a further boost in performance.

  • Quantizing the model can reduce size by up to 4x, as 4-byte floats are converted to 1-byte ints in a number of places within the model.

    >>> import numpy as np
    >>> def representative_dataset_generator():
    >>>     for value in x_validate:
    >>>         # Each scalar value must be inside of a 2D array that is wrapped in a list
    >>>         yield [np.array(value, dtype=np.float32, ndmin=2)]
    >>>
    >>> converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
    >>> converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
    >>> converter.representative_dataset = representative_dataset_generator
    >>> tflite_model_quant = converter.convert()
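
The "up to 4x" size reduction can be checked with a small standard-library sketch that maps a handful of float weights onto 8-bit integers. This is a simplified per-tensor affine mapping written for illustration; it is not the scheme the TFLiteConverter applies, and the weight values are made up.

```python
from array import array

# Hypothetical float weights of one layer.
weights = [0.52, -1.30, 0.07, 2.41, -0.88, 1.15]

# Affine mapping of the observed float range onto the int8 range [-128, 127].
w_min, w_max = min(weights), max(weights)
scale = (w_max - w_min) / 255.0
zero_point = round(-128 - w_min / scale)


def quantize(w):
    """Map a float to int8, clamping to the representable range."""
    return max(-128, min(127, round(w / scale) + zero_point))


def dequantize(q):
    """Recover an approximate float from its int8 code."""
    return (q - zero_point) * scale


as_float32 = array("f", weights)                     # 4 bytes per value
as_int8 = array("b", [quantize(w) for w in weights])  # 1 byte per value

float_bytes = len(as_float32) * as_float32.itemsize
int8_bytes = len(as_int8) * as_int8.itemsize
max_error = max(abs(w - dequantize(q)) for w, q in zip(weights, as_int8))
```

The storage drops from 24 bytes to 6 bytes, while the round-trip error stays within one quantization step (`scale`).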
    

Uploading Trained TF Lite model to SensiML

>>> class_map_tmp = {k:v+1 for k,v in class_map.items()} #increment by 1 as 0 corresponds to unknown
>>> client.pipeline.set_training_algorithm("Load Model TensorFlow Lite for Microcontrollers",
>>>                                     params={"model_parameters": {
>>>                                             'tflite': sml_tf.convert_tf_lite(tflite_model_quant)},
>>>                                             "class_map": class_map_tmp,
>>>                                             "estimator_type": "classification",
>>>                                             "threshold": 0.0,
>>>                                             "train_history":train_history,
>>>                                             "model_json": tf_model.to_json()
>>>                                             })
>>> client.pipeline.set_validation_method("Recall", params={})
>>> client.pipeline.set_classifier("TensorFlow Lite for Microcontrollers", params={})
>>> client.pipeline.set_tvo()
>>> results, stats = client.pipeline.execute()
>>>
>>> results.summarize()

Neuron Optimization

Neuron Optimization performs an optimized grid search over KNN/RBF and the number of neurons for the parameters of Hierarchical Clustering with Neuron Optimization. It takes as input feature vectors and corresponding class labels, and outputs a model.

Each pattern in a model consists of a centroid, its class label, and its area of influence (AIF). Each centroid is calculated as an average of objects in the cluster, each class label is the label of the majority class, and each AIF is the distance between the centroid and the farthest object in that cluster.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • neuron_range (list) – the range of maximum neuron counts to search over, specified as [Min, Max]

  • linkage_method (str) – options are average, complete, ward, and single (default is average)

  • centroid_calculation (str) – options are robust, mean, and median (default is robust)

  • flip (int) – default is 1

  • cluster_method (str) – options are DLHC, DHC, and kmeans (default is DLHC)

  • aif_method (str) – options are min, max, robust, mean, median (default is max)

  • singleton_aif (int) – default is 0

  • min_number_of_dominant_vector (int) – Used for pruning. Defines the minimum number of vectors for the dominant class in a cluster.

  • max_number_of_weak_vector (int) – Used for pruning. Defines the maximum number of vectors for a weak class in a cluster.

Returns

one or more models

Bonsai Tree Optimizer

Train a Bonsai Tree Classifier using backpropagation.

For detailed information, see the ICML 2017 paper.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • epochs (int) – The number of training epochs to iterate over

  • batch_size (int) – The size of batches to use during training

  • learning_rate (float) – The learning rate for training optimization

  • project_dimensions (int) – The number of dimensions to project the input feature space into

  • sigma (float) – tunable hyperparameter

  • reg_W (float) – regularization for W matrix

  • reg_V (float) – regularization for V matrix

  • reg_Theta (float) – regularization for Theta matrix

  • reg_Z (float) – regularization for Z matrix

  • sparse_V (float) – sparsity factor for V matrix

  • sparse_Theta (float) – sparsity factor for Theta matrix

  • sparse_W (float) – sparsity factor for W matrix

  • sparse_Z (float) – sparsity factor for Z matrix

Returns

model parameters for a bonsai tree classifier

Train Fully Connected Neural Network

Provides the ability to train a fully connected neural network model to use as the final classifier step in a pipeline.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • class_map (dict) – optional, class map for converting labels to output

  • estimator_type (str) – defines if this estimator performs regression or classification, defaults to classification

  • threshold (float) – if no values are greater than the threshold, classify as Unknown

  • dense_layers (list) – The size of each dense layer

  • drop_out (float) – The amount of dropout to use after each dense layer

  • batch_normalization (bool) – Use batch normalization

  • final_activation (str) – the final activation to use

  • iteration (int) – the maximum number of optimization attempts

  • batch_size (int) – the batch size to use during training, i.e. the number of training examples utilized in one iteration

  • metrics (str) – the metric to use for reporting results

  • learning_rate (float) – a tuning parameter of the optimization algorithm that determines the step size at each iteration while moving toward a minimum of the loss function

  • loss_function (str) – the function that determines how far the predicted values deviate from the actual values in the training data

  • tensorflow_optimizer (str) – the optimization algorithm used to minimize the loss function

Example

SensiML provides the ability to train an NN architecture to use as the classifier for your pipeline. TensorFlow Lite Micro only supports a subset of the full TensorFlow functions. For a full list of available functions, see all_ops_resolver.cc. Use the Keras TensorFlow API to create the NN graph.

>>> client.project = 'Activity_Detection'
>>> client.pipeline = 'tf_p1'
>>> client.pipeline.stop_pipeline()
>>> sensors = ['GyroscopeX', 'GyroscopeY', 'GyroscopeZ', 'AccelerometerX', 'AccelerometerY', 'AccelerometerZ']
>>> client.pipeline.reset()
>>> client.pipeline.set_input_query("Q1")
>>> client.pipeline.add_transform("Windowing", params={"window_size":200,
                                "delta":200,
                                "train_delta":0})
>>> client.pipeline.add_feature_generator([
        {'name':'MFCC', 'params':{"columns":sensors,"sample_rate":100, "cepstra_count":1}}
    ])
>>> client.pipeline.add_transform("Min Max Scale")
>>> client.pipeline.set_validation_method("Recall", params={})
>>> client.pipeline.set_training_algorithm("Train Fully Connected Neural Network", params={
                        "estimator_type":"classification",
                        "class_map": None,
                        "threshold":0.0,
                        "dense_layers": [64,32,16,8],
                        "drop_out": 0.1,
                        "iterations": 5,
                        "learning_rate": 0.01,
                        "batch_size": 64,
                        "loss_function":"categorical_crossentropy",
                        "tensorflow_optimizer":"adam",
                        "batch_normalization": True,
                        "final_activation":"softmax",
    })
>>> client.pipeline.set_classifier("TensorFlow Lite for Microcontrollers")
>>> client.pipeline.set_tvo({'validation_seed':None})
>>> results, stats = client.pipeline.execute()
>>> results.summarize()

xGBoost

Train an ensemble of boosted tree classifiers using the xGBoost training algorithm.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • max_depth (int) – The max depth to allow a decision tree to reach

  • n_estimators (int) – The number of decision trees to build.

Returns

a trained model

L1 Lasso

Linear Model trained with L1 prior as regularizer (aka the Lasso).

The optimization objective for Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

Technically the Lasso model is optimizing the same objective function as the Elastic Net with l1_ratio=1.0 (no L2 penalty).
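
As a worked illustration of the objective above, the following standard-library sketch evaluates it for two candidate weight vectors. The function name `lasso_objective` and the data are hypothetical; the sketch only evaluates the objective and does not perform the optimization that the training algorithm carries out.

```python
def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1"""
    n_samples = len(X)
    # Residual of each sample: y_i minus the dot product of x_i and w.
    residuals = [
        yi - sum(xij * wj for xij, wj in zip(xi, w))
        for xi, yi in zip(X, y)
    ]
    squared_error = sum(r * r for r in residuals)
    l1_penalty = sum(abs(wj) for wj in w)
    return squared_error / (2 * n_samples) + alpha * l1_penalty


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]

exact_fit = lasso_objective(X, y, w=[1.0, 2.0], alpha=0.5)   # zero residual, larger penalty
sparse_fit = lasso_objective(X, y, w=[1.0, 0.0], alpha=0.5)  # one weight dropped to zero
```

The exact fit pays only the L1 penalty (0.5 * 3 = 1.5), while the sparser weights trade a smaller penalty for a squared-error term (12 / 8 + 0.5 = 2.0).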

See the scikit-learn linear_model.Lasso training algorithm for more information: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html

Parameters
  • alpha (float, default=1.0) –

    Constant that multiplies the L1 term, controlling regularization strength. alpha must be a non-negative float i.e. in [0, inf).

    When alpha = 0, the objective is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised. Instead, you should use the LinearRegression object.

  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • max_iter (int, default=1000) – The maximum number of iterations.

  • tol (float, default=1e-4) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol

  • positive (bool, default=False) – When set to True, forces the coefficients to be positive.

  • random_state (int, RandomState instance, default=None) – The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls.

Returns

Trained linear regression model

L2 Ridge

Linear least squares with l2 regularization.

Minimizes the objective function:

||y - Xw||^2_2 + alpha * ||w||^2_2

This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape (n_samples, n_targets)).
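
For a single feature and no intercept, the objective above has a closed-form minimizer: setting the derivative of sum((y_i - x_i * w)^2) + alpha * w^2 to zero gives w = sum(x_i * y_i) / (sum(x_i^2) + alpha). The sketch below uses that special case to show how alpha shrinks the coefficient; the function name `ridge_1d` and the data are hypothetical, and the general multi-feature solvers listed below are not reproduced here.

```python
def ridge_1d(x, y, alpha):
    """Closed-form ridge coefficient for one feature, no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # exactly y = 2x

w_unregularized = ridge_1d(x, y, alpha=0.0)   # ordinary least squares solution
w_regularized = ridge_1d(x, y, alpha=14.0)    # penalty equal to sum(x^2)
```

With alpha = 0 the coefficient is the least-squares value 2.0; with alpha equal to sum(x^2) = 14 it is shrunk to half of that.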

Read more at Scikit Learn linear_model.Ridge https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html

Parameters
  • alpha (float, default=1.0) –

    Constant that multiplies the L2 term, controlling regularization strength. alpha must be a non-negative float i.e. in [0, inf).

    When alpha = 0, the objective is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Ridge object is not advised. Instead, you should use the LinearRegression object.

  • fit_intercept (bool, default=True) – Whether to fit the intercept for this model. If set to false, no intercept will be used in calculations (i.e. X and y are expected to be centered).

  • max_iter (int, default=None) – Maximum number of iterations for conjugate gradient solver. For ‘sparse_cg’ and ‘lsqr’ solvers, the default value is determined by scipy.sparse.linalg. For ‘sag’ solver, the default value is 1000. For ‘lbfgs’ solver, the default value is 15000.

  • tol (float, default=1e-4) –

    The precision of the solution (coef_) is determined by tol which specifies a different convergence criterion for each solver:

    • ‘svd’: tol has no impact.

    • ‘cholesky’: tol has no impact.

    • ‘sparse_cg’: norm of residuals smaller than tol.

    • ‘lsqr’: tol is set as atol and btol of scipy.sparse.linalg.lsqr, which control the norm of the residual vector in terms of the norms of matrix and coefficients.

    • ‘sag’ and ‘saga’: relative change of coef smaller than tol.

    • ‘lbfgs’: maximum of the absolute (projected) gradient=max|residuals| smaller than tol.

  • solver ({'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs'}, default='auto') –

    Solver to use in the computational routines:

    • ‘auto’ chooses the solver automatically based on the type of data.

    • ‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is the most stable solver, in particular more stable for singular matrices than ‘cholesky’ at the cost of being slower.

    • ‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.

    • ‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).

    • ‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.

    • ‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

    • ‘lbfgs’ uses the L-BFGS-B algorithm implemented in scipy.optimize.minimize. It can be used only when positive is True.

    All solvers except ‘svd’ support both dense and sparse data. However, only ‘lsqr’, ‘sag’, ‘sparse_cg’, and ‘lbfgs’ support sparse input when fit_intercept is True.

    New in version 0.17: Stochastic Average Gradient descent solver.

    New in version 0.19: SAGA solver.

  • positive (bool, default=False) – When set to True, forces the coefficients to be positive. Only ‘lbfgs’ solver is supported in this case.

  • random_state (int, RandomState instance, default=None) – Used when solver == ‘sag’ or ‘saga’ to shuffle the data.

Ordinary Least Squares

Ordinary least squares Linear Regression.

Fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.
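
A minimal single-feature sketch of the fit described above, written with the standard library only. It is meant to show what the `fit_intercept` option changes, under the assumption of one input feature; the function name `ols_1d` and the data are hypothetical.

```python
def ols_1d(x, y, fit_intercept=True):
    """Least-squares slope and intercept for a single feature."""
    n = len(x)
    if fit_intercept:
        x_mean, y_mean = sum(x) / n, sum(y) / n
    else:
        # Without an intercept the data is used as-is (assumed centered).
        x_mean = y_mean = 0.0
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1

slope, intercept = ols_1d(x, y)
slope_no_intercept, intercept_no_intercept = ols_1d(x, y, fit_intercept=False)
```

With the intercept fitted, the line y = 2x + 1 is recovered exactly. Without it, the uncentered data forces the line through the origin and the slope is biased upward (34 / 14), which is why the data is expected to be centered when `fit_intercept` is False.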

Parameters
  • fit_intercept (bool, default=True) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

  • positive (bool, default=False) – When set to True, forces the coefficients to be positive. This option is only supported for dense arrays.

Returns

Trained linear regression model

RBF with Neuron Allocation Limit

The Train and Prune algorithm takes as input feature vectors, corresponding class labels, and maximum desired number of neurons, and outputs a model.

The training vectors are partitioned into subsets (chunks) and presented to the PME classifier which places neurons and determines areas of influence (AIFs). After each subset is learned, the neurons that fired the most on the validation set are retained and the others are removed (pruned) from the model. After a defined number of train and prune cycles, the complete retained set of neurons is then re-learned, which results in larger neuron AIFs. Train/prune/re-learn cycles continue to run on all of the remaining chunks, keeping the total number of neurons within the limit while giving preference to neurons that fire frequently.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • chunk_size (int) – the number of training vectors in each chunk

  • inverse_relearn_frequency (int) – the number of chunks to train and prune between each re-learn phase

  • max_neurons (int) – the maximum allowed number of neurons

  • aggressive_neuron_creation (bool) – flag for placing neurons even if they are within the influence field of another neuron of the same category (default is False)

Returns

a model
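
How `chunk_size` and `inverse_relearn_frequency` interact can be sketched as a schedule of phases. This is only an illustration of the ordering described above, under two assumptions: a final partial chunk is still processed, and a re-learn phase runs immediately after every `inverse_relearn_frequency`-th chunk. The function name `train_prune_schedule` is hypothetical, and no actual training or pruning is performed.

```python
def train_prune_schedule(n_vectors, chunk_size, inverse_relearn_frequency):
    """List the (phase, chunk_index) steps for one pass over the training set."""
    n_chunks = -(-n_vectors // chunk_size)  # ceiling division
    steps = []
    for chunk in range(n_chunks):
        # Learn the chunk, then keep only the most frequently firing neurons.
        steps.append(("train+prune", chunk))
        # After every k-th chunk, re-learn the retained neurons.
        if (chunk + 1) % inverse_relearn_frequency == 0:
            steps.append(("re-learn", chunk))
    return steps


schedule = train_prune_schedule(
    n_vectors=100, chunk_size=20, inverse_relearn_frequency=2
)
```

For 100 vectors in chunks of 20 with a re-learn every 2 chunks, the schedule contains five train and prune steps and two re-learn steps.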

Random Forest

Train an ensemble of decision tree classifiers using the random forest training algorithm. A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control overfitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • max_depth (int) – The max depth to allow a decision tree to reach

  • n_estimators (int) – The number of decision trees to build.

Returns

a set of models
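
The sub-sampling described above (same size as the input, drawn with replacement) can be sketched with the standard library. The function name `bootstrap_sample` and the row indices are hypothetical; building the trees themselves is not shown.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one tree's training sample."""
    return rng.choices(rows, k=len(rows))


rng = random.Random(0)        # fixed seed so the sketch is repeatable
rows = list(range(1000))      # stand-in for 1000 feature vectors
sample = bootstrap_sample(rows, rng)

# Because rows are drawn with replacement, some repeat and some are left out.
unique_fraction = len(set(sample)) / len(rows)
```

Each sample has as many rows as the original data, but only a portion of the distinct rows appears in it (about 63% on average for large datasets), which is what makes the individual trees differ.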

Train Temporal Convolutional Neural Network

Implements a temporal convolutional neural network, consisting of several temporal blocks with various dilations.

A Temporal Convolutional Neural Network (TCN) is designed for sequential data like time series or text. TCNs use “temporal blocks” with varying dilation rates to capture different time scales. Smaller rates focus on short-term patterns, while larger rates capture long-term dependencies. This diversity enables TCNs to model complex temporal relationships effectively, making them useful for tasks like speech recognition and forecasting.

Note: To build a TCN model, the pipeline should include a feature cascading block with a cascade number larger than 1.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • class_map (dict) – optional, class map for converting labels to output

  • estimator_type (str) – defines if this estimator performs regression or classification, defaults to classification

  • threshold (float) – if no values are greater than the threshold, classify as Unknown

  • dense_layers (list) – The size of each dense layer

  • drop_out (float) – The amount of dropout to use after each dense layer

  • batch_normalization (bool) – Use batch normalization

  • final_activation (str) – the final activation to use

  • iteration (int) – the maximum number of optimization attempts

  • batch_size (int) – the batch size to use during training, i.e. the number of training examples utilized in one iteration

  • metrics (str) – the metric to use for reporting results

  • learning_rate (float) – a tuning parameter of the optimization algorithm that determines the step size at each iteration while moving toward a minimum of the loss function

  • loss_function (str) – the function that determines how far the predicted values deviate from the actual values in the training data

  • tensorflow_optimizer (str) – the optimization algorithm used to minimize the loss function

  • number_of_temporal_blocks (int) – Number of Temporal Blocks

  • number_of_temporal_layers (int) – Number of Temporal Layers within each block

  • number_of_convolutional_filters (int) – Number of Convolutional filters in each layer

  • kernel_size (int) – Size of the convolutional filters

  • residual_block (boolean) – Whether to implement residual blocks

  • initial_dilation_rate (int) – The dilation rate of the first temporal block, which is a power of two. The dilation rate of each subsequent block is twice as large as that of the preceding block.

  • number_of_latest_temporal_features (int) – Number of the most relevant temporal components generated by the last temporal layer, i.e. how many of the most recent successive temporal features are used with the fully connected component of the network to generate classifications. Set this to zero to use the entire temporal range of the output tensor.
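
The doubling rule for dilation rates, and its effect on how far back in time the network can see, can be sketched as follows. The receptive-field estimate assumes stride-1 dilated convolutions in which every layer of a block uses that block's dilation rate; this is an assumption made for illustration and may not match the exact layer layout of the trained model. Both function names are hypothetical.

```python
def dilation_rates(initial_dilation_rate, number_of_temporal_blocks):
    """Dilation rate per block: each block doubles the previous one."""
    return [initial_dilation_rate * 2 ** b for b in range(number_of_temporal_blocks)]


def receptive_field(kernel_size, number_of_temporal_layers, rates):
    """Time steps visible to one output value.

    Each stride-1 dilated convolution widens the receptive field by
    (kernel_size - 1) * dilation.
    """
    return 1 + sum(
        number_of_temporal_layers * (kernel_size - 1) * d for d in rates
    )


rates = dilation_rates(initial_dilation_rate=1, number_of_temporal_blocks=4)
span = receptive_field(kernel_size=3, number_of_temporal_layers=2, rates=rates)
```

Four blocks starting at a dilation of 1 use rates 1, 2, 4, and 8. With a kernel size of 3 and two layers per block, one output value covers 61 time steps under the stated assumptions.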

Transfer Learning

Apply transfer learning to a pre-trained TensorFlow model.

RBF with Neuron Allocation Optimization

RBF with Neuron Allocation Optimization takes as input feature vectors, corresponding class labels, and desired number of iterations (or trials), and outputs a set of models. For each iteration the input vectors are randomly shuffled and presented to the PME classifier which either places the pattern as a neuron or does not. When a neuron is placed, an area of influence (AIF) is determined based on the neuron’s proximity to other neurons in the model and their respective classes.

Parameters
  • input_data (DataFrame) – input feature vectors with a label column

  • label_column (str) – the name of the column in input_data containing labels

  • number_of_iterations (int) – the number of times to shuffle the training set

  • turbo (boolean) – a flag that when True runs through the set of unplaced feature vectors repeatedly until no new neurons are placed (default is True)

  • number_of_neurons (int) – the maximum allowed number of neurons; when the model reaches this number, the algorithm will stop training

  • aggressive_neuron_creation (bool) – flag for placing neurons even if they are within the influence field of another neuron of the same category (default is False)

  • ranking_metric (str) – Method to score models by when evaluating best candidate. Options: [f1_score, sensitivity, accuracy]

Returns

a set of models
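
The placement step described above can be illustrated with a simplified, RCE-style sketch. It is not SensiML's PME implementation: Euclidean distance, the `max_aif` starting value, the rule for shrinking AIFs, and the function name `place_neurons` are all assumptions made for illustration, and the `turbo` and `ranking_metric` options are not modeled.

```python
import math
import random


def place_neurons(vectors, labels, max_aif=10.0, number_of_neurons=None,
                  aggressive_neuron_creation=False, seed=0):
    """One shuffled pass that places neurons and sets their AIFs."""
    order = list(range(len(vectors)))
    random.Random(seed).shuffle(order)  # present the vectors in random order
    neurons = []  # each neuron: {"center", "label", "aif"}
    for i in order:
        if number_of_neurons is not None and len(neurons) >= number_of_neurons:
            break  # neuron limit reached: stop training
        vector, label = vectors[i], labels[i]
        covered = False
        aif = max_aif
        for neuron in neurons:
            distance = math.dist(neuron["center"], vector)
            if neuron["label"] == label:
                # Already inside the influence field of a same-class neuron.
                covered = covered or distance < neuron["aif"]
            else:
                # Shrink the other-class neuron so it no longer fires on this
                # vector, and keep the new AIF short of that neuron's center.
                neuron["aif"] = min(neuron["aif"], distance)
                aif = min(aif, distance)
        if aggressive_neuron_creation or not covered:
            neurons.append({"center": vector, "label": label, "aif": aif})
    return neurons


# Two well-separated hypothetical classes.
vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
neurons = place_neurons(vectors, labels)
all_neurons = place_neurons(vectors, labels, aggressive_neuron_creation=True)
```

For this data a single neuron per class is enough, because every later vector falls inside the influence field of the neuron already placed for its class. With `aggressive_neuron_creation=True`, every vector becomes a neuron.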