Feature Transforms
Perform row wise operations on a single feature vector. The most common feature transform in the pipeline is Min Max Scale, which translates the output of the feature generation step into 1byte feature values.
-
Min Max Scale
Normalize and scale data to integer values between min_bound and max_bound, while leaving specified passthrough columns unscaled. This operates on each feature column separately and saves min/max data transforming the features prior to classification
- Parameters
min_bound – min value in the output (0~255)
max_bound – max value in the output (0~255)
feature_min_max_parameters – Dictionary of ‘maximums’ and ‘minimums’. If a non-empty dictionary is passed as parameter, the minimum and maximum value will be calculated based on the ‘maximums’ and ‘minimums’ in the dictionary. If the value of this parameter is {}, then a new min-max value for each feature is calculated.
pad – pad the min and max value by +-col.std()/pad. Can be used to make min max more robust to unseen data.
feature_min_max_defaults – allows you to set the min max value for all values at once. example {‘minimum’:-1000, maximum:1000}
- Returns
The scaled dataframe and minimums and maximums for each feature. If ‘feature_min_max_parameters’ values is {} then the minimums and maximums for each feature are calculated based on the data passed.
Examples
>>> from pandas import DataFrame >>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]], columns=['feature1', 'feature2', 'feature3']) >>> df['Subject'] = 's01' >>> df Out: feature1 feature2 feature3 Subject 0 -3 6 5 s01 1 3 7 8 s01 2 0 6 3 s01 3 -2 8 7 s01 4 2 9 6 s01 >>> client.pipeline.reset() >>> client.pipeline.set_input_data('test_data', df, force = True) >>> client.pipeline.add_transform('Min Max Scale', params={'passthrough_columns':['Subject'], 'min_bound' : 0, 'max_bound' : 255}) Out: Subject feature1 feature2 feature3 0 s01 0 0 101 1 s01 254 84 254 2 s01 127 0 0 3 s01 42 169 203 4 s01 212 254 152
Passing min-max parameter as arguments
>>> my_min_max_param = {'maximums': {'feature1': 30, 'feature2': 100, 'feature3': 500}, 'minimums': {'feature1': 0, 'feature2': 0, 'feature3': -100}} >>> client.pipeline.reset() >>> client.pipeline.set_input_data('test_data', df, force = True) >>> client.pipeline.add_transform('Min Max Scale', params={'passthrough_columns':['Subject'], 'min_bound' : 0, 'max_bound' : 255, 'feature_min_max_parameters': my_min_max_param}) >>> results, stats = client.pipeline.execute() >>> print results, stats Out: feature1 feature2 feature3 Subject 0 0 15 44 s01 1 25 17 45 s01 2 0 15 43 s01 3 0 20 45 s01 4 16 22 45 s01
- library.core_functions.feature_transforms.normalize(input_data: DataFrame, passthrough_columns: List[str])
Scale each feature vector to between -1 and 1 by dividing each feature in a feature vector by the absolute maximum value in that feature vector.
This function transfer the input_data and passthrough_columns from the previous pipeline block.
- Returns
Normalized dataframe
- Return type
dataframe
Examples
>>> from pandas import DataFrame >>> df = DataFrame([[3, 3], [4, 5], [5, 7], [4, 6], [3, 1], [3, 1], [4, 3], [5, 5], [4, 7], [3, 6]], columns=['accelx', 'accely']) >>> df['Subject'] = 's01' >>> df['Rep'] = [0] * 5 + [1] * 5 >>> df Out: accelx accely Subject Rep 0 3 3 s01 0 1 4 5 s01 0 2 5 7 s01 0 3 4 6 s01 0 4 3 1 s01 0 5 3 1 s01 1 6 4 3 s01 1 7 5 5 s01 1 8 4 7 s01 1 9 3 6 s01 1 >>> client.pipeline.reset() >>> client.pipeline.set_input_data('testn', df, data_columns=['accelx', 'accely'], group_columns=['Subject','Rep']) >>> client.pipeline.add_transform('Normalize') >>> r, s = client.pipeline.execute() >>> r Out: Rep Subject accelx accely 0 0 s01 1.000000 1.000000 1 0 s01 0.800000 1.000000 2 0 s01 0.714286 1.000000 3 0 s01 0.666667 1.000000 4 0 s01 1.000000 0.333333 5 1 s01 1.000000 0.333333 6 1 s01 1.000000 0.750000 7 1 s01 1.000000 1.000000 8 1 s01 0.571429 1.000000 9 1 s01 0.500000 1.000000
- library.core_functions.feature_transforms.quantize_254(input_data: DataFrame, passthrough_columns: List[str])
Scalar quantization of a normalized dataframe to integers between 0 and 254. This step should only be applied after features have been normalized to the range [-1, 1]. This function transfer the input_data and passthrough_columns from the previous pipeline block. It does not require any feature-specific statistics to be saved to the knowledgepack.
- Returns
quantized dataframe
- Return type
dataframe