Feature Transforms

Perform row-wise operations on a single feature vector. The most common feature transform in the pipeline is Min Max Scale, which translates the output of the feature generation step into 1-byte feature values.

Min Max Scale

Normalize and scale data to integer values between min_bound and max_bound, while leaving specified passthrough columns unscaled. This operates on each feature column separately and saves the min/max data for transforming the features prior to classification.

Parameters
  • min_bound – minimum value in the output (0 to 255)

  • max_bound – maximum value in the output (0 to 255)

  • feature_min_max_parameters – Dictionary of 'maximums' and 'minimums'. If a non-empty dictionary is passed, the minimum and maximum value for each feature is taken from the 'maximums' and 'minimums' in the dictionary. If the value of this parameter is {}, then new min-max values are calculated for each feature.

  • pad – pad the min and max values by ±col.std()/pad. Can be used to make min-max scaling more robust to unseen data.

  • feature_min_max_defaults – allows you to set the min/max values for all features at once. Example: {'minimum': -1000, 'maximum': 1000}

Returns

The scaled dataframe and the minimums and maximums for each feature. If 'feature_min_max_parameters' is {}, then the minimums and maximums for each feature are calculated based on the data passed.
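
Under the hood, the transform applies the standard min-max formula to each feature column and maps the result onto the integer output range. The sketch below illustrates that math for a single column; min_max_scale_sketch is a hypothetical helper, not the library function, and the library's exact rounding at the boundaries may differ slightly (note the 254 maximum in the first example below).

    import pandas as pd

    def min_max_scale_sketch(col: pd.Series, min_bound: int = 0,
                             max_bound: int = 255, pad: float = None) -> pd.Series:
        """Illustrative per-column min-max scaling to integer bounds."""
        col_min, col_max = col.min(), col.max()
        if pad:  # widen the observed range by +/- col.std()/pad
            col_min -= col.std() / pad
            col_max += col.std() / pad
        scaled = (col - col_min) / (col_max - col_min)         # map to [0, 1]
        scaled = scaled * (max_bound - min_bound) + min_bound  # map to output bounds
        return scaled.round().clip(min_bound, max_bound).astype(int)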

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                    [-2, 8, 7], [2, 9, 6]],
                    columns=['feature1', 'feature2', 'feature3'])
>>> df['Subject'] = 's01'
>>> df
    Out:
       feature1  feature2  feature3 Subject
    0        -3         6         5     s01
    1         3         7         8     s01
    2         0         6         3     s01
    3        -2         8         7     s01
    4         2         9         6     s01
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('test_data', df, force=True)
>>> client.pipeline.add_transform('Min Max Scale',
        params={'passthrough_columns': ['Subject'],
                'min_bound': 0, 'max_bound': 255})
>>> results, stats = client.pipeline.execute()
    Out:
          Subject  feature1  feature2  feature3
        0     s01         0         0       101
        1     s01       254        84       254
        2     s01       127         0         0
        3     s01        42       169       203
        4     s01       212       254       152

Passing min-max parameters as arguments

>>> my_min_max_param = {'maximums': {'feature1': 30,
                                     'feature2': 100,
                                     'feature3': 500},
                        'minimums': {'feature1': 0,
                                     'feature2': 0,
                                     'feature3': -100}}
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('test_data', df, force=True)
>>> client.pipeline.add_transform('Min Max Scale',
                    params={'passthrough_columns': ['Subject'],
                            'min_bound': 0,
                            'max_bound': 255,
                            'feature_min_max_parameters': my_min_max_param})
>>> results, stats = client.pipeline.execute()
>>> print(results, stats)
    Out:
            feature1  feature2  feature3 Subject
         0         0        15        44     s01
         1        25        17        45     s01
         2         0        15        43     s01
         3         0        20        45     s01
         4        16        22        45     s01

Normalize

library.core_functions.feature_transforms.normalize(input_data: DataFrame, passthrough_columns: List[str])

Scale each feature vector to between -1 and 1 by dividing each feature in a feature vector by the absolute maximum value in that feature vector.

This function transfers the input_data and passthrough_columns from the previous pipeline block.
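
In pandas terms, the operation amounts to dividing each row of the feature columns by that row's largest absolute value. A minimal sketch, assuming the passthrough columns have already been set aside (normalize_sketch is a hypothetical helper, not the library function):

    import pandas as pd

    def normalize_sketch(features: pd.DataFrame) -> pd.DataFrame:
        """Divide each feature vector (row) by its absolute maximum value."""
        # axis=1 takes each row's absolute maximum; axis=0 broadcasts the
        # division back down the rows.
        return features.div(features.abs().max(axis=1), axis=0)

Applied to row 1 of the example below ([4, 5]), this yields [0.8, 1.0].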

Returns

Normalized dataframe

Return type

dataframe

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[3, 3], [4, 5], [5, 7], [4, 6], [3, 1],
            [3, 1], [4, 3], [5, 5], [4, 7], [3, 6]],
            columns=['accelx', 'accely'])
>>> df['Subject'] = 's01'
>>> df['Rep'] = [0] * 5 + [1] * 5
>>> df
    Out:
       accelx  accely Subject  Rep
    0       3       3     s01    0
    1       4       5     s01    0
    2       5       7     s01    0
    3       4       6     s01    0
    4       3       1     s01    0
    5       3       1     s01    1
    6       4       3     s01    1
    7       5       5     s01    1
    8       4       7     s01    1
    9       3       6     s01    1
>>> client.pipeline.reset()
>>> client.pipeline.set_input_data('testn', df, data_columns=['accelx', 'accely'], group_columns=['Subject','Rep'])
>>> client.pipeline.add_transform('Normalize')
>>> r, s = client.pipeline.execute()
>>> r
    Out:
           Rep Subject    accelx    accely
        0    0     s01  1.000000  1.000000
        1    0     s01  0.800000  1.000000
        2    0     s01  0.714286  1.000000
        3    0     s01  0.666667  1.000000
        4    0     s01  1.000000  0.333333
        5    1     s01  1.000000  0.333333
        6    1     s01  1.000000  0.750000
        7    1     s01  1.000000  1.000000
        8    1     s01  0.571429  1.000000
        9    1     s01  0.500000  1.000000

Quantize 254

library.core_functions.feature_transforms.quantize_254(input_data: DataFrame, passthrough_columns: List[str])

Scalar quantization of a normalized dataframe to integers between 0 and 254. This step should only be applied after features have been normalized to the range [-1, 1]. This function transfers the input_data and passthrough_columns from the previous pipeline block. It does not require any feature-specific statistics to be saved to the knowledgepack.

Returns

Quantized dataframe

Return type

dataframe
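
As a worked sketch of the mapping, assuming a linear quantization of [-1, 1] onto [0, 254] with rounding (quantize_254_sketch is a hypothetical helper, not the library function):

    import pandas as pd

    def quantize_254_sketch(features: pd.DataFrame) -> pd.DataFrame:
        """Linearly map normalized values in [-1, 1] to integers in [0, 254]."""
        # -1 -> 0, 0 -> 127, 1 -> 254; clip guards against out-of-range inputs
        return ((features + 1) * 127).round().clip(0, 254).astype(int)

For example, the normalized accelx value 0.714286 from the Normalize example maps to round(1.714286 * 127) = 218. Because the mapping is fixed, no per-feature statistics need to be stored with the model.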