Feature Generators

Feature generators operate on a segment of data to extract meaningful information. The outputs of all feature generators are combined into a feature vector.
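As a minimal illustration of the idea (plain pandas, outside any pipeline; the column and generator names here are hypothetical):

```python
import pandas as pd

# One segment of sensor data
segment = pd.DataFrame({'accelx': [-3, 3, 0, -2, 2],
                        'accely': [6, 7, 6, 8, 9]})

# Each "generator" maps the segment to one value per column;
# concatenating all outputs yields the feature vector.
features = {}
for name, fn in [('Mean', lambda s: s.mean()),
                 ('AbsMean', lambda s: s.abs().mean()),
                 ('Max', lambda s: s.max())]:
    for col in segment.columns:
        features[f'{col}{name}'] = fn(segment[col])

feature_vector = pd.Series(features)
print(feature_vector)
```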

Statistical

Absolute Mean

Computes the arithmetic mean of the absolute values of each column in ‘columns’ in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the absolute mean of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Mean',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxAbsMean  gen_0002_accelyAbsMean  gen_0003_accelzAbsMean
    0     s01                     2.0                     7.2                     5.8
Absolute Sum

Computes the sum of the absolute values of each column in ‘columns’ in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the absolute sum of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Sum',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxAbsSum  gen_0002_accelyAbsSum  gen_0003_accelzAbsSum
    0     s01                   10.0                   36.0                   29.0
Interquartile Range

The IQR (interquartile range) of a vector V is the difference between its 75th percentile and 25th percentile values.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the interquartile range of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Interquartile Range',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxIQR  gen_0002_accelyIQR  gen_0003_accelzIQR
    0     s01                 4.0                 2.0                 2.0
Kurtosis

Kurtosis is the degree of ‘peakedness’ or ‘tailedness’ of a distribution and relates to its shape. A high kurtosis corresponds to heavy tails and a sharply peaked distribution, whereas a low kurtosis corresponds to thin tails and a distribution concentrated around the mean. Kurtosis is calculated using Fisher’s method.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the kurtosis of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Kurtosis',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxKurtosis  gen_0002_accelyKurtosis  gen_0003_accelzKurtosis
    0     s01                -1.565089                -1.371972                -1.005478
Linear Regression Stats

Calculates a linear least-squares regression and returns the regression statistics: slope, intercept, r value, and standard error.

  • slope – slope of the regression line

  • intercept – intercept of the regression line

  • r value – correlation coefficient

  • StdErr – standard error of the estimated slope

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the linear regression statistics for each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({'Subject': ['s01'] * 10,'Class': ['Crawling'] * 10 ,'Rep': [1] * 10 })
>>> df["X"] = [i + 2 for i in range(10)]
>>> df["Y"] = [i for i in range(10)]
>>> df["Z"] = [1, 2, 3, 3, 5, 5, 7, 7, 9, 10]
>>> print(df)
    out:
      Subject     Class  Rep   X  Y   Z
    0     s01  Crawling    1   2  0   1
    1     s01  Crawling    1   3  1   2
    2     s01  Crawling    1   4  2   3
    3     s01  Crawling    1   5  3   3
    4     s01  Crawling    1   6  4   5
    5     s01  Crawling    1   7  5   5
    6     s01  Crawling    1   8  6   7
    7     s01  Crawling    1   9  7   7
    8     s01  Crawling    1  10  8   9
    9     s01  Crawling    1  11  9  10
>>> dsk.upload_dataframe('test_data', df, force=True)
>>> dsk.pipeline.reset(delete_cache=True)
>>> dsk.pipeline.set_input_data('test_data.csv',
                                group_columns=['Subject','Rep'],
                                label_column='Class',
                                data_columns=['X','Y','Z'])
>>> dsk.pipeline.add_feature_generator([{'name':'Linear Regression Stats',
                                         'params':{"columns": ['X','Y','Z'] }}])
>>> results, stats = dsk.pipeline.execute()
>>> print(results.T)
    out:
                                             0
    Rep                                      1
    Subject                                s01
    gen_0001_XLinearRegressionSlope          1
    gen_0001_XLinearRegressionIntercept      2
    gen_0001_XLinearRegressionR              1
    gen_0001_XLinearRegressionStdErr         0
    gen_0002_YLinearRegressionSlope          1
    gen_0002_YLinearRegressionIntercept      0
    gen_0002_YLinearRegressionR              1
    gen_0002_YLinearRegressionStdErr         0
    gen_0003_ZLinearRegressionSlope      0.982
    gen_0003_ZLinearRegressionIntercept  0.782
    gen_0003_ZLinearRegressionR          0.987
    gen_0003_ZLinearRegressionStdErr     0.056
Maximum

Computes the maximum of each column in ‘columns’ in the dataframe. The maximum of a vector V is the maximum value in V.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the maximum of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Maximum',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxmaximum  gen_0002_accelymaximum  gen_0003_accelzmaximum
    0     s01                     3.0                     9.0                     8.0
Mean

Computes the arithmetic mean of each column in columns in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the mean of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Mean',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxMean  gen_0002_accelyMean  gen_0003_accelzMean
    0     s01                  0.0                  7.2                  5.8
Median

The median of a vector V with N items is the middle value of a sorted copy of V (V_sorted). When N is even, it is the average of the two middle values in V_sorted.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the median of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Median',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxMedian  gen_0002_accelyMedian  gen_0003_accelzMedian
    0     s01                    0.0                    7.0                    6.0
Minimum

Computes the minimum of each column in ‘columns’ in the dataframe. The minimum of a vector V is the minimum value in V.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the minimum of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Minimum',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxminimum  gen_0002_accelyminimum  gen_0003_accelzminimum
    0     s01                    -3.0                     6.0                     3.0
Negative Zero Crossings

Computes the number of times the selected input crosses the mean+threshold and mean-threshold values with a negative slope. The threshold value is specified by the user. Crossing the mean value when the threshold is 0 only counts as a single crossing.

Parameters
  • columns – list of columns on which to apply the feature generator

  • threshold – value in addition to mean which must be crossed to count as a crossing

Returns

Returns data frame with the number of negative crossings for each specified column.

Return type

DataFrame
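Since this generator has no example above, here is an illustrative NumPy sketch of the counting rule (an assumption about the semantics described above, not the pipeline's actual implementation): a crossing is counted each time the signal passes from above mean+threshold to below mean-threshold.

```python
import numpy as np

def negative_zero_crossings(x, threshold=0):
    """Count transitions from above mean+threshold to below mean-threshold.

    Illustrative sketch only; the pipeline's implementation may differ.
    """
    x = np.asarray(x, dtype=float)
    upper = x.mean() + threshold
    lower = x.mean() - threshold
    crossings = 0
    above = x[0] > upper           # current state: above the upper band
    for v in x[1:]:
        if above and v < lower:    # fell through the band: one negative crossing
            crossings += 1
            above = False
        elif v > upper:
            above = True
    return crossings

print(negative_zero_crossings([-3, 3, 0, -2, 2]))  # mean is 0 -> 1 crossing
```

With threshold 0 the upper and lower bands coincide at the mean, so a pass through the mean is counted exactly once, matching the note above.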

25th Percentile

Computes the 25th percentile of each column in ‘columns’ in the dataframe. The q-th percentile of a vector V of length N is the q-th ranked value in a sorted copy of V. If the normalized ranking does not match q exactly, the value is interpolated from the two nearest values.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the 25th percentile of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'25th Percentile',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelx25Percentile  gen_0002_accely25Percentile  gen_0003_accelz25Percentile
    0     s01                         -2.0                          6.0                          5.0
75th Percentile

Computes the 75th percentile of each column in ‘columns’ in the dataframe. The q-th percentile of a vector V of length N is the q-th ranked value in a sorted copy of V. If the normalized ranking does not match q exactly, the value is interpolated from the two nearest values.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with 75th percentile of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'75th Percentile',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelx75Percentile  gen_0002_accely75Percentile  gen_0003_accelz75Percentile
    0     s01                          2.0                          8.0                          7.0
100th Percentile

Computes the 100th percentile of each column in ‘columns’ in the dataframe. The 100th percentile of a vector V is the maximum value in V.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns feature vector with 100th percentile (sample maximum) of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                        [-2, 8, 7], [2, 9, 6]],
                        columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'100th Percentile',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelx100Percentile  gen_0002_accely100Percentile  gen_0003_accelz100Percentile
    0     s01                           3.0                           9.0                           8.0
Positive Zero Crossings

Computes the number of times the selected input crosses the mean+threshold and mean-threshold values with a positive slope. The threshold value is specified by the user. Crossing the mean value when the threshold is 0 only counts as a single crossing.

Parameters
  • columns – list of columns on which to apply the feature generator

  • threshold – value in addition to mean which must be crossed to count as a crossing

Returns

Returns data frame with the number of positive crossings for each specified column.

Return type

DataFrame

Skewness

Skewness is a measure of the asymmetry of the distribution of a variable about its mean. The skewness value can be positive, negative, or undefined. A positive skew indicates that the tail on the right side is fatter than the tail on the left; a negative skew indicates the opposite.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the skewness of each specified column.

Return type

DataFrame

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
            [-2, 8, 7], [2, 9, 6]],
            columns=['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Skewness',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxSkew  gen_0002_accelySkew  gen_0003_accelzSkew
    0     s01                  0.0             0.363174            -0.395871
Standard Deviation

The standard deviation of a vector V with N items is a measure of the spread of the distribution. The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(abs(x - x.mean())**2)).

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the standard deviation of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Standard Deviation',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxStd  gen_0002_accelyStd  gen_0003_accelzStd
    0     s01            2.280351             1.16619            1.720465
Sum

Computes the sum of each column in ‘columns’ in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the sum of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Sum',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxSum  gen_0002_accelySum  gen_0003_accelzSum
    0     s01                 0.0                36.0                29.0
Variance

Computes the variance of desired column(s) in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the variance of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Variance',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxVariance  gen_0002_accelyVariance  gen_0003_accelzVariance
    0     s01                      6.5                      1.7                      3.7
Zero Crossings

Computes the number of times the selected input crosses the mean+threshold and mean-threshold values. The threshold value is specified by the user. Crossing the mean value when the threshold is 0 only counts as a single crossing.

Parameters
  • columns – list of columns on which to apply the feature generator

  • threshold – value in addition to mean which must be crossed to count as a crossing

Returns

Returns data frame with the number of crossings for each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Zero Crossings',
                             'params':{"columns": ['accelx', 'accely', 'accelz'],
                                       "threshold: 5}
                            }])
>>> result, stats = dsk.pipeline.execute()

Histogram

Histogram

Translates the data stream(s) from a segment into a feature vector in histogram space.

Parameters
  • column (list of strings) – name of the sensor streams to use

  • range_left (int) – the left limit (or the min) of the range for a fixed bin histogram

  • range_right (int) – the right limit (or the max) of the range for a fixed bin histogram

  • number_of_bins (int, optional) – the number of bins used for the histogram

  • scaling_factor (int, optional) – scaling factor used to fit for the device

Returns

feature vector in histogram space.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print(df)
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Histogram',
                             'params':{"columns": ['accelx','accely','accelz'],
                                       "range_left": 10,
                                       "range_right": 1000,
                                       "number_of_bins": 5,
                                       "scaling_factor": 254 }}])
>>> results, stats = dsk.pipeline.execute()
>>> print(results)
    out:
          Class  Rep Subject  gen_0000_hist_bin_000000  gen_0000_hist_bin_000001  gen_0000_hist_bin_000002  gen_0000_hist_bin_000003  gen_0000_hist_bin_000004
    0  Crawling    1     s01                       8.0                      38.0                      46.0                      69.0                       0.0
    1   Running    1     s01                      85.0                       0.0                       0.0                       0.0                      85.0
Histogram Auto Scale Range

Translates the data stream(s) from a segment into a feature vector in histogram space, where the range is determined by the minimum and maximum values of the data and the number of bins is set by the user.

Parameters
  • column (list of strings) – name of the sensor streams to use

  • number_of_bins (int, optional) – the number of bins used for the histogram

  • scaling_factor (int, optional) – scaling factor used to fit for the device

Returns

feature vector in histogram space.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print(df)
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Histogram',
                             'params':{"columns": ['accelx','accely','accelz'],
                                       "range_left": 10,
                                       "range_right": 1000,
                                       "number_of_bins": 5,
                                       "scaling_factor": 254 }}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0000_hist_bin_000000  gen_0000_hist_bin_000001  gen_0000_hist_bin_000002  gen_0000_hist_bin_000003  gen_0000_hist_bin_000004
    0  Crawling    1     s01                       8.0                      38.0                      46.0                      69.0                       0.0
    1   Running    1     s01                      85.0                       0.0                       0.0                       0.0                      85.0

Sampling

Downsample

This function takes the input_data dataframe, groups it by group_columns, and then, for each group, drops the passthrough_columns and performs downsampling on the remaining columns.

On each column, perform the following steps:

  • Divide the entire column into windows of size total length/new_length.

  • Calculate mean for each window

  • Concatenate all the mean values.

  • The length of the downsampled signal is equal to new_length.

All such means are then concatenated, giving new_length * (number of columns) values; these constitute features in downstream analyses. For instance, if there are three columns and new_length is 12, the total number of means is 12 * 3 = 36, each representing a feature.
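The windowed-mean steps above can be reproduced directly with NumPy; applied to the first ten Crawling accelx samples of the toy activity data, this yields the same feature values the pipeline produces (how the pipeline handles lengths that are not a multiple of new_length is not shown here):

```python
import numpy as np

def downsample(signal, new_length):
    """Downsample by taking the mean of equal-size windows (a sketch).

    Assumes len(signal) is a multiple of new_length; the pipeline's
    handling of remainders is an assumption left out here.
    """
    window = len(signal) // new_length
    trimmed = np.asarray(signal[:window * new_length], dtype=float)
    # One row per output sample; the row mean is the window mean.
    return trimmed.reshape(new_length, window).mean(axis=1)

# First ten Crawling accelx samples from the toy activity data
accelx = [377, 357, 333, 340, 372, 410, 450, 492, 518, 528]
features = downsample(accelx, 5)  # -> 367.0, 336.5, 391.0, 471.0, 523.0
```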

Parameters
  • columns – List of columns to be downsampled

  • new_length – integer; Downsampled length

Returns

DataFrame; downsampled dataframe

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Downsample',
                             'params':{"columns": ['accelx'],
                                       "new_length": 5}}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelx_0  gen_0001_accelx_1  gen_0001_accelx_2  gen_0001_accelx_3  gen_0001_accelx_4
    0  Crawling    1     s01              367.0              336.5              391.0              471.0              523.0
    1   Running    1     s01              -45.5              -41.5              -50.0              -64.0              -64.0
Downsample Average with Normalization

This function takes the input_data dataframe, groups it by group_columns, and then, for each group, drops the passthrough_columns and performs an averaging convolution (windowed mean) on the remaining columns.

On each column, perform the following steps:

  • Divide the entire column into windows of size total length/new_length.

  • Calculate mean for each window

  • Concatenate all the mean values into a feature vector of length new_length

  • Normalize the signal to be between 0-255

All such means are then concatenated, giving new_length * (number of columns) values; these constitute features in downstream analyses. For instance, if there are three columns and new_length is 12, the total number of means is 12 * 3 = 36, each representing a feature.
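A minimal sketch of the average-plus-normalization steps, assuming the 0-255 normalization is a per-column min-max scaling of the window means (the pipeline's exact normalization convention may differ):

```python
import numpy as np

def downsample_avg_normalized(signal, new_length):
    # Window means, exactly as in plain downsampling.
    window = len(signal) // new_length
    trimmed = np.asarray(signal[:window * new_length], dtype=float)
    means = trimmed.reshape(new_length, window).mean(axis=1)
    # Min-max normalize the feature vector into the 0-255 range
    # (an assumed convention; guard against a constant signal).
    span = means.max() - means.min()
    if span == 0:
        return np.zeros(new_length)
    return (means - means.min()) / span * 255

features = downsample_avg_normalized([3, 4, 5, 4, 3, 3, 4, 5, 4, 3], 5)
```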

Parameters
  • input_data – dataframe

  • columns – List of columns to be downsampled

  • group_columns (a list) – List of columns on which grouping is to be done. Each group will go through downsampling one at a time

  • new_length – integer; Downsampled length

  • **kwargs

Returns

DataFrame; convolution avg dataframe

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[3, 3], [4, 5], [5, 7], [4, 6],
                    [3, 1], [3, 1], [4, 3], [5, 5],
                    [4, 7], [3, 6]], columns=['accelx', 'accely'])
>>> df
Out:
   accelx  accely
0       3       3
1       4       5
2       5       7
3       4       6
4       3       1
5       3       1
6       4       3
7       5       5
8       4       7
9       3       6
>>> dsk.pipeline.reset()
>>> dsk.pipeline.set_input_data('test_data', df, force=True)
>>> dsk.pipeline.add_feature_generator(["Downsample Average with Normalization"],
         params = {"group_columns": []},
                   function_defaults={"columns":['accelx', 'accely'],
                                     'new_length' : 5})
>>> result, stats = dsk.pipeline.execute()
>>> print result
    Out:
           accelx_1  accelx_2  accelx_3  accelx_4  accelx_5  accely_1  accely_2
        0       3.5       4.5         3       4.5       3.5         4       6.5
           accely_3  accely_4  accely_5
        0         1         4       6.5
Downsample Max With Normalization

This function takes the input_data dataframe, groups it by group_columns, and then, for each group, drops the passthrough_columns and performs max downsampling on the remaining columns.

On each column, perform the following steps:

  • Divide the entire column into windows of size total length/new_length.

  • Calculate the maximum for each window

  • Concatenate all the max values into a feature vector of length new_length

  • Normalize the signal to be between 0-255

All such maxima are then concatenated, giving new_length * (number of columns) values; these constitute features in downstream analyses. For instance, if there are three columns and new_length is 12, the total number of values is 12 * 3 = 36, each representing a feature.

Parameters
  • input_data – dataframe

  • columns – List of columns to be downsampled

  • group_columns (a list) – List of columns on which grouping is to be done. Each group will go through downsampling one at a time

  • new_length – integer; Downsampled length

  • **kwargs

Returns

DataFrame; max downsampled dataframe

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[3, 3], [4, 5], [5, 7], [4, 6],
                    [3, 1], [3, 1], [4, 3], [5, 5],
                    [4, 7], [3, 6]], columns=['accelx', 'accely'])
>>> df
Out:
   accelx  accely
0       3       3
1       4       5
2       5       7
3       4       6
4       3       1
5       3       1
6       4       3
7       5       5
8       4       7
9       3       6
>>> dsk.pipeline.reset()
>>> dsk.pipeline.set_input_data('test_data', df, force=True)
>>> dsk.pipeline.add_feature_generator(["Downsample Max with Normalization"],
         params = {"group_columns": []},
                   function_defaults={"columns":['accelx', 'accely'],
                                     'new_length' : 5})
>>> result, stats = dsk.pipeline.execute()
>>> print result
    Out:
           accelx_1  accelx_2  accelx_3  accelx_4  accelx_5  accely_1  accely_2
        0       3.5       4.5         3       4.5       3.5         4       6.5
           accely_3  accely_4  accely_5
        0         1         4       6.5

Rate of Change

Mean Crossing Rate

Calculates the rate at which the mean value is crossed for each specified column. Works with grouped data. The total number of mean value crossings is found and then divided by the total number of samples to get the mean_crossing_rate.
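The computation can be sketched as follows; the same function also covers the zero and sigma crossing-rate variants later in this section, since they differ only in the reference level (the exact on-device crossing convention, strict sign change versus touching the level, is an assumption here):

```python
import numpy as np

def crossing_rate(signal, level=None):
    """Rate at which the signal crosses a reference level.

    With level=None the column mean is used (mean crossing rate);
    passing 0 gives the zero crossing rate, and mean +/- k*sigma
    gives the sigma variants.
    """
    x = np.asarray(signal, dtype=float)
    if level is None:
        level = x.mean()
    # A crossing is a sign change of the level-shifted signal.
    signs = np.sign(x - level)
    crossings = np.count_nonzero(np.diff(signs))
    return crossings / len(x)

# Crawling accelx samples from the toy activity data
accelx = [377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1]
rate = crossing_rate(accelx)  # 2 crossings / 11 samples ~= 0.181818
```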

Parameters

columns – List of all column names on which mean_crossing_rate is to be computed.

Returns

Return the number of mean crossings divided by the length of the signal.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Mean Crossing Rate',
                             'params':{"columns": ['accelx','accely', 'accelz']}
                            }])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelxMeanCrossingRate  gen_0002_accelyMeanCrossingRate  gen_0003_accelzMeanCrossingRate
    0  Crawling    1     s01                         0.181818                         0.090909                         0.090909
    1   Running    1     s01                         0.090909                         0.454545                         0.363636
Mean Difference

Calculate the mean difference of each specified column. Works with grouped data. For a given column, it finds the difference between the ith element and the (i-1)th element, and then takes the mean of these differences over the entire column.

mean(diff(arr)) = mean(arr[i] - arr[i-1]), for all 1 <= i <= n-1, where arr has n elements indexed from 0.
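Because the differences telescope, the mean difference reduces to (last - first) / (n - 1). A quick check against the accelx column of the example below:

```python
import numpy as np

# The accelx column from the example
signal = np.array([-3, 3, 0, -2, 2])

# Mean of successive differences...
mean_difference = np.diff(signal).mean()  # -> 1.25

# ...which telescopes to (last - first) / (n - 1)
telescoped = (signal[-1] - signal[0]) / (len(signal) - 1)
```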

Parameters

columns – List of all column names on which mean_difference is to be computed.

Returns

Returns the mean difference of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Mean Difference',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxMeanDifference  gen_0002_accelyMeanDifference  gen_0003_accelzMeanDifference
    0     s01                           1.25                           0.75                           0.25
Second Sigma Crossing Rate

Calculates the rate at which the second standard deviation value (second sigma) is crossed for each specified column. The total number of second sigma crossings is found and then divided by the total number of samples to get the second_sigma_crossing_rate.

Parameters

columns – List of all column names on which second_sigma_crossing_rate is to be computed.

Returns

Return the second sigma crossing rate.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Second Sigma Crossing Rate',
                             'params':{"columns": ['accelx','accely', 'accelz']}
                            }])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelx2ndSigmaCrossingRate  gen_0002_accely2ndSigmaCrossingRate  gen_0003_accelz2ndSigmaCrossingRate
    0  Crawling    1     s01                             0.090909                             0.090909                                  0.0
    1   Running    1     s01                             0.000000                             0.000000                                  0.0
Sigma Crossing Rate

Calculates the rate at which the standard deviation value (sigma) is crossed for each specified column. The total number of sigma crossings is found and then divided by the total number of samples to get the sigma_crossing_rate.

Parameters

columns – List of all column names on which sigma_crossing_rate is to be computed.

Returns

Return the sigma crossing rate.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Sigma Crossing Rate',
                             'params':{"columns": ['accelx','accely', 'accelz']}
                            }])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelxSigmaCrossingRate  gen_0002_accelySigmaCrossingRate  gen_0003_accelzSigmaCrossingRate
    0  Crawling    1     s01                          0.090909                               0.0                               0.0
    1   Running    1     s01                          0.000000                               0.0                               0.0
Threshold Crossing Rate

The total number of threshold crossings is found, and the number is divided by the total number of samples to get the threshold_crossing_rate.

Threshold With Offset Crossing Rate

The total number of threshold crossings is found, and the number is divided by the total number of samples to get the threshold_crossing_rate.

Zero Crossing Rate

Calculates the rate at which the zero value is crossed for each specified column. The total number of zero crossings is found and then divided by the total number of samples to get the zero_crossing_rate.

Parameters

columns – List of all column names on which zero_crossing_rate is to be computed.

Returns

A DataFrame containing the zero crossing rate for each column and the specified group_columns

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Zero Crossing Rate',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxZeroCrossingRate  gen_0002_accelyZeroCrossingRate  gen_0003_accelzZeroCrossingRate
    0     s01                              0.6                              0.0                              0.0

Frequency

Dominant Frequency

Calculate the dominant frequency for each specified signal. For each column, find the frequency at which the signal has the highest power.

Note: the current FFT length is 512; data longer than this will be truncated, and data shorter than this will be zero-padded.
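A minimal NumPy sketch of this computation, following the documented truncate-or-zero-pad behavior (frequencies here are in Hz; the pipeline's output units and scaling may differ):

```python
import numpy as np

def dominant_frequency(signal, sample_rate, fft_length=512):
    """Frequency of the highest-power FFT bin (a minimal sketch)."""
    # Truncate or zero-pad the segment to the FFT length.
    frame = np.zeros(fft_length)
    n = min(len(signal), fft_length)
    frame[:n] = signal[:n]
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(fft_length, d=1.0 / sample_rate)
    return freqs[np.argmax(power)]

# A 5 Hz sine sampled at 100 Hz; the dominant frequency lands on the
# FFT bin nearest 5 Hz (within one bin width due to zero-padding).
x = np.arange(100)
signal = 100 * np.sin(2 * np.pi * 5 * x / 100)
f = dominant_frequency(signal, 100)
```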

Parameters

columns – List of columns on which dominant_frequency needs to be calculated

Returns

DataFrame of dominant_frequency for each column and the specified group_columns

Examples

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> sample = 100
>>> df = pd.DataFrame()
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample/2) + ['1'] * (sample/2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Dominant Frequency',
                             'params':{"columns": ['accelx', 'accely', 'accelz' ],
                                      "sample_rate" : sample
                                      }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxDomFreq  gen_0002_accelyDomFreq  gen_0003_accelzDomFreq
    0      0     s01                    22.0                    28.0                    34.0
    1      1     s01                    22.0                    26.0                    52.0
Peak Frequencies

Calculate the peak frequencies for each specified signal. For each column, find the frequencies at which the signal has highest power.

Note: the current FFT length is 512; data longer than this will be truncated, and data shorter than this will be zero-padded.

The FFT is computed and the cutoff frequencies are converted to bin indices based on the following formulas:

fft_min_bin_index = (min_freq * FFT_length / sample_rate); fft_max_bin_index = (max_freq * FFT_length / sample_rate);
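Evaluated directly, the bin conversion looks like this; the 100 Hz sample rate and the 5-25 Hz search band are illustrative values only:

```python
FFT_LENGTH = 512

def freq_to_bin(freq, sample_rate, fft_length=FFT_LENGTH):
    # Per the formula above: bin = freq * FFT_length / sample_rate
    return int(freq * fft_length / sample_rate)

# e.g. at a 100 Hz sample rate, a 5-25 Hz band maps to bins 25-128
fft_min_bin_index = freq_to_bin(5, 100)    # 25
fft_max_bin_index = freq_to_bin(25, 100)   # 128
```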

Parameters
  • columns – List of columns on which peak frequencies need to be calculated

  • sample_rate – sample rate of the sensor data

  • window_type – type of window to apply (e.g. hanning)

  • num_peaks – the number of peaks to identify

  • min_frequency – the min frequency bound to look for peaks

  • max_frequency – the max frequency bound to look for peaks

  • threshold – the threshold value a peak must be above to be considered a peak

Returns

DataFrame of peak frequencies for each column and the specified group_columns

Spectral Entropy

Calculate the spectral entropy for each specified signal. For each column, first calculate the power spectrum, and then using the power spectrum, calculate the entropy in the spectral domain. Spectral entropy measures the spectral complexity of the signal.

Note: the current FFT length is 512; data longer than this will be truncated, and data shorter than this will be zero-padded.
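A minimal sketch of the computation; whether the pipeline uses a natural or base-2 logarithm, and how it treats empty bins, are assumptions here:

```python
import numpy as np

def spectral_entropy(signal, fft_length=512):
    """Shannon entropy of the normalized power spectrum (a sketch)."""
    # Truncate or zero-pad to the FFT length, as documented.
    frame = np.zeros(fft_length)
    n = min(len(signal), fft_length)
    frame[:n] = signal[:n]
    power = np.abs(np.fft.rfft(frame)) ** 2
    p = power / power.sum()   # normalize to a probability distribution
    p = p[p > 0]              # drop empty bins (0 * log 0 -> 0)
    return -(p * np.log(p)).sum()

# A pure tone concentrates spectral power (low entropy); white noise
# spreads it across all bins (high entropy).
x = np.arange(100)
tone = np.sin(2 * np.pi * 5 * x / 100)
noise = np.random.default_rng(0).standard_normal(100)
```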

Parameters

columns – List of all columns for which spectral_entropy is to be calculated

Returns

DataFrame of spectral_entropy for each column and the specified group_columns

Examples

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> sample = 100
>>> df = pd.DataFrame()
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample/2) + ['1'] * (sample/2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Spectral Entropy',
                             'params':{"columns": ['accelx', 'accely', 'accelz' ]}
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxSpecEntr  gen_0002_accelySpecEntr  gen_0003_accelzSpecEntr
    0      0     s01                  1.97852                 1.983631                 1.981764
    1      1     s01                  1.97852                 2.111373                 2.090683
MFCC

Translates the data stream(s) from a segment into a feature vector of Mel-Frequency Cepstral Coefficients (MFCC). The features are derived in the frequency domain and mimic the human auditory response.

Note: the current FFT length is 512; data longer than this will be truncated, and data shorter than this will be zero-padded.
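A self-contained sketch of a standard single-frame MFCC computation (triangular mel filterbank, log energies, DCT-II); the filter count and other details here are illustrative choices, not necessarily what the pipeline uses:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sample_rate, cepstra_count=13, fft_length=512,
         num_filters=26):
    # Power spectrum of the truncated / zero-padded frame.
    frame = np.zeros(fft_length)
    n = min(len(signal), fft_length)
    frame[:n] = signal[:n]
    spectrum = np.abs(np.fft.rfft(frame)) ** 2

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                          num_filters + 2)
    bins = np.floor((fft_length + 1) * mel_to_hz(mel_pts)
                    / sample_rate).astype(int)
    fbank = np.zeros((num_filters, fft_length // 2 + 1))
    for i in range(num_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fbank[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i, k] = (right - k) / max(right - center, 1)

    # Log filterbank energies, then a DCT-II to decorrelate them.
    energies = np.log(fbank @ spectrum + 1e-10)
    j = np.arange(num_filters)
    dct = np.cos(np.pi * np.outer(np.arange(cepstra_count), 2 * j + 1)
                 / (2 * num_filters))
    return dct @ energies

# Example frame: a 5 Hz tone sampled at 100 Hz.
x = np.arange(100)
coeffs = mfcc(100 * np.sin(2 * np.pi * 5 * x / 100), 100)
```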

Parameters
  • columns (list of strings) – names of the sensor streams to use

  • sample_rate (int) – sampling rate

  • cepstra_count (int) – number of coefficients to generate

Returns

feature vector of MFCC coefficients.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'MFCC', 'params':{"columns": ['accelx'],
                                                      "sample_rate": 10,
                                                      "cepstra_count": 23 }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxmfcc_000000  gen_0001_accelxmfcc_000001 ... gen_0001_accelxmfcc_000021  gen_0001_accelxmfcc_000022
    0     s01                    131357.0                    -46599.0 ...                      944.0                       308.0

Shape

Global Peak to Peak of High Frequency

Global peak to peak of the high-frequency signal. The high-frequency signal is calculated by subtracting the moving average filter output from the original signal.
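The feature can be sketched as a moving-average split followed by a peak-to-peak of the residual; the window alignment and edge handling below are assumptions, not the on-device implementation:

```python
import numpy as np

def global_p2p_high_frequency(signal, smoothing_factor=5):
    """Peak-to-peak of the high-frequency residual (a minimal sketch)."""
    x = np.asarray(signal, dtype=float)
    k = smoothing_factor
    # Low-frequency component: moving average of width k (valid region
    # only, to avoid zero-padded edges).
    low = np.convolve(x, np.ones(k) / k, mode='valid')
    # Subtract the centered moving average to get the high-frequency
    # residual, then take its global peak-to-peak.
    high = x[k // 2 : k // 2 + len(low)] - low
    return high.max() - high.min()

p2p_flat = global_p2p_high_frequency(np.ones(30))         # ~0: no residual
p2p_wavy = global_p2p_high_frequency(100 * np.sin(np.arange(30)))
```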

Parameters
  • smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency. The number of elements in individual columns should be at least three times the smoothing factor.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of global p2p high frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> sample = 100
>>> df = pd.DataFrame()
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample/2) + ['1'] * (sample/2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Global Peak to Peak of High Frequency',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxMaxP2PGlobalAC  gen_0002_accelyMaxP2PGlobalAC  gen_0003_accelzMaxP2PGlobalAC
    0      0     s01                            3.6                            7.8                      86.400002
    1      1     s01                            3.6                            7.8                     165.000000
Global Peak to Peak of Low Frequency

Global peak to peak of the low-frequency signal. The low-frequency signal is calculated by applying a moving average filter with a smoothing factor.

Parameters
  • smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of global p2p low frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> sample = 100
>>> df = pd.DataFrame()
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample/2) + ['1'] * (sample/2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Global Peak to Peak of Low Frequency',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxMaxP2PGlobalDC  gen_0002_accelyMaxP2PGlobalDC  gen_0003_accelzMaxP2PGlobalDC
    0      0     s01                     195.600006                     191.800003                     187.000000
    1      1     s01                     195.600006                     191.800003                     185.800003
Max Peak to Peak of first half of High Frequency

Max Peak to Peak of first half of High Frequency. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.

Parameters
  • smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of max p2p half high frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> import pandas as pd
>>> sample = 100
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample // 2) + ['1'] * (sample // 2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Max Peak to Peak of first half of High Frequency',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxMaxP2P1stHalfAC  gen_0002_accelyMaxP2P1stHalfAC  gen_0003_accelzMaxP2P1stHalfAC
    0      0     s01                             1.8                             7.0                             1.8
    1      1     s01                             1.8                             7.0                            20.0
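A minimal sketch of the same idea, assuming the high frequency signal is the original minus a rolling mean (the exact windowing used by the built-in generator may differ):

```python
import numpy as np
import pandas as pd

def max_p2p_first_half_high_freq(x, smoothing_factor=5):
    # High-frequency (AC) component: signal minus its moving average
    s = pd.Series(x, dtype=float)
    high = s - s.rolling(smoothing_factor, min_periods=1).mean()
    first_half = high.iloc[: len(high) // 2]
    return first_half.max() - first_half.min()

# 5 Hz sine over 100 samples
x = 100 * np.sin(2 * np.pi * 5 * np.arange(100) / 100.0)
val = max_p2p_first_half_high_freq(x)
```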
Global Min Max Sum

Computes the sum of the maximum and minimum values of the signal. This quantity is also used as the 'min max amplitude difference'.

Parameters

columns – (list of str): Set of columns on which to apply the feature generator

Returns

DataFrame of min max sum for each column and the specified group_columns

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
                    columns=['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Global Min Max Sum',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Subject  gen_0001_accelxMinMaxSum  gen_0002_accelyMinMaxSum  gen_0003_accelzMinMaxSum
    0     s01                       0.0                      15.0                      11.0
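The feature itself reduces to one line per column; a sketch reproducing the values above:

```python
import numpy as np

def min_max_sum(x):
    # Sum of the signal's maximum and minimum values
    return np.max(x) + np.min(x)

accelx = np.array([-3, 3, 0, -2, 2])
accely = np.array([6, 7, 6, 8, 9])
accelz = np.array([5, 8, 3, 7, 6])
# min_max_sum(accelx) -> 0, accely -> 15, accelz -> 11
```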
Global Peak to Peak

Computes the global peak-to-peak (maximum minus minimum) of the signal.

Parameters

columns – (list of str): Set of columns on which to apply the feature generator

Returns

DataFrame of peak to peak for each column and the specified group_columns

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
                    columns=['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Global Peak to Peak',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxP2P  gen_0002_accelyP2P  gen_0003_accelzP2P
    0     s01                 6.0                 3.0                 5.0
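Peak to peak is simply the range of each column; a sketch reproducing the values above:

```python
import numpy as np

def global_p2p(x):
    # Maximum minus minimum of the signal
    return np.max(x) - np.min(x)

accelx = np.array([-3, 3, 0, -2, 2])
accely = np.array([6, 7, 6, 8, 9])
accelz = np.array([5, 8, 3, 7, 6])
# global_p2p(accelx) -> 6, accely -> 3, accelz -> 5
```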
Shape Absolute Median Difference

Computes the absolute value of the difference in median between the first and second half of a signal

Parameters
  • columns – list of columns on which to apply the feature generator

  • center_ratio – ratio of the signal to be on the first half to second half

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Shape Absolute Median Difference',
                             'params':{"columns": ['accelx', 'accely', 'accelz'],
                                    "center_ratio": 0.5}
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxShapeAbsoluteMedianDifference  gen_0002_accelyShapeAbsoluteMedianDifference  gen_0003_accelzShapeAbsoluteMedianDifference
    0     s01
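A sketch of the computation, assuming center_ratio sets the split index as a simple truncating fraction of the segment length (the built-in rounding rule is an assumption):

```python
import numpy as np

def shape_abs_median_difference(x, center_ratio=0.5):
    x = np.asarray(x, dtype=float)
    split = int(len(x) * center_ratio)  # assumed split rule
    # Absolute difference of the two halves' medians
    return abs(np.median(x[:split]) - np.median(x[split:]))

accelx = np.array([-3, 3, 0, -2, 2])
val = shape_abs_median_difference(accelx)  # medians 0.0 and 0.0 -> 0.0
```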
Difference of Peak to Peak of High Frequency between two halves

Difference of peak to peak of high frequency between two halves. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.

Parameters
  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency. The number of elements in individual columns should be at least three times the smoothing factor.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of difference high frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> import pandas as pd
>>> sample = 100
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample // 2) + ['1'] * (sample // 2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Difference of Peak to Peak of High Frequency between two halves',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelzACDiff
    0      0     s01              -5.199997
    1      1     s01              13.000000
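A sketch under the same assumption (high frequency = signal minus rolling mean); a flat first half and an oscillating second half give a negative difference:

```python
import numpy as np
import pandas as pd

def p2p_high_freq_half_difference(x, smoothing_factor=5):
    s = pd.Series(x, dtype=float)
    # High-frequency component: signal minus moving-average output
    high = s - s.rolling(smoothing_factor, min_periods=1).mean()
    half = len(high) // 2
    first = high.iloc[:half].max() - high.iloc[:half].min()
    second = high.iloc[half:].max() - high.iloc[half:].min()
    return first - second

# Flat first half, 5-cycle sine second half
x = np.concatenate([np.zeros(50),
                    100 * np.sin(2 * np.pi * 5 * np.arange(50) / 50.0)])
diff = p2p_high_freq_half_difference(x)
```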
Shape Median Difference

Computes the difference in median between the first and second half of a signal

Parameters
  • columns – list of columns on which to apply the feature generator

  • center_ratio – ratio of the signal to be on the first half to second half

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Shape Median Difference',
                             'params':{"columns": ['accelx', 'accely', 'accelz'],
                                        "center_ratio": 0.5}
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxShapeMedianDifference  gen_0002_accelyShapeMedianDifference  gen_0003_accelzShapeMedianDifference
    0     s01
Ratio of Peak to Peak of High Frequency between two halves

Ratio of peak to peak of high frequency between two halves. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.

Parameters
  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency. The number of elements in individual columns should be at least three times the smoothing factor.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of ratio high frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> import pandas as pd
>>> sample = 100
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample // 2) + ['1'] * (sample // 2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Ratio of Peak to Peak of High Frequency between two halves',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelzACRatio
    0      0     s01                3.888882
    1      1     s01                0.350000

Time

Abs Percent Time Over Threshold

Percentage of samples in the series whose absolute value is above the offset

Average Time Over Threshold

Average time spent above the threshold, taken across all threshold crossings.

Percent Time Over Second Sigma

Percentage of samples in the series that are above the sample mean + two sigma

Parameters

columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Percent Time Over Second Sigma',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelxPctTimeOver2ndSigma  gen_0002_accelyPctTimeOver2ndSigma  gen_0003_accelzPctTimeOver2ndSigma
    0  Crawling    1     s01                                 0.0                            0.090909                            0.090909
    1   Running    1     s01                                 0.0                            0.000000                            0.000000
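The Crawling accely value above can be reproduced directly: only the outlier 2558 lies above mean + 2 sigma (the population standard deviation is assumed here):

```python
import numpy as np

def pct_time_over_two_sigma(x):
    x = np.asarray(x, dtype=float)
    # np.std defaults to the population standard deviation (ddof=0)
    threshold = x.mean() + 2 * x.std()
    return np.mean(x > threshold)

accely = np.array([569, 594, 638, 678, 708, 733, 733, 696, 677, 695, 2558])
pct = pct_time_over_two_sigma(accely)  # 1 of 11 samples -> 0.0909...
```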
Percent Time Over Sigma

Percentage of samples in the series that are above the sample mean + one sigma

Parameters

columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Percent Time Over Sigma',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
        Class  Rep Subject  gen_0001_accelxPctTimeOverSigma  gen_0002_accelyPctTimeOverSigma  gen_0003_accelzPctTimeOverSigma
  0  Crawling    1     s01                         0.181818                         0.090909                         0.090909
  1   Running    1     s01                         0.272727                         0.090909                         0.272727
Percent Time Over Threshold

Percentage of samples in the series that are above threshold

Parameters

columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Percent Time Over Threshold',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> results, stats = dsk.pipeline.execute()
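The pipeline example does not show a threshold value or printed output; a standalone sketch with a hypothetical threshold of 400 (the threshold value and the strict greater-than comparison are assumptions):

```python
import numpy as np

def pct_time_over_threshold(x, threshold):
    # Fraction of samples strictly above the threshold
    return np.mean(np.asarray(x) > threshold)

accelx = np.array([377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1])
pct = pct_time_over_threshold(accelx, 400)  # 5 of 11 samples
```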
Percent Time Over Zero

Percentage of samples in the series that are positive.

Parameters

columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Percent Time Over Zero',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
        Class  Rep Subject  gen_0001_accelxPctTimeOverZero  gen_0002_accelyPctTimeOverZero  gen_0003_accelzPctTimeOverZero
  0  Crawling    1     s01                        0.909091                             1.0                             1.0
  1   Running    1     s01                        0.000000                             0.0                             1.0
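The Crawling accelx value above (10 of 11 samples positive) follows from a one-line sketch (strictly greater than zero is assumed; how the built-in generator treats exact zeros is not specified):

```python
import numpy as np

def pct_time_over_zero(x):
    # Fraction of samples that are strictly positive
    return np.mean(np.asarray(x) > 0)

accelx = np.array([377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1])
pct = pct_time_over_zero(accelx)  # 10/11 -> 0.909091
```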
Duration of the Signal

Duration of the signal. It is calculated by dividing the length of vector by the sampling rate.

Parameters
  • sample_rate – float; Sampling rate

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Duration of the Signal',
                             'params':{"columns": ['accelx'] ,
                                       "sample_rate": 10
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Subject  gen_0001_accelxDurSignal
     0     s01                       0.5
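The result follows directly from the definition: 5 samples at 10 Hz give 0.5 seconds.

```python
def duration_of_signal(x, sample_rate):
    # Number of samples divided by the sampling rate, in seconds
    return len(x) / float(sample_rate)

dur = duration_of_signal([-3, 3, 0, -2, 2], 10)  # -> 0.5
```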

Area

Absolute Area

Absolute area of the signal. Absolute area = sum(abs(signal(t)) dt), where abs(signal(t)) is absolute signal value at time t, and dt is sampling time (dt = 1/sample_rate).

Parameters
  • sample_rate – Sampling rate of the signal

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Area',
                             'params':{"columns": ['accelx', 'accely', 'accelz'],
                                       "sample_rate": 10 }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxAbsArea  gen_0002_accelyAbsArea  gen_0003_accelzAbsArea
    0     s01                     1.0                     3.6                     2.9
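The values above follow from the stated formula; a sketch:

```python
import numpy as np

def absolute_area(x, sample_rate):
    # sum(|signal(t)|) * dt, with dt = 1 / sample_rate
    return np.sum(np.abs(x)) / float(sample_rate)

accelx = np.array([-3, 3, 0, -2, 2])
accely = np.array([6, 7, 6, 8, 9])
accelz = np.array([5, 8, 3, 7, 6])
# absolute_area(accelx, 10) -> 1.0, accely -> 3.6, accelz -> 2.9
```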
Absolute Area of High Frequency

Absolute area of high frequency components of the signal. It calculates absolute area by applying a moving average filter on the signal with a smoothing factor and subtracting the filtered signal from the original.

Parameters
  • sample_rate – float; Sampling rate of the signal

  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Area of High Frequency',
                             'params':{"sample_rate": 10,
                                       "smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxAbsAreaAc  gen_0002_accelyAbsAreaAc  gen_0003_accelzAbsAreaAc
    0     s01                 76.879997                800.099976                470.160004
Absolute Area of Low Frequency

Absolute area of low frequency components of the signal. It calculates absolute area by first applying a moving average filter on the signal with a smoothing factor.

Parameters
  • sample_rate – float; Sampling rate of the signal

  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Area of Low Frequency',
                             'params':{"sample_rate": 10,
                                       "smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
Absolute Area of Spectrum

Absolute area of spectrum.

Parameters
  • sample_rate – Sampling rate of the signal

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Area of Spectrum',
                             'params':{"sample_rate": 10,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxAbsAreaSpec  gen_0002_accelyAbsAreaSpec  gen_0003_accelzAbsAreaSpec
    0     s01                       260.0                      2660.0                      1830.0
Total Area

Total area under the signal. Total area = sum(signal(t)*dt), where signal(t) is signal value at time t, and dt is sampling time (dt = 1/sample_rate).

Parameters
  • sample_rate – Sampling rate of the signal

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Total Area',
                             'params':{"sample_rate": 10,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxTotArea  gen_0002_accelyTotArea  gen_0003_accelzTotArea
    0     s01                    0.0                     3.6                     2.9
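Unlike Absolute Area, positive and negative samples cancel here, which is why accelx sums to zero; a sketch:

```python
import numpy as np

def total_area(x, sample_rate):
    # sum(signal(t)) * dt, with dt = 1 / sample_rate
    return np.sum(x) / float(sample_rate)

accelx = np.array([-3, 3, 0, -2, 2])
accely = np.array([6, 7, 6, 8, 9])
accelz = np.array([5, 8, 3, 7, 6])
# total_area(accelx, 10) -> 0.0, accely -> 3.6, accelz -> 2.9
```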
Total Area of High Frequency

Total area of high frequency components of the signal. It calculates total area by applying a moving average filter on the signal with a smoothing factor and subtracting the filtered signal from the original.

Parameters
  • sample_rate – float; Sampling rate of the signal

  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Total Area of High Frequency',
                             'params':{"sample_rate": 10,
                                       "smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxTotAreaAc  gen_0002_accelyTotAreaAc  gen_0003_accelzTotAreaAc
    0     s01                       0.0                      0.12                      0.28
Total Area of Low Frequency

Total area of the low-frequency components of the signal. It calculates the total area by first applying a moving average filter to the signal with a given smoothing factor.

Parameters
  • sample_rate – float; Sampling rate of the signal

  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Total Area of Low Frequency',
                             'params':{"sample_rate": 10,
                                       "smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxTotAreaDc  gen_0002_accelyTotAreaDc  gen_0003_accelzTotAreaDc
    0     s01                       0.0                      0.72                      0.58

Energy

Average Demeaned Energy

Average Demeaned Energy.

  1. Demean each input column by subtracting its column average element-wise.

  2. Sum the squared components across each column for the total demeaned energy per sample.

  3. Take the average of the sum of squares to get the average demeaned energy.
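The steps above can be sketched in NumPy; on the example segment used below this reproduces the documented value of 9.52 (a sketch of the stated math, not the library implementation):

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# 1. Demean each column by its own mean.
demeaned = data - data.mean(axis=0)

# 2. Sum the squared components across columns for each sample.
per_sample = (demeaned ** 2).sum(axis=1)

# 3. Average over all samples.
avg_demeaned_energy = per_sample.mean()  # 47.6 / 5 = 9.52
```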

Parameters

columns – List of str; A list of all column names on which the average demeaned energy is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Average Demeaned Energy',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_AvgDemeanedEng
    0     s01                     9.52
Average Energy

Average Energy.

  1. Calculate the element-wise square of the input columns.

  2. Sum the squared components across each column for the total energy per sample.

  3. Take the average of the sum of squares to get the average energy.

\[\frac{1}{N}\sum_{i=1}^{N}x_{i}^2+y_{i}^2+..n_{i}^2\]
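The formula can be sketched directly in NumPy; on the example segment used below this reproduces the documented value of 95.0 (a sketch of the stated math, not the library implementation):

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# Element-wise square, sum across axes per sample, then average.
avg_energy = (data ** 2).sum(axis=1).mean()  # 475 / 5 = 95.0
```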
Parameters

columns – List of str; A list of all column names on which average_energy is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Average Energy',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_AvgEng
    0     s01             95.0
Total Energy

Total Energy.

  1. Calculate the element-wise square of the input columns.

  2. Sum the squared values over all samples and streams to get the total energy.
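The steps above can be sketched in NumPy; on the example segment used below this reproduces the documented value of 475.0 (a sketch of the stated math, not the library implementation):

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# Element-wise square, then sum over all samples and streams.
total_energy = (data ** 2).sum()  # 475.0
```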

Parameters

columns – List of str; A list of all column names on which the total energy is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Total Energy',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_TotEng
    0     s01            475.0

Physical

Average of Movement Intensity

Calculates the average movement intensity, defined by:

\[\frac{1}{N}\sum_{i=1}^{N} \sqrt{x_{i}^2 + y_{i}^2 + .. n_{i}^2}\]
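The formula can be sketched in NumPy. In floating point it gives approximately 9.59 on the example segment below; the pipeline output of 9.0 suggests the on-device implementation uses integer or fixed-point arithmetic, so exact values may differ.

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# Per-sample intensity: Euclidean norm across the sensor axes.
intensity = np.sqrt((data ** 2).sum(axis=1))

# Average over the segment (floating-point value, ~9.59).
avg_intensity = intensity.mean()
```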
Parameters

columns (list) – list of columns on which to calculate average movement intensity.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Average of Movement Intensity',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_AvgInt
    0     s01         9.0
Average Signal Magnitude Area

Average signal magnitude area.

\[\frac{1}{N}\sum_{i=1}^{N} {x_{i} + y_{i} + .. n_{i}}\]
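The formula can be sketched directly in NumPy; on the example segment used below this reproduces the documented value of 13.0 (a sketch of the stated math, not the library implementation):

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# Sum across the axes for each sample, then average over the segment.
avg_sig_mag = data.sum(axis=1).mean()  # (8 + 18 + 9 + 13 + 17) / 5 = 13.0
```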
Parameters

columns – List of str; A list of all column names on which the average signal magnitude area is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':"Average Signal Magnitude Area",
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_AvgSigMag
    0     s01                13.0
Variance of Movement Intensity

Variance of movement intensity

Parameters

columns – List of str; A list of all column names on which the variance of movement intensity is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Variance of Movement Intensity',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_VarInt
    0     s01         3.082455
library.core_functions.feature_generators.fg_physical.magnitude(input_data, input_columns)

Computes the magnitude of each column in a dataframe.

Sensor Fusion

Abs Max Column

Returns the index of the column with the max abs value for each segment.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector with index of max abs value column.

Return type

DataFrame

Cross Column Correlation

Compute the correlation of the slopes between two columns.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

  • sample_frequency (int) – frequency to sample correlation at. Default 1 which is every sample

Returns

feature vector of slope correlations

Return type

DataFrame
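A minimal NumPy sketch, assuming "slope" means the first difference of each stream and the default sample_frequency of 1 (the library's exact slope and subsampling conventions may differ):

```python
import numpy as np

# The two accelerometer streams from the example data used throughout
# this section.
x = np.array([-3, 3, 0, -2, 2], dtype=float)
y = np.array([6, 7, 6, 8, 9], dtype=float)

# Slopes: first differences of each stream (sample_frequency = 1,
# i.e. a slope at every sample).
sx, sy = np.diff(x), np.diff(y)

# Pearson correlation of the two slope sequences.
corr = float(np.corrcoef(sx, sy)[0, 1])
```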

Max Column

Returns the index of the column with the max value for each segment.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector with index of max column.

Return type

DataFrame

Cross Column Mean Crossing Rate

Compute the crossing rate of column 2 over the mean of column 1.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use (requires 2 inputs)

Returns

feature vector mean crossing rate

Return type

DataFrame
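A minimal sketch of the idea using hypothetical data; the exact crossing convention (strict vs. non-strict inequality, normalization) is implementation-specific:

```python
import numpy as np

# Hypothetical two-column segment.
col1 = np.array([1, 2, 3, 4, 5], dtype=float)
col2 = np.array([5, 1, 4, 2, 6], dtype=float)

threshold = col1.mean()                # mean of column 1 -> 3.0
above = col2 > threshold               # where column 2 sits relative to it
# Count transitions across the threshold.
crossings = int(np.count_nonzero(np.diff(above.astype(int))))  # 4
crossing_rate = crossings / len(col2)  # 0.8
```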

Cross Column Mean Crossing with Offset

Compute the crossing rate of column 2 over the mean of column 1, with an offset.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use (requires 2 inputs)

Returns

feature vector mean crossing rate

Return type

DataFrame

Two Column Mean Difference

Compute the mean difference between two columns.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector mean difference

Return type

DataFrame

Two Column Median Difference

Compute the median difference between two columns.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector median difference

Return type

DataFrame

Min Column

Returns the index of the column with the min value for each segment.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector with index of min value column.

Return type

DataFrame

Two Column Min Max Difference

Compute the min max difference between two columns. Computes the location of the min value for each of the two columns; at whichever of the two locations is larger, it computes the difference between the two columns at that index.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector difference of two columns

Return type

DataFrame
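The description is compact; under one plausible reading (take the later of the two min locations and difference the columns at that index), a hypothetical sketch:

```python
import numpy as np

# Hypothetical segment data for two sensor streams.
c1 = np.array([4, 1, 7, 3, 5], dtype=float)
c2 = np.array([2, 6, 0, 8, 4], dtype=float)

# Location of the min value in each column.
i1, i2 = int(np.argmin(c1)), int(np.argmin(c2))  # 1 and 2

# Take whichever location is larger, and difference the columns there.
idx = max(i1, i2)           # 2
diff = c1[idx] - c2[idx]    # 7 - 0 = 7.0
```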

Two Column Peak To Peak Difference

Compute the max value for each column, then subtract the max of the first column from that of the second.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector peak to peak difference

Return type

DataFrame

Two Column Peak Location Difference

Computes the location of the maximum value for each column and then finds the difference between those two points.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector peak location difference

Return type

DataFrame
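A minimal sketch using hypothetical data; the sign convention (second column's peak location minus the first's) is an assumption:

```python
import numpy as np

# Hypothetical segment data for two sensor streams.
c1 = np.array([0, 5, 2, 1, 3], dtype=float)
c2 = np.array([1, 0, 2, 6, 4], dtype=float)

# Index of the maximum value in each column.
p1, p2 = int(np.argmax(c1)), int(np.argmax(c2))  # 1 and 3

# Difference between the two peak locations (sign convention assumed).
peak_location_difference = p2 - p1  # 2
```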