Feature Generators

Feature generators operate on a segment of data to extract meaningful information. The outputs of all feature generators are combined into a feature vector.
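As a minimal illustration of the idea (plain pandas, outside any pipeline; the column and generator names here are hypothetical):

```python
import pandas as pd

# One segment of sensor data
segment = pd.DataFrame({'accelx': [-3, 3, 0, -2, 2],
                        'accely': [6, 7, 6, 8, 9]})

# Each "generator" maps the segment to one value per column;
# concatenating all outputs yields the feature vector.
features = {}
for name, fn in [('Mean', lambda s: s.mean()),
                 ('AbsMean', lambda s: s.abs().mean()),
                 ('Max', lambda s: s.max())]:
    for col in segment.columns:
        features[f'{col}{name}'] = fn(segment[col])

feature_vector = pd.Series(features)
print(feature_vector)
```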

Statistical

Absolute Mean

Computes the arithmetic mean of the absolute values of each column in ‘columns’ in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the absolute mean of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Mean',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxAbsMean  gen_0002_accelyAbsMean  gen_0003_accelzAbsMean
    0     s01                     2.0                     7.2                     5.8
Absolute Sum

Computes the sum of the absolute values of each column in ‘columns’ in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the absolute sum of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Sum',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxAbsSum  gen_0002_accelyAbsSum  gen_0003_accelzAbsSum
    0     s01                   10.0                   36.0                   29.0
Interquartile Range

The IQR (interquartile range) of a vector V is the difference between its 75th percentile and 25th percentile values.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the interquartile range of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Interquartile Range',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxIQR  gen_0002_accelyIQR  gen_0003_accelzIQR
    0     s01                 4.0                 2.0                 2.0
Kurtosis

Kurtosis is the degree of ‘peakedness’ or ‘tailedness’ of a distribution and relates to its shape. A high kurtosis corresponds to heavy tails and a sharply peaked distribution, whereas a low kurtosis corresponds to thin tails and a distribution concentrated around the mean. Kurtosis is calculated using Fisher’s method.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the kurtosis of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Kurtosis',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxKurtosis  gen_0002_accelyKurtosis  gen_0003_accelzKurtosis
    0     s01                -1.565089                -1.371972                -1.005478
Linear Regression Stats

Calculates a linear least-squares regression and returns the regression statistics: slope, intercept, r value, and standard error.

  • slope – slope of the regression line

  • intercept – intercept of the regression line

  • r value – correlation coefficient

  • StdErr – standard error of the estimated slope

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the linear regression statistics for each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({'Subject': ['s01'] * 10,'Class': ['Crawling'] * 10 ,'Rep': [1] * 10 })
>>> df["X"] = [i + 2 for i in range(10)]
>>> df["Y"] = [i for i in range(10)]
>>> df["Z"] = [1, 2, 3, 3, 5, 5, 7, 7, 9, 10]
>>> print(df)
    out:
      Subject     Class  Rep   X  Y   Z
    0     s01  Crawling    1   2  0   1
    1     s01  Crawling    1   3  1   2
    2     s01  Crawling    1   4  2   3
    3     s01  Crawling    1   5  3   3
    4     s01  Crawling    1   6  4   5
    5     s01  Crawling    1   7  5   5
    6     s01  Crawling    1   8  6   7
    7     s01  Crawling    1   9  7   7
    8     s01  Crawling    1  10  8   9
    9     s01  Crawling    1  11  9  10
>>> dsk.upload_dataframe('test_data', df, force=True)
>>> dsk.pipeline.reset(delete_cache=True)
>>> dsk.pipeline.set_input_data('test_data.csv',
                                group_columns=['Subject','Rep'],
                                label_column='Class',
                                data_columns=['X','Y','Z'])
>>> dsk.pipeline.add_feature_generator([{'name':'Linear Regression Stats',
                                         'params':{"columns": ['X','Y','Z'] }}])
>>> results, stats = dsk.pipeline.execute()
>>> print(results.T)
    out:
                                             0
    Rep                                      1
    Subject                                s01
    gen_0001_XLinearRegressionSlope          1
    gen_0001_XLinearRegressionIntercept      2
    gen_0001_XLinearRegressionR              1
    gen_0001_XLinearRegressionStdErr         0
    gen_0002_YLinearRegressionSlope          1
    gen_0002_YLinearRegressionIntercept      0
    gen_0002_YLinearRegressionR              1
    gen_0002_YLinearRegressionStdErr         0
    gen_0003_ZLinearRegressionSlope      0.982
    gen_0003_ZLinearRegressionIntercept  0.782
    gen_0003_ZLinearRegressionR          0.987
    gen_0003_ZLinearRegressionStdErr     0.056
Maximum

Computes the maximum of each column in ‘columns’ in the dataframe. The maximum of a vector V is the maximum value in V.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the maximum of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Maximum',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxmaximum  gen_0002_accelymaximum  gen_0003_accelzmaximum
    0     s01                     3.0                     9.0                     8.0
Mean

Computes the arithmetic mean of each column in columns in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the mean of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Mean',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxMean  gen_0002_accelyMean  gen_0003_accelzMean
    0     s01                  0.0                  7.2                  5.8
Median

The median of a vector V with N items is the middle value of a sorted copy of V (V_sorted). When N is even, it is the average of the two middle values in V_sorted.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the median of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Median',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxMedian  gen_0002_accelyMedian  gen_0003_accelzMedian
    0     s01                    0.0                    7.0                    6.0
Minimum

Computes the minimum of each column in ‘columns’ in the dataframe. The minimum of a vector V is the minimum value in V.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the minimum of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Minimum',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxminimum  gen_0002_accelyminimum  gen_0003_accelzminimum
    0     s01                    -3.0                     6.0                     3.0
Negative Zero Crossings

Computes the number of times the selected input crosses the mean+threshold and mean-threshold values with a negative slope. The threshold value is specified by the user. Crossing the mean value when the threshold is 0 only counts as a single crossing.

Parameters
  • columns – list of columns on which to apply the feature generator

  • threshold – value in addition to mean which must be crossed to count as a crossing

Returns

Returns data frame with the number of negative crossings for each specified column.

Return type

DataFrame
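Since this generator has no example above, here is an illustrative NumPy sketch of the counting rule (an assumption about the semantics described above, not the pipeline's actual implementation): a crossing is counted each time the signal passes from above mean+threshold to below mean-threshold.

```python
import numpy as np

def negative_zero_crossings(x, threshold=0):
    """Count transitions from above mean+threshold to below mean-threshold.

    Illustrative sketch only; the pipeline's implementation may differ.
    """
    x = np.asarray(x, dtype=float)
    upper = x.mean() + threshold
    lower = x.mean() - threshold
    crossings = 0
    above = x[0] > upper           # current state: above the upper band
    for v in x[1:]:
        if above and v < lower:    # fell through the band: one negative crossing
            crossings += 1
            above = False
        elif v > upper:
            above = True
    return crossings

print(negative_zero_crossings([-3, 3, 0, -2, 2]))  # mean is 0 -> 1 crossing
```

With threshold 0 the upper and lower bands coincide at the mean, so a pass through the mean is counted exactly once, matching the note above.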

25th Percentile

Computes the 25th percentile of each column in ‘columns’ in the dataframe. The q-th percentile of a vector V of length N is the q-th ranked value in a sorted copy of V. If the normalized ranking does not match q exactly, the value is interpolated from the two nearest values.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the 25th percentile of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'25th Percentile',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelx25Percentile  gen_0002_accely25Percentile  gen_0003_accelz25Percentile
    0     s01                         -2.0                          6.0                          5.0
75th Percentile

Computes the 75th percentile of each column in ‘columns’ in the dataframe. The q-th percentile of a vector V of length N is the q-th ranked value in a sorted copy of V. If the normalized ranking does not match q exactly, the value is interpolated from the two nearest values.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with 75th percentile of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'75th Percentile',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelx75Percentile  gen_0002_accely75Percentile  gen_0003_accelz75Percentile
    0     s01                          2.0                          8.0                          7.0
100th Percentile

Computes the 100th percentile of each column in ‘columns’ in the dataframe. The 100th percentile of a vector V is the maximum value in V.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns feature vector with 100th percentile (sample maximum) of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                        [-2, 8, 7], [2, 9, 6]],
                        columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'100th Percentile',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelx100Percentile  gen_0002_accely100Percentile  gen_0003_accelz100Percentile
    0     s01                           3.0                           9.0                           8.0
Positive Zero Crossings

Computes the number of times the selected input crosses the mean+threshold and mean-threshold values with a positive slope. The threshold value is specified by the user. Crossing the mean value when the threshold is 0 only counts as a single crossing.

Parameters
  • columns – list of columns on which to apply the feature generator

  • threshold – value in addition to mean which must be crossed to count as a crossing

Returns

Returns data frame with the number of positive crossings for each specified column.

Return type

DataFrame

Skewness

Skewness is a measure of the asymmetry of the distribution of a variable about its mean. The skewness value can be positive, negative, or undefined. A positive skew indicates that the tail on the right side is fatter than the tail on the left; a negative skew indicates the opposite.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the skewness of each specified column.

Return type

DataFrame

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
            [-2, 8, 7], [2, 9, 6]],
            columns=['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Skewness',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxSkew  gen_0002_accelySkew  gen_0003_accelzSkew
    0     s01                  0.0             0.363174            -0.395871
Standard Deviation

The standard deviation of a vector V with N items is a measure of the spread of the distribution. The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(abs(x - x.mean())**2)).

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the standard deviation of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Standard Deviation',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxStd  gen_0002_accelyStd  gen_0003_accelzStd
    0     s01            2.280351             1.16619            1.720465
Sum

Computes the sum of each column in ‘columns’ in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the sum of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Sum',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxSum  gen_0002_accelySum  gen_0003_accelzSum
    0     s01                 0.0                36.0                29.0
Variance

Computes the variance of desired column(s) in the dataframe.

Parameters

columns – list of columns on which to apply the feature generator

Returns

Returns data frame with the variance of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Variance',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxVariance  gen_0002_accelyVariance  gen_0003_accelzVariance
    0     s01                      6.5                      1.7                      3.7
Zero Crossings

Computes the number of times the selected input crosses the mean+threshold and mean-threshold values. The threshold value is specified by the user. Crossing the mean value when the threshold is 0 only counts as a single crossing.

Parameters
  • columns – list of columns on which to apply the feature generator

  • threshold – value in addition to mean which must be crossed to count as a crossing

Returns

Returns data frame with the number of crossings for each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Zero Crossings',
                             'params':{"columns": ['accelx', 'accely', 'accelz'],
                                       "threshold: 5}
                            }])
>>> result, stats = dsk.pipeline.execute()

Histogram

Histogram

Translates the data stream(s) from a segment into a feature vector in histogram space.

Parameters
  • column (list of strings) – name of the sensor streams to use

  • range_left (int) – the left limit (or the min) of the range for a fixed bin histogram

  • range_right (int) – the right limit (or the max) of the range for a fixed bin histogram

  • number_of_bins (int, optional) – the number of bins used for the histogram

  • scaling_factor (int, optional) – scaling factor used to fit for the device

Returns

feature vector in histogram space.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print(df)
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Histogram',
                             'params':{"columns": ['accelx','accely','accelz'],
                                       "range_left": 10,
                                       "range_right": 1000,
                                       "number_of_bins": 5,
                                       "scaling_factor": 254 }}])
>>> results, stats = dsk.pipeline.execute()
>>> print(results)
    out:
          Class  Rep Subject  gen_0000_hist_bin_000000  gen_0000_hist_bin_000001  gen_0000_hist_bin_000002  gen_0000_hist_bin_000003  gen_0000_hist_bin_000004
    0  Crawling    1     s01                       8.0                      38.0                      46.0                      69.0                       0.0
    1   Running    1     s01                      85.0                       0.0                       0.0                       0.0                      85.0
Histogram Auto Scale Range

Translates the data stream(s) from a segment into a feature vector in histogram space, where the range is determined by the minimum and maximum values of the data and the number of bins is set by the user.

Parameters
  • column (list of strings) – name of the sensor streams to use

  • number_of_bins (int, optional) – the number of bins used for the histogram

  • scaling_factor (int, optional) – scaling factor used to fit for the device

Returns

feature vector in histogram space.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print(df)
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Histogram',
                             'params':{"columns": ['accelx','accely','accelz'],
                                       "range_left": 10,
                                       "range_right": 1000,
                                       "number_of_bins": 5,
                                       "scaling_factor": 254 }}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0000_hist_bin_000000  gen_0000_hist_bin_000001  gen_0000_hist_bin_000002  gen_0000_hist_bin_000003  gen_0000_hist_bin_000004
    0  Crawling    1     s01                       8.0                      38.0                      46.0                      69.0                       0.0
    1   Running    1     s01                      85.0                       0.0                       0.0                       0.0                      85.0

Sampling

Downsample

This function takes the input_data dataframe, groups it by group_columns, and then, for each group, drops the passthrough_columns and performs downsampling on the remaining columns.

On each column, perform the following steps:

  • Divide the entire column into windows of size total length/new_length.

  • Calculate mean for each window

  • Concatenate all the mean values.

  • The length of the downsampled signal is equal to new_length.

All such means are then concatenated, giving new_length * (number of columns) values; these constitute features in downstream analyses. For instance, if there are three columns and new_length is 12, the total number of means is 12 * 3 = 36, each representing a feature.
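The windowed-mean steps above can be reproduced directly with NumPy; applied to the first ten Crawling accelx samples of the toy activity data, this yields the same feature values the pipeline produces (how the pipeline handles lengths that are not a multiple of new_length is not shown here):

```python
import numpy as np

def downsample(signal, new_length):
    """Downsample by taking the mean of equal-size windows (a sketch).

    Assumes len(signal) is a multiple of new_length; the pipeline's
    handling of remainders is an assumption left out here.
    """
    window = len(signal) // new_length
    trimmed = np.asarray(signal[:window * new_length], dtype=float)
    # One row per output sample; the row mean is the window mean.
    return trimmed.reshape(new_length, window).mean(axis=1)

# First ten Crawling accelx samples from the toy activity data
accelx = [377, 357, 333, 340, 372, 410, 450, 492, 518, 528]
features = downsample(accelx, 5)  # -> 367.0, 336.5, 391.0, 471.0, 523.0
```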

Parameters
  • columns – List of columns to be downsampled

  • new_length – integer; Downsampled length

Returns

DataFrame; downsampled dataframe

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Downsample',
                             'params':{"columns": ['accelx'],
                                       "new_length": 5}}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelx_0  gen_0001_accelx_1  gen_0001_accelx_2  gen_0001_accelx_3  gen_0001_accelx_4
    0  Crawling    1     s01              367.0              336.5              391.0              471.0              523.0
    1   Running    1     s01              -45.5              -41.5              -50.0              -64.0              -64.0
Downsample Average with Normalization

This function takes the input_data dataframe, groups it by group_columns, and then, for each group, drops the passthrough_columns and performs an averaging convolution (windowed mean) on the remaining columns.

On each column, perform the following steps:

  • Divide the entire column into windows of size total length/new_length.

  • Calculate mean for each window

  • Concatenate all the mean values into a feature vector of length new_length

  • Normalize the signal to be between 0-255

All such means are then concatenated, giving new_length * (number of columns) values; these constitute features in downstream analyses. For instance, if there are three columns and new_length is 12, the total number of means is 12 * 3 = 36, each representing a feature.
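A minimal sketch of the average-plus-normalization steps, assuming the 0-255 normalization is a per-column min-max scaling of the window means (the pipeline's exact normalization convention may differ):

```python
import numpy as np

def downsample_avg_normalized(signal, new_length):
    # Window means, exactly as in plain downsampling.
    window = len(signal) // new_length
    trimmed = np.asarray(signal[:window * new_length], dtype=float)
    means = trimmed.reshape(new_length, window).mean(axis=1)
    # Min-max normalize the feature vector into the 0-255 range
    # (an assumed convention; guard against a constant signal).
    span = means.max() - means.min()
    if span == 0:
        return np.zeros(new_length)
    return (means - means.min()) / span * 255

features = downsample_avg_normalized([3, 4, 5, 4, 3, 3, 4, 5, 4, 3], 5)
```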

Parameters
  • input_data – dataframe

  • columns – List of columns to be downsampled

  • group_columns (a list) – List of columns on which grouping is to be done. Each group will go through downsampling one at a time

  • new_length – integer; Downsampled length

  • **kwargs

Returns

DataFrame; convolution avg dataframe

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[3, 3], [4, 5], [5, 7], [4, 6],
                    [3, 1], [3, 1], [4, 3], [5, 5],
                    [4, 7], [3, 6]], columns=['accelx', 'accely'])
>>> df
Out:
   accelx  accely
0       3       3
1       4       5
2       5       7
3       4       6
4       3       1
5       3       1
6       4       3
7       5       5
8       4       7
9       3       6
>>> dsk.pipeline.reset()
>>> dsk.pipeline.set_input_data('test_data', df, force=True)
>>> dsk.pipeline.add_feature_generator(["Downsample Average with Normalization"],
         params = {"group_columns": []},
                   function_defaults={"columns":['accelx', 'accely'],
                                     'new_length' : 5})
>>> result, stats = dsk.pipeline.execute()
>>> print result
    Out:
           accelx_1  accelx_2  accelx_3  accelx_4  accelx_5  accely_1  accely_2
        0       3.5       4.5         3       4.5       3.5         4       6.5
           accely_3  accely_4  accely_5
        0         1         4       6.5
Downsample Max With Normalization

This function takes the input_data dataframe, groups it by group_columns, and then, for each group, drops the passthrough_columns and performs max downsampling on the remaining columns.

On each column, perform the following steps:

  • Divide the entire column into windows of size total length/new_length.

  • Calculate the maximum for each window

  • Concatenate all the max values into a feature vector of length new_length

  • Normalize the signal to be between 0-255

All such maxima are then concatenated, giving new_length * (number of columns) values; these constitute features in downstream analyses. For instance, if there are three columns and new_length is 12, the total number of values is 12 * 3 = 36, each representing a feature.

Parameters
  • input_data – dataframe

  • columns – List of columns to be downsampled

  • group_columns (a list) – List of columns on which grouping is to be done. Each group will go through downsampling one at a time

  • new_length – integer; Downsampled length

  • **kwargs

Returns

DataFrame; max downsampled dataframe

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[3, 3], [4, 5], [5, 7], [4, 6],
                    [3, 1], [3, 1], [4, 3], [5, 5],
                    [4, 7], [3, 6]], columns=['accelx', 'accely'])
>>> df
Out:
   accelx  accely
0       3       3
1       4       5
2       5       7
3       4       6
4       3       1
5       3       1
6       4       3
7       5       5
8       4       7
9       3       6
>>> dsk.pipeline.reset()
>>> dsk.pipeline.set_input_data('test_data', df, force=True)
>>> dsk.pipeline.add_feature_generator(["Downsample Max with Normalization"],
         params = {"group_columns": []},
                   function_defaults={"columns":['accelx', 'accely'],
                                     'new_length' : 5})
>>> result, stats = dsk.pipeline.execute()
>>> print result
    Out:
           accelx_1  accelx_2  accelx_3  accelx_4  accelx_5  accely_1  accely_2
        0       3.5       4.5         3       4.5       3.5         4       6.5
           accely_3  accely_4  accely_5
        0         1         4       6.5

Rate of Change

Mean Crossing Rate

Calculates the rate at which the mean value is crossed for each specified column. Works with grouped data. The total number of mean value crossings is found and then divided by the total number of samples to get the mean_crossing_rate.
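The computation can be sketched as follows; the same function also covers the zero and sigma crossing-rate variants later in this section, since they differ only in the reference level (the exact on-device crossing convention, strict sign change versus touching the level, is an assumption here):

```python
import numpy as np

def crossing_rate(signal, level=None):
    """Rate at which the signal crosses a reference level.

    With level=None the column mean is used (mean crossing rate);
    passing 0 gives the zero crossing rate, and mean +/- k*sigma
    gives the sigma variants.
    """
    x = np.asarray(signal, dtype=float)
    if level is None:
        level = x.mean()
    # A crossing is a sign change of the level-shifted signal.
    signs = np.sign(x - level)
    crossings = np.count_nonzero(np.diff(signs))
    return crossings / len(x)

# Crawling accelx samples from the toy activity data
accelx = [377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1]
rate = crossing_rate(accelx)  # 2 crossings / 11 samples ~= 0.181818
```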

Parameters

columns – List of all column names on which mean_crossing_rate is to be computed.

Returns

Return the number of mean crossings divided by the length of the signal.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Mean Crossing Rate',
                             'params':{"columns": ['accelx','accely', 'accelz']}
                            }])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelxMeanCrossingRate  gen_0002_accelyMeanCrossingRate  gen_0003_accelzMeanCrossingRate
    0  Crawling    1     s01                         0.181818                         0.090909                         0.090909
    1   Running    1     s01                         0.090909                         0.454545                         0.363636
Mean Difference

Calculate the mean difference of each specified column. Works with grouped data. For a given column, it finds the difference between the ith element and the (i-1)th element, and then takes the mean of these differences over the entire column.

mean(diff(arr)) = mean(arr[i] - arr[i-1]), for all 1 <= i <= n-1, where arr has n elements indexed from 0.
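Because the differences telescope, the mean difference reduces to (last - first) / (n - 1). A quick check against the accelx column of the example below:

```python
import numpy as np

# The accelx column from the example
signal = np.array([-3, 3, 0, -2, 2])

# Mean of successive differences...
mean_difference = np.diff(signal).mean()  # -> 1.25

# ...which telescopes to (last - first) / (n - 1)
telescoped = (signal[-1] - signal[0]) / (len(signal) - 1)
```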

Parameters

columns – List of all column names on which mean_difference is to be computed.

Returns

Returns the mean difference of each specified column.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Mean Difference',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxMeanDifference  gen_0002_accelyMeanDifference  gen_0003_accelzMeanDifference
    0     s01                           1.25                           0.75                           0.25
Second Sigma Crossing Rate

Calculates the rate at which the second standard deviation value (second sigma) is crossed for each specified column. The total number of second sigma crossings is found and then divided by the total number of samples to get the second_sigma_crossing_rate.

Parameters

columns – List of all column names on which second_sigma_crossing_rate is to be computed.

Returns

Return the second sigma crossing rate.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Second Sigma Crossing Rate',
                             'params':{"columns": ['accelx','accely', 'accelz']}
                            }])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelx2ndSigmaCrossingRate  gen_0002_accely2ndSigmaCrossingRate  gen_0003_accelz2ndSigmaCrossingRate
    0  Crawling    1     s01                             0.090909                             0.090909                                  0.0
    1   Running    1     s01                             0.000000                             0.000000                                  0.0
Sigma Crossing Rate

Calculates the rate at which the standard deviation value (sigma) is crossed for each specified column. The total number of sigma crossings is found and then divided by the total number of samples to get the sigma_crossing_rate.

Parameters

columns – List of all column names on which sigma_crossing_rate is to be computed.

Returns

Return the sigma crossing rate.

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Sigma Crossing Rate',
                             'params':{"columns": ['accelx','accely', 'accelz']}
                            }])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelxSigmaCrossingRate  gen_0002_accelySigmaCrossingRate  gen_0003_accelzSigmaCrossingRate
    0  Crawling    1     s01                          0.090909                               0.0                               0.0
    1   Running    1     s01                          0.000000                               0.0                               0.0
Threshold Crossing Rate

The total number of threshold crossings is found, and the number is divided by the total number of samples to get the threshold_crossing_rate.

Threshold With Offset Crossing Rate

The total number of threshold crossings is found, and the number is divided by the total number of samples to get the threshold_crossing_rate.

Zero Crossing Rate

Calculates the rate at which the zero value is crossed for each specified column. The total number of zero crossings is found and then divided by the total number of samples to get the zero_crossing_rate.

Parameters

columns – List of all column names on which zero_crossing_rate is to be computed.

Returns

A DataFrame containing the zero crossing rate for each column and the specified group_columns

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Zero Crossing Rate',
                             'params':{"columns": ['accelx', 'accely', 'accelz'] }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxZeroCrossingRate  gen_0002_accelyZeroCrossingRate  gen_0003_accelzZeroCrossingRate
    0     s01                              0.6                              0.0                              0.0

Frequency

Dominant Frequency

Calculate the dominant frequency for each specified signal. For each column, find the frequency at which the signal has the highest power.

Note: the current FFT length is 512; data longer than this will be truncated, and data shorter than this will be zero-padded.
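A minimal NumPy sketch of this computation, following the documented truncate-or-zero-pad behavior (frequencies here are in Hz; the pipeline's output units and scaling may differ):

```python
import numpy as np

def dominant_frequency(signal, sample_rate, fft_length=512):
    """Frequency of the highest-power FFT bin (a minimal sketch)."""
    # Truncate or zero-pad the segment to the FFT length.
    frame = np.zeros(fft_length)
    n = min(len(signal), fft_length)
    frame[:n] = signal[:n]
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(fft_length, d=1.0 / sample_rate)
    return freqs[np.argmax(power)]

# A 5 Hz sine sampled at 100 Hz; the dominant frequency lands on the
# FFT bin nearest 5 Hz (within one bin width due to zero-padding).
x = np.arange(100)
signal = 100 * np.sin(2 * np.pi * 5 * x / 100)
f = dominant_frequency(signal, 100)
```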

Parameters

columns – List of columns on which dominant_frequency needs to be calculated

Returns

DataFrame of dominant_frequency for each column and the specified group_columns

Examples

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> sample = 100
>>> df = pd.DataFrame()
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample/2) + ['1'] * (sample/2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Dominant Frequency',
                             'params':{"columns": ['accelx', 'accely', 'accelz' ],
                                      "sample_rate" : sample
                                      }
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxDomFreq  gen_0002_accelyDomFreq  gen_0003_accelzDomFreq
    0      0     s01                    22.0                    28.0                    34.0
    1      1     s01                    22.0                    26.0                    52.0
Peak Frequencies

Calculate the peak frequencies for each specified signal. For each column, find the frequencies at which the signal has highest power.

Note: the current FFT length is 512; data longer than this will be truncated, and data shorter than this will be zero-padded.

The FFT is computed and the cutoff frequencies are converted to bin indices based on the following formulas:

fft_min_bin_index = (min_freq * FFT_length / sample_rate); fft_max_bin_index = (max_freq * FFT_length / sample_rate);
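Evaluated directly, the bin conversion looks like this; the 100 Hz sample rate and the 5-25 Hz search band are illustrative values only:

```python
FFT_LENGTH = 512

def freq_to_bin(freq, sample_rate, fft_length=FFT_LENGTH):
    # Per the formula above: bin = freq * FFT_length / sample_rate
    return int(freq * fft_length / sample_rate)

# e.g. at a 100 Hz sample rate, a 5-25 Hz band maps to bins 25-128
fft_min_bin_index = freq_to_bin(5, 100)    # 25
fft_max_bin_index = freq_to_bin(25, 100)   # 128
```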

Parameters
  • columns – List of columns on which peak frequencies need to be calculated

  • sample_rate – sample rate of the sensor data

  • window_type – type of window to apply (e.g. hanning)

  • num_peaks – the number of peaks to identify

  • min_frequency – the min frequency bound to look for peaks

  • max_frequency – the max frequency bound to look for peaks

  • threshold – the threshold value a peak must be above to be considered a peak

Returns

DataFrame of peak frequencies for each column and the specified group_columns

Spectral Entropy

Calculate the spectral entropy for each specified signal. For each column, first calculate the power spectrum, and then using the power spectrum, calculate the entropy in the spectral domain. Spectral entropy measures the spectral complexity of the signal.

Note: the current FFT length is 512; data longer than this will be truncated, and data shorter than this will be zero-padded.
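A minimal sketch of the computation; whether the pipeline uses a natural or base-2 logarithm, and how it treats empty bins, are assumptions here:

```python
import numpy as np

def spectral_entropy(signal, fft_length=512):
    """Shannon entropy of the normalized power spectrum (a sketch)."""
    # Truncate or zero-pad to the FFT length, as documented.
    frame = np.zeros(fft_length)
    n = min(len(signal), fft_length)
    frame[:n] = signal[:n]
    power = np.abs(np.fft.rfft(frame)) ** 2
    p = power / power.sum()   # normalize to a probability distribution
    p = p[p > 0]              # drop empty bins (0 * log 0 -> 0)
    return -(p * np.log(p)).sum()

# A pure tone concentrates spectral power (low entropy); white noise
# spreads it across all bins (high entropy).
x = np.arange(100)
tone = np.sin(2 * np.pi * 5 * x / 100)
noise = np.random.default_rng(0).standard_normal(100)
```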

Parameters

columns – List of all columns for which spectral_entropy is to be calculated

Returns

DataFrame of spectral_entropy for each column and the specified group_columns

Examples

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> sample = 100
>>> df = pd.DataFrame()
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample/2) + ['1'] * (sample/2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Spectral Entropy',
                             'params':{"columns": ['accelx', 'accely', 'accelz' ]}
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxSpecEntr  gen_0002_accelySpecEntr  gen_0003_accelzSpecEntr
    0      0     s01                  1.97852                 1.983631                 1.981764
    1      1     s01                  1.97852                 2.111373                 2.090683
MFCC

Translates the data stream(s) from a segment into a feature vector of Mel-Frequency Cepstral Coefficients (MFCC). The features are derived in the frequency domain and mimic the human auditory response.

Note: the current FFT length is 512; data longer than this will be truncated, and data shorter than this will be zero-padded.
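A self-contained sketch of a standard single-frame MFCC computation (triangular mel filterbank, log energies, DCT-II); the filter count and other details here are illustrative choices, not necessarily what the pipeline uses:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sample_rate, cepstra_count=13, fft_length=512,
         num_filters=26):
    # Power spectrum of the truncated / zero-padded frame.
    frame = np.zeros(fft_length)
    n = min(len(signal), fft_length)
    frame[:n] = signal[:n]
    spectrum = np.abs(np.fft.rfft(frame)) ** 2

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2),
                          num_filters + 2)
    bins = np.floor((fft_length + 1) * mel_to_hz(mel_pts)
                    / sample_rate).astype(int)
    fbank = np.zeros((num_filters, fft_length // 2 + 1))
    for i in range(num_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fbank[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i, k] = (right - k) / max(right - center, 1)

    # Log filterbank energies, then a DCT-II to decorrelate them.
    energies = np.log(fbank @ spectrum + 1e-10)
    j = np.arange(num_filters)
    dct = np.cos(np.pi * np.outer(np.arange(cepstra_count), 2 * j + 1)
                 / (2 * num_filters))
    return dct @ energies

# Example frame: a 5 Hz tone sampled at 100 Hz.
x = np.arange(100)
coeffs = mfcc(100 * np.sin(2 * np.pi * 5 * x / 100), 100)
```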

Parameters
  • columns (list of strings) – names of the sensor streams to use

  • sample_rate (int) – sampling rate

  • cepstra_count (int) – number of coefficients to generate

Returns

feature vector of MFCC coefficients.

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'MFCC', 'params':{"columns": ['accelx'],
                                                      "sample_rate": 10,
                                                      "cepstra_count": 23 }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxmfcc_000000  gen_0001_accelxmfcc_000001 ... gen_0001_accelxmfcc_000021  gen_0001_accelxmfcc_000022
    0     s01                    131357.0                    -46599.0 ...                      944.0                       308.0

Shape

Global Peak to Peak of High Frequency

Global peak to peak of the high-frequency signal. The high-frequency signal is calculated by subtracting the moving average filter output from the original signal.
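The feature can be sketched as a moving-average split followed by a peak-to-peak of the residual; the window alignment and edge handling below are assumptions, not the on-device implementation:

```python
import numpy as np

def global_p2p_high_frequency(signal, smoothing_factor=5):
    """Peak-to-peak of the high-frequency residual (a minimal sketch)."""
    x = np.asarray(signal, dtype=float)
    k = smoothing_factor
    # Low-frequency component: moving average of width k (valid region
    # only, to avoid zero-padded edges).
    low = np.convolve(x, np.ones(k) / k, mode='valid')
    # Subtract the centered moving average to get the high-frequency
    # residual, then take its global peak-to-peak.
    high = x[k // 2 : k // 2 + len(low)] - low
    return high.max() - high.min()

p2p_flat = global_p2p_high_frequency(np.ones(30))         # ~0: no residual
p2p_wavy = global_p2p_high_frequency(100 * np.sin(np.arange(30)))
```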

Parameters
  • smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency. The number of elements in individual columns should be at least three times the smoothing factor.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of global p2p high frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> sample = 100
>>> df = pd.DataFrame()
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample/2) + ['1'] * (sample/2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Global Peak to Peak of High Frequency',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxMaxP2PGlobalAC  gen_0002_accelyMaxP2PGlobalAC  gen_0003_accelzMaxP2PGlobalAC
    0      0     s01                            3.6                            7.8                      86.400002
    1      1     s01                            3.6                            7.8                     165.000000
Global Peak to Peak of Low Frequency

Global peak to peak of the low-frequency signal. The low-frequency signal is calculated by applying a moving average filter with a smoothing factor.

Parameters
  • smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of global p2p low frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> sample = 100
>>> df = pd.DataFrame()
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample/2) + ['1'] * (sample/2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Global Peak to Peak of Low Frequency',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxMaxP2PGlobalDC  gen_0002_accelyMaxP2PGlobalDC  gen_0003_accelzMaxP2PGlobalDC
    0      0     s01                     195.600006                     191.800003                     187.000000
    1      1     s01                     195.600006                     191.800003                     185.800003
Max Peak to Peak of first half of High Frequency

Max Peak to Peak of first half of High Frequency. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.

Parameters
  • smoothing_factor (int) – determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of max p2p half high frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> import pandas as pd
>>> sample = 100
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample // 2) + ['1'] * (sample // 2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Max Peak to Peak of first half of High Frequency',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelxMaxP2P1stHalfAC  gen_0002_accelyMaxP2P1stHalfAC  gen_0003_accelzMaxP2P1stHalfAC
    0      0     s01                             1.8                             7.0                             1.8
    1      1     s01                             1.8                             7.0                            20.0
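A minimal sketch of the same idea, assuming the high frequency signal is the original minus a rolling mean (the exact windowing used by the built-in generator may differ):

```python
import numpy as np
import pandas as pd

def max_p2p_first_half_high_freq(x, smoothing_factor=5):
    # High-frequency (AC) component: signal minus its moving average
    s = pd.Series(x, dtype=float)
    high = s - s.rolling(smoothing_factor, min_periods=1).mean()
    first_half = high.iloc[: len(high) // 2]
    return first_half.max() - first_half.min()

# 5 Hz sine over 100 samples
x = 100 * np.sin(2 * np.pi * 5 * np.arange(100) / 100.0)
val = max_p2p_first_half_high_freq(x)
```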
Global Min Max Sum

Computes the sum of the maximum and minimum values of the signal. This quantity is also used as the 'min max amplitude difference'.

Parameters

columns – (list of str): Set of columns on which to apply the feature generator

Returns

DataFrame of min max sum for each column and the specified group_columns

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
                    columns=['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Global Min Max Sum',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Subject  gen_0001_accelxMinMaxSum  gen_0002_accelyMinMaxSum  gen_0003_accelzMinMaxSum
    0     s01                       0.0                      15.0                      11.0
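The feature itself reduces to one line per column; a sketch reproducing the values above:

```python
import numpy as np

def min_max_sum(x):
    # Sum of the signal's maximum and minimum values
    return np.max(x) + np.min(x)

accelx = np.array([-3, 3, 0, -2, 2])
accely = np.array([6, 7, 6, 8, 9])
accelz = np.array([5, 8, 3, 7, 6])
# min_max_sum(accelx) -> 0, accely -> 15, accelz -> 11
```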
Global Peak to Peak

Computes the global peak-to-peak (maximum minus minimum) of the signal.

Parameters

columns – (list of str): Set of columns on which to apply the feature generator

Returns

DataFrame of peak to peak for each column and the specified group_columns

Examples

>>> from pandas import DataFrame
>>> df = DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3], [-2, 8, 7], [2, 9, 6]],
                    columns=['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Global Peak to Peak',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxP2P  gen_0002_accelyP2P  gen_0003_accelzP2P
    0     s01                 6.0                 3.0                 5.0
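Peak to peak is simply the range of each column; a sketch reproducing the values above:

```python
import numpy as np

def global_p2p(x):
    # Maximum minus minimum of the signal
    return np.max(x) - np.min(x)

accelx = np.array([-3, 3, 0, -2, 2])
accely = np.array([6, 7, 6, 8, 9])
accelz = np.array([5, 8, 3, 7, 6])
# global_p2p(accelx) -> 6, accely -> 3, accelz -> 5
```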
Shape Absolute Median Difference

Computes the absolute value of the difference in median between the first and second half of a signal

Parameters
  • columns – list of columns on which to apply the feature generator

  • center_ratio – ratio of the signal to be on the first half to second half

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Shape Absolute Median Difference',
                             'params':{"columns": ['accelx', 'accely', 'accelz'],
                                    "center_ratio": 0.5}
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxShapeAbsoluteMedianDifference  gen_0002_accelyShapeAbsoluteMedianDifference  gen_0003_accelzShapeAbsoluteMedianDifference
    0     s01
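A sketch of the computation, assuming center_ratio sets the split index as a simple truncating fraction of the segment length (the built-in rounding rule is an assumption):

```python
import numpy as np

def shape_abs_median_difference(x, center_ratio=0.5):
    x = np.asarray(x, dtype=float)
    split = int(len(x) * center_ratio)  # assumed split rule
    # Absolute difference of the two halves' medians
    return abs(np.median(x[:split]) - np.median(x[split:]))

accelx = np.array([-3, 3, 0, -2, 2])
val = shape_abs_median_difference(accelx)  # medians 0.0 and 0.0 -> 0.0
```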
Difference of Peak to Peak of High Frequency between two halves

Difference of peak to peak of high frequency between two halves. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.

Parameters
  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency. The number of elements in individual columns should be at least three times the smoothing factor.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of difference high frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> import pandas as pd
>>> sample = 100
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample // 2) + ['1'] * (sample // 2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Difference of Peak to Peak of High Frequency between two halves',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelzACDiff
    0      0     s01              -5.199997
    1      1     s01              13.000000
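A sketch under the same assumption (high frequency = signal minus rolling mean); a flat first half and an oscillating second half give a negative difference:

```python
import numpy as np
import pandas as pd

def p2p_high_freq_half_difference(x, smoothing_factor=5):
    s = pd.Series(x, dtype=float)
    # High-frequency component: signal minus moving-average output
    high = s - s.rolling(smoothing_factor, min_periods=1).mean()
    half = len(high) // 2
    first = high.iloc[:half].max() - high.iloc[:half].min()
    second = high.iloc[half:].max() - high.iloc[half:].min()
    return first - second

# Flat first half, 5-cycle sine second half
x = np.concatenate([np.zeros(50),
                    100 * np.sin(2 * np.pi * 5 * np.arange(50) / 50.0)])
diff = p2p_high_freq_half_difference(x)
```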
Shape Median Difference

Computes the difference in median between the first and second half of a signal

Parameters
  • columns – list of columns on which to apply the feature generator

  • center_ratio – ratio of the signal to be on the first half to second half

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Shape Median Difference',
                             'params':{"columns": ['accelx', 'accely', 'accelz'],
                                        "center_ratio": 0.5}
                            }])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxShapeMedianDifference  gen_0002_accelyShapeMedianDifference  gen_0003_accelzShapeMedianDifference
    0     s01
Ratio of Peak to Peak of High Frequency between two halves

Ratio of peak to peak of high frequency between two halves. The high frequency signal is calculated by subtracting the moving average filter output from the original signal.

Parameters
  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency. The number of elements in individual columns should be at least three times the smoothing factor.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

DataFrame of ratio high frequency for each column and the specified group_columns

Examples

>>> import numpy as np
>>> import pandas as pd
>>> sample = 100
>>> df = pd.DataFrame({ 'Subject': ['s01'] * sample ,
            'Class': ['0'] * (sample // 2) + ['1'] * (sample // 2) })
>>> x = np.arange(sample)
>>> fx = 2; fy = 3; fz = 5
>>> df['accelx'] = 100 * np.sin(2 * np.pi * fx * x / sample )
>>> df['accely'] = 100 * np.sin(2 * np.pi * fy * x / sample )
>>> df['accelz'] = 100 * np.sin(2 * np.pi * fz * x / sample )
>>> df['accelz'] = df['accelx'][:25].tolist() + df['accely'][25:50].tolist() + df['accelz'][50:75].tolist() + df['accely'][75:].tolist()
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject','Class']
                   )
>>> dsk.pipeline.add_feature_generator([{'name':'Ratio of Peak to Peak of High Frequency between two halves',
                             'params':{"smoothing_factor": 5,
                                       "columns": ['accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Class Subject  gen_0001_accelzACRatio
    0      0     s01                3.888882
    1      1     s01                0.350000

Time

Abs Percent Time Over Threshold

Percentage of samples in the series whose absolute value is above the offset

Average Time Over Threshold

Average time spent above the threshold, taken across all threshold crossings.

Percent Time Over Second Sigma

Percentage of samples in the series that are above the sample mean + two sigma

Parameters

columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Percent Time Over Second Sigma',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
          Class  Rep Subject  gen_0001_accelxPctTimeOver2ndSigma  gen_0002_accelyPctTimeOver2ndSigma  gen_0003_accelzPctTimeOver2ndSigma
    0  Crawling    1     s01                                 0.0                            0.090909                            0.090909
    1   Running    1     s01                                 0.0                            0.000000                            0.000000
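The Crawling accely value above can be reproduced directly: only the outlier 2558 lies above mean + 2 sigma (the population standard deviation is assumed here):

```python
import numpy as np

def pct_time_over_two_sigma(x):
    x = np.asarray(x, dtype=float)
    # np.std defaults to the population standard deviation (ddof=0)
    threshold = x.mean() + 2 * x.std()
    return np.mean(x > threshold)

accely = np.array([569, 594, 638, 678, 708, 733, 733, 696, 677, 695, 2558])
pct = pct_time_over_two_sigma(accely)  # 1 of 11 samples -> 0.0909...
```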
Percent Time Over Sigma

Percentage of samples in the series that are above the sample mean + one sigma

Parameters

columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Percent Time Over Sigma',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
        Class  Rep Subject  gen_0001_accelxPctTimeOverSigma  gen_0002_accelyPctTimeOverSigma  gen_0003_accelzPctTimeOverSigma
  0  Crawling    1     s01                         0.181818                         0.090909                         0.090909
  1   Running    1     s01                         0.272727                         0.090909                         0.272727
Percent Time Over Threshold

Percentage of samples in the series that are above threshold

Parameters

columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Percent Time Over Threshold',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> results, stats = dsk.pipeline.execute()
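The pipeline example does not show a threshold value or printed output; a standalone sketch with a hypothetical threshold of 400 (the threshold value and the strict greater-than comparison are assumptions):

```python
import numpy as np

def pct_time_over_threshold(x, threshold):
    # Fraction of samples strictly above the threshold
    return np.mean(np.asarray(x) > threshold)

accelx = np.array([377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1])
pct = pct_time_over_threshold(accelx, 400)  # 5 of 11 samples
```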
Percent Time Over Zero

Percentage of samples in the series that are positive.

Parameters

columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> dsk.pipeline.reset()
>>> df = dsk.datasets.load_activity_raw_toy()
>>> print df
    out:
       Subject     Class  Rep  accelx  accely  accelz
    0      s01  Crawling    1     377     569    4019
    1      s01  Crawling    1     357     594    4051
    2      s01  Crawling    1     333     638    4049
    3      s01  Crawling    1     340     678    4053
    4      s01  Crawling    1     372     708    4051
    5      s01  Crawling    1     410     733    4028
    6      s01  Crawling    1     450     733    3988
    7      s01  Crawling    1     492     696    3947
    8      s01  Crawling    1     518     677    3943
    9      s01  Crawling    1     528     695    3988
    10     s01  Crawling    1      -1    2558    4609
    11     s01   Running    1     -44   -3971     843
    12     s01   Running    1     -47   -3982     836
    13     s01   Running    1     -43   -3973     832
    14     s01   Running    1     -40   -3973     834
    15     s01   Running    1     -48   -3978     844
    16     s01   Running    1     -52   -3993     842
    17     s01   Running    1     -64   -3984     821
    18     s01   Running    1     -64   -3966     813
    19     s01   Running    1     -66   -3971     826
    20     s01   Running    1     -62   -3988     827
    21     s01   Running    1     -57   -3984     843
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')
>>> dsk.pipeline.add_feature_generator([{'name':'Percent Time Over Zero',
                             'params':{"columns": ['accelx','accely','accelz'] }}])
>>> results, stats = dsk.pipeline.execute()
>>> print results
    out:
        Class  Rep Subject  gen_0001_accelxPctTimeOverZero  gen_0002_accelyPctTimeOverZero  gen_0003_accelzPctTimeOverZero
  0  Crawling    1     s01                        0.909091                             1.0                             1.0
  1   Running    1     s01                        0.000000                             0.0                             1.0
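The Crawling accelx value above (10 of 11 samples positive) follows from a one-line sketch (strictly greater than zero is assumed; how the built-in generator treats exact zeros is not specified):

```python
import numpy as np

def pct_time_over_zero(x):
    # Fraction of samples that are strictly positive
    return np.mean(np.asarray(x) > 0)

accelx = np.array([377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1])
pct = pct_time_over_zero(accelx)  # 10/11 -> 0.909091
```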
Duration of the Signal

Duration of the signal. It is calculated by dividing the length of vector by the sampling rate.

Parameters
  • sample_rate – float; Sampling rate

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Duration of the Signal',
                             'params':{"columns": ['accelx'] ,
                                       "sample_rate": 10
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
       Subject  gen_0001_accelxDurSignal
     0     s01                       0.5
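The result follows directly from the definition: 5 samples at 10 Hz give 0.5 seconds.

```python
def duration_of_signal(x, sample_rate):
    # Number of samples divided by the sampling rate, in seconds
    return len(x) / float(sample_rate)

dur = duration_of_signal([-3, 3, 0, -2, 2], 10)  # -> 0.5
```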

Area

Absolute Area

Absolute area of the signal. Absolute area = sum(abs(signal(t)) dt), where abs(signal(t)) is absolute signal value at time t, and dt is sampling time (dt = 1/sample_rate).

Parameters
  • sample_rate – Sampling rate of the signal

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Area',
                             'params':{"columns": ['accelx', 'accely', 'accelz'],
                                       "sample_rate": 10 }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxAbsArea  gen_0002_accelyAbsArea  gen_0003_accelzAbsArea
    0     s01                     1.0                     3.6                     2.9
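The values above follow from the stated formula; a sketch:

```python
import numpy as np

def absolute_area(x, sample_rate):
    # sum(|signal(t)|) * dt, with dt = 1 / sample_rate
    return np.sum(np.abs(x)) / float(sample_rate)

accelx = np.array([-3, 3, 0, -2, 2])
accely = np.array([6, 7, 6, 8, 9])
accelz = np.array([5, 8, 3, 7, 6])
# absolute_area(accelx, 10) -> 1.0, accely -> 3.6, accelz -> 2.9
```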
Absolute Area of High Frequency

Absolute area of high frequency components of the signal. It calculates absolute area by applying a moving average filter on the signal with a smoothing factor and subtracting the filtered signal from the original.

Parameters
  • sample_rate – float; Sampling rate of the signal

  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Area of High Frequency',
                             'params':{"sample_rate": 10,
                                       "smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxAbsAreaAc  gen_0002_accelyAbsAreaAc  gen_0003_accelzAbsAreaAc
    0     s01                 76.879997                800.099976                470.160004
Absolute Area of Low Frequency

Absolute area of low frequency components of the signal. It calculates absolute area by first applying a moving average filter on the signal with a smoothing factor.

Parameters
  • sample_rate – float; Sampling rate of the signal

  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Area of Low Frequency',
                             'params':{"sample_rate": 10,
                                       "smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
Absolute Area of Spectrum

Absolute area of spectrum.

Parameters
  • sample_rate – Sampling rate of the signal

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Absolute Area of Spectrum',
                             'params':{"sample_rate": 10,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxAbsAreaSpec  gen_0002_accelyAbsAreaSpec  gen_0003_accelzAbsAreaSpec
    0     s01                       260.0                      2660.0                      1830.0
Total Area

Total area under the signal. Total area = sum(signal(t)*dt), where signal(t) is signal value at time t, and dt is sampling time (dt = 1/sample_rate).

Parameters
  • sample_rate – Sampling rate of the signal

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Total Area',
                             'params':{"sample_rate": 10,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print result
    out:
      Subject  gen_0001_accelxTotArea  gen_0002_accelyTotArea  gen_0003_accelzTotArea
    0     s01                    0.0                     3.6                     2.9
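Unlike Absolute Area, positive and negative samples cancel here, which is why accelx sums to zero; a sketch:

```python
import numpy as np

def total_area(x, sample_rate):
    # sum(signal(t)) * dt, with dt = 1 / sample_rate
    return np.sum(x) / float(sample_rate)

accelx = np.array([-3, 3, 0, -2, 2])
accely = np.array([6, 7, 6, 8, 9])
accelz = np.array([5, 8, 3, 7, 6])
# total_area(accelx, 10) -> 0.0, accely -> 3.6, accelz -> 2.9
```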
Total Area of High Frequency

Total area of high frequency components of the signal. It calculates total area by applying a moving average filter on the signal with a smoothing factor and subtracting the filtered signal from the original.

Parameters
  • sample_rate – float; Sampling rate of the signal

  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print df
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Total Area of High Frequency',
                             'params':{"sample_rate": 10,
                                       "smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxTotAreaAc  gen_0002_accelyTotAreaAc  gen_0003_accelzTotAreaAc
    0     s01                       0.0                      0.12                      0.28
Total Area of Low Frequency

Total area of the low-frequency components of the signal. It calculates the total area by first applying a moving average filter to the signal with a given smoothing factor.

Parameters
  • sample_rate – float; Sampling rate of the signal

  • smoothing_factor (int) – Determines the amount of attenuation for frequencies over the cutoff frequency.

  • columns – List of str; Set of columns on which to apply the feature generator

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Total Area of Low Frequency',
                             'params':{"sample_rate": 10,
                                       "smoothing_factor": 5,
                                       "columns": ['accelx','accely','accelz']
                                      }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0001_accelxTotAreaDc  gen_0002_accelyTotAreaDc  gen_0003_accelzTotAreaDc
    0     s01                       0.0                      0.72                      0.58

Energy

Average Demeaned Energy

Average Demeaned Energy.

  1. Demean each input column by subtracting its column average element-wise.

  2. Sum the squared components across each column for the total demeaned energy per sample.

  3. Take the average of the sum of squares to get the average demeaned energy.
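The steps above can be sketched in NumPy; on the example segment used below this reproduces the documented value of 9.52 (a sketch of the stated math, not the library implementation):

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# 1. Demean each column by its own mean.
demeaned = data - data.mean(axis=0)

# 2. Sum the squared components across columns for each sample.
per_sample = (demeaned ** 2).sum(axis=1)

# 3. Average over all samples.
avg_demeaned_energy = per_sample.mean()  # 47.6 / 5 = 9.52
```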

Parameters

columns – List of str; A list of all column names on which the average demeaned energy is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Average Demeaned Energy',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_AvgDemeanedEng
    0     s01                     9.52
Average Energy

Average Energy.

  1. Calculate the element-wise square of the input columns.

  2. Sum the squared components across each column for the total energy per sample.

  3. Take the average of the sum of squares to get the average energy.

\[\frac{1}{N}\sum_{i=1}^{N}x_{i}^2+y_{i}^2+..n_{i}^2\]
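The formula can be sketched directly in NumPy; on the example segment used below this reproduces the documented value of 95.0 (a sketch of the stated math, not the library implementation):

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# Element-wise square, sum across axes per sample, then average.
avg_energy = (data ** 2).sum(axis=1).mean()  # 475 / 5 = 95.0
```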
Parameters

columns – List of str; A list of all column names on which average_energy is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Average Energy',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_AvgEng
    0     s01             95.0
Total Energy

Total Energy.

  1. Calculate the element-wise square of the input columns.

  2. Sum the squared values over all samples and streams to get the total energy.
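The steps above can be sketched in NumPy; on the example segment used below this reproduces the documented value of 475.0 (a sketch of the stated math, not the library implementation):

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# Element-wise square, then sum over all samples and streams.
total_energy = (data ** 2).sum()  # 475.0
```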

Parameters

columns – List of str; A list of all column names on which the total energy is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Total Energy',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_TotEng
    0     s01            475.0

Physical

Average of Movement Intensity

Calculates the average movement intensity, defined by:

\[\frac{1}{N}\sum_{i=1}^{N} \sqrt{x_{i}^2 + y_{i}^2 + .. n_{i}^2}\]
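The formula can be sketched in NumPy. In floating point it gives approximately 9.59 on the example segment below; the pipeline output of 9.0 suggests the on-device implementation uses integer or fixed-point arithmetic, so exact values may differ.

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# Per-sample intensity: Euclidean norm across the sensor axes.
intensity = np.sqrt((data ** 2).sum(axis=1))

# Average over the segment (floating-point value, ~9.59).
avg_intensity = intensity.mean()
```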
Parameters

columns (list) – list of columns on which to calculate average movement intensity.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Average of Movement Intensity',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_AvgInt
    0     s01         9.0
Average Signal Magnitude Area

Average signal magnitude area.

\[\frac{1}{N}\sum_{i=1}^{N} {x_{i} + y_{i} + .. n_{i}}\]
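The formula can be sketched directly in NumPy; on the example segment used below this reproduces the documented value of 13.0 (a sketch of the stated math, not the library implementation):

```python
import numpy as np

# Sample segment matching the example below (accelx, accely, accelz).
data = np.array([[-3, 6, 5],
                 [3, 7, 8],
                 [0, 6, 3],
                 [-2, 8, 7],
                 [2, 9, 6]], dtype=float)

# Sum across the axes for each sample, then average over the segment.
avg_sig_mag = data.sum(axis=1).mean()  # (8 + 18 + 9 + 13 + 17) / 5 = 13.0
```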
Parameters

columns – List of str; A list of all column names on which the average signal magnitude area is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':"Average Signal Magnitude Area",
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_AvgSigMag
    0     s01                13.0
Variance of Movement Intensity

Variance of movement intensity

Parameters

columns – List of str; A list of all column names on which the variance of movement intensity is to be computed.

Returns

Returns data frame with specified column(s).

Return type

DataFrame

Examples

>>> import pandas as pd
>>> df = pd.DataFrame([[-3, 6, 5], [3, 7, 8], [0, 6, 3],
                       [-2, 8, 7], [2, 9, 6]],
                       columns= ['accelx', 'accely', 'accelz'])
>>> df['Subject'] = 's01'
>>> print(df)
    out:
       accelx  accely  accelz Subject
    0      -3       6       5     s01
    1       3       7       8     s01
    2       0       6       3     s01
    3      -2       8       7     s01
    4       2       9       6     s01
>>> dsk.pipeline.reset(delete_cache=False)
>>> dsk.pipeline.set_input_data('test_data', df, force=True,
                    data_columns = ['accelx', 'accely', 'accelz'],
                    group_columns = ['Subject'])
>>> dsk.pipeline.add_feature_generator([{'name':'Variance of Movement Intensity',
                             'params':{ "columns": ['accelx','accely','accelz'] }}])
>>> result, stats = dsk.pipeline.execute()
>>> print(result)
    out:
      Subject  gen_0000_VarInt
    0     s01         3.082455
library.core_functions.feature_generators.fg_physical.magnitude(input_data, input_columns)

Computes the magnitude of each column in a dataframe.

Sensor Fusion

Abs Max Column

Returns the index of the column with the max abs value for each segment.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector with index of max abs value column.

Return type

DataFrame

Cross Column Correlation

Compute the correlation of the slopes between two columns.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

  • sample_frequency (int) – frequency to sample correlation at. Default 1 which is every sample

Returns

feature vector of slope correlations

Return type

DataFrame
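A minimal NumPy sketch, assuming "slope" means the first difference of each stream and the default sample_frequency of 1 (the library's exact slope and subsampling conventions may differ):

```python
import numpy as np

# The two accelerometer streams from the example data used throughout
# this section.
x = np.array([-3, 3, 0, -2, 2], dtype=float)
y = np.array([6, 7, 6, 8, 9], dtype=float)

# Slopes: first differences of each stream (sample_frequency = 1,
# i.e. a slope at every sample).
sx, sy = np.diff(x), np.diff(y)

# Pearson correlation of the two slope sequences.
corr = float(np.corrcoef(sx, sy)[0, 1])
```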

Max Column

Returns the index of the column with the max value for each segment.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector with index of max column.

Return type

DataFrame

Cross Column Mean Crossing Rate

Compute the crossing rate of column 2 over the mean of column 1.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use (requires 2 inputs)

Returns

feature vector mean crossing rate

Return type

DataFrame
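A minimal sketch of the idea using hypothetical data; the exact crossing convention (strict vs. non-strict inequality, normalization) is implementation-specific:

```python
import numpy as np

# Hypothetical two-column segment.
col1 = np.array([1, 2, 3, 4, 5], dtype=float)
col2 = np.array([5, 1, 4, 2, 6], dtype=float)

threshold = col1.mean()                # mean of column 1 -> 3.0
above = col2 > threshold               # where column 2 sits relative to it
# Count transitions across the threshold.
crossings = int(np.count_nonzero(np.diff(above.astype(int))))  # 4
crossing_rate = crossings / len(col2)  # 0.8
```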

Cross Column Mean Crossing with Offset

Compute the crossing rate of column 2 over the mean of column 1, with an offset.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use (requires 2 inputs)

Returns

feature vector mean crossing rate

Return type

DataFrame

Two Column Mean Difference

Compute the mean difference between two columns.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector mean difference

Return type

DataFrame

Two Column Median Difference

Compute the median difference between two columns.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector median difference

Return type

DataFrame

Min Column

Returns the index of the column with the min value for each segment.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector with index of min value column.

Return type

DataFrame

Two Column Min Max Difference

Compute the min max difference between two columns. Computes the location of the min value for each of the two columns; at whichever of the two locations is larger, it computes the difference between the two columns at that index.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector difference of two columns

Return type

DataFrame
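The description is compact; under one plausible reading (take the later of the two min locations and difference the columns at that index), a hypothetical sketch:

```python
import numpy as np

# Hypothetical segment data for two sensor streams.
c1 = np.array([4, 1, 7, 3, 5], dtype=float)
c2 = np.array([2, 6, 0, 8, 4], dtype=float)

# Location of the min value in each column.
i1, i2 = int(np.argmin(c1)), int(np.argmin(c2))  # 1 and 2

# Take whichever location is larger, and difference the columns there.
idx = max(i1, i2)           # 2
diff = c1[idx] - c2[idx]    # 7 - 0 = 7.0
```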

Two Column Peak To Peak Difference

Compute the max value for each column, then subtract the max of the first column from that of the second.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector peak to peak difference

Return type

DataFrame

Two Column Peak Location Difference

Computes the location of the maximum value for each column and then finds the difference between those two points.

Parameters
  • input_data (DataFrame) – input data

  • columns (list of strings) – name of the sensor streams to use

Returns

feature vector peak location difference

Return type

DataFrame
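A minimal sketch using hypothetical data; the sign convention (second column's peak location minus the first's) is an assumption:

```python
import numpy as np

# Hypothetical segment data for two sensor streams.
c1 = np.array([0, 5, 2, 1, 3], dtype=float)
c2 = np.array([1, 0, 2, 6, 4], dtype=float)

# Index of the maximum value in each column.
p1, p2 = int(np.argmax(c1)), int(np.argmax(c2))  # 1 and 3

# Difference between the two peak locations (sign convention assumed).
peak_location_difference = p2 - p1  # 2
```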