Segmenters

Takes input from the sensor transform/filter step and buffers the data until a segment is found.

Windowing

Parameters

window_size (int) – The size of the window
delta (int) – The slide of the window
enable_train_delta (bool) – Enable or disable the train delta parameter
train_delta (int, optional) – Use this delta for sliding during training.
return_segment_index (bool, optional) – Set to True to see the segment indexes for start and end. Defaults to False.
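
As a rough illustration of how window_size and delta interact, the following is a minimal sketch in plain Python, not the library's implementation. The function name sliding_windows is hypothetical, and dropping a trailing partial window is an assumption.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(
    samples: Sequence[float], window_size: int, delta: int
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (start_index, window) pairs for a fixed-size sliding window."""
    if window_size <= 0 or delta <= 0:
        raise ValueError("window_size and delta must be positive")
    # Assumption: only full windows are emitted; a trailing partial window is dropped.
    for start in range(0, len(samples) - window_size + 1, delta):
        yield start, list(samples[start:start + window_size])


# First ten accelx samples from the toy data set used in the examples below.
accelx = [377, 357, 333, 340, 372, 410, 450, 492, 518, 528]
windows = list(sliding_windows(accelx, window_size=4, delta=3))
starts = [start for start, _ in windows]
```

With delta smaller than window_size, consecutive windows overlap; with delta equal to window_size, they tile the stream without overlap.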

Use Labels

This Segmenter uses the labels directly from your data set for training.

You should use this if

you are testing out models and want to have very accurate labeling
you have not decided how you will segment your data
you are going to implement your own segmentation algorithm

Note: This function expects you to implement your own segmentation algorithm in the firmware that matches the types of segments you are creating.

Model firmware defaults to a sliding window of max window size.

Parameters

window_size (int) –
return_segment_index (bool, optional) – Defaults to False.

Windowing Threshold Segmentation

This function takes the input_data and group_column from the previous pipeline block. It is a single-pass threshold segmentation algorithm that transforms a window of the data stream, whose size is defined by threshold_space_width, into threshold space. The threshold space can be computed as standard deviation (std), sum, absolute sum, absolute average, or variance. The calculated value is then compared against vt_threshold with a comparison type of >=. The first point at which the threshold space value reaches vt_threshold becomes the anchor point. The segment starts at the index of the anchor point minus a user-specified offset, and its end is set immediately to window_size samples after the start.
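
The single-pass logic described above can be sketched with NumPy as follows. This is an illustrative reading of the description, not the library's code: the function name, the THRESHOLD_SPACES mapping, the use of population standard deviation, and the choice to resume searching after each segment are all assumptions.

```python
import numpy as np

# Hypothetical mapping from threshold_space names to NumPy reductions.
THRESHOLD_SPACES = {
    "std": np.std,
    "sum": np.sum,
    "absolute sum": lambda w: np.sum(np.abs(w)),
    "absolute avg": lambda w: np.mean(np.abs(w)),
    "variance": np.var,
}


def windowing_threshold_segments(signal, window_size, offset, vt_threshold,
                                 threshold_space_width, threshold_space="std"):
    """Return (start, end) index pairs, end exclusive, found in a single pass."""
    signal = np.asarray(signal, dtype=float)
    to_space = THRESHOLD_SPACES[threshold_space]
    segments = []
    i = 0
    while i + threshold_space_width <= len(signal):
        value = to_space(signal[i:i + threshold_space_width])
        if value >= vt_threshold:
            # i is the anchor point; the segment starts `offset` samples earlier.
            start = max(i - offset, 0)
            end = start + window_size
            if end > len(signal):
                break  # assumption: incomplete trailing segments are discarded
            segments.append((start, end))
            i = max(end, i + 1)  # assumption: resume the search after the segment
        else:
            i += 1
    return segments


# accelx values of the Crawling group from the toy data set.
crawling_accelx = [377, 357, 333, 340, 372, 410, 450, 492, 518, 528, -1]
segments = windowing_threshold_segments(
    crawling_accelx, window_size=5, offset=0, vt_threshold=0.05,
    threshold_space_width=4, threshold_space="std")
```

Under these assumptions the sketch splits the Crawling samples into two five-sample segments, which is consistent with the SegmentID values shown in the example output for this segmenter.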

Parameters

column_of_interest (str) – name of the stream to use for segmentation
window_size (int) – number of samples in the window (default is 100)
offset (int) – The offset between the anchor point and the start of the segment. For an offset of 0, the segment starts at the anchor point. (default is 0)
vt_threshold (int) – vt_threshold value which determines the segment.
threshold_space_width (int) – Size of the threshold buffer.
threshold_space (str) – Threshold transformation space. (std, sum, absolute sum, variance, absolute avg)
comparison (str) – the comparison between threshold space and vertical threshold (>=, <=)
return_segment_index (bool) – Set to True to see the segment indexes for start and end. Defaults to False.

Returns

The segmented result will have a new column called SegmentID that contains the segment IDs.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df
    out:
           Subject     Class  Rep  accelx  accely  accelz
        0      s01  Crawling    1     377     569    4019
        1      s01  Crawling    1     357     594    4051
        2      s01  Crawling    1     333     638    4049
        3      s01  Crawling    1     340     678    4053
        4      s01  Crawling    1     372     708    4051
        5      s01  Crawling    1     410     733    4028
        6      s01  Crawling    1     450     733    3988
        7      s01  Crawling    1     492     696    3947
        8      s01  Crawling    1     518     677    3943
        9      s01  Crawling    1     528     695    3988
        10     s01  Crawling    1      -1    2558    4609
        11     s01   Running    1     -44   -3971     843
        12     s01   Running    1     -47   -3982     836
        13     s01   Running    1     -43   -3973     832
        14     s01   Running    1     -40   -3973     834
        15     s01   Running    1     -48   -3978     844
        16     s01   Running    1     -52   -3993     842
        17     s01   Running    1     -64   -3984     821
        18     s01   Running    1     -64   -3966     813
        19     s01   Running    1     -66   -3971     826
        20     s01   Running    1     -62   -3988     827
        21     s01   Running    1     -57   -3984     843

>>> client.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')

>>> client.pipeline.add_transform("Windowing Threshold Segmentation",
                       params={"column_of_interest": 'accelx',
                               "window_size": 5,
                               "offset": 0,
                               "vt_threshold": 0.05,
                               "threshold_space_width": 4,
                               "threshold_space": 'std',
                               "return_segment_index": False
                              })

>>> results, stats = client.pipeline.execute()
>>> print(results)
    out:
              Class  Rep  SegmentID Subject  accelx  accely  accelz
       0   Crawling    1          0     s01     377     569    4019
       1   Crawling    1          0     s01     357     594    4051
       2   Crawling    1          0     s01     333     638    4049
       3   Crawling    1          0     s01     340     678    4053
       4   Crawling    1          0     s01     372     708    4051
       5   Crawling    1          1     s01     410     733    4028
       6   Crawling    1          1     s01     450     733    3988
       7   Crawling    1          1     s01     492     696    3947
       8   Crawling    1          1     s01     518     677    3943
       9   Crawling    1          1     s01     528     695    3988
       10   Running    1          0     s01     -44   -3971     843
       11   Running    1          0     s01     -47   -3982     836
       12   Running    1          0     s01     -43   -3973     832
       13   Running    1          0     s01     -40   -3973     834
       14   Running    1          0     s01     -48   -3978     844
       15   Running    1          1     s01     -52   -3993     842
       16   Running    1          1     s01     -64   -3984     821
       17   Running    1          1     s01     -64   -3966     813
       18   Running    1          1     s01     -66   -3971     826
       19   Running    1          1     s01     -62   -3988     827

Max Min Threshold Segmentation

This is a max min threshold segmentation algorithm that transforms a window of the data stream of size threshold_space_width into threshold space. This function takes the input_data and group_column from the previous pipeline block.

The threshold space can be computed as standard deviation, sum, absolute sum, absolute average, or variance. The calculated value is then compared against the vt threshold with a comparison type of >= for the start of the segment and <= for the end of the segment. This algorithm is a two-pass detection: the first pass detects the start of the segment, and the second pass detects the end of the segment.
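
A minimal sketch of the two-pass idea, assuming std as the threshold space, might look like the following. The function name is hypothetical, and the handling of min_segment_length and max_segment_length (the end is searched for only between those two bounds) is an assumption rather than documented behavior.

```python
import numpy as np


def max_min_segments(signal, first_vt_threshold, second_vt_threshold,
                     threshold_space_width, min_segment_length,
                     max_segment_length):
    """Return (start, end) index pairs, end exclusive, using std as threshold space."""
    if min_segment_length < 1:
        raise ValueError("min_segment_length must be at least 1")
    signal = np.asarray(signal, dtype=float)
    width = threshold_space_width
    segments = []
    i = 0
    while i + width <= len(signal):
        # First pass: advance until the std of the buffer reaches the start threshold.
        if np.std(signal[i:i + width]) < first_vt_threshold:
            i += 1
            continue
        start = i
        # Second pass: search for the end, no earlier than min_segment_length
        # and no later than max_segment_length samples after the start.
        end = min(start + max_segment_length, len(signal))
        last_check = min(end, len(signal) - width + 1)
        for j in range(start + min_segment_length, last_check):
            if np.std(signal[j:j + width]) <= second_vt_threshold:
                end = j
                break
        if end - start < min_segment_length:
            break  # too few samples left to form a valid segment
        segments.append((start, end))
        i = end
    return segments


# Synthetic stream: quiet, a burst of activity, then quiet again.
signal = [0] * 5 + [5, -5, 5, -5, 5, -5] + [0] * 6
segments = max_min_segments(signal, first_vt_threshold=1.0,
                            second_vt_threshold=0.5, threshold_space_width=2,
                            min_segment_length=3, max_segment_length=10)
```

Here the segment opens when the buffer first contains activity and closes once the buffer is quiet again; lowering max_segment_length forces the burst to be cut into shorter segments.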

Parameters

column_of_interest (str) – name of the stream to use for segmentation
max_segment_length (int) – number of samples in the window (default is 100)
min_segment_length (int) – The smallest segment allowed.
threshold_space_width (int) – number of samples to check for being above the vt_threshold before forgetting the segment.
threshold_space (str) – Threshold transformation space. (std, sum, absolute sum, variance, absolute avg)
first_vt_threshold (int) – vt_threshold value to begin detecting a segment
second_vt_threshold (int) – vt_threshold value to detect a segment's end.
return_segment_index (bool) – Set to True to see the segment indexes for start and end. Defaults to False.

Returns

The segmented result will have a new column called SegmentID that contains the segment IDs.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df
    out:
           Subject     Class  Rep  accelx  accely  accelz
        0      s01  Crawling    1     377     569    4019
        1      s01  Crawling    1     357     594    4051
        2      s01  Crawling    1     333     638    4049
        3      s01  Crawling    1     340     678    4053
        4      s01  Crawling    1     372     708    4051
        5      s01  Crawling    1     410     733    4028
        6      s01  Crawling    1     450     733    3988
        7      s01  Crawling    1     492     696    3947
        8      s01  Crawling    1     518     677    3943
        9      s01  Crawling    1     528     695    3988
        10     s01  Crawling    1      -1    2558    4609
        11     s01   Running    1     -44   -3971     843
        12     s01   Running    1     -47   -3982     836
        13     s01   Running    1     -43   -3973     832
        14     s01   Running    1     -40   -3973     834
        15     s01   Running    1     -48   -3978     844
        16     s01   Running    1     -52   -3993     842
        17     s01   Running    1     -64   -3984     821
        18     s01   Running    1     -64   -3966     813
        19     s01   Running    1     -66   -3971     826
        20     s01   Running    1     -62   -3988     827
        21     s01   Running    1     -57   -3984     843

>>> client.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')

>>> client.pipeline.add_transform("Max Min Threshold Segmentation",
                   params={ "column_of_interest": 'accelx',
                            "max_segment_length": 5,
                            "min_segment_length": 5,
                            "threshold_space_width": 3,
                            "threshold_space": 'std',
                            "first_vt_threshold": 0.05,
                            "second_vt_threshold": 0.05,
                            "return_segment_index": False})

>>> results, stats = client.pipeline.execute()
>>> print(results)
    out:
              Class  Rep  SegmentID Subject  accelx  accely  accelz
        0  Crawling    1          0     s01     377     569    4019
        1  Crawling    1          0     s01     357     594    4051
        2  Crawling    1          0     s01     333     638    4049
        3  Crawling    1          0     s01     340     678    4053
        4  Crawling    1          0     s01     372     708    4051
        5   Running    1          0     s01     -44   -3971     843
        6   Running    1          0     s01     -47   -3982     836
        7   Running    1          0     s01     -43   -3973     832
        8   Running    1          0     s01     -40   -3973     834
        9   Running    1          0     s01     -48   -3978     844

General Threshold Segmentation

This is a general threshold segmentation algorithm that transforms a window of the data stream of size threshold_space_width into threshold space. This function takes the input_data and group_column from the previous pipeline block.

The threshold space can be computed as standard deviation, sum, absolute sum, absolute average, or variance. The calculated value is then compared against the vt threshold with a comparison type of <= or >=, depending on whether “min” or “max” is used as the comparison type. This algorithm is a two-pass detection: the first pass detects the start of the segment, and the second pass detects the end of the segment. In this generalized algorithm, the two can be set independently.
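
To illustrate what setting the two passes independently means, the sketch below models each pass as its own trigger with a column, threshold space, comparison, and threshold. The Trigger class, the SPACES mapping, and the dictionary-of-lists data layout are hypothetical names introduced for this example; they are not part of the library.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical mapping from threshold space names to NumPy reductions.
SPACES = {
    "std": np.std,
    "variance": np.var,
    "sum": np.sum,
    "absolute sum": lambda w: np.sum(np.abs(w)),
    "absolute avg": lambda w: np.mean(np.abs(w)),
}


@dataclass
class Trigger:
    column: str
    threshold_space: str
    comparison: str  # "max" means >= vt_threshold, "min" means <= vt_threshold
    vt_threshold: float

    def fires(self, frame: dict, start: int, width: int) -> bool:
        """Evaluate this trigger on `width` samples of its column from `start`."""
        window = np.asarray(frame[self.column][start:start + width], dtype=float)
        value = SPACES[self.threshold_space](window)
        if self.comparison == "max":
            return bool(value >= self.vt_threshold)
        if self.comparison == "min":
            return bool(value <= self.vt_threshold)
        raise ValueError("comparison must be 'max' or 'min'")


frame = {
    "accelx": [0, 0, 4, -4, 4, -4, 0, 0],
    "accely": [3, 3, 3, 3, 3, 0, 0, 0],
}
# Start detection watches accelx; end detection watches accely.
begin = Trigger("accelx", "std", "max", 2.0)
finish = Trigger("accely", "absolute avg", "min", 1.0)
started = begin.fires(frame, start=1, width=2)
ended = finish.fires(frame, start=5, width=2)
```

A start trigger on one stream and an end trigger on another lets a segment open on motion in one axis and close when a different axis goes quiet.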

Parameters

first_column_of_interest (str) – name of the stream to use for first threshold segmentation
second_column_of_interest (str) – name of the stream to use for second threshold segmentation
max_segment_length (int) – number of samples in the window (default is 200)
min_segment_length (int) – The smallest segment allowed. (default 100)
first_vt_threshold (int) – vt_threshold value to begin detecting a segment
first_threshold_space (str) – threshold space to detect segment against (std, variance, absolute avg, absolute sum, sum)
first_comparison (str) – detect threshold above (max) or below (min) the vt_threshold (max, min)
second_vt_threshold (int) – vt_threshold value to detect a segment's end.
second_threshold_space (str) – threshold space to detect segment end (std, variance, absolute avg, absolute sum, sum)
second_comparison (str) – detect threshold above (max) or below (min) the vt_threshold (max, min)
threshold_space_width (int) – the size of the buffer that the threshold value is calculated from.
return_segment_index (bool) – Set to True to see the segment indexes for start and end. Defaults to False.

Returns

The segmented result will have a new column called SegmentID that contains the segment IDs.

Return type

DataFrame

Example

>>> client.pipeline.reset()
>>> df = client.datasets.load_activity_raw_toy()
>>> df
    out:
           Subject     Class  Rep  accelx  accely  accelz
        0      s01  Crawling    1     377     569    4019
        1      s01  Crawling    1     357     594    4051
        2      s01  Crawling    1     333     638    4049
        3      s01  Crawling    1     340     678    4053
        4      s01  Crawling    1     372     708    4051
        5      s01  Crawling    1     410     733    4028
        6      s01  Crawling    1     450     733    3988
        7      s01  Crawling    1     492     696    3947
        8      s01  Crawling    1     518     677    3943
        9      s01  Crawling    1     528     695    3988
        10     s01  Crawling    1      -1    2558    4609
        11     s01   Running    1     -44   -3971     843
        12     s01   Running    1     -47   -3982     836
        13     s01   Running    1     -43   -3973     832
        14     s01   Running    1     -40   -3973     834
        15     s01   Running    1     -48   -3978     844
        16     s01   Running    1     -52   -3993     842
        17     s01   Running    1     -64   -3984     821
        18     s01   Running    1     -64   -3966     813
        19     s01   Running    1     -66   -3971     826
        20     s01   Running    1     -62   -3988     827
        21     s01   Running    1     -57   -3984     843

>>> client.pipeline.set_input_data('test_data', df, force=True,
                    data_columns=['accelx', 'accely', 'accelz'],
                    group_columns=['Subject', 'Class', 'Rep'],
                    label_column='Class')

>>> client.pipeline.add_transform("General Threshold Segmentation",
                   params={"first_column_of_interest": 'accelx',
                        "second_column_of_interest": 'accely',
                        "max_segment_length": 5,
                        "min_segment_length": 5,
                        "threshold_space_width": 2,
                        "first_vt_threshold": 0.05,
                        "first_threshold_space": 'std',
                        "first_comparison": 'max',
                        "second_vt_threshold": 0.05,
                        "second_threshold_space": 'std',
                        "second_comparison": 'min',
                        "return_segment_index": False})

>>> results, stats = client.pipeline.execute()
>>> print(results)
    out:
              Class  Rep  SegmentID Subject  accelx  accely  accelz
        0  Crawling    1          0     s01     377     569    4019
        1  Crawling    1          0     s01     357     594    4051
        2  Crawling    1          0     s01     333     638    4049
        3  Crawling    1          0     s01     340     678    4053
        4  Crawling    1          0     s01     372     708    4051
        5   Running    1          0     s01     -44   -3971     843
        6   Running    1          0     s01     -47   -3982     836
        7   Running    1          0     s01     -43   -3973     832
        8   Running    1          0     s01     -40   -3973     834
        9   Running    1          0     s01     -48   -3978     844

Double Peak Key Segmentation

Considers a double peak as the key to begin segmentation and a single peak as the end.

Parameters

input_data (DataFrame) – The input data.
column_of_interest (str) – The name of the stream to use for segmentation.
group_columns (List[str]) – A list of column names to use for grouping.
return_segment_index (bool) – If True, returns the segment indexes for start and end. This should only be used for visualization purposes and not for pipeline building.
min_peak_to_peak (int) – Minimum peak-to-peak distance for a potential double peak.
max_peak_to_peak (int) – Maximum peak-to-peak distance for a potential double peak.
twist_threshold (int) – Threshold to detect a first downward slope in a double peak.
end_twist_threshold (int) – Threshold to detect an upward slope preceding the last peak in a double peak.
last_twist_threshold (int) – Minimum threshold difference between the last peak and the following minimums.
max_segment_length (int) – The maximum number of samples a segment can contain. A segment length too large will not fit on the device.

Returns

If return_segment_index is True, returns a dictionary containing the start and end indexes of each segment for visualization purposes. Otherwise, returns a DataFrame.

Return type

DataFrame

Adaptive Windowing Segmentation

A sliding windowing technique with adaptive sizing. This finds the largest point after min_segment_length that is above the threshold, and that point is considered the end of the segment. If no points are above the threshold before max_segment_length is reached, the segment stops at max_segment_length.
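
The end-of-segment rule described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function name is hypothetical, the returned end index is exclusive (so the peak is the last sample of the segment), and "above the threshold" is read as a strict comparison.

```python
import numpy as np


def adaptive_segment_end(signal, start, min_segment_length, max_segment_length,
                         threshold, absolute_value=False):
    """Return the exclusive end index of the segment that begins at `start`."""
    signal = np.asarray(signal, dtype=float)
    lo = start + min_segment_length
    hi = min(start + max_segment_length, len(signal))
    candidates = signal[lo:hi]
    if absolute_value:
        candidates = np.abs(candidates)
    # Look for the largest sample between the minimum and maximum lengths.
    if candidates.size and candidates.max() > threshold:
        peak = lo + int(np.argmax(candidates))
        return peak + 1  # assumption: the peak is the last sample in the segment
    # No sample exceeded the threshold: fall back to the maximum length.
    return hi


signal = [0, 1, 0, 2, 0, 9, 0, 3, 0, 0]
early_end = adaptive_segment_end(signal, start=0, min_segment_length=3,
                                 max_segment_length=8, threshold=5)
full_end = adaptive_segment_end(signal, start=0, min_segment_length=3,
                                max_segment_length=8, threshold=20)
```

With absolute_value enabled, a large negative excursion is treated like a positive peak of the same magnitude when it is compared against the threshold.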

Parameters

input_data (DataFrame) – The input data.
columns_of_interest (str) – The stream to use for segmentation.
group_columns ([str]) – A list of column names to use for grouping.
max_segment_length (int) – This is the maximum number of samples a segment can contain.
min_segment_length (int) – This is the minimum number of samples a segment can contain.
threshold (int) – The threshold that must be met to start looking for the end of the segment early. If the threshold is not met, the segment ends at max_segment_length.
absolute_value (bool) – Takes the absolute value of the sensor data prior to doing the comparison.
return_segment_index (bool) – Set to True to see the segment indexes for start and end. Defaults to False.