Segmentation Parameter Selection

Many of the segmentation algorithms work by identifying a threshold level above which the start or end of a segment is triggered. Identifying the correct threshold for a dataset can be challenging. The Toolkit has some built-in heuristics for identifying the best parameters for some of the segmentation algorithms. In some cases, though, you may want to select the parameters manually. This tutorial shows you how to use the Python SDK to select good parameters.

Try It Yourself

You can download the Notebook file here to follow along with this tutorial in your own environment.

Prerequisites

Download the Demo Project and extract into this folder
Update to the latest SensiML Python SDK
To see the visualizations install ipywidgets

[ ]:

!pip install ipywidgets sensiml -U

Overview

In this tutorial we will:

Load a .DCLPROJ file into the Python SDK using the DCLProject API
Upload a .DCLPROJ file as a new project to the server using the Client
Visualize the threshold space for segmentation algorithms and select an appropriate threshold
Run the Segmentation algorithm and compare the generated segments to the ground truth
Compare the generated segments to the manually labeled segments

[2]:

from sensiml import Client
from sensiml.dclproj import DCLProject, plot_threshold_space, DataSegments, segment_list_to_datasegments

1 Load a .DCLPROJ file into the Python SDK using the DCLProject API

The DCLProject API provides read-only access to a .DCLPROJ file, which is the native database of a Data Studio project. To load the project, pass the path of the .DCLPROJ file to DCLProject:

[3]:

DCLP_PROJECT_PATH = "DemoProject/DemoProject.dclproj"

dcl = DCLProject(path=DCLP_PROJECT_PATH)

2 Upload a .DCLPROJ file using the Python SDK Client

You can use the client to upload either a .DCLPROJ or a .DCLI file. The .DCLPROJ format is a database format used by the Data Studio. The .DCLI format is an open-source JSON format that also fully describes the project. You can read more about the .DCLI format in our documentation. For this example we will use the .DCLPROJ format.

[4]:

client = Client()

client.upload_project("Segment Demo Project", DCLP_PROJECT_PATH)
client.project = "Segment Demo Project"

Project with this name already exists.

Select and set the variables for the GROUND_TRUTH_SESSION and FILENAME from the .DCLPROJ database. The cells below list the captures and segmenters available in the project.

[5]:

client.list_captures()

[5]:

	Name	Last Modified	Created	UUID
0	File1.csv	2023-02-24T20:13:25.553485Z	2023-02-24T20:13:25.553485Z	a877cc1f-ce30-44c8-b345-f35daa83939c

[ ]:

client.list_segmenters()[["name","custom","created_at"]]

[7]:

GROUND_TRUTH_SESSION = "Training Session"
FILENAME="File1.csv"

3 Visualize the threshold space for segmentation algorithms and select an appropriate threshold

threshold_width is the width of the region the threshold space is calculated over.

threshold_type is the type of calculation performed on the sensor data within that region. The available values are absolute sum, sum, variance, and std.

column_of_interest is the column of sensor data used to compare against the threshold.

The larger the threshold width, the less sensitive to noise the threshold will be, but the slower it will be to trigger. Select different parameters below to see how they affect the threshold space.
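To make the trade-off above concrete, here is a minimal NumPy sketch of a threshold space. It assumes the threshold space is a rolling statistic over a trailing window of threshold_width samples, which may differ from how the SDK computes it internally. The function name threshold_space and the synthetic signal are hypothetical and only for illustration.

```python
import numpy as np


def threshold_space(signal, threshold_width, threshold_type="std"):
    """Illustrative rolling statistic over a trailing window (not the SDK's code)."""
    stats = {
        "std": np.std,
        "variance": np.var,
        "sum": np.sum,
        "absolute sum": lambda window: np.sum(np.abs(window)),
    }
    stat = stats[threshold_type]
    signal = np.asarray(signal, dtype=float)
    out = np.zeros(len(signal))
    # The first threshold_width - 1 samples have no full window and stay at 0.
    for i in range(threshold_width - 1, len(signal)):
        out[i] = stat(signal[i - threshold_width + 1 : i + 1])
    return out


# Synthetic signal: 100 quiet samples, a 50-sample burst, 100 quiet samples.
demo_signal = np.concatenate(
    [np.zeros(100), np.tile([10.0, -10.0], 25), np.zeros(100)]
)

narrow = threshold_space(demo_signal, threshold_width=5)
wide = threshold_space(demo_signal, threshold_width=30)

# Compare how strongly each width reacts to the first sample of the burst.
print(narrow[100], wide[100])
```

On the first sample of the burst, the narrow window already produces a larger value than the wide one, which is the "slower to trigger" effect of a wide window on this synthetic data.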

[8]:

threshold_width = 15  # width of the region the threshold space is calculated over
threshold_type = 'std'  # options: "absolute sum", "sum", "variance", "std"

capture = dcl.get_capture(FILENAME)
gt_segs = dcl.get_capture_segments(FILENAME, GROUND_TRUTH_SESSION)

sensor_columns_to_use = ['AccelerometerZ', 'GyroscopeX'] #'AccelerometerX', 'AccelerometerY', 'GyroscopeY', 'GyroscopeZ'

for column_of_interest in sensor_columns_to_use:
    plot_threshold_space(capture, column_of_interest, gt_segs, threshold_type=threshold_type, threshold_width=threshold_width)

../../_images/sensiml-python-sdk_additional-tutorials_selecting-parameters-for-segmentation-algorithms_13_0.png

../../_images/sensiml-python-sdk_additional-tutorials_selecting-parameters-for-segmentation-algorithms_13_1.png
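Reading a threshold off a plot can also be approximated numerically. The sketch below is one possible heuristic, not something the SDK or the Toolkit is known to do: it takes a threshold-space array and the labeled segments as hypothetical (start, end) index pairs, then suggests the midpoint between the highest value outside any segment and the weakest peak inside a segment. The function name suggest_threshold is an assumption made for this example.

```python
import numpy as np


def suggest_threshold(space, segments):
    """Midpoint between background maximum and weakest in-segment peak.

    space: 1-D threshold-space values.
    segments: list of (start, end) sample indices, end inclusive (hypothetical format).
    Returns None when no single threshold separates segments from background.
    """
    space = np.asarray(space, dtype=float)
    inside = np.zeros(len(space), dtype=bool)
    for start, end in segments:
        inside[start : end + 1] = True
    if not segments or inside.all():
        return None  # nothing to separate
    background_max = space[~inside].max()
    weakest_peak = min(space[start : end + 1].max() for start, end in segments)
    if weakest_peak <= background_max:
        return None  # background rises as high as at least one labeled segment
    return (background_max + weakest_peak) / 2


toy_space = [0, 1, 0, 8, 9, 0, 1, 7, 0]
toy_segments = [(3, 4), (7, 7)]
print(suggest_threshold(toy_space, toy_segments))
```

When the function returns None, the chosen column, threshold type, or width does not cleanly separate the labeled events from the background, which is a hint to try different parameters.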

4 Run the Segmentation algorithm and compare the generated segments to the ground truth

In the following cell the Windowing Threshold Segmentation algorithm is used as the segmentation algorithm with the input parameters:

column_of_interest: AccelerometerZ
threshold_space_width: 15
threshold_space: std

The function will execute a pipeline job in the cloud and return the results, which are then converted into a DataSegments object.

[9]:

client.pipeline="Segmenter Pipeline"

client.pipeline.reset()
client.pipeline.set_input_capture([FILENAME])

# Set the Windowing Threshold to the values you would like to have in the segmenter algorithm
client.pipeline.add_transform("Windowing Threshold Segmentation", params={"column_of_interest": 'AccelerometerZ',
                                "window_size": 100,
                                "offset": 50,
                                "vt_threshold": 10000,
                                "threshold_space_width": 15,
                                "comparison": "maximum", #options: <maximum/minimum>,
                                "threshold_space": "std", #options: <std/absolute sum/sum/variance/absolute avg>,
                                "return_segment_index": True,
                                })

r,s = client.pipeline.execute()
new_segs  = segment_list_to_datasegments(r).nearest_labels(gt_segs, verbose=False)

Capture files do not have group columns, use a data file if you need group columns.
Executing Pipeline with Steps:

------------------------------------------------------------------------
 0.     Name: File1.csv                                 Type: capturefile
------------------------------------------------------------------------
------------------------------------------------------------------------
 1.     Name: Windowing Threshold Segmentation          Type: segmenter
------------------------------------------------------------------------
------------------------------------------------------------------------



Results Retrieved... Execution Time: 0 min. 0 sec.
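To build intuition for what the segmenter parameters control, here is a simplified local sketch of a windowing threshold segmenter. It is not the SensiML implementation. It assumes that a trigger occurs when the rolling standard deviation over threshold_space_width samples reaches vt_threshold (the "std" space with the "maximum" comparison), that the segment starts offset samples before the trigger point, and that it always spans window_size samples. The function name windowing_threshold_sketch is hypothetical.

```python
import numpy as np


def windowing_threshold_sketch(
    signal, window_size, offset, vt_threshold, threshold_space_width
):
    """Return (start, end) index pairs, end inclusive. Illustrative only."""
    signal = np.asarray(signal, dtype=float)
    segments = []
    i = threshold_space_width - 1
    while i < len(signal):
        window = signal[i - threshold_space_width + 1 : i + 1]
        if np.std(window) >= vt_threshold:
            # Assumption: the segment begins `offset` samples before the trigger.
            start = max(i - offset, 0)
            end = start + window_size - 1
            if end >= len(signal):
                break  # not enough data left for a full segment
            segments.append((start, end))
            # Resume scanning with a window that starts after the segment.
            i = end + threshold_space_width
        else:
            i += 1
    return segments


# Synthetic data: silence with one 20-sample burst starting at index 100.
demo = np.zeros(200)
demo[100:120] = np.tile([10.0, -10.0], 10)

found = windowing_threshold_sketch(
    demo, window_size=50, offset=10, vt_threshold=5, threshold_space_width=4
)
print(found)
```

In this sketch, raising vt_threshold or threshold_space_width delays the trigger point, while offset shifts the whole fixed-size segment earlier so that the onset of the event is captured.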

5 Compare the generated segments to the manually labeled ground truth

The DataSegments object has a number of APIs that are useful for manipulating, visualizing, and comparing segments. To see a comparison between the ground truth segments and the predicted segments, use the confusion_matrix API.

[10]:

new_segs.confusion_matrix(gt_segs)

Pred Session:
Model:
-----------  ---------  ---------  ---  -------  -------
             Gesture A  Gesture B  UNK  Support  Sense %
Gesture A            7          0    0        7    100.0
Gesture B            0          9    1       10     90.0
UNK                  0          0    0        0        0
-----------  ---------  ---------  ---  -------  -------
Total                7          9    1       17
Pos_Pred(%)      100.0      100.0    0   Acc(%)    94.12
-----------  ---------  ---------  ---  -------  -------
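The summary figures in the table above can be reproduced from the raw counts. The sketch below copies the counts by hand into a NumPy array and assumes, as the table layout suggests, that Sense % is the diagonal divided by the row Support, Pos_Pred(%) is the diagonal divided by the column Total, and Acc(%) is the diagonal sum divided by the overall count. The helper safe_percent is a hypothetical name used only here.

```python
import numpy as np

labels = ["Gesture A", "Gesture B", "UNK"]
# Counts copied from the table above (rows and columns in the same label order).
counts = np.array(
    [
        [7, 0, 0],
        [0, 9, 1],
        [0, 0, 0],
    ]
)

diagonal = np.diag(counts).astype(float)
support = counts.sum(axis=1)  # row sums, the "Support" column
totals = counts.sum(axis=0)  # column sums, the "Total" row


def safe_percent(numerator, denominator):
    """Percentage that yields 0 wherever the denominator is 0."""
    out = np.zeros_like(numerator, dtype=float)
    np.divide(100.0 * numerator, denominator, out=out, where=denominator > 0)
    return out


sensitivity = safe_percent(diagonal, support)
positive_predictive = safe_percent(diagonal, totals)
accuracy = 100.0 * diagonal.sum() / counts.sum()

for name, sens, ppv in zip(labels, sensitivity, positive_predictive):
    print(f"{name}: Sense % = {sens:.1f}, Pos_Pred(%) = {ppv:.1f}")
print(f"Acc(%) = {accuracy:.2f}")
```

The single Gesture B segment that ended up in the UNK column is what lowers the Gesture B sensitivity to 90.0 and the overall accuracy to 94.12.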


You can also use the DataSegments plot_segments API to plot only the labels directly from a DataSegments object.

[28]:

xlim=(100,3000)
ax = capture.plot(figsize=(16,4), xlim=xlim)
print("Predicted")
new_segs.plot_segments(figsize=(16,4), xlim=xlim)
print("Ground Truth")
gt_segs.plot_segments(figsize=(16,4), labels=new_segs.label_values, xlim=xlim)

Predicted
Ground Truth

../../_images/sensiml-python-sdk_additional-tutorials_selecting-parameters-for-segmentation-algorithms_19_1.png

../../_images/sensiml-python-sdk_additional-tutorials_selecting-parameters-for-segmentation-algorithms_19_2.png

../../_images/sensiml-python-sdk_additional-tutorials_selecting-parameters-for-segmentation-algorithms_19_3.png

This tutorial has demonstrated how to select parameters for the Segmentation algorithms by creating a plot that transforms the sensor data into the threshold space. You can use this approach to select the parameters for your own segmentation algorithms.