Apply Ground Truth to Generated Segments

In this tutorial you will learn how to apply labels from a ground truth session to segments generated by a segmentation algorithm. In your own workflow, you may run an autosegmentation algorithm and label all of the resulting segments as Unknown, then use this workflow to update them with the ground truth values. This makes experimenting with new segmentation algorithms much quicker.

Try It Yourself

You can download the Notebook file here to follow along with this tutorial in your own environment.

Prerequisites

  • Download the Demo Project and extract into this folder

  • To see the visualizations install ipywidgets

  • Update to the latest SensiML Python SDK

[ ]:
!pip install ipywidgets sensiml -U

Connecting to a Data Studio project

The Python SDK can interface directly with a .DCLPROJ file in read-only mode. In this tutorial we will:

  • Load a .DCLPROJ file into the Python SDK using the DCLProject API

  • Upload a .DCLPROJ file as a new project to the server using the Client

  • Compare the segments in the “Training Session” and “Predicted” Session using the confusion_matrix DataSegments API

  • Visualize the segments against the raw sensor data using the plot_segments API

  • Update the predicted segments with the closest labels from the “Ground Truth” Session

  • Upload the new predicted segments

1 Load a .DCLPROJ file into the Python SDK using the DCLProject API

The DCLProject API provides read-only access to a .DCLPROJ file, which is the native database format of a Data Studio project. To load the project:

[46]:
from sensiml.dclproj import DCLProject

DCLP_PROJECT_PATH = "DemoProject/DemoProject.dclproj"

dcl = DCLProject(path=DCLP_PROJECT_PATH)

2 Upload a .DCLPROJ file using the Python SDK Client

You can use the client to upload either a .DCLPROJ or .DCLI file. The .DCLPROJ format is a database format used by the Data Studio.

[ ]:
from sensiml import Client
client = Client()

client.upload_project("Segment Demo Project", DCLP_PROJECT_PATH)

3 Compare the segments in the “Training Session” and “Predicted Session”

We will use the DCLProject API to get access to the segment information that is stored in the .DCLPROJ database.

To see the list of captures in the database, use the .list_captures API:

[21]:
dcl.list_captures()
[21]:
uuid name file_size number_samples set_sample_rate created_at local_status last_modified capture_configuration set Device
0 adb3f19a-cf85-4b9c-b3e5-8bbf393bfb99 File1.csv 124043 3402 119 2023-02-24 00:16:27.740546 Synced 2023-02-24 00:30:28.177732 Nano 33 BLE Sense IMU train Nano33 BLE Sense - 127.0.0.1:5555

To see a list of sessions in the database, use the .list_sessions API:

[22]:
dcl.list_sessions()
[22]:
   id  name              parameters                                         custom  preprocess                                         created_at                  local_status  last_modified
0  1   Training Session  None                                               1       None                                               2023-02-24 00:15:18.719975  Synced        2023-02-24 00:28:45.751122
1  2   Segmentation      {"inputs":{"input_data":"","first_column_of_in...  0       {"0":{"name":"MagnitudeGxGyGz","actual_name":"...  2023-02-24 00:15:19.560820  Synced        2023-02-24 00:29:02.792429

Select and set the variables for the GROUND_TRUTH_SESSION, PREDICTED_SESSION, and FILENAME from the .DCLPROJ database:

[28]:
GROUND_TRUTH_SESSION = "Training Session"
PREDICTED_SESSION = "Segmentation"
FILENAME="File1.csv"

Load the ground truth and predicted segments as DataSegments objects using the get_capture_segments API

[29]:
gt_segs = dcl.get_capture_segments(FILENAME, GROUND_TRUTH_SESSION)
pred_segs = dcl.get_capture_segments(FILENAME, PREDICTED_SESSION)
pred_segs.head()
[29]:
   label_value  capture    segment_id  capture_sample_sequence_start  capture_sample_sequence_end  session       uuid                                  segment_length
0  Gesture A    File1.csv  0           305                            365                          Segmentation  c3b730c0-15f2-4e36-ac23-45c94d35bad3  60
1  Gesture A    File1.csv  1           434                            494                          Segmentation  f1c880e4-d276-4409-8983-84db8abd7072  60
2  Gesture A    File1.csv  2           519                            618                          Segmentation  ff092a08-823b-4c56-b0dc-5f44776ee48d  99
3  Gesture A    File1.csv  3           619                            679                          Segmentation  7a97141f-1497-4932-a1ab-93141ff2c966  60
4  Gesture A    File1.csv  4           754                            816                          Segmentation  8c94d231-4851-4575-9c60-ca29851ca4e8  62
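
As a minimal sketch of how to read this table, each row can be thought of as a labeled sample range whose segment_length is the end sample minus the start sample. The Segment class below is a hypothetical stand-in written for illustration. It is not the SDK's DataSegments type. The values are copied from the five rows shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one row of the segment table (not an SDK class)."""

    label_value: str
    start: int  # capture_sample_sequence_start
    end: int    # capture_sample_sequence_end

    @property
    def segment_length(self) -> int:
        # In the table above, segment_length equals end minus start
        return self.end - self.start


# The five predicted segments listed in the output above
rows = [
    Segment("Gesture A", 305, 365),
    Segment("Gesture A", 434, 494),
    Segment("Gesture A", 519, 618),
    Segment("Gesture A", 619, 679),
    Segment("Gesture A", 754, 816),
]

lengths = [seg.segment_length for seg in rows]
print(lengths)
```

The computed lengths match the segment_length column, which is a quick sanity check when you build or edit segments by hand.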

The DataSegments object has a number of APIs that are useful for manipulating, visualizing, and comparing segments. To compare the ground truth segments with the predicted segments, use the confusion_matrix API:

[30]:
pred_segs.confusion_matrix(gt_segs)
Pred Session:
Model:
-----------  ---------  ---------  ---  -------  -------
             Gesture A  Gesture B  UNK  Support  Sense %
Gesture A            7          0    0        7    100.0
Gesture B           10          0    0       10        0
UNK                  3          0    0        3        0
-----------  ---------  ---------  ---  -------  -------
Total               20          0    0       20
Pos_Pred(%)       35.0          0    0   Acc(%)     35.0
-----------  ---------  ---------  ---  -------  -------
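
The summary figures in this matrix can be reproduced by hand. The sketch below assumes that rows are ground truth labels and columns are predicted labels, which is consistent with the Support column (7 Gesture A and 10 Gesture B segments in the ground truth, plus 3 predicted segments with no ground truth match in the UNK row). The helper name percent is illustrative and not part of the SDK.

```python
# Counts copied from the confusion matrix above.
# Assumption: rows are ground truth labels, columns are predicted labels.
labels = ["Gesture A", "Gesture B", "UNK"]
counts = [
    [7, 0, 0],   # ground truth Gesture A
    [10, 0, 0],  # ground truth Gesture B
    [3, 0, 0],   # UNK: predicted segments with no ground truth match
]


def percent(numerator, denominator):
    """Percentage rounded to one decimal place; 0.0 when the denominator is zero."""
    return round(100.0 * numerator / denominator, 1) if denominator else 0.0


# Sense %: correct predictions in a row divided by the row total (Support)
sensitivity = {
    label: percent(counts[i][i], sum(counts[i])) for i, label in enumerate(labels)
}

# Pos_Pred(%): correct predictions in a column divided by the column total
column_totals = [sum(row[j] for row in counts) for j in range(len(labels))]
pos_pred = {
    label: percent(counts[j][j], column_totals[j]) for j, label in enumerate(labels)
}

# Acc(%): all correct predictions divided by all segments
total = sum(column_totals)
accuracy = percent(sum(counts[i][i] for i in range(len(labels))), total)

print(sensitivity)
print(pos_pred)
print(accuracy)
```

Because every predicted segment is labeled Gesture A, only 7 of the 20 predictions are correct, which is where the 35.0 in both Pos_Pred(%) and Acc(%) comes from.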

4 Visualize the segments against the raw sensor data using the plot_segments API

The DCLProject plot_segment_labels API can be used to visualize the session segments overlaid on the raw sensor data. You can also use the DataSegments plot_segments API to plot only the labels directly from a DataSegments object.

[10]:
from ipywidgets import widgets
interactive_plot=widgets.interact_manual.options(manual_name="plot")


capture_length = dcl.get_capture(FILENAME).shape[0]
@interactive_plot(x_start=(0, capture_length), window=(1,capture_length))
def plot(x_start=0, window=capture_length):
    x_max = min(x_start+window, capture_length)

    x_lim = (x_start,x_max)
    _ = dcl.plot_segment_labels(FILENAME, PREDICTED_SESSION, xlim=x_lim, figsize=(30,4))
    _ = dcl.plot_segment_labels(FILENAME, GROUND_TRUTH_SESSION, xlim=x_lim, figsize=(30,4))
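
The only logic in this cell besides the plotting calls is the window clamp: the right edge of the plot is never allowed to run past the end of the capture. A standalone sketch of that calculation follows. The function name window_limits is hypothetical, and 3402 is the number_samples value shown for File1.csv in the list_captures output.

```python
def window_limits(x_start, window, capture_length):
    """Return (x_min, x_max) for a plot window, clamped to the capture length."""
    x_max = min(x_start + window, capture_length)
    return (x_start, x_max)


capture_length = 3402  # number_samples for File1.csv

full_view = window_limits(0, capture_length, capture_length)
clamped_view = window_limits(3000, 1000, capture_length)

print(full_view)
print(clamped_view)
```

Using min() here makes any later bounds check on x_max unnecessary.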

5 Update the predicted segments with the closest labels from the “Ground Truth” Session

For this project, the Ground Truth was labeled manually and the Predicted session comes from a segmentation algorithm. To update the predicted segments with the true labels from the ground truth, use the DataSegments nearest_labels API. This API returns a new DataSegments object with updated label_values. It lets you quickly relabel the output of a new segmentation algorithm with the correct ground truth labels without setting them manually.

[49]:
new_segs = pred_segs.nearest_labels(gt_segs, verbose=True)
print(f"\n\nnew segment: {len(new_segs)}")
print(f"ground truth: {len(gt_segs)}")
No matching label found for 3 setting to None
Updating 8 from Gesture A to Gesture B
Updating 9 from Gesture A to Gesture B
Updating 10 from Gesture A to Gesture B
Updating 11 from Gesture A to Gesture B
Updating 12 from Gesture A to Gesture B
Updating 13 from Gesture A to Gesture B
No matching label found for 14 setting to None
Updating 15 from Gesture A to Gesture B
Updating 16 from Gesture A to Gesture B
Updating 17 from Gesture A to Gesture B
Updating 18 from Gesture A to Gesture B
No matching label found for 19 setting to None


new segment: 20
ground truth: 17
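
To make the idea concrete, here is a minimal sketch of one way such relabeling could work. It assumes that the best match is the ground truth segment sharing the most samples with the predicted segment. The tutorial does not state the matching rule that nearest_labels actually uses, so treat this as an illustration only. The function names and the toy segments are made up for the example.

```python
def overlap(a, b):
    """Number of samples shared by two (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def nearest_labels_sketch(predicted, ground_truth, default=None):
    """Relabel predicted segments with the best-overlapping ground truth label.

    Both inputs are lists of (start, end, label) tuples. Predicted segments
    with no overlapping ground truth segment get the default label.
    Assumption: overlap is the matching criterion (not confirmed for the SDK).
    """
    relabeled = []
    for start, end, _old_label in predicted:
        best_label, best_overlap = default, 0
        for gt_start, gt_end, gt_label in ground_truth:
            shared = overlap((start, end), (gt_start, gt_end))
            if shared > best_overlap:
                best_label, best_overlap = gt_label, shared
        relabeled.append((start, end, best_label))
    return relabeled


# Toy data, invented for this example
toy_gt = [(300, 370, "Gesture A"), (500, 620, "Gesture B")]
toy_pred = [
    (305, 365, "Gesture A"),
    (519, 618, "Gesture A"),
    (700, 760, "Gesture A"),
]

toy_new = nearest_labels_sketch(toy_pred, toy_gt)
print(toy_new)
```

As in the real output above, the number of segments is unchanged: segments that match are relabeled, and segments without a match keep their sample range but lose their label.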

Comparing the confusion matrix of the new segments to the ground truth gives 100% sensitivity for both Gesture A and Gesture B, whereas the original predicted session had 0% for Gesture B. The predicted session has 20 segments and the ground truth has 17. The difference appears in the UNK row of the confusion matrix, which has 3 in the Unknown column.

[45]:
new_segs.confusion_matrix(gt_segs)
Pred Session:
Model:
-----------  ---------  ---------  -------  ---  -------  -------
             Gesture A  Gesture B  Unknown  UNK  Support  Sense %
Gesture A            7          0        0    0        7    100.0
Gesture B            0         10        0    0       10    100.0
Unknown              0          0        0    0        0        0
UNK                  0          0        3    0        3        0
-----------  ---------  ---------  -------  ---  -------  -------
Total                7         10        3    0       20
Pos_Pred(%)      100.0      100.0        0    0   Acc(%)     85.0
-----------  ---------  ---------  -------  ---  -------  -------

Use the DataSegments plot_segments API to plot only the labels directly from a DataSegments object.

[15]:
@interactive_plot(x_start=(0, new_segs[-1].end), window=(1,new_segs[-1].end))
def plot(x_start=0, window=new_segs[-1].end):
    # Clamp the right edge of the window to the end of the capture
    x_max = min(x_start+window, capture_length)

    x_lim = (x_start,x_max)
    new_segs.plot_segments(figsize=(30,4), xlim=x_lim)
    gt_segs.plot_segments(figsize=(30,4), xlim=x_lim, labels=new_segs.label_values)

6 Upload the new predicted segments

The DataSegments objects upload API can be used to directly upload segments. Use this to update the Predicted Session with the correct label values.

NOTE: The Python SDK DCLProject API is read-only, so the .DCLPROJ database will not be updated. To update the .DCLPROJ database with the new labels, open the Data Studio and refresh.

[ ]:
# new_segs is the relabeled DataSegments object created in step 5
new_segs.upload(client, 'Unknown')

Another option is to export the segments into the .dcli format and import them into the Data Studio. The .dcli file format is an open format for versioning datasets. See the documentation for more information.

[11]:
dcli = new_segs.to_dcli(filename=f'{PREDICTED_SESSION}.dcli', session='Test Upload')
writing dcli file to SensiML Session.dcli
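
For orientation, a .dcli file is a JSON document. The sketch below builds a small structure by hand to show roughly what such a file holds. The field names (file_name, sessions, session_name, segments, name, value, start, end) are an assumption about the layout and should be checked against the DCLI format documentation before relying on them. In practice the to_dcli call writes the file for you.

```python
import json

# Toy segments as (start, end, label_value); the sample ranges reuse the first
# two rows of the pred_segs output shown earlier in this tutorial.
toy_segments = [(305, 365, "Gesture A"), (434, 494, "Gesture A")]

# Assumed .dcli layout: a JSON list with one entry per capture file.
dcli_sketch = [
    {
        "file_name": "File1.csv",
        "sessions": [
            {
                "session_name": "Test Upload",
                "segments": [
                    {"name": "Label", "value": label, "start": start, "end": end}
                    for start, end, label in toy_segments
                ],
            }
        ],
    }
]

# Serialize and parse again to confirm the structure is valid JSON
dcli_text = json.dumps(dcli_sketch, indent=2)
round_trip = json.loads(dcli_text)
print(len(round_trip[0]["sessions"][0]["segments"]))
```

Because the format is plain JSON, exported label files can be inspected and compared with ordinary version control tools.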

For a project with multiple files, you can use the code below to update the segments across all of the files:

[ ]:
from sensiml.dclproj import DataSegments

new_datasegments = DataSegments()
filenames = dcl.list_capture_segments(sessions=[GROUND_TRUTH_SESSION]).capture.unique().tolist()
for filename in filenames:
    gt_segs = dcl.get_capture_segments(filename, GROUND_TRUTH_SESSION)
    pred_segs = dcl.get_capture_segments(filename, PREDICTED_SESSION).nearest_labels(gt_segs, verbose=False)
    new_datasegments += pred_segs

new_datasegments.upload(client, 'Unknown')

This concludes the tutorial for using the DCLProject and DataSegments APIs to explore and manipulate segment objects programmatically using the Python SDK.