Apply Ground Truth to Generated Segments
In this tutorial you will learn how to apply labels from a ground truth session to segments generated by a segmentation algorithm. In your own workflow, you might run an automatic segmentation algorithm that labels every segment as Unknown, then use this workflow to update those segments with the ground truth values. This makes experimenting with new segmentation algorithms much quicker.
Try It Yourself
You can download the Notebook file here to follow along with this tutorial in your own environment.
Prerequisites
Download the Demo Project and extract it into this folder
To see the visualizations, install ipywidgets
Update to the latest SensiML Python SDK
[ ]:
!pip install ipywidgets sensiml -U
Connecting to a Data Studio project
The Python SDK can be used to interface directly with a .dclproj file in read-only mode. In this tutorial we will:
Load a .DCLPROJ file into the Python SDK using the DCLProject API
Upload a .DCLPROJ file as a new project to the server using the Client
Compare the segments in the “Training Session” and “Predicted” sessions using the DataSegments confusion_matrix API
Visualize the segments against the raw sensor data using the plot_segments API
Update the predicted segments with the closest labels from the “Ground Truth” Session
Upload the new predicted segments
1 Load a .DCLPROJ file into the Python SDK using the DCLProject API
The DCLProject API provides read-only access to a .DCLPROJ file, which is the native database format of a Data Studio project. To load the project:
[46]:
from sensiml.dclproj import DCLProject
DCLP_PROJECT_PATH = "DemoProject/DemoProject.dclproj"
dcl = DCLProject(path=DCLP_PROJECT_PATH)
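If the later calls fail, the most common cause is a wrong path, so it can help to confirm the database file exists before continuing. A minimal sketch using only the standard library:
[ ]:
import os

# The .dclproj file should be inside the extracted Demo Project folder
assert os.path.exists(DCLP_PROJECT_PATH), "Extract the Demo Project into this folder first"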
2 Upload a .DCLPROJ file using the Python SDK Client
You can use the client to upload either a .DCLPROJ or .DCLI file. The .DCLPROJ format is a database format used by the Data Studio.
[ ]:
from sensiml import Client
client = Client()
client.upload_project("Segment Demo Project", DCLP_PROJECT_PATH)
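If your dataset is already in the .DCLI format, the client can upload that instead. A minimal sketch, assuming the SDK exposes an upload_project_dcli call and that a .dcli file exists at the example path below (check the Python SDK documentation for the exact call in your version):
[ ]:
# Hypothetical: upload a .DCLI dataset instead of a .DCLPROJ database.
# Both the upload_project_dcli name and the file path are assumptions for illustration.
DCLI_PATH = "DemoProject/DemoProject.dcli"

client.upload_project_dcli("Segment Demo Project (DCLI)", DCLI_PATH)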
3 Compare the segments in the “Training Session” and “Predicted Session”
We will use the DCLProject API to get access to the segment information that is stored in the .DCLPROJ database.
To see the list of captures in the database, use the .list_captures API
[21]:
dcl.list_captures()
[21]:
| | uuid | name | file_size | number_samples | set_sample_rate | created_at | local_status | last_modified | capture_configuration | set | Device |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | adb3f19a-cf85-4b9c-b3e5-8bbf393bfb99 | File1.csv | 124043 | 3402 | 119 | 2023-02-24 00:16:27.740546 | Synced | 2023-02-24 00:30:28.177732 | Nano 33 BLE Sense IMU | train | Nano33 BLE Sense - 127.0.0.1:5555 |
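You can also load the raw sensor data for a capture; a small sketch using the get_capture call that appears again later in this tutorial:
[ ]:
# Load the raw sensor data for a capture and check how many samples it contains
capture_data = dcl.get_capture("File1.csv")
print(capture_data.shape)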
To see a list of sessions in the database, use the .list_sessions API
[22]:
dcl.list_sessions()
[22]:
| | id | name | parameters | custom | preprocess | created_at | local_status | last_modified |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Training Session | None | 1 | None | 2023-02-24 00:15:18.719975 | Synced | 2023-02-24 00:28:45.751122 |
| 1 | 2 | Segmentation | {"inputs":{"input_data":"","first_column_of_in... | 0 | {"0":{"name":"MagnitudeGxGyGz","actual_name":"... | 2023-02-24 00:15:19.560820 | Synced | 2023-02-24 00:29:02.792429 |
Select and set the variables for the GROUND_TRUTH_SESSION, PREDICTED_SESSION, and FILENAME from the .dclproj database
[28]:
GROUND_TRUTH_SESSION = "Training Session"
PREDICTED_SESSION = "Segmentation"
FILENAME = "File1.csv"
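As an optional sanity check, you can confirm that the session names you set actually exist in the project; a small sketch, assuming list_sessions returns a DataFrame with a name column as the table above suggests:
[ ]:
# Verify the selected session names exist in the .dclproj database
session_names = dcl.list_sessions()["name"].tolist()
assert GROUND_TRUTH_SESSION in session_names
assert PREDICTED_SESSION in session_names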
Load the ground truth and predicted segments as DataSegments objects using the get_capture_segments API
[29]:
gt_segs = dcl.get_capture_segments(FILENAME, GROUND_TRUTH_SESSION)
pred_segs = dcl.get_capture_segments(FILENAME, PREDICTED_SESSION)
pred_segs.head()
[29]:
| | label_value | capture | segment_id | capture_sample_sequence_start | capture_sample_sequence_end | session | uuid | segment_length |
|---|---|---|---|---|---|---|---|---|
| 0 | Gesture A | File1.csv | 0 | 305 | 365 | Segmentation | c3b730c0-15f2-4e36-ac23-45c94d35bad3 | 60 |
| 1 | Gesture A | File1.csv | 1 | 434 | 494 | Segmentation | f1c880e4-d276-4409-8983-84db8abd7072 | 60 |
| 2 | Gesture A | File1.csv | 2 | 519 | 618 | Segmentation | ff092a08-823b-4c56-b0dc-5f44776ee48d | 99 |
| 3 | Gesture A | File1.csv | 3 | 619 | 679 | Segmentation | 7a97141f-1497-4932-a1ab-93141ff2c966 | 60 |
| 4 | Gesture A | File1.csv | 4 | 754 | 816 | Segmentation | 8c94d231-4851-4575-9c60-ca29851ca4e8 | 62 |
The DataSegments object has a number of APIs that are useful for manipulating, visualizing, and comparing segments. To see a comparison between the ground truth segments and the predicted segments, use the confusion_matrix API
[30]:
pred_segs.confusion_matrix(gt_segs)
Pred Session:
Model:
----------- --------- --------- --- ------- -------
Gesture A Gesture B UNK Support Sense %
Gesture A 7 0 0 7 100.0
Gesture B 10 0 0 10 0
UNK 3 0 0 3 0
----------- --------- --------- --- ------- -------
Total 20 0 0 20
Pos_Pred(%) 35.0 0 0 Acc(%) 35.0
----------- --------- --------- --- ------- -------
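Beyond the confusion matrix, a quick way to compare the two sessions is to look at their segment counts and labels; a minimal sketch using the len support and the label_values attribute used later in this tutorial:
[ ]:
# Compare segment counts and the set of labels in each session
print(f"predicted segments: {len(pred_segs)}")
print(f"ground truth segments: {len(gt_segs)}")
print(f"predicted labels: {set(pred_segs.label_values)}")
print(f"ground truth labels: {set(gt_segs.label_values)}")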
4 Visualize the segments against the raw sensor data using the plot_segments API
The DCLProject plot_segment_labels API can be used to visualize a session's segments overlaid on the raw sensor data. You can also use the DataSegments plot_segments API to plot only the labels directly from a DataSegments object.
[10]:
from ipywidgets import widgets

# Build an interactive widget with a manual "plot" button for exploring the capture
interactive_plot = widgets.interact_manual.options(manual_name="plot")

# Total number of samples in the capture, used to bound the sliders
capture_length = dcl.get_capture(FILENAME).shape[0]

@interactive_plot(x_start=(0, capture_length), window=(1, capture_length))
def plot(x_start=0, window=capture_length):
    # Plot the predicted and ground truth sessions over the selected window
    x_max = min(x_start + window, capture_length)
    x_lim = (x_start, x_max)
    _ = dcl.plot_segment_labels(FILENAME, PREDICTED_SESSION, xlim=x_lim, figsize=(30, 4))
    _ = dcl.plot_segment_labels(FILENAME, GROUND_TRUTH_SESSION, xlim=x_lim, figsize=(30, 4))
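If you do not have ipywidgets installed, or prefer a static view, the same comparison can be drawn with two direct calls over the full capture:
[ ]:
# Non-interactive alternative: plot both sessions across the entire capture
x_lim = (0, capture_length)
_ = dcl.plot_segment_labels(FILENAME, PREDICTED_SESSION, xlim=x_lim, figsize=(30, 4))
_ = dcl.plot_segment_labels(FILENAME, GROUND_TRUTH_SESSION, xlim=x_lim, figsize=(30, 4))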
5 Update the predicted segments with the closest labels from the “Ground Truth” Session
For this project, the Ground Truth session was labeled manually and the Predicted session comes from a segmentation algorithm. To update the predicted segments with the true labels from the ground truth session, use the DataSegments nearest_labels API. This API returns a new DataSegments object with updated label_values. Use it to quickly relabel the output of a new segmentation algorithm with the correct ground truth labels without having to set them manually.
[49]:
new_segs = pred_segs.nearest_labels(gt_segs, verbose=True)
print(f"\n\nnew segment: {len(new_segs)}")
print(f"ground truth: {len(gt_segs)}")
No matching label found for 3 setting to None
Updating 8 from Gesture A to Gesture B
Updating 9 from Gesture A to Gesture B
Updating 10 from Gesture A to Gesture B
Updating 11 from Gesture A to Gesture B
Updating 12 from Gesture A to Gesture B
Updating 13 from Gesture A to Gesture B
No matching label found for 14 setting to None
Updating 15 from Gesture A to Gesture B
Updating 16 from Gesture A to Gesture B
Updating 17 from Gesture A to Gesture B
Updating 18 from Gesture A to Gesture B
No matching label found for 19 setting to None
new segment: 20
ground truth: 17
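You can inspect the relabeled segments the same way as before; the updated label_value column now shows the Gesture B assignments (using the head call shown earlier):
[ ]:
# Inspect the first few relabeled segments
new_segs.head()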
Comparing the confusion matrix of the new segments to the ground truth gives 100% sensitivity for Gesture A and Gesture B. The original predicted session had 0% for Gesture B. The Predicted session has 20 segments and the ground truth has 17; the difference shows up in the UNK row of the confusion matrix, which has 3 in the Unknown column.
[45]:
new_segs.confusion_matrix(gt_segs)
Pred Session:
Model:
----------- --------- --------- ------- --- ------- -------
Gesture A Gesture B Unknown UNK Support Sense %
Gesture A 7 0 0 0 7 100.0
Gesture B 0 10 0 0 10 100.0
Unknown 0 0 0 0 0 0
UNK 0 0 3 0 3 0
----------- --------- --------- ------- --- ------- -------
Total 7 10 3 0 20
Pos_Pred(%) 100.0 100.0 0 0 Acc(%) 85.0
----------- --------- --------- ------- --- ------- -------
Use the DataSegments plot_segments API to plot only the labels directly from a DataSegments object.
[15]:
@interactive_plot(x_start=(0, new_segs[-1].end), window=(1, new_segs[-1].end))
def plot(x_start=0, window=new_segs[-1].end):
    # Clamp the window to the capture length, then plot the updated segments
    # above the ground truth segments for comparison
    x_max = min(x_start + window, capture_length)
    x_lim = (x_start, x_max)
    new_segs.plot_segments(figsize=(30, 4), xlim=x_lim)
    gt_segs.plot_segments(figsize=(30, 4), xlim=x_lim, labels=new_segs.label_values)
6 Upload the new predicted segments
The DataSegments object's upload API can be used to upload segments directly. Use it to update the Predicted Session with the corrected label values.
NOTE: The Python SDK DCLProject API is READ-ONLY, so the .DCLPROJ database will not be updated. To update the .DCLPROJ database with the new labels, open the Data Studio and refresh the project.
[ ]:
new_segs.upload(client, 'Unknown')
Another option is to export the segments into the .dcli format and import them into the Data Studio. The .dcli file format is an open format for versioning datasets. See the documentation for more information.
[11]:
dcli = new_segs.to_dcli(filename=f'{PREDICTED_SESSION}.dcli', session='Test Upload')
writing dcli file to SensiML Session.dcli
For a project with multiple files, you can use the code below to update the segments across all of the files
[ ]:
from sensiml.dclproj import DataSegments

new_datasegments = DataSegments()

# Every capture that has segments in the ground truth session
filenames = dcl.list_capture_segments(sessions=[GROUND_TRUTH_SESSION]).capture.unique().tolist()

for filename in filenames:
    # Relabel each file's predicted segments using its ground truth segments
    gt_segs = dcl.get_capture_segments(filename, GROUND_TRUTH_SESSION)
    pred_segs = dcl.get_capture_segments(filename, PREDICTED_SESSION).nearest_labels(gt_segs, verbose=False)
    new_datasegments += pred_segs

new_datasegments.upload(client, 'Unknown')
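As with the single-file case, the combined segments could instead be exported to a .dcli file for import into the Data Studio; a small sketch using the to_dcli call shown above (the filename here is only an example):
[ ]:
# Export the combined, relabeled segments for all files to a single .dcli file
new_datasegments.to_dcli(filename="all_files.dcli", session=PREDICTED_SESSION)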
This concludes the tutorial for using the DCLProject and DataSegments APIs to explore and manipulate segment objects programmatically using the Python SDK.