Apply Ground Truth to Generated Segments
In this tutorial you will learn how to apply labels from a ground truth session to segments generated by a segmentation algorithm. In your own workflow, you might run an automatic segmentation algorithm that labels every segment as Unknown, then use this workflow to update those segments with the ground truth values. This makes experimenting with new segmentation algorithms much quicker.
Try It Yourself
You can download the Notebook file here to follow along with this tutorial in your own environment.
Prerequisites
Download the Demo Project and extract it into this folder
To see the visualizations, install ipywidgets
Update to the latest SensiML Python SDK
[ ]:
!pip install ipywidgets sensiml -U
Connecting to a Data Studio project
The Python SDK can be used to interface directly with a .dclproj file in read-only mode. In this tutorial we will:
Load a .DCLPROJ file into the Python SDK using the DCLProject API
Upload a .DCLPROJ file as a new project to the server using the Client
Compare the segments in the “Training Session” and “Predicted” sessions using the DataSegments confusion_matrix API
Visualize the segments against the raw sensor data using the plot_segments API
Update the predicted segments with the closest labels from the “Ground Truth” Session
Upload the new predicted segments
1 Load a .DCLPROJ file into the Python SDK using the DCLProject API
The DCLProject API provides read-only access to a .DCLPROJ file, which is the native database format of a Data Studio project. To load the project:
[46]:
from sensiml.dclproj import DCLProject
DCLP_PROJECT_PATH = "DemoProject/DemoProject.dclproj"
dcl = DCLProject(path=DCLP_PROJECT_PATH)
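If the later calls fail, the most common cause is a wrong path, so it can help to confirm the database file exists before continuing. A minimal sketch using only the standard library:
[ ]:
import os

# The .dclproj file should be inside the extracted Demo Project folder
assert os.path.exists(DCLP_PROJECT_PATH), "Extract the Demo Project into this folder first"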
2 Upload a .DCLPROJ file using the Python SDK Client
You can use the client to upload either a .DCLPROJ or .DCLI file. The .DCLPROJ format is a database format used by the Data Studio.
[ ]:
from sensiml import Client
client = Client()
client.upload_project("Segment Demo Project", DCLP_PROJECT_PATH)
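If your dataset is already in the .DCLI format, the client can upload that instead. A minimal sketch, assuming the SDK exposes an upload_project_dcli call and that a .dcli file exists at the example path below (check the Python SDK documentation for the exact call in your version):
[ ]:
# Hypothetical: upload a .DCLI dataset instead of a .DCLPROJ database.
# Both the upload_project_dcli name and the file path are assumptions for illustration.
DCLI_PATH = "DemoProject/DemoProject.dcli"

client.upload_project_dcli("Segment Demo Project (DCLI)", DCLI_PATH)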
3 Compare the segments in the “Training Session” and “Predicted Session”
We will use the DCLProject API to get access to the segment information that is stored in the .DCLPROJ database.
To see the list of captures in the database, use the .list_captures API
[21]:
dcl.list_captures()
[21]:
| | uuid | name | file_size | number_samples | set_sample_rate | created_at | local_status | last_modified | capture_configuration | set | Device |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | adb3f19a-cf85-4b9c-b3e5-8bbf393bfb99 | File1.csv | 124043 | 3402 | 119 | 2023-02-24 00:16:27.740546 | Synced | 2023-02-24 00:30:28.177732 | Nano 33 BLE Sense IMU | train | Nano33 BLE Sense - 127.0.0.1:5555 |
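You can also load the raw sensor data for a capture; a small sketch using the get_capture call that appears again later in this tutorial:
[ ]:
# Load the raw sensor data for a capture and check how many samples it contains
capture_data = dcl.get_capture("File1.csv")
print(capture_data.shape)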
To see a list of sessions in the database, use the .list_sessions API
[22]:
dcl.list_sessions()
[22]:
| | id | name | parameters | custom | preprocess | created_at | local_status | last_modified |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Training Session | None | 1 | None | 2023-02-24 00:15:18.719975 | Synced | 2023-02-24 00:28:45.751122 |
| 1 | 2 | Segmentation | {"inputs":{"input_data":"","first_column_of_in... | 0 | {"0":{"name":"MagnitudeGxGyGz","actual_name":"... | 2023-02-24 00:15:19.560820 | Synced | 2023-02-24 00:29:02.792429 |
Select and set the variables for the GROUND_TRUTH_SESSION, PREDICTED_SESSION, and FILENAME from the .dclproj database
[28]:
GROUND_TRUTH_SESSION = "Training Session"
PREDICTED_SESSION = "Segmentation"
FILENAME = "File1.csv"
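As an optional sanity check, you can confirm that the session names you set actually exist in the project; a small sketch, assuming list_sessions returns a DataFrame with a name column as the table above suggests:
[ ]:
# Verify the selected session names exist in the .dclproj database
session_names = dcl.list_sessions()["name"].tolist()
assert GROUND_TRUTH_SESSION in session_names
assert PREDICTED_SESSION in session_names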
Load the ground truth and predicted segments as DataSegments objects using the get_capture_segments API
[29]:
gt_segs = dcl.get_capture_segments(FILENAME, GROUND_TRUTH_SESSION)
pred_segs = dcl.get_capture_segments(FILENAME, PREDICTED_SESSION)
pred_segs.head()
[29]:
| | label_value | capture | segment_id | capture_sample_sequence_start | capture_sample_sequence_end | session | uuid | segment_length |
|---|---|---|---|---|---|---|---|---|
| 0 | Gesture A | File1.csv | 0 | 305 | 365 | Segmentation | c3b730c0-15f2-4e36-ac23-45c94d35bad3 | 60 |
| 1 | Gesture A | File1.csv | 1 | 434 | 494 | Segmentation | f1c880e4-d276-4409-8983-84db8abd7072 | 60 |
| 2 | Gesture A | File1.csv | 2 | 519 | 618 | Segmentation | ff092a08-823b-4c56-b0dc-5f44776ee48d | 99 |
| 3 | Gesture A | File1.csv | 3 | 619 | 679 | Segmentation | 7a97141f-1497-4932-a1ab-93141ff2c966 | 60 |
| 4 | Gesture A | File1.csv | 4 | 754 | 816 | Segmentation | 8c94d231-4851-4575-9c60-ca29851ca4e8 | 62 |
The DataSegments object has a number of APIs that are useful for manipulating, visualizing, and comparing segments. To see a comparison between the ground truth segments and the predicted segments, use the confusion_matrix API
[30]:
pred_segs.confusion_matrix(gt_segs)
Pred Session:
Model:
----------- --------- --------- --- ------- -------
Gesture A Gesture B UNK Support Sense %
Gesture A 7 0 0 7 100.0
Gesture B 10 0 0 10 0
UNK 3 0 0 3 0
----------- --------- --------- --- ------- -------
Total 20 0 0 20
Pos_Pred(%) 35.0 0 0 Acc(%) 35.0
----------- --------- --------- --- ------- -------
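Beyond the confusion matrix, a quick way to compare the two sessions is to look at their segment counts and labels; a minimal sketch using the len support and the label_values attribute used later in this tutorial:
[ ]:
# Compare segment counts and the set of labels in each session
print(f"predicted segments: {len(pred_segs)}")
print(f"ground truth segments: {len(gt_segs)}")
print(f"predicted labels: {set(pred_segs.label_values)}")
print(f"ground truth labels: {set(gt_segs.label_values)}")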
4 Visualize the segments against the raw sensor data using the plot_segments API
The DCLProject plot_segment_labels API can be used to visualize a session's segments overlaid on the raw sensor data. You can also use the DataSegments plot_segments API to plot only the labels directly from a DataSegments object.
[10]:
from ipywidgets import widgets

# Build an interactive widget with a manual "plot" button for exploring the capture
interactive_plot = widgets.interact_manual.options(manual_name="plot")

# Total number of samples in the capture, used to bound the sliders
capture_length = dcl.get_capture(FILENAME).shape[0]

@interactive_plot(x_start=(0, capture_length), window=(1, capture_length))
def plot(x_start=0, window=capture_length):
    # Plot the predicted and ground truth sessions over the selected window
    x_max = min(x_start + window, capture_length)
    x_lim = (x_start, x_max)
    _ = dcl.plot_segment_labels(FILENAME, PREDICTED_SESSION, xlim=x_lim, figsize=(30, 4))
    _ = dcl.plot_segment_labels(FILENAME, GROUND_TRUTH_SESSION, xlim=x_lim, figsize=(30, 4))
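If you do not have ipywidgets installed, or prefer a static view, the same comparison can be drawn with two direct calls over the full capture:
[ ]:
# Non-interactive alternative: plot both sessions across the entire capture
x_lim = (0, capture_length)
_ = dcl.plot_segment_labels(FILENAME, PREDICTED_SESSION, xlim=x_lim, figsize=(30, 4))
_ = dcl.plot_segment_labels(FILENAME, GROUND_TRUTH_SESSION, xlim=x_lim, figsize=(30, 4))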
5 Update the predicted segments with the closest labels from the “Ground Truth” Session
For this project, the Ground Truth session was labeled manually and the Predicted session comes from a segmentation algorithm. To update the predicted segments with the true labels from the ground truth session, use the DataSegments nearest_labels API. This API returns a new DataSegments object with updated label_values. Use it to quickly relabel the output of a new segmentation algorithm with the correct ground truth labels without having to set them manually.
[49]:
new_segs = pred_segs.nearest_labels(gt_segs, verbose=True)
print(f"\n\nnew segment: {len(new_segs)}")
print(f"ground truth: {len(gt_segs)}")
No matching label found for 3 setting to None
Updating 8 from Gesture A to Gesture B
Updating 9 from Gesture A to Gesture B
Updating 10 from Gesture A to Gesture B
Updating 11 from Gesture A to Gesture B
Updating 12 from Gesture A to Gesture B
Updating 13 from Gesture A to Gesture B
No matching label found for 14 setting to None
Updating 15 from Gesture A to Gesture B
Updating 16 from Gesture A to Gesture B
Updating 17 from Gesture A to Gesture B
Updating 18 from Gesture A to Gesture B
No matching label found for 19 setting to None
new segment: 20
ground truth: 17
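You can inspect the relabeled segments the same way as before; the updated label_value column now shows the Gesture B assignments (using the head call shown earlier):
[ ]:
# Inspect the first few relabeled segments
new_segs.head()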
Comparing the confusion matrix of the new segments to the ground truth gives 100% sensitivity for Gesture A and Gesture B. The original predicted session had 0% for Gesture B. The Predicted session has 20 segments and the ground truth has 17; the difference shows up in the UNK row of the confusion matrix, which has 3 in the Unknown column.
[45]:
new_segs.confusion_matrix(gt_segs)
Pred Session:
Model:
----------- --------- --------- ------- --- ------- -------
Gesture A Gesture B Unknown UNK Support Sense %
Gesture A 7 0 0 0 7 100.0
Gesture B 0 10 0 0 10 100.0
Unknown 0 0 0 0 0 0
UNK 0 0 3 0 3 0
----------- --------- --------- ------- --- ------- -------
Total 7 10 3 0 20
Pos_Pred(%) 100.0 100.0 0 0 Acc(%) 85.0
----------- --------- --------- ------- --- ------- -------
Use the DataSegments plot_segments API to plot only the labels directly from a DataSegments object.
[15]:
@interactive_plot(x_start=(0, new_segs[-1].end), window=(1, new_segs[-1].end))
def plot(x_start=0, window=new_segs[-1].end):
    # Clamp the window to the capture length, then plot the updated segments
    # above the ground truth segments for comparison
    x_max = min(x_start + window, capture_length)
    x_lim = (x_start, x_max)
    new_segs.plot_segments(figsize=(30, 4), xlim=x_lim)
    gt_segs.plot_segments(figsize=(30, 4), xlim=x_lim, labels=new_segs.label_values)
6 Upload the new predicted segments
The DataSegments object's upload API can be used to upload segments directly. Use it to update the Predicted Session with the corrected label values.
NOTE: The Python SDK DCLProject API is READ-ONLY, so the .DCLPROJ database will not be updated. To update the .DCLPROJ database with the new labels, open the Data Studio and refresh the project.
[ ]:
new_segs.upload(client, 'Unknown')
Another option is to export the segments into the .dcli format and import them into the Data Studio. The .dcli file format is an open format for versioning datasets. See the documentation for more information.
[11]:
dcli = new_segs.to_dcli(filename=f'{PREDICTED_SESSION}.dcli', session='Test Upload')
writing dcli file to SensiML Session.dcli
For a project with multiple files, you can use the code below to update the segments across all of the files
[ ]:
from sensiml.dclproj import DataSegments

new_datasegments = DataSegments()

# Every capture that has segments in the ground truth session
filenames = dcl.list_capture_segments(sessions=[GROUND_TRUTH_SESSION]).capture.unique().tolist()

for filename in filenames:
    # Relabel each file's predicted segments using its ground truth segments
    gt_segs = dcl.get_capture_segments(filename, GROUND_TRUTH_SESSION)
    pred_segs = dcl.get_capture_segments(filename, PREDICTED_SESSION).nearest_labels(gt_segs, verbose=False)
    new_datasegments += pred_segs

new_datasegments.upload(client, 'Unknown')
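As with the single-file case, the combined segments could instead be exported to a .dcli file for import into the Data Studio; a small sketch using the to_dcli call shown above (the filename here is only an example):
[ ]:
# Export the combined, relabeled segments for all files to a single .dcli file
new_datasegments.to_dcli(filename="all_files.dcli", session=PREDICTED_SESSION)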
This concludes the tutorial for using the DCLProject and DataSegments APIs to explore and manipulate segment objects programmatically using the Python SDK.