DataSegments
A DataSegments object is a collection of DataSegment objects. Each DataSegment object encapsulates its raw sensor data along with metadata. The DataSegments API supports plotting, manipulating, and importing/exporting to a variety of formats, including DataFrames, Audacity labels, and DCLI files.
Getting Segments from DCLProject
You can get all the segments from the captures in any DCLProject, along with their sensor data (as long as the sensor data has been downloaded), using the get_capture_segments API.
from sensiml.dclproj import DCLProject
# Put the path to dclproj file for the project you would like to connect to
dclproj_path = r'<path-to.dclproj>'
dcl = DCLProject(path=dclproj_path)
capture = "<capture-name>"
gt_session = "<ground-truth-session-name>"
gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])
Manipulating Segments
There are a number of built-in filters that you can use to clean up the results. You may also need to build your own to post-process the dataset before computing the confusion matrix.
join_segments: Join neighboring segments that are within a delta distance of each other
filter_segments: Remove segments whose length is smaller than a min width
merge_segments: Merge overlapping or near segments with the same label
remove_overlap: Remove the overlap between two segments by shrinking the overlapping segments so they are neighboring segments
from sensiml.dclproj.visualizations import plot_segments_labels
# merge overlapping segments and then filter them so only segments
# that are greater than 4000 will be kept. Results are returned as DataFrame
merged_filtered_segments = gt_segs.merge_segments().filter_segments(min_length=4000)
data = dcl.get_capture(capture)
# plot the resulting segments against the capture file
plot_segments_labels(
    merged_filtered_segments,
    data=data,
    labels=gt_segs.label_values,
    title="Merged then filtered Segments",
)
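To make the effect of chaining a merge and a filter concrete, here is a minimal pure-Python sketch that operates on plain (start, end, label) tuples instead of DataSegments objects. The tuple representation, the helper names merge_sketch and filter_sketch, and the choice to measure length as end - start are assumptions made for illustration; this is not the library's implementation.

```python
from typing import List, Tuple

# Hypothetical stand-in for a segment: (start, end, label) in samples.
Segment = Tuple[int, int, str]


def merge_sketch(segments: List[Segment], delta: int = 1) -> List[Segment]:
    """Merge a segment into the preceding one when both share a label and
    either overlap or lie within `delta` samples of each other."""
    merged: List[Segment] = []
    for start, end, label in sorted(segments):
        if merged:
            prev_start, prev_end, prev_label = merged[-1]
            if label == prev_label and start - prev_end <= delta:
                merged[-1] = (prev_start, max(prev_end, end), label)
                continue
        merged.append((start, end, label))
    return merged


def filter_sketch(segments: List[Segment], min_length: int = 10000) -> List[Segment]:
    """Drop segments whose length (end - start) is smaller than min_length."""
    return [s for s in segments if s[1] - s[0] >= min_length]


segments = [
    (0, 3000, "A"),
    (2500, 6000, "A"),   # overlaps the first segment, same label
    (6500, 7000, "B"),   # too short to survive the filter
    (8000, 13000, "A"),
]
kept = filter_sketch(merge_sketch(segments), min_length=4000)
print(kept)
```

Note that the sketch only compares each segment with the one immediately before it in sorted order, which is enough to show the idea but may differ from how the library handles interleaved labels.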
Computing Confusion Matrix
You can compute the confusion matrix between the DataSegments from two sessions of the same capture using the confusion_matrix API.
pred_session = "<prediction-session-name>"
gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])
pred_segs = dcl.get_capture_segments([capture], sessions=[pred_session])
pred_segs.confusion_matrix(gt_segs)
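The role of the overlap threshold can be illustrated with a small pure-Python sketch. It pairs each predicted segment with the ground truth segment it overlaps most and counts the (ground truth label, predicted label) pair only when the overlap covers at least overlap_pct of the predicted segment. The tuple representation, the helper names, the "Unmatched" bucket, and the choice to normalize by the predicted segment's length are all assumptions for illustration; the library may define overlap differently.

```python
from collections import Counter
from typing import List, Tuple

Segment = Tuple[int, int, str]  # hypothetical (start, end, label)


def overlap_length(a: Segment, b: Segment) -> int:
    """Number of samples shared by two segments (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def confusion_counts(pred: List[Segment], truth: List[Segment],
                     overlap_pct: float = 0.5) -> Counter:
    """Count (ground_truth_label, predicted_label) pairs."""
    counts: Counter = Counter()
    for p in pred:
        # Ground truth segment with the largest overlap, if any exist.
        best = max(truth, key=lambda t: overlap_length(p, t), default=None)
        if best is not None and overlap_length(p, best) >= overlap_pct * (p[1] - p[0]):
            counts[(best[2], p[2])] += 1
        else:
            counts[("Unmatched", p[2])] += 1
    return counts


truth = [(0, 100, "walk"), (100, 200, "run")]
pred = [(10, 90, "walk"), (80, 180, "walk"), (300, 350, "run")]
counts = confusion_counts(pred, truth)
print(dict(counts))
```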
Uploading Segments
You can also upload the segments in a DataSegments object using the upload API. We often use this in cases where we want to modify the segment label values. In this example, we modify the predicted segments by updating their labels with the nearest overlapping label from the ground truth. Then we upload the updated label values.
gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])
pred_segs = dcl.get_capture_segments([capture], sessions=[pred_session])
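# nearest_labels updates each predicted label with the nearest overlapping ground truth label
pred_segs.nearest_labels(gt_segs)
# client is a client that is logged in and connected to the target project
pred_segs.upload(client, default_label='Unknown')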
Exporting to DataFrame
You can convert any DataSegments object into a DataFrame object using the to_dataframe API.
final_segs_df = merged_filtered_segments.to_dataframe()
Importing from DataFrame
In the previous section, we ended up with a DataFrame of segments. It is straightforward to convert that back into a DataSegments object.
from sensiml.dclproj import segment_list_to_datasegments
final_segs = segment_list_to_datasegments(final_segs_df, dcl=dcl)
Exporting to DCLI Format
You can convert any DataSegments object into a DCLI file so they can be imported into the Data Studio.
final_segs.to_dcli('final_segs.dcli')
Working with Audacity
Another tool that is often used when working with audio data is Audacity. We provide APIs that make it easy to import Audacity labels into DataSegments objects and export them back out.
Exporting to Audacity
You can export a DataSegments object to Audacity labels using the to_audacity API. The to_audacity API creates multiple files with the naming convention file_{capture_name}_session_{session_name}.txt. These can be imported into Audacity directly by going to File->Import->Labels.
gt_segs.to_audacity()
Importing Audacity labels
Audacity labels can also be loaded as a DataSegments object. The following example reloads the Audacity labels we just created from the ground truth session.
audacity_segs = audacity_to_datasegments(
    dcl,
    capture_name=capture,
    file_path="file_{capture_name}_session_{session_name}.txt".format(
        capture_name=capture, session_name=gt_session
    ),
)
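The rate parameter matters because Audacity label files store times in seconds, while segments are indexed in samples. An Audacity label track is a tab-separated text file with one label per line: start time, end time, label. The sketch below shows that conversion in both directions on plain (start, end, label) tuples. The helper names and the tuple representation are hypothetical and only illustrate the unit conversion; they are not the library's functions.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # hypothetical (start, end, label) in samples


def to_audacity_lines(segments: List[Segment], rate: int = 16000) -> List[str]:
    """Format segments as Audacity label lines: start_s<TAB>end_s<TAB>label."""
    return [
        f"{start / rate:.6f}\t{end / rate:.6f}\t{label}"
        for start, end, label in segments
    ]


def from_audacity_lines(lines: List[str], rate: int = 16000) -> List[Segment]:
    """Parse Audacity label lines back into sample-indexed segments."""
    segments = []
    for line in lines:
        start_s, end_s, label = line.rstrip("\n").split("\t")
        segments.append((round(float(start_s) * rate), round(float(end_s) * rate), label))
    return segments


segments = [(0, 16000, "speech"), (24000, 32000, "noise")]
lines = to_audacity_lines(segments, rate=16000)
print("\n".join(lines))
round_trip = from_audacity_lines(lines, rate=16000)
```

If the rate passed here does not match the sample rate of the capture, the labels will land at the wrong positions when they are converted back to samples.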
Visualize Audacity labels
After loading the DataSegments object, you can continue to use it as if its segments had come directly from the DCLProject.
data = dcl.get_capture(capture)
plot_segments_labels(audacity_segs, data=data, title='Audacity labels')
DataSegment API
DataSegments API
- sensiml.dclproj.datasegments.DataSegments.apply(self, func, **kwargs) DataFrame
Apply a function to all the segments in the DataSegments object and return a DataFrame of the resulting generated features for each datasegment.
- Parameters
func (callable) – any function object that takes a DataSegment as its first argument and kwargs as its remaining arguments
- Returns
A DataFrame of the generated features from the applied function
- Return type
DataFrame
- sensiml.dclproj.datasegments.DataSegments.confusion_matrix(self, ground_truth: dict, overlap_pct: float = 0.5) ConfusionMatrix
Generate a confusion matrix between these segments and the overlapping ground truth segments.
- Parameters
ground_truth (DataSegments) – DataSegments to use as the ground truth
overlap_pct (float, optional) – amount of overlap to consider the segments overlapping. Defaults to 0.5.
- Returns
confusion matrix object
- Return type
ConfusionMatrix
- sensiml.dclproj.datasegments.DataSegments.filter_by_metadata(self, filter: dict, exclude: bool = False)
Applies a filter to the datasegments and returns a filtered version
- Parameters
filter (dict) – dictionary of lists, where each key is the metadata name to filter on and each value is the list of metadata values to filter by
exclude (bool) – the filter is exclusive if True, otherwise the filter is inclusive
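The shape of the filter dictionary and the effect of exclude can be sketched in plain Python. Here each segment is a hypothetical dictionary with a "metadata" entry, and a segment matches when, for every key in the filter, its metadata value is one of the listed values. How the library combines multiple keys is not stated above, so requiring all keys to match is an assumption, as are the helper name and the data layout.

```python
from typing import Dict, List


def filter_by_metadata_sketch(segments: List[dict],
                              criteria: Dict[str, list],
                              exclude: bool = False) -> List[dict]:
    """Keep matching segments, or drop them when exclude is True."""

    def matches(metadata: dict) -> bool:
        # Assumption: every key in the criteria must match.
        return all(metadata.get(name) in allowed for name, allowed in criteria.items())

    return [seg for seg in segments if matches(seg["metadata"]) != exclude]


segments = [
    {"label": "walk", "metadata": {"Subject": "s01", "Device": "wrist"}},
    {"label": "run", "metadata": {"Subject": "s02", "Device": "wrist"}},
    {"label": "walk", "metadata": {"Subject": "s03", "Device": "ankle"}},
]
criteria = {"Subject": ["s01", "s02"]}
included = filter_by_metadata_sketch(segments, criteria)
excluded = filter_by_metadata_sketch(segments, criteria, exclude=True)
print(len(included), len(excluded))
```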
- sensiml.dclproj.datasegments.DataSegments.filter_segments(self, min_length: int = 10000)
Removes data segments whose length is smaller than min_length.
- sensiml.dclproj.datasegments.DataSegments.join_segments(self, delta: Optional[int] = None, inplace: bool = False)
Joins adjacent segments so that there is no empty space between segments.
- Parameters
delta (int) – Segments farther apart than this distance will not be joined. If None, all neighboring segments are joined regardless of the distance. Defaults to None.
- Returns
A DataSegments object consisting of the joined segments
- Return type
DataSegments
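As an illustration of what joining with a delta means, the sketch below closes the gap between neighboring segments by extending the earlier segment's end to the start of the next one, but only when the gap is no larger than delta. Which edge the library moves to close the gap is not stated above, so extending the earlier segment is an assumption, as are the helper name and the (start, end, label) tuple representation.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, str]  # hypothetical (start, end, label)


def join_sketch(segments: List[Segment], delta: Optional[int] = None) -> List[Segment]:
    """Close gaps between neighbors; with delta=None every gap is closed."""
    ordered = sorted(segments)
    joined: List[Segment] = []
    for i, (start, end, label) in enumerate(ordered):
        if i + 1 < len(ordered):
            gap = ordered[i + 1][0] - end
            if gap > 0 and (delta is None or gap <= delta):
                end = ordered[i + 1][0]  # assumption: extend the earlier segment
        joined.append((start, end, label))
    return joined


segments = [(0, 100, "A"), (110, 200, "B"), (500, 600, "A")]
joined_near = join_sketch(segments, delta=20)
joined_all = join_sketch(segments)
print(joined_near)
print(joined_all)
```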
- sensiml.dclproj.datasegments.DataSegments.merge_label_values(self, data_segments: dict) List
Merges the label values of two DataSegments objects.
- Parameters
data_segments (dict) – The DataSegments object to merge label values with
- Returns
The sorted union of the label values from both datasegments
- Return type
List
- sensiml.dclproj.datasegments.DataSegments.merge_segments(self, delta: int = 1, verbose=False)
Merge segments that overlap or are within delta of each other.
- Parameters
delta (int, optional) – The distance between two nonoverlapping segments where they will still be merged. Defaults to 1.
- Returns
A DataFrame consisting of the merged segments
- Return type
DataFrame
- sensiml.dclproj.datasegments.DataSegments.nearest_labels(self, ground_truth_segments, overlap_pct: float = 0.5, verbose=False, keep_default=False) dict
Computes the nearest labels in the current DataSegments object relative to a ground truth DataSegments object and updates the labels with the ground truth labels.
- Parameters
pred_segments (list) – list of segments from prediction file
ground_truth_segments (list) – list of segments from manually labeled file
- Returns
A DataSegments object with the new updated labels
- Return type
DataSegment
- sensiml.dclproj.datasegments.DataSegments.remove_overlap(self, verbose: bool = False, inplace: bool = False)
Removes the overlap between segments by setting the segment start and end of overlapping segments to the same point, halfway between the overlapping edges.
- Returns
A DataSegments object consisting of the non-overlapping segments
- Return type
DataSegments
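The midpoint rule described above can be sketched on plain (start, end, label) tuples. For each pair of neighboring segments that overlap, both the earlier segment's end and the later segment's start are moved to the point halfway between the two overlapping edges. The helper name and the tuple representation are hypothetical, integer division is used to keep sample indices whole, and the sketch assumes no segment is fully contained inside another.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # hypothetical (start, end, label)


def remove_overlap_sketch(segments: List[Segment]) -> List[Segment]:
    """Shrink overlapping neighbors so they meet at the overlap's midpoint."""
    ordered = [list(s) for s in sorted(segments)]
    for prev, curr in zip(ordered, ordered[1:]):
        if curr[0] < prev[1]:  # curr starts before prev ends
            midpoint = (curr[0] + prev[1]) // 2
            prev[1] = midpoint
            curr[0] = midpoint
    return [tuple(s) for s in ordered]


segments = [(0, 100, "A"), (80, 200, "B"), (250, 300, "A")]
result = remove_overlap_sketch(segments)
print(result)
```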
- sensiml.dclproj.datasegments.DataSegments.to_audacity(self, rate: int = 16000) List
Creates multiple files with the naming convention file_{capture_name}_session_{session_name}.txt.
These can be imported into Audacity directly by going to File->Import->Labels.
- Parameters
rate (int) – Audacity uses the actual time, not the number of samples. Set the rate to the sample rate of the captured data. Defaults to 16000.
- sensiml.dclproj.datasegments.DataSegments.to_dcli(self, filename: Optional[str] = None, session: Optional[str] = None, verification_id: Optional[str] = None, session_parameters: Optional[dict] = None) List
Creates a .dcli file describing the segment information that can be imported into the Data Capture Lab
- Parameters
filename (Optional[str], optional) – The name of the file to save to. If None, no file is created. Defaults to None.
session (Optional[str], optional) – The name of a session to use when creating the DCLI file. If None, the session from each DataSegment object is used. Defaults to None.
- Returns
DCLI formatted segments
- Return type
List
- sensiml.dclproj.datasegments.DataSegments.to_timeseries(self)
Converts the datasegments into a timeseries object used by tsfresh
- sensiml.dclproj.datasegments.DataSegments.upload(self, client, default_label: str, verbose: bool = True)
Upload the segments
- Parameters
client (client) – Client logged in and connected to the target project
default_label (str) – default label to use
verbose (bool, optional) – prints out info. Defaults to True.
DataSegments Loader API
- sensiml.dclproj.loaders.import_audacity(capture_name, file_path: str, session: str = '', rate: int = 16000)
Converts labels exported from Audacity into a DataSegments object.
- Parameters
capture_name (str) – The name of the capture file to import
file_path (str) – The file path to the Audacity label file
session (str, optional) – The session to assign the segments to. Defaults to “”.
rate (int) – Audacity uses the actual time, not the number of samples. Set the rate to the sample rate of the captured data. Defaults to 16000.
data (DataFrame) – The data associated with the audacity labels
- Returns
DataSegments
- sensiml.dclproj.loaders.import_segment_list(labels: DataFrame, session: str = '', dcl: Optional[object] = None)
Converts a DataFrame of segments into a DataSegments object
- Parameters
labels (DataFrame) – A dataframe containing the segment information
session (str, optional) – The session to assign the segments to. Defaults to “”.
dcl (DCLProject) – A DCLProject object that is connected to the DCLI file. If this is passed in, the data property of the DataSegment objects will be filled with sensor data.
- Returns
DataSegments
DataSegments Segmenter API
- sensiml.dclproj.segmentation.sliding_window(input_data: DataSegments, window_size: int, delta: int, label: str = 'Unknown') DataSegments
Applies a sliding window across every segment in input_data and returns the resulting windows as a DataSegments object.
- Parameters
input_data (DataSegments) – DataSegments to apply the sliding window to
window_size (int) – size of the sliding window
delta (int) – the distance the window slides between consecutive segments
label (str, optional) – Default label to use when creating new segments if None exists. Defaults to “Unknown”.
- Returns
DataSegments
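The interaction between window_size and delta is easiest to see on a small example. The sketch below walks a fixed-size window across each input segment in steps of delta and emits one new segment per window position, falling back to the default label when the input segment has none. The helper name, the (start, end, label) tuples, and the convention that end is exclusive are assumptions for illustration, not the library's implementation.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, Optional[str]]  # hypothetical (start, end, label)


def sliding_window_sketch(segments: List[Segment], window_size: int, delta: int,
                          label: str = "Unknown") -> List[Tuple[int, int, str]]:
    """Emit one window per step of `delta` that fits inside each segment."""
    windows = []
    for start, end, seg_label in segments:
        # Last valid window start is end - window_size (end is exclusive).
        for w_start in range(start, end - window_size + 1, delta):
            windows.append((w_start, w_start + window_size, seg_label or label))
    return windows


windows = sliding_window_sketch([(0, 10, None), (100, 104, "tap")], window_size=4, delta=3)
print(windows)
```

With delta smaller than window_size the windows overlap, and with delta equal to window_size they tile each segment without overlap.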