DataSegments

A DataSegments object is a collection of DataSegment objects. Each DataSegment object encapsulates its raw sensor data along with metadata. The DataSegments API supports plotting, manipulating, and importing/exporting to a variety of formats, including DataFrames, Audacity labels, and DCLI.

Getting Segments from DCLProject

You can get all the segments from captures in any DCLProject, along with the sensor data (as long as the sensor data has been downloaded), using the get_capture_segments API.

from sensiml.dclproj import DCLProject

# Put the path to dclproj file for the project you would like to connect to
dclproj_path = r'<path-to.dclproj>'

dcl = DCLProject(path=dclproj_path)

capture = "<capture-name>"
gt_session = "<ground-truth-session-name>"

gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])

Manipulating Segments

There are a number of built-in filters that you can use to clean up the results. You may also need to build your own to post-process the dataset before computing the confusion matrix.

  • join_segments: Join neighboring segments that are within a delta distance of each other

  • filter_segments: Remove segments whose length is smaller than a min width

  • merge_segments: Merge overlapping or nearby segments with the same label

  • remove_overlap: Remove the overlap between two segments by shrinking the overlapping segments so they are neighboring segments

from sensiml.dclproj.visualizations import plot_segments_labels

# Merge overlapping segments, then filter them so that only segments
# whose length is at least 4000 are kept.
merged_filtered_segments = gt_segs.merge_segments().filter_segments(min_length=4000)

data = dcl.get_capture(capture)

# plot the resulting segments against the capture file
plot_segments_labels(
    merged_filtered_segments,
    data=data,
    labels=gt_segs.label_values,
    title="Merged then filtered Segments",
)
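To make the effect of merging and then filtering concrete, here is a minimal pure-Python sketch that works on plain (start, end, label) tuples instead of DataSegments objects. The helper names merge_overlapping and drop_short and the tuple layout are hypothetical; they only mirror the behavior described for merge_segments and filter_segments, and the SDK's own implementation may differ.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # (start, end, label): hypothetical stand-in for a DataSegment


def merge_overlapping(segments: List[Segment], delta: int = 1) -> List[Segment]:
    """Merge same-label segments that overlap or lie within delta of each other."""
    merged: List[Segment] = []
    for label in sorted({seg[2] for seg in segments}):
        current = None
        for start, end, _ in sorted(seg for seg in segments if seg[2] == label):
            if current is not None and start - current[1] <= delta:
                # Overlapping or close enough: extend the running segment.
                current = (current[0], max(current[1], end), label)
            else:
                if current is not None:
                    merged.append(current)
                current = (start, end, label)
        if current is not None:
            merged.append(current)
    return sorted(merged)


def drop_short(segments: List[Segment], min_length: int) -> List[Segment]:
    """Keep only segments whose length (end - start) is at least min_length."""
    return [seg for seg in segments if seg[1] - seg[0] >= min_length]


segments = [
    (0, 3000, "walk"),
    (2500, 6000, "walk"),
    (6500, 7000, "run"),
    (9000, 14000, "run"),
]
cleaned = drop_short(merge_overlapping(segments), min_length=4000)
print(cleaned)  # [(0, 6000, 'walk'), (9000, 14000, 'run')]
```

The two "walk" segments overlap and become one, while the short "run" segment is too far from its neighbor to be merged and is then removed by the length filter.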

Computing Confusion Matrix

You can compute the confusion matrix between DataSegments from two sessions across the same capture using the confusion_matrix API.

pred_session = "<prediction-session-name>"

gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])
pred_segs = dcl.get_capture_segments([capture], sessions=[pred_session])

pred_segs.confusion_matrix(gt_segs)
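As an illustration of how an overlap threshold can turn two lists of segments into confusion-matrix counts, here is a rough sketch on (start, end, label) tuples. It assumes that a predicted segment is matched to the ground-truth segment it overlaps most, and that the match counts only if the shared span covers at least overlap_pct of the predicted segment. Both rules, the function name confusion_counts, and the "Unmatched" bucket are assumptions made for this example, not the SDK's documented algorithm.

```python
from collections import Counter


def shared_span(a, b):
    """Number of samples two (start, end, label) segments have in common."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def confusion_counts(pred, truth, overlap_pct=0.5, unmatched="Unmatched"):
    """Count (ground-truth label, predicted label) pairs for sufficiently overlapping segments."""
    counts = Counter()
    for p in pred:
        best = max(truth, key=lambda t: shared_span(p, t), default=None)
        length = p[1] - p[0]
        if best is not None and length > 0 and shared_span(p, best) / length >= overlap_pct:
            counts[(best[2], p[2])] += 1
        else:
            # No ground-truth segment covers enough of this prediction.
            counts[(unmatched, p[2])] += 1
    return counts


truth = [(0, 100, "A"), (100, 200, "B")]
pred = [(10, 90, "A"), (110, 190, "A"), (190, 400, "B")]
counts = confusion_counts(pred, truth)
for (true_label, pred_label), n in sorted(counts.items()):
    print(true_label, pred_label, n)
```

Here the first prediction is a correct "A", the second is an "A" predicted where the ground truth says "B", and the third overlaps the ground truth too little to be matched at all.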

Uploading Segments

You can also upload the segments in a DataSegments object using the upload API. We often use this in cases where we want to modify segment label values. In this example, we modify the predicted segments by updating the labels with the nearest overlapping label from the ground truth. Then we upload the updated label values.

gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])
pred_segs = dcl.get_capture_segments([capture], sessions=[pred_session])

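# Update each predicted label with the nearest overlapping ground-truth label
pred_segs.nearest_labels(gt_segs)

# client is a client that is logged in and connected to the target project
pred_segs.upload(client, default_label='Unknown')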
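The relabeling step can be pictured with a small standalone sketch. It assumes that each predicted segment takes the label of the ground-truth segment it overlaps most, and falls back to a default label when that overlap covers less than overlap_pct of the predicted segment. The function name relabel_from_truth and this exact matching rule are hypothetical and are not taken from the SDK.

```python
def relabel_from_truth(pred, truth, overlap_pct=0.5, default_label="Unknown"):
    """Return predicted (start, end, label) segments relabeled from the ground truth."""
    relabeled = []
    for start, end, _ in pred:
        length = end - start
        best_label, best_overlap = default_label, 0
        for t_start, t_end, t_label in truth:
            shared = max(0, min(end, t_end) - max(start, t_start))
            if shared > best_overlap:
                best_label, best_overlap = t_label, shared
        if length <= 0 or best_overlap / length < overlap_pct:
            # Not enough overlap with any ground-truth segment.
            best_label = default_label
        relabeled.append((start, end, best_label))
    return relabeled


truth = [(0, 100, "idle"), (100, 300, "drill")]
pred = [(20, 80, "drill"), (90, 250, "idle"), (290, 600, "drill")]
relabeled = relabel_from_truth(pred, truth)
print(relabeled)
# [(20, 80, 'idle'), (90, 250, 'drill'), (290, 600, 'Unknown')]
```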

Exporting to DataFrame

You can convert any DataSegments object into a DataFrame object using the to_dataframe API.

final_segs_df = merged_filtered_segments.to_dataframe()

Importing from DataFrame

In the previous section, we ended up with a DataFrame of segments. It is straightforward to convert that back into a DataSegments object.

from sensiml.dclproj import segment_list_to_datasegments

final_segs = segment_list_to_datasegments(final_segs_df, dcl=dcl)

Exporting to DCLI Format

You can convert any DataSegments object into a DCLI file so that it can be imported into the Data Studio.

final_segs.to_dcli('final_segs.dcli')

Working with Audacity

Another tool that is often used when working with audio data is Audacity. We provide APIs to make it easy to import/export Audacity labels into DataSegment objects.

Exporting to Audacity

You can export a DataSegments object to Audacity labels using the to_audacity API. The to_audacity API creates multiple files with the naming convention file_{capture_name}_session_{session_name}.txt. These can be imported into Audacity directly by going to File->Import->Labels in Audacity.

gt_segs.to_audacity()
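For reference, an Audacity label track is a plain text file with one label per line: start time, end time, and label text, separated by tabs, with times in seconds. The sketch below shows how sample-indexed segments could be converted to that layout using the sample rate. The function name to_audacity_lines and the tuple layout are hypothetical; this is not the SDK's implementation of to_audacity.

```python
def to_audacity_lines(segments, rate=16000):
    """Format (start_sample, end_sample, label) tuples as Audacity label lines."""
    return [
        f"{start / rate:.6f}\t{end / rate:.6f}\t{label}"
        for start, end, label in segments
    ]


segments = [(0, 16000, "speech"), (24000, 40000, "noise")]
lines = to_audacity_lines(segments, rate=16000)
print("\n".join(lines))
```

With a 16 kHz sample rate, sample 24000 becomes 1.5 seconds, which is why the rate must match the sample rate of the capture.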

Importing Audacity labels

Audacity labels can also be loaded as DataSegment objects. The following example reloads the Audacity labels we just created.

audacity_segs = audacity_to_datasegments(
    dcl,
    capture_name=capture,
    file_path="file_{capture_name}_session_{session_name}.txt".format(
        capture_name=capture, session_name=gt_session
    ),
)
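The reverse conversion, from Audacity's seconds back to sample indices, can be sketched as follows. The parser assumes tab-separated lines of start time, end time, and label, and skips the extra lines beginning with a backslash that Audacity writes when a label carries a frequency range. The function name parse_audacity_labels is hypothetical and is not the SDK loader.

```python
def parse_audacity_labels(text, rate=16000):
    """Parse Audacity label text into (start_sample, end_sample, label) tuples."""
    segments = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("\\"):
            continue  # skip blank lines and frequency-range lines
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # not a complete label line
        start_s, end_s, label = parts[0], parts[1], parts[2]
        segments.append((round(float(start_s) * rate), round(float(end_s) * rate), label))
    return segments


label_text = "0.000000\t1.000000\tspeech\n1.500000\t2.500000\tnoise\n"
parsed = parse_audacity_labels(label_text, rate=16000)
print(parsed)  # [(0, 16000, 'speech'), (24000, 40000, 'noise')]
```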

Visualize Audacity labels

After loading the DataSegments object, you can continue to use it as if it contained native dcl segments.

data = dcl.get_capture(capture)
plot_segments_labels(audacity_segs, data=data, title='Audacity labels')

[Figure: Audacity labels plotted against the capture data (audacity_labels.png)]

DataSegment API

DataSegments API

sensiml.dclproj.datasegments.DataSegments.apply(self, func, **kwargs) -> DataFrame

Apply a function to all the segments in the DataSegments object and return a DataFrame of the resulting generated features for each datasegment.

Parameters

func (callable) – a function object that takes a DataSegment as its first argument and kwargs as the remaining arguments

Returns

A DataFrame of the generated features from the applied function

Return type

DataFrame
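The pattern behind apply can be sketched without the SDK: call a feature function once per segment, pass the extra keyword arguments through, and collect one row of features per segment. In this sketch each segment is a plain dict and the result is a list of row dicts, which could be handed to a DataFrame constructor. The names apply_to_segments and amplitude_features and the dict layout are hypothetical.

```python
from statistics import mean


def apply_to_segments(segments, func, **kwargs):
    """Call func(segment, **kwargs) for each segment and collect one row per segment."""
    rows = []
    for seg in segments:
        row = {"capture": seg["capture"], "label": seg["label"]}
        row.update(func(seg, **kwargs))
        rows.append(row)
    return rows


def amplitude_features(segment, scale=1.0):
    """Example feature function: mean and peak amplitude of the scaled samples."""
    values = [v * scale for v in segment["data"]]
    return {"mean": mean(values), "peak": max(abs(v) for v in values)}


segments = [
    {"capture": "c1.csv", "label": "A", "data": [1, 2, 3]},
    {"capture": "c1.csv", "label": "B", "data": [-4, 0, 4]},
]
rows = apply_to_segments(segments, amplitude_features, scale=2.0)
for row in rows:
    print(row)
```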

sensiml.dclproj.datasegments.DataSegments.confusion_matrix(self, ground_truth: dict, overlap_pct: float = 0.5) -> ConfusionMatrix

Generate a confusion matrix between these segments and the overlapping ground-truth segments.

Parameters
  • ground_truth (DataSegments) – DataSegments to use as the ground truth

  • overlap_pct (float, optional) – minimum amount of overlap required for two segments to be considered overlapping. Defaults to 0.5.

Returns

confusion matrix object

Return type

ConfusionMatrix

sensiml.dclproj.datasegments.DataSegments.filter_by_metadata(self, filter: dict, exclude: bool = False)

Applies a filter to the datasegments and returns a filtered version

Parameters
  • filter (dict) – dictionary of lists, where each key is a metadata name to filter on and each value is a list of metadata values to filter by

  • exclude (bool) – the filter is exclusive if True, otherwise the filter is inclusive
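A filter of this shape can be illustrated with a short sketch. It assumes that a segment matches when, for every key in the filter, the segment's metadata value is in the corresponding list, and that exclude=True keeps the non-matching segments instead. The function name filter_by_metadata_sketch, the criteria argument name, and the dict layout are hypothetical, and the SDK may combine keys differently.

```python
def filter_by_metadata_sketch(segments, criteria, exclude=False):
    """Keep (or, with exclude=True, drop) segments whose metadata matches criteria."""

    def matches(segment):
        return all(
            segment["metadata"].get(key) in allowed
            for key, allowed in criteria.items()
        )

    return [seg for seg in segments if matches(seg) != exclude]


segments = [
    {"label": "walk", "metadata": {"subject": "s1", "device": "wrist"}},
    {"label": "run", "metadata": {"subject": "s2", "device": "wrist"}},
    {"label": "walk", "metadata": {"subject": "s3", "device": "ankle"}},
]
criteria = {"subject": ["s1", "s2"]}
kept = filter_by_metadata_sketch(segments, criteria)
dropped = filter_by_metadata_sketch(segments, criteria, exclude=True)
print([s["metadata"]["subject"] for s in kept])     # ['s1', 's2']
print([s["metadata"]["subject"] for s in dropped])  # ['s3']
```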

sensiml.dclproj.datasegments.DataSegments.filter_segments(self, min_length: int = 10000)

Removes segments whose length is smaller than min_length.

sensiml.dclproj.datasegments.DataSegments.join_segments(self, delta: Optional[int] = None, inplace: bool = False)

Joins adjacent segments so that there is no empty space between segments.

Parameters
  • dcl (DCLProject) – A DCLProject object that is connected to the DCLI file

  • delta (int) – Segments outside this range will not be joined. If None, all neighboring segments will be merged regardless of the distance. Default is None.

Returns

A DataSegments object consisting of the merged segments

Return type

DataSegments

sensiml.dclproj.datasegments.DataSegments.merge_label_values(self, data_segments: dict) -> List

Merges label values between two DataSegments objects.

Parameters

data_segments (dict) – The DataSegments object to merge label values with

Returns

The sorted union of the label values from both datasegments

Return type

List

sensiml.dclproj.datasegments.DataSegments.merge_segments(self, delta: int = 1, verbose=False)

Merge segments that overlap or are within delta of each other.

Parameters

delta (int, optional) – The distance between two nonoverlapping segments where they will still be merged. Defaults to 1.

Returns

A DataFrame consisting of the merged segments

Return type

DataFrame

sensiml.dclproj.datasegments.DataSegments.nearest_labels(self, ground_truth_segments, overlap_pct: float = 0.5, verbose=False, keep_default=False) -> dict

Computes the nearest labels in the current DataSegments object to a ground truth DataSegments object and updates the labels with the ground truth labels.

Parameters
  • pred_segments (list) – list of segments from prediction file

  • ground_truth_segments (list) – list of segments from manually labeled file

Returns

A DataSegments object with the new updated labels

Return type

DataSegments

sensiml.dclproj.datasegments.DataSegments.remove_overlap(self, verbose: bool = False, inplace: bool = False)

Removes the overlap between segments by setting the segment start and end of overlapping segments to the same point, halfway between the overlapping edges.

Parameters

dcl (DCLProject) – A DCLProject object that is connected to the DCLI file

Returns

A DataSegments object consisting of the segments with the overlap removed

Return type

DataSegments
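The midpoint rule described for remove_overlap can be sketched on (start, end, label) tuples. When a segment ends after the next one starts, both edges are moved to the point halfway between them, so the two segments become neighbors. This sketch only handles partial overlap between consecutive segments (not one segment fully containing another), and the function name split_overlaps is hypothetical.

```python
def split_overlaps(segments):
    """Shrink consecutive overlapping segments so they meet at the midpoint of the overlap."""
    ordered = [list(seg) for seg in sorted(segments)]
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev[1] > nxt[0]:
            # Move both overlapping edges to the same halfway point.
            midpoint = (prev[1] + nxt[0]) // 2
            prev[1] = midpoint
            nxt[0] = midpoint
    return [tuple(seg) for seg in ordered]


segments = [(0, 120, "A"), (100, 200, "B"), (250, 300, "A")]
no_overlap = split_overlaps(segments)
print(no_overlap)  # [(0, 110, 'A'), (110, 200, 'B'), (250, 300, 'A')]
```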

sensiml.dclproj.datasegments.DataSegments.to_audacity(self, rate: int = 16000) -> List

Creates multiple files with the naming convention file_{capture_name}_session_{session_name}.txt.

These can be imported into Audacity directly by going to File->Import->Labels.

Parameters

rate (int) – Audacity uses the actual time, not the number of samples. Set the rate to the sample rate of the captured data. Default is 16000.

sensiml.dclproj.datasegments.DataSegments.to_dcli(self, filename: Optional[str] = None, session: Optional[str] = None, verification_id: Optional[str] = None, session_parameters: Optional[dict] = None) -> List

Creates a .dcli file describing the segment information that can be imported into the Data Capture Lab.

Parameters
  • filename (Optional[str], optional) – The name of the file to save to. If None, no file is created. Defaults to None.

  • session (Optional[str], optional) – The name of a session to use when creating the DCLI file. If None, the session from the DataSegment objects is used. Defaults to None.

Returns

DCLI formatted segments

Return type

List

sensiml.dclproj.datasegments.DataSegments.to_timeseries(self)

Converts the datasegments into a timeseries object used by tsfresh

sensiml.dclproj.datasegments.DataSegments.upload(self, client, default_label: str, verbose: bool = True)

Upload the segments

Parameters
  • client (client) – Client logged in and connected to the target project

  • default_label (str) – default label to use

  • verbose (bool, optional) – prints out info. Defaults to True.

DataSegments Loader API

sensiml.dclproj.loaders.import_audacity(capture_name, file_path: str, session: str = '', rate: int = 16000)

Converts labels exported from Audacity into a datasegment object.

Parameters
  • capture_name (str) – The name of the capture file to import

  • file_path (str) – The file path to the Audacity label file

  • session (str, optional) – The session to assign the segments to. Defaults to “”.

  • rate (int) – Audacity uses the actual time, not the number of samples. Set the rate to the sample rate of the captured data. Default is 16000.

  • data (DataFrame) – The data associated with the audacity labels

Returns

DataSegments

sensiml.dclproj.loaders.import_segment_list(labels: DataFrame, session: str = '', dcl: Optional[object] = None)

Converts a DataFrame of segments into a DataSegments object

Parameters
  • labels (DataFrame) – A dataframe containing the segment information

  • session (str, optional) – The session to assign the segments to. Defaults to “”.

  • dcl (DCLProject) – A DCLProject object that is connected to the DCLI file. If this is passed in, the data property of the DataSegment objects will be filled with sensor data

Returns

DataSegments

DataSegments Segmenter API

sensiml.dclproj.segmentation.sliding_window(input_data: DataSegments, window_size: int, delta: int, label: str = 'Unknown') -> DataSegments

Applies a sliding window across every segment in input_data and returns the resulting segments

Parameters
  • input_data (DataSegments) – DataSegments to apply the sliding window to

  • window_size (int) – size of the sliding window

  • delta (int) – distance the window slides between consecutive windows

  • label (str, optional) – Default label to use when creating new segments if none exists. Defaults to “Unknown”.

Returns

DataSegments
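A sliding window over segments can be sketched as follows, on (start, end, label) tuples. The sketch assumes that windows must fit entirely inside their parent segment, that the end index is exclusive, and that each window inherits the parent label or falls back to a default when the parent has none. These rules and the function name sliding_window_sketch are assumptions for illustration, not the SDK's implementation.

```python
def sliding_window_sketch(segments, window_size, delta, default_label="Unknown"):
    """Cut each (start, end, label) segment into fixed-size windows spaced delta apart."""
    windows = []
    for start, end, label in segments:
        position = start
        while position + window_size <= end:
            windows.append((position, position + window_size, label or default_label))
            position += delta
    return windows


segments = [(0, 10, "A"), (20, 26, None)]
windows = sliding_window_sketch(segments, window_size=4, delta=3)
print(windows)
# [(0, 4, 'A'), (3, 7, 'A'), (6, 10, 'A'), (20, 24, 'Unknown')]
```

Choosing delta smaller than window_size produces overlapping windows, while delta equal to window_size tiles the segment without overlap.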