A DataSegments object is a collection of DataSegment objects. Each DataSegment object encapsulates its raw sensor data along with metadata. The DataSegments API supports plotting, manipulating, and importing from and exporting to a variety of formats, including DataFrames, Audacity labels, and DCLI.

Getting Segments from DCLProject

You can get all the segments from captures in any DCLProject, along with the sensor data (as long as the sensor data has been downloaded), using the get_capture_segments API.

from sensiml.dclproj import DCLProject

# Put the path to dclproj file for the project you would like to connect to
dclproj_path = r'<path-to.dclproj>'

dcl = DCLProject(path=dclproj_path)

capture = "<capture-name>"
gt_session = "<ground-truth-session-name>"

gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])

Manipulating Segments

There are a number of built-in filters that you can use to clean up the results. You may also need to build your own to post-process the dataset before computing the confusion matrix.

  • join_segments: Join neighboring segments that are within a delta distance of each other

  • filter_segments: Remove segments whose length is smaller than a min width

  • merge_segments: Merge overlapping or near segments with the same label

  • remove_overlap: Remove the overlap between two segments by shrinking the overlapping segments so they are neighboring segments

from sensiml.dclproj.visualizations import plot_segment_labels

# merge overlapping segments and then filter them so only segments
# that are greater than 4000 will be kept. Results are returned as DataFrame
merged_filtered_segments = gt_segs.merge_segments().filter_segments(min_length=4000)

data = dcl.get_capture(capture)

# plot the resulting segments against the capture file
plot_segment_labels(
    merged_filtered_segments,
    data=data,
    title="Merged then filtered Segments",
)
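
To make the effect of merging and then filtering more concrete, here is a minimal pure-Python sketch. It does not use the SensiML library: the (start, end, label) tuple layout, the helper names merge_overlapping and drop_short, and the choice to measure length as end minus start are all assumptions made for illustration, and the library's own merge_segments and filter_segments may handle edge cases differently.

```python
# Illustrative sketch only: segments are hypothetical (start, end, label) tuples,
# not SensiML DataSegment objects.

def merge_overlapping(segments, delta=1):
    """Merge same-label segments that overlap or lie within `delta` samples."""
    merged = []
    # Group by label first, then walk each label's segments in start order.
    for start, end, label in sorted(segments, key=lambda s: (s[2], s[0])):
        if merged and merged[-1][2] == label and start - merged[-1][1] <= delta:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return sorted(merged)


def drop_short(segments, min_length):
    """Keep only segments whose length (end - start) is at least `min_length`."""
    return [s for s in segments if s[1] - s[0] >= min_length]


segments = [(0, 3000, "walk"), (2500, 6000, "walk"), (7000, 7500, "run")]
result = drop_short(merge_overlapping(segments), min_length=4000)
print(result)  # [(0, 6000, 'walk')]
```

The two "walk" segments overlap and are merged into one segment of length 6000, which survives the filter; the short "run" segment is dropped.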

Computing Confusion Matrix

You can compute the confusion matrix between DataSegments from two sessions across the same capture using the confusion_matrix API.

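pred_session = "<prediction-session-name>"

gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])
pred_segs = dcl.get_capture_segments([capture], sessions=[pred_session])

# compare the predicted segments against the ground truth
pred_segs.confusion_matrix(gt_segs)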

Uploading Segments

You can also upload the segments in a DataSegments object using the upload API. We often use this in cases where we want to modify segment label_values. In this example, we modify the predicted segments by updating their labels with the nearest overlapping label from the ground truth, then upload the updated label values.

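gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])
pred_segs = dcl.get_capture_segments([capture], sessions=[pred_session])

# replace each predicted label with the nearest overlapping ground-truth label
pred_segs = pred_segs.nearest_labels(gt_segs)

# client must be logged in and connected to the target project
pred_segs.upload(client, default_label='Unknown')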

Exporting to DataFrame

You can convert any DataSegments object into a DataFrame object using the to_dataframe API.

final_segs_df = merged_filtered_segments.to_dataframe()

Importing from DataFrame

In the previous section, we ended up with a DataFrame of segments. It is straightforward to convert that back into a DataSegments object.

from sensiml.dclproj import segment_list_to_datasegments

final_segs = segment_list_to_datasegments(final_segs_df, dcl=dcl)

Exporting to DCLI Format

You can convert any DataSegments object into a DCLI file so they can be imported into the Data Studio.


Working with Audacity

Another tool that is often used when working with audio data is Audacity. We provide APIs to make it easy to import Audacity labels into DataSegment objects and export them back out.

Exporting to Audacity

You can export a DataSegments object to Audacity labels using the to_audacity API. The to_audacity API creates multiple files with the naming convention file_{capture_name}_session_{session_name}.txt. These can be imported into Audacity directly by going to File->Import->Labels in Audacity.
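
For context, an Audacity label file is plain text with one label per line: start time in seconds, end time in seconds, and the label text, separated by tabs. The sketch below shows how sample indices map to that format through the sample rate. It is an illustration only, using hypothetical (start, end, label) tuples and helper names; it is not the code behind to_audacity.

```python
def to_audacity_lines(segments, rate=16000):
    """Convert (start, end, label) sample-index segments to Audacity label lines."""
    return [
        f"{start / rate:.6f}\t{end / rate:.6f}\t{label}"
        for start, end, label in segments
    ]


def from_audacity_lines(lines, rate=16000):
    """Parse Audacity label lines back into (start, end, label) sample indices."""
    segments = []
    for line in lines:
        start_s, end_s, label = line.rstrip("\n").split("\t")
        start = round(float(start_s) * rate)
        end = round(float(end_s) * rate)
        segments.append((start, end, label))
    return segments


segments = [(0, 16000, "speech"), (24000, 40000, "noise")]
lines = to_audacity_lines(segments, rate=16000)
round_trip = from_audacity_lines(lines, rate=16000)
```

Passing the wrong rate shifts every label in time, which is why the rate should match the sample rate of the capture.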


Importing Audacity labels

Audacity labels can also be loaded as DataSegment objects. The following example reloads the Audacity labels we just created.

audacity_segs = audacity_to_datasegments(
    capture_name=capture, session_name=session
)

Visualize Audacity labels

After loading the DataSegments object, you can continue to use it as if it contained native DCL segments.

data = dcl.get_capture(capture)
plot_segment_labels(audacity_segs, data=data, title='Audacity labels')

DataSegment API

DataSegments API

sensiml.dclproj.datasegments.DataSegments.apply(self, func, **kwargs) -> DataFrame

Apply a function to every segment in the DataSegments object and return a DataFrame of the features generated for each DataSegment.

Parameters

func (callable) – a function object that takes a DataSegment as its first input and kwargs as the remaining inputs

Returns

A DataFrame of the generated features from the applied function

Return type

DataFrame


sensiml.dclproj.datasegments.DataSegments.confusion_matrix(self, ground_truth: dict, overlap_pct: float = 0.5) -> ConfusionMatrix

Generate a confusion matrix between these segments and the overlapping ground-truth segments.

Parameters

  • ground_truth (DataSegments) – DataSegments to use as the ground truth

  • overlap_pct (float, optional) – amount of overlap required to consider two segments overlapping. Defaults to 0.5.

Returns

confusion matrix object

Return type

ConfusionMatrix


sensiml.dclproj.datasegments.DataSegments.filter_by_metadata(self, filter: dict, exclude: bool = False)

Applies a filter to the DataSegments and returns a filtered version.

Parameters

  • filter (dict) – dictionary of lists, where each key is the metadata name to filter on and each value is the list of metadata values to filter by

  • exclude (bool) – the filter is exclusive if True, otherwise the filter is inclusive
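
As a rough illustration of the dictionary-of-lists filter shape, here is a standalone sketch. It assumes each segment is a plain dict with a "metadata" dict, that an inclusive filter keeps segments matching every key, and that an exclusive filter drops those same segments; the helper name select_by_metadata and these semantics are assumptions, not the library's documented behavior.

```python
def select_by_metadata(segments, criteria, exclude=False):
    """Keep (or, with exclude=True, drop) segments whose metadata matches `criteria`.

    `criteria` maps a metadata name to the list of accepted values.
    """
    def matches(segment):
        metadata = segment["metadata"]
        return all(metadata.get(name) in values for name, values in criteria.items())

    return [s for s in segments if matches(s) != exclude]


segments = [
    {"label": "walk", "metadata": {"Subject": "A", "Device": "wrist"}},
    {"label": "run", "metadata": {"Subject": "B", "Device": "wrist"}},
    {"label": "walk", "metadata": {"Subject": "C", "Device": "ankle"}},
]

kept = select_by_metadata(segments, {"Subject": ["A", "B"]})
dropped = select_by_metadata(segments, {"Subject": ["A", "B"]}, exclude=True)
```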

sensiml.dclproj.datasegments.DataSegments.filter_segments(self, min_length: int = 10000)

Removes segments whose length is smaller than min_length.

Parameters

min_length (int, optional) – the minimum segment length to keep. Defaults to 10000.

sensiml.dclproj.datasegments.DataSegments.join_segments(self, delta: Optional[int] = None, inplace: bool = False)

Joins adjacent segments so that there is no empty space between segments.

Parameters

  • dcl (DCLProject) – A DCLProject object that is connected to the DCLI file

  • delta (int) – Segments outside this range will not be joined. If None, all neighboring segments will be merged regardless of the distance. Default is None.


Returns

A DataSegments object consisting of the merged segments

Return type

DataSegments
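
One way to picture joining is to close the gap between neighbors by extending the earlier segment up to the start of the next one. The sketch below does exactly that on hypothetical (start, end, label) tuples. Which side the library moves to close a gap is not stated above, so treat this choice, and the helper name join_neighbors, as assumptions.

```python
def join_neighbors(segments, delta=None):
    """Extend each segment to the start of the next one when the gap is small enough.

    With delta=None every gap is closed; otherwise only gaps of at most `delta`.
    """
    ordered = sorted(segments)
    joined = []
    for i, (start, end, label) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_start = ordered[i + 1][0]
            gap = next_start - end
            if gap > 0 and (delta is None or gap <= delta):
                end = next_start
        joined.append((start, end, label))
    return joined


segments = [(0, 10, "a"), (15, 30, "b"), (100, 120, "c")]
close_gaps = join_neighbors(segments, delta=10)
all_gaps = join_neighbors(segments)
```

With delta=10 only the 5-sample gap is closed; with delta=None the 70-sample gap is closed as well.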


sensiml.dclproj.datasegments.DataSegments.merge_label_values(self, data_segments: dict) -> List

Merges label values between two DataSegments objects.

Parameters

data_segments (dict) – The DataSegments object to merge label values with

Returns

The sorted union of the label values from both DataSegments objects

Return type

List


sensiml.dclproj.datasegments.DataSegments.merge_segments(self, delta: int = 1, verbose=False)

Merge segments that overlap or are within delta of each other.

Parameters

delta (int, optional) – The distance between two nonoverlapping segments where they will still be merged. Defaults to 1.

Returns

A DataFrame consisting of the merged segments

Return type

DataFrame


sensiml.dclproj.datasegments.DataSegments.nearest_labels(self, ground_truth_segments, overlap_pct: float = 0.5, verbose=False, keep_default=False) -> dict

Finds, for each segment in the current DataSegments object, the nearest label in the ground truth DataSegments and updates the segment's label with the ground truth label.

Parameters

  • pred_segments (list) – list of segments from prediction file

  • ground_truth_segments (list) – list of segments from manually labeled file

Returns

A DataSegments object with the new updated labels

Return type

dict


sensiml.dclproj.datasegments.DataSegments.remove_overlap(self, verbose: bool = False, inplace: bool = False)

Removes the overlap between segments by setting the segment start and end of overlapping segments to the same point, halfway between the overlapping edges.

Parameters

dcl (DCLProject) – A DCLProject object that is connected to the DCLI file

Returns

A DataSegments object consisting of the merged segments

Return type

DataSegments
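
The halfway-point rule described for remove_overlap can be sketched on plain tuples. This is an illustration under stated assumptions, not the library's code: segments are hypothetical (start, end, label) tuples, the midpoint is rounded down to an integer sample, and no segment is fully contained inside another.

```python
def split_overlaps(segments):
    """Shrink overlapping neighbors so they meet halfway through the overlap."""
    result = []
    for start, end, label in sorted(segments):
        if result and start < result[-1][1]:
            prev_start, prev_end, prev_label = result[-1]
            # Both segments are trimmed to the midpoint of the overlapping region.
            midpoint = (start + prev_end) // 2
            result[-1] = (prev_start, midpoint, prev_label)
            start = midpoint
        result.append((start, end, label))
    return result


segments = [(0, 100, "a"), (80, 200, "b"), (150, 300, "c")]
neighbors = split_overlaps(segments)
print(neighbors)  # [(0, 90, 'a'), (90, 175, 'b'), (175, 300, 'c')]
```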


sensiml.dclproj.datasegments.DataSegments.to_audacity(self, rate: int = 16000) -> List

Creates multiple files with the naming convention file_{capture_name}_session_{session_name}.txt.

These can be imported into Audacity directly by going to File->Import->Labels.

Parameters

rate (int) – Audacity uses the actual time and not the number of samples. Set the rate to the sample rate of the captured data. Default is 16000.

sensiml.dclproj.datasegments.DataSegments.to_dcli(self, filename: Optional[str] = None, session: Optional[str] = None, verification_id: Optional[str] = None, session_parameters: Optional[dict] = None) -> List

Creates a .dcli file describing the segment information that can be imported into the Data Capture Lab.

Parameters

  • filename (Optional[str], optional) – The name of the file to save to. If None, no file is created. Defaults to None.

  • session (Optional[str], optional) – The name of a session to use when creating the DCLI file. If None, the session from the DataSegment objects is used. Defaults to None.

Returns

DCLI formatted segments

Return type

List



Converts the datasegments into a timeseries object used by tsfresh

sensiml.dclproj.datasegments.DataSegments.upload(self, client, default_label: str, verbose: bool = True)

Upload the segments.

Parameters

  • client (client) – Client logged in and connected to the target project

  • default_label (str) – default label to use

  • verbose (bool, optional) – prints out info. Defaults to True.

DataSegments Loader API

sensiml.dclproj.loaders.import_audacity(capture_name, file_path: str, session: str = '', rate: int = 16000)

Converts labels exported from Audacity into a DataSegments object.

Parameters

  • capture_name (str) – The name of the capture file to import

  • file_path (str) – The file path to the Audacity label file

  • session (str, optional) – The session to assign the segments to. Defaults to “”.

  • rate (int) – Audacity uses the actual time and not the number of samples. Set the rate to the sample rate of the captured data. Default is 16000.

  • data (DataFrame) – The data associated with the audacity labels



sensiml.dclproj.loaders.import_segment_list(labels: DataFrame, session: str = '', dcl: Optional[object] = None)

Converts a DataFrame of segments into a DataSegments object.

Parameters

  • labels (DataFrame) – A DataFrame containing the segment information

  • session (str, optional) – The session to assign the segments to. Defaults to “”.

  • dcl (DCLProject) – A DCLProject object that is connected to the DCLI file. If this is passed in, the data property of the DataSegment objects will be filled with sensor data.



DataSegments Segmenter API

sensiml.dclproj.segmentation.sliding_window(input_data: DataSegments, window_size: int, delta: int, label: str = 'Unknown') -> DataSegments

Returns the sliding windows generated across all DataSegment objects in input_data.

Parameters

  • input_data (DataSegments) – DataSegments to apply the sliding window to

  • window_size (int) – size of the sliding window

  • delta (int) – slide of the sliding window

  • label (str, optional) – Default label to use when creating new segments if none exists. Defaults to “Unknown”.
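
How window_size and delta interact can be seen in a small standalone sketch. It works on hypothetical (start, end, label) tuples rather than DataSegments, and it assumes that only full windows are emitted and that a segment without a label receives the default. The helper name sliding_windows is made up for this example.

```python
def sliding_windows(segments, window_size, delta, label="Unknown"):
    """Cut each segment into full windows of `window_size`, stepping by `delta`."""
    windows = []
    for start, end, seg_label in segments:
        # The last window must still fit entirely inside the segment.
        for w_start in range(start, end - window_size + 1, delta):
            windows.append((w_start, w_start + window_size, seg_label or label))
    return windows


windows = sliding_windows([(0, 10, None), (20, 26, "run")], window_size=4, delta=3)
print(windows)
# [(0, 4, 'Unknown'), (3, 7, 'Unknown'), (6, 10, 'Unknown'), (20, 24, 'run')]
```

A delta smaller than window_size produces overlapping windows, as in the first segment here. A delta equal to window_size would tile the segment without overlap.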