DataSegments

A DataSegments object is a collection of DataSegment objects. Each DataSegment object encapsulates its raw sensor data along with its metadata. The DataSegments API supports plotting, manipulating, and importing from or exporting to a variety of formats, including DataFrames, Audacity labels, and DCLI files.

Getting Segments from DCLProject

You can get all the segments from the captures in any DCLProject, along with their sensor data (as long as the sensor data has been downloaded), using the get_capture_segments API.

from sensiml.dclproj import DCLProject

dcl = DCLProject()
# Put the path to the .dclproj file for the project you would like to connect to
dclproj_path = r'<path-to.dclproj>'
dcl.create_connection(dclproj_path)

capture = "<capture-name>"
gt_session = "<ground-truth-session-name>"

gt_segs = dcl.get_capture_segments([capture], sessions=[gt_session])

Manipulating Segments

There are a number of built-in filters that you can use to clean up the results. You may need to build your own filters to post-process the dataset before computing a confusion matrix.

  • join_segments: Join neighboring segments that are within a delta distance of each other by closing the gap between them

  • filter_segments: Remove segments whose length is smaller than a minimum length

  • merge_segments: Merge overlapping or nearby segments that have the same label

  • remove_overlap: Remove the overlap between two segments by shrinking the overlapping segments so that they become adjacent
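To make the merge-then-filter pipeline concrete, here is a minimal plain-Python sketch of what merge_segments and filter_segments do conceptually. It is not the library's implementation: it assumes each segment is a (start, end, label) tuple in sample units from a single capture, and that a segment's length is end - start. The helper names merge_segments_sketch and filter_segments_sketch are hypothetical.

```python
from typing import List, Tuple

# Hypothetical segment layout: (start_sample, end_sample, label) from one capture.
Segment = Tuple[int, int, str]


def merge_segments_sketch(segments: List[Segment], delta: int = 10) -> List[Segment]:
    """Merge same-label segments that overlap or whose gap is at most delta."""
    merged: List[Segment] = []
    # Handle each label separately, walking its segments in start order.
    for label in sorted({seg[2] for seg in segments}):
        same_label = sorted(s for s in segments if s[2] == label)
        cur_start, cur_end, _ = same_label[0]
        for start, end, _ in same_label[1:]:
            if start <= cur_end + delta:
                # Overlapping or close enough: extend the current segment.
                cur_end = max(cur_end, end)
            else:
                merged.append((cur_start, cur_end, label))
                cur_start, cur_end = start, end
        merged.append((cur_start, cur_end, label))
    return sorted(merged)


def filter_segments_sketch(segments: List[Segment], min_length: int) -> List[Segment]:
    """Keep only segments whose length (end - start) is at least min_length."""
    return [s for s in segments if s[1] - s[0] >= min_length]


segs = [(0, 3000, "A"), (2500, 5000, "A"), (9000, 9500, "A"), (6000, 7000, "B")]
print(filter_segments_sketch(merge_segments_sketch(segs), min_length=4000))
# prints: [(0, 5000, 'A')]
```

Merging first matters: neither of the two overlapping "A" segments is 4000 samples long on its own, but their merged span is, so it survives the filter.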

from sensiml.dclproj.datasegments import (filter_segments, join_segments,
                                        merge_segments, remove_overlap)

# Merge overlapping segments, then filter them so that only segments at
# least 4000 samples long are kept. The results are returned as a DataFrame.
filter_merged_sg_df = filter_segments(
    merge_segments(gt_segs.to_dataframe()), min_length=4000
)

data = dcl.get_capture(capture)

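# Plot the resulting segments against the capture data.
# Note: plot_segments_labels is not imported in this snippet; import it
# from your SensiML installation before running this code.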
plot_segments_labels(
    filter_merged_sg_df,
    data=data,
    labels=gt_segs.label_values,
    title="Merged then filtered Segments",
)

Importing from DataFrame

In the previous section, we ended up with a DataFrame of segments. It is straightforward to convert that back into a DataSegments object.

from sensiml.dclproj.datasegments import segment_list_to_datasegments

final_segs = segment_list_to_datasegments(dcl, filter_merged_sg_df)

Exporting to DataFrame

You can convert any DataSegments object into a DataFrame object using the to_dataframe API.

final_segs_df = final_segs.to_dataframe()

Exporting to DCLI Format

You can convert any DataSegments object into a DCLI file so that its segments can be imported into the Data Capture Lab.

final_segs.to_dcli('final_segs.dcli')

Working with Audacity

Another tool that is often used when working with audio data is Audacity. We provide APIs that make it easy to import Audacity labels as DataSegments objects and to export DataSegments objects as Audacity labels.

Exporting to Audacity

You can export a DataSegments object to Audacity labels using the to_audacity API. The to_audacity API creates multiple files with the naming convention file_{capture_name}_session_{session_name}.txt. These can be imported into Audacity directly by going to File->Import->Labels.

gt_segs.to_audacity()
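Audacity label files are plain text with one label per line: start time, end time, and label text separated by tabs, with times in seconds. That is why to_audacity needs a rate: sample indices must be divided by the sample rate. The sketch below shows that conversion and the documented file naming convention. It is an illustration only; audacity_label_lines and audacity_file_name are hypothetical helpers, and the (start, end, label) tuple layout is an assumption rather than the DataSegment structure.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # hypothetical (start_sample, end_sample, label)


def audacity_label_lines(segments: List[Segment], rate: int = 16000) -> List[str]:
    """Convert sample-indexed segments into Audacity label-track lines."""
    lines = []
    for start, end, label in segments:
        # Audacity label times are in seconds, so divide sample indices by the rate.
        lines.append(f"{start / rate:.6f}\t{end / rate:.6f}\t{label}")
    return lines


def audacity_file_name(capture_name: str, session_name: str) -> str:
    # Follows the naming convention documented for to_audacity.
    return f"file_{capture_name}_session_{session_name}.txt"


for line in audacity_label_lines([(0, 8000, "speech"), (16000, 24000, "noise")]):
    print(line)
print(audacity_file_name("capture1", "gt"))
```

If rate does not match the capture's real sample rate, every label will be stretched or compressed in time when viewed in Audacity.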

Importing Audacity labels

Audacity labels can also be loaded as a DataSegments object. The following example reloads the Audacity labels we just created.

from sensiml.dclproj.datasegments import audacity_to_datasegments

audacity_segs = audacity_to_datasegments(
    dcl,
    capture_name=capture,
    file_path="file_{capture_name}_session_{session_name}.txt".format(
        capture_name=capture, session_name=gt_session
    ),
)
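Going the other way, importing a label file means converting seconds back into sample indices. The following is a minimal sketch of that parsing step, not the library function: parse_audacity_labels is a hypothetical helper, and it assumes the tab-separated start/end/label format with times in seconds, a fixed sample rate, and that every non-blank line has all three fields.

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # hypothetical (start_sample, end_sample, label)


def parse_audacity_labels(text: str, rate: int = 16000) -> List[Segment]:
    """Parse Audacity label text (start<TAB>end<TAB>label, times in seconds)."""
    segments: List[Segment] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split into at most three fields; everything after the second tab is the label.
        start_s, end_s, label = line.split("\t", 2)
        # Convert seconds back to sample indices at the capture's sample rate.
        segments.append((round(float(start_s) * rate), round(float(end_s) * rate), label))
    return segments


text = "0.000000\t0.500000\tspeech\n1.000000\t1.500000\tnoise\n"
print(parse_audacity_labels(text))  # prints: [(0, 8000, 'speech'), (16000, 24000, 'noise')]
```

Rounding (rather than truncating) guards against float values such as 0.49999999 seconds landing one sample early.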

Visualize Audacity labels

After loading the DataSegments object, you can use it just like segments loaded natively from the DCL.

data = dcl.get_capture(capture)
plot_segments_labels(audacity_segs, data=data, title='Audacity labels')
[Figure: Audacity labels plotted against the capture data]

DataSegment API

sensiml.dclproj.datasegments.DataSegment.plot_frequency(self, channel: str, sample_freq: int = 16000, figsize: Tuple = (30, 8), **kwargs)

Plots the signal data, the spectrogram and the MFCC spectrogram.

Parameters
  • channel (str) – the channel/column of sensor data to use

  • sample_freq (int, optional) – The sample rate of the data, in Hz. Defaults to 16000.

  • figsize (Tuple, optional) – the size of the figure that will be created. Defaults to (30, 8).

sensiml.dclproj.datasegments.DataSegment.plot_mfcc(self, channel: str, sample_freq: int = 16000, figsize: Tuple = (30, 4))

Plots the MFCC spectrogram for the signal.

Parameters
  • channel (str) – the channel/column of sensor data to use

  • sample_freq (int, optional) – The sample rate of the data, in Hz. Defaults to 16000.

  • figsize (Tuple, optional) – the size of the figure that will be created. Defaults to (30, 4).

sensiml.dclproj.datasegments.DataSegment.plot_spectrogram(self, channel: str, fft_length: int = 512, figsize: Tuple = (30, 4), **kwargs)

Plots the spectrogram for the signal.

Parameters
  • channel (str) – the channel/column of sensor data to use

  • fft_length (int, optional) – The FFT length to use when computing the spectrogram. Defaults to 512.

  • figsize (Tuple, optional) – the size of the figure that will be created. Defaults to (30, 4).
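The fft_length parameter trades frequency resolution against time resolution. plot_spectrogram does not take a sample rate, so the numbers below assume 16 kHz audio (the default used by plot_frequency and plot_mfcc) and assume each FFT frame covers exactly fft_length samples with no zero-padding. This is a common convention, but the reference above does not specify it. spectrogram_resolution is a hypothetical helper for the arithmetic only.

```python
def spectrogram_resolution(sample_freq: int = 16000, fft_length: int = 512):
    """Return (bin width in Hz, window length in seconds, number of one-sided bins)."""
    bin_width_hz = sample_freq / fft_length  # spacing between FFT frequency bins
    window_s = fft_length / sample_freq      # duration covered by one FFT frame
    n_bins = fft_length // 2 + 1             # bins from 0 Hz up to Nyquist for a real signal
    return bin_width_hz, window_s, n_bins


bin_hz, window_s, n_bins = spectrogram_resolution()
print(f"{bin_hz} Hz per bin, {window_s * 1000:.1f} ms window, {n_bins} bins")
# prints: 31.25 Hz per bin, 32.0 ms window, 257 bins
```

Doubling fft_length to 1024 halves the bin width to 15.625 Hz but doubles the window to 64 ms, so short events become smeared in time.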

DataSegments API

DataSegments is a dictionary of DataSegment objects with additional methods.
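As a rough mental model only, a dict subclass whose values carry segment metadata plus per-channel sensor data captures the shape described here. The classes below are hypothetical stand-ins, not the real DataSegment and DataSegments classes: the field names, the string keys, and the assumption that label_values is a sorted, de-duplicated list are all illustrative guesses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SegmentSketch:
    """Hypothetical stand-in for a DataSegment: segment metadata plus raw sensor data."""
    capture: str
    session: str
    label_value: str
    start: int
    end: int
    data: Dict[str, List[float]] = field(default_factory=dict)  # channel name -> samples


class SegmentsSketch(dict):
    """Hypothetical stand-in for DataSegments: a dict of segments with helper methods."""

    @property
    def label_values(self) -> List[str]:
        # Sorted, de-duplicated labels across all segments in the collection.
        return sorted({seg.label_value for seg in self.values()})


segs = SegmentsSketch()
segs["seg-1"] = SegmentSketch("capture.csv", "gt", "walk", 0, 100, {"AccelX": [0.1, 0.2]})
segs["seg-2"] = SegmentSketch("capture.csv", "gt", "run", 100, 250)
print(len(segs), segs.label_values)  # prints: 2 ['run', 'walk']
```

Because the collection is a dict, ordinary dict operations (len, iteration over values, lookup by key) work alongside the extra helpers.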

sensiml.dclproj.datasegments.DataSegments.export(self, folder: str = 'segment_export')

Exports all the segments to the specified folder as individual <UUID>.csv files. The metadata for every segment is stored in a metadata.json file.

Parameters

folder (str, optional) – The folder to export to. Defaults to “segment_export”.

sensiml.dclproj.datasegments.DataSegments.filter_segments(self, dcl, min_length: int)

Filters out segments below a minimum length.

Parameters
  • dcl (DCLProject) – A DCLProject object that is connected to the .dclproj file

  • min_length (int) – Segments below this length will be filtered out

Returns

A DataSegments object consisting of the segments that were not filtered out

Return type

DataSegments

sensiml.dclproj.datasegments.DataSegments.filter_segments_df(self, min_length: int) -> pandas.core.frame.DataFrame

Filters out segments below a minimum length.

Parameters

min_length (int) – Segments below this length will be filtered out

Returns

A DataFrame consisting of the segments that were not filtered out

Return type

DataFrame

sensiml.dclproj.datasegments.DataSegments.join_segments(self, dcl, delta: Optional[int] = None)

Joins adjacent segments so that there is no empty space between segments.

Parameters
  • dcl (DCLProject) – A DCLProject object that is connected to the .dclproj file

  • delta (int, optional) – Segments separated by a gap larger than delta will not be joined. If None, all neighboring segments are joined regardless of the distance. Defaults to None.

Returns

A DataSegments object consisting of the joined segments

Return type

DataSegments

sensiml.dclproj.datasegments.DataSegments.join_segments_df(self, delta: Optional[int] = None) -> pandas.core.frame.DataFrame

Joins adjacent segments so that there is no empty space between segments.

Parameters

delta (int, optional) – Segments separated by a gap larger than delta will not be joined. If None, all neighboring segments are joined regardless of the distance. Defaults to None.

Returns

A DataFrame consisting of the joined segments

Return type

DataFrame
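The gap-closing behavior of join_segments can be sketched as follows. This is a hypothetical illustration, not the library's algorithm: it assumes non-overlapping (start, end, label) tuples from a single capture (for example, after remove_overlap) and closes a gap by extending the earlier segment's end to the next segment's start. The real implementation may distribute the gap differently.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, str]  # hypothetical (start_sample, end_sample, label)


def join_segments_sketch(segments: List[Segment], delta: Optional[int] = None) -> List[Segment]:
    """Close gaps between neighboring segments by extending the earlier one."""
    ordered = sorted(segments)
    joined: List[Segment] = []
    for i, (start, end, label) in enumerate(ordered):
        if i + 1 < len(ordered):
            gap = ordered[i + 1][0] - end
            # Close a positive gap if it is within delta, or any gap when delta is None.
            if gap > 0 and (delta is None or gap <= delta):
                end = ordered[i + 1][0]
        joined.append((start, end, label))
    return joined


segs = [(0, 100, "A"), (150, 300, "B"), (1000, 1200, "A")]
print(join_segments_sketch(segs, delta=100))  # prints: [(0, 150, 'A'), (150, 300, 'B'), (1000, 1200, 'A')]
print(join_segments_sketch(segs))             # prints: [(0, 150, 'A'), (150, 1000, 'B'), (1000, 1200, 'A')]
```

With delta=None every gap is closed, which matches the documented behavior of leaving no unlabeled regions between segments.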

sensiml.dclproj.datasegments.DataSegments.merge_label_values(self, data_segments: dict) -> List

Merges the label values of two DataSegments objects.

Parameters

data_segments (dict) – The DataSegments object to merge label values with

Returns

The sorted union of the label values from both DataSegments objects

Return type

List
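The return value described above, a sorted union, amounts to the following set operation. merge_label_values_sketch is a hypothetical helper that works on plain lists of label strings rather than on DataSegments objects.

```python
from typing import Iterable, List


def merge_label_values_sketch(a: Iterable[str], b: Iterable[str]) -> List[str]:
    """Return the sorted union of two collections of label values."""
    return sorted(set(a) | set(b))


print(merge_label_values_sketch(["walk", "run"], ["run", "idle"]))  # prints: ['idle', 'run', 'walk']
```

A shared, sorted label list is useful when two segment sets (for example, ground truth and predictions) must be plotted or compared with consistent label ordering.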

sensiml.dclproj.datasegments.DataSegments.merge_segments(self, dcl, delta: int = 0)

Merge segments that overlap or are within delta of each other.

Parameters
  • dcl (DCLProject) – A DCLProject object that is connected to the .dclproj file

  • delta (int, optional) – The maximum distance between two non-overlapping segments at which they will still be merged. Defaults to 0.

Returns

A DataSegments object consisting of the merged segments

Return type

DataSegments

sensiml.dclproj.datasegments.DataSegments.merge_segments_df(self, delta: int = 1) -> pandas.core.frame.DataFrame

Merge segments that overlap or are within delta of each other.

Parameters

delta (int, optional) – The maximum distance between two non-overlapping segments at which they will still be merged. Defaults to 1.

Returns

A DataFrame consisting of the merged segments

Return type

DataFrame

sensiml.dclproj.datasegments.DataSegments.remove_overlap(self, dcl)

Removes the overlap between segments by setting the segment start and end of overlapping segments to the same point, halfway between the overlapping edges.

Parameters

dcl (DCLProject) – A DCLProject object that is connected to the .dclproj file

Returns

A DataSegments object consisting of the segments with their overlaps removed

Return type

DataSegments
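The halfway rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library code: remove_overlap_sketch is hypothetical, it works on (start, end, label) tuples from a single capture, and it only handles partial overlaps between consecutive segments (a segment fully contained inside another is not handled).

```python
from typing import List, Tuple

Segment = Tuple[int, int, str]  # hypothetical (start_sample, end_sample, label)


def remove_overlap_sketch(segments: List[Segment]) -> List[Segment]:
    """Split each overlap between consecutive segments at its midpoint."""
    ordered = [list(s) for s in sorted(segments)]
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt[0] < prev[1]:  # next segment starts before the previous one ends
            # Integer sample index halfway between the overlapping edges.
            midpoint = (nxt[0] + prev[1]) // 2
            prev[1] = midpoint
            nxt[0] = midpoint
    return [tuple(s) for s in ordered]


print(remove_overlap_sketch([(0, 120, "A"), (100, 300, "B")]))  # prints: [(0, 110, 'A'), (110, 300, 'B')]
```

After this step the two segments share an edge, so neither loses more than half of the contested region.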

sensiml.dclproj.datasegments.DataSegments.remove_overlap_df(self) -> pandas.core.frame.DataFrame

Removes the overlap between segments by setting the segment start and end of overlapping segments to the same point, halfway between the overlapping edges.

Returns

A DataFrame consisting of the segments with their overlaps removed

Return type

DataFrame

sensiml.dclproj.datasegments.DataSegments.to_audacity(self, rate: int = 16000) -> List

Creates multiple files with the naming convention file_{capture_name}_session_{session_name}.txt.

These can be imported into Audacity directly by going to File->Import->Labels.

Parameters

rate (int, optional) – Audacity labels use time in seconds rather than sample numbers, so set rate to the sample rate of the captured data. Defaults to 16000.

sensiml.dclproj.datasegments.DataSegments.to_dataframe(self) -> pandas.core.frame.DataFrame

Returns a DataFrame representation of the segment information.

sensiml.dclproj.datasegments.DataSegments.to_dcli(self, filename: Optional[str] = None, session: Optional[str] = None) -> List

Creates a .dcli file describing the segment information that can be imported into the Data Capture Lab.

Parameters
  • filename (Optional[str], optional) – The name of the file to save to. If None, no file is created. Defaults to None.

  • session (Optional[str], optional) – The name of a session to use when creating the DCLI file. If None, the session of each DataSegment object is used. Defaults to None.

Returns

DCLI formatted segments

Return type

List

sensiml.dclproj.datasegments.DataSegments.to_dict(self, orient: str = 'records') -> Dict

Returns a dictionary representation of the DataSegments object.

Parameters

orient (str, optional) – The output orientation. Defaults to 'records'.

DataSegments Support

sensiml.dclproj.datasegments.audacity_to_datasegments(dcl, capture_name, file_path: str, session: str = '', rate: int = 16000)

Converts labels exported from Audacity into a DataSegments object.

Parameters
  • dcl (DCLProject) – A DCLProject object that is connected to the .dclproj file

  • capture_name (str) – The name of the capture file the labels belong to

  • file_path (str) – The file path to the Audacity label file

  • session (str, optional) – The session to assign the segments to. Defaults to “”.

  • rate (int, optional) – Audacity labels use time in seconds rather than sample numbers, so set rate to the sample rate of the captured data. Defaults to 16000.

Returns

DataSegments

sensiml.dclproj.datasegments.filter_segments(segments, min_length=10000) -> pandas.core.frame.DataFrame

Filters out segments whose length is less than min_length.

sensiml.dclproj.datasegments.join_segments(segments, delta: Optional[int] = None) -> pandas.core.frame.DataFrame

If there are any gaps between two segments, this will bring them together so there are no unlabeled regions of data.

sensiml.dclproj.datasegments.merge_segments(segments: pandas.core.frame.DataFrame, delta: int = 10) -> pandas.core.frame.DataFrame

Merges data segments that are within a distance delta of each other and have the same class name.

Parameters
  • segments (DataFrame) – A DataFrame of segments

  • delta (int, optional) – The maximum distance between two non-overlapping segments at which they will still be merged. Defaults to 10.

Returns

DataFrame containing the merged segments

Return type

DataFrame

sensiml.dclproj.datasegments.remove_overlap(segments) -> pandas.core.frame.DataFrame

Removes the overlap between segments by setting the segment start and end of overlapping segments to the same point, halfway between the overlapping edges.

sensiml.dclproj.datasegments.segment_list_to_datasegments(dcl, labels: pandas.core.frame.DataFrame, session: str = '')

Converts a DataFrame of segments into a DataSegments object.

Parameters
  • dcl (DCLProject) – A DCLProject object that is connected to the .dclproj file

  • labels (DataFrame) – A DataFrame containing the segment information

  • session (str, optional) – The session to assign the segments to. Defaults to “”.

Returns

DataSegments