Guitar Note Audio Recognition

In this tutorial, we are going to build a model to classify guitar tuning notes that can run entirely on a microcontroller, using the SensiML Analytics Toolkit. You will learn how to:

  1. Collect and annotate audio data

  2. Apply signal preprocessing

  3. Train a classification algorithm

  4. Create firmware optimized for the resource budget of an edge device

What you need to get started

We will use the SensiML Analytics Toolkit to handle collecting and annotating sensor data, creating a sensor preprocessing pipeline, and generating the firmware. Before you start, sign up for SensiML Community Edition to get access to the SensiML Analytics Toolkit.

The Software

Hardware

Firmware

  • If you are using one of our supported platforms, you can find the instructions for getting the firmware in the documentation for your platform.

Collecting and Annotating Sensor Data

Building real-world Edge AI applications requires high-quality annotated data. The process can be expensive, time-consuming, and open-ended (how much data is enough?). Further, applying the correct annotations to time series data is complicated without a clear data collection protocol and tooling.

The SensiML Data Capture Lab makes it easy to collect, annotate, and explore your time-series sensor data. We will use the Data Capture Lab to collect and annotate audio data in this tutorial.

Upload Project

Download the Guitar Tuning Classification project and upload the template project to your account using the Data Capture Lab.

../_images/image20.png

This project includes some labeled audio files. You can view the audio files by opening them in Project Explorer.

../_images/image26.png

Collect Data

To collect new data, click the Switch Modes button and select Capture.

../_images/image45.png

Select a sensor configuration for your device; once the board is connected, you can immediately start collecting audio data. We are going to connect the board over USB serial, so click the connection settings to set the COM port to connect over.

../_images/dcl-connection-settings.png

Click scan and select the COM port assigned by your computer for your board.

../_images/image40.png

The Data Capture Lab will connect to the board and stream audio data.

../_images/image28.png

For this demo, we used this YouTube video https://www.youtube.com/watch?v=DlMrl3EQ1bs as the audio source. Begin playing the video through your speakers and click the Start Recording button in the Data Capture Lab to capture the audio with the microphone.

../_images/image29.png

Click the Stop Recording button to finish the recording. Review the confirmation screen and update any information, then click Save.

../_images/image3.png

Annotate Data

Go to the Project Explorer and open the newly captured file. You can add a segment by right-clicking and dragging around the area of interest. Then you can apply a label by pressing Ctrl+E or clicking the Edit Label button under the Segment Explorer. Once you have labeled the file, click Save.

Note

You can label more than one segment at a time: select multiple segments in the Segment Explorer, press Ctrl+E, and select the label.

../_images/image51.png

For more instructions on the Data Capture Lab's capabilities, see the documentation: https://sensiml.com/documentation/data-capture-lab/index.html

Building a Model

Now we are going to build a model to classify the guitar notes. To build the model, we will use the SensiML Analytics Studio. Go to https://app.sensiml.com and sign in, then open the Guitar Tuning Classification project by clicking its icon.

../_images/image34.png

Once the project opens, you will see the overview screen. Here, you can see a high-level view of this project. You can also add notes (with markdown formatting) to the project description.

../_images/image52.png

Create a Query

We will create a query to select the training data for our machine learning pipeline. Click on the Prepare Data tab on the left to start building a query.

To create a query

  1. Click the Add New Query button

  2. Set the fields to match the image below

  3. Click the Save button.

../_images/image35.png

Note

You can build the cache for this query by clicking the Build Cache button at the top. If you don't create the cache now, it will be built during pipeline execution. The cache will not change until you rebuild it, even if you change the project data. You can rebuild the cache from the Project Summary -> Query tab.

Create a Pipeline

Now we will build a pipeline that will train a machine learning model on the training data. Click on the Build Model tab on the left. Then click on the Create New Pipeline button under the Create New Pipeline card.

../_images/analytics-studio-pipeline-create1.png

For this tutorial, we will use TensorFlow to build a neural network. To do that,

  1. Click the Disable SensiML AutoML toggle

  2. Select the box for TensorFlow Lite for Microcontrollers

  3. Enter a pipeline name.

  4. Click Build

This creates a template pipeline that we can edit.

../_images/image2.png

This screen will first ask you to select a query. Select the query that you just created on the Prepare Data screen.

../_images/image44.png

Next, the Segmenter screen will open. We will select the sliding window segmentation algorithm. Set window size to 400 and set slide to 400, then click the Save button at the bottom of the screen.

../_images/image46.png
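To see what this segmentation step does, here is a minimal pure-Python sketch of sliding-window segmentation. This is illustrative only, not SensiML's implementation:

```python
def sliding_window(samples, window_size=400, slide=400):
    """Yield fixed-length segments from a stream of samples.

    With slide equal to window_size, as configured in this tutorial,
    the segments do not overlap.
    """
    for start in range(0, len(samples) - window_size + 1, slide):
        yield samples[start:start + window_size]

# 1000 samples with a 400-sample window and a 400-sample slide
# produce two full segments; the trailing 200 samples are discarded.
segments = list(sliding_window(list(range(1000))))
```

Reducing the slide below the window size would produce overlapping segments and therefore more frequent classifications, at the cost of more computation.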

The next step is to add a filter and some feature extractors to the Pipeline. Click the Advanced Pipeline Settings toggle at the top of the pipeline, which will expand out all of the steps.

../_images/image10.png

We will go ahead and remove the Strip transform that is there and replace it with an Energy Threshold Filter. To do that, click the trash icon on the Strip card.

../_images/image37.png

To add the Energy Filter function, click the + icon between the Segmenter and the Feature Generator.

../_images/image4.png

Then

  1. Select Segment Filter

  2. Click +Create

  3. Select Segment Energy Threshold Filter

  4. Set the Threshold to 275

  5. Click Save

This transform filters out segments whose energy is below the threshold, preventing classification from running when the sound is not loud enough.
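The idea can be sketched in a few lines of Python. The energy statistic here (mean absolute amplitude) and the sample values are illustrative assumptions; SensiML's filter may compute the segment energy differently:

```python
def segment_energy(segment):
    # Mean absolute amplitude as a simple loudness proxy.
    return sum(abs(s) for s in segment) / len(segment)

def passes_energy_filter(segment, threshold=275):
    # Segments at or below the threshold are dropped before feature
    # extraction, so no classification runs on near-silence.
    return segment_energy(segment) > threshold

quiet = [10, -12] * 200      # hypothetical background-noise segment
loud = [3000, -2800] * 200   # hypothetical plucked-string segment
```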

Next, click the edit icon on the Feature Generator. We want to remove all of the features here and only add the MFCC. To do that

  1. Uncheck all of the boxes

  2. Click the Clear Unselected button

../_images/image30.png
  3. Click the +Feature Generators button at the top

  4. Expand the Frequency feature generators

  5. Check the MFCC box

../_images/image39.png
  6. Click the Add button

  7. Click the Save button
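For intuition about what the MFCC feature generator computes, here is a compact pure-Python sketch of the textbook recipe (power spectrum, mel filterbank, log, DCT). The sample rate, filter count, and coefficient count are placeholder assumptions; SensiML's implementation differs in its details and is optimized for embedded targets:

```python
import math

def mfcc(frame, sample_rate=16000, n_filters=10, n_coeffs=8):
    """Illustrative MFCC computation for a single audio frame."""
    n = len(frame)
    # Power spectrum via a naive DFT (fine for a 400-sample demo frame;
    # real implementations use an FFT).
    half = n // 2 + 1
    power = []
    for k in range(half):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append((re * re + im * im) / n)
    # Mel-spaced triangular filterbank over the power spectrum.
    def hz_to_mel(f):
        return 2595 * math.log10(1 + f / 700)
    def mel_to_hz(m):
        return 700 * (10 ** (m / 2595) - 1)
    top = hz_to_mel(sample_rate / 2)
    bins = [int((n + 1) * mel_to_hz(i * top / (n_filters + 1)) / sample_rate)
            for i in range(n_filters + 2)]
    log_energies = []
    for i in range(1, n_filters + 1):
        e = 1e-10  # floor keeps the log finite for silent bands
        for k in range(bins[i - 1], bins[i]):
            e += power[k] * (k - bins[i - 1]) / max(1, bins[i] - bins[i - 1])
        for k in range(bins[i], bins[i + 1]):
            e += power[k] * (bins[i + 1] - k) / max(1, bins[i + 1] - bins[i])
        log_energies.append(math.log(e))
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    return [sum(le * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, le in enumerate(log_energies))
            for c in range(n_coeffs)]

# One 400-sample frame of a 440 Hz tone at a 16 kHz sample rate.
frame = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(400)]
coeffs = mfcc(frame)
```

Each 400-sample segment is reduced to a small vector of cepstral coefficients, which is what makes the downstream classifier cheap enough for a microcontroller.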

Next, remove the Isolation Forest Filtering step and the Feature Selector step from the pipeline by clicking their trash icons.

Your pipeline should now have the following steps.

../_images/image13.png

Click the Optimize button at the bottom of the pipeline to start training the machine learning model.

../_images/image31.png

You will see some status messages printed in the LOGS on the right side. These can be used to see where the pipeline is in the building process.

../_images/image18.png

Once the pipeline completes, a model will appear in the Results tab. You can click the explore icon to see more detailed information about the model.

Note

Intermediate pipeline steps are stored in a cache. If you change any pipeline step parameters, the pipeline will start from cached values and only steps after your changes will run.

../_images/image6.png

Explore Model

Clicking the icon opens the Model Explore tab, which shows how the model performed on the training and validation data sets. In this case, the trained model had good accuracy on the cross-fold validation; the final model, however, performed poorly. We will retrain this model with modifications to the pipeline to get better results.

../_images/image11.png

Retrain Model

Go back to the Build Model tab to train a new model. Instead of simply retraining, we will increase the amount of signal the model uses. The current pipeline uses a window of only 400 samples, a small fraction of the signal. We will create a spectrogram-like view of a longer stretch of the audio. To do that, add a Feature Cascade step between the Min-Max Scale and the Classifier steps.

../_images/image1.png

To add the Feature Cascade step to the pipeline

  1. Click the + button

  2. Select Feature Transform

../_images/image23.png
  3. Click the +Create button

  4. Select Feature Cascade

  5. Set Num Cascades to 2

  6. Set Slide to enabled

../_images/image14.png
  7. Click the Save button

Setting Num Cascades to 2 feeds 800 samples of data into the classifier; you can calculate this as Num Samples = Window Size x Num Cascades. The features from each segment window are placed into a feature bank, and feature banks are stacked together before being fed to the classifier. With Slide enabled, the feature banks act as a circular buffer: the oldest bank is dropped when a new one is added, and classification occurs on every new feature bank.
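The cascade behavior described above can be sketched as a small circular buffer of feature banks. The names and structure here are illustrative, not the SensiML API:

```python
from collections import deque

class FeatureCascade:
    """Stack the feature banks from the last num_cascades segments."""

    def __init__(self, num_cascades=2, slide=True):
        self.banks = deque(maxlen=num_cascades)  # oldest bank is dropped
        self.num_cascades = num_cascades
        self.slide = slide

    def add_bank(self, features):
        """Add one segment's features; return the stacked vector when full."""
        self.banks.append(list(features))
        if len(self.banks) < self.num_cascades:
            return None  # still buffering the first cascade
        stacked = [f for bank in self.banks for f in bank]
        if not self.slide:
            self.banks.clear()  # without slide, start the next cascade fresh
        return stacked

# With Num Cascades = 2 and a 400-sample window, each stacked vector
# covers 800 samples, and with slide enabled a classification happens
# on every new feature bank after the first.
cascade = FeatureCascade(num_cascades=2, slide=True)
```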

Now that we have made that change, the modified pipeline should look like this.

../_images/image21.png

Go ahead and rerun the model training by clicking the Optimize button. You can continue tuning the parameters until you are satisfied with the model.

Model Validation and Testing

Offline Model Testing

Next, go to the Test Model tab and select the file whose Set metadata is Test. This file was excluded from our training data by the query we created. Then click the Compute Accuracy button to see how the model performs on this capture.

../_images/image8.png

The results of the test will be displayed below. This model performs reasonably well on our test data, and it is worth running it on live data to see how it performs on the device.

Note

You can also run the model against multiple files simultaneously and see the combined confusion matrix.

Note

When we run on the device, we will add a post-processing filter that performs majority voting over N classifications. The post-processing filter will remove noise from the classification, improving the overall accuracy.

../_images/image48.png
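A majority-vote post-processing filter of the kind described in the note above can be sketched as follows. This is a hypothetical stand-alone version; the Knowledge Pack's built-in filter is configured rather than hand-written:

```python
from collections import Counter, deque

class MajorityVoteFilter:
    """Smooth classifications by voting over the last n results."""

    def __init__(self, n=6):
        self.history = deque(maxlen=n)  # sliding buffer of recent labels

    def update(self, label):
        """Record a new classification and return the current majority."""
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]

# A single spurious "A" amid a run of "E" classifications is voted out.
votes = MajorityVoteFilter(n=5)
smoothed = [votes.update(label) for label in ["E", "E", "A", "E", "E"]]
```

Larger buffers reject more noise but make the reported class lag further behind the true sound.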

Real-Time Live Testing

Before building the firmware, we can use the Data Capture Lab to see the results in real-time. In the Data Capture Lab, switch back to Capture mode and connect to your board for data streaming again. After that, click on Test Model in the upper right.

../_images/image9.png

Then select Change Knowledge Pack from the dropdown.

../_images/image5.png

Select the Knowledge Pack you just trained from the table and click the Next button.

../_images/image38.png

Choose the TestModel Session and click Next.

../_images/image7.png

After that, you can connect to the Knowledge Pack.


Now play the YouTube video, and the Data Capture Lab will run the model against the live data stream. The model's classification results are added to the graph in real time.

../_images/image16.png

If you are happy with the performance, it is time to put the model onto the device and test its performance in real-time.

Download/Flash Model Firmware

In the Analytics Studio, select your hardware platform and download the Knowledge Pack Library.

../_images/download-kp-generic.png

You can find instructions for flashing the knowledge pack to your specific device here.

Real-Time On Device Inference

To see classification results, use a terminal emulator such as Tera Term or the SensiML Open Gateway. For additional documentation, see Running a Model on Your Embedded Device.

If you have the Open Gateway installed, start it up, select the Recognition radio button, and set the connection type to Serial. Scan for the correct COM port and set the baud rate to 1000000, then connect to the device.

../_images/image32.png

Switch to Test mode, click the Upload Model JSON button and select the model.json file from the Knowledge Pack.

Set the Post Processing Buffer slider to 6 and click the Start Stream button. Then you can play the video and see the model classification from the device in real-time.

../_images/image25.png