Guitar Note Audio Recognition
In this tutorial, we will use the SensiML Analytics Toolkit to build a model that classifies guitar tuning notes and runs entirely on a microcontroller. Along the way, you will gain the knowledge needed to build your own audio recognition models. You will learn how to:
Collect and annotate audio data
Apply signal preprocessing
Train a classification algorithm
Create firmware optimized for the resource budget of an edge device
What you need to get started
We will use the SensiML Analytics Toolkit to handle collecting and annotating sensor data, creating a sensor preprocessing pipeline, and generating the firmware. Before you start, sign up for SensiML Community Edition to get access to the SensiML Analytics Toolkit.
If you are using one of our supported platforms, you can find the instructions for getting the firmware
Collecting and Annotating Sensor Data
Building real-world Edge AI applications requires high-quality annotated data. The process can be expensive, time-consuming, and open-ended (how much data is enough?). Further, applying the correct annotations to time series data is complicated without a clear data collection protocol and tooling.
The SensiML Data Capture Lab makes it easy to collect, annotate, and explore your time-series sensor data. We will use the Data Capture Lab to collect and annotate audio data in this tutorial.
Download the Guitar Tuning Classification project and import the template project to your account using the Data Capture Lab.
This project includes some labeled audio files. You can view the audio files by opening them in Project Explorer.
To collect new data, click the Switch Modes button and select Capture.
Select a sensor configuration for your device. Next, we are going to connect the board over USB Serial: click the connection settings to choose the COM port to connect over. Once connected, you can immediately start collecting audio data.
Click scan and select the COM port assigned by your computer for your board.
The Data Capture Lab will connect to the board and stream audio data.
For this demo, we used this YouTube video https://www.youtube.com/watch?v=DlMrl3EQ1bs and recorded the audio from our speakers. Begin playing the video through your speakers and click the Start Recording button in the Data Capture Lab to capture the audio data from the microphone.
Click the Stop Recording button to finish the recording. Review the confirmation screen and update any information, then click Save.
Go to the Project Explorer and open the newly captured file. You can add a Segment by right-clicking and dragging around the area of interest. Then you can apply a label by pressing Ctrl+E or clicking the edit label button under the Segment Explorer. Once you have labeled the file, click Save.
You can label more than one segment at the same time. Select multiple segments in the Segment Explorer, press Ctrl+E, and select the label.
For more instructions on the capabilities, see the Data Capture Lab documentation https://sensiml.com/documentation/data-capture-lab/index.html
Building a Model
Now we are going to build a model to classify the guitar notes. To build the model, we will use the SensiML Analytics Studio. Go to https://app.sensiml.com and sign in. Then open the Guitar Tuning Classification project by clicking on its icon.
Once the project opens, you will see the overview screen. Here, you can see a high-level view of this project. You can also add notes (with markdown formatting) to the project description.
Create a Query
We will create a query to select the training data for our machine learning pipeline. Click on the Prepare Data tab on the left to start building a query.
To create a query:
Click the Add New Query button
Set the fields to match the image below
Click the Save button.
You can build the cache for this query by clicking the build cache button at the top. If you don’t create the cache now, it will build during the pipeline creation. The cache will not change until you rebuild, even if you change the project data. You can rebuild the cache at the Project Summary -> Query Tab.
Create a Pipeline
Now we will build a pipeline that will train a machine learning model on the training data. Click on the Build Model tab on the left. Then click on the Create New Pipeline button under the Create New Pipeline card.
For this tutorial, we will use TensorFlow to build a neural network. To do that:
Click the toggle to disable SensiML AutoML
Select the box for TensorFlow Lite for Microcontrollers
Enter a pipeline name.
This creates a template pipeline that we can edit.
The first thing this screen asks is that you select a query. Select the query that you just created in the Prepare Data screen.
Next, the Segmenter screen will open. We will select the sliding window segmentation algorithm. Set window size to 400 and set slide to 400, then click the Save button at the bottom of the screen.
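To make the segmenter settings concrete, here is a minimal Python sketch of sliding window segmentation. It is an illustration only, not the SensiML implementation: the function name sliding_windows is hypothetical, and dropping a trailing partial window is an assumption made for the example. With window size and slide both set to 400, consecutive windows do not overlap.

```python
from typing import List, Sequence


def sliding_windows(samples: Sequence[int],
                    window_size: int = 400,
                    slide: int = 400) -> List[Sequence[int]]:
    """Split a 1-D signal into fixed-length windows.

    Only full windows are returned; trailing samples that do not fill
    a window are dropped (an assumption made for this sketch).
    """
    if window_size <= 0 or slide <= 0:
        raise ValueError("window_size and slide must be positive")
    last_start = len(samples) - window_size
    return [samples[start:start + window_size]
            for start in range(0, last_start + 1, slide)]


signal = list(range(1000))                 # stand-in for 1000 audio samples
windows = sliding_windows(signal)          # window size 400, slide 400
print(len(windows))                        # 2 full windows: 0-399 and 400-799

overlapping = sliding_windows(signal, window_size=400, slide=200)
print(len(overlapping))                    # 4 windows starting at 0, 200, 400, 600
```

A slide smaller than the window size produces overlapping windows, which yields more classifications per second at the cost of more computation.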
The next step is to add a filter and some feature extractors to the Pipeline. Click the Advanced Pipeline Settings toggle at the top of the pipeline, which will expand out all of the steps.
Select Segment Filter
Select Segment Energy Threshold Filter
Set the Threshold to 275
This Transform will filter out segments not above a specific threshold, preventing classification from running when the sounds are not loud enough.
Based on the ambient noise and your device sensitivity, you might need to try a different threshold level. Typically, the adopted threshold should be slightly larger than the maximum amplitude of the signal in regions outside the labeled area.
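The threshold guidance above can be sketched in a few lines of Python. This is a simplified illustration, not the SensiML filter itself: it assumes a segment passes when its peak absolute amplitude exceeds the threshold, and the helper names (peak_amplitude, passes_threshold, suggest_threshold) and the 10% margin are hypothetical. The sample values are made up for the example.

```python
from typing import List, Sequence, Tuple


def peak_amplitude(segment: Sequence[int]) -> int:
    """Largest absolute sample value in the segment (0 if it is empty)."""
    return max((abs(s) for s in segment), default=0)


def passes_threshold(segment: Sequence[int], threshold: int = 275) -> bool:
    """Assumed criterion: keep the segment only if its peak exceeds the threshold."""
    return peak_amplitude(segment) > threshold


def suggest_threshold(samples: Sequence[int],
                      labeled_regions: List[Tuple[int, int]],
                      margin: float = 1.1) -> int:
    """Suggest a threshold slightly above the loudest unlabeled sample.

    labeled_regions holds (start, end) sample indices, end exclusive.
    """
    labeled = set()
    for start, end in labeled_regions:
        labeled.update(range(start, end))
    background = [s for i, s in enumerate(samples) if i not in labeled]
    return int(peak_amplitude(background) * margin) + 1


# Made-up capture: quiet background with one labeled note at indices 3-5.
capture = [10, -20, 15, 900, -850, 700, 5, -250, 30]
threshold = suggest_threshold(capture, labeled_regions=[(3, 6)])
print(threshold)                                  # 276
print(passes_threshold(capture[3:6], threshold))  # True: the note is loud enough
print(passes_threshold(capture[6:9], threshold))  # False: background is filtered out
```

If too many quiet notes are filtered out on your device, lower the margin; if background noise triggers classifications, raise it.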
Next, click the edit icon on the Feature Generator. We want to remove all of the features here and only add the MFCC. To do that:
Uncheck all of the boxes
Click the Clear Unselected button
Expand the Frequency Feature generators
Check the MFCC box
Click the Add button
Click the Save button
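For readers who want to see what an MFCC (Mel-frequency cepstral coefficient) feature generator computes, the following is a textbook-style sketch in plain Python. It is not the SensiML implementation: the 16 kHz sample rate, the 23 mel filters, the 13 coefficients, and the Hamming taper are all assumptions chosen for illustration, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math
from typing import List, Sequence


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(window: Sequence[float]) -> List[float]:
    """Power in each non-negative frequency bin (naive DFT, Hamming taper)."""
    n = len(window)  # must be greater than 1
    tapered = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
               for i, x in enumerate(window)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(tapered))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters: int, num_bins: int,
                   sample_rate: int) -> List[List[float]]:
    """Triangular filters spaced evenly on the mel scale, 0 Hz to Nyquist."""
    nyquist = sample_rate / 2
    top = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    # Express the filter edges as fractional bin positions.
    edges = [hz / nyquist * (num_bins - 1) for hz in edges_hz]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(num_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising edge
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(window: Sequence[float], sample_rate: int = 16000,
         num_filters: int = 23, num_coeffs: int = 13) -> List[float]:
    power = power_spectrum(window)
    bank = mel_filterbank(num_filters, len(power), sample_rate)
    log_energy = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                  for row in bank]
    # A DCT-II decorrelates the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energy))
            for c in range(num_coeffs)]


def tone(freq_hz: float, n: int = 400, sample_rate: int = 16000) -> List[float]:
    """Synthetic sine wave standing in for one 400-sample audio window."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


low_e = mfcc(tone(82.41))     # low E string
high_e = mfcc(tone(329.63))   # high E string
print(len(low_e))             # 13 coefficients per window
print(low_e != high_e)        # True: the two notes give different features
```

Each 400-sample window is reduced to a short vector of coefficients, which is what the classifier sees instead of the raw audio.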
Your pipeline should now have the following steps.
Click the optimize button at the bottom of the pipeline to start training the machine learning model.
You will see some status messages printed in the LOGS on the right side. These can be used to see where the pipeline is in the building process.
Intermediate pipeline steps are stored in a cache. If you change any pipeline step parameters, the pipeline will start from cached values and only steps after your changes will run.
When training finishes, open the Model Explore tab, which has information about how the model performed on the training and validation data sets. In this case, the trained model had good accuracy in cross-fold validation, but the final model performed poorly. We will retrain this model with modifications to the pipeline to get better results.
Go back to the Build Model tab to train a new model. Instead of just retraining, we will increase the length of audio that the model uses for each classification. The current pipeline only used a window with 400 samples, a small fraction of the signal. We will create a spectrogram to look at a longer fraction of the audio signal. To do that, add a Feature Cascade step between the Min-Max Scale and the Classifier steps.
To add the Feature Cascade step to the pipeline:
Click the + button
Select Feature Transform
Click the +Create button
Select Feature Cascade
Set Num Cascades to 2
Set Slide to enabled
Click the Save button
Setting Num Cascades to 2 feeds data from 800 samples into the classifier. You can calculate this as Num Samples = Window Size x Num Cascades (400 x 2 = 800). The features from each segment window are placed into a feature bank. Feature banks are stacked together before being fed to the classifier. With Slide enabled, the feature banks act as a circular buffer: the oldest one is dropped when a new one is added, and classification occurs on every new feature bank.
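The circular buffer behavior can be sketched in Python as follows. The FeatureCascade class is hypothetical and only illustrates the idea; in particular, the behavior with Slide disabled (clearing the buffer after each classification) is an assumption made for contrast, not something taken from the SensiML documentation. The two-value feature banks are placeholders for real MFCC vectors.

```python
from collections import deque
from typing import Deque, List, Optional


class FeatureCascade:
    """Stack the most recent feature banks into one classifier input."""

    def __init__(self, num_cascades: int = 2, slide: bool = True) -> None:
        self.num_cascades = num_cascades
        self.slide = slide
        # A bounded deque drops the oldest bank when a new one arrives.
        self.banks: Deque[List[float]] = deque(maxlen=num_cascades)

    def push(self, feature_bank: List[float]) -> Optional[List[float]]:
        """Add one window's features; return a stacked vector when ready."""
        self.banks.append(feature_bank)
        if len(self.banks) < self.num_cascades:
            return None  # not enough history yet
        stacked = [value for bank in self.banks for value in bank]
        if not self.slide:
            self.banks.clear()  # assumed: start over with fresh banks
        return stacked


window_size, num_cascades = 400, 2
print(window_size * num_cascades)  # 800 samples feed each classification

banks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

sliding = FeatureCascade(num_cascades=2, slide=True)
sliding_out = [sliding.push(bank) for bank in banks]
print(sliding_out)   # [None, [1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]

stepping = FeatureCascade(num_cascades=2, slide=False)
stepping_out = [stepping.push(bank) for bank in banks]
print(stepping_out)  # [None, [1.0, 2.0, 3.0, 4.0], None]
```

With Slide enabled, every new window after the first produces a classifier input, so the classification rate stays the same while each decision sees twice as much audio.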
Now that we have made that change, the modified pipeline should look like this.
Go ahead and rerun the model training by clicking the Optimize button. You can continue tuning the parameters until you are satisfied with the model.
Model Validation and Testing
Offline Model Testing
Next, go to the Test Model tab and select the file whose Set metadata value is Test. This file was excluded from our training data by the Query we created. Then click the Compute Accuracy button to see how the model performs on this capture.
The results of the test will be displayed below. This model performs reasonably well on our test data, and it is worth running it on live data to see how it performs on the device.
You can also run it against multiple files simultaneously and see the combined confusion matrix.
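As a rough illustration of what a combined confusion matrix represents, the sketch below counts (true label, predicted label) pairs per file and then adds the counts together. The function names and the label sequences are made up for the example; they are not results from this project.

```python
from collections import Counter
from typing import Sequence


def confusion_matrix(true_labels: Sequence[str],
                     predicted_labels: Sequence[str]) -> Counter:
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(matrix: Counter) -> float:
    """Fraction of classifications where the prediction matches the truth."""
    total = sum(matrix.values())
    correct = sum(count for (truth, predicted), count in matrix.items()
                  if truth == predicted)
    return correct / total if total else 0.0


# Hypothetical per-file results (labels are illustrative only).
file_a = confusion_matrix(["E", "E", "A", "A"], ["E", "E", "A", "D"])
file_b = confusion_matrix(["D", "D", "A"], ["D", "D", "A"])

combined = file_a + file_b        # Counter addition sums matching cells
print(combined[("A", "A")])       # 2 correct "A" classifications overall
print(combined[("A", "D")])       # 1 "A" segment misclassified as "D"
print(round(accuracy(combined), 3))  # 0.857 (6 of 7 correct)
```

Looking at the off-diagonal cells shows which notes are confused with each other, which is often more useful than the single accuracy number.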
When we run on the device, we will add a post-processing filter that performs majority voting over N classifications. The post-processing filter will remove noise from the classification, improving the overall accuracy.
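A minimal sketch of majority voting over the last N classifications is shown below. The MajorityVoteFilter class is hypothetical and is not the firmware's implementation; it only illustrates how a single spurious result gets outvoted. The label stream is made up, and N is set to 3 here to keep the example short.

```python
from collections import Counter, deque
from typing import Deque, List


class MajorityVoteFilter:
    """Report the most frequent label among the last N raw classifications."""

    def __init__(self, buffer_size: int = 6) -> None:
        self.buffer: Deque[str] = deque(maxlen=buffer_size)

    def update(self, label: str) -> str:
        """Add a raw classification and return the current majority label."""
        self.buffer.append(label)
        return Counter(self.buffer).most_common(1)[0][0]


# Made-up stream: a steady "E" with one spurious "A", then a change to "D".
raw = ["E", "E", "A", "E", "E", "D", "D", "D"]

vote = MajorityVoteFilter(buffer_size=3)
smoothed: List[str] = [vote.update(label) for label in raw]
print(smoothed)  # ['E', 'E', 'E', 'E', 'E', 'E', 'D', 'D']
```

The spurious "A" never reaches the output, but the real change to "D" is reported one classification late. A larger N removes more noise and adds more delay.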
Real-Time Live Testing
Before building the firmware, we can use the Data Capture Lab to see the results in real-time. In the Data Capture Lab, switch back to Capture mode and connect to your board for data streaming again. After that, click on Test Model in the upper right.
Then select Change Knowledge Pack from the dropdown.
Select the Knowledge Pack you just trained from the table and click the Next button.
Choose the TestModel Session and click Next.
After that, you can connect to the Knowledge Pack.
Now play the YouTube video, and the Data Capture Lab will run the model against the live stream data. The model's classification results are added to the graph in real-time.
If you are happy with the performance, it is time to put the model onto the device and test its performance in real-time.
Download/Flash Model Firmware
In the Analytics Studio, select your HW platform and download the Knowledge Pack Library.
You can find instructions for flashing the knowledge pack to your specific device here.
Real-Time On Device Inference
If you have the Open Gateway installed, start it up, select the Recognition radio button, and set the connection type to Serial. Scan for the correct COM port and set the baud rate to 1000000. Then connect to the device.
Switch to Test mode, click the Upload Model JSON button and select the model.json file from the Knowledge Pack.
Set the Post Processing Buffer slider to 6 and click the Start Stream button. Then you can play the video and see the model classification from the device in real-time.