SensiML Blog: SensiML to Open-Source Analytics Studio AutoML Engine

At SensiML™ we are taking a bold step down the open-source path by announcing plans to contribute our core IP, SensiML Analytics Studio, as the foundation for a new open-source community collaboration project. Analytics Studio is our server-based AutoML engine that rapidly generates sensor-based inference models from user-supplied ML datasets and optimizes the resulting embedded code for IoT edge devices to create TinyML® models. In addition to automating and speeding up the model-building process, the AutoML capability in Analytics Studio allows users of all data science skill levels to successfully create accurate sensor inference code for their bespoke IoT device applications.


Analytics Studio has evolved over many years as a proprietary application and cloud SaaS service supporting a broad range of target endpoint devices from multiple hardware vendors. Focused on time-series sensors, SensiML’s Analytics Studio can quickly create self-standing C code suitable for a wide variety of applications.

With this announcement, we will make the core engine of Analytics Studio available under an open-source license option that users can freely download and implement for private server implementations of the core technology we offer – and will continue offering – in SensiML’s managed and supported SaaS cloud service.

Why is SensiML Open-Sourcing its Core Software?

When SensiML first introduced our open-source initiative several years back, we focused primarily on providing greater transparency for our sensor data interface protocols and resulting IoT edge inference models. After hearing customers express concerns about maintaining and supporting products containing algorithms they didn’t explicitly program, we concluded this was a necessary step towards better model transparency and explainability. Addressing the ‘black-box’ issue of AI / ML was thus our first objective in open-sourcing and resulted in our SensiML Embedded SDK and data protocols being published and made available to anyone in full source form while maintaining our development tools themselves as proprietary software.

Fast forward to today and we see additional TinyML ecosystem challenges and corresponding open-source opportunities that have inspired us to expand our open-source initiative to include our core AutoML engine Analytics Studio. We have come to believe the existence of a vibrant open-source software tools community is critical to moving the TinyML ecosystem forward and are therefore taking a leadership position to offer SensiML’s time-proven codebase as the first such foundation for collaborative open-source innovation.

According to a 2023 OSS survey conducted by The Linux Foundation, AI / Machine Learning was listed by worldwide IT managers as the most valuable open-source technology to the future of their industry.

The Linux Foundation OSS IT Survey 2023

When we consider leading open-source AI / ML projects and technologies, much of what exists consists of frameworks, libraries, and model definition formats (e.g. TensorFlow, PyTorch, Scikit-learn, OpenCV, and ONNX, to name a few) rather than complete end-to-end toolchains, much less toolchains focused on the intersecting complexity of AI / ML and embedded IoT inference code optimization. SensiML believes this is a key gap and an opportunity to help accelerate the adoption and democratization of the complex steps required of developers not steeped in data science.

Many IT managers choose open-source software over proprietary options for several common reasons. These reasons also guided our decision to make Analytics Studio open-source:

Innovation and Agility: The collaborative nature of open-source projects can accelerate innovation, as developers around the world contribute new features and functionalities. This can help open-source software and enterprises that use it stay competitive by rapidly adopting new technologies and approaches.

Avoidance of Vendor Lock-in: Open source provides an alternative to being locked into a specific vendor’s ecosystem and product roadmap. Enterprises can choose from a variety of support options and are not dependent on a single vendor for updates, improvements, or security patches.

Community and Support: Open source projects typically have active communities that offer responsive support networks, extensive documentation, and a wealth of shared knowledge. Users can leverage these communities for troubleshooting, advice, and best practices.

Quality and Security: Many open-source projects are developed in a highly transparent, peer-reviewed environment that can lead to high-quality and secure software solutions. The ability for anyone to examine and improve the code helps identify and fix security vulnerabilities quickly.

Talent Attraction and Retention: Many developers prefer working with open-source technologies. Adopting open-source software can help enterprises attract and retain talent, particularly those who value transparency and community involvement in their work.

Strategic Advantage: Finally, using open source software can provide strategic advantages, such as fostering an innovative corporate culture, aligning with industry trends towards open technologies, and participating in broader digital transformation initiatives.

Relating Open-Source Benefits to the TinyML Ecosystem

The OSS tenets above are generally well understood across open-source adopters but are also somewhat abstract. To put these benefits into context, let’s delve a bit deeper into a couple of them and examine how they relate specifically to problems faced by current TinyML adopters.

Challenge #1 – The dataset bottleneck unique to TinyML sensor applications: The use of deep learning techniques to create accurate predictive models relies on the availability of sufficient model training data to cover the sources and ranges of variance that can be expected in actual use. Such training dataset requirements can thus be quite large. Well-known extreme cases are large language models (LLMs) with trillions of model parameters, hundreds of thousands of GPU training hours, and training datasets that approach the total human text available from the internet.

TinyML models involve much smaller training datasets, but the nature of sensor-derived input data makes the dataset challenge arguably a more intractable problem than for LLMs. While LLMs are enormously large in scale, they at least benefit from a scalable data source of human language text acquired through the readily automated scraping of texts, documents, and Wiki pages off the internet. For sensor applications, there is typically no such equivalent readily scalable data source.

Imagine crawling the web for sufficient raw sensor data needed to predict large-frame DC motor failure states for specific motor loads and from location-dependent vibration sensor inputs and microphones as dictated by your actual use case requirements. You almost certainly won’t find data appropriate to the needs of a given application without resorting to devising experiments of your own.

This dataset bottleneck problem spans the majority of use cases within the TinyML realm. It demands that developers invest substantial time, effort, and cost to collect empirical data specific to their desired use case. They must do so in sufficient quantity and over a diverse enough set of conditions to effectively train the model for the full range of conditions that could be expected in actual use. In our motor example, a large multinational motor manufacturer may possess or have the means to produce enough data to develop robust models, but smaller companies and innovators lacking such resources are limited to simpler models. The result is constrained user adoption for TinyML due to the high adoption barrier of acquiring train/test data for many such applications.

How Open-Source TinyML Tools Can Help: Current active research into reducing the training dataset bottleneck shows promise and includes techniques such as transfer learning, data augmentation, synthetic data generation from simulations and GANs, semi-supervised learning, and model compression. Such methods are evolving rapidly, and effective approaches differ across the many use cases encompassed within TinyML. As an example, data augmentation for image recognition would typically involve rotations, translations, scaling, or chromatic shifts, whereas audio data would involve a completely different set of transforms for pitch, timbre, cadence, and noise superposition. Given how quickly state-of-the-art methods change and how widely the approaches differ by application, open-source community-based collaboration is critical. The open-source development model brings scale and diversity of insight to the problem in a way that simply cannot be matched by closed development teams. By opening up a common TinyML development platform for community contribution and improvement, SensiML believes the ecosystem can benefit much faster from the collective efforts to overcome the dataset bottleneck.
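To make the data augmentation idea concrete for time-series sensor data, here is a minimal sketch in plain Python. It is not taken from Analytics Studio; the function names (`augment_window`, `augment_dataset`), the parameter defaults, and the choice of three transforms (amplitude scaling, circular time shift, additive Gaussian noise) are all assumptions made for illustration. Which transforms are actually valid depends on the application, which is exactly the point made above.

```python
import math
import random

def augment_window(window, rng, noise_std=0.05,
                   scale_range=(0.9, 1.1), max_shift=5):
    """Return a perturbed copy of one single-axis sensor window (list of floats).

    All parameters are illustrative defaults, not tuned values.
    """
    n = len(window)
    # Amplitude scaling: mimics gain or mounting differences between sensors.
    scale = rng.uniform(*scale_range)
    # Circular time shift: mimics an event starting earlier or later.
    shift = rng.randint(-max_shift, max_shift)
    shifted = [window[(i - shift) % n] for i in range(n)]
    # Additive Gaussian noise: mimics sensor and environmental noise.
    return [scale * x + rng.gauss(0.0, noise_std) for x in shifted]

def augment_dataset(windows, labels, copies=3, seed=0):
    """Keep the original windows and append `copies` augmented variants of each."""
    rng = random.Random(seed)  # seeded so results are reproducible
    out_windows, out_labels = list(windows), list(labels)
    for window, label in zip(windows, labels):
        for _ in range(copies):
            out_windows.append(augment_window(window, rng))
            out_labels.append(label)  # augmentation must not change the label
    return out_windows, out_labels

# Usage: one synthetic 64-sample "vibration" window labeled "normal".
window = [math.sin(2 * math.pi * i / 32) for i in range(64)]
aug_w, aug_l = augment_dataset([window], ["normal"], copies=3)
print(len(aug_w), len(aug_l))  # 4 4
```

A sketch like this only stretches an existing dataset; it does not replace collecting real data across the conditions expected in actual use, and a transform that is harmless for one sensor (a time shift on a vibration window) can destroy the signal of interest for another.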

Challenge #2 – TinyML software tools fragmentation and lock-in: Over the past several years we’ve witnessed many of our AutoML dev tool competitors being acquired by hardware vendors seeking to lock users into their silicon offerings by creating high switching costs associated with a captive ML development tool. While that motivation is understandable from the silicon vendor’s point of view, the resulting fragmented ecosystem is far from ideal from the IoT developer’s standpoint. Want toolkit X but need to use silicon Y for other design or business reasons? With these captive solutions, users are faced with difficult choices between software tool functionality and hardware selection criteria such as datasheet specs, cost, and second-source alternatives. When the two goals conflict, the all-too-common result is that IoT developers simply push out planned ML features until ML tool maturity and feature support exist for the specific required hardware and application needs.

How Open-Source TinyML Tools Can Help: Rather than being tied to the offerings of select hardware vendors, SensiML believes that providing TinyML implementers with choice and flexibility better serves users’ needs. This flexibility can even be seen as strategic, since it preserves the value of effort invested in ML tool skills and datasets, which can then be ported across hardware and specific tool implementations. By contributing a baseline AutoML toolchain to open source, SensiML envisions the potential for a de facto open and flexible platform, in much the same way that Eclipse serves as the common IDE technology behind both many vendor-specific implementations and the one maintained by the Eclipse Foundation itself.

How does this news impact SensiML’s future business plans?

Our primary motivation for open-sourcing SensiML’s core AutoML application is to benefit from the faster pace of innovation that comes with the collaborative open-source development model. Beyond code contributions, this includes enhanced code quality, integrations with new hardware, additional pre-trained model templates, example applications, improved documentation, QA testing, and bug submissions.

In parallel, SensiML will continue to offer its existing managed cloud SaaS service plans and provide user consultation and custom engineering services for TinyML model development to customers who desire an enhanced level of support. Similar to Red Hat’s model for Linux, SensiML will continue to offer a traditional enterprise license option under a dual licensing strategy. We believe that SensiML’s support, full backing of the technology in use, complementary offerings, and cloud service management provide sufficient value to serve a significant segment of the user base, while at the same time providing a free open-source alternative for those inclined to implement the tools themselves.

I share that vision. How can I get involved?

In the coming weeks, SensiML will provide updates on the rollout of our open-source project, GitHub repo, and OSS project website planned for early this summer.

Those interested in getting involved (either as a user or a contributor) can receive updates on our project launch progress, provide feedback on which advancements they deem most important, and gain early access to the codebase before our general release date. To sign up, click the button below and submit your contact information.

We hope you find this news as exciting and potentially impactful as we do. Only through the collective interest of the developer and user community will such an open-source project for TinyML tooling grow to benefit all those involved!