Queries

In most cases, the first step in building an experiment or sandbox pipeline is to design a query. A query selects the project data you want to use to train your model. See the tutorial in Getting Started with the SensiML Python SDK for a practical introduction to queries.

Queries can be created in the Analytics Studio UI or programmatically with the create_query API.

Examples:

client.create_query('my_query', columns = ['AccelX', 'AccelY', 'AccelZ'],
                             metadata_columns = ['Subject'],
                             label_column = 'Label',
                             metadata_filter = '[Subject] IN [User001, User002]',
                             force = True)

client.pipeline.set_input_query('my_query')

Managing the query cache

# list the queries in the current project
client.list_queries()

# get a query that you have already created
q = client.get_query("<query-name>")

# check if there is a cache and how many partitions there are
print(q.cache)

# update the cache for the query from the latest information in the project
q.cache_query()

# check the status of the query
q.cache_query_status()

# stop the current query cache operation
q.cache_query_stop()

class sensiml.datamanager.queries.Queries(connection: Connection, project: Project)

Base class for a collection of queries.

build_query_list() dict

Populates the query list from the server.

create_query(name: str, columns: Optional[list[str]] = None, metadata_columns: Optional[list[str]] = None, metadata_filter: str = '', label_column: str = '') Query

Creates a query with the given input properties and inserts it on the server.

Parameters
  • name (str) – name of the query

  • columns (list[str]) – sensor columns to select

  • metadata_columns (list[str]) – metadata columns to select

  • metadata_filter (str) – specifies one or more metadata filter conditions

  • label_column (str) – label column to select

Returns

query object

get_or_create_query(name: str) Query

Calls the REST API and gets the query by name; if it doesn’t exist, inserts a new query.

Parameters

name (str) – name of the query

Returns

query object

get_queries() list[sensiml.datamanager.query.Query]

Gets all project queries from the server and creates corresponding local query objects

Returns

list[query]

get_query_by_name(name: str, raise_exception: bool = False) Query

Retrieves a query by name from the server, if it exists

Parameters

name (str) – name of the query

Returns

query object or None

get_query_by_uuid(uuid: str) Query

Retrieves a query by UUID from the server, if it exists

Parameters

uuid (str) – UUID of the query

Returns

query object or None

new_query() Query

Initializes a new query for the project, but does not assign property values or insert it into the server
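As a rough sketch of how new_query() might be combined with the Query properties and insert() documented in this section: the helper name build_and_insert_query is hypothetical, the sketch assumes the Query properties can be assigned directly, and the stand-in objects exist only so the example runs without a server connection.

```python
from types import SimpleNamespace


def build_and_insert_query(queries, name, columns, metadata_columns,
                           label_column, metadata_filter=""):
    """Create an empty query, assign its properties, then insert it.

    Hypothetical helper: `queries` is any object exposing new_query(),
    and the returned query is assumed to accept property assignment.
    """
    query = queries.new_query()  # no property values assigned yet
    query.name = name
    query.columns = columns
    query.metadata_columns = metadata_columns
    query.label_column = label_column
    query.metadata_filter = metadata_filter
    query.insert()  # sends the populated query to the server
    return query


# Stand-ins so the sketch runs offline. With the SDK, `queries` would be
# a real Queries collection rather than these fake objects.
inserted_names = []
fake_query = SimpleNamespace()
fake_query.insert = lambda: inserted_names.append(fake_query.name)
fake_queries = SimpleNamespace(new_query=lambda: fake_query)

query = build_and_insert_query(
    fake_queries, "my_query", ["AccelX", "AccelY", "AccelZ"],
    ["Subject"], "Label", "[Subject] IN [User001, User002]",
)
print(query.name, inserted_names)
```

In most cases create_query does the same work in a single call; the step-by-step form is mainly useful when properties need to be set conditionally before the insert.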

class sensiml.datamanager.query.Query(connection: Connection, project: Project)

Base class for a query.

Queries extract project data, or a subset of project data, for use in a pipeline. The query must specify which columns of data to extract and what filter conditions to apply.

cache_query(renderer=None)

Caches the current version of the query.

cache_query_status(renderer=None)

Gets the status of the current caching operation for the query.

cache_query_stop(renderer=None)

Kills the job for the currently executing query

check_query_cache_up_to_date(renderer=None)

Checks if the current cached query is up to date with the current training data.

The sensor data in a query is cached when the query is built. If the segments or metadata have changed since the sensor data was last cached, the query must be rebuilt before it can use the new data. This API returns whether or not the sensor data has changed since the last time the query was cached.

property columns: list[str]

Sensor columns to include in the query result

Note

Columns must correspond to actual project sensor columns or the reserved word ‘SequenceID’ for the original sample index.
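The rule in the note above can be checked locally before a query is created. This is a minimal sketch: find_invalid_columns is a hypothetical helper, not an SDK function, and the list of project sensor columns is assumed to be known to the caller.

```python
# Hypothetical local check, not part of the SensiML SDK.
RESERVED_COLUMNS = {"SequenceID"}


def find_invalid_columns(requested, project_sensor_columns):
    """Return requested columns that are neither project sensor columns
    nor the reserved word 'SequenceID'."""
    allowed = set(project_sensor_columns) | RESERVED_COLUMNS
    return [column for column in requested if column not in allowed]


# Assumed project columns, used here purely for illustration.
project_columns = ["AccelX", "AccelY", "AccelZ"]
print(find_invalid_columns(["AccelX", "SequenceID", "GyroX"], project_columns))
```

An empty result means every requested column satisfies the rule.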

property combine_labels: dict

Combines label values into new values to use in the query result. For example:

Label = Gesture
Label_Values = A, B, C, D, E
combine_labels = {'Group1': ['A', 'B', 'C'], 'Group2': ['D', 'E']}

The labels returned will be Group1 and Group2.
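To make the grouping concrete, the sketch below mimics the relabeling locally. It is an illustration only: the actual grouping is performed by the server, apply_combine_labels is a hypothetical name, and leaving unlisted values unchanged is an assumption rather than documented behavior.

```python
# Illustration only: mimics the relabeling that combine_labels describes.
def apply_combine_labels(labels, combine_labels):
    """Map each original label value to its group name, if it has one."""
    # Invert {'Group1': ['A', 'B']} into {'A': 'Group1', 'B': 'Group1'}.
    value_to_group = {
        value: group
        for group, values in combine_labels.items()
        for value in values
    }
    # Assumption: values not listed in any group are left unchanged.
    return [value_to_group.get(label, label) for label in labels]


combine_labels = {"Group1": ["A", "B", "C"], "Group2": ["D", "E"]}
print(apply_combine_labels(["A", "D", "C", "E", "B"], combine_labels))
```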

property created_at: datetime

Date the query was created

data(partition: int = 0) DataFrame

Calls the REST API for query execution and returns the result.

Note

Intended for previewing the query result before creating a query call object and using it in a sandbox step. The resulting DataFrame is not cached on the server, but once it is used in a sandbox it may be cached.

delete(renderer=None) Response

Calls the REST API and deletes the query object from the server.

get_feature_statistics()

Returns metadata statistics for the query.

get_statistics_summary(renderer=None)

Returns metadata statistics for the query.

initialize_from_dict(data)

Reads a json dict and populates a single query.

insert(renderer=None) Response

Calls the REST API and inserts a new query.

property label_column: str

Label column to use in the query result

Note

The column must correspond to an actual project label column.

property metadata_columns: list[str]

Metadata columns to include in the query result

Note

Columns must correspond to actual project metadata columns.

property metadata_filter: str

Filter criteria of the query

Parameters

value (str) – similar to a SQL WHERE clause, the string can contain any number of AND-concatenated expressions where square brackets surround the column name and comparison value, with the operator in between. Supported operators: >, >=, <, <=, =, !=, IN

Examples:

metadata_filter = '[Subject] > [5] AND [Subject] <= [15]'
metadata_filter = '[Gender] = [Female] AND [Activity] != [Walking]'
metadata_filter = '[Subject] IN [5, 7, 9, 11, 13, 15]'
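Filter strings like the examples above can also be assembled programmatically. The following is a minimal sketch, assuming values can be rendered with str(); build_metadata_filter is a hypothetical helper and not part of the SensiML SDK.

```python
# Hypothetical helper, not part of the SensiML SDK: builds a metadata_filter
# string in the bracketed, AND-concatenated syntax described above.
SUPPORTED_OPERATORS = {">", ">=", "<", "<=", "=", "!=", "IN"}


def build_metadata_filter(conditions):
    """Join (column, operator, value) triples into one filter string.

    For the IN operator, value should be a list or tuple of allowed values.
    """
    parts = []
    for column, operator, value in conditions:
        op = operator.upper()
        if op not in SUPPORTED_OPERATORS:
            raise ValueError(f"Unsupported operator: {operator!r}")
        if op == "IN":
            # Values are comma-separated inside a single pair of brackets.
            rendered = ", ".join(str(v) for v in value)
        else:
            rendered = str(value)
        parts.append(f"[{column}] {op} [{rendered}]")
    # Only AND-concatenation is supported between expressions.
    return " AND ".join(parts)


print(build_metadata_filter([("Subject", ">", 5), ("Subject", "<=", 15)]))
print(build_metadata_filter([("Subject", "IN", [5, 7, 9, 11, 13, 15])]))
```

The resulting string can be assigned to metadata_filter or passed as the metadata_filter argument when creating a query.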

Note

Queries do not support OR-concatenation between expressions, but often the IN operator can be used to achieve OR-like functionality on a single column. For example:

[Gesture] IN [A, M, L]

is equivalent to:

[Gesture] = [A] OR [Gesture] = [M] OR [Gesture] = [L]

property name: str

Name of the query

plot_statistics(renderer=None, **kwargs)

Generates a bar plot of the query statistics

post_feature_statistics(window_size: Optional[int] = None)

Returns metadata statistics for the query.

refresh()

Calls the REST API and repopulates the query object using its UUID.

property segmenter: int

Segmenter to use for the query

Parameters

value (int) – ID of segmenter.

size()

Returns the size of the dataframe which would result from the query.

statistics_segments(renderer=None) DataFrame

Returns metadata statistics for the query.

property summary_statistics: dict

Summary statistics of the query

update(renderer=None) Response

Calls the REST API and updates the query object on the server.