Queries

In most cases, the first step in building an experiment or sandbox pipeline is to design a query. A query selects the project data you want to use to train your model. See the tutorial in Getting Started with the SensiML Python SDK for a practical introduction to queries.

Queries can be created in the Analytics Studio UI or programmatically with the create_query API.

Examples:

client.create_query('my_query', columns=['AccelX', 'AccelY', 'AccelZ'],
                             metadata_columns=['Subject'],
                             label_column='Label',
                             metadata_filter='[Subject] IN [User001, User002]',
                             force=True)

client.pipeline.set_input_query('my_query')

Managing the query cache

# list the queries in the current project
client.list_queries()

# get a query that you have already created
q = client.get_query("<query-name>")

# check if there is a cache and how many partitions there are
print(q.cache)

# update the cache for the query from the latest information in the project
q.cache_query()

# check the status of the query
q.cache_query_status()

# stop the current query cache operation
q.cache_query_stop()
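
The calls above can be combined into a small helper that starts a cache rebuild and polls until it finishes. This is a minimal sketch and not part of the SDK: `refresh_cache` and its `is_done` argument are hypothetical. Because the format of the value returned by `cache_query_status()` is not described here, the caller supplies the predicate that interprets it.

```python
import time


def refresh_cache(query, is_done, poll_seconds=5.0, timeout=600.0):
    """Start a cache rebuild and poll until is_done(status) is true.

    query   -- object exposing cache_query() and cache_query_status()
    is_done -- hypothetical caller-supplied predicate; the status format
               is an assumption and is not defined by this helper
    """
    query.cache_query()
    deadline = time.monotonic() + timeout
    while True:
        status = query.cache_query_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("query cache did not finish in time")
        time.sleep(poll_seconds)
```

A caller would pass the object returned by `client.get_query(...)` together with a predicate suited to whatever status value the server actually reports.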

class sensiml.datamanager.queries.Queries(connection: Connection, project: Project)

Base class for a collection of queries.

build_query_list() → dict: Populates the function_list property from the server.

create_query(name: str, columns: Optional[list[str]] = None, metadata_columns: Optional[list[str]] = None, metadata_filter: str = '', label_column: str = '') → Query

Creates a query with the given input properties and inserts it on the server.

Parameters

name (str) – name of the query
columns (list[str]) – sensor columns to select
metadata_columns (list[str]) – metadata columns to select
metadata_filter (str) – specifies one or more metadata filter conditions
label_column (str) – label column to select

Returns

query object

get_or_create_query(name: str) → Query

Calls the REST API to get the query by name; if it does not exist, inserts a new query.

Parameters: name (str) – name of the query
Returns: query object

get_queries() → list[sensiml.datamanager.query.Query]

Gets all project queries from the server and creates corresponding local query objects

Returns: list[query]

get_query_by_name(name: str, raise_exception: bool = False) → Query

Retrieves a query by name from the server, if it exists

Parameters: name (str) – name of the query
Returns: query object or None

get_query_by_uuid(uuid: str) → Query

Retrieves a query by UUID from the server, if it exists

Parameters: uuid (str) – UUID of the query
Returns: query object or None

new_query() → Query: Initializes a new query for the project, but does not assign property values or insert it into the server

class sensiml.datamanager.query.Query(connection: Connection, project: Project)

Base class for a query.

Queries extract project data, or a subset of project data, for use in a pipeline. The query must specify which columns of data to extract and what filter conditions to apply.

cache_query(renderer=None): Caches the current version of the query.

cache_query_status(renderer=None): Gets the status of the query's current caching operation

cache_query_stop(renderer=None): Kills the job for the currently executing query

check_query_cache_up_to_date(renderer=None)

Checks if the current cached query is up to date with the current training data.

The sensor data in a query is cached when the query is built. If the segments or metadata have changed since the sensor data was last cached, the query must be rebuilt before it can use the new data. This API returns whether or not the sensor data has changed since the last time the query was cached.

property columns: list[str]: Sensor columns to include in the query result

Note

Columns must correspond to actual project sensor columns or the reserved word ‘SequenceID’ for the original sample index.
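
The rule in this note can be checked locally before a query is created. The sketch below is plain Python and does not call the SDK; `find_invalid_columns` is a hypothetical helper, and the list of project columns is an illustrative value that you would obtain from your own project.

```python
RESERVED_COLUMNS = {"SequenceID"}


def find_invalid_columns(requested, project_columns):
    """Return requested columns that are neither project sensor columns nor reserved."""
    allowed = set(project_columns) | RESERVED_COLUMNS
    return [column for column in requested if column not in allowed]


project_columns = ["AccelX", "AccelY", "AccelZ"]  # illustrative values
print(find_invalid_columns(["AccelX", "SequenceID", "GyroX"], project_columns))
# ['GyroX']
```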

property combine_labels: dict

Combines label values into new values to use in the query result.

Label = Gesture
Label_Values = A, B, C, D, E
combine_labels = {'Group1': ['A', 'B', 'C'], 'Group2': ['D', 'E']}

The labels returned will be Group1 and Group2.
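
To make the grouping concrete, the following sketch applies a `combine_labels` mapping to a list of label values in plain Python. It only illustrates the relabeling described above: `apply_combine_labels` is a hypothetical helper, and leaving unlisted values unchanged is an assumption rather than documented server behavior.

```python
def apply_combine_labels(labels, combine_labels):
    """Replace each label with the name of the group that contains it."""
    # Invert {group: [values]} into {value: group} for direct lookup.
    value_to_group = {
        value: group
        for group, values in combine_labels.items()
        for value in values
    }
    # Assumption: values not listed in any group are left unchanged.
    return [value_to_group.get(label, label) for label in labels]


combine_labels = {"Group1": ["A", "B", "C"], "Group2": ["D", "E"]}
print(apply_combine_labels(["A", "D", "C", "E", "B"], combine_labels))
# ['Group1', 'Group2', 'Group1', 'Group2', 'Group1']
```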

property created_at: datetime: Date the query was created

data(partition: int = 0) → DataFrame: Calls the REST API for query execution and returns the result.

Note

Intended for previewing the query result before creating a query call object and using it in a sandbox step. The resulting DataFrame is not cached on the server, but once it is used in a sandbox it may be cached.

delete(renderer=None) → Response: Calls the REST API and deletes the query object from the server.

get_feature_statistics(): Returns metadata statistics for the query.

get_statistics_summary(renderer=None): Returns metadata statistics for the query.

initialize_from_dict(data): Reads a json dict and populates a single query.

insert(renderer=None) → Response: Calls the REST API and inserts a new query.

property label_column: str: Label column to use in the query result

Note

The column must correspond to an actual project label column.

property metadata_columns: list[str]: Metadata columns to include in the query result

Note

Columns must correspond to actual project metadata columns.

property metadata_filter: str

Filter criteria of the query

Parameters: value (str) – similar to a SQL WHERE clause, the string can contain any number of AND-concatenated expressions where square brackets surround the column name and comparison value, with the operator in between. Supported operators: >, >=, <, <=, =, !=, IN

Examples:

metadata_filter = '[Subject] > [5] AND [Subject] <= [15]'
metadata_filter = '[Gender] = [Female] AND [Activity] != [Walking]'
metadata_filter = '[Subject] IN [5, 7, 9, 11, 13, 15]'

Note

Queries do not support OR-concatenation between expressions, but often the IN operator can be used to achieve OR-like functionality on a single column. For example:

[Gesture] IN [A, M, L]

is equivalent to:

[Gesture] = [A] OR [Gesture] = [M] OR [Gesture] = [L]
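
Filter strings in this format can also be assembled programmatically. The helper below is a minimal sketch in plain Python and is not part of the SDK: `build_metadata_filter` is a hypothetical name, and it only reproduces the bracket syntax, the AND-concatenation, and the operator list described above.

```python
def build_metadata_filter(conditions):
    """Join (column, operator, value) triples into a metadata_filter string."""
    supported = {">", ">=", "<", "<=", "=", "!=", "IN"}
    parts = []
    for column, operator, value in conditions:
        if operator not in supported:
            raise ValueError(f"unsupported operator: {operator!r}")
        if operator == "IN":
            # IN takes a sequence of values, rendered as a comma-separated list.
            value = ", ".join(str(v) for v in value)
        parts.append(f"[{column}] {operator} [{value}]")
    return " AND ".join(parts)


print(build_metadata_filter([("Subject", ">", 5), ("Subject", "<=", 15)]))
# [Subject] > [5] AND [Subject] <= [15]
print(build_metadata_filter([("Gesture", "IN", ["A", "M", "L"])]))
# [Gesture] IN [A, M, L]
```

The resulting string can be passed as the `metadata_filter` argument when creating a query.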

property name: str: Name of the query

plot_statistics(renderer=None, **kwargs): Generates a bar plot of the query statistics

post_feature_statistics(window_size: Optional[int] = None): Returns metadata statistics for the query.

refresh(): Calls the REST API and self populate using the uuid.

property segmenter: int

Segmenter to use for the query

Parameters: value (int) – ID of segmenter.

size(): Returns the size of the dataframe which would result from the query.

statistics_segments(renderer=None) → DataFrame: Returns metadata statistics for the query.

property summary_statistics: dict: Summary statistics of the query

update(renderer=None) → Response: Calls the REST API and updates the query object on the server.