Queries
In most cases, the first step to building an experiment or sandbox pipeline is to design a query. The query API is a powerful tool that selects the data you want to use to train your model. See the tutorial in Getting Started with the SensiML Python SDK for a practical introduction to queries.
Queries can be created in the Analytics Studio UI and also programmatically using the create query API.
Examples:
client.create_query('my_query', columns=['AccelX', 'AccelY', 'AccelZ'],
    metadata_columns=['Subject'],
    label_column='Label',
    metadata_filter='[Subject] IN [User001, User002]',
    force=True)
client.pipeline.set_input_query('my_query')
Managing the query cache
# list the queries in the current project
client.list_queries()
# get a query that you have already created
q = client.get_query("<query-name>")
# check if there is a cache and how many partitions there are
print(q.cache)
# update the cache for the query from the latest information in the project
q.cache_query()
# check the status of the query
q.cache_query_status()
# stop the current query cache operation
q.cache_query_stop()
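The cache calls above can be wrapped in a small polling helper when a script has to wait for the cache to finish building. This is a sketch only: `wait_for_query_cache` and its `is_done` predicate are hypothetical names, not part of the SDK. This page does not specify what `cache_query_status()` returns, so the caller supplies the completion test.

```python
import time
from typing import Any, Callable


def wait_for_query_cache(
    query: Any,
    is_done: Callable[[Any], bool],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> Any:
    """Start a cache build and poll until ``is_done(status)`` is true.

    ``query`` is any object exposing cache_query(), cache_query_status()
    and cache_query_stop(), as the Query class on this page does.
    ``is_done`` is a caller-supplied predicate (an assumption of this
    sketch), because the status format is not specified here.
    """
    query.cache_query()
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = query.cache_query_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            # Give up and ask the server to stop the running cache job.
            query.cache_query_stop()
            raise TimeoutError("query cache did not finish in time")
        time.sleep(poll_seconds)
```

Passing the predicate in keeps the helper independent of the status payload; inspect the value returned by `q.cache_query_status()` in your own project before deciding what counts as finished.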
- class sensiml.datamanager.queries.Queries(connection: Connection, project: Project)
Base class for a collection of queries.
- build_query_list() dict
Populates the function_list property from the server.
- create_query(name: str, columns: Optional[list[str]] = None, metadata_columns: Optional[list[str]] = None, metadata_filter: str = '', label_column: str = '') Query
Creates a query with the given input properties and inserts it on the server.
- Parameters
name (str) – name of the query
columns (list[str]) – sensor columns to select
metadata_columns (list[str]) – metadata columns to select
metadata_filter (str) – specifies one or more metadata filter conditions
label_column (str) – label column to select
- Returns
query object
- get_or_create_query(name: str) Query
Calls the REST API and gets the query by name; if it doesn’t exist, inserts a new query.
- Parameters
name (str) – name of the query
- Returns
query object
- get_queries() list[sensiml.datamanager.query.Query]
Gets all project queries from the server and creates corresponding local query objects
- Returns
list[query]
- get_query_by_name(name: str, raise_exception: bool = False) Query
Retrieves a query by name from the server, if it exists
- Parameters
name (str) – name of the query
- Returns
query object or None
- class sensiml.datamanager.query.Query(connection: Connection, project: Project)
Base class for a query.
Queries extract project data, or a subset of project data, for use in a pipeline. The query must specify which columns of data to extract and what filter conditions to apply.
- cache_query(renderer=None)
Caches the current version of the query.
- cache_query_status(renderer=None)
Gets the status of the current cache operation for the query.
- cache_query_stop(renderer=None)
Kills the job for the currently executing query
- check_query_cache_up_to_date(renderer=None)
Checks if the current cached query is up to date with the current training data.
The sensor data in a query is cached when the query is built. If the segments or metadata have changed since the sensor data was last cached, the query must be rebuilt before it can use the new data. This API returns whether or not the sensor data has changed since the last time the query was cached.
- property columns: list[str]
Sensor columns to include in the query result
Note
Columns must correspond to actual project sensor columns or the reserved word ‘SequenceID’ for the original sample index.
- property combine_labels: dict
Combines label values into new values to use in the query result.
For example, given Label = Gesture with Label_Values = A, B, C, D, E:
combine_labels = {'Group1': ['A', 'B', 'C'], 'Group2': ['D', 'E']}
The labels returned will be Group1 and Group2.
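The effect of a combine_labels mapping can be previewed locally with a few lines of plain Python. This sketch only illustrates the intended relabeling; `apply_combine_labels` is a hypothetical helper, not an SDK function. Leaving unlisted label values unchanged is an assumption of the sketch, not documented server behavior.

```python
from typing import Dict, List


def apply_combine_labels(
    labels: List[str], combine_labels: Dict[str, List[str]]
) -> List[str]:
    """Replace each original label value with the name of its group.

    Values that appear in no group are returned unchanged (an assumption
    of this sketch). A value listed in two groups is rejected as ambiguous.
    """
    value_to_group: Dict[str, str] = {}
    for group, members in combine_labels.items():
        for member in members:
            if member in value_to_group:
                raise ValueError(f"label value {member!r} is in more than one group")
            value_to_group[member] = group
    return [value_to_group.get(label, label) for label in labels]


mapping = {"Group1": ["A", "B", "C"], "Group2": ["D", "E"]}
print(apply_combine_labels(["A", "D", "C", "E"], mapping))
# prints ['Group1', 'Group2', 'Group1', 'Group2']
```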
- property created_at: datetime
Date of the query creation
- data(partition: int = 0) DataFrame
Calls the REST API for query execution and returns the result.
Note
Intended for previewing the query result before creating a query call object and using it in a sandbox step. The resulting DataFrame is not cached on the server, but once it is used in a sandbox it may be cached.
- delete(renderer=None) Response
Calls the REST API and deletes the query object from the server.
- get_feature_statistics()
Returns metadata statistics for the query.
- get_statistics_summary(renderer=None)
Returns metadata statistics for the query.
- initialize_from_dict(data)
Reads a json dict and populates a single query.
- insert(renderer=None) Response
Calls the REST API and inserts a new query.
- property label_column: str
Label column to use in the query result
Note
The column must correspond to an actual project label column.
- property metadata_columns: list[str]
Metadata columns to include in the query result
Note
Columns must correspond to actual project metadata columns.
- property metadata_filter: str
Filter criteria of the query
- Parameters
value (str) – similar to a SQL WHERE clause, the string can contain any number of AND-concatenated expressions where square brackets surround the column name and comparison value, with the operator in between. Supported operators: >, >=, <, <=, =, !=, IN
Examples:
metadata_filter = '[Subject] > [5] AND [Subject] <= [15]'
metadata_filter = '[Gender] = [Female] AND [Activity] != [Walking]'
metadata_filter = '[Subject] IN [5, 7, 9, 11, 13, 15]'
Note
Queries do not support OR-concatenation between expressions, but often the IN operator can be used to achieve OR-like functionality on a single column. For example:
[Gesture] IN [A, M, L]
is equivalent to:
[Gesture] = [A] OR [Gesture] = [M] OR [Gesture] = [L]
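Filter strings in this syntax can be assembled programmatically rather than written by hand. The helper below is a minimal sketch, assuming only the bracket syntax and operator list described above; `build_metadata_filter` and `format_condition` are hypothetical names, not SDK functions, and the output is not validated against the server.

```python
from typing import Any, Iterable, Tuple

# Operators listed in the metadata_filter description above.
SUPPORTED_OPERATORS = {">", ">=", "<", "<=", "=", "!=", "IN"}


def format_condition(column: str, operator: str, value: Any) -> str:
    """Render one expression as '[column] operator [value]'."""
    if operator not in SUPPORTED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if operator == "IN":
        if not isinstance(value, (list, tuple)):
            raise TypeError("IN expects a list or tuple of values")
        rendered = ", ".join(str(v) for v in value)
    else:
        rendered = str(value)
    return f"[{column}] {operator} [{rendered}]"


def build_metadata_filter(conditions: Iterable[Tuple[str, str, Any]]) -> str:
    """Join conditions with AND, the only supported concatenation."""
    return " AND ".join(format_condition(*c) for c in conditions)


print(build_metadata_filter([("Subject", ">", 5), ("Subject", "<=", 15)]))
# prints [Subject] > [5] AND [Subject] <= [15]
print(build_metadata_filter([("Gesture", "IN", ["A", "M", "L"])]))
# prints [Gesture] IN [A, M, L]
```

The resulting string can be passed as the metadata_filter argument when creating a query or assigned to the metadata_filter property.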
- property name: str
Name of the query
- plot_statistics(renderer=None, **kwargs)
Generates a bar plot of the query statistics
- post_feature_statistics(window_size: Optional[int] = None)
Returns metadata statistics for the query.
- refresh()
Calls the REST API and repopulates this query object using its UUID.
- property segmenter: int
Segmenter to use for the query
- Parameters
value (int) – ID of segmenter.
- size()
Returns the size of the dataframe which would result from the query.
- statistics_segments(renderer=None) DataFrame
Returns metadata statistics for the query.
- property summary_statistics: dict
Summary statistics of the query
- update(renderer=None) Response
Calls the REST API and updates the query object on the server.