List of Expectations
Fashion Data Expectations
Temporal continuity
Key constraints
primarykey_named(dataset, tablename, column, threshold)
Expect a column in a table to respect a pseudo Primary Key constraint.
This procedure checks that every value of the specified column is not null and unique within the current table. This is enforced by counting the total number of rows within the table and comparing it to the number of distinct element in the column. A threshold percentage can be provided, so the test is passed if the number of rejected rows divided by the table total row count represents less than the threshold. Use 0 if no rejected row is allowed.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name, or an sql operation that creates a pseudo column that can be inserted into a count distinct
threshold (FLOAT64) – The threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
primarykey(dataset, tablename, threshold)
Expect a table to have a column that respects a pseudo Primary Key constraint.
This procedure looks for a column with a name that starts wiht 'PK' or with a desdcription that contains '#PK'. Then it checks that every value of this column is not null and unique within the current table. This is enforced by counting the total number of rows within the table and comparing it to the number of distinct element in the column. A threshold percentage can be provided, so the test is passed if the number of rejected rows divided by the table total row count represents less than the threshold. Use 0 if no rejected row is allowed.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
threshold (FLOAT64) – The threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
primarykey_where(dataset, tablename, threshold, where_clause)
Expect a table to have a column that respects a pseudo Primary Key constraint. Apply a WHERE condition before testing the primary key constraint to limit the number of rows requested.
The test performed is the as with the primarykey expectation. See more detail above.
Parameters
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
threshold (FLOAT64) – The threshold to use to trigger an assertion failure
where_clause (STRING) – A proper WHERE clause that will filter the table before applying the test.
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
foreignkey(dataset, tablename, column, target_dataset, target_tablename, target_column, threshold)
Expect a column in a table to respect a pseudo Foreign Key constraint.
This procedure checks that every non-null value in the column can be found in the values of the target column of the reference target table. A threshold percentage can be provided, so the test is passed if the number of rejected rows divided by the table total row count is less than the threshold. Use 0 if no rejected row is allowed.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
target_dataset (STRING) – The target dataset of the foreign table (and its GCP project)
target_tablename (STRING) – The foreign table name
target_column (STRING) – The foreign key column of the foreign table
threshold (FLOAT64) – The threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
foreignkey_where(dataset, tablename, column, filter_condition, target_dataset, target_tablename, target_column, threshold)
Expect a column in a table to respect a pseudo Foreign Key constraint.
Add a filter condition to the WHERE clause before testing the foreign key constraint to limit the number of rows requested.
This procedure checks that every non-null value in the column can be found in the values of the target column of the reference target table. A threshold percentage can be provided, so the test is passed if the number of rejected rows divided by the table total row count is less than the threshold. Use 0 if no rejected row is allowed.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
filter_condition (STRING) – The filter condition that will be applied on the table to check
target_dataset (STRING) – The target dataset of the foreign table (and its GCP project)
target_tablename (STRING) – The foreign table name
target_column (STRING) – The foreign key column of the foreign table
threshold (FLOAT64) – The threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
Temporal continuity
everyday_since(dataset, tablename, column, start_date, exception, minimum)
Expect a table to have a minimum number of rows per day since a start date. An exception list can be provided to avoid an error when a date has no data for a good reason.
This procedure counts the number of rows of the specified table grouped by date. If a day between the specified start_date and today is missing, or if a daily count is below minimum, then the test fails, except if the date is specified in the exception list.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
start_date (STRING) – The date to start the control
exception (ARRAY<DATE>) – An array that contains dates that will not be checked
minimum (INT64) – The minimum amount of lines per date expected.
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
everyday_increasing_since(dataset, tablename, value)
Expect a table to have a daily number of rows continuously increasing since a predefined date.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
value (DATE) – The starting date to check for increase in value
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
everyweek_since(dataset, tablename, column, start_date, exception, minimum)
Expect a table to have a date column with a date every week since start_date, and containing a minimum number of rows. An exception list can be provided to avoid an error when a date has no data for a good reason.
This procedure generates a date array containing the start_date and the same day for every week until the current date. Then it counts the rows of the table grouped by date. If a day between the specified start date and today is missing, or if a daily count is below minimum, or if an extra date is in the table but does not fit in the monthly pattern, then the test fails, except if the date is specified in the exception list.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
start_date (STRING) – The date to start the control
exception (ARRAY) – An array that contains dates that will not be checked
minimum (INT64) – The minimum amount of lines per date expected.
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
everymonth_since(dataset, tablename, column, start_date, exception, minimum)
Expect a table to have a date column with a date every month since start_date, and containing a minimum number of rows. An exception list can be provided to avoid an error when a date has no data for a good reason.
This procedure generates a date array containing the start_date and the same day for every month until the current date. Then it counts the rows of the table grouped by date. If a day between the specified start date and today is missing, or if a daily count is below minimum, or if an extra date is in the table but does not fit in the monthly pattern, then the test fails, except if the date is specified in the exception list.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)of the table
tablename (STRING) – The table name
column (STRING) – The column name
start_date (STRING) – The date to start the control
exception (ARRAY<DATE>) – An array that contains dates that will not be checked
minimum (INT64) – The minimum amount of lines per date expected.
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
Row count
table_count_greater(dataset, tablename, value)
Expect a table to have a count greater than or equal to a predefined value.
A threshold percentage can be provided, so the test is passed if the number of rejected rows divided by the table total row count represents less than the threshold. Use 0 if no rejected row is allowed.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
value (INT64) – The value the table count must be greater to
threshold (FLOAT64) – The threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
table_count_between(dataset, tablename, value)
Expect a table to have a number of rows to be between two values.
The values for the comparison must be provided as string and will be cast to integer during the assertion. The order in the array is important as we use the “between” predicat function to enforce this expectation: the lower value must be before the upper value.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
value (ARRAY<STRING>) – The array of values that will be used to check the table
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
table_count_equal(dataset, tablename, value)
Expect a table to have a count equal to a predefined value.
A threshold percentage can be provided, so the test is passed if the number of rejected rows divided by the table total row count represents less than the threshold. Use 0 if no rejected row is allowed.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
value (INT64) – the value the table count must be equal to
threshold (FLOAT64) – the threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
table_count_equal_other_table(dataset, tablename, target_dataset, target_tablename, threshold)
Expect a table to have the same number of lines than another table.
A threshold percentage can be provided, so the test is passed if the number of rejected rows divided by the table total row count represents less than the threshold. Use 0 if no rejected row is allowed.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
target_dataset (STRING) – The target dataset
target_tablename (STRING) – The target table name
threshold (FLOAT64) – The threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
Column properties
unique(dataset, tablename, column)
Expect every value in the column to be unique.
This procedure checks that the number of distinct value of the specified column is equal to the total number of lines in the table. Null values are part of the process (so one line can be null but it must be the only one).
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
not_null(dataset, tablename, column, threshold)
Expect a table to have a column to never be null.
This procedure counts the number of null in the specified column. A threshold percentage can be provided, so the test is passed if the number of rejected rows divided by the table total row count is less than the threshold. Use 0 if no rejected row is allowed.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name to check for value
threshold (FLOAT64) – the threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
null(dataset, tablename, column, threshold)
Expect a table to have a column to be fully null.
This procedure counts the number of non-null values in the specified column. A threshold percentage can be provided, so the test is passed if the number of rejected rows divided by the table total row count is less than the threshold. Use 0 if no rejected row is allowed.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name to check for value
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
type(dataset, tablename, column, type)
Expect a table to have a column that can be casted as the predefined type with no error.
This procedure checks that a safe casted (to the wanted type) non-snull value will not be null. All BigQuery types are allowed (see BigQuery documentation here).
Parameters
dataset (STRING) – The dataset of the table (and its related GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
type (STRING) – The type of the column to check
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
values_to_contain(dataset, tablename, column, value, minimum, threshold)
Expect a table to have a column to contain a certain value at a certain minimum level with a threshold.
The authorized value type must be in a string (even for numeric values) as there is a safe_cast to string in the verification predicat. A threshold might be specified so marginal value might not trigger any assertion exception.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
value (STRING) – The value of the predefined set
minimum (INT64) – The minimum value to have for the column
threshold (FLOAT64) – the threshold to use to trigger an assertion failure (as a percentage)
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
values_to_be_between(dataset, tablename, column, value, threshold)
Expect a table to have a column to be between two values.
The authorized value type may be integer, float or dates to work properly. The between predicate requires the parameter to be included and in the proper order (for exemple for a set of date, the first date must be before the second date). A threshold might be specified so marginal value might not trigger any assertion exception.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
value (ARRAY) – The array that contains the two values range
threshold (FLOAT64) – the threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
values_to_be_in_set(dataset, tablename, column, value, threshold)
Expect a table to have a column to be in a predefined set.
The authorized value type must be in an array as string as there is a cast in the verification predicat. A threshold might be specified so marginal value might not trigger any assertion exception.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
value (ARRAY) – The values array of the predefined set
threshold (FLOAT64) – the threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
values_to_not_be_in_set(dataset, tablename, column, value, threshold)
Expect a table to have a column to NOT be in a predetermined set of values.
The not authorized value type must be in a array as string as there is a cast in the verification predicat. A threshold might be specified so marginal value might not trigger any assertion exception.
Parameters
dataset (STRING) – The dataset of the table (and its GCP project)
tablename (STRING) – The table name
column (STRING) – The column name
value (STRING) – The target project of the foreign table
threshold (FLOAT64) – the threshold to use to trigger an assertion failure
Returns
The last SQL job of the procedure returns a line as described here. When this expectation is embedded in a table-to-table data operation, then this line is inserted into the tailer_common.expectation_results table in your GCP project.
Last updated