Validator

The DPSQL+ Validator parses SQL strings (in a Spark/SQLite-like dialect) and performs static checks:

  • Registration of private tables (ALTER PRIVATE_TABLE ...)

  • Uniqueness and propagation of the privacy unit column in CTEs (WITH clause)

  • Allowed join constraints (only equality joins on privacy unit columns)

  • Column and aggregation constraints in the final SELECT

  • Extraction of privacy parameters (PRIVATE_QUERY OPTIONS ...)

  • Detection of unsupported constructs (e.g. PIVOT / UNPIVOT / LATERAL VIEW)

If you already understand the syntax, see the raw grammar: sqlgrammar.lark.

Validation Pipeline & Privacy Constraints

  1. ALTER PRIVATE_TABLE

  • Capture (optional) database name, table name, and exactly one PRIVACY_UNIT_COLUMN.

  • Reject duplicate registrations.

  1. WITH (CTEs)

  • Parse each CTE SELECT core.

  • If private tables are referenced, enforce exactly one privacy unit column.

  • For set operations (e.g. UNION / INTERSECT), the privacy unit column name must match on both sides.

  1. FROM / JOIN in CTEs

  • Disallow CROSS JOIN between private tables.

  • Allow only equality predicates (table_a.col = table_b.col) chained with AND for private tables.

  • Forbid duplicate equality predicates and self-joins on the same privacy unit column.

  1. PRIVATE_QUERY OPTIONS

  • Extract privacy parameter literals (EPSILON, DELTA, MIN_FREQUENCY, CONTRIBUTION_BOUND).

  • Validate ranges (EPSILON > 0, 0 < DELTA <= 1, MIN_FREQUENCY >= 1, CONTRIBUTION_BOUND >= 1).

  1. Final SELECT

  • Forbid projecting the raw privacy unit column.

  • Forbid including the privacy unit column in GROUP BY.

  • Forbid using the privacy unit column inside aggregation arguments.

  • DISTINCT only allowed inside aggregation functions (e.g. COUNT(DISTINCT col)).

  • Disallow * wildcard.

  1. Final Consistency

  • Ensure a single lineage of the privacy unit column remains.

Primary Exception Classes

Each may include a context dict and a corrective hint.

  • QueryParseError - Grammar / token / structural ordering errors

  • PrivacyConstraintError - Privacy constraint violations (raw projection, GROUP BY usage, disallowed JOIN)

  • ValidationError - General logical/state validation failures (missing private table, unmatched counts)

  • UnsupportedQueryError - Explicitly unsupported constructs (e.g. PIVOT)

PRIVATE_QUERY OPTIONS Parameters

Supported literals:

  • EPSILON: Positive float (e.g. 1.0)

  • DELTA: Float in (0, 1] (e.g. 1e-5)

  • MIN_FREQUENCY (optional): Positive integer (e.g. 10)

  • CONTRIBUTION_BOUND: Positive integer (e.g. 1)

Aggregation Argument Form

  • Clipping bounds are supplied inline for certain aggregations where required: SUM(column, lower_bound, upper_bound), AVG(column, lower_bound, upper_bound), STDDEV_SAMP(column, lower_bound, upper_bound), VAR_SAMP(column, lower_bound, upper_bound), COVAR_SAMP(column_a, column_b, lower_bound, upper_bound, lower_bound, upper_bound).

  • Bounds must satisfy lower_bound <= upper_bound.

  • COUNT / COUNT(DISTINCT ...) take no clipping parameters.

Examples

Simple private aggregation:

ALTER PRIVATE_TABLE sales
  OPTIONS PRIVACY_UNIT_COLUMN(sales.user_id); -- Equivalent to dpsql.engine.Engine.register_database.

WITH filtered AS (
  SELECT user_id, amount
  FROM sales
  WHERE amount > 0
)
SELECT PRIVATE_QUERY OPTIONS (
  EPSILON = 1.0,
  DELTA = 1e-5,
  CONTRIBUTION_BOUND = 1
) -- Equivalent to epsilon, delta, and contribution_bound in dpsql.dp_params.DPParams.
COUNT(DISTINCT user_id) AS user_cnt,
    SUM(amount, 0, 15) AS total_amount -- Equivalent to clipping_thresholds in dpsql.dp_params.DPParams.
  FROM filtered
  ORDER BY total_amount DESC;

Allowed join (equality):

ALTER PRIVATE_TABLE users
  OPTIONS PRIVACY_UNIT_COLUMN(users.uid); -- Equivalent to dpsql.engine.Engine.register_database.
ALTER PRIVATE_TABLE sessions
  OPTIONS PRIVACY_UNIT_COLUMN(sessions.uid); -- Equivalent to dpsql.engine.Engine.register_database.

WITH joined AS (
  SELECT s.uid, s.duration
  FROM users AS u
  JOIN sessions AS s
    ON u.uid = s.uid
)
SELECT PRIVATE_QUERY OPTIONS (
  EPSILON = 1.0,
  DELTA = 1e-5,
  MIN_FREQUENCY = 5,
  CONTRIBUTION_BOUND = 1
) -- Equivalent to dpsql.dp_params.DPParams.
COUNT(DISTINCT uid) AS active_users,
    SUM(duration, 0, 100) AS total_duration -- Equivalent to clipping_thresholds in dpsql.dp_params.DPParams.
  FROM joined;

Violation (GROUP BY includes privacy unit column):

ALTER PRIVATE_TABLE joined
  OPTIONS PRIVACY_UNIT_COLUMN(joined.uid);

SELECT PRIVATE_QUERY OPTIONS (
  EPSILON = 1.0,
  DELTA = 1e-5,
  MIN_FREQUENCY = 5,
  CONTRIBUTION_BOUND = 1
) uid, SUM(duration, 0, 100)
  FROM joined
  GROUP BY uid;  -- INVALID

Raises PrivacyConstraintError (e.g. “Privacy unit column in GROUP BY is disallowed”).