Overview

Architecture

Main Components

Engine is the main component of the system. It receives queries from users and sends each query to the Validator to check whether it satisfies the privacy constraints. If the query is valid, the Engine inquires the Accountant to check whether sufficient privacy budget remains. If the budget is sufficient, the Engine sends the query to the Backend for execution on the database.

Validator is the component that validates the query. It receives the query from the Engine and checks if it satisfies the privacy constraints.

Accountant is the component that manages the privacy budget. Before the system starts, the Admin sets the total privacy budget, which corresponds to the acceptable privacy leakage. The Accountant calculates the privacy budget consumed each time a query is executed and checks if it exceeds the limit.

Backend is the component that executes the query on the database. It executes the query received from the Engine and adds noise to the results to ensure the differential privacy.

Diagram

        graph LR
   User([👤 User]) --> Engine[⚙️ Engine]
   Engine --> Validator[✅ Validator]
   Engine --> Accountant[🔐 Accountant]
   Engine --> Backend[🔌 Backend]

   subgraph db["Spark / SQLite3 / DuckDB"]
      Backend --> Database[(Database)]
   end

   subgraph pbm["Privacy Budget Management"]
      Admin([🛠️ Admin]) --> Accountant
   end
    

Flow of the System

        sequenceDiagram
   participant Admin as 🛠️ Admin
   participant User as 👤 User
   participant Engine as ⚙️ Engine
   participant Validator as ✅ Validator
   participant Accountant as 🔐 Accountant
   participant Backend as 🔌 Backend

   Admin ->> Engine: Register tables and privacy units
   Admin ->> Accountant: Set privacy budget
   User ->> Engine: Execute query
   Engine ->> Validator: Validate query
   Validator ->> Validator: Check whether it satisfies privacy constraints
   Validator -->> Engine: Validation result
   Engine ->> Accountant: Update privacy budget
   Accountant ->> Accountant: Check whether privacy budget is sufficient
   Accountant -->> Engine: Check result
   Engine ->> Backend: Execute query
   Backend ->> Backend: Create intermediate tables
   Backend ->> Backend: Execute final select
   Backend ->> Backend: Add noise
   Backend -->> Engine: Results
   Engine -->> User: Results
    

Query Syntax

DPSQL+ admits a subset of SQL queries. Specifically, queries must adhere to the following syntax:

query_statement ::= [cte, ...] final_select
cte ::= WITH cte_name AS (select_statement)
select_statement ::= SELECT expr [,...] FROM table_reference [WHERE expr] [GROUP BY expr [,...]] [HAVING expr]
final_select ::= SELECT [PRIVATE_QUERY OPTIONS (...)] [agg_expr, ...] FROM table_name | cte_name [GROUP BY column_name]
table_reference ::= table_name | cte_name | table_reference JOIN table_reference ON condition
agg_expr ::= agg_func([DISTINCT] expr)
agg_func ::= COUNT | SUM | AVG ... etc

Particular attention should be given to the following points:

  • The query must start with a sequence of Common Table Expressions (CTEs) followed by a final select statement.

  • CTEs must emit the privacy unit if they involve data that should be protected.

  • In CTEs, joins between tables that need to be protected are only permitted as equi-joins.

  • The final select statement can only include aggregation functions such as COUNT and SUM to protect the privacy of the data.

For detailed syntax and privacy constraints, see Validator.