Overview
Architecture
Main Components
Engine is the main component of the system. It receives queries from users and sends each query to the Validator to check whether it satisfies the privacy constraints. If the query is valid, the Engine inquires the Accountant to check whether sufficient privacy budget remains. If the budget is sufficient, the Engine sends the query to the Backend for execution on the database.
Validator is the component that validates the query. It receives the query from the Engine and checks if it satisfies the privacy constraints.
Accountant is the component that manages the privacy budget. Before the system starts, the Admin sets the total privacy budget, which corresponds to the acceptable privacy leakage. The Accountant calculates the privacy budget consumed each time a query is executed and checks if it exceeds the limit.
Backend is the component that executes the query on the database. It executes the query received from the Engine and adds noise to the results to ensure the differential privacy.
Diagram
graph LR
User([👤 User]) --> Engine[⚙️ Engine]
Engine --> Validator[✅ Validator]
Engine --> Accountant[🔐 Accountant]
Engine --> Backend[🔌 Backend]
subgraph db["Spark / SQLite3 / DuckDB"]
Backend --> Database[(Database)]
end
subgraph pbm["Privacy Budget Management"]
Admin([🛠️ Admin]) --> Accountant
end
Flow of the System
sequenceDiagram
participant Admin as 🛠️ Admin
participant User as 👤 User
participant Engine as ⚙️ Engine
participant Validator as ✅ Validator
participant Accountant as 🔐 Accountant
participant Backend as 🔌 Backend
Admin ->> Engine: Register tables and privacy units
Admin ->> Accountant: Set privacy budget
User ->> Engine: Execute query
Engine ->> Validator: Validate query
Validator ->> Validator: Check whether it satisfies privacy constraints
Validator -->> Engine: Validation result
Engine ->> Accountant: Update privacy budget
Accountant ->> Accountant: Check whether privacy budget is sufficient
Accountant -->> Engine: Check result
Engine ->> Backend: Execute query
Backend ->> Backend: Create intermediate tables
Backend ->> Backend: Execute final select
Backend ->> Backend: Add noise
Backend -->> Engine: Results
Engine -->> User: Results
Query Syntax
DPSQL+ admits a subset of SQL queries. Specifically, queries must adhere to the following syntax:
query_statement ::= [cte, ...] final_select
cte ::= WITH cte_name AS (select_statement)
select_statement ::= SELECT expr [,...] FROM table_reference [WHERE expr] [GROUP BY expr [,...]] [HAVING expr]
final_select ::= SELECT [PRIVATE_QUERY OPTIONS (...)] [agg_expr, ...] FROM table_name | cte_name [GROUP BY column_name]
table_reference ::= table_name | cte_name | table_reference JOIN table_reference ON condition
agg_expr ::= agg_func([DISTINCT] expr)
agg_func ::= COUNT | SUM | AVG ... etc
Particular attention should be given to the following points:
The query must start with a sequence of Common Table Expressions (CTEs) followed by a final select statement.
CTEs must emit the privacy unit if they involve data that should be protected.
In CTEs, joins between tables that need to be protected are only permitted as equi-joins.
The final select statement can only include aggregation functions such as
COUNTandSUMto protect the privacy of the data.
For detailed syntax and privacy constraints, see Validator.