Quick start
===========

This quick start guide helps you get up and running with DPSQL+.

Tutorial
--------
We recommend following the tutorial in the ``examples`` directory to better understand how to use DPSQL+.


How to Use
----------

To begin, install the project by following :doc:`./installation`.

To use DPSQL+, prepare the following:

- A standard SQL query where the final ``SELECT`` clause consists of aggregation expressions
- Privacy budget parameters (epsilon, delta)
- A Spark session object with Python bindings (SQLite3 and DuckDB backends are also supported)
- The database and table names used in the query
- The names of the columns representing user IDs (e.g., ``privacy_unit``) for each table, where applicable


1. **Initialize components**:

    Import and initialize the necessary components. For detailed information about each component, refer to the :doc:`./overview`.

    .. code-block:: python

        from dpsql.backend import SparkSQLBackend
        from dpsql.engine import Engine
        from dpsql.validator import Validator
        from dpsql.accountant import RenyiAccountant

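        # `spark` is your existing Spark session; `epsilon` and `delta` are the
        # privacy budget parameters listed in the prerequisites above.
        sql_backend = SparkSQLBackend(spark)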
        validator = Validator()
        accountant = RenyiAccountant(epsilon, delta)
        engine = Engine(accountant, sql_backend, validator)

2. **Register databases**:

    Register the databases and tables you want to query with the engine, and specify privacy unit columns:

    .. code-block:: python

        engine.register_database("TITANIC", privacy_unit_columns={"passengers_features": "id", "passengers_survived": "id"})
        engine.register_database("SEX", privacy_unit_columns={})

3. **Set query parameters**:

    Define the parameters for the query:

    .. code-block:: python

        from dpsql.dp_params import DPParams

        contribution_bound = 1
        min_frequency = 10
        epsilon_per_query = 0.5
        delta_per_query = 5e-5
        clipping = 1

        dpparams = DPParams(
            contribution_bound=contribution_bound,
            min_frequency=min_frequency,
            epsilon=epsilon_per_query,
            delta=delta_per_query,
            clipping_thresholds=[None, [(0, clipping)]]
        )

4. **Run the query**:

    Execute the query with the specified parameters:

    .. code-block:: python

        sql = """
        WITH combined_data_tmp AS (
            SELECT 
                e.id,
                e.who,
                s.survived
            FROM 
                TITANIC.passengers_features AS e
            JOIN 
                TITANIC.passengers_survived AS s ON e.id = s.id
        ), combined_data AS (
            SELECT
                c.id,
                c.who,
                c.survived,
                w.adult_male
            FROM
                combined_data_tmp AS c
            JOIN
                SEX.is_adult_male AS w ON c.who = w.who
        )
        SELECT 
            adult_male, COUNT(adult_male), SUM(survived)
        FROM 
            combined_data
        GROUP BY 
            adult_male
        """

        result = engine.execute_query(sql, dpparams)
        result.show()
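
To see what the query above aggregates before any privacy mechanism is applied, the following minimal sketch runs the same SQL against toy data using only Python's built-in ``sqlite3`` module. It does not use DPSQL+ at all. The rows are invented for illustration, the column types are inferred from the query, and attaching in-memory databases named ``TITANIC`` and ``SEX`` is an assumption made only so that the qualified table names resolve. An ``ORDER BY`` clause is added to make the output order deterministic.

.. code-block:: python

    import sqlite3

    # Hypothetical toy setup: in-memory SQLite databases standing in for
    # the TITANIC and SEX databases referenced by the query.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS TITANIC")
    conn.execute("ATTACH DATABASE ':memory:' AS SEX")

    conn.execute("CREATE TABLE TITANIC.passengers_features (id INTEGER, who TEXT)")
    conn.execute("CREATE TABLE TITANIC.passengers_survived (id INTEGER, survived INTEGER)")
    conn.execute("CREATE TABLE SEX.is_adult_male (who TEXT, adult_male INTEGER)")

    # Invented example rows (not the real Titanic data).
    conn.executemany(
        "INSERT INTO TITANIC.passengers_features VALUES (?, ?)",
        [(1, "man"), (2, "woman"), (3, "child"), (4, "man")],
    )
    conn.executemany(
        "INSERT INTO TITANIC.passengers_survived VALUES (?, ?)",
        [(1, 0), (2, 1), (3, 1), (4, 1)],
    )
    conn.executemany(
        "INSERT INTO SEX.is_adult_male VALUES (?, ?)",
        [("man", 1), ("woman", 0), ("child", 0)],
    )

    # Same query as in the quick start, plus ORDER BY for a stable order.
    baseline_sql = """
    WITH combined_data_tmp AS (
        SELECT e.id, e.who, s.survived
        FROM TITANIC.passengers_features AS e
        JOIN TITANIC.passengers_survived AS s ON e.id = s.id
    ), combined_data AS (
        SELECT c.id, c.who, c.survived, w.adult_male
        FROM combined_data_tmp AS c
        JOIN SEX.is_adult_male AS w ON c.who = w.who
    )
    SELECT adult_male, COUNT(adult_male), SUM(survived)
    FROM combined_data
    GROUP BY adult_male
    ORDER BY adult_male
    """

    # Exact (non-private) aggregates, one tuple per group.
    exact_rows = conn.execute(baseline_sql).fetchall()
    for adult_male, count, survived_sum in exact_rows:
        print(adult_male, count, survived_sum)

These exact values are only a baseline for comparison. The differentially private result returned by ``engine.execute_query`` for the same query should be expected to differ from them.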


Additional Usage
----------------
- **Specify sigma values directly**:

    You can specify sigma values directly instead of epsilon and delta:

    .. code-block:: python

        from dpsql.dp_params import DPParams

        contribution_bound = 1
        min_frequency = 10
        tau = 100
        sigma = 20
        sigma_for_thresholding = sigma

        dpparams = DPParams(
            contribution_bound=contribution_bound,
            min_frequency=min_frequency,
            tau=tau,
            sigma_for_thresholding=sigma_for_thresholding,
            sigmas=[sigma],
            clipping_thresholds=[None],
        )
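
For intuition about how a noise scale relates to an (epsilon, delta) pair, the sketch below applies the classical Gaussian mechanism calibration, :math:`\sigma = \Delta \sqrt{2 \ln(1.25/\delta)} / \varepsilon`, which is valid for :math:`0 < \varepsilon < 1`. This is a textbook formula and not necessarily the calibration DPSQL+ performs internally. The helper ``classical_gaussian_sigma`` is hypothetical, and treating the sensitivity of a clipped ``SUM`` as ``contribution_bound * clipping`` is an assumption made for illustration.

.. code-block:: python

    import math


    def classical_gaussian_sigma(epsilon, delta, sensitivity):
        """Noise scale from the classical Gaussian mechanism bound.

        Hypothetical helper for illustration; it is not part of DPSQL+.
        The bound only holds for 0 < epsilon < 1 and 0 < delta < 1.
        """
        if not 0 < epsilon < 1:
            raise ValueError("classical bound requires 0 < epsilon < 1")
        if not 0 < delta < 1:
            raise ValueError("delta must lie in (0, 1)")
        return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


    # Values taken from the quick start's per-query parameters.
    contribution_bound = 1
    clipping = 1
    epsilon_per_query = 0.5
    delta_per_query = 5e-5

    # Assumption: each privacy unit contributes at most `contribution_bound`
    # rows, each clipped to [0, clipping], so one unit can change a SUM by
    # at most contribution_bound * clipping.
    sensitivity = contribution_bound * clipping

    sigma = classical_gaussian_sigma(epsilon_per_query, delta_per_query, sensitivity)
    print(f"sigma = {sigma:.2f}")

A larger ``sigma`` means more noise and stronger privacy. Values passed through ``sigmas`` are used as given, so choosing them is the caller's responsibility.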