# Sending Data from OE to Celonis

Value streams capture information in events throughout their lifecycle. You can import this event data into Celonis with a Data Extraction job, which requires an extractor that brings data from Orchestration Engine (OE) to Celonis. You can customize the extractor so that it sends exactly the data you want to share with the Celonis system.

## Prerequisite

To extract data from OE to Celonis, you have to create a Data Pool in your Celonis account. Celonis uses Data Pools to collect and cluster the data for an integration flow. For details, see the [Celonis Data Pools](https://docs.celonis.com/en/creating-and-managing-data-pools.html) documentation.

## Extracting data from OE value streams

### Creating a data connection

You can create your own data extractor based on an Emporix template. The steps below show how to work with the extractor template and how to use it for your custom configuration.

Start by creating a connection in your Celonis Data Pool that links OE as the Data Source:

1. In your Celonis account, go to **Data Integration** -> **Your Data Pool** -> **Data Connections** and choose **Add Data Connection**.
2. Choose **Connect to Data Source**.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-79d87135472664c1ea6ca9a2219114dcf21f9341%2Fconnect_ds.png?alt=media" alt="" width="512"><figcaption></figcaption></figure>

3. Go to the **Custom** section and choose **Build custom connection** -> **Import from file**.
4. Enter the **Name** for your custom extractor. Optionally, you can also add a **Description** for it.
5. Download the Value Stream Event Log Extractor and then upload it as a JSON file in your Celonis extractor builder:

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-1f0e361ee0de71315be3874b6820d14d9f7c3911%2Fextractor_upload.png?alt=media" alt=""><figcaption></figcaption></figure>

6. Choose **Save and Continue**.
7. Review the parameters to be used for the connection, then continue to the next step. The parameters are configured automatically from the uploaded extractor JSON file.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-ddc3e9c60509daa12fd8036a1ef3961d56e625da%2Fconnection_params.png?alt=media" alt=""><figcaption></figcaption></figure>

8. Verify the authentication method. For Emporix, it's OAuth2 (Client Credentials) with the following endpoint and grant type details:
   * **Get Token Endpoint**: {Connection.AUTH\_URL}/oauth/token
   * **Grant Type**: client\_credentials\
     Don't modify this configuration. Continue to the next step.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-8a252d5e56c3289a5a4e37f84da892045ab6e689%2Fauth_method.png?alt=media" alt=""><figcaption></figcaption></figure>

9. Click the process context endpoint to review and customize its configuration.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-a77828e71e97fd5680bff385a91d4704629a9be8%2Fdata_endpoint.png?alt=media" alt=""><figcaption></figcaption></figure>

The response visible in the endpoint configuration is the part that you have to customize. In the JSON file, enter all the tenant and process context information needed for your custom connection to return the proper response.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-55f2cb7afe0e40cbad314675fa2c8785a442a90d%2Fresponse_config.png?alt=media" alt=""><figcaption></figcaption></figure>

{% hint style="warning" %}
Make sure `context$processid` is selected as a **Primary Key**. Without it, the child tables cannot be linked back to the parent.
{% endhint %}
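The transformation script later in this guide (under **Enabling activity log**) relies on exactly this key to join the flattened child event table back to its parent. Here's a minimal sketch of that join, using the column names from that script:

```sql
-- How a child event row links back to its parent process context:
SELECT pc."context$processid", e."eventType", e."occurredAt"
FROM "process_context_event" e
JOIN "process_context" pc
  ON e."process_context$context$processid" = pc."context$processid";
```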

10. Choose **Finish** to save your custom connection configuration.
11. Go to the **Data Connections** list again and choose **Add Data Connection** -> **Connect to Data Source** -> **Custom**. Your newly created connection is visible there, under **Custom connections**.
12. Choose your connection and check its configuration details to make sure all the authorization details, like the **Client ID** and **Client Secret**, are added.\
    Save your changes.

{% hint style="warning" %}
You can find the Client ID and Client Secret in the Emporix Developer Portal: see [Manage API Keys](https://app.gitbook.com/s/bTY7EwZtYYQYC6GOcdTj/getting-started/developer-portal/manage-apikeys).
{% endhint %}

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-ba9ec733336c8d4f7df3a13c22de6370a4818ec3%2Fconnection_details.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

## Customizing data jobs

With the connection in place, you can adjust the data that is imported into your Celonis account.

1. To start the data job configuration, go to **Data Pools** -> *Your Data Pool* -> **Data Jobs** and choose **Add Data Job**. Add a name for the **Data Job** and choose your **Data Connection**.
2. In the **Custom Data Job**, choose **+Add Extraction** and select all the available tables: `account`, `digital_process`, `event_type`, `process_context`.

Now you can add filters to limit the load context and its content. See the examples below, which you can use for reference:

### Getting data from finished value streams only

To get data only from **finished value stream runs** (and exclude value streams that are still running), add a filter that fetches only the value streams with the **finished** status.

1. Go to **Data Pools** -> *Your Data Pool* -> **Data Jobs** -> **Extractions** -> *Your Extraction* -> `process_context`.
2. In the **Filter Statement** section, add `status = 'finished'`. This field is optional, but it ensures that you only get data from completed processes.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-94bf29a219f7a2bcc8ff8d1099eee5ac684eb6b6%2Fadditional_filter.png?alt=media" alt=""><figcaption></figcaption></figure>

### Getting data from value streams finished in a specified time frame

To get data only from **finished value stream runs** and make sure only the **changed data** is loaded, use a filter on the **finished** status and the **modified time**.

1. Go to **Data Pools** -> *Your Data Pool* -> **Data Jobs** -> `process_context`.
2. Create a new parameter for the modified time - `<%=max_modified_time%>`.

   Go to *Your Extraction* -> the **Parameters** tab, choose **New Parameter**, and provide the following values in the fields:

   * **Type**: dynamic
   * **Table**: process\_context
   * **Column**: metadata$updatedAt
   * **Operation type**: FIND\_MAX
   * **Data type**: date
   * **Default date**: any date from the past

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-5c6cc01d81950de87db411faed3f61cd336915e0%2Fparameters.png?alt=media" alt=""><figcaption></figcaption></figure>

3. Go to `process_context` and, in the **Delta Filter Statement** section, add `status = 'finished' AND updatedSince = <%=max_modified_time%>`.
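For reference, once Celonis resolves the dynamic parameter, the delta statement sent to the endpoint is equivalent to the following sketch (the timestamp is illustrative; the actual value is the `FIND_MAX` result over `metadata$updatedAt`):

```sql
-- Illustrative resolved delta filter; the real timestamp is substituted
-- for <%=max_modified_time%> at extraction time.
status = 'finished' AND updatedSince = '2024-05-01 00:00:00'
```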

### Limiting data load to a period from a specified date

If you want to limit how far back the post action feedback data is loaded, use the `createdSince` filter.

1. In Celonis, go to **Data Pools** -> the **Data Processing** section and choose **Data Jobs**.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-8e2d5e383f27c9eeb22f817aa898f4ab884ddd0b%2Fdata_jobs.png?alt=media" alt=""><figcaption></figcaption></figure>

2. Go to **Data Pools** -> *Your Data Pool* -> **Data Jobs** -> *Your Data Job* -> **Extractions** and choose *Your Extraction* task.
   * Open the `process_context` configuration.
   * Go to **Time Filter** and configure the filter to customize the creation period.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-5fb1af9dd2e7ac49b5a2da9471cb8f83a626a8fb%2Fload_context.png?alt=media" alt=""><figcaption></figcaption></figure>

### Setting the process context scope to live or draft value stream instances

You can customize the process context extractor to include live, draft, or both live and draft value stream instances.

1. Go to **Data Pools** -> *Your Data Pool* -> **Data Jobs** -> *Your Data Job* -> **Extractions** and choose the extraction task.
2. Go to **Parameters** and choose **New parameter**.
3. Set up the new parameter to include the draft, live, or draft and live scopes. For example:
   * **Placeholder**: includeScopes
   * **Name**: Include Scopes
   * **Description**: optional text describing the parameter for future reference
   * **Type**: private
   * **Data type**: text
   * **Values**: decide whether you want to limit the data to live or draft instances only
     * **draft** includes the instances of draft value stream versions
     * **live** includes the instances that are published and active
     * **live,draft** includes both live and draft instances

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-6abb18d79f2ceaa2cca81ca7c740f1eb21158cc9%2Fextractor_scopes.png?alt=media" alt="" width="563"><figcaption></figcaption></figure>

After creating the parameter, you have to create a filter for it in the process context.

4. In **Data Jobs** -> *Your Data Job* -> **Extractions** -> **Table Configuration**, choose `process_context`.
5. Go to the **Additional Filter** section and add filters for the included scopes:
   * **Filter Statement** - `includeScopes = '<%=includeScopes%>'`
   * **Delta Filter Statement** - `updatedSince = <%=max_modified_time%> and includeScopes = '<%=includeScopes%>'` - this example combines the time-based delta filter with the scope filter.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-7cdad66b79dcaafc317a8eab04e52008b79c57b4%2Fextractor_scope2.png?alt=media" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
For more details on how the Celonis data model and transformations work, see the [Celonis Data Jobs](https://docs.celonis.com/en/data-jobs.html) documentation.
{% endhint %}

## Enabling activity log

To generate an activity log in Celonis:

1. Go to **Data Pools** -> *Your Extractor* -> **Data Jobs** -> **Transformations** -> **Extract Event Log** and use the script below.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-d8ac63e2e6fe58784d8afed80df7aa9c01047187%2Fevent_logs.png?alt=media" alt=""><figcaption></figcaption></figure>

The prepared script creates an activity log table based on the name of the value stream, and it supports delta loads. The script creates a single activity for each event that is triggered in a value stream.

```sql
-- Persistent activity log table; populated incrementally by the MERGE below,
-- which makes the script safe for delta loads. event_id uniquely identifies
-- an event occurrence and is the key the MERGE matches on.
CREATE TABLE IF NOT EXISTS activity_log(
  digital_process_name varchar(256),
  event_name varchar(256),
  event_key varchar(256),
  event_id varchar(26) PRIMARY KEY,
  case_id varchar(256),
  occurred_at TIMESTAMP,
  read_at TIMESTAMP,
  sorting BIGINT);

-- Staging table, rebuilt from scratch on every run.
DROP TABLE IF EXISTS activity_log_new;
CREATE TABLE activity_log_new(
  digital_process_name varchar(256),
  event_name varchar(256),
  event_key varchar(256),
  event_id varchar(26) PRIMARY KEY,
  case_id varchar(256),
  occurred_at TIMESTAMP,
  read_at TIMESTAMP,
  sorting BIGINT);

-- Build one activity per event by joining the extracted tables.
INSERT INTO activity_log_new
SELECT "digital_process"."name" AS "digital_process_name",
       "event_type"."name" AS "event_name",
       "event_type"."key" AS "event_key",
       "process_context_event"."id" AS "event_id",
       "process_context"."context$processid" AS "case_id",
       "process_context_event"."occurredAt" AS "occurred_at",
       "process_context_event"."readAt" AS "read_at",
       -- Milliseconds since the Unix epoch, used as a sortable BIGINT.
       TIMESTAMPDIFF(MILLISECOND, '1970-01-01 00:00:00', "process_context_event"."readAt") AS "sorting"
FROM "digital_process", "process_context", "process_context_event", "event_type"
WHERE "process_context"."context$executionTemplateID" = "digital_process"."id"
  AND "process_context_event"."process_context$context$processid" = "process_context"."context$processid"
  AND "event_type"."key" = "process_context_event"."eventType";

-- Delta load: insert only the events that are not in activity_log yet.
MERGE INTO activity_log
USING activity_log_new
ON activity_log_new.event_id = activity_log.event_id
WHEN NOT MATCHED THEN
  INSERT (digital_process_name, event_name, event_key, event_id, case_id, occurred_at, read_at, sorting)
  VALUES (activity_log_new.digital_process_name,
          activity_log_new.event_name,
          activity_log_new.event_key,
          activity_log_new.event_id,
          activity_log_new.case_id,
          activity_log_new.occurred_at,
          activity_log_new.read_at,
          activity_log_new.sorting);

-- Drop the staging table when done.
DROP TABLE IF EXISTS activity_log_new;
```
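After a load, you can sanity-check the result with a quick query in the transformation editor. This minimal sketch uses only the table and columns created by the script above:

```sql
-- Count the extracted activities per value stream to verify the load.
SELECT digital_process_name, COUNT(*) AS event_count
FROM activity_log
GROUP BY digital_process_name
ORDER BY event_count DESC;
```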

2. Set up the **Data Model** to establish relations between all of the extractor's components.

* Go to *Your Extractor* -> **Data Model** and choose **Add Data Model**.

  * Select all the items to be added to the **Activity Table** and choose **Next**.
  * In the **Activity Table** setup, you can configure all the activity columns. Choose `activity_log`, select the following columns, and then confirm with **Finish**:
    * **Case ID** - the value stream ID
    * **Activity name** - the `event_name` column
    * **Timestamp** - the `occurred_at` column
    * **Sorting** - for a better data display
    * **Read\_at** - the point when an event is saved, later used for sorting

  <figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-e32acc81949e04ee6b1293a09d5dd8d77b2ff523%2Factivity_table.png?alt=media" alt=""><figcaption></figcaption></figure>

3. Set up all the **Foreign Keys** for your activity table.\
   For example, set the ID key for the **Account**.

   * Choose **New foreign key** in the account settings.

   <figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-ebe9d5e80b4d47e34ab647f86cad996ce712e8df%2Faccount_new_key.png?alt=media" alt="" width="187"><figcaption></figcaption></figure>

   * Connect the `id` field of the **Account** (Dimension) table to the matching column of the `digital_process` (Fact) table - `digital_process.tenant`, as listed in the optional relations below.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-59d81d60ab400bb9e712c8f60d1d5eff3bb182e7%2Faccount_key.png?alt=media" alt=""><figcaption></figcaption></figure>

**Mandatory relations**:

* Link Process Context (Dimension) with Activity Log (Fact) by linking `activity_log.case_id` and `process_context.context$processid`.
* Link Value Stream (Dimension) with Process Context (Fact) by linking `digital_process.id` and `process_context.context$executionTemplateID`.

**Recommended relations**:

* Use the Event Log as the activity table.
* Link Process Context (Dimension) with the additional tables (Fact) using `process_context.context$processid` to load any additional data as part of your post action feedback event.

**Optional relations**:

* Link Value Stream (Dimension) with Value Stream Triggers (Fact) by linking `digital_process.id` and `digital_process$trigger.digital_process_id`.
* If you want OE multi-tenant separation of data, link Account (Dimension) with Value Stream (Fact) by linking `account.id` and `digital_process.tenant`.
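Expressed in SQL, the mandatory relations correspond to joins like the following minimal sketch (the column names match the transformation script above; the optional links are shown as comments):

```sql
-- The mandatory data model relations expressed as joins:
SELECT al.event_name,
       pc."context$processid" AS case_id,
       dp."name" AS value_stream
FROM activity_log al
JOIN process_context pc ON al.case_id = pc."context$processid"
JOIN digital_process dp ON pc."context$executionTemplateID" = dp.id;
-- Optional: JOIN "digital_process$trigger" t ON t.digital_process_id = dp.id
-- Optional: JOIN account a ON a.id = dp.tenant
```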

To see an example of a configured data model, check the sample **Data Model** below:

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-137606ad95d3d0cf4c199791d07245bb5069a158%2Fdata_model2.png?alt=media" alt=""><figcaption></figcaption></figure>

* **Process Context**: the central component; it belongs to one value stream
* **Value Stream**: can have many process contexts and many value stream trigger events, but only one account
* **Event Log**: has a 1:1 relationship with the process context
* **Account**: represents the OE tenant; it can have many value streams

{% hint style="info" %}
To learn more about the configuration of Data Models in Celonis, see the [Data Model Definition](https://docs.celonis.com/en/data-model-definition.html) documentation.
{% endhint %}

## Executing a data load

Execute a data load based on your configuration and the connection you created.

1. Go to **Data Pools** -> **Data Jobs** and choose **Execute Data Job**. See the example:

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-99765819d8bfe75e40b589423caa491fdffd1b50%2Fdata_job.png?alt=media" alt=""><figcaption></figcaption></figure>

2. Choose **Execute Selection**.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-ad32161abea8f5e6be8fc240a3e0c1f0218b4716%2Fexecute_selection.png?alt=media" alt=""><figcaption></figcaption></figure>

The job starts, and you can follow the process logs as it runs.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-c022533e15e9d5fa1a9fbbe82f144ca6c1aa1627%2Frunning_extractor.png?alt=media" alt=""><figcaption></figcaption></figure>

When the process is finished, you can check the log details.

<figure><img src="https://2734760799-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F8dAaH7DfB59pzZwLxmur%2Fuploads%2Fgit-blob-cf8c730410dc1ebbbe57c4beac416f4b40e3d9f5%2Flogs_history.png?alt=media" alt=""><figcaption></figcaption></figure>
