# User Configurations

The user configuration file, config.yaml, can be found in the /user/ directory under the GP2 root.

The user configuration file is where Green Paths 2.0 is configured. It uses the YML (YAML) file format. For more on YML syntax, see [YML Basics](https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html).

User configurations can be thought of as belonging to five groups (YML dictionaries):

1. project
2. osm_network
3. data_sources
4. routing
5. analysing

These group names must be included in the user configurations, and they must be spelled exactly as described here!
Example of the groups (YML dictionaries) and their formatting:

```yaml
project:
    some_project_key: 123
osm_network:
    some_osm_key: "123"
data_sources:
    - some_data_source_name: "some name"
      some_data_source_key: 123

    - some_data_source_name2: "some name 2"
      some_data_source_key2: 456
routing:
    some_routing_key: 123
analysing:
    some_analysing_key: True
```
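
The group-name requirement can be sketched as a small check. This is only an illustration, not part of Green Paths 2.0: it assumes the configuration has already been parsed into a Python dictionary (for example with PyYAML's `yaml.safe_load`), and the function name `missing_groups` is hypothetical.

```python
# The five group names listed above.
REQUIRED_GROUPS = ("project", "osm_network", "data_sources", "routing", "analysing")


def missing_groups(config: dict) -> list[str]:
    """Return the required group names that are absent from the parsed config."""
    return [group for group in REQUIRED_GROUPS if group not in config]


# A parsed configuration that forgot the "analysing" group.
example_config = {
    "project": {"project_crs": 3879},
    "osm_network": {},
    "data_sources": [],
    "routing": {},
}
print(missing_groups(example_config))  # ['analysing']
```

The "validate" command described below is the intended way to check a real configuration; the sketch only shows what the rule means.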

```{attention}
When an optional configuration, e.g. data_buffer, is not used, remove both the key and the value!
```

```{hint}
When several alternative examples are given here for a configuration, remember to choose only one of them!
```

```{tip}
All strings can be written with or without quotation marks (single or double), e.g. aqi, 'aqi' or "aqi".
```


## User Configuration Validation and Descriptor
When filling in the user configurations, the user can run the Descriptor command to describe the exposure data sources etc.
NOTE: the Descriptor currently requires filled and valid user configurations, which is a known limitation that should be improved.

Users should run the "validate" command to check the validity of the user configurations while filling them in.

See [User Interface Commands](#cli_user_interface) for more on using these commands.

---

## Project Group

The Project group configures the project-wide settings.

Group name in YML: project

### project_crs
  - **Type**: integer
  - **Required**: mandatory
  - **Explanation**: The coordinate reference system (CRS) code to which all spatial data will be reprojected. Needs to be a projected CRS with metric units, not degrees.
  - **Examples**: 3879
  ```{warning}
  CRS should be projected and using meters as units, not degrees.
  ```

<div class="separator_line"></div>

### datas_coverage_safety_percentage
  - **Type**: integer | float
  - **Required**: optional
  - **Explanation**: Minimum percentage of the OSM road segments that the exposure data needs to cover. The coverage is calculated by simply dividing the number of covered segments by the number of all segments in the OSM PBF. The extent of the OSM street network will thus affect the coverage percentage.
  - **Default**: 33
  - **Example**: 50
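
The coverage calculation described above can be sketched as follows. The function names are hypothetical, and treating the check as "coverage greater than or equal to the threshold" is an assumption; the source only states that the percentage is covered segments divided by all segments.

```python
def coverage_percentage(covered_segments: int, all_segments: int) -> float:
    """Share of OSM segments that received exposure data, in percent."""
    if all_segments <= 0:
        raise ValueError("all_segments must be positive")
    return 100.0 * covered_segments / all_segments


def passes_coverage_check(covered: int, total: int, threshold: float = 33) -> bool:
    """Compare coverage with datas_coverage_safety_percentage (assumed >=)."""
    return coverage_percentage(covered, total) >= threshold


print(coverage_percentage(4200, 12000))        # 35.0
print(passes_coverage_check(4200, 12000))      # True with the default of 33
print(passes_coverage_check(4200, 12000, 50))  # False
```

The same data set covers a smaller share of a larger OSM extract, which is why the extent of the street network matters.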

## Project YAML Group Examples

**Example of the project group configurations, with only the mandatory configurations.**
**All the optional "key: value" fields are omitted.**
```yaml
project:
  project_crs: 3879
```

**Example of the project group configurations**
```yaml
project:
  project_crs: 3879
  datas_coverage_safety_percentage: 75
```

---

## OSM Network Group

OSM Network configures the OSM PBF settings.

Name in YML: osm_network


### osm_pbf_file_path
  - **Type**: string
  - **Required**: mandatory
  - **Explanation**: file path to the OSM pbf file.
  - **Examples**: user_data_dir/osm/hki.osm.pbf
  ```{hint}
  The file path can be relative if the file is located within this project's root directory. Otherwise use an absolute path.
  ```

<div class="separator_line"></div>

### original_crs
  - **Type**: integer
  - **Required**: mandatory
  - **Explanation**: The original CRS code of the OSM network. It will be transformed to the project CRS if the two differ.
  - **Examples**: 4326

<div class="separator_line"></div>

### segment_sampling_points_amount
  - **Type**: integer
  - **Required**: optional
  - **Explanation**: Forces a fixed number of sampling points per segment. If not given, the sampling points are created based on the length of the segment (recommended).
  - **Default**: Generated by using segment length and raster cell size.
  - **Examples**: 5
  ```{hint}
  It is recommended not to use this and to go with the default length-based value, unless there is a good reason!
  ```


**Example of the osm_network group configurations, with only the mandatory configurations.**
**All the optional "key: value" fields are omitted.**
```yaml
osm_network:
  osm_pbf_file_path: /user_data_dir/osm/hki.osm.pbf
  original_crs: 4326
```

**Example of the osm_network group configurations with optional configurations**
```yaml
osm_network:
  osm_pbf_file_path: /user_data_dir/osm/hki.osm.pbf
  original_crs: 4326
  segment_sampling_points_amount: 10
```

---

## Data Sources

Data Sources configures the exposure data sources and their individual settings.

The data sources are YML list items, so each one starts with the character "-". There can be one or more data sources. See the examples below.

Group name in YML: data_sources

### name
  - **Type**: string
  - **Required**: mandatory
  - **Explanation**: Name for the exposure data source. It can be anything, but it needs to be the same for the same data throughout the user configurations. Prefer short names!
  - **Example**: aqi
  ```{warning}
  Data source name needs to be the same in routing and analysing configurations.
  ```

<div class="separator_line"></div>

### filepath
  - **Type**: string
  - **Required**: mandatory
  - **Explanation**: filepath to the data.
  - **Example**: /user_data_dir/data/gvi_green.shp
  ```{hint}
  The file path can be relative if the file is located within this project's root directory. Otherwise use an absolute path.
  ```

<div class="separator_line"></div>

### data_type
  - **Type**: string
  - **Required**: optional, recommended
  - **Explanation**: Data type of the exposure data, can be: "raster" or "vector".
  - **Example**: raster
  ```{hint}
  This field is not mandatory; the data type will be determined from the file name if it is not given. Giving it is recommended for more robust execution.
  ```

<div class="separator_line"></div>

### original_crs
  - **Type**: integer
  - **Required**: mandatory
  - **Explanation**: The original CRS code of the data. It will be transformed to the project CRS if the two differ.
  - **Examples**: 3879

<div class="separator_line"></div>

### min_data_value
  - **Type**: integer | float
  - **Required**: mandatory
  - **Explanation**: The theoretical minimum value of the data source.
  - **Examples**: 0.0, 1
  ```{hint}
  Use the theoretical value so that the data does not skew the results!
  ```

<div class="separator_line"></div>

### max_data_value
  - **Type**: integer | float
  - **Required**: mandatory
  - **Explanation**: The theoretical maximum value of the data source.
  - **Examples**: 5, 97.9
  ```{hint}
  Use the theoretical value so that the data does not skew the results!
  ```
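
A minimal sketch of why the theoretical limits matter. It assumes plain min-max scaling to the 0-1 range; the routing section states that exposure factors are normalized between 0-1, but the exact method used by Green Paths 2.0 is not given here, and the function name `normalise` is hypothetical.

```python
def normalise(value: float, min_data_value: float, max_data_value: float) -> float:
    """Min-max scale a raw exposure value to the 0-1 range (assumed method)."""
    if max_data_value <= min_data_value:
        raise ValueError("max_data_value must be greater than min_data_value")
    return (value - min_data_value) / (max_data_value - min_data_value)


# AQI on its theoretical 1-5 scale: a reading of 2 is a fairly low exposure.
print(normalise(2, 1, 5))  # 0.25

# If the observed range of the data (say 1-3) were used instead,
# the same reading would look twice as severe.
print(normalise(2, 1, 3))  # 0.5
```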

<div class="separator_line"></div>

### good_exposure
  - **Type**: boolean
  - **Required**: mandatory
  - **Explanation**: Determines whether the exposure is treated as good (negative weight, lowering the cost of road segments) or bad (positive weight, raising the cost). Adding cost to a segment makes it more expensive to traverse, and vice versa.
  - **Examples**: True
  ```{hint}
  Make sure this is correct! True means good exposure like greenery (decreasing traversal cost), False means bad exposure like noise or air quality (increasing traversal cost).
  ```
  ```{warning}
  Air quality is a bad exposure, so even a low value such as 1.2 (on a 1-5 scale), which actually means clean air, slightly penalizes a segment compared to segments that do not have any value.
  This should only be a problem with sparse exposure data.
  ```

<div class="separator_line"></div>

### data_buffer
  - **Type**: integer | float
  - **Required**: optional
  - **Explanation**: Isotropic buffer for vector data, in meters. Can be used to increase the effect of points, lines etc. Should be used with caution and only with a good reason, as it can distort the results.
  - **Example**: 5
  ```{warning}
  Use only with good reason, know what you are doing.
  ```

<div class="separator_line"></div>

### data_column
  - **Type**: string
  - **Required**: mandatory (vector) | not used for raster
  - **Explanation**: The name of the data field "column" in the data source.
  - **Example**: db_hi

<div class="separator_line"></div>


### no_data_value
  - **Type**: integer | float
  - **Required**: optional
  - **Explanation**: Value that marks missing data (no exposure raster value found for the segment). If this is given, segments with the no data value get neither good nor bad weighting from the exposure data source in question.
  - **Examples**: 0.0, 1
  - **Note**: Set this if the data uses a specific value for no data, e.g. -999. The no data values will be filtered out and not used for routing or for analysing exposure.

<div class="separator_line"></div>

### layer_name
  - **Type**: string
  - **Required**: optional (vector), recommended
  - **Explanation**: The name of the layer, for vector data that might have multiple layers (e.g. GPKG). If not given, the first layer is used when only one layer is available. Otherwise an error is raised.
  - **Example**: comb_gvi

<div class="separator_line"></div>

### raster_cell_resolution
  - **Type**: integer | float
  - **Required**: mandatory (vector), optional (raster)
  - **Explanation**: The cell resolution (in meters) that the exposure raster will have. If this is given for a raster data source, the raster will be reprojected to this cell resolution.
  - **Example**: 20

<div class="separator_line"></div>

### save_raster_file
  - **Type**: boolean
  - **Required**: optional
  - **Explanation**: Decides if the exposure raster should be saved to cache for inspections etc.
  - **Default**: False
  - **Example**: True

<div class="separator_line"></div>

### custom_processing_function
  - **Type**: string
  - **Required**: optional, experimental
  - **Explanation**: Experimental: if a data set needs some pre-pre-processing, a function needs to be written manually to the globals in custom_functions.py, and its name given here. It is recommended to process the exposure data sources beforehand so that no pre-pre-processing is needed. This was mainly done for the AQI .nc data for Helsinki.


## Data sources YAML Group Examples

**Example of the data sources group configurations, with only the mandatory configurations.**
**All the optional "key: value" fields are omitted.**

*gvi_lines is vector data, aqi is raster data.*
*Note that some configuration fields are needed for vector data but not for raster data, e.g. raster_cell_resolution.*
```yaml
data_sources:
  - name: 'gvi_lines'
    filepath: /user_data_dir/data/gvi_lines.shp
    data_column: Comb_GVI
    no_data_value: 0
    min_data_value: 0.0
    max_data_value: 97.9
    good_exposure: True
    raster_cell_resolution: 10
    original_crs: 3879

  - name: "aqi"
    filepath: /user_data_dir/data/aqi.nc
    original_crs: 4326
    data_column: AQI
    no_data_value: 1
    min_data_value: 1
    max_data_value: 5
    good_exposure: False
```

**Example of the data sources group configurations with optional configurations**

```yaml
data_sources:
  - name: 'gvi_lines'
    filepath: /user_data_dir/data/gvi_lines.shp
    data_type: vector
    data_buffer: 10
    save_raster_file: True
    data_column: Comb_GVI
    no_data_value: 0
    min_data_value: 0.0
    max_data_value: 97.9
    good_exposure: True
    raster_cell_resolution: 10
    original_crs: 3879

  - name: "aqi"
    filepath: /user_data_dir/data/aqi.nc
    data_type: raster
    original_crs: 4326
    data_column: AQI
    no_data_value: 1
    min_data_value: 1
    max_data_value: 5
    good_exposure: False
    raster_cell_resolution: 10
    save_raster_file: True
    custom_processing_function: convert_aq_nc_to_tif_and_scale_offset
```
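
The mandatory fields listed in this section can be summarised as a rough check on one parsed data source. This is a sketch, not the project's validator: the function name `missing_keys` is hypothetical, and the vector-only keys are only checked when `data_type` is given explicitly, since how the type is inferred from the file name is not described here.

```python
# Keys documented above as mandatory for every data source.
COMMON_KEYS = {
    "name", "filepath", "original_crs",
    "min_data_value", "max_data_value", "good_exposure",
}
# Keys documented above as mandatory for vector data only.
VECTOR_ONLY_KEYS = {"data_column", "raster_cell_resolution"}


def missing_keys(source: dict) -> set[str]:
    """Return the mandatory keys that one data source entry lacks."""
    required = set(COMMON_KEYS)
    if source.get("data_type") == "vector":
        required |= VECTOR_ONLY_KEYS
    return required - set(source)


gvi = {
    "name": "gvi_lines",
    "filepath": "/user_data_dir/data/gvi_lines.shp",
    "data_type": "vector",
    "original_crs": 3879,
    "min_data_value": 0.0,
    "max_data_value": 97.9,
    "good_exposure": True,
    "data_column": "Comb_GVI",
}
print(missing_keys(gvi))  # {'raster_cell_resolution'}
```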

---

## Routing Group

Routing group configures the routing settings.

Name in YML: routing

### transport_mode
  - **Type**: string
  - **Required**: mandatory
  - **Explanation**: The mode of travel. Options: walking, cycling.
  - **Example**: walking

<div class="separator_line"></div>

### travel_speed
  - **Type**: integer | float
  - **Required**: optional
  - **Explanation**: The travelling speed in km/h. If not given, the default for the chosen transport mode is used.
  - **Example**: 5.5
  - **Defaults**: 5.0 (walking), 15.0 (cycling)

<div class="separator_line"></div>

### od_crs
  - **Type**: integer
  - **Required**: mandatory
  - **Explanation**: The CRS of the origin and destination (OD) files. Both need to be in the same CRS.
  - **Example**: 3879

<div class="separator_line"></div>

### origins
  - **Type**: string
  - **Required**: mandatory
  - **Explanation**: File path to the origin(s) file. Supported file types: gpkg, shp, csv. A csv file also needs od_lon_name and od_lat_name.
  - **Example**: user_folder/origins.shp

<div class="separator_line"></div>

### destinations
  - **Type**: string
  - **Required**: mandatory
  - **Explanation**: File path to the destination(s) file. Supported file types: gpkg, shp, csv. A csv file also needs od_lon_name and od_lat_name.
  - **Example**: user_folder/destinations.shp

<div class="separator_line"></div>

### od_lon_name
  - **Type**: string
  - **Required**: mandatory only for csv OD file, otherwise optional.
  - **Explanation**: name of the longitude column in the OD csv.
  - **Example**: lon

<div class="separator_line"></div>

### od_lat_name
  - **Type**: string
  - **Required**: mandatory only for csv OD file, otherwise optional.
  - **Explanation**: name of the latitude column in the OD csv.
  - **Example**: lat

<div class="separator_line"></div>

### precalculate
  - **Type**: boolean
  - **Required**: optional
  - **Explanation**: Defines whether the segment weights are precalculated into the network before routing. Precalculating should be faster, especially for larger calculations. If this is False, the segment costs are calculated while routing.
  - **Example**: False
  - **Default**: True


<div class="separator_line"></div>

### exposure_parameters
  - **Type**: list of dictionaries
  - **Required**: optional
  - **Explanation**: Defines the individual settings for each exposure data source. Fields: name, sensitivity, allow_missing_data. Needs a list of dictionaries, see the Routing YAML examples.
  - **Example**: `[{name: gvi_lines, sensitivity: 2.5}, {name: aqi, sensitivity: 2.5, allow_missing_data: false}]` (flow-style notation; see the Routing YAML examples for the block-style layout)
  - **Default**: allow_missing_data = True

  ```{hint}
  **name**: needs to be the same as in data sources.

  **sensitivity**: the weight used in the formula to weight the exposure factor derived from the exposure data. Formula: traversal time + (traversal time * sensitivity * exposure factor). All exposure factors are normalized to the range 0-1, and for positive (good) exposures they are made negative.

  **allow_missing_data**: Experimental feature. If set to False, the route finding will crash if any segment lacks an exposure value. Most likely should not be used! Default is True.
  ```
  ```{attention}
  Every exposure data source needs to be given a name and a sensitivity. If exposure results are wanted for the paths from some data source, but that data source should not affect the path optimization,
  the sensitivity of that data source should be set to 0.

  E.g. a user wants to find air quality optimized paths and route based on air quality only, but would also like to know the amount of greenery along the paths. The sensitivity of greenery (and of any other exposure data source) is then set to 0.
  ```
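
The cost formula quoted in the hint above can be worked through for a single data source. This is a sketch under assumptions: the function names are hypothetical, the inputs are already normalized to 0-1, and how several data sources are combined on one segment is not covered here.

```python
def exposure_factor(normalised_value: float, good_exposure: bool) -> float:
    """Sign the 0-1 normalised exposure: negative for good, positive for bad."""
    return -normalised_value if good_exposure else normalised_value


def segment_cost(traversal_time: float, sensitivity: float, factor: float) -> float:
    """traversal time + (traversal time * sensitivity * exposure factor)."""
    return traversal_time + traversal_time * sensitivity * factor


# A segment that takes 60 seconds to traverse.
greenery = exposure_factor(0.5, good_exposure=True)      # -0.5
air_quality = exposure_factor(0.25, good_exposure=False)  # 0.25

print(segment_cost(60, 1.5, greenery))      # 15.0, greenery makes it cheaper
print(segment_cost(60, 1.25, air_quality))  # 78.75, bad air makes it costlier
print(segment_cost(60, 0, greenery))        # 60.0, sensitivity 0 has no effect
```

With a sensitivity of 0 the cost equals the plain traversal time, which is why such a data source can still be analysed without influencing the route.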

## Routing YAML Group Examples

**Example of the Routing configurations, with only the mandatory configurations and using GPKG ODs.**
**All the optional "key: value" fields are omitted.**

*Note that this example is using gpkg ODs.*
```yaml
routing:
  transport_mode: walking
  origins: /user_folder/some_origins_point(s).gpkg
  destinations: /user_folder/some_destination(s)_points.gpkg
  od_crs: 27700
  exposure_parameters:
    - name: gvi_lines
      sensitivity: 1.5
    - name: aqi
      sensitivity: 1.25
```
```{hint}
Relatively small sensitivities (weights) produced the most exposure-optimized routes, some of them even too optimized, neglecting travel time too much.

The "best" results were obtained with sensitivities (weights) of 1.5, 2.5 and 5. Too large sensitivities (weights), e.g. 10 or 20, reduced the cost for positive exposure so much that all segments became cheap. Read more in the documentation section (and the thesis).

```

*Note that this example is using csv ODs, so od_lon_name and od_lat_name need to be defined.*

```yaml
routing:
  transport_mode: cycling
  travel_speed: 5
  precalculate: True
  od_lon_name: long
  od_lat_name: lat
  origins: /user_folder/some_origins_point(s).csv
  destinations: /user_folder/some_destination(s)_points.csv
  od_crs: 27700
  exposure_parameters:
    - name: gvi_lines
      sensitivity: 1.5
      allow_missing_data: False
    - name: aqi
      sensitivity: 2.5
```
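
The rule that csv OD files need both column names can be sketched as a check on the parsed routing group. The function name `csv_od_problems` is hypothetical, and detecting csv files from the file extension is an assumption made for this illustration.

```python
from pathlib import PurePosixPath


def csv_od_problems(routing: dict) -> list[str]:
    """List the column-name keys that are missing when an OD file is a csv."""
    uses_csv = any(
        PurePosixPath(routing[key]).suffix.lower() == ".csv"
        for key in ("origins", "destinations")
        if key in routing
    )
    problems = []
    if uses_csv:
        for key in ("od_lon_name", "od_lat_name"):
            if key not in routing:
                problems.append(f"{key} is required for csv OD files")
    return problems


routing = {
    "origins": "/user_folder/origins.csv",
    "destinations": "/user_folder/destinations.csv",
    "od_lon_name": "long",
}
print(csv_od_problems(routing))  # ['od_lat_name is required for csv OD files']
```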

---

## Analysing Group

The Analysing group configures the settings of the last module, which analyses the results.

Name in YML: analysing

### keep_geometry
  - **Type**: boolean
  - **Required**: optional
  - **Explanation**: Defines whether geometries are included in the final results. If they are, the final output file will be .gpkg; if not, it will be .csv.
  - **Default**: False
  - **Example**: True
  ```{warning}
  Including geometries in mass calculations will take more time, and the final file will take more space!
  ```

<div class="separator_line"></div>

### save_output_name
  - **Type**: string
  - **Required**: optional
  - **Explanation**: Custom name for the final output file.
  - **Default**: "output_results_[time_of_finish]"
  - **Example**: london_routes_greenery_lit

<div class="separator_line"></div>

### cumulative_ranges
  - **Type**: dictionary
  - **Required**: optional
  - **Explanation**: Custom ranges by which the results are divided and saved to the final output as new columns/fields. Needs the main dictionary (header) cumulative_ranges, with the data source names as keys and the ranges as a list of lists, see the Analysing YAML examples.
  - **Example**: `{gvi_lines: [[0, 10], [10.01, 20], [20.01, 50]], aqi: [[0, 0.99], [1, 1.99], [2, 2.99], [3, 3.99], [4, 5]]}` (flow-style notation; see the Analysing YAML examples for the block-style layout)
  ```{attention}
  Exposure data source names need to be exactly the same as defined earlier.
  ```
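
One plausible reading of cumulative ranges is sketched below: the lengths of the route segments are summed per range of exposure value. What Green Paths 2.0 actually accumulates is not specified here, so the function name `cumulative_by_range`, the (value, length) input and the inclusive bounds are all assumptions.

```python
def cumulative_by_range(segments, ranges):
    """Sum segment lengths per [low, high] range (both bounds inclusive)."""
    totals = {(low, high): 0.0 for low, high in ranges}
    for value, length in segments:
        for low, high in ranges:
            if low <= value <= high:
                totals[(low, high)] += length
                break  # a value belongs to at most one range
    return totals


gvi_ranges = [[0, 10], [10.01, 20], [20.01, 50]]
# (exposure value, segment length in meters) along one route
route = [(5.0, 120.0), (15.5, 80.0), (18.0, 40.0), (62.0, 30.0)]

print(cumulative_by_range(route, gvi_ranges))
# {(0, 10): 120.0, (10.01, 20): 120.0, (20.01, 50): 0.0}
```

In this sketch a value outside every range, such as 62.0 above, is not counted anywhere, so the ranges should span the values that are expected in the data.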

## Analysing YAML Group Examples

**Example of the Analysing group configurations.**
**The group only has optional parameters, all of which are set in this example.**

```yaml
analysing:
    keep_geometry: True
    save_output_name: london_routes_greenery_fam
    cumulative_ranges:
      gvi_lines:
        - [0,10]
        - [10.01, 20]
        - [20.01, 50]
      aqi:
        - [0, 0.99]
        - [1, 1.99]
        - [2, 2.99]
        - [3, 3.99]
        - [4, 5]
```

---

## Complete User Configuration YAML example

Here is a full example of a filled user/config.yaml. This configuration uses the vector gvi_lines and the raster aqi data sets. All data will be reprojected to the project_crs of 3879. The exposure raster from gvi_lines will be created with a 10 m cell resolution, and the aqi raster will be reprojected to match it with a 10 m resolution.

The route finding will use walking with a speed of 5 km/h. It will weight the greenery (gvi_lines) values a little more than the aqi values. The weights for segments will be precalculated, as there seem to be thousands of OD points.

The geometries will not be kept for such large mass calculations. The resulting exposures will be grouped into cumulative ranges.

```{attention}
Note the importance of correct indentation!
```

*user/config.yaml*
```yaml

project:
  project_crs: 3879

osm_network:
  osm_pbf_file_path: /user_data_dir/osm/hki.osm.pbf
  original_crs: 4326

data_sources:
  - name: 'gvi_lines'
    filepath: /user_data_dir/data/gvi_lines.shp
    data_type: vector # optional
    data_buffer: 10
    save_raster_file: True # optional
    data_column: Comb_GVI
    no_data_value: 0
    min_data_value: 0.0
    max_data_value: 97.9
    good_exposure: True
    raster_cell_resolution: 10
    original_crs: 3879

  - name: "aqi"
    filepath: /user_data_dir/data/aqi.nc
    data_type: raster # optional
    original_crs: 4326
    data_column: AQI
    no_data_value: 1
    min_data_value: 1
    max_data_value: 5
    good_exposure: False
    raster_cell_resolution: 10 # optional
    save_raster_file: True # optional
    custom_processing_function: convert_aq_nc_to_tif_and_scale_offset # optional

routing:
  transport_mode: walking
  travel_speed: 5
  origins: /user_folder/thousands_origins_point(s).gpkg
  destinations: /user_folder/thousands_destination(s)_points.gpkg
  od_crs: 27700
  exposure_parameters:
    - name: gvi_lines
      sensitivity: 1.5
    - name: aqi
      sensitivity: 1.25

analysing:
    keep_geometry: False
    save_output_name: example_routes_greenery_airquality
    cumulative_ranges:
      gvi_lines:
        - [0,10]
        - [10.01, 20]
        - [20.01, 50]
      aqi:
        - [0, 0.99]
        - [1, 1.99]
        - [2, 2.99]
        - [3, 3.99]
        - [4, 5]
```