# User Configurations

The user configuration file, config.yaml, can be found in the GP2 root under the /user/ directory.

The user configuration is the place to configure Green Paths 2.0. It uses the YML file format. For more on YML syntax see [YML Basics](https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html).

User configurations can be thought of as belonging to five groups (YML dictionaries):

1. project
2. osm_network
3. data_sources
4. routing
5. analysing

These group names must be included in the user configuration, and they must be spelled exactly as listed here!

Example of the groups (YML dictionaries) and formatting:

```yaml
project:
  some_project_key: 123

osm_network:
  some_osm_key: "123"

data_sources:
  - some_data_source_name: "some name"
    some_data_source_key: 123
  - some_data_source_name2: "some name 2"
    some_data_source_key2: 456

routing:
  some_routing_key: 123

analysing:
  some_analysing_key: True
```

```{attention}
When not using an optional configuration, e.g. data_buffer, remove both the key and the value!
```

```{hint}
Where multiple examples are given here for a configuration, remember to choose only one!
```

```{tip}
All strings can be written with or without apostrophes / quotation marks, e.g. aqi, 'aqi' or "aqi".
```

## User Configuration Validation and Descriptor

While filling in the user configuration, the Descriptor command can be used to describe the exposure data sources etc. NOTE: the Descriptor currently requires a filled and valid user configuration, which is a known limitation that should be improved!

Users should run the "validate" command to check the validity of the user configuration while filling it in.

See [User Interface Commands](#cli_user_interface) for more on using these commands.

---

## Project Group

Project group configures the project-wide settings.

Group name in YML: project

### project_crs

- **Type**: integer
- **Required**: mandatory
- **Explanation**: The coordinate reference system (CRS) code to which all spatial data will be reprojected.
  Needs a projected CRS with metric units, not degrees.
- **Examples**: 3879

```{warning}
The CRS should be projected and use meters as units, not degrees.
```
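To illustrate why a degree-based CRS is unsuitable for settings that are given in meters (such as data_buffer or raster_cell_resolution), the following sketch estimates the ground length of one degree of longitude at different latitudes. It is not part of Green Paths 2.0: the function name `metres_per_degree_longitude` is hypothetical, and the calculation assumes a spherical Earth with a mean radius of 6,371 km, so the numbers are approximations only.

```python
import math

# Assumption: spherical Earth with mean radius 6,371 km (approximation).
EARTH_RADIUS_M = 6_371_000


def metres_per_degree_longitude(latitude_deg):
    """Approximate ground length (m) of one degree of longitude at a latitude."""
    metres_per_degree_at_equator = math.pi / 180 * EARTH_RADIUS_M
    return metres_per_degree_at_equator * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    # Compare the equator with 60 degrees north (roughly the latitude of Helsinki).
    for latitude in (0.0, 60.0):
        length_km = metres_per_degree_longitude(latitude) / 1000
        print(f"latitude {latitude}: 1 degree of longitude is about {length_km:.1f} km")
```

One degree of longitude is roughly 111 km at the equator but only about half of that at 60 degrees north, so a value such as 10 has no fixed ground distance in a degree-based CRS. In a projected metric CRS it simply means 10 meters.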
### datas_coverage_safety_percentage

- **Type**: integer | float
- **Required**: optional
- **Explanation**: The percentage of OSM road segments that the exposure data needs to cover. It is calculated by simply dividing the covered segments by all segments in the OSM PBF, so the extent of the OSM street network affects the coverage percentage.
- **Default**: 33
- **Example**: 50

## Project YAML Group Examples

**Example of the project group configurations, with mandatory configurations.**
**All optional "key: value" fields are omitted.**

```yaml
project:
  project_crs: 3879
```

**Example of the project group configurations**

```yaml
project:
  project_crs: 3879
  datas_coverage_safety_percentage: 75
```

---

## OSM Network Group

OSM Network configures the OSM PBF settings.

Name in YML: osm_network

### osm_pbf_file_path

- **Type**: string
- **Required**: mandatory
- **Explanation**: File path to the OSM PBF file.
- **Examples**: user_data_dir/osm/hki.osm.pbf

```{hint}
The file path can be relative if the file is located within this project's root directory. Otherwise use an absolute path.
```

### original_crs

- **Type**: integer
- **Required**: mandatory
- **Explanation**: The original CRS code of the OSM network. The network will be transformed to the project CRS if the two differ.
- **Examples**: 4326

### segment_sampling_points_amount

- **Type**: integer
- **Required**: optional
- **Explanation**: Forces a fixed number of sampling points per segment. If not given, the sampling points are created based on the length of the segment (recommended).
- **Default**: Generated by using segment length and raster cell size.
- **Examples**: 5

```{hint}
It is recommended not to use this and to go with the default length-based value, unless there is a good reason!
```

**Example of the osm_network group configurations, with mandatory configurations.**
**All optional "key: value" fields are omitted.**

```yaml
osm_network:
  osm_pbf_file_path: /user_data_dir/osm/hki.osm.pbf
  original_crs: 4326
```

**Example of the osm network group configurations with optional configurations**

```yaml
osm_network:
  osm_pbf_file_path: /user_data_dir/osm/hki.osm.pbf
  original_crs: 4326
  segment_sampling_points_amount: 10
```

---

## Data Sources

Data Sources configures the exposure data sources and their individual settings. These items are YML list items, so each one starts with the character "-". There can be 1-n data sources. See the example below.

Group name in YML: data_sources

### name

- **Type**: string
- **Required**: mandatory
- **Explanation**: Name for the exposure data source. It can be anything, but it needs to be the same for the same data throughout the user configuration. Prefer short names!
- **Example**: aqi

```{warning}
The data source name needs to be the same in the routing and analysing configurations.
```

### filepath

- **Type**: string
- **Required**: mandatory
- **Explanation**: File path to the data.
- **Example**: /user_data_dir/data/gvi_green.shp

```{hint}
The file path can be relative if the file is located within this project's root directory. Otherwise use an absolute path.
```

### data_type

- **Type**: string
- **Required**: optional, recommended
- **Explanation**: Data type of the exposure data, can be: "raster" or "vector".
- **Example**: raster

```{hint}
This field is not mandatory; the data type is determined from the file name if not given. Giving it is recommended for more robust execution.
```

### original_crs

- **Type**: integer
- **Required**: mandatory
- **Explanation**: The original CRS code of the data. The data will be transformed to the project CRS if the two differ.
- **Examples**: 3879

### min_data_value

- **Type**: integer | float
- **Required**: mandatory
- **Explanation**: The theoretical minimum value of the data source.
- **Examples**: 0.0, 1

```{hint}
Use the theoretical value so that the data does not skew the results!
```

### max_data_value

- **Type**: integer | float
- **Required**: mandatory
- **Explanation**: The theoretical maximum value of the data source.
- **Examples**: 5, 97.9

```{hint}
Use the theoretical value so that the data does not skew the results!
```

### good_exposure

- **Type**: boolean
- **Required**: mandatory
- **Explanation**: Determines whether the exposure values are treated as positive (bad) or negative (good) weights for the road segments. Adding cost to a segment makes it more expensive to traverse, and vice versa.
- **Examples**: True

```{hint}
Make sure this is correct! True means good exposure like greenery (decreasing traversal cost), False means bad exposure like noise or air quality (increasing traversal cost).
```

```{warning}
Air quality is a bad exposure, yet a low air quality value, e.g. 1.2 (on a 1-5 scale), actually means clean air. Such a segment is still slightly penalized compared to segments that do not have any value. This should only be a problem with sparse exposure data.
```

### data_buffer

- **Type**: integer | float
- **Required**: optional
- **Explanation**: Isotropic buffer for vector data, in meters. Can be used to increase the effect of points or lines etc. Should be used with caution and with a good reason, as it can distort the results.
- **Example**: 5

```{warning}
Use only with good reason, know what you are doing.
```

### data_column

- **Type**: string
- **Required**: mandatory (vector) | not used for raster
- **Explanation**: The name of the data field ("column") in the data source.
- **Example**: db_hi

### no_data_value

- **Type**: integer | float
- **Required**: optional
- **Explanation**: Value that marks no data (no exposure raster value found for the segment). If this is given, segments with the no data value do not get any good or bad weighting from the exposure data sources that are missing for them.
- **Examples**: 0.0, 1
- **Note**: Set this if the data has some specific value for no data, e.g. -999. The no data values will be filtered out and not used for routing or analysing exposure.

### layer_name

- **Type**: string
- **Required**: optional (vector), recommended
- **Explanation**: For vector data that might have multiple layers (e.g. GPKG), the name of the layer. If not given, the first layer is used when only one layer is available. Otherwise an error is raised.
- **Example**: comb_gvi

### raster_cell_resolution

- **Type**: integer | float
- **Required**: mandatory (vector), optional (raster)
- **Explanation**: The resolution (in meters) that the exposure raster will have. If this is given for a raster data source, the raster will be reprojected to this cell resolution.
- **Example**: 20

### save_raster_file

- **Type**: boolean
- **Required**: optional
- **Explanation**: Decides whether the exposure raster should be saved to the cache for inspection etc.
- **Default**: False
- **Example**: True

### custom_processing_function

- **Type**: string
- **Required**: optional, experimental
- **Explanation**: Experimental: if a data set needs some pre-pre-processing, a function needs to be written manually into the globals of custom_functions.py, and the name of that function given here. It is recommended to process the exposure data sources beforehand so that no pre-pre-processing is needed. This mainly exists for the AQI .nc data for Helsinki.

## Data sources YAML Group Examples

**Example of the Data sources group configurations, with mandatory configurations.**
**All optional "key: value" fields are omitted.**

*gvi_lines is vector, aqi is raster data*

*note that some configuration fields are needed for vector but not for raster, e.g. raster_cell_resolution*

```yaml
data_sources:
  - name: 'gvi_lines'
    filepath: /user_data_dir/data/gvi_lines.shp
    data_column: Comb_GVI
    no_data_value: 0
    min_data_value: 0.0
    max_data_value: 97.9
    good_exposure: True
    raster_cell_resolution: 10
    original_crs: 3879
  - name: "aqi"
    filepath: /user_data_dir/data/aqi.nc
    original_crs: 4326
    data_column: AQI
    no_data_value: 1
    min_data_value: 1
    max_data_value: 5
    good_exposure: False
```

**Example of the data sources group configurations with optional configurations**

```yaml
data_sources:
  - name: 'gvi_lines'
    filepath: /user_data_dir/data/gvi_lines.shp
    data_type: vector
    data_buffer: 10
    save_raster_file: True
    data_column: Comb_GVI
    no_data_value: 0
    min_data_value: 0.0
    max_data_value: 97.9
    good_exposure: True
    raster_cell_resolution: 10
    original_crs: 3879
  - name: "aqi"
    filepath: /user_data_dir/data/aqi.nc
    data_type: raster
    original_crs: 4326
    data_column: AQI
    no_data_value: 1
    min_data_value: 1
    max_data_value: 5
    good_exposure: False
    raster_cell_resolution: 10
    save_raster_file: True
    custom_processing_function: convert_aq_nc_to_tif_and_scale_offset
```

---

## Routing Group

Routing group configures the routing settings.

Name in YML: routing

### transport_mode

- **Type**: string
- **Required**: mandatory
- **Explanation**: Travelling mode. Options: walking, cycling.
- **Example**: walking

### travel_speed

- **Type**: integer | float
- **Required**: mandatory
- **Explanation**: Defines the travelling speed in km/h.
- **Example**: 5.5
- **Defaults**: 5.0 (walking), 15.0 (cycling)

### od_crs

- **Type**: integer
- **Required**: mandatory
- **Explanation**: CRS of the origin and destination (OD) files. Both need to be in the same CRS.
- **Example**: 3879

### origins

- **Type**: string
- **Required**: mandatory
- **Explanation**: File path to the origin(s) file. Supported file types: gpkg, shp, csv. CSV files also need od_lon_name and od_lat_name.
- **Example**: user_folder/origins.shp

### destinations

- **Type**: string
- **Required**: mandatory
- **Explanation**: File path to the destination(s) file. Supported file types: gpkg, shp, csv. CSV files also need od_lon_name and od_lat_name.
- **Example**: user_folder/destinations.shp

### od_lon_name

- **Type**: string
- **Required**: mandatory only for csv OD file, otherwise optional.
- **Explanation**: Name of the longitude column in the OD csv.
- **Example**: lon

### od_lat_name

- **Type**: string
- **Required**: mandatory only for csv OD file, otherwise optional.
- **Explanation**: Name of the latitude column in the OD csv.
- **Example**: lat

### precalculate

- **Type**: boolean
- **Required**: optional
- **Explanation**: Defines whether segment weights should be precalculated into the network before routing. Precalculating should be faster, especially for larger calculations. If this is False, segment costs are calculated while routing.
- **Example**: False
- **Default**: True

### exposure_parameters

- **Type**: dictionary
- **Required**: optional
- **Explanation**: Defines the individual settings for each exposure data source. Fields: name, sensitivity, allow_missing_data. Needs a list of dicts, see the Routing YAML examples.
- **Example**:
    - name: gvi_lines
      sensitivity: 2.5
    - name: aqi
      sensitivity: 2.5
      allow_missing_data: false
- **Default**: allow_missing_data = True

```{hint}
**name**: needs to be the same as in data sources.

**sensitivity**: the weight used in the cost formula to scale the exposure factor derived from the exposure data. Formula: traversal time + (traversal time * sensitivity * exposure factor). All exposure factors are normalized to the range 0-1 and, for positive (good) exposures, made negative.

**allow_missing_data**: Experimental feature. If set to False, route finding will crash if any segment does not have an exposure value. Most likely should not be used!!! Default is True.
```

```{attention}
Every exposure data source needs to be given a name and a sensitivity. If exposure results are wanted for the paths from some data source, but that data source should not affect the path optimization, set the sensitivity of that data source to 0. For example, a user wants to find air quality optimized paths and would also like to know the amount of greenery along them, but only wants to route based on air quality. In that case the sensitivity of greenery (and of any other exposure data source) is set to 0.
```

## Routing YAML Group Examples

**Example of the Routing configurations, with mandatory configurations and using GPKG OD's.**
**All optional "key: value" fields are omitted.**

*note that this example is using gpkg OD's*

```yaml
routing:
  transport_mode: walking
  origins: /user_folder/some_origins_point(s).gpkg
  destinations: /user_folder/some_destination(s)_points.gpkg
  od_crs: 27700
  exposure_parameters:
    - name: gvi_lines
      sensitivity: 1.5
    - name: aqi
      sensitivity: 1.25
```

```{hint}
Relatively small sensitivities (weights) produced the most exposure-optimal routes, some even too optimal, neglecting travel time too much. The "best" results were gained with sensitivities (weights) of 1.5, 2.5 and 5. Too large sensitivities (weights), e.g. 10 or 20, decreased the cost of positive exposure so much that all segments became cheap. Read more from the documentation section (and thesis).
```

*note that this example is using csv OD's, so od_lon_name and od_lat_name need to be defined*

```yaml
routing:
  transport_mode: cycling
  travel_speed: 5
  precalculate: True
  od_lon_name: long
  od_lat_name: lat
  origins: /user_folder/some_origins_point(s).csv
  destinations: /user_folder/some_destination(s)_points.csv
  od_crs: 27700
  exposure_parameters:
    - name: gvi_lines
      sensitivity: 1.5
      allow_missing_data: False
    - name: aqi
      sensitivity: 2.5
```

---

## Analysing Group

Analysing group configures the settings of the last module, which analyses the results.

Name in YML: analysing

### keep_geometry

- **Type**: boolean
- **Required**: optional
- **Explanation**: Defines whether geometries should be included in the final results. If they are, the final output file will be .gpkg; if not, it will be .csv.
- **Default**: False
- **Example**: True

```{warning}
Keeping geometries in mass calculations takes more time, and the final file takes more memory!
```

### save_output_name

- **Type**: string
- **Required**: optional
- **Explanation**: Custom name for the final output file.
- **Default**: "output_results_[time_of_finnish]"
- **Example**: london_routes_greenery_lit

### cumulative_ranges

- **Type**: dictionary
- **Required**: optional
- **Explanation**: Custom ranges used to divide the results, saved to the final output as a new column/field. Needs the main dict (header) cumulative_ranges, which should have the data source names as dicts and the ranges as a list of lists, see the Analysing YAML examples.
- **Example**:
    gvi_lines:
      - [0, 10]
      - [10.01, 20]
      - [20.01, 50]
    aqi:
      - [0, 0.99]
      - [1, 1.99]
      - [2, 2.99]
      - [3, 3.99]
      - [4, 5]

```{attention}
Exposure data source names need to be exactly the same as defined earlier.
```

## Analysing YAML Group Examples

**Example of the Analysing group configurations. The group only has optional parameters, and this example sets all of them.**

```yaml
analysing:
  keep_geometry: True
  save_output_name: london_routes_greenery_fam
  cumulative_ranges:
    gvi_lines:
      - [0, 10]
      - [10.01, 20]
      - [20.01, 50]
    aqi:
      - [0, 0.99]
      - [1, 1.99]
      - [2, 2.99]
      - [3, 3.99]
      - [4, 5]
```

---

## Complete User Configuration YAML example

Here is a full example of a filled user/config.yaml.

This configuration uses the vector gvi_lines and raster aqi data sets. All data will be reprojected to the project_crs of 3879. The exposure raster from gvi_lines will be created with a 10 m cell resolution, and the aqi raster will be reprojected to match it with a 10 m resolution. The route finding will use walking with a speed of 5 km/h.
It will weight the greenery (gvi_lines) values a little more than the aqi values. The weights for segments will be precalculated, as there seem to be thousands of OD points. The geometries will not be kept for such large mass calculations. The resulting exposures will be grouped into cumulative ranges.

```{attention}
Note the importance of correct indentation!
```

*user/config.yaml*

```yaml
project:
  project_crs: 3879

osm_network:
  osm_pbf_file_path: /user_data_dir/osm/hki.osm.pbf
  original_crs: 4326

data_sources:
  - name: 'gvi_lines'
    filepath: /user_data_dir/data/gvi_lines.shp
    data_type: vector # optional
    data_buffer: 10 # optional
    save_raster_file: True # optional
    data_column: Comb_GVI
    no_data_value: 0
    min_data_value: 0.0
    max_data_value: 97.9
    good_exposure: True
    raster_cell_resolution: 10
    original_crs: 3879
  - name: "aqi"
    filepath: /user_data_dir/data/aqi.nc
    data_type: raster # optional
    original_crs: 4326
    data_column: AQI
    no_data_value: 1
    min_data_value: 1
    max_data_value: 5
    good_exposure: False
    raster_cell_resolution: 10 # optional
    save_raster_file: True # optional
    custom_processing_function: convert_aq_nc_to_tif_and_scale_offset # optional

routing:
  transport_mode: walking
  travel_speed: 5
  origins: /user_folder/thousands_origins_point(s).gpkg
  destinations: /user_folder/thousands_destination(s)_points.gpkg
  od_crs: 27700
  exposure_parameters:
    - name: gvi_lines
      sensitivity: 1.5
    - name: aqi
      sensitivity: 1.25

analysing:
  keep_geometry: False
  save_output_name: example_routes_greenery_airquality
  cumulative_ranges:
    gvi_lines:
      - [0, 10]
      - [10.01, 20]
      - [20.01, 50]
    aqi:
      - [0, 0.99]
      - [1, 1.99]
      - [2, 2.99]
      - [3, 3.99]
      - [4, 5]
```
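
To make the effect of the sensitivities in the complete example more concrete, the sketch below applies the cost formula from the Routing group, traversal time + (traversal time * sensitivity * exposure factor), to a single segment. It is an illustration only and not the Green Paths 2.0 implementation: the function names are hypothetical, and the min-max scaling with min_data_value and max_data_value is an assumption about how the 0-1 normalization is done. How several exposure sources are combined for one segment is not covered here.

```python
def normalise(value, min_value, max_value):
    """Scale a raw exposure value to 0-1 (assumed min-max scheme)."""
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    # Clamp first so values outside the theoretical range stay within 0-1.
    clamped = min(max(value, min_value), max_value)
    return (clamped - min_value) / (max_value - min_value)


def exposure_factor(value, min_value, max_value, good_exposure):
    """Normalised factor; good exposures are made negative, as documented."""
    factor = normalise(value, min_value, max_value)
    return -factor if good_exposure else factor


def segment_cost(traversal_time, sensitivity, factor):
    """Documented formula: time + (time * sensitivity * exposure factor)."""
    return traversal_time + traversal_time * sensitivity * factor


if __name__ == "__main__":
    traversal_time = 60.0  # seconds, hypothetical segment

    # Greenery: value 48.95 on the 0.0-97.9 scale, good exposure, sensitivity 1.5.
    gvi = exposure_factor(48.95, 0.0, 97.9, good_exposure=True)
    print("gvi_lines cost:", round(segment_cost(traversal_time, 1.5, gvi), 2))

    # Air quality: value 3 on the 1-5 scale, bad exposure, sensitivity 1.25.
    aqi = exposure_factor(3, 1, 5, good_exposure=False)
    print("aqi cost:", round(segment_cost(traversal_time, 1.25, aqi), 2))

    # Sensitivity 0 leaves the plain traversal time, so the source is ignored in routing.
    print("ignored source cost:", segment_cost(traversal_time, 0, aqi))
```

Under these assumptions the mid-range greenery value lowers the 60 second segment to about 15 seconds, while the mid-range air quality value raises it to 97.5 seconds. A sensitivity of 0 returns the unchanged traversal time, which matches the advice to use 0 for data sources that should be reported but not routed on.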