User Configurations
The user configuration file, config.yaml, is located in the /user/ directory under the GP2 root.
The user configuration is where Green Paths 2.0 is configured. It uses the YAML (YML) file format. For more on YML syntax, see YML Basics.
The user configuration can be thought of as five groups (YML dictionaries):
project
osm_network
data_sources
routing
analysing
These group names must be included in the user configuration, spelled exactly as shown here. Example of the groups (YML dictionaries) and their formatting:
project:
  some_project_key: 123
osm_network:
  some_osm_key: "123"
data_sources:
  - some_data_source_name: "some name"
    some_data_source_key: 123
  - some_data_source_name2: "some name 2"
    some_data_source_key2: 456
routing:
  some_routing_key: 123
analysing:
  some_analysing_key: True
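The requirement that all five group names are present can be checked with a few lines of Python. The following is only a minimal sketch, not the validation that Green Paths 2.0 itself performs: the function name find_group_problems is hypothetical, and the configuration is assumed to be already parsed into a Python dictionary.

```python
# Hypothetical helper, not part of Green Paths 2.0.
# The five group names come from the list above.
REQUIRED_GROUPS = ("project", "osm_network", "data_sources", "routing", "analysing")


def find_group_problems(config):
    """Return a list of problems with the top-level groups (empty if none)."""
    problems = []
    for group in REQUIRED_GROUPS:
        if group not in config:
            problems.append(f"missing group: {group}")
    # data_sources is a YAML list of items; the other groups are dictionaries.
    if "data_sources" in config and not isinstance(config["data_sources"], list):
        problems.append("data_sources must be a list")
    return problems


# A parsed configuration equivalent to the YAML example above.
example_config = {
    "project": {"some_project_key": 123},
    "osm_network": {"some_osm_key": "123"},
    "data_sources": [
        {"some_data_source_name": "some name", "some_data_source_key": 123},
        {"some_data_source_name2": "some name 2", "some_data_source_key2": 456},
    ],
    "routing": {"some_routing_key": 123},
    "analysing": {"some_analysing_key": True},
}

print(find_group_problems(example_config))        # []
print(find_group_problems({"project": {}}))       # four "missing group" messages
```

For real use, the “validate” command described later on this page is the intended way to check a configuration.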
Attention
When an optional configuration (e.g. data_buffer) is not used, remove both the key and the value!
Hint
Where multiple examples are given here for a user configuration, remember to choose only one of them!
Tip
All strings can be written with or without single or double quotation marks, e.g. aqi, 'aqi' or "aqi".
User Configuration Validation and Descriptor
While filling in the user configuration, the Descriptor command can be used to describe the exposure data sources etc. NOTE: the Descriptor currently requires a filled and valid user configuration, which should be improved.
Users should run the “validate” command to check the validity of the user configuration while filling it in.
See User Interface Commands for more on using these commands.
Project Group
The project group configures the project-wide settings.
Group name in YML: project
project_crs
Type: integer
Required: mandatory
Explanation: The coordinate reference system (CRS) code to which all spatial data will be reprojected. Must be a projected CRS with metric units, not degrees.
Examples: 3879
Warning
The CRS must be projected and use meters as units, not degrees.
datas_coverage_safety_percentage
Type: integer | float
Required: optional
Explanation: the minimum percentage of OSM road segments that each exposure data source needs to cover. It is calculated by simply dividing the number of covered segments by the number of all segments in the OSM pbf. The extent of the OSM street network will thus affect the coverage percentage.
Default: 33
Example: 50
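The coverage calculation described above can be illustrated with a short sketch. The function names are hypothetical, and whether Green Paths 2.0 compares with "greater than or equal" is an assumption; only the division of covered segments by all segments comes from the explanation above.

```python
def coverage_percentage(covered_segments, all_segments):
    """Share of OSM segments that have exposure data, in percent."""
    if all_segments <= 0:
        raise ValueError("all_segments must be positive")
    return 100.0 * covered_segments / all_segments


def passes_coverage_check(covered_segments, all_segments, safety_percentage=33):
    """Assumed check: coverage must reach datas_coverage_safety_percentage."""
    return coverage_percentage(covered_segments, all_segments) >= safety_percentage


# 4000 of 10000 segments covered gives 40 % coverage.
print(coverage_percentage(4000, 10000))          # 40.0
print(passes_coverage_check(4000, 10000))        # True with the default of 33
print(passes_coverage_check(4000, 10000, 50))    # False with a threshold of 50
```

Because the denominator is all segments in the OSM pbf, using a pbf that extends far beyond the exposure data lowers the percentage even when the data covers the area of interest well.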
Project YAML Group Examples
Example of the project group configuration with only the mandatory fields, omitting all optional “key: value” fields:
project:
  project_crs: 3879
Example of the project group configuration with the optional fields included:
project:
  project_crs: 3879
  datas_coverage_safety_percentage: 75
OSM Network Group
OSM Network configures the OSM PBF settings.
Name in YML: osm_network
osm_pbf_file_path
Type: string
Required: mandatory
Explanation: file path to the OSM pbf file.
Examples: user_data_dir/osm/hki.osm.pbf
Hint
The file path can be relative if the file is located within this project's root directory. Otherwise use an absolute path.
original_crs
Type: integer
Required: mandatory
Explanation: The original CRS code of the OSM network. The network will be transformed to the project CRS if the two differ.
Examples: 4326
segment_sampling_points_amount
Type: integer
Required: optional
Explanation: forces a fixed number of sampling points per segment. If not given, the sampling points are created based on the length of the segment (recommended).
Default: Generated by using segment length and raster cell size.
Examples: 5
Hint
It is recommended not to use this and to go with the default length-based value, unless there is a good reason!
Example of the osm_network group configuration with only the mandatory fields, omitting all optional “key: value” fields:
osm_network:
  osm_pbf_file_path: /user_data_dir/osm/hki.osm.pbf
  original_crs: 4326
Example of the osm_network group configuration with the optional fields included:
osm_network:
  osm_pbf_file_path: /user_data_dir/osm/hki.osm.pbf
  original_crs: 4326
  segment_sampling_points_amount: 10
Data Sources
The data sources group configures the exposure data sources and their individual settings.
The data sources are YML list items, so each one starts with the character “-”. There can be 1-n data sources. See the examples below.
Group name in YML: data_sources
name
Type: string
Required: mandatory
Explanation: name of the exposure data source. It can be anything, but it must be the same for the same data throughout the user configuration. Prefer short names!
Example: aqi
Warning
Data source name needs to be the same in routing and analysing configurations.
filepath
Type: string
Required: mandatory
Explanation: filepath to the data.
Example: /user_data_dir/data/gvi_green.shp
Hint
The file path can be relative if the file is located within this project's root directory. Otherwise use an absolute path.
data_type
Type: string
Required: optional, recommended
Explanation: Data type of the exposure data, can be: “raster” or “vector”.
Example: raster
Hint
This field is not mandatory; if it is not given, the data type is determined from the file name. Giving it is recommended for more robust execution.
original_crs
Type: integer
Required: mandatory
Explanation: The original CRS code of the data. The data will be transformed to the project CRS if the two differ.
Examples: 3879
min_data_value
Type: integer | float
Required: mandatory
Explanation: The theoretical minimum value of the data source.
Examples: 0.0, 1
Hint
Use the theoretical value so that the data does not skew the results!
max_data_value
Type: integer | float
Required: mandatory
Explanation: The theoretical maximum value of the data source.
Examples: 5, 97.9
Hint
Use the theoretical value so that the data does not skew the results!
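Why the theoretical minimum and maximum matter can be seen from a small min-max normalization sketch. This assumes that exposure values are scaled linearly to the range 0-1 using min_data_value and max_data_value; the function name and the clamping of out-of-range values are assumptions made for illustration.

```python
def normalize(value, min_data_value, max_data_value):
    """Scale a raw exposure value linearly to the range 0-1."""
    if max_data_value <= min_data_value:
        raise ValueError("max_data_value must be larger than min_data_value")
    # Assumption: values outside the given range are clamped to its ends.
    clamped = min(max(value, min_data_value), max_data_value)
    return (clamped - min_data_value) / (max_data_value - min_data_value)


# AQI value 4 on the theoretical 1-5 scale:
print(normalize(4, 1, 5))   # 0.75

# The same value when the observed range 2-4 of one data set is used instead:
print(normalize(4, 2, 4))   # 1.0
```

With the observed range, a value of 4 looks like the worst possible air quality, although the scale goes up to 5. Using the theoretical range keeps the normalized values comparable between data sets.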
good_exposure
Type: boolean
Required: mandatory
Explanation: Determines whether the exposure values are treated as positive weights (bad exposure) or negative weights (good exposure) for the road segments. Adding cost to a segment makes it more expensive to traverse, and vice versa.
Examples: True
Hint
Make sure this is correct! True means good exposure like greenery (decreasing traversal cost), False means bad exposure like noise or air quality (increasing traversal cost).
Warning
Air quality is a bad exposure, so even a low value such as 1.2 (on a 1-5 scale), which actually means clean air, slightly penalizes the segment compared to segments that have no value at all. This should only be a problem with sparse exposure data.
data_buffer
Type: integer | float
Required: optional
Explanation: Isotropic buffer for vector data, in meters. Can be used to increase the effect of points, lines etc. Should be used with caution and only with a good reason, as it can distort the results.
Example: 5
Warning
Use only with a good reason, and only if you know what you are doing.
data_column
Type: string
Required: mandatory (vector) | not used for raster
Explanation: The name of the data field “column” in the data source.
Example: db_hi
no_data_value
Type: integer | float
Required: optional
Explanation: The value that marks no data in the data source (no exposure value found for the segment). If this is given, segments with the no-data value get no good or bad weighting from that exposure data source.
Examples: 0.0, 1
Note: Set this if the data has a specific value for no data, e.g. -999. The no-data values will be filtered out and not used for routing or for analysing exposure.
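The effect of filtering out a no-data marker can be sketched as follows. This is an illustration only: the function names are hypothetical, and averaging the sampled values of a segment is an assumption, not a description of how Green Paths 2.0 aggregates samples.

```python
def drop_no_data(samples, no_data_value=None):
    """Remove the no-data marker from a list of sampled exposure values."""
    if no_data_value is None:
        return list(samples)
    return [value for value in samples if value != no_data_value]


def mean_exposure(samples, no_data_value=None):
    """Mean of the valid samples, or None if the segment has no valid data."""
    valid = drop_no_data(samples, no_data_value)
    if not valid:
        return None
    return sum(valid) / len(valid)


samples = [2.0, -999, 3.0]
print(mean_exposure(samples, no_data_value=-999))   # 2.5
print(mean_exposure(samples))                       # badly skewed by -999
print(mean_exposure([-999], no_data_value=-999))    # None, no exposure for this segment
```

Without no_data_value, the marker -999 would be treated as a real measurement and would distort the exposure of the segment.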
layer_name
Type: string
Required: optional (vector), recommended
Explanation: For vector data that might have multiple layers (e.g. GPKG), the name of the layer. If not given, the first layer is used when the file has only one layer; otherwise an error is raised.
Example: comb_gvi
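The layer selection rule described above can be written out as a small sketch. The function name resolve_layer_name is hypothetical, and the list of available layers is assumed to have been read from the file beforehand.

```python
def resolve_layer_name(available_layers, layer_name=None):
    """Pick the layer to read, following the rule described for layer_name."""
    if layer_name is not None:
        if layer_name not in available_layers:
            raise ValueError(f"layer {layer_name!r} not found in file")
        return layer_name
    if len(available_layers) == 1:
        # Only one layer available, so it can be used without a name.
        return available_layers[0]
    raise ValueError("layer_name is required when the file does not have exactly one layer")


print(resolve_layer_name(["comb_gvi"]))                        # comb_gvi
print(resolve_layer_name(["comb_gvi", "roads"], "comb_gvi"))   # comb_gvi
```

Calling resolve_layer_name(["comb_gvi", "roads"]) without a name raises an error, which is why giving layer_name is recommended for multi-layer files.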
raster_cell_resolution
Type: integer | float
Required: mandatory (vector), optional (raster)
Explanation: The cell resolution (in meters) of the exposure raster. If this is given for a raster data source, the raster will be reprojected to this cell resolution.
Example: 20
save_raster_file
Type: boolean
Required: optional
Explanation: Decides if the exposure raster should be saved to cache for inspections etc.
Default: False
Example: True
custom_processing_function
Type: string
Required: optional, experimental
Explanation: Experimental: if a data set needs extra processing before the normal pre-processing, a function must be written manually at module (global) level in custom_functions.py and its name given here. It is recommended to prepare the exposure data sources so that no such extra processing is needed. This exists mainly for the AQI .nc data for Helsinki.
Data sources YAML Group Examples
Example of the data sources group configuration with the mandatory fields, omitting the optional “key: value” fields.
gvi_lines is vector data and aqi is raster data. Note that some configuration fields are needed for vector but not for raster data, e.g. raster_cell_resolution.
data_sources:
  - name: 'gvi_lines'
    filepath: /user_data_dir/data/gvi_lines.shp
    data_column: Comb_GVI
    no_data_value: 0
    min_data_value: 0.0
    max_data_value: 97.9
    good_exposure: True
    raster_cell_resolution: 10
    original_crs: 3879
  - name: "aqi"
    filepath: /user_data_dir/data/aqi.nc
    original_crs: 4326
    data_column: AQI
    no_data_value: 1
    min_data_value: 1
    max_data_value: 5
    good_exposure: False
Example of the data sources group configuration with the optional fields included:
data_sources:
  - name: 'gvi_lines'
    filepath: /user_data_dir/data/gvi_lines.shp
    data_type: vector
    data_buffer: 10
    save_raster_file: True
    data_column: Comb_GVI
    no_data_value: 0
    min_data_value: 0.0
    max_data_value: 97.9
    good_exposure: True
    raster_cell_resolution: 10
    original_crs: 3879
  - name: "aqi"
    filepath: /user_data_dir/data/aqi.nc
    data_type: raster
    original_crs: 4326
    data_column: AQI
    no_data_value: 1
    min_data_value: 1
    max_data_value: 5
    good_exposure: False
    raster_cell_resolution: 10
    save_raster_file: True
    custom_processing_function: convert_aq_nc_to_tif_and_scale_offset
Routing Group
Routing group configures the routing settings.
Name in YML: routing
transport_mode
Type: string
Required: mandatory
Explanation: travelling mode. Options: walking, cycling.
Example: walking
travel_speed
Type: integer | float
Required: mandatory
Explanation: define travelling speed in km/h.
Example: 5.5
Defaults: 5.0 (walking), 15.0 (cycling)
od_crs
Type: integer
Required: mandatory
Explanation: CRS of the origin and destination (OD) files. Both files need to be in the same CRS.
Example: 3879
origins
Type: string
Required: mandatory
Explanation: file path to the origin(s) file. Supported file types: gpkg, shp, csv. A csv file also requires od_lon_name and od_lat_name.
Example: user_folder/origins.shp
destinations
Type: string
Required: mandatory
Explanation: file path to the destination(s) file. Supported file types: gpkg, shp, csv. A csv file also requires od_lon_name and od_lat_name.
Example: user_folder/destinations.shp
od_lon_name
Type: string
Required: mandatory only for csv OD file, otherwise optional.
Explanation: name of the longitude column in the OD csv.
Example: lon
od_lat_name
Type: string
Required: mandatory only for csv OD file, otherwise optional.
Explanation: name of the latitude column in the OD csv.
Example: lat
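How od_lon_name and od_lat_name relate to a csv OD file can be shown with a short sketch using Python's standard csv module. The function name and the sample file content are made up for illustration; Green Paths 2.0 may read the file differently.

```python
import csv
import io


def read_od_points(csv_text, od_lon_name, od_lat_name):
    """Read (lon, lat) pairs from csv text using the configured column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    points = []
    for row in reader:
        # A wrong column name raises KeyError here, which is the typical
        # symptom of od_lon_name or od_lat_name not matching the csv header.
        points.append((float(row[od_lon_name]), float(row[od_lat_name])))
    return points


# Hypothetical origins file with columns named "lon" and "lat".
sample_csv = "id,lon,lat\n1,24.94,60.17\n2,24.95,60.18\n"

print(read_od_points(sample_csv, od_lon_name="lon", od_lat_name="lat"))
# [(24.94, 60.17), (24.95, 60.18)]
```

The column names must match the csv header exactly, and the coordinates must be in the CRS given in od_crs.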
precalculate
Type: boolean
Required: optional
Explanation: defines whether segment weights are precalculated onto the network before routing. Precalculation should be faster, especially for larger calculations. If this is False, segment costs are calculated during routing.
Example: False
Default: True
exposure_parameters
Type: list of dictionaries
Required: optional
Explanation: defines the individual settings for each exposure data source. Fields: name, sensitivity, allow_missing_data. Must be a list of dictionaries, see the Routing YAML examples.
Example:
- name: gvi_lines
  sensitivity: 2.5
- name: aqi
  sensitivity: 2.5
  allow_missing_data: false
Default: allow_missing_data = True
Hint
name: must be the same as in the data sources.
sensitivity: the weight applied to the exposure factor derived from the exposure data. Formula: traversal time + (traversal time * sensitivity * exposure factor). All exposure factors are normalized to the range 0-1, and for good (positive) exposures they are made negative.
allow_missing_data: Experimental feature. If set to False, route finding will crash if any segment lacks an exposure value. Most likely this should not be used! Default is True.
Attention
Every exposure data source needs to be given a name and a sensitivity. If exposure results are wanted for a data source that should not affect the path optimization, set the sensitivity of that data source to 0.
For example, a user wants to find air quality optimized paths and also wants to know the amount of greenery along them, but wants to route based on air quality only. In that case the sensitivity of greenery (and of any other exposure data source) is set to 0.
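The cost formula from the hint above, traversal time + (traversal time * sensitivity * exposure factor), can be tried out with a short sketch. The function name is hypothetical, and summing the terms of several data sources on one segment is an assumption; the formula for a single data source and the sign change for good exposures come from the text above.

```python
def segment_cost(traversal_time, exposures):
    """Exposure-weighted cost of one segment.

    exposures: list of (exposure_factor, sensitivity, good_exposure) tuples,
    where exposure_factor is already normalized to the range 0-1.
    """
    cost = traversal_time
    for exposure_factor, sensitivity, good_exposure in exposures:
        # Good exposures (e.g. greenery) get a negative factor and lower the cost.
        signed_factor = -exposure_factor if good_exposure else exposure_factor
        # Assumption: terms of several data sources are simply added together.
        cost += traversal_time * sensitivity * signed_factor
    return cost


traversal_time = 60.0  # seconds

# Greenery: factor 0.5, sensitivity 1.5, good exposure -> cheaper segment.
print(segment_cost(traversal_time, [(0.5, 1.5, True)]))    # 15.0

# Air quality: factor 0.25, sensitivity 2.0, bad exposure -> more expensive segment.
print(segment_cost(traversal_time, [(0.25, 2.0, False)]))  # 90.0

# Sensitivity 0: the data source does not affect the routing cost.
print(segment_cost(traversal_time, [(0.5, 0, True)]))      # 60.0
```

The last call shows why a sensitivity of 0 keeps a data source out of the path optimization while its exposure can still be reported in the results.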
Routing YAML Group Examples
Example of the routing configuration with the mandatory fields, omitting the optional “key: value” fields.
Note that this example uses GPKG OD files.
routing:
  transport_mode: walking
  origins: /user_folder/some_origins_point(s).gpkg
  destinations: /user_folder/some_destination(s)_points.gpkg
  od_crs: 27700
  exposure_parameters:
    - name: gvi_lines
      sensitivity: 1.5
    - name: aqi
      sensitivity: 1.25
Hint
Using relatively small sensitivities (weights) produced the most optimal exposure routes, some of them even too optimal, neglecting travel time too much.
The “best” results were gained with sensitivities (weights) of 1.5, 2.5 and 5. Using too large sensitivities (weights), e.g. 10 or 20, decreased the cost of positive exposure so much that all segments became cheap. Read more in the documentation section (and the thesis).
Note that this example uses csv OD files, so od_lon_name and od_lat_name need to be defined.
routing:
  transport_mode: cycling
  travel_speed: 5
  precalculate: True
  od_lon_name: long
  od_lat_name: lat
  origins: /user_folder/some_origins_point(s).csv
  destinations: /user_folder/some_destination(s)_points.csv
  od_crs: 27700
  exposure_parameters:
    - name: gvi_lines
      sensitivity: 1.5
      allow_missing_data: False
    - name: aqi
      sensitivity: 2.5
Analysing Group
The analysing group configures the last module, which analyses the results.
Name in YML: analysing
keep_geometry
Type: boolean
Required: optional
Explanation: Defines whether geometries are included in the final results. If they are, the final output file will be .gpkg; if not, it will be .csv.
Default: False
Example: True
Warning
Including geometries in mass calculations takes more time, and the final file takes more space!
save_output_name
Type: string
Required: optional
Explanation: Custom name for the final output file.
Default: “output_results_[time_of_finish]”
Example: london_routes_greenery_lit
cumulative_ranges
Type: dictionary
Required: optional
Explanation: Custom ranges used to divide the results, saved to the final output as a new column/field. Needs the main dictionary (header) cumulative_ranges, with the data source names as keys and the ranges as a list of lists, see the Analysing YAML examples.
Example:
gvi_lines:
  - [0, 10]
  - [10.01, 20]
  - [20.01, 50]
aqi:
  - [0, 0.99]
  - [1, 1.99]
  - [2, 2.99]
  - [3, 3.99]
  - [4, 5]
Attention
Exposure data source names need to be exactly the same as defined earlier.
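How a list of ranges can divide results is sketched below. This is an illustration under strong assumptions: the function name is hypothetical, range bounds are treated as inclusive, and summing segment lengths per range is only one possible interpretation of the cumulative results; the actual output columns of Green Paths 2.0 may differ.

```python
def length_per_range(segments, ranges):
    """Sum segment lengths per exposure range.

    segments: list of (exposure_value, length_in_meters) tuples.
    ranges:   list of [low, high] pairs, assumed inclusive at both ends.
    """
    totals = {(low, high): 0.0 for low, high in ranges}
    for value, length in segments:
        for low, high in ranges:
            if low <= value <= high:
                totals[(low, high)] += length
                break  # a value is counted in at most one range
    return totals


gvi_ranges = [[0, 10], [10.01, 20], [20.01, 50]]
route_segments = [(5.0, 100.0), (15.0, 50.0), (10.005, 30.0), (42.0, 20.0)]

print(length_per_range(route_segments, gvi_ranges))
# {(0, 10): 100.0, (10.01, 20): 50.0, (20.01, 50): 20.0}
```

In this sketch the segment with value 10.005 falls into the gap between [0, 10] and [10.01, 20] and is not counted, so it is worth checking that the chosen ranges cover all values that can occur in the data.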
Analysing YAML Group Examples
Example of the analysing group configuration. This group has only optional parameters; the example sets all of them.
analysing:
  keep_geometry: True
  save_output_name: london_routes_greenery_fam
  cumulative_ranges:
    gvi_lines:
      - [0, 10]
      - [10.01, 20]
      - [20.01, 50]
    aqi:
      - [0, 0.99]
      - [1, 1.99]
      - [2, 2.99]
      - [3, 3.99]
      - [4, 5]
Complete User Configuration YAML example
Here is a full example of a filled user/config.yaml. This configuration uses the vector data set gvi_lines and the raster data set aqi. All data will be reprojected to the project_crs of 3879. The exposure raster from gvi_lines will be created with a 10 m cell resolution, and the aqi raster will be reprojected to match it with a 10 m resolution.
The route finding will use walking with a speed of 5 km/h. It will weight the greenery values of gvi_lines a little more than the aqi values. The segment weights will be precalculated, because there are thousands of OD points.
The geometries will not be kept for such a large mass calculation. The resulting exposures will be grouped into cumulative ranges.
Attention
Note the importance of correct indentation!
user/config.yaml
project:
  project_crs: 3879
osm_network:
  osm_pbf_file_path: /user_data_dir/osm/hki.osm.pbf
  original_crs: 4326
data_sources:
  - name: 'gvi_lines'
    filepath: /user_data_dir/data/gvi_lines.shp
    data_type: vector # optional
    data_buffer: 10 # optional
    save_raster_file: True # optional
    data_column: Comb_GVI
    no_data_value: 0
    min_data_value: 0.0
    max_data_value: 97.9
    good_exposure: True
    raster_cell_resolution: 10
    original_crs: 3879
  - name: "aqi"
    filepath: /user_data_dir/data/aqi.nc
    data_type: raster # optional
    original_crs: 4326
    data_column: AQI
    no_data_value: 1
    min_data_value: 1
    max_data_value: 5
    good_exposure: False
    raster_cell_resolution: 10 # optional
    save_raster_file: True # optional
    custom_processing_function: convert_aq_nc_to_tif_and_scale_offset # optional
routing:
  transport_mode: walking
  travel_speed: 5
  origins: /user_folder/thousands_origins_point(s).gpkg
  destinations: /user_folder/thousands_destination(s)_points.gpkg
  od_crs: 27700
  exposure_parameters:
    - name: gvi_lines
      sensitivity: 1.5
    - name: aqi
      sensitivity: 1.25
analysing:
  keep_geometry: False
  save_output_name: example_routes_greenery_airquality
  cumulative_ranges:
    gvi_lines:
      - [0, 10]
      - [10.01, 20]
      - [20.01, 50]
    aqi:
      - [0, 0.99]
      - [1, 1.99]
      - [2, 2.99]
      - [3, 3.99]
      - [4, 5]