Development of a Web GIS for small-scale detection and analysis of COVID-19 (SARS-CoV-2) cases based on volunteered geographic information for the city of Cologne, Germany, in July/August 2020

The development of applications for data acquisition/analysis with the help of open-source products and open data was possible without difficulty using the aforementioned frameworks, languages, and components. When implementing these applications, a large number of data protection and ethical considerations must be taken into account. IT security plays a significant role in this process. From the viewpoint of data protection, VGI generally already makes a contribution through its voluntary nature because users decide on their own responsibility whether to contribute data and which data to contribute. Infected persons can specify all the buildings they visited, but they do not have to. For example, participants may not want to specify their home or workplace. This may be a variation of the Not in my Backyard phenomenon, as described by Engler et al. [22]. Simultaneously, the data collection application offers the possibility of contributing data anonymously or, optionally, via pseudonyms. However, this data protection benefit comes at the cost of a loss of information. If data records are collected anonymously or via a pseudonym, the total number of individuals who contributed data can no longer be traced. It is, therefore, not possible to state how many people have updated data records. Furthermore, it is no longer possible to determine the validity of certain data. If a single, identical building is repeatedly selected anonymously, these incorrect contributions distort the overall result. Here, an additional control instance has to be considered; for example, each house perimeter can be selected only once per browser session. Alternatively, captchas can provide a remedy, but they degrade the user experience and may not be suitable in disaster situations. Many requirements of the GDPR were considered during development (data minimization, integrity, non-concatenation, transparency, availability, and confidentiality). 
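The per-session control mentioned above could look roughly like the following. This is a minimal sketch, not the implementation used in this work: the class name `SessionSelectionGuard` and the identifiers passed to it are hypothetical, and the state lives only in memory for the lifetime of one page session.

```typescript
// Hypothetical helper: remembers which house perimeters were already
// submitted during the current browser session and rejects repeats.
class SessionSelectionGuard {
  private readonly submittedIds: string[] = [];

  // Returns true and records the ID on the first selection in this
  // session; returns false if the same perimeter is selected again.
  trySelect(perimeterId: string): boolean {
    if (this.submittedIds.indexOf(perimeterId) !== -1) {
      return false;
    }
    this.submittedIds.push(perimeterId);
    return true;
  }
}

// Usage with made-up identifiers.
const guard = new SessionSelectionGuard();
const firstAttempt = guard.trySelect("perimeter-1");
const repeatedAttempt = guard.trySelect("perimeter-1");
const otherPerimeter = guard.trySelect("perimeter-2");
console.log(firstAttempt, repeatedAttempt, otherPerimeter);
```

A purely client-side check of this kind only limits accidental repeats. A user who opens a new session starts with an empty list, so it does not replace server-side plausibility checks.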
The GDPR requirement of intervenability, for example, a possibility to permanently delete all contributions or to correct information, has not been implemented in this work but should definitely be considered in a new or further development of the application. For this purpose, new functions must be added to the code. With the help of OpenLayers, however, it is possible to provide appropriate workflows and implement transactions for updating and deleting records. This can be done based on WFS services that retrieve data from the Postgres database tables. The current data collection application already provides this basis. However, the anonymization or pseudonymization procedure implemented in this study can be problematic in this context. Anonymous contributions cannot be assigned to any user, so subsequent updates of the data are impossible. The same applies when the pseudonym is changed. The development of a registration or login system can provide a remedy. However, this involves increased programming effort and further data protection issues, such as secure and encrypted storage of access data. Simultaneously, registration processes can reduce the willingness to contribute data because they create additional work and login data are stored permanently. A login system also conflicts with the principles of confidentiality, data minimization, and non-concatenation. In general, there are no explicit guidelines for the implementation of data protection and ethical or legal measures. Therefore, it may be an option to develop appropriate guidelines for the implementation of corresponding measures in the development of VGI applications. The publication of DSK [2] used in this work, which contains exemplary measures, can serve as a guide here. However, these measures are not specific to VGI platforms. 
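To illustrate what a delete transaction for intervenability could involve, the following sketch builds the XML body of a WFS 1.1.0 `Transaction` request with a `Delete` element. It rests on assumptions: the namespace prefix, namespace URI, layer name, and feature ID are hypothetical placeholders, and the function `buildWfsDelete` is not part of this work. OpenLayers' WFS format class offers a `writeTransaction` method for generating such bodies; the XML is assembled by hand here only so that the example runs without dependencies.

```typescript
interface WfsLayer {
  featurePrefix: string; // namespace prefix of the layer (assumed)
  featureNS: string;     // namespace URI of the layer (assumed)
  featureType: string;   // layer name without prefix (assumed)
}

// Escapes characters that must not appear raw in an XML attribute value.
function escapeXmlAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds a WFS 1.1.0 Transaction body that deletes the given features.
function buildWfsDelete(layer: WfsLayer, featureIds: string[]): string {
  if (featureIds.length === 0) {
    // An empty filter would not identify any record to delete.
    throw new Error("at least one feature ID is required");
  }
  const idFilters = featureIds
    .map((fid) => `<ogc:FeatureId fid="${escapeXmlAttribute(fid)}"/>`)
    .join("");
  const typeName = `${layer.featurePrefix}:${layer.featureType}`;
  return (
    '<wfs:Transaction service="WFS" version="1.1.0" ' +
    'xmlns:wfs="http://www.opengis.net/wfs" ' +
    'xmlns:ogc="http://www.opengis.net/ogc" ' +
    `xmlns:${layer.featurePrefix}="${escapeXmlAttribute(layer.featureNS)}">` +
    `<wfs:Delete typeName="${escapeXmlAttribute(typeName)}">` +
    `<ogc:Filter>${idFilters}</ogc:Filter>` +
    "</wfs:Delete></wfs:Transaction>"
  );
}

// Usage with placeholder values.
const exampleLayer: WfsLayer = {
  featurePrefix: "vgi",
  featureNS: "http://example.org/vgi",
  featureType: "house_perimeters",
};
const deleteRequest = buildWfsDelete(exampleLayer, ["house_perimeters.42"]);
console.log(deleteRequest);
```

The resulting string would be sent as the body of an HTTP POST to the WFS endpoint. Deciding which records a given user is allowed to delete remains the open question discussed above.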
Another point is that many house perimeters of infected persons may not be recorded because the persons concerned regard them as private spatial units. It could be helpful to hand such an application over directly to an institution. In addition, access should be enabled only for those affected or infected in the event of illness. The inclusion of a state institution could increase acceptance and users’ willingness to contribute relevant data. This is supported by Flanagin and Metzger’s [26] comments that traditional sources of geographic information such as government institutions, cartographers, or other public institutions are usually considered credible because they are recognized authorities. Chen et al. [15] also note that a framework provided by the government can contribute to quality assurance, for example, by providing an application or platform to collect the information or even official geodata as a basis. In the event of a disaster, the public can use this framework to quickly collect and analyze data [15]. The basic dataset of house perimeters chosen in this work provides high coverage for places where people have stayed inside a building. However, it does not adequately represent spaces that cannot be mapped in the form of a house perimeter, such as streets, train stations, parks, public squares, and other outdoor locations. The dataset of the district government of Cologne has a high spatial coverage for buildings in the city area but does not guarantee the inclusion of every house perimeter in Cologne. For example, in the city center near Barbarossaplatz, there is a large gap in the dataset (Figs. 10 and 11), as revealed by mapping. Furthermore, certain buildings that could be grouped together as a complex, such as Cologne’s main train station (Fig. 12), are shown as individual polygons. 
Along with this, it can be helpful to extend the application so that several house perimeters can be selected and processed simultaneously. Fewer transactions between the client and server are then necessary because the update is performed with one request. This also makes the input easier for the user. At the same time, no function has yet been implemented to allow post-processing and correction of incorrect records. The house perimeters can be assigned to a building function with the help of the ATKIS object-type catalog. Twenty-six (68.42%) of the 38 house perimeters fall into the residential category. The dominance of residential units can be explained by the fact that, since the onset of the pandemic and the measures prescribed by the federal government, some activities have been limited to the home. Many workers do not necessarily have to go to an office building but can work from home. In addition, nonessential travel can be avoided. Hotels have been closed at times, so they have also been selected infrequently. More than 50% of all house perimeters in the initial data set are assigned to the residential class or the residential class with trade and services, so there is generally a dominance of these categories. However, the classification also reveals weaknesses in the data collection. No attention has been paid to the verticality of buildings. It is not apparent whether a contribution to a residential building with trade and services relates to the residential or the service part. Consequently, this information is lost. Only 12 (31.58%) of the selected house perimeters were purely residential buildings, while 14 (36.84%) also contained trade or service facilities. For the PLZ-8 data, only the corresponding geometries are available. For further research, the associated socioeconomic attributes are particularly important, also in connection with the building classification. 
From this aspect, spatial patterns could possibly be derived in the context of VGI-based data collection. To increase transparency, an application for automated data evaluation was developed in this study. It provides data via REST interfaces, and these data are immediately visualized. This corresponds to the requirement of Sula [54] that the results should be provided via public channels. For higher transparency, the implementation of a larger number of endpoints can be considered. In particular, more geospatial data can be provided via the API. In addition to the endpoint that provides district-level case counts in the GeoJSON format in this work, interfaces could be implemented in the same way to output district-level or ZIP code-level case counts. With GeoJSON output, authorized administrative bodies could visualize the data independently in a GIS for their own purposes. In general, the output of a wide variety of information via an API is conceivable. Any query that can be made via SQL in Postgres can also be implemented as a REST interface. However, as already mentioned, data protection aspects must be considered. For this, the consent of the users would have to be obtained in advance at the time of collection of data from third parties. This was not the case in this study. Furthermore, no personal data are provided in the API, and it is not possible to draw conclusions about individuals’ data. For example, an interface that outputs all visited house perimeters is unsuitable. It should also be noted that an API and its application must always be comprehensively documented. The developed API offers various interfaces that provide automated results for spatial distribution, among other things. However, in its current alpha version, it is not usable for outsiders. As part of the documentation, all the interfaces must be explained in detail. 
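As an illustration of how aggregated case counts might be shaped into GeoJSON before an endpoint returns them, consider the sketch below. All names (`DistrictRow`, `toFeatureCollection`, the property names) and the unit-square geometries are placeholders rather than the schema or data of this work. It assumes that the database query already delivers one row per district with the geometry in GeoJSON form, for example via the PostGIS function `ST_AsGeoJSON`.

```typescript
type Position = [number, number];

interface PolygonGeometry {
  type: "Polygon";
  coordinates: Position[][];
}

// One aggregated result row per district (assumed query output).
interface DistrictRow {
  district: string;
  cases: number;
  geometry: PolygonGeometry;
}

interface DistrictFeature {
  type: "Feature";
  geometry: PolygonGeometry;
  properties: { district: string; cases: number };
}

interface DistrictFeatureCollection {
  type: "FeatureCollection";
  features: DistrictFeature[];
}

// Copies only the aggregate attributes into the output, so that no other
// column of the row can reach the API response by accident.
function toFeatureCollection(rows: DistrictRow[]): DistrictFeatureCollection {
  return {
    type: "FeatureCollection",
    features: rows.map((row): DistrictFeature => ({
      type: "Feature",
      geometry: row.geometry,
      properties: { district: row.district, cases: row.cases },
    })),
  };
}

// Placeholder rows with invented names, counts, and geometries.
const exampleRows: DistrictRow[] = [
  {
    district: "District A",
    cases: 3,
    geometry: { type: "Polygon", coordinates: [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]] },
  },
  {
    district: "District B",
    cases: 0,
    geometry: { type: "Polygon", coordinates: [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]] },
  },
];

const districtCollection = toFeatureCollection(exampleRows);
console.log(JSON.stringify(districtCollection));
```

Because the properties are listed explicitly, adding a column to the underlying table does not change the response until the mapping is extended deliberately.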
For example, users have to be able to see which endpoints output which data with which attributes, which requests to the API are generally possible (GET, POST, etc.), or what must be considered for potential transactions, for example, data type and field length in the database. Therefore, the source code must be extended. Especially in implementations of interfaces for update or delete operations, input parameters have to be checked for validity on both the client and the server side so that manipulation by SQL injection is not possible. The modules used in the development provide functions that perform such checks and can be extended if required. In this work, an application for automated data evaluation was developed, especially for transparency purposes. Simultaneously, various authors call for results to be made available via open channels. However, the app for data collection does not refer to the open data evaluation. On the one hand, this is due to IT security reasons, and, on the other hand, the automatic evaluation is only to be tested in this work for the time being. It is possible that this approach deterred some visitors to the site from participating who would have been willing to contribute data if they had also been given direct access to results, for example, in the form of the developed API with Node.js. In the context of the application for data evaluation, manual corrections may also be necessary, or the SQL query for determining the spatial position of a house perimeter may have to be revised or supplemented. Figure 13 shows a house perimeter that is not clearly located within an area of the Urban Atlas. A pure query using the Postgres function ST_WITHIN is not sufficient here. The house perimeter was not assigned to the polygon of the Urban Atlas. 
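The server-side validity checks for update operations discussed in this paragraph can be sketched as follows. This is an illustration under assumptions, not the code of this work: the table `house_perimeters`, its columns, the length limit, and the function names are hypothetical. The returned `{ text, values }` object has the shape that the node-postgres driver accepts for parameterized queries; whether that driver is used here is not stated in the source.

```typescript
interface VisitUpdate {
  perimeterId: number;
  pseudonym: string;
}

interface ParameterizedQuery {
  text: string;
  values: Array<string | number>;
}

// Assumed field length of the pseudonym column in the database.
const MAX_PSEUDONYM_LENGTH = 64;

// Checks type and range of untrusted input before it reaches the database.
function validateVisitUpdate(input: { perimeterId: unknown; pseudonym: unknown }): VisitUpdate {
  const { perimeterId, pseudonym } = input;
  if (
    typeof perimeterId !== "number" ||
    !isFinite(perimeterId) ||
    Math.floor(perimeterId) !== perimeterId ||
    perimeterId <= 0
  ) {
    throw new Error("perimeterId must be a positive integer");
  }
  if (
    typeof pseudonym !== "string" ||
    pseudonym.length === 0 ||
    pseudonym.length > MAX_PSEUDONYM_LENGTH
  ) {
    throw new Error("pseudonym must be a non-empty string within the field length");
  }
  return { perimeterId, pseudonym };
}

// The SQL text is a constant; user input travels only in the values array,
// so it is never concatenated into the statement.
function buildUpdateQuery(update: VisitUpdate): ParameterizedQuery {
  return {
    text: "UPDATE house_perimeters SET pseudonym = $1 WHERE id = $2",
    values: [update.pseudonym, update.perimeterId],
  };
}

// A hostile pseudonym stays an ordinary string value.
const hostileInput = { perimeterId: 7, pseudonym: "x'; DROP TABLE house_perimeters; --" };
const safeQuery = buildUpdateQuery(validateVisitUpdate(hostileInput));
console.log(safeQuery.text, safeQuery.values);
```

The same checks can be repeated in the browser for faster user feedback, but only the server-side check protects the database because client code can be bypassed.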
Alternatively, instead of using further Postgres functions like ST_INTERSECTS, each building can also be represented with the help of a point, for example, the center of the house perimeter polygon, so that a pure mapping via ST_WITHIN would be possible here.
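The difference between testing the whole polygon and testing a representative point can be illustrated with a small, dependency-free sketch. It does not reproduce the SQL of this work: the coordinates are invented, the functions are written for simple polygons without holes, and the whole-polygon check tests vertices only, which is sufficient only for convex zones. In the database, the corresponding point could be computed with PostGIS functions such as `ST_Centroid` or `ST_PointOnSurface`.

```typescript
type Point = [number, number];
type Ring = Point[]; // vertices of a simple polygon without holes

// Area-weighted centroid of a simple polygon (shoelace formula).
function centroid(ring: Ring): Point {
  let twiceArea = 0;
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < ring.length; i++) {
    const [x0, y0] = ring[i];
    const [x1, y1] = ring[(i + 1) % ring.length];
    const cross = x0 * y1 - x1 * y0;
    twiceArea += cross;
    sumX += (x0 + x1) * cross;
    sumY += (y0 + y1) * cross;
  }
  if (twiceArea === 0) {
    throw new Error("degenerate polygon");
  }
  return [sumX / (3 * twiceArea), sumY / (3 * twiceArea)];
}

// Ray-casting test: is the point strictly inside the ring?
function pointInRing(point: Point, ring: Ring): boolean {
  const [px, py] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      (yi > py) !== (yj > py) &&
      px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) {
      inside = !inside;
    }
  }
  return inside;
}

// Simplified stand-in for a whole-polygon "within" test (vertices only).
function ringWithin(inner: Ring, outer: Ring): boolean {
  return inner.every((vertex) => pointInRing(vertex, outer));
}

// Two adjacent zones and a building that straddles their shared boundary.
const zoneA: Ring = [[0, 0], [10, 0], [10, 10], [0, 10]];
const zoneB: Ring = [[10, 0], [20, 0], [20, 10], [10, 10]];
const building: Ring = [[8, 4], [11, 4], [11, 6], [8, 6]];

const buildingCenter = centroid(building);
console.log(ringWithin(building, zoneA), ringWithin(building, zoneB));
console.log(pointInRing(buildingCenter, zoneA), pointInRing(buildingCenter, zoneB));
```

The whole building lies within neither zone, whereas its center falls into exactly one of them. For concave buildings the centroid can lie outside the polygon, so a point guaranteed to lie on the surface may be the safer choice.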

Fig. 10 Data gap in the initial data set. Blue: initial data, red: missing data. (Screenshot from 14.09.2020)

Fig. 11 Data gap in the initial data set II. (Own photo from 20.09.2020)

Fig. 12 Selected house perimeter of the Cologne Hbf on an iPhone SE. (Screenshot from 05.07.2020)

Fig. 13 Difficulty in determining the position of a house perimeter due to overlap with a polygon of the Urban Atlas. (Screenshot from 07.09.2020)

For this, however, further tables must be created in the database or the point representations of a house perimeter must be calculated within the SQL query. In this work, because the SQL query was insufficient, manual corrections were necessary to determine the correct case numbers. Thus, although the application for automated analysis is functional and provides a visualization of the results, the SQL queries of the endpoints need to be revised so that the data are output correctly. This is possible, for example, using the previously described methodology. Another possible extension that enhances the application is the implementation of notifications when two users have selected the same house perimeter. In this way, it is directly visible where contact might have existed. At the same time, however, data protection aspects have to be considered so that it is not possible for others to know who the contact person is. During data collection, users updated several house perimeters twice or thrice within one browser session. Therefore, it may be necessary to improve the user feedback or revise the instructions for recording the data and add the attribute “Number of visits”. This work does not employ the Bluetooth-based approach described in Chapter 2. The full use of Bluetooth functionalities, as in the Corona-Warn-App, has so far been reserved for native apps. Developers are currently working on implementing a Bluetooth API for browsers, but these features are still experimental [5]. They could be relevant in the future so that users with operating systems other than iOS or Android are not prevented from using such an app. This would make it possible for at-risk patients whose devices run Windows or Huawei operating systems to use the app. The cost factor is of little importance here because all common browsers use JS to manipulate content, so there is no duplication of development work. 
Furthermore, the extent to which the tracking of the web server (in this work, Nginx) is considered in the data evaluation, or to what extent this is restricted or regulated in advance, must be taken into account. In the context of data protection, logging of the web server is essential to ensure the security of the application, to identify requests to the server that are intended to tap information or access data, and to block the corresponding IP addresses. In contrast, the standard logging of the web server also allows conclusions to be drawn about an individual IP address. Access times or frequencies of page calls and requests to the API to generate new tokens or the origin of the application call can be traced. For example, in this work, the logs show that users accessed the URL from WhatsApp, Instagram, or Facebook, which are popular social networks in Germany. The applications developed can only be used to a limited extent for contact tracing. Although it is possible to see whether different pseudonyms have updated the same data set and thus whether there was potential contact, it is not possible to clearly determine whether the same person may have updated the same house perimeter under several pseudonyms, thereby falsifying the data collection. Furthermore, time was not considered. It is possible that there is a period of several weeks or months between visits to a building by two people, so that the contact cannot be classified as relevant. The time factor, therefore, must be considered in the further development of the application, for example, by specifying the time when the building was visited during processing. Here again, data protection aspects must be considered. Detailed information about a certain point in time can narrow down the group of people, so that it may be possible to draw conclusions about individuals. 
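How a time factor could enter the search for potential contacts is sketched below. The record layout (`Visit`), the function `findPotentialContacts`, the pseudonyms, and the dates are invented for illustration; no visit dates were collected in this work. Dates are kept at day granularity on purpose, in line with the concern that precise timestamps narrow down the group of people.

```typescript
interface Visit {
  perimeterId: string;
  pseudonym: string;
  visitedOn: string; // ISO date, e.g. "2020-07-10" (day granularity only)
}

interface PotentialContact {
  perimeterId: string;
  pseudonyms: [string, string];
  daysApart: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Reports pairs of different pseudonyms that selected the same house
// perimeter with at most maxDaysApart days between the two visits.
function findPotentialContacts(visits: Visit[], maxDaysApart: number): PotentialContact[] {
  const contacts: PotentialContact[] = [];
  for (let i = 0; i < visits.length; i++) {
    for (let j = i + 1; j < visits.length; j++) {
      const a = visits[i];
      const b = visits[j];
      if (a.perimeterId !== b.perimeterId || a.pseudonym === b.pseudonym) {
        continue;
      }
      const diffMs = Math.abs(Date.parse(a.visitedOn) - Date.parse(b.visitedOn));
      const daysApart = Math.round(diffMs / MS_PER_DAY);
      if (daysApart <= maxDaysApart) {
        contacts.push({
          perimeterId: a.perimeterId,
          pseudonyms: [a.pseudonym, b.pseudonym],
          daysApart,
        });
      }
    }
  }
  return contacts;
}

// Invented example records.
const exampleVisits: Visit[] = [
  { perimeterId: "perimeter-1", pseudonym: "alpha", visitedOn: "2020-07-10" },
  { perimeterId: "perimeter-1", pseudonym: "beta", visitedOn: "2020-07-11" },
  { perimeterId: "perimeter-1", pseudonym: "gamma", visitedOn: "2020-08-20" },
  { perimeterId: "perimeter-2", pseudonym: "alpha", visitedOn: "2020-07-10" },
];

const potentialContacts = findPotentialContacts(exampleVisits, 2);
console.log(potentialContacts);
```

With a two-day window, only the visits of "alpha" and "beta" to the same perimeter are reported; the visit several weeks later is not. The sketch cannot detect one person acting under several pseudonyms, which is the limitation described above.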
If the time factor is considered, double or multiple entries of buildings can also be meaningful and plausible, for example, if a building is visited repeatedly on a daily basis. Furthermore, it is difficult to make a statement regarding the validity of the collected data, which is a clear weakness of VGI in the context of health-related, personal data. The dataset can be cleaned of duplicate or multiple selections because of the verification mechanisms used, but it is not possible to verify whether a participant was actually infected or visited a selected building. In other VGI projects, such as OpenStreetMap (OSM), contributions are always verifiable for third parties. For example, if paths or buildings are digitized, others can see whether they actually exist. However, cases of disease are not physically represented in the landscape; thus, no visual inspection can be performed. This goes hand-in-hand with ethical problems. If measures are derived from such an application, but a large part of the input data is based on false assertions, these measures are not target-oriented and the credibility of the application (i.e., data integrity) is lacking. No valid statement can be made regarding the actual distribution of cases of the disease in urban areas. In general, consideration should be given to supplementing the methods used with additional methods, especially qualitative methods. An example of this could be interviews with test persons who use the tool and voluntarily agree to answer questions about the developed applications. In particular, helpful user experiences can be collected to identify weaknesses.


INTERNATIONAL JOURNAL OF HEALTH GEOGRAPHICS
