Analyzing third-party data leaks on online pharmacy websites

4.1 Manual network traffic analysis

The third-party services found on the 48 online pharmacy websites in our manual network analysis are shown in Table 2. Websites of 4 pharmacies did not have third-party services. These pharmacies have been omitted from the table. Pharmacy 26 had 6 unique third-party services, while several other pharmacy websites included 3 services. These numbers can be considered large as we only followed a test sequence which consisted of a few different pages processing sensitive data. Google was the most frequently used third-party service (35 occurrences), followed by Pingdom (20), a Swedish website monitoring tool, and Facebook (13). A noteworthy detail is that we have excluded Giosg, which appeared 9 times, from this table. This is not an analytics service like other third parties but a live chat service based in Finland, enabling the customer to chat with a pharmacist, which is a service pharmacies are legally obligated to provide in Finland.

Table 2 Pharmacies and detected third-party analytics services

The personal data shared with third-party services includes various pieces of identifying technical data. One of the most important of these is the IP address of the device, which accompanies each network packet and is a crucial data point when trying to identify an individual. Along with IP addresses, unique identifiers associated with devices and users, often stored in cookies on the user’s computer, can be used to single out specific users. The User-Agent header provides details about the operating system and browser in every HTTP request. There are also several other data items, such as screen resolution, window size, language and country, that can be combined to identify a specific user.
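To illustrate how individually unremarkable data items can combine into a distinguishing identifier, the following minimal Python sketch hashes a set of attributes of the kind listed above. The attribute names and values are hypothetical, and actual third-party services may combine such data in entirely different ways.

```python
import hashlib


def fingerprint(attributes: dict[str, str]) -> str:
    """Combine technical data items into one stable pseudo-identifier."""
    # Sort the keys so that the same attributes always yield the same digest.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical values of the kind that accompany a typical HTTP request.
visitor_a = {
    "ip": "192.0.2.17",
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "screen": "1920x1080",
    "window": "1280x720",
    "language": "fi-FI",
}
# Same browser and language, but a different IP address and window size.
visitor_b = dict(visitor_a, ip="192.0.2.99", window="1536x864")

print(fingerprint(visitor_a))
print(fingerprint(visitor_b))
```

The two visitors receive different identifiers even though most of their attributes coincide, while repeated requests from the same visitor map to the same identifier.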

Recital 26 of the preamble to the GDPR stipulates that when assessing whether an individual can be identified, it is essential to consider all means reasonably likely to be used for the direct or indirect identification of a person (see also [44, para. 302–307, 45, p. 11–14]). This encompasses all objective factors, such as the costs and time required for identification, along with the available technology and technological developments. In accordance with the ruling of the CJEU in the Breyer case, IP addresses can be considered to constitute personal data, even if one must obtain additional information from a third party to identify a specific individual [41, para. 49]. Even though it is not required that all the information enabling the identification of a person be in the hands of a single entity [41, para. 42–43, 46, para. 45–46, 47, para. 90], we argue that Big Tech companies, which have access to extensive amounts of data and operate in the technology field, are likely to have the means to identify a person effectively.

In its recent case Meta Platforms and Others, the CJEU has also assessed the combining and use of personal data, including sensitive personal data, for behavioral advertising purposes within the Meta group [48]. More generally, the case illustrates that Big Tech companies, such as Meta, receive and link data from various sources [48, para. 26–27]. Furthermore, the court maintained that where a social network user visits websites or apps to which sensitive data relate and enters information into them when registering or when placing online orders, the collection of data from the visits and of the information entered by the user, the linking of all those data with the user’s social network account and the use of those data by the operator must be regarded as processing of sensitive data, where that data processing allows sensitive data to be revealed. Data collection may happen by means of integrated interfaces, cookies or similar storage technologies.

Recital 26 of the GDPR states that data protection provisions do not apply to data that has been anonymized. Google Analytics has an IP anonymization feature (IP masking) that partially or fully omits the collected IP address. The effectiveness of this anonymization method can be questioned, however. Although the IP address is anonymized, several other technical data items are delivered along with it, which still makes it possible for large analytics companies such as Google to identify the user. Therefore, it is questionable whether the data can be considered anonymous "in such a manner that the data subject is not or no longer identifiable", as Recital 26 of the preamble to the GDPR puts it. Also, as the anonymization process is carried out on Google’s own servers, Google in practice also processes the IP address in its identifiable form.
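The limited effect of partial IP masking can be illustrated with a short Python sketch. It assumes a masking scheme that zeroes the low-order bits of the address (the last octet for IPv4 and everything beyond the first 48 bits for IPv6). This scheme is an assumption made for illustration and is not meant as a description of Google's actual implementation.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Zero the low-order bits of an IP address (assumed masking scheme)."""
    ip = ipaddress.ip_address(address)
    # Assumption: keep the first 24 bits of IPv4 and the first 48 bits of IPv6.
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


# Documentation-range example addresses, not real users.
print(mask_ip("203.0.113.87"))
print(mask_ip("2001:db8:1234:5678::1"))
```

Under this assumption, a masked IPv4 address still narrows the sender down to a block of 256 addresses, and any identifiers or browser details sent in the same request are unaffected by the masking.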

Recital 30 of the preamble to the GDPR maintains that persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as IP addresses, cookie identifiers or other identifiers, leaving traces which can be used to identify them. While the IP address is a very important piece of data when third parties seek to identify a specific user [49], it is also worth noting that by using cookies, large analytics providers such as Google and Facebook/Meta can link the actions of users on different websites to their Google or Facebook accounts. Very often, this also means that the real name of the user can easily be linked to an action such as purchasing prescription medicines. For example, Google Analytics uses a cookie that contains a unique identifier, client ID (cid), for each individual browser-device pair. The main cookie of Google Analytics (called _ga) lasts for 2 years and enables Google to distinguish users from one another (Footnote 3).
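As a rough illustration of how a cookie-based identifier distinguishes browsers, the Python sketch below extracts a client ID from a _ga cookie value. It assumes the commonly observed four-field layout in which the last two dot-separated fields form the client ID. The cookie values are hypothetical, and the layout is an assumption rather than something confirmed by this study.

```python
def client_id_from_ga_cookie(cookie_value: str) -> str:
    """Extract the client ID from a _ga cookie value (assumed format)."""
    parts = cookie_value.split(".")
    if len(parts) < 4:
        raise ValueError(f"unexpected _ga cookie format: {cookie_value!r}")
    # Assumption: the last two dot-separated fields form the client ID.
    return ".".join(parts[-2:])


# Hypothetical cookie values observed on two different page loads.
first_visit = client_id_from_ga_cookie("GA1.2.1234567890.1650000000")
later_visit = client_id_from_ga_cookie("GA1.2.1234567890.1650000000")

# Identical client IDs let the receiver attribute both page loads to one browser.
print(first_visit, first_visit == later_visit)
```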

Moreover, Google Signals, which was released in 2018, enables cross-device tracking of users in Google Analytics (Footnote 4). Google Signals implements this by leveraging its own data from users logged into Google accounts. Therefore, an individual using their Google account on a computer, tablet, or phone can be easily identified as the same user across different browsers and devices. In this case, too, the actions of the user can be linked to their real name. This is of course even more serious than leaking just an IP address combined with sensitive information. Meta/Facebook has also introduced similar cross-device tracking based on Facebook accounts. Due to the factors discussed above, we argue that when it comes to the selected online pharmacies, there is a substantial risk that the leaked sensitive data can be linked to the individual using the pharmacy website, especially when it comes to technology giants.

While identifying the user is crucial, the most sensitive piece of information potentially leaked to third parties in online pharmacies concerns the URL addresses of the pages visited by the customer (who is identified by a device identifier or IP address, for instance). There are usually two pages of interest: the product page of a particular prescription medicine, and the page for ordering prescription medicines. A visit to the product page indicates interest in the medicine in question. Likewise, visiting the order page implies that the customer intends to order medicines. These details are definitely personal data items that should never leak to third parties. When these two page visits (viewing a product and intending to place an order) can be connected to each other by a third party, meaning that the customer is intending to order a specific prescription medicine, a compelling argument can be made that sensitive data concerning health is leaking. Figure 1 shows an example screenshot of a medicine name leaking in a URL address.

Fig. 1

A screenshot showing a medicine name (Risperdal) being leaked to Google as a part of a URL address. The website name has been omitted from the figure
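A leak of this kind can be detected mechanically from captured traffic. The Python sketch below parses the query string of an outgoing request and checks whether any known medicine name appears in its parameter values. The endpoint, the pharmacy URL and the parameter names (cid for a client identifier, dl for the page location) are assumptions chosen for illustration and do not reproduce the requests recorded in the study.

```python
from urllib.parse import parse_qs, urlsplit


def leaked_terms(request_url: str, terms: list[str]) -> list[str]:
    """Return the terms that occur in the query parameters of a request."""
    # parse_qs percent-decodes the values, so encoded page URLs become readable.
    query = parse_qs(urlsplit(request_url).query)
    values = " ".join(v for vals in query.values() for v in vals).lower()
    return [term for term in terms if term.lower() in values]


# Hypothetical request from a pharmacy page to a third-party endpoint.
request = (
    "https://analytics.example/collect?v=1"
    "&cid=1234567890.1650000000"
    "&dl=https%3A%2F%2Fpharmacy.example%2Fproducts%2Frisperdal-2-mg"
)

print(leaked_terms(request, ["Risperdal", "Ibuprofen"]))
```

Matching against a list of medicine names is only one possible check. As discussed below, a unique product page URL can be revealing even without a medicine name in it.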

We found that there are two main ways for a third party service to deduce the connection between a specific medicine and a purchase. First, the connection is quite obvious when the page visited previously (referer) is transmitted to the third party when the customer arrives at the medicine order page. In this scenario, the third party can promptly discern that the order originates from a product page for a specific medicine, which signifies a clear intention to purchase that specific prescription medicine. In most cases, the medicine name was even a part of the product page’s URL address (although a working and unique product page URL without a medicine name is enough, as the third party can simply check the contents of the page and learn about the medicine). The second type of connection between a medicine and order is created when a third party does not get the information about the previous page directly, but the third party is present on both the product page and the order page. By analyzing the sequence of timestamps associated with page visits, the third party can infer that the two pages have been visited in succession. This leads to the conclusion that with a high probability, an order has been placed for the medicine that was viewed previously. Although this latter case requires more analysis, in both of these cases the third party can be quite confident that the customer has a definite intent to purchase a specific prescription medicine.
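The two inference paths described above can be sketched in Python as follows. The record structure, the URLs and the ten-minute time window are all hypothetical choices made for illustration. The sketch shows what a third party could deduce from the data it receives and makes no claim about what any particular service actually does.

```python
from dataclasses import dataclass
from typing import Optional

PRODUCT_PREFIX = "https://pharmacy.example/products/"  # hypothetical
ORDER_URL = "https://pharmacy.example/order"  # hypothetical


@dataclass
class PageView:
    client_id: str
    url: str
    referrer: Optional[str]
    timestamp: float  # seconds


def infer_ordered_products(views: list[PageView], max_gap: float = 600.0):
    """Link order page visits to product pages via referrer or timing."""
    ordered = sorted(views, key=lambda v: v.timestamp)
    results = []
    for i, view in enumerate(ordered):
        if view.url != ORDER_URL:
            continue
        # Case 1: the previous page is transmitted directly as the referrer.
        if view.referrer and view.referrer.startswith(PRODUCT_PREFIX):
            results.append((view.client_id, view.referrer, "referrer"))
            continue
        # Case 2: find the latest product page seen from the same client
        # shortly before the order page.
        for earlier in reversed(ordered[:i]):
            if view.timestamp - earlier.timestamp > max_gap:
                break
            if earlier.client_id == view.client_id and earlier.url.startswith(PRODUCT_PREFIX):
                results.append((view.client_id, earlier.url, "timing"))
                break
    return results


views = [
    PageView("A", PRODUCT_PREFIX + "risperdal", None, 0.0),
    PageView("A", ORDER_URL, PRODUCT_PREFIX + "risperdal", 30.0),
    PageView("B", PRODUCT_PREFIX + "medicine-x", None, 100.0),
    PageView("B", ORDER_URL, None, 140.0),
]
for client, product, method in infer_ordered_products(views):
    print(client, product, method)
```

Client A is linked through the referrer, whereas client B is linked only because the same identifier appears on both pages within a short time span.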

Table 3 shows what kinds of sensitive data are delivered to third parties as the user proceeds to order a prescription medicine on the studied pharmacy websites. Out of the 48 websites, the table presents those that transferred any kind of sensitive data to third-party analytics (41 websites in total). The columns in this table correspond to the data leak types L2–L4 presented in Section 3.2. The first column of the table displays whether the URL address of the medicine product page the user is viewing is sent to third parties. The second column indicates whether the intent to make a medicine order was leaked (note that this does not mean that a specific medicine name leaked). Finally, the third column indicates whether the intention to make an order can be connected to a specific prescription medicine.

Table 3 The data items related to viewing and ordering prescription medicines delivered to third parties

We can see that the URL of a medicine product page was leaked in 18 cases out of 48 (37.5%). In 41 cases (85.4%), the intention to make an order was leaked to at least one third party; 23 of these data leaks did not include information on the specific medicine that was ordered. The specific prescription medicine name could be connected to an order made by a particular user in 18 cases (37.5%). These findings give reason for great concern and also warrant further study with a larger set of pharmacies (carried out in Section 4.2).

The different platforms used to build the studied pharmacy websites are indicated in different colors in Table 3. The table shows that when pharmacies from cluster C1 (shown in red) contain third-party services, they leak several types of sensitive data. As P1 is a generic e-commerce platform, it is not designed for handling sensitive data. When building webstores with this platform, integrating Google Analytics into the website is effortless. However, it is worth noting that some pharmacies in C1 (e.g. Pharmacies 6 and 10) have chosen not to use any analytics. Cluster C3 (shown in yellow) exhibits a pattern similar to many pharmacies in C1, where all studied data items are usually leaked to third-party analytics services. This is somewhat surprising, as P3 is a platform specifically designed for online pharmacies. In its advertising, P3 highlights both the use of analytics services and the importance of security. Unfortunately, these goals appear to be in conflict in practice. C2 (shown in green), also an online pharmacy platform, fares better. P2-based pharmacies do not have prescription medicine product pages, so their address cannot be leaked. This design choice also protects data on a specific prescription medicine from leaking in the order phase. Nevertheless, third-party services are used on every single website built on this platform. Finally, clusters C4 and C5 (in blue and white) are each represented by only one pharmacy that has analytics. These pharmacies (like the ones in C2) send data on intent to order to third parties, without leaking a medicine name.

A notable finding in our study is also that pharmacy customers’ sensitive health-related information was likely transferred outside the EU or EEA, at least in some cases. For example, the name of a prescription medicine was leaked to Bing/Microsoft (3 cases) and Yandex (1 case), which were found to be likely located in North America and the Russian Federation, respectively (Footnote 5). However, it is also worth noting that third-party service providers can access data located in Europe from outside the EU or EEA, which is also regarded as a data transfer. Our analysis only covers the client-side functionality of websites, so such access is outside its scope. Personal data transfers out of the EEA are subject to special safeguards under the GDPR, as they may involve considerable risks to the data subjects, e.g. due to access requests by public authorities (see e.g. [50]). This has led data protection supervisory authorities to reject, for example, the use of Google Analytics in many cases (see e.g. [51,52,53]) and to impose a record fine of 1.2 billion euros on Meta Platforms Ireland Ltd [54]. Transferring sensitive health-related data out of the EEA definitely raises concerns about whether the data has been granted an equivalent level of protection, as required by the GDPR.

4.2 Automatic analysis with a larger dataset

To get a better understanding of the complete set of Finnish online pharmacies, we ran an automatic analysis including all 163 pharmacies. In this more extensive analysis, 145 pharmacies had third-party services on their web pages and only 18 did not. Out of all 163 online pharmacies, 57 (35.0%) leaked the medicine name either on the product page or during ordering. This number is remarkably high: over one third of online pharmacies leaked the customer’s sensitive health data to third parties!

Figure 2 shows the third-party services found in the Finnish online pharmacies. While Google’s top place in the diagram is not surprising, 134 pharmacy websites is a very high number. Google’s services are present on 81.6% of all studied online pharmacies. Pingdom, which seems to be an analytics service of choice on Platform 2, has 20 occurrences, while Facebook/Meta has 12 and Microsoft has 4. Finally, while Matomo/Piwik is an analytics service which is often used locally and retains the pharmacy’s control of data (without surrendering it to a third party), in our dataset this was not always the case. We have counted the instances of Matomo/Piwik analytics service in the cases where it was not used locally, but instead, data was sent to an external domain not owned by the pharmacy.
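The distinction between a locally hosted analytics service and one that sends data to an external domain can be sketched in Python as below. The comparison of the last two host labels is a deliberate simplification, since a careful analysis would rely on the Public Suffix List. The domain names and the owned_domains parameter are hypothetical and do not describe the tooling used in the study.

```python
from urllib.parse import urlsplit


def registrable_domain(host: str) -> str:
    """Approximate the registrable domain by its last two labels."""
    # Naive assumption: works for simple suffixes such as .fi or .com only.
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_url: str, request_url: str,
                   owned_domains: frozenset = frozenset()) -> bool:
    """Tell whether a request leaves the domains controlled by the site."""
    page_domain = registrable_domain(urlsplit(page_url).hostname or "")
    request_domain = registrable_domain(urlsplit(request_url).hostname or "")
    return request_domain != page_domain and request_domain not in owned_domains


page = "https://www.pharmacy.example/products"
# Self-hosted analytics on a subdomain of the pharmacy itself.
print(is_third_party(page, "https://stats.pharmacy.example/matomo.php"))
# The same analytics software hosted on an external domain.
print(is_third_party(page, "https://tracker.analytics-host.example/matomo.php"))
```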

Fig. 2

The presence of third-party services on 163 Finnish online pharmacy websites

Figure 3 shows the cases where the data on a specific prescription medicine is leaked to a third party, either on the product page or when ordering the medicine. The fact that the medicine viewed or ordered by the user is leaked to Google on 55 pharmacy websites is astounding. This is over a third (33.7%) of all Finnish online pharmacies! It also means that in the set of all 57 pharmacies that leak sensitive health data, Google’s services receive this data in 96.5% of the cases. Compared to Google, Facebook’s presence as a receiver of sensitive data is not that high (8 cases, 4.9% of all pharmacies). It is also worth noting that Pingdom has disappeared from this second diagram: despite its relatively high presence, it never collects information on the exact prescription medicine.

Fig. 3

The cases in which third parties received information on a specific medicine viewed or ordered by the user (counted once per pharmacy) on 163 Finnish online pharmacy websites

Figure 4 shows the connections between the used platforms and medicine name data leaks. Only two platforms, P1 and P3, ever leak the medicine name. In total, 35.2% (44/125) of the P1-based pharmacies leak the medicine name, while 86.7% (13/15) of the P3-based pharmacies leak the medicine name. In both clusters C1 and C3, the use of Google’s services is the largest culprit for data leaks, causing 42 leaks in C1 and 13 leaks in C3. Clusters C2, C4 and C5 were not found to have any medicine name leaks in our experiments.

Fig. 4

A comparison of the used platforms: the pharmacies leaking medicine name and pharmacies without medicine name leaks
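The platform-level shares reported above can be recomputed from the stated counts with a few lines of Python. The helper name share is introduced here for illustration only, and the counts are the ones given in the text.

```python
def share(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal."""
    return round(100 * part / total, 1)


# Counts from the text: (pharmacies leaking a medicine name, pharmacies on the platform).
platform_counts = {"P1": (44, 125), "P3": (13, 15)}

for platform, (leaking, total) in platform_counts.items():
    print(platform, share(leaking, total))

# The two platforms together account for every leaking pharmacy among the 163 studied.
total_leaking = sum(leaking for leaking, _ in platform_counts.values())
print(total_leaking, share(total_leaking, 163))
```

The sum of the two platform counts matches the 57 leaking pharmacies reported for the whole dataset, which is consistent with the observation that no other platform leaked medicine names.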

4.3 Change in the number of third parties

In addition to scientific pursuits, one of the objectives of the current study was to improve the privacy of Finnish online pharmacies. To this end, our findings were reported to the Finnish data protection authorities. Dozens of online pharmacies are still being investigated at the time of writing. The privacy issues were initially discovered and reported to the authorities in April 2022. The data protection authority started an investigation on a larger scale in November 2022, and at this point, we revisited the situation. Finally, we carried out the last analysis in March 2023, when all pharmacies had been given a fair opportunity to improve their websites.

Figure 5 shows the number of third-party services on all studied Finnish online pharmacy websites (N = 48) as a function of time. Comparing the initial situation with March 2023, the number of third parties has dropped from 79 to 18, which is less than a quarter (22.8%) of the initial number of third-party services. Figure 6, on the other hand, shows the number of instances where information on a specific medicine leaks to third parties. Again, comparing the initial situation with the current one, we can see that the drop is even more drastic here: from 33 leaks to 4. The number in March 2023 is only 12.1% of the initial number. This shows the significant impact our findings have already had on the privacy of the analyzed pharmacies.

Fig. 5

A change in the number of third-party services on the Finnish online pharmacy websites

Fig. 6

A change in the number of instances where a specific medicine name leaks to third parties

Despite the good outcome, the charts also show that there are individual cases where new privacy risks have been introduced even after the authorities became involved. For example, in Fig. 6 we can see that Snapchat has suddenly appeared as a new destination for data on prescription medicine names. Our goal in the future is to keep monitoring the situation and, if necessary, work with the pharmacies to reduce the number of prescription medicine leaks to zero.

In the figures, it is worth noting that one pharmacy can contain numerous third parties and several leaks, so the numbers reflect data leaks rather than unique pharmacies. Also, three (3) online pharmacies have been on a long maintenance break after the data leaks were discovered, which may have minor effects on the reported numbers.

4.4 Analysis of privacy policies

We analyzed privacy policies of 20 pharmacies included in the manual network traffic analysis. Among these online pharmacies, 16 were found to send personal data to third-party services. In the privacy policies on these websites, 10 out of 16 pharmacies denied sending any data on products (medicines) users have viewed or ordered. This, of course, completely contradicts our network traffic analysis which proves that many of these websites share data about prescription medicines with third parties.

Three of the analyzed pharmacies admitted in their privacy policies that these kinds of transfers to third parties can happen. However, none of them explicitly stated that information about intended prescription medicine orders is handed over to third parties. The expressions used were more subtle, stating that the personal data collected on the website can, among several other personal data items, include data on visited URLs or ordered products. It is left to the user to realize that the visited URL can also mean a product page of a specific prescription medicine. The user may also not immediately realize that data on an ordered product could mean sensitive data on a specific prescription medicine.

In another section of these privacy policies, it was then explained that the collected personal data can also be shared with third parties. One of the analyzed privacy policy documents (Pharmacy 11) did not state clearly whether personal data can be sent to third parties. Although the cookie banners are outside the scope of the current study, it is worth noting that Pharmacy 1 very clearly explained in its cookie consent banner that data on prescriptions or medication is not collected. This statement and our findings blatantly contradict each other.

Based on this analysis, it is obvious that the majority of the analyzed privacy policies did not adequately inform the customer about the fact that their sensitive health data is turned over to third parties. Curiously, in several cases privacy policy documents had clearly been directly copied from other online pharmacies. On many occasions, large sections of the documents were identical and only a few details were changed in the text, without sufficient attention to the applicability to the specific pharmacy and contents of the document in general. Indeed, numerous privacy policies even had the same misspellings at several points in the text.

The lack of transparency in the studied privacy policies in terms of found data leaks is not that surprising, however, when we take into account that the web developers and data protection officers have most likely been totally unaware of these leaks before they were informed about the situation.

4.5 Legal analysis: Data concerning health

This section discusses why information concerning purchases of prescription medicines should be considered data concerning health under the GDPR. In the presented study, it was discovered that the URL addresses of the web pages on which prescription medicines could be ordered were sent to analytics service providers along with other identifying data such as IP addresses and device identifiers. In some cases, the URL address directly contained the name and the details of the prescription medicine. The combination of collected data also revealed the person’s intention to buy the medicine in question. Information concerning a person’s medication can lead to inferences about his or her health status and can thus also have a significant impact on the individual if misused. We therefore argue that this kind of information should be considered “data concerning health” under the GDPR to ensure that the individual’s rights to data protection and privacy are effectively protected.

Health data, as sensitive data, is subject to special protection under data protection and privacy laws [55, para. 126]. Under Article 9(1) of the GDPR, the processing of sensitive data is prohibited by default. This applies regardless of whether the information in question is correct and of whether the controller is acting with the aim of obtaining that sensitive data [48, para. 69]. Sensitive data can only be processed lawfully if a special legal ground, such as explicit consent, applies and additional safeguards, such as conducting a Data Protection Impact Assessment in high-risk cases, are implemented. Processing of sensitive data should particularly be considered when implementing a data protection by design approach in services and products as well as when defining technical and organizational safeguards for processing activities. According to Bygrave and Tosoni, health data needs special protection as it can reveal the essential vulnerabilities of a person, exposing the person in question to negative consequences such as stigma and discrimination [56, p. 218]. Respecting the confidentiality of health data is also important from the perspective of general interest, as it is essential for ensuring trust in health care services [56, p. 218, 57].

Article 4(15) of the GDPR defines “data concerning health” as personal data relating to the physical or mental health of a person. This includes the provision of health care services, which reveal information about his or her health status. According to Recital 35 of the GDPR, health data includes all data pertaining to the health status of a person which reveal information relating to his or her past, current or future physical or mental health status. This includes, for example, any information on a disease, disability, disease risk, medical history, clinical treatment or biomedical state of the person. Furthermore, the GDPR acknowledges that data concerning health needs to be granted special protection, as the use of such sensitive data may potentially have significant adverse impacts on individuals (Footnote 6).

Several factors support the interpretation that information concerning the purchase of a prescription medicine should constitute data concerning health under the GDPR in the context of the studied online pharmacies. Firstly, in light of the preamble to the GDPR and the case law of the CJEU, the term “data concerning health” should be given a wide interpretation (see e.g. [55, 58, p. 5, 59]). In the case Vyriausioji tarnybinės etikos komisija, the CJEU maintains that personal data that indirectly reveals sensitive information concerning the individual is to be interpreted as sensitive data [59, para. 128]. Secondly, WP29 has explicitly stated that health data, which is a broader term than pure “medical data”, also includes data about the purchase of medical products, devices and services, when health status can be inferred from the data [60, p. 2]. WP29’s interpretation concerned the term “health data” under Directive 95/46/EC, the predecessor of the GDPR. Nevertheless, it can be assumed that this interpretation would also apply under the GDPR [37, p. 36]. Thirdly, as acknowledged in the GDPR, processing of information concerning the health status of a person can result in a high risk to the rights and freedoms of individuals. Disclosing information concerning a person’s prescription medicines can be assumed to pose similar risks to the individual.

Nevertheless, when addressing indirect information, the scope of the term potentially has certain limitations that should be taken into consideration. Schäfke-Zell notes that, to clarify the gray areas of the term "data concerning health", the scope of the term has sometimes been defined in legal commentaries by the purpose of the processing activity rather than by the categories of data in question [37, p. 34–36]. A similar approach has also been presented, for example, in the EDPB guideline concerning the processing of personal data through video devices, according to which data is to be regarded as sensitive data if the material is processed to actually deduce sensitive information from it [61]. Furthermore, it should also be noted that, for example, in the case Dionyssopoulou, the CJEU determined that a mere reference without any disclosure of data concerning a person’s health did not constitute health data [56, p. 221, 62, 63].

However, in the context of this study, a broad interpretation of the term "data concerning health" should be applied for three reasons: the information concerning a person’s medication purchases can be used to determine sensitive information concerning the person’s health status (cf. e.g. [37, p. 40–41, 38, p. 238–239, 60, p. 4–5]), the information is collected together with several other identifiers (cf. e.g. [37, p. 39–40, 60, p. 4–5]), and the Big Tech related analytics services have the resources to analyze data with advanced technologies (cf. [38, p. 239–241]). This is particularly because the use of this data for unwanted purposes is likely to have a significant impact on the customers in question (cf. e.g. [39, p. 140, 40, p. 11]) and disclosing this data to third parties increases the risk level of processing. Thus, any other interpretation would diminish the individual’s right to data protection and privacy.
