Trends in Security Information
The HSD Trendmonitor is designed to provide access to relevant content on various subjects in the safety and security domain, to identify relevant developments and to connect knowledge and organisations. The safety and security domain encompasses a vast number of subjects. To visualise all of these subjects, four taxonomies have been constructed: type of threat or opportunity, victim, source of threat and domain of application. The taxonomies and related category descriptions have been carefully composed, drawing on other taxonomies, European and international standards and our own expertise.
To identify safety and security related trends, relevant reports and HSD news articles are continuously scanned, analysed and classified by hand according to the four taxonomies. This results in a wide array of observations, which we call ‘Trend Snippets’. Multiple Trend Snippets combined can provide insight into safety and security trends. The size of the circles shows the relative weight of a topic, and the filters can be used to further select the content most relevant to you. If you have an addition, question or remark, drop us a line at info@securitydelta.nl.
Data problems are very likely to occur in the planning and development stage of algorithms
1.4.2 The data challenge
As has been illustrated above, the quality, accuracy, validity and reliability of algorithms depend on the quality of the input provided.109 Several problems may arise in this respect.
First, an otherwise correctly designed algorithmic system may be fed with incorrect information. To give a simple example: if a sensor malfunction wrongly registers a violation of a traffic rule,110 a rule-based algorithm will not be able to identify the mistake itself and will simply generate a decision to impose a fine, based on the assumption that a traffic offence has been committed. Humans, however, would consider that outcome to be unfair, because the driver is being fined for an offence that she has not committed.111
Another inaccuracy that may occur is that the data used to train an algorithm is unrepresentative of the general population, inadequately deals with outliers or does not include particular minority groups.112 This type of inaccuracy may easily lead to discrimination, since the algorithm will then automatically reflect the imbalanced data on which it has been trained. The classic example is facial recognition software that performs less well on Black women’s faces than on White women’s and Black men’s faces, because Black women are under-represented in the dataset used to train the algorithm.113
Finally, and very relevant from a non-discrimination perspective, if a self-learning algorithm is fed with unbalanced or biased data, it is very likely to generate equally unbalanced and biased output based on its detection of correlations and patterns in that data.114 As Barocas and Selbst have explained, ‘approached without care, data mining can reproduce existing patterns of discrimination, inherit the prejudice of prior decision makers, or simply reflect the widespread biases that persist in society’.115 This is often summarised as ‘rubbish in, rubbish out’ or ‘bias in, bias out’.116 As the 2016 AI Now report noted, ‘there is the risk that AI systems trained on this data will produce models that replicate and magnify those biases. In such cases, AI systems would exacerbate the discriminatory dynamics that create social inequality, and would likely do so in ways that would be less obvious than human prejudice and implicit bias’.117
This is exacerbated if a self-learning algorithm uses the output it has generated on the basis of flawed data to further ‘improve’ itself. In that case a feedback loop can be created that reinforces already existing patterns of structural discrimination by ‘reifying’ and further enacting discriminatory correlations.118
Because of the human factor discussed in section 1.4.1 above, such data problems are very likely to occur in the planning and development stage, for example when selecting the data that is to be used or when preparing or labelling the data. It is therefore imperative to be aware of the quality, accuracy and reliability of the data used for labelling, training, feedback and learning. In this report, this is described as the data challenge.
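The feedback loop described above can be made concrete with a small simulation. The Python sketch below is purely illustrative and is not taken from the report: the two ‘areas’, the equal underlying incident rates, the skewed historical record and the 80/20 allocation rule are all assumptions chosen for the example. It shows how a system that allocates inspections on the basis of its own recorded output gradually amplifies an initial imbalance, even though the two areas behave identically (‘bias in, bias out’).

import numpy as np

rng = np.random.default_rng(seed=0)

# Two areas with the SAME underlying incident rate: any difference in the
# records reflects where inspections took place, not actual behaviour.
# All numbers below are illustrative assumptions, not figures from the report.
true_rate = np.array([0.10, 0.10])
recorded = np.array([60.0, 40.0])   # historical records already skewed towards area 0
capacity = 200                      # inspections available per round

for round_no in range(10):
    # 'Hot spot' rule: the area with the most recorded incidents so far
    # receives 80% of the inspection capacity, the other area 20%.
    hot = np.argmax(recorded)
    patrols = np.where(np.arange(2) == hot, 0.8, 0.2) * capacity

    # Incidents can only be recorded where inspections actually take place,
    # so the records measure where the system looked, not where incidents occur.
    new_records = rng.binomial(patrols.astype(int), true_rate)

    # The output is fed back in as the 'data' driving the next allocation.
    recorded += new_records
    share = recorded[0] / recorded.sum()
    print(f"round {round_no + 1}: recorded share of area 0 = {share:.2f}")

Running this loop, the recorded share of the over-inspected area climbs from the initial 0.60 towards the 0.80 implied by the allocation rule, even though both areas have identical true incident rates: the historical skew is not corrected but reified and amplified, which is exactly the reinforcement dynamic described above.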