3.1. Big Data Ethics in a Broader Context
Research by other authors shows a long-running discussion about the impact of new technologies on society and on its ethical norms and values. Computer ethics has been evolving since the invention of computers in the 20th century, after the world wars, and has been described by many authors, such as Norbert Wiener [14], Walter Maner [19], and James Moor [15]. However, the foundation of modern information ethics was established at the end of the 20th century by Rafael Capurro [16] and Luciano Floridi, as described below.
Over the last few years, there has been a visible shift from information ethics to data ethics based on the idea of two Oxford academics, Luciano Floridi and Mariarosaria Taddeo, who stated the following: “We should concentrate on what is being handled (data) as the true invariant of our concerns and that is why labels such as ‘robo-ethics’ or ‘machine ethics’ miss the point.” [7].
The most recent definition of data ethics is from 2016 and was provided by Floridi and Taddeo, who approach the topic at different levels of abstraction (LoA), treat it as a macroethics, and distinguish the ethics of data, the ethics of algorithms, and the ethics of practices.
This respected definition of data ethics at the LoA of data was set out in their article “What is Data Ethics?” [7] as follows:
“In the light of this change of LoA, data ethics can be defined as the branch of ethics that studies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions (e.g., right conducts or right values). This means that the ethical challenges posed by data science can be mapped within the conceptual space delineated by three axes of research: the ethics of data, the ethics of algorithms, and the ethics of practices.”
3.1.2. Description of Stakeholder Groups
The specific role of stakeholder groups means that there is increasing information asymmetry between the individual users that can be considered data-poor and big organizations (state organizations and corporations) that collect data about individual users and can be considered data-rich.
The term of information asymmetry is not new and is related to the phenomenon of the digital divide. The term digital divide in relation to IT technology was coined in the 1980s with the introduction of the Internet in America and its nonlinear spread between its first users.
Over time, the digital divide has been categorized into three parts: the first divide mainly regards the limited access to new technology [20], the second divide is more about the skills needed [21], and, finally, the third divide asks the question, “who benefits most from being online and then also in the offline world?” [22].
Thus, when speaking about stakeholders in Big Data, we can provide three important categories of stakeholder groups: data-rich organizations, data-poor users, and data experts.
To be data-rich brings insight into many areas of the economy and society and competitive advantages that generate new business opportunities.
Based on the description of the digital divide, being data-rich means possessing the following five elements, the lack of which acts as a barrier:
Technical means (software, hardware, and connectivity quality);
Autonomy of use (location of access, freedom to use the medium for one’s preferred activities);
Use patterns (types of uses of the internet);
Social support networks (availability of others one can turn to for assistance with use and size of networks to encourage use);
Skill (one’s ability to use the medium effectively) [21].
Data-rich organizations nearly always have a corporate social responsibility (CSR) strategy in place that also covers the data ethics area. We could expect that for data corporations like Facebook, Amazon, and Google, among others, CSR should be a good balancing power. However, many recent, highly publicized affairs, e.g., the Cambridge Analytica scandal, show that this is not the case. Research on CSR, Big Data, and analytics [23] showed that Big Data is underemployed in the area of corporate social responsibility. That research, an in-depth assessment conducted on a sample of the best-ranked global German companies selected from the 2015 sustainability ranking reports, confirmed that these companies are not necessarily primarily interested in CSR but in economic interests.
A relevant question emerges: if organizations are not prioritizing CSR strategies, why would they heed the advice of a proposed data expert? Based on our proposal, data experts do not use their power in a way to increase the importance of CSR in organization priorities, but they can solve the ethical problem directly without involving management and their prioritization process. Data experts, as key people responsible for the design of Big Data solutions, can change an unethical solution to an ethical one by themselves when they are motivated and capable of such behavior.
Being data-poor means, in contrast to being data-rich, lacking at least one of the five elements, namely, technical means, autonomy of use, use patterns, social support networks, and skill.
However, even if a data user held all of the above-mentioned elements, it is still mainly access to a critical volume of data that brings the competitive advantage hidden in data. Furthermore, it is highly unlikely that individuals would have all of these elements, and even if they did, they would never have the capabilities of data-rich organizations on a comparable scale [4].
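The five-element definition above can be read as a simple check. The following is a minimal illustrative sketch, not part of the cited work: the class name `AccessProfile`, its field names, and the two helper functions are hypothetical, and reducing each element to a yes/no flag is a simplifying assumption.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AccessProfile:
    """Hypothetical yes/no record of the five elements listed in the text."""
    technical_means: bool
    autonomy_of_use: bool
    use_patterns: bool
    social_support_networks: bool
    skill: bool


def missing_elements(profile: AccessProfile) -> list[str]:
    """Return the names of the elements the profile lacks."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


def is_data_poor(profile: AccessProfile) -> bool:
    """Data-poor in the sense of the text: at least one element is missing."""
    return len(missing_elements(profile)) > 0
```

The sketch deliberately leaves out access to a critical volume of data, which the text identifies as the decisive advantage of data-rich organizations; a profile that passes this check is therefore not necessarily data-rich.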
To be a data-poor user means providing personal data to data-rich organizations that benefit from it in the online and also the offline world. Originally, it was a good trade-off, i.e., users exchanged their privacy for better personalized and often free services provided by organizations. However, over time, negative issues, such as privacy intrusion, the big brother effect on the state level and the little brother effect on the corporate level, confusion, and others, have arisen and created inequalities affecting data-poor users. For a comprehensive insight into Big Data issues, see [5,24].
In addition to the data-rich organization and data-poor users that we identified, there is a third group of power users working in data-rich organizations that we call data experts.
A data expert is a member of both previously named groups: data-rich as part of a corporation and data-poor as an individual. In reference to Helen Nissenbaum and her concept of privacy as contextual integrity [2], we can assume data experts to be part of many contexts and therefore competent in Big Data ethics. This stems from their insight into the context of Big Data and awareness of many benefits and issues related to Big Data. The comprehensive insight originates from the leading role of data experts in the design of IT solutions that respond to business requirements assigned to them by organizations.
The possible special role of data experts is derived from the following three aspects:
Motivation and competence;
Sense of responsibility for data ethics;
Possibility and means to influence Big Data issues.
The fact that data experts simultaneously belong to the data-rich and data-poor stakeholder groups ought to be a good motivation to be the third balancing power that mediates the opposite interests of both groups.
The responsibility related to the role of data experts could be derived from their belonging to a special social data group and to professional and ethical organizations that group people based on a shared professional identity. Belonging to these groups is often confirmed by official membership in international, national, and corporate groups and associations.
“The common identity is produced and reproduced through occupational and professional socialization by means of shared educational backgrounds, professional training, and vocational experiences, and by the membership of professional associations (local, regional, national and international) and institutes where practitioners develop and maintain a shared work culture.”
Membership in such a professional organization related to specific industries or occupations is usually not mandatory; however, there are some benefits, such as personal certification, training availability, access to knowledge bases, and possible participation in conferences that are available only to members of these organizations. In some cases, if you are not a member of such a professional organization, you practically cannot do your job, for example, doctors of medicine who are not members of Camera Medica (Medical Chamber).
We discuss and conclude the role of data experts and the possibility and means of influencing Big Data issues in Section 3.4, and Figure 1 provides a visual form of this conclusion.
3.2. Demand for Governance and Regulatory Frameworks
3.2.1. Governance Rules
There are many approaches to governance; however, governance as a general term is still considered vague and may have a different meaning in different contexts. Furthermore, governance, as a steering principle, is overarching at many levels, such as states and society, organizations, information technologies, and data or security projects.
Comprehensive research on the theme of governance was provided by Petr Vymetal in the paper “Governance: Defining the Concept” [25]. As a summary of this research, we suggest using the following comprehensive definition that is appropriate for the majority of disciplines and levels:
“Governance is the system of values, policies, and institutions by which a society manages its economic, political, and social affairs through interactions within, and among the state, civil society, and private sectors. It is the way a society organizes itself to make and implement decisions—achieving mutual understanding, agreement, and action. It comprises the mechanisms and processes for citizens and groups to articulate their interests, mediate their differences and exercise their legal rights and obligations. It is the rules, institutions, and practices that set limits and provide incentives for individuals, organizations, and firms. Governance, including its social, political and economic dimensions, operates at every level of human enterprise, be it the household, village, municipality, nation, region or globe.”
In respect to Big Data ethics, we use the top-down cascading principle and find that the following governance areas are relevant:
States and society governance: a very comprehensive view is provided by Bell and Hindmoor in their book, Rethinking Governance: The Centrality of the State in Modern Society (2009).
Bell and Hindmoor [9] name many different perspectives on state and society governance, such as a state-centric relational approach, hierarchy, top-down governance, governance through persuasion, governance through markets and contracts, governance through community engagement, and governance through associations.
In regard to the current data-driven changes that are supported by IT, namely, reducing hierarchy, adding complexity, and introducing new trends, and because of the new, open risks in digital public governance [11,27], it is very important to address the question that is also raised at the end of Bell and Hindmoor’s Rethinking Governance [9]: how to govern society without governance?
This approach to state governance without governance was holistically described by the concept of governmentality and analyzed by Michel Foucault in 1978 [10]. He addresses the question of how to conduct governance in the emerging global society. He describes his observation of the contemporary population that can be governed by a government through apparatuses of security using a political economy. The security apparatuses are to provide the population with a general feeling of well-being. In doing so, Foucault recommends that the state’s actions be rather restricted and subtle in their nature, yet consequently very influential in their outcome. He offers a depiction of a multicentered society, which, even from the position of a state, is best regulated by market mechanisms and by instilling ideas in individuals rather than imposing the government’s will on them by force. The individuals then become auto-regulated and auto-disciplined.
In order to be able to govern in this manner, to conduct such governmentality, an extensive amount of information about the population is required. With the development of Big Data, this becomes increasingly accessible. The problem is that it is not primarily the state gathering the information; it is the technology companies and, subsequently, other business sectors in the market.
Faced with the new phenomenon of the global technology companies, it may be rather difficult for states to control them. In any case, the answer to these questions is not to open a vast conflict of sovereignty and discipline between the “private” Big Data and the government. On the contrary, for states, it should prove more effective to engage with the technology companies using the instruments of Foucault’s governmentality as described above. The temptation for the state to use commercial data collected by corporations about their clients in order to control its inhabitants came true recently in China. The Chinese government introduced its social credit score system, scoring the behavior of each individual in China, and in 2018, approached the Alibaba e-commerce platform for the initial data to calibrate its model. This system, which has been in full operation since 2020, is based not only on the payment history of individuals but also on their monitored behavior [28,29].
When discussing the mechanism of an autoregulated society wherein Big Data is an essential component of this governance, which is our extension of Foucault’s observation, we should better describe the other forces of regulation mentioned above, ideally by means of some widely accepted frameworks that we briefly introduce.
For the level of state, society, and organizations (covering also enterprises), we can consider the law as this widely accepted framework that we describe further. For the area of IT in enterprises, where the data experts operate, “IT best practice” standards are more respected by these data experts than the law. Enterprise governance of IT (EGIT) is an important part of the governance of organizations (corporations/enterprises or even state organizations).
Several institutions are concentrated on research and development in the area of EGIT. The most important seems to be the IT Governance Institute (ITGI), which is a branch of the ISACA (Information Systems Audit and Control Association), an independent, nonprofit global association.
The relevance of ITGI is essential because of its connection to COBIT (Control Objectives for Information and Related Technology), which is a standard also supported by the ISACA. Currently, COBIT 5 is considered the most comprehensive framework in the area of IT governance and management.
“Enterprise governance of IT (EGIT) is defined as an integral part of enterprise governance, exercised by the Board, overseeing the definition and implementation of processes, structures, and relational mechanisms in the organization that enable both business and IT people to execute their responsibility in support of business /IT alignment and the creation of business value from IT-enabled business investment.”
We have no space in this paper to discuss EGIT and COBIT further; however, we want to highlight here their relevance for data experts. In the data ethics area, it is important that the same cascading and steering principles of governance described above are applied at different levels, such as states and society, organizations, information technologies, and finally also data projects.
For our approach to Big Data ethics and the model introduced in Figure 1, the following terms are important: the role of stakeholders, balancing powers, principles, rules, and motivations (goals), among others. They exist in all relevant governance models even though their meaning can be slightly different depending on the governance level. What is essential for the role of data experts as individuals is that they are part of many governance models, and all of these governance models are logically interconnected (cascaded) in a global society.
3.2.2. Regulatory Framework
The American lawyer and respected professor of law at Harvard University, Lawrence Lessig, in his book “Code and Other Laws of Cyberspace” [8], described four forces that regulate cyberspace: the market, legislation, social norms, and architecture.
“Our choice is not between “regulation” and “no regulation.” The code regulates. It implements values, or not. It enables freedoms or disables them. It protects privacy or promotes monitoring. People choose how the code does these things. People write the code. Thus, the choice is not whether people will decide how cyberspace regulates. People–coders–will.”
Although in computer and data science, “code” typically refers to the source code of a computer program, in law, “code” usually refers to valid legislation. In his work, Lawrence Lessig explores how code in both senses serves as an instrument for social control, leading to his maxim that “code is law.”
With rapidly developing technologies, the ever-growing generation of data, the increasing power of corporations, and noting that the law (code) usually works retrospectively, the importance of data experts as “coders” that write the new rules is currently essential.
We do not have enough space to describe all four of Lessig’s forces in detail; thus, we introduce them only briefly here and comment in depth on the forces of social norms and human values, which we consider to be the most relevant to the Big Data ethics discussed in this paper.
For Lessig, the market means the general market principles governing a society.
The law means the whole legislative framework, such as constitutional law, civil codes, corporate law, and criminal codes, where some special legislation is very relevant for Big Data ethics, such as the GDPR (General Data Protection Regulation) in the European Union and the comparable California Consumer Privacy Act (CCPA) in California, USA.
The term architecture was used originally by Lessig as one of the four general powers regulating social systems; thus, it is not equivalent to the commonly used term for IT architecture, e.g., in the TOGAF framework. However, Lessig’s architecture can be understood as an equivalent to software code, and thus to the Big Data solutions created by the data experts discussed in this paper.
Social norms and human values are an important part of ethics as practical philosophy and guide the behavior of individuals in stakeholder groups. As such, we provide an in-depth discussion of this in the next section.
3.3. Social Norms and Human Values
The data experts hold the position and role as employees of organizations and simultaneously as users. In both roles, they are acting individually and make decisions on their own; however, to decide as an individual takes into account many factors that are not visible at first glance.
The philosopher Jan Sokol [30], building on the work of other philosophers, introduces three different sets of rules on how individuals govern themselves in society and its social norms:
Social custom;
Individual morality;
Ethics.
Social custom regards cultural stereotypes that are learned unconsciously and automatically, as we were taught as children.
Individual morality means morality as a voluntary self-restriction, regardless of the actions of the majority or the state of a given society.
Sokol argues that ethics is the best way to regulate relations in a society: “ethics as a search for what is best” [30]. This accounts for positive action of an individual, a creation. It involves continuously asking the following question: what is right and what is wrong? Furthermore, it is also necessary to be motivated and able to search for new answers in new contexts.
Regarding the relationship between human values and social norms, we follow leading sociologists from the previous century, such as Emile Durkheim (1897/1964) and Max Weber (1905/1958), who stated that human values are a central concept for explaining the social behavior of groups and individuals. As stated by the Israeli sociologist Shalom H. Schwartz,
“Values have played an important role not only in sociology, but in psychology, anthropology, and related disciplines as well. Values are used to characterize cultural groups, societies, and individuals, to trace change over time, and to explain the motivational bases of attitudes and behavior.”
In the next sections, we briefly introduce two complex concepts of human values, namely Schwartz’s theory of human values and the Charter of Fundamental Rights of the European Union, to identify the values that are generally accepted in a global society [12] and those also protected by law in the European Union.
3.4. Data Experts as the Balancing Power of Data Ethics
In the section dedicated to stakeholder groups, we described the special role of data experts that is based, in our opinion, on three elements: motivation and competence; sense of responsibility for data ethics; and possibility and means of influencing Big Data issues.
The motivation of data experts arises from the fact that they simultaneously belong to the data-rich and data-poor stakeholder groups. The competence comes from the insight of data experts into the context of Big Data and the awareness of the positive benefits of use cases practiced by data-rich organizations, as well as the negative issues impacting mainly data-poor users.
The responsibility comes from their shared professional identity and their belonging to a special social data group and professional and ethical organizations. Official membership in the professional associations is not mandatory; however, it brings many benefits, and we can expect the increasing importance of such associations for practicing the job of data experts in a similar manner to how doctors of medicine are forced to be members of Camera Medica.
The possibility and means of influencing Big Data issues arise from the following three aspects:
The leading role of experts in data projects;
Respecting the IT-relevant, best-practice methods;
Using a data ethics guidance system to support ethical thinking.
The leading role of experts in data projects means that IT projects are typically managed in an organization based on the distribution of powers, roles, and competencies in project teams. Project governance and management follow certain best-practice standards, e.g., PRINCE2, PMI, ITIL, and others. Project governance respects the roles in the team, e.g., business owner, solution designer, project coordinator, etc.
The only comprehensive insight into a data project belongs to data experts, who need to understand both the business domain and the technical details of the solution that resolves the problem of the business domain.
As a business domain, we understand verticals, such as finance, retail, and manufacturing among others, and also horizontal domains, such as marketing, sales, operation, and accounting among others.
As technical domains, we identify many different areas, such as computing, networking, databases, and many others, or, more generally, hardware, software, services, and processes.
The leading role of data experts originates from the unique role in the design of IT solutions that solve, in many different and creative ways, the business requirements assigned to them by organizations.
Respecting the IT-relevant, best-practice methods includes, apart from the general project management methods mentioned above, data-specific methods, such as software engineering and pattern recognition, agile methods, architecture and solution design methods, and many others.
Regarding the term governance, we should mention COBIT 5, DAMA-DMBOK, COSO, ITIL, and ISO/IEC 270xx. Concerning the process of solution design, CRISP-DM (Cross-Industry Standard Process for Data Mining) is usually a very relevant method for data experts.
Becoming a data expert is usually a long career path that starts at university and is achieved through practice to become a true data scientist with a wide range of experience and a leading role in writing the “code of data solutions and Big Data ethics”.
Using a data ethics guidance system to support ethical thinking has not yet been well-described. Usually, there is a data etiquette, or “netiquette” (rules regarding how to behave in the online world), that is part of the CSR at an organization.
Following the belief in “ethics as a search for what is best” as the best way to regulate relations in society [30], data experts appreciate guidelines that support the process of continuously asking the following question: what is right and what is wrong?
As part of the research for this article, we explored the Data Ethics Canvas produced by the Open Data Institute (ODI) and the DEDA methodology developed at Utrecht University. Although the ODI Canvas can be helpful in providing a set of essential areas to consider when working with data, the DEDA framework goes further in a detailed process and is a possible guideline for data experts.
DEDA, an abbreviation for Data Ethics Decision Aid, supports continuously asking what is right and what is wrong. It is an ethical assurance approach for Big Data projects, based on a guided discussion that should include all of the people relevant to a project and take place before the Big Data system is designed.
DEDA implementation consists of three steps: learning the methodology, organizing the project with all stakeholders, and asking a set of 29 predefined ethical questions relevant to data governance. A few examples of DEDA predefined ethical questions are as follows: Is there someone in the team who can explain how the algorithms in use work? Can you communicate how the algorithms work? Where do the data(set) come from? [31]. The DEDA framework was improved in an iterative process (2016–2018) and has since been applied by various Dutch municipalities. We consider DEDA and its process as a well-documented test subject about data ethics and a value-sensitive design approach applied to data projects in organizations that clearly declare their values, as described by Franzke et al. in 2021 [32].
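The question-asking step described above can be sketched as a small checklist that a project team fills in during the guided discussion. This is an illustrative sketch only and not part of DEDA itself: the names `ChecklistItem`, `new_checklist`, `open_questions`, and `discussion_complete` are hypothetical, and only the three example questions quoted above are included, not the full set of 29.

```python
from dataclasses import dataclass
from typing import Optional

# Only the three example questions quoted in the text; the full DEDA set
# of 29 questions is not reproduced here.
EXAMPLE_QUESTIONS = (
    "Is there someone in the team who can explain how the algorithms in use work?",
    "Can you communicate how the algorithms work?",
    "Where do the data(set) come from?",
)


@dataclass
class ChecklistItem:
    """Hypothetical record pairing one question with the team's answer."""
    question: str
    answer: Optional[str] = None


def new_checklist(questions=EXAMPLE_QUESTIONS):
    """Create an unanswered checklist from a sequence of questions."""
    return [ChecklistItem(q) for q in questions]


def open_questions(checklist):
    """Return the questions that have no answer or only a blank answer."""
    return [item.question for item in checklist
            if not (item.answer and item.answer.strip())]


def discussion_complete(checklist):
    """True once every question has a non-blank answer."""
    return not open_questions(checklist)
```

Such a structure only records whether each question was addressed; judging the quality of the answers remains the task of the guided discussion among the people involved in the project.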
Our model shown in Figure 1 does not provide detailed guidelines in the decision-making process of data projects like DEDA but describes the roles and responsibilities of different stakeholders in the context of different regulatory and governance frameworks and appeals to data experts to use their balancing power in a divided society.
To summarize this section, the balancing role of data experts involves three main aspects: motivation and competence; sense of responsibility for data ethics; and possibility and means of influencing Big Data issues. Furthermore, we argue that a data expert is a unique role in which all three of the above-mentioned aspects are possessed, and, therefore, it could be the balancing power of data ethics.