Data Ethics: A Checklist with 7 Points to Consider

Data ethics involves defining and promoting ethical principles for handling personal data. This includes responsible practices for collecting, using, storing, sharing, and disposing of data. Data ethics aims to ensure that data is managed morally, soundly, and responsibly.

In an increasingly data-driven world, where vast amounts of personal information are being collected and processed, data ethics seeks to address the ethical challenges and implications that arise. It recognizes the importance of respecting individuals' privacy, autonomy, and rights in data handling.

Data ethics involves informed consent, transparency, fairness, accountability, data minimization, purpose limitation, data security, and the responsible use of emerging technologies like artificial intelligence and machine learning. It also examines the potential biases and discrimination resulting from data collection and analysis.

It is recommended that both organizations and individuals adopt ethical frameworks and practices for handling data that uphold individual rights and promote trust, integrity, and social responsibility. This can be achieved through privacy policies, privacy impact assessments, data anonymization or pseudonymization, and the promotion of data literacy and awareness among stakeholders.
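Pseudonymization, one of the techniques mentioned above, can be sketched in a few lines of code. The following is a minimal Python sketch, not a complete implementation: the field names and the in-code key are illustrative assumptions, and a keyed hash (HMAC) is used rather than a plain hash so that low-entropy identifiers such as email addresses cannot be reversed by a dictionary attack.

```python
import hmac
import hashlib

def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (a pseudonym).

    The mapping is stable for a given key, so records can still be
    linked across data sets, but re-identification requires the key.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

key = b"example-secret-key"  # in practice, hold this in a secrets manager
record = {"name": "Jane Doe", "email": "jane@example.com", "age_band": "30-39"}

pseudonymized = {
    "user_id": pseudonymize(record["email"], key),  # stable pseudonym
    "age_band": record["age_band"],                 # non-identifying field kept
}
print(pseudonymized)
```

Note that pseudonymized data is still personal data under most data protection regimes, since the key holder can re-identify individuals; anonymization requires stronger, irreversible measures.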

Our focus should be on maintaining data ethics: balancing the advantages of data-driven technologies against individual privacy and well-being. We are constantly working towards setting ethical norms and standards that promote responsible data use, fostering a more reliable and ethical data ecosystem.

Below are 7 points to consider.

1. Start with precise user needs and public benefit

Describe the user's need:

  • Does everyone in the team understand the user's needs?
  • How does this benefit the public?
  • What would be the harm in not using data science? Which user needs might go unmet?
  • Do you have supporting evidence for the approach being likely to meet a user's need or provide public benefit?

2. Be aware of relevant legislation and codes of practice

List the pieces of legislation, codes of practice, and guidance that apply to your project:

  • Do all team members understand how the relevant laws apply to the project?
  • If necessary, have you consulted with relevant experts?
  • Have you spoken with your information assurance team?
  • If using personal data, do you understand obligations under data protection legislation?

3. Use data that is proportionate to the user's need

If using personal data, have you answered the questions for determining proportionality?

  • You'll need to include evidence to support any decision.
  • If using personal data, what measures are in place to control access?
  • How widely are you searching personal data?
  • How can you meet the project aim using the minimum personal data possible?
  • Is there a way to achieve the same aim with less identifiable data?
  • Can you use synthetic data?
  • Was the data in question provided for the purpose of your analysis?
  • If using data that the public has freely volunteered, could your project discourage people from providing it again in the future?
  • Could you clearly explain to members of the public why you need to use that data?
  • Is there a fair balance between the rights of individuals and the interests of the community?
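The data-minimization questions above can be made concrete in code: strip every field that the stated user need does not require before analysis begins. The following is a minimal Python sketch; the field names and the choice of "minimum" fields are illustrative assumptions that would come from the proportionality assessment.

```python
# Hypothetical example: the analysis only needs coarse location and age band,
# so direct identifiers (name, email) are dropped before any processing starts.
REQUIRED_FIELDS = {"postcode_district", "age_band"}

def minimize(records):
    """Keep only the fields the stated user need requires."""
    return [{k: v for k, v in r.items() if k in REQUIRED_FIELDS} for r in records]

raw = [
    {"name": "A. Smith", "email": "a@example.com",
     "postcode_district": "SW1", "age_band": "20-29"},
]
minimal = minimize(raw)
print(minimal)
```

Applying the filter at the point of ingestion, rather than late in the pipeline, means identifiers never reach the analysis environment at all.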

4. Understand the limitations of the data

Identify the potential limitations of the data source(s) and how they are being mitigated:

  • What data source(s) are being used?
  • Are all metadata and field names clearly understood?
  • What processes do you have in place to ensure and maintain data integrity?
  • Is there a plan in place to identify errors and biases?
  • What are the caveats?
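A plan to identify errors and biases can start with automated integrity checks run every time the data is refreshed. The following is a minimal Python sketch; the schema of per-field validation rules and the sample rows are illustrative assumptions.

```python
def validate(records, schema):
    """Return a list of integrity issues: missing fields,
    out-of-range values, and duplicate identifiers."""
    issues = []
    seen_ids = set()
    for i, r in enumerate(records):
        for field, check in schema.items():
            if field not in r:
                issues.append(f"row {i}: missing {field}")
            elif not check(r[field]):
                issues.append(f"row {i}: invalid {field}={r[field]!r}")
        rid = r.get("id")
        if rid in seen_ids:
            issues.append(f"row {i}: duplicate id {rid}")
        seen_ids.add(rid)
    return issues

schema = {
    "id": lambda v: isinstance(v, int),
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}
rows = [{"id": 1, "age": 34}, {"id": 1, "age": 999}]
issues = validate(rows, schema)
print(issues)
```

Checks like these catch mechanical errors; detecting bias additionally requires comparing the data's coverage against the population the project claims to serve.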

5. Use robust practices and work within your skillset

Explain the relevant expertise and approaches that are being employed to maximize the efficacy of the project:

  • Describe the disciplines involved and why.
  • Is there expertise the project requires that you don’t currently have?
  • Have you designed the approach with a policy team or subject matter expert(s)?
  • Has all subject matter context, from policy experts or otherwise, been considered when determining the appropriate loss function for the model?
  • If necessary, how can you (or an external reviewer) check that the algorithm makes the right output decision when new data is added?
  • How has reproducibility been ensured? Could another analyst repeat your procedure from your documentation?
  • How confident are you that the algorithm is robust, and that any assumptions are met?
  • What is the quality of the model outputs, and how does this stack up against the project objectives?
  • If using data about people, is it possible that a data science technique is basing analysis on proxies for protected variables which could lead to a discriminatory policy decision?
  • What processes are in place to ensure and maintain data integrity?
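The proxy-variable question above can be probed with a simple first-pass check: compare how often a candidate feature occurs within each protected group. The following is a minimal Python sketch with hypothetical records; in practice you would use a proper association measure (for example Cramér's V) and test across all candidate features.

```python
from collections import defaultdict

def group_rates(records, feature, protected):
    """Rate at which `feature` is true within each protected group.
    A large gap between groups suggests the feature may be acting
    as a proxy for the protected variable."""
    counts = defaultdict(lambda: [0, 0])  # group -> [feature_true, total]
    for r in records:
        group = r[protected]
        counts[group][1] += 1
        if r[feature]:
            counts[group][0] += 1
    return {g: true_n / total for g, (true_n, total) in counts.items()}

# Hypothetical records: does living in district "A" proxy for group membership?
records = [
    {"district_a": True,  "group": "x"},
    {"district_a": True,  "group": "x"},
    {"district_a": False, "group": "y"},
    {"district_a": False, "group": "y"},
]
rates = group_rates(records, "district_a", "group")
print(rates)  # perfectly separated groups: rates of 1.0 and 0.0
```

A feature that almost perfectly separates protected groups will let a model discriminate even if the protected variable itself is never used as an input.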

6. Make your work transparent and be accountable

Describe how you have considered making your work transparent and accountable:

  • Have you checked with your organization whether you can talk about your project openly?
  • Have you considered how internal and external engagement could benefit your project?
  • How interpretable are the outputs of your work?
  • Could you explain how approaches were designed in plain English to other practitioners, policymakers, and, if appropriate, the public?
  • Can you openly publish your methodology, metadata about your model, and the model itself, e.g., on GitHub?
  • Can you get peers to review your pull requests?

7. Embed data use responsibly

Describe the steps taken to ensure any insight is managed responsibly:

  • How many people will be affected by the new model, insight, or service?
  • Who are the users of the insight, model, or new service?
  • Do users have the appropriate support and training to maintain the new technology?
  • Have follow-up reviews or evaluations been planned?
  • Is your implementation plan proportionate to the impact of the model?
  • How often will you report on these plans to senior responsible officers?


Murat Durmus

(Author of the Book: "MINDFUL AI: Reflections on Artificial Intelligence")

