International Journal of Artificial Intelligence and Applications (IJAIA), Vol.13, No.1, January 2022
DOI: 10.5121/ijaia.2022.13105
MITIGATION TECHNIQUES TO OVERCOME
DATA HARM IN MODEL BUILDING FOR ML
Ayse Arslan
Oxford Alumni of Northern California, Santa Clara, USA
ABSTRACT
Given the impact of Machine Learning (ML) on individuals and society, understanding how harm might occur throughout the ML life cycle becomes more critical than ever. By offering a framework to determine distinct potential sources of downstream harm in the ML pipeline, the paper demonstrates the importance of choices throughout the distinct phases of data collection, development, and deployment that extend far beyond just model training. Relevant mitigation techniques are also suggested, to be used instead of merely relying on generic notions of what counts as fairness.
KEYWORDS
Fairness in machine learning, societal implications of machine learning, algorithmic bias, AI ethics,
allocative harm, representational harm
1. INTRODUCTION
Artificial Intelligence (AI) refers to the art of creating machines that are able to think and act like
human beings, or to think and act reasonably. Every new technology brings with it questions of
ethics and unintended consequences. Looking closely, we can see technologies like AI reflect
humanity’s imperfections back to us. Technologies like AI can enhance, rather than reduce, the
human experience if humanity can be added back into the digital world. This paper provides a
framework for understanding different sources of harm throughout the ML life cycle in order to
offer techniques for mitigations based on an understanding of the data generation and
development processes rather than relying on generic assumptions of what being fair means.
2. EXISTING WORK
An ML algorithm aims to find patterns in a (usually massive) dataset, and to apply that
knowledge to make a prediction about new data points (e.g., photos, job applicant profiles,
medical records etc.) (Cusumano et al., 2019; Parker, van Alstyne, & Choudary, 2016). As a
result, problems can arise during the data collection, model development, and deployment
processes that can lead to different harmful downstream consequences.
This paper refers to the concept of “harm” or “negative consequences” caused by ML systems.
ML (Machine Learning) can be defined as the overall process of inferring statistically from
existing data in order to generalize to new, unseen data.
Deep reinforcement learning—where machines learn by testing the consequences of their
actions—combines deep neural networks with reinforcement learning, which together can be
trained to achieve goals over many steps. Most machine learning algorithms are good at
perceptive tasks such as recognizing a voice or a face. Yet, deep reinforcement learning can learn
tactical sequences of actions, things like winning a board game or delivering a package. In the
real world, human beings are able to very quickly parse complex scenes in which many aspects of
common sense related to physics, psychology, language and more are simultaneously at play.
Basically, the machine learning process can be divided into the “training phase” and “test phase”:
- During the training phase, the ML team gathers data, selects an ML architecture, and
trains a model. In data poisoning attacks, the attacker inserts manipulated data into the
training dataset. During training, the model tunes its parameters on the poisoned data and
becomes sensitive to the adversarial perturbations they contain. A poisoned model will
have erratic behavior at inference time. Backdoor attacks are a special type of data
poisoning, in which the adversary implants visual patterns in the training data. After
training, the attacker uses those patterns during inference time to trigger specific behavior
in the target ML model.
- In the test phase, the trained model is evaluated on examples it hasn’t seen before. Test
phase or “inference time” attacks are the types of attacks that target the model after
training. An attacker creates an adversarial example by starting with a normal input (e.g.,
an image) and gradually adding noise to it to skew the target model’s output toward the
desired outcome (e.g., a specific output class or general loss of confidence). Another
class of inference-time attacks tries to extract sensitive information from the target
model. If the training data included sensitive information such as credit card numbers or
passwords, these types of attacks can be very damaging. Having direct access to the model also
makes it easier for the attacker to create adversarial examples.
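As an illustration of the inference-time attacks described above, the following is a minimal sketch of a gradient-sign (FGSM-style) perturbation against a toy logistic model; the weights, input, and epsilon are purely illustrative and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One gradient-sign step: move x in the direction that increases the loss."""
    # Gradient of binary cross-entropy w.r.t. the input of a linear model.
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Toy model that labels x positive when the sum of its features is > 0.
w = np.array([1.0, 1.0])
b = 0.0
x = np.array([0.3, 0.2])                    # correctly classified as positive
x_adv = fgsm_perturb(x, 1.0, w, b, eps=0.5)
print(sigmoid(np.dot(w, x) + b) > 0.5)      # True: clean input is positive
print(sigmoid(np.dot(w, x_adv) + b) > 0.5)  # False: small perturbation flips it
```

A small, targeted amount of noise is enough to flip the decision while leaving the input superficially similar, which is why inference-time attacks are hard to spot.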
Models are then built using the training data (not including the held-out validation data).
As seen in Figure 1, a model is defined, and optimized on the training data. Test and benchmark
data is used to evaluate it, and the final model is then integrated into a real-world context. This
process is naturally cyclic, and decisions influenced by models affect the state of the world that
exists the next time data is collected or decisions are applied. The red color indicates where in this
pipeline different sources of downstream harm might arise.
Fig. 1. Overview of ML data generation and model development
2.1. Model Evaluation
After the final model is chosen, the performance of the model on the test data is reported. The test
data is not used before this step, to ensure that the model’s performance is a true representation of
how it performs on unseen data. Aside from the test data, other available datasets — also called
benchmark datasets — may be used to demonstrate model robustness or to enable comparison to
other existing methods.
2.2. Model Post-processing
Once a model has been trained, there are various post-processing steps that may be needed. For
example, if the output of a model performing binary classification is a probability, but the desired
output to display to users is a categorical answer, there remains a choice of what threshold(s) to
use to round the probability to a hard classification.
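This thresholding choice can be sketched as follows; the probabilities and cut-offs are illustrative:

```python
import numpy as np

def to_hard_labels(probs, threshold=0.5):
    """Round predicted probabilities to hard (categorical) classifications."""
    return (np.asarray(probs) >= threshold).astype(int)

probs = [0.2, 0.55, 0.49, 0.9]
print(to_hard_labels(probs))       # cut-off at 0.5 -> [0 1 0 1]
print(to_hard_labels(probs, 0.6))  # stricter cut-off -> [0 0 0 1]
```

Note that borderline cases change class as the threshold moves, so the threshold itself is a consequential design decision, not a mere implementation detail.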
2.3. Model Deployment
There are many steps that arise in deploying a model to a real-world setting. For example, the
model may need to be changed based on requirements for explainability or apparent consistency
of results, or there may need to be built-in mechanisms to integrate real-time feedback.
Importantly, there is no guarantee that the population a model sees as input after it is deployed
(here, we will refer to this as the use population) looks the same as the population in the
development sample.
The algorithms used to parse and analyze those data become commercial black boxes. Barocas et
al. [4] provide a useful framework for thinking about how these consequences actually manifest,
splitting them into allocative harms (when opportunities or resources are withheld from certain
people or groups) and representational harms (when certain people or groups are stigmatized or
stereotyped). For example, algorithms that determine whether someone is offered a loan or a job
[12, 36] risk inflicting allocative harm. We human beings are fallible in making unbiased
decisions ourselves, and algorithms can actually help us detect human-generated (and socially
reinforced) discrimination (Kleinberg et al., 2020; Mullainathan, 2019).
In order for an ML model to work well, the following simple steps can be implemented:
1. Train a classifier on labeled data.
2. This classifier then infers pseudo-labels on a much larger unlabeled dataset.
3. A larger classifier is then trained on the combined labeled and pseudo-labeled data, while
noise is also added.
4. (Optional) Going back to step 2, the new, larger model may be used as the classifier.
One can view this as a form of self-training, because the model generates pseudo-labels with
which it retrains itself to improve performance. One underpinning hypothesis is that the noise
added during training not only helps with the learning, but also makes the model more robust.
This approach is similar to knowledge distillation, which is a process of transferring knowledge
from a large model to a smaller model. The goal of distillation is to improve speed in order to
build a model that is fast to run in production without sacrificing much in quality compared to the
bigger model.
Fig. 2. Simple illustrations of the model and knowledge distillation.
Knowledge distillation does not add noise during training (e.g., data augmentation or model
regularization) and typically involves a smaller inference model. In contrast, the self-training
approach above can be thought of as a process of “knowledge expansion”. One strategy for
training production models is to apply training twice (Fig. 2):
training twice (Fig. 2):
- first to get a larger inference model T’ and then
- to derive a smaller model S.
In some cases, the training may need data augmentation, yet, in certain applications, e.g., natural
language processing, such types of input noise are not readily available. For those applications,
the training model can be simplified to have no noise. In that case, the above two-stage process
becomes a simpler method:
- First, the bigger model infers pseudo-labels on the unlabeled dataset, from which a new
model (T’) of equal or larger size than the original is trained.
- The self-training phase is then followed by knowledge distillation to produce a smaller
model for production.
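The distillation stage usually trains the smaller model on temperature-softened teacher outputs. A minimal sketch of that ingredient, with invented logits and an illustrative temperature:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T gives a softer distribution."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()            # for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Teacher logits for one example. Softening exposes the teacher's relative
# confidence in the wrong classes, which the student can also learn from.
teacher_logits = np.array([6.0, 2.0, 1.0])
hard = softmax(teacher_logits, T=1.0)   # nearly one-hot
soft = softmax(teacher_logits, T=4.0)   # softer targets for the student
print(np.round(hard, 3))
print(np.round(soft, 3))

def distill_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between softened teacher and student distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -np.sum(p * np.log(q))
```

During training, the student minimizes this loss (often combined with the ordinary hard-label loss) so that it approximates the teacher's full output distribution, not just its top prediction.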
3. SOURCES OF HARM IN ML
This section explores each potential source of harm in-depth. Each subsection will detail where
and how in the ML pipeline problems might arise, as well as a characteristic example. These
categories are not mutually exclusive; however, identifying and characterizing each one as
distinct makes them less confusing and easier to tackle.
3.1. Historical Bias
Historical bias arises when the world as it is or was leads to a model that produces harmful
outcomes, even if data is perfectly measured and sampled. Such a system, even if it reflects the world
accurately, can still inflict harm on a population. Considerations of historical bias often involve
evaluating the representational harm (such as reinforcing a stereotype) to a particular group.
3.2. Representation Bias
Representation bias occurs when the development sample under-represents some part of the
population, and subsequently fails to generalize well for a subset of the use population.
Representation bias can arise in several ways:
(1) When defining the target population, if it does not reflect the use population. Data that is
representative of Boston, for example, may not be representative if used to analyze the
population of Indianapolis.
(2) When defining the target population, if it contains under-represented groups. Say the target
population for a particular medical dataset is defined to be adults aged 18-40. There are
minority groups within this population: for example, people who are pregnant may make
up only 5% of the target population.
(3) When sampling from the target population, if the sampling method is limited or uneven.
For example, the target population for modeling an infectious disease might be all adults,
but medical data may be available only for the sample of people who were considered
serious enough to bring in for further screening. As a result, the development sample will
represent a skewed subset of the target population. In statistics, this is typically referred
to as sampling bias.
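The screening example can be illustrated numerically; the severity scores and the cut-off below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target population: disease severity scores for all adults.
population = rng.normal(loc=2.0, scale=1.0, size=100_000)

# Biased sampling function s: only people above a severity cut-off
# (those "brought in for further screening") enter the development sample.
sample = population[population > 3.0]

print(round(population.mean(), 2))  # true population mean
print(round(sample.mean(), 2))      # skewed upward by the sampling rule
```

A model fit to the sample learns the statistics of the screened subgroup, not of the target population it will face at deployment time.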
3.3. Measurement Bias
Measurement bias occurs when choosing, collecting, or computing features and labels to use in a
prediction problem. For example, “creditworthiness” is an abstract construct that is often
operationalized with a measurable proxy like a credit score. Proxies become problematic when
they are poor reflections of the target construct and/or are generated differently across groups,
which can happen when:
(1) The proxy is an oversimplification of a more complex construct. Consider the prediction
problem of deciding whether a student will be successful (e.g., in a college admissions
context). Algorithm designers may resort to a single available label such as “GPA” [28],
which ignores different indicators of success present in different parts of the population.
(2) The method of measurement varies across groups. For example, consider factory workers
at several different locations who are monitored to count the number of errors that occur
(i.e., observed number of errors is being used as a proxy for work quality). This can also
lead to a feedback loop wherein the group is subject to further monitoring because of the
apparent higher rate of mistakes [5, 17].
(3) The accuracy of measurement varies across groups. For example, in medical
applications, “diagnosed with condition X” is often used as a proxy for “has condition
X.” However, structural discrimination can lead to systematically higher rates of
misdiagnosis or underdiagnosis in certain groups [23, 32, 35].
3.4. Aggregation Bias
A particular dataset might represent people or groups with different backgrounds, cultures or
norms, and a given variable can mean something quite different across them. Aggregation bias
can lead to a model that is not optimal for any group, or a model that is fit to the dominant
population (e.g., if there is also representation bias).
3.5. Learning Bias
Learning bias arises when modeling choices amplify performance disparities across different
examples in the data [24]. For example, an important modeling choice is the objective function
that an ML algorithm learns to optimize during training. Typically, these functions encode some
measure of accuracy on the task (e.g., cross-entropy loss for classification problems or mean
squared error for regression problems).
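To see how such an aggregate objective can mask group disparities, consider this toy computation; the group sizes and predicted probabilities are invented:

```python
import numpy as np

def log_loss(y, p):
    """Binary cross-entropy averaged over the given examples."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Predicted probabilities for a majority group (well modeled) and a
# minority group (poorly modeled); true labels are all 1 here for brevity.
y_maj, p_maj = np.ones(90), np.full(90, 0.9)
y_min, p_min = np.ones(10), np.full(10, 0.6)

overall = log_loss(np.concatenate([y_maj, y_min]),
                   np.concatenate([p_maj, p_min]))
print(round(overall, 3))                  # aggregate objective looks fine
print(round(log_loss(y_maj, p_maj), 3))   # majority loss is low
print(round(log_loss(y_min, p_min), 3))   # minority loss is much higher
```

Because the minority group is only 10% of the data, its much larger loss barely moves the aggregate objective, which is exactly how optimizing a single average can amplify disparities.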
3.6. Evaluation Bias
Evaluation bias occurs when the benchmark data used for a particular task does not represent the
use population. Evaluation bias ultimately arises because of a desire to quantitatively compare
models against each other. Such generalizations are often not statistically valid [38], and can lead
to overfitting to a particular benchmark.
3.7. Deployment Bias
Deployment bias arises when there is a mismatch between the problem a model is intended to
solve and the way in which it is actually used. This often occurs when a system is built and
evaluated as if it were fully autonomous, while in reality it operates in a complicated socio-technical
system moderated by institutional structures and human decision-makers (Selbst et al.
[39] refer to this as the “framing trap”).
4. SUGGESTED MODEL AS A MITIGATION TECHNIQUE
Fig. 3. Recommended Model
Figure 3 provides an overview of the suggested mitigation technique. As shown in Figure 3, the
data transformation sequence can be abstracted into a general process A. Let X and Y be the
underlying feature and label constructs we wish to capture, where s : X_N → X_n is the sampling
function. X′ and Y′ are the measured feature and label proxies that are chosen to build a model,
where r and t are the projections from constructs to proxies, i.e., X → X′ and Y → Y′.
The function f_ideal : X → Y is the target function, learned using the ideal constructs from the
target population, while f_actual : X′ → Y′ is the actual function that is learned using proxies
measured from the development sample. Then, the function k computes some evaluation
metric(s) E for f_actual on data X′_m, Y′_m (possibly generated by a different process, e.g., A_eval in
Figure 3).
Given the learned function f_actual, a new input example x, and any external, environmental
information z, a function h governs the real-world decision d that will be made (e.g., a human
decision-maker taking a model’s prediction and making a final decision).
Historical bias is defined by inherent problems with the distribution of X and/or Y across the
entire population. Therefore, solutions that try to adjust s by collecting more data (that then
undergoes the same transformation to X′) will likely be ineffective for either of these issues.
However, it may be possible to combat historical bias by designing s to systematically over- or
under-sample X and Y, leading to a development sample with a different distribution that does not
reflect the same undesirable historical biases.
In contrast, representation bias stems either from the target population definition (X_N, Y_N) or the
sampling function (s). In this case, methods that adjust r or t (e.g., choosing different features or
labels) or g (e.g., changing the objective function) may be misguided. Importantly, solutions that
do address representation bias by adjusting s implicitly assume that r and t are acceptable and
that therefore, improving s will mitigate the harm.
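One way of adjusting s, sketched here under the assumption that group membership is observed, is to oversample under-represented groups until every group appears equally often in the development sample; the data and group proportions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def rebalance(X, groups):
    """Adjust the sampling function s: oversample (with replacement) so
    every group appears as often as the largest group."""
    labels, counts = np.unique(groups, return_counts=True)
    target = counts.max()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(groups == g), size=target, replace=True)
        for g in labels
    ])
    return X[idx], groups[idx]

# A 95/5 split between two groups in the raw development sample.
groups = np.array([0] * 95 + [1] * 5)
X = rng.normal(size=(100, 3))

X_bal, g_bal = rebalance(X, groups)
print(np.bincount(g_bal))  # equal group counts after rebalancing
```

As the text notes, this only mitigates harm under the assumption that the proxies r and t are themselves acceptable; resampling cannot repair poorly chosen features or labels.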
Learning bias is an issue with the way 𝑓 is optimized, and mitigations should target the defined
objective(s) and learning process [24]. In addition, some sources of harm are connected: e.g.,
learning bias can exacerbate performance disparities on under-represented groups, so changing 𝑠
to more equally represent different groups/examples could also help prevent it.
Deployment bias arises when ℎ introduces unexpected behaviour affecting the final decision 𝑑.
Dealing with deployment bias is challenging since the function ℎ is usually determined by
complex real-world institutions or human decision-makers. Mitigating deployment bias might
involve instituting a system of checks and balances in which users balance their faith in model
predictions with other information and judgements [26]. This might be facilitated by choosing an
𝑓 that is human-interpretable, or by developing interfaces that help users understand model
uncertainty and how predictions should be used.
Finally, there is a risk of exploitation by bad actors. Those who intentionally and willfully post
misleading or dangerous material will not be deterred by an algorithmic warning. Instead, they
could use the warnings to help them craft harmful posts that fall just below the threshold of
algorithmic detection.
5. RECOMMENDATIONS
Here is an overview of some challenges and potential solutions regarding the development and
deployment of AI models.
5.1. Simple Models are Effective
If an application only requires detecting the difference between a few different objects with high
certainty, even simple detectors can do the task. Users can benefit greatly once they realize that
their applications can be solved for a fraction of the computational complexity with much simpler
models than what’s on the forefront of research.
5.2. Leverage Existing Models
As proven architectures already exist for almost every application, rather than reinventing the
wheel it is often much easier to start with a network based on one of these architectures. Moreover,
starting with a known model will reduce the amount of time, data, and effort to train a model,
since it’s possible to retrain existing models in a process called ‘transfer learning.’
5.3. Integrate Quantization Early
Quantizing a model down from multi-byte precision to a single byte can multiply inference
speed with little to no degradation in accuracy. For example, frameworks such as PyTorch expose
their own methods for quantizing models, but they’re not always compatible with each other.
Regardless of the approach taken, the aim should be to quantize from the outset of developing the
model in a consistent way.
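The core of single-byte quantization can be sketched framework-independently as an affine (scale and zero-point) mapping; this is a simplified illustration, not any particular framework's API:

```python
import numpy as np

def quantize(w, num_bits=8):
    """Affine (scale + zero-point) quantization of float weights to uint8."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = np.round(qmin - w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map uint8 values back to approximate float weights."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.linspace(-1.0, 1.0, 9).astype(np.float32)
q, s, zp = quantize(w)
w_hat = dequantize(q, s, zp)
print(q.dtype)                            # uint8: 1 byte per weight
print(np.max(np.abs(w - w_hat)) < s)      # error stays below one step
```

The round-trip error is bounded by the quantization step, which is why accuracy often degrades little; doing this consistently from the outset avoids surprises when switching quantization back-ends later.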
6. CONCLUSION
This paper provides a framework for understanding the sources of downstream harm caused by
ML systems to facilitate productive communication around potential issues. By framing sources
of downstream harm through the data generation, model building, evaluation, and deployment
processes, we encourage application-appropriate solutions rather than relying on broad notions of
what is fair. Fairness is not one-size-fits-all; knowledge of an application and engagement with its
stakeholders should inform the identification of these sources.
In practice, ML is an iterative process with a long and complicated feedback loop. This paper
highlighted problems that manifest through this loop, from historical context to the process of
benchmarking models to their final integration into real-world processes.
REFERENCES
[1] Agre, P. E. (1994). Surveillance and capture: Two models of privacy. The Information Society,
10(2), 101–127.
[2] Allen, J. (2016). Topologies of power. Beyond territory and networks. Routledge.
[3] Bratton, B. (2015). The Stack: On software and sovereignty. MIT Press.
[4] Bucher, T. (2018). If...then: Algorithmic power and politics. Oxford University Press.
[5] Castañeda, L., & Selwyn, N. (2018). More than tools? Making sense of the ongoing digitizations of
higher education. International Journal of Educational Technology in Higher Education, 15(1).
[6] Decuypere, M. (2019a). Open Education platforms: Theoretical ideas, digital operations and the
figure of the open learner. European Educational Research Journal, 18(4), 439–460.
[7] Decuypere, M. (2019b). Researching educational apps: ecologies, technologies, subjectivities and
learning regimes. Learning, Media and Technology, 44(4), 414–429.
[8] Decuypere, M. (2019c). STS in/as education: where do we stand and what is there (still) to gain?
Some outlines for a future research agenda. Discourse: Studies in the Cultural Politics of Education,
40(1), 136–145
[9] Dieter, M., Gerlitz, C., Helmond, A., Tkacz, N., Vlist, F., Der, V., & Weltevrede, E. (2018). Store,
interface, package, connection : Methods and propositions for multi-situated app studies. CRC
Media of Cooperation Working Paper Series No 4.
[10] Drucker, J. (2020). Visualization and Interpretation: Humanistic Approaches to Display. MIT Press.
[11] Fedorova, K. (2020). Tactics of Interfacing: Encoding Affect in Art and Technology. MIT Press.
Goriunova, O. (2019). The Digital Subject: People as Data as Persons. Theory, Culture &
Society, 36(6), 125–145.
[12] Ruppert, E. (2020). Population Geometries of Europe: The Topologies of Data Cubes and Grids.
Science, Technology, & Human Values, 45(2), 235–261.
[13] Gulson, K. N., Lewis, S., Lingard, B., Lubienski, C., Takayama, K., & Webb, P. T. (2017). Policy
mobilities and methodology: a proposition for inventive methods in education policy studies.
Critical Studies in Education, 58(2), 224–241.
[14] Gulson, K. N., & Sellar, S. (2019). Emerging data infrastructures and the new topologies of
education policy. Environment and Planning D: Society and Space, 37, 350–366.
[15] Hartong, S. (2020). The power of relation-making: insights into the production and operation of
digital school performance platforms in the US. Critical Studies in Education, 00(00), 1–16.
[16] Hartong, S., & Förschler, A. (2019). Opening the black box of data-based school monitoring: Data
infrastructures, flows and practices in state education agencies. Big Data & Society, 6(1),
[17] Lash, S. (2012). Deforming the Figure: Topology and the Social Imaginary. Theory, Culture &
Society, 29(4-5), 261–287.
[18] Latour, B. (1986). Visualization and cognition: Thinking with eyes and hands. Knowledge &
Society, 6, 1–40. Retrieved from http://hci.ucsd.edu/10/readings/Latour(1986).pdf
[19] Law, J. (2004). After Method: Mess in Social Science Research. Psychology Press.
[20] Lewis, S. (2020). Providing a platform for “what works”: Platform-based governance and the
reshaping of teacher learning through the OECD’s PISA4U. Comparative Education, 56(4).
[21] Lewis, S., & Hardy, I. (2017). Tracking the Topological: The Effects of Standardised Data Upon
Teachers’ Practice. British Journal of Educational Studies, 65(2), 219–238.
[22] Light, B., Burgess, J., & Duguay, S. (2018). The walkthrough method: An approach to the study of
apps. New Media and Society, 20(3), 881–900.
[23] Lindh, M., & Nolin, J. (2016). Information We Collect: Surveillance and Privacy in the
Implementation of Google Apps for Education. European Educational Research Journal, 15(6).
Lury, C., & Day, S. (2019). Algorithmic Personalization as a Mode of Individuation. Theory,
Culture & Society, 36(2), 17–37.
[24] Lury, C., Fensham, R., Heller-Nicholas, A., & Lammes, S. (2018). Routledge Handbook of
Interdisciplinary Research Methods. Routledge.
[25] Lury, C., Parisi, L., & Terranova, T. (2012). Introduction: The Becoming Topological of Culture.
Theory, Culture & Society, 29(4-5), 3–35.
[26] Lury, C., Tironi, M., & Bernasconi, R. (2020). The Social Life of Methods as Epistemic Objects:
Interview with Celia Lury. Diseña, 16, 32–55.
[27] Lury, C., & Wakeford, N. (2012). Introduction: A perpetual inventory. Inventive Methods (pp. 15–
38). Routledge.
[28] Martin, L., & Secor, A. J. (2014). Towards a post-mathematical topology. Progress in Human
Geography, 38(3), 420–438.
[29] Piattoeva, N., & Saari, A. (2020). Rubbing against data infrastructure(s): methodological
explorations on working with(in) the impossibility of exteriority. Journal of Education Policy,
00(00), 1–21.
[30] Plantin, J. C., Lagoze, C., Edwards, P. N., & Sandvig, C. (2018). Infrastructure studies meet
platform studies in the age of Google and Facebook. New Media and Society, 20(1), 293–310.
[31] Prince, R. (2017). Local or global policy? Thinking about policy mobility with assemblage and
topology. Area, 49(3), 335–341.
[32] Ratner, H. (2019). Topologies of Organization: Space in Continuous Deformation. Organization
Studies, 1–18.
[33] Ratner, H., & Gad, C. (2019). Data warehousing organization: Infrastructural experimentation with
educational governance. Organization, 26(4), 537–552.
[34] Ratner, H., & Ruppert, E. (2019). Producing and projecting data: Aesthetic practices of government
data portals. Big Data & Society, 6(2), 1–16.
[35] Ruppert, E., Law, J., & Savage, M. (2013). Reassembling Social Science Methods: The Challenge
of Digital Devices. Theory, Culture & Society, 30(4), 22–46.
[36] Suchman, L. (2012). Configuration. In C. Lury & N. Wakeford (Eds.), Inventive Methods: The
Happening of the Social (pp. 48–60). Taylor and Francis.
[37] Thompson, G., & Cook, I. (2015). Becoming-topologies of education: deformations, networks and
the database effect. Discourse: Studies in the Cultural Politics of Education, 36(5), 732–748.
[38] Thompson, G., & Sellar, S. (2018). Datafication, testing events and the outside of thought.
Learning, Media and Technology, 43(2), 139–151.
[39] van de Oudeweetering, K., & Decuypere, M. (2019). Understanding openness through (in)visible
platform boundaries: a topological study on MOOCs as multiplexes of spaces and times.
International Journal of Educational Technology in Higher Education, 16(1).
[40] van de Oudeweetering, K., & Decuypere, M. (2020). In between hyperboles: forms and formations
in Open Education. Learning, Media and Technology, Advance online publication, 1–18.
[41] Williamson, B. (2017). Learning in the “platform society”: Disassembling an educational data
assemblage. Research in Education, 98(1), 59–82.
AUTHORS
Ayse received her MSc in Internet Studies from the University of Oxford in 2006. She participated in
various research projects for the UN, NATO, and the EU regarding HCI (human-computer interaction).
She completed her doctorate degree in user experience design at Oxford while working as an adjunct
faculty member at Bogazici University in her home town, Istanbul. Ayse also has a degree in Tech
Policy from Cambridge University. Currently, Ayse lives in Silicon Valley, where she works as a
visiting scholar for Google on human-computer interaction design.
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ijesajournal
 
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I...
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I...A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I...
A NOVEL SCHEME FOR ACCURATE REMAINING USEFUL LIFE PREDICTION FOR INDUSTRIAL I...
ijaia
 
IRJET - Employee Performance Prediction System using Data Mining
IRJET - Employee Performance Prediction System using Data MiningIRJET - Employee Performance Prediction System using Data Mining
IRJET - Employee Performance Prediction System using Data Mining
IRJET Journal
 
Identification of important features and data mining classification technique...
Identification of important features and data mining classification technique...Identification of important features and data mining classification technique...
Identification of important features and data mining classification technique...
IJECEIAES
 
Biometric Identification and Authentication Providence using Fingerprint for ...
Biometric Identification and Authentication Providence using Fingerprint for ...Biometric Identification and Authentication Providence using Fingerprint for ...
Biometric Identification and Authentication Providence using Fingerprint for ...
IJECEIAES
 
IRJET-Performance Enhancement in Machine Learning System using Hybrid Bee Col...
IRJET-Performance Enhancement in Machine Learning System using Hybrid Bee Col...IRJET-Performance Enhancement in Machine Learning System using Hybrid Bee Col...
IRJET-Performance Enhancement in Machine Learning System using Hybrid Bee Col...
IRJET Journal
 
Comparative Analysis: Effective Information Retrieval Using Different Learnin...
Comparative Analysis: Effective Information Retrieval Using Different Learnin...Comparative Analysis: Effective Information Retrieval Using Different Learnin...
Comparative Analysis: Effective Information Retrieval Using Different Learnin...
RSIS International
 
A scenario based approach for dealing with
A scenario based approach for dealing withA scenario based approach for dealing with
A scenario based approach for dealing with
ijcsa
 
Prediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural NetworkPrediction of Default Customer in Banking Sector using Artificial Neural Network
Prediction of Default Customer in Banking Sector using Artificial Neural Network
rahulmonikasharma
 
MACHINE LEARNING ALGORITHMS FOR HETEROGENEOUS DATA: A COMPARATIVE STUDY
MACHINE LEARNING ALGORITHMS FOR HETEROGENEOUS DATA: A COMPARATIVE STUDYMACHINE LEARNING ALGORITHMS FOR HETEROGENEOUS DATA: A COMPARATIVE STUDY
MACHINE LEARNING ALGORITHMS FOR HETEROGENEOUS DATA: A COMPARATIVE STUDY
IAEME Publication
 
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
ijtsrd
 
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATIONDEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
DEEP-LEARNING-BASED HUMAN INTENTION PREDICTION WITH DATA AUGMENTATION
ijaia
 
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I...
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I...Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I...
Paper Annotated: SinGAN-Seg: Synthetic Training Data Generation for Medical I...
Devansh16
 
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET- Sentiment Analysis to Segregate Attributes using Machine Learning Tech...
IRJET Journal
 
A novel ensemble modeling for intrusion detection system
A novel ensemble modeling for intrusion detection system A novel ensemble modeling for intrusion detection system
A novel ensemble modeling for intrusion detection system
IJECEIAES
 

Similar to MITIGATION TECHNIQUES TO OVERCOME DATA HARM IN MODEL BUILDING FOR ML (20)

Fake News Detection using Deep Learning
Fake News Detection using Deep LearningFake News Detection using Deep Learning
Fake News Detection using Deep Learning
NIET Journal of Engineering & Technology (NIETJET)
 
Ijatcse71852019
Ijatcse71852019Ijatcse71852019
Ijatcse71852019
loki536577
 
A Survey of Security of Multimodal Biometric Systems
A Survey of Security of Multimodal Biometric SystemsA Survey of Security of Multimodal Biometric Systems
A Survey of Security of Multimodal Biometric Systems
IJERA Editor
 
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
PhD Assistance
 
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
ijsc
 
Adversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A SurveyAdversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A Survey
CSCJournals
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ijesajournal
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ijesajournal
 
22903-56094---conference-presentation.pptx
22903-56094---conference-presentation.pptx22903-56094---conference-presentation.pptx
22903-56094---conference-presentation.pptx
AarthiE9
 
Automated machine learning: the new data science challenge
Automated machine learning: the new data science challengeAutomated machine learning: the new data science challenge
Automated machine learning: the new data science challenge
IJECEIAES
 
A Study on Machine Learning and Its Working
A Study on Machine Learning and Its WorkingA Study on Machine Learning and Its Working
A Study on Machine Learning and Its Working
IJMTST Journal
 
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...
IJCNCJournal
 
Machine Learning The Powerhouse of AI Explained.pdf
Machine Learning The Powerhouse of AI Explained.pdfMachine Learning The Powerhouse of AI Explained.pdf
Machine Learning The Powerhouse of AI Explained.pdf
CIO Look Magazine
 
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
eswaralaldevadoss
 
what-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfwhat-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdf
Temok IT Services
 
Secure Machine Learning: Overcoming Algorithmic Bias, Adversarial ML, and Oth...
Secure Machine Learning: Overcoming Algorithmic Bias, Adversarial ML, and Oth...Secure Machine Learning: Overcoming Algorithmic Bias, Adversarial ML, and Oth...
Secure Machine Learning: Overcoming Algorithmic Bias, Adversarial ML, and Oth...
Hunter Carlisle
 
Advanced Machine Learning- Introduction to Machine Learning
Advanced Machine Learning- Introduction to Machine LearningAdvanced Machine Learning- Introduction to Machine Learning
Advanced Machine Learning- Introduction to Machine Learning
Alamelu
 
Intrusion detection using incremental
Intrusion detection using incrementalIntrusion detection using incremental
Intrusion detection using incremental
ijmpict
 
IRJET- Breast Cancer Prediction using Deep Learning
IRJET-  	  Breast Cancer Prediction using Deep LearningIRJET-  	  Breast Cancer Prediction using Deep Learning
IRJET- Breast Cancer Prediction using Deep Learning
IRJET Journal
 
How to choose the right AI model for your application?
How to choose the right AI model for your application?How to choose the right AI model for your application?
How to choose the right AI model for your application?
Benjaminlapid1
 
Ijatcse71852019
Ijatcse71852019Ijatcse71852019
Ijatcse71852019
loki536577
 
A Survey of Security of Multimodal Biometric Systems
A Survey of Security of Multimodal Biometric SystemsA Survey of Security of Multimodal Biometric Systems
A Survey of Security of Multimodal Biometric Systems
IJERA Editor
 
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
Machine Learning On Big Data: Opportunities And Challenges- Future Research D...
PhD Assistance
 
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
ijsc
 
Adversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A SurveyAdversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A Survey
CSCJournals
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ijesajournal
 
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCEANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ANALYSIS OF SYSTEM ON CHIP DESIGN USING ARTIFICIAL INTELLIGENCE
ijesajournal
 
22903-56094---conference-presentation.pptx
22903-56094---conference-presentation.pptx22903-56094---conference-presentation.pptx
22903-56094---conference-presentation.pptx
AarthiE9
 
Automated machine learning: the new data science challenge
Automated machine learning: the new data science challengeAutomated machine learning: the new data science challenge
Automated machine learning: the new data science challenge
IJECEIAES
 
A Study on Machine Learning and Its Working
A Study on Machine Learning and Its WorkingA Study on Machine Learning and Its Working
A Study on Machine Learning and Its Working
IJMTST Journal
 
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...
ADDRESSING IMBALANCED CLASSES PROBLEM OF INTRUSION DETECTION SYSTEM USING WEI...
IJCNCJournal
 
Machine Learning The Powerhouse of AI Explained.pdf
Machine Learning The Powerhouse of AI Explained.pdfMachine Learning The Powerhouse of AI Explained.pdf
Machine Learning The Powerhouse of AI Explained.pdf
CIO Look Magazine
 
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
Unlocking the Potential of Artificial Intelligence_ Machine Learning in Pract...
eswaralaldevadoss
 
what-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdfwhat-is-machine-learning-and-its-importance-in-todays-world.pdf
what-is-machine-learning-and-its-importance-in-todays-world.pdf
Temok IT Services
 
Secure Machine Learning: Overcoming Algorithmic Bias, Adversarial ML, and Oth...
Secure Machine Learning: Overcoming Algorithmic Bias, Adversarial ML, and Oth...Secure Machine Learning: Overcoming Algorithmic Bias, Adversarial ML, and Oth...
Secure Machine Learning: Overcoming Algorithmic Bias, Adversarial ML, and Oth...
Hunter Carlisle
 
Advanced Machine Learning- Introduction to Machine Learning
Advanced Machine Learning- Introduction to Machine LearningAdvanced Machine Learning- Introduction to Machine Learning
Advanced Machine Learning- Introduction to Machine Learning
Alamelu
 
Intrusion detection using incremental
Intrusion detection using incrementalIntrusion detection using incremental
Intrusion detection using incremental
ijmpict
 
IRJET- Breast Cancer Prediction using Deep Learning
IRJET-  	  Breast Cancer Prediction using Deep LearningIRJET-  	  Breast Cancer Prediction using Deep Learning
IRJET- Breast Cancer Prediction using Deep Learning
IRJET Journal
 
How to choose the right AI model for your application?
How to choose the right AI model for your application?How to choose the right AI model for your application?
How to choose the right AI model for your application?
Benjaminlapid1
 
Ad

Recently uploaded (20)

David Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And PythonDavid Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And Python
David Boutry
 
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
SanjeetMishra29
 
OPTIMIZING DATA INTEROPERABILITY IN AGILE ORGANIZATIONS: INTEGRATING NONAKA’S...
OPTIMIZING DATA INTEROPERABILITY IN AGILE ORGANIZATIONS: INTEGRATING NONAKA’S...OPTIMIZING DATA INTEROPERABILITY IN AGILE ORGANIZATIONS: INTEGRATING NONAKA’S...
OPTIMIZING DATA INTEROPERABILITY IN AGILE ORGANIZATIONS: INTEGRATING NONAKA’S...
ijdmsjournal
 
Optimizing Reinforced Concrete Cantilever Retaining Walls Using Gases Brownia...
Optimizing Reinforced Concrete Cantilever Retaining Walls Using Gases Brownia...Optimizing Reinforced Concrete Cantilever Retaining Walls Using Gases Brownia...
Optimizing Reinforced Concrete Cantilever Retaining Walls Using Gases Brownia...
Journal of Soft Computing in Civil Engineering
 
Zeiss-Ultra-Optimeter metrology subject.pdf
Zeiss-Ultra-Optimeter metrology subject.pdfZeiss-Ultra-Optimeter metrology subject.pdf
Zeiss-Ultra-Optimeter metrology subject.pdf
Saikumar174642
 
Water Industry Process Automation & Control Monthly May 2025
Water Industry Process Automation & Control Monthly May 2025Water Industry Process Automation & Control Monthly May 2025
Water Industry Process Automation & Control Monthly May 2025
Water Industry Process Automation & Control
 
Understand water laser communication using Arduino laser and solar panel
Understand water laser communication using Arduino laser and solar panelUnderstand water laser communication using Arduino laser and solar panel
Understand water laser communication using Arduino laser and solar panel
NaveenBotsa
 
acid base ppt and their specific application in food
acid base ppt and their specific application in foodacid base ppt and their specific application in food
acid base ppt and their specific application in food
Fatehatun Noor
 
Slide share PPT of SOx control technologies.pptx
Slide share PPT of SOx control technologies.pptxSlide share PPT of SOx control technologies.pptx
Slide share PPT of SOx control technologies.pptx
vvsasane
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
[PyCon US 2025] Scaling the Mountain_ A Framework for Tackling Large-Scale Te...
[PyCon US 2025] Scaling the Mountain_ A Framework for Tackling Large-Scale Te...[PyCon US 2025] Scaling the Mountain_ A Framework for Tackling Large-Scale Te...
[PyCon US 2025] Scaling the Mountain_ A Framework for Tackling Large-Scale Te...
Jimmy Lai
 
AI-Powered Data Management and Governance in Retail
AI-Powered Data Management and Governance in RetailAI-Powered Data Management and Governance in Retail
AI-Powered Data Management and Governance in Retail
IJDKP
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
speedcomcyber25
 
22PCOAM16_MACHINE_LEARNING_UNIT_IV_NOTES_with_QB
22PCOAM16_MACHINE_LEARNING_UNIT_IV_NOTES_with_QB22PCOAM16_MACHINE_LEARNING_UNIT_IV_NOTES_with_QB
22PCOAM16_MACHINE_LEARNING_UNIT_IV_NOTES_with_QB
Guru Nanak Technical Institutions
 
Dahua Smart Cityyyyyyyyyyyyyyyyyy2025.pdf
Dahua Smart Cityyyyyyyyyyyyyyyyyy2025.pdfDahua Smart Cityyyyyyyyyyyyyyyyyy2025.pdf
Dahua Smart Cityyyyyyyyyyyyyyyyyy2025.pdf
PawachMetharattanara
 
Automatic Quality Assessment for Speech and Beyond
Automatic Quality Assessment for Speech and BeyondAutomatic Quality Assessment for Speech and Beyond
Automatic Quality Assessment for Speech and Beyond
NU_I_TODALAB
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
PawachMetharattanara
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
David Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And PythonDavid Boutry - Specializes In AWS, Microservices And Python
David Boutry - Specializes In AWS, Microservices And Python
David Boutry
 
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
🚀 TDX Bengaluru 2025 Unwrapped: Key Highlights, Innovations & Trailblazer Tak...
SanjeetMishra29
 
OPTIMIZING DATA INTEROPERABILITY IN AGILE ORGANIZATIONS: INTEGRATING NONAKA’S...
OPTIMIZING DATA INTEROPERABILITY IN AGILE ORGANIZATIONS: INTEGRATING NONAKA’S...OPTIMIZING DATA INTEROPERABILITY IN AGILE ORGANIZATIONS: INTEGRATING NONAKA’S...
OPTIMIZING DATA INTEROPERABILITY IN AGILE ORGANIZATIONS: INTEGRATING NONAKA’S...
ijdmsjournal
 
Zeiss-Ultra-Optimeter metrology subject.pdf
Zeiss-Ultra-Optimeter metrology subject.pdfZeiss-Ultra-Optimeter metrology subject.pdf
Zeiss-Ultra-Optimeter metrology subject.pdf
Saikumar174642
 
Understand water laser communication using Arduino laser and solar panel
Understand water laser communication using Arduino laser and solar panelUnderstand water laser communication using Arduino laser and solar panel
Understand water laser communication using Arduino laser and solar panel
NaveenBotsa
 
acid base ppt and their specific application in food
acid base ppt and their specific application in foodacid base ppt and their specific application in food
acid base ppt and their specific application in food
Fatehatun Noor
 
Slide share PPT of SOx control technologies.pptx
Slide share PPT of SOx control technologies.pptxSlide share PPT of SOx control technologies.pptx
Slide share PPT of SOx control technologies.pptx
vvsasane
 
Construction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil EngineeringConstruction Materials (Paints) in Civil Engineering
Construction Materials (Paints) in Civil Engineering
Lavish Kashyap
 
[PyCon US 2025] Scaling the Mountain_ A Framework for Tackling Large-Scale Te...
[PyCon US 2025] Scaling the Mountain_ A Framework for Tackling Large-Scale Te...[PyCon US 2025] Scaling the Mountain_ A Framework for Tackling Large-Scale Te...
[PyCon US 2025] Scaling the Mountain_ A Framework for Tackling Large-Scale Te...
Jimmy Lai
 
AI-Powered Data Management and Governance in Retail
AI-Powered Data Management and Governance in RetailAI-Powered Data Management and Governance in Retail
AI-Powered Data Management and Governance in Retail
IJDKP
 
Machine foundation notes for civil engineering students
Machine foundation notes for civil engineering studentsMachine foundation notes for civil engineering students
Machine foundation notes for civil engineering students
DYPCET
 
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx698642933-DdocfordownloadEEP-FAKE-PPT.pptx
698642933-DdocfordownloadEEP-FAKE-PPT.pptx
speedcomcyber25
 
Dahua Smart Cityyyyyyyyyyyyyyyyyy2025.pdf
Dahua Smart Cityyyyyyyyyyyyyyyyyy2025.pdfDahua Smart Cityyyyyyyyyyyyyyyyyy2025.pdf
Dahua Smart Cityyyyyyyyyyyyyyyyyy2025.pdf
PawachMetharattanara
 
Automatic Quality Assessment for Speech and Beyond
Automatic Quality Assessment for Speech and BeyondAutomatic Quality Assessment for Speech and Beyond
Automatic Quality Assessment for Speech and Beyond
NU_I_TODALAB
 
Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025Transport modelling at SBB, presentation at EPFL in 2025
Transport modelling at SBB, presentation at EPFL in 2025
Antonin Danalet
 
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
01.คุณลักษณะเฉพาะของอุปกรณ์_pagenumber.pdf
PawachMetharattanara
 
Personal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.pptPersonal Protective Efsgfgsffquipment.ppt
Personal Protective Efsgfgsffquipment.ppt
ganjangbegu579
 
Ad

MITIGATION TECHNIQUES TO OVERCOME DATA HARM IN MODEL BUILDING FOR ML

medical records, etc.)
(Cusumano et al., 2019; Parker, van Alstyne, & Choudary, 2016). As a result, problems arising during the data collection, model development, and deployment processes can lead to a range of harmful downstream consequences. Throughout, this paper uses "harm" and "negative consequences" to refer to the damage caused by ML systems. Machine Learning (ML) can be defined as the process of inferring statistically from existing data in order to generalize to new, unseen data. Deep reinforcement learning, in which machines learn by testing the consequences of their actions, combines deep neural networks with reinforcement learning and can be trained to achieve goals over many steps. Most machine learning algorithms are good at perceptual tasks such as recognizing a voice or a face. Deep reinforcement learning, however, can learn
tactical sequences of actions, such as winning a board game or delivering a package. In the real world, human beings can very quickly parse complex scenes in which many aspects of common sense related to physics, psychology, language, and more are simultaneously at play.

The machine learning process can be divided into a "training phase" and a "test phase":

- During the training phase, the ML team gathers data, selects an ML architecture, and trains a model. In data poisoning attacks, an attacker inserts manipulated data into the training dataset. During training, the model tunes its parameters on the poisoned data and becomes sensitive to the adversarial perturbations it contains; a poisoned model behaves erratically at inference time. Backdoor attacks are a special type of data poisoning in which the adversary implants visual patterns in the training data, then uses those patterns at inference time to trigger specific behavior in the target ML model.
- In the test phase, the trained model is evaluated on examples it has not seen before. Test-phase or "inference-time" attacks target the model after training. An attacker creates an adversarial example by starting with a normal input (e.g., an image) and gradually adding noise to skew the target model's output toward a desired outcome (e.g., a specific output class or a general loss of confidence). Another class of inference-time attacks tries to extract sensitive information from the target model; if the training data included sensitive information such as credit card numbers or passwords, these attacks can be very damaging. Direct access to the model also makes it easier for an attacker to create adversarial examples.
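To make the inference-time attack concrete, the following is a minimal sketch of a gradient-based adversarial perturbation against a toy logistic-regression classifier. The weights, inputs, and the `fgsm_perturb` helper are illustrative assumptions, not part of the paper; the idea of stepping along the sign of the input gradient follows the widely used fast-gradient-sign approach.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_perturb(x, w, b, y_true, eps=0.5):
    """Perturb input x to raise the logistic loss of the model (w, b).

    For logistic regression, d(loss)/d(x_i) = (p - y) * w_i, so stepping
    in the sign of that gradient pushes the prediction away from y_true.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)                                  # model's probability of class 1
    grad = [(p - y_true) * wi for wi in w]          # gradient of the loss w.r.t. x
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Toy model: classifies by a weighted sum of two features.
w, b = [2.0, 0.0], 0.0
x = [0.3, 1.0]                                      # clean input, classified as class 1
x_adv = fgsm_perturb(x, w, b, y_true=1.0)

p_clean = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
p_adv = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
```

With a large enough step size the perturbation flips the predicted class, even though only one feature moves by half a unit.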
Models are then built using the training data (excluding the held-out validation data). As seen in Figure 1, a model is defined and optimized on the training data. Test and benchmark data are used to evaluate it, and the final model is then integrated into a real-world context. This process is naturally cyclic: decisions influenced by models affect the state of the world that exists the next time data is collected or decisions are applied. The red color indicates where in this pipeline different sources of downstream harm might arise.
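The split-train-select-report cycle just described can be sketched minimally as follows. The toy data, candidate models, and the `split` and `mse` helpers are illustrative assumptions rather than the paper's method; the point is that validation data drives model selection while test data is touched only once, at reporting time.

```python
import random

def split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle and split into train / validation / held-out test sets."""
    data = data[:]
    random.Random(seed).shuffle(data)
    a = int(len(data) * train_frac)
    b = a + int(len(data) * val_frac)
    return data[:a], data[a:b], data[b:]

def mse(model, data):
    """Mean squared error of a model over (input, target) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Toy data generated by y = 2x, and two candidate "trained" models.
data = [(float(x), 2.0 * x) for x in range(20)]
train_set, val_set, test_set = split(data)

candidates = [lambda x: 2.0 * x, lambda x: x + 3.0]
best = min(candidates, key=lambda m: mse(m, val_set))  # selection uses validation only
test_error = mse(best, test_set)                       # reported once, on unseen data
```

Keeping the test split out of every earlier decision is what makes the reported error an honest estimate of performance on unseen data.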
Fig. 1. Overview of ML data generation and model development

2.1. Model Evaluation

After the final model is chosen, its performance on the test data is reported. The test data is not used before this step, to ensure that the model's reported performance is a true representation of how it performs on unseen data. Aside from the test data, other available datasets, also called benchmark datasets, may be used to demonstrate model robustness or to enable comparison to other existing methods.

2.2. Model Post-processing

Once a model has been trained, various post-processing steps may be needed. For example, if the output of a binary classification model is a probability but the desired output to display to users is a categorical answer, there remains a choice of what threshold(s) to use to round the probability to a hard classification.

2.3. Model Deployment

Many steps arise in deploying a model to a real-world setting. For example, the model may need to be changed based on requirements for explainability or apparent consistency of results, or built-in mechanisms may be needed to integrate real-time feedback. Importantly, there is no guarantee that the population a model sees as input after it is deployed (referred to here as the use population) looks the same as the population in the development sample. The algorithms used to parse and analyze those data become commercial black boxes. Barocas et al. [4] provide a useful framework for thinking about how these consequences actually manifest,
splitting them into allocative harms (when opportunities or resources are withheld from certain people or groups) and representational harms (when certain people or groups are stigmatized or stereotyped). For example, algorithms that determine whether someone is offered a loan or a job [12, 36] risk inflicting allocative harm. We human beings are fallible in making unbiased decisions ourselves, and algorithms can actually help us detect human-generated (and socially reinforced) discrimination (Kleinberg et al., 2020; Mullainathan, 2019).

In order for an ML model to work well, the following simple steps can be implemented:
1. Train a classifier on labeled data.
2. Use this classifier to infer pseudo-labels on a much larger unlabeled dataset.
3. Train a larger classifier on the combined labeled and pseudo-labeled data, while also adding noise.
4. (Optional) Go back to step 2, using the newly trained model as the classifier.

One can view this as a form of self-training, because the model generates pseudo-labels with which it retrains itself to improve performance. One underpinning hypothesis is that the noise added during training not only helps with the learning, but also makes the model more robust. This approach is similar to knowledge distillation, a process of transferring knowledge from a large model to a smaller one. The goal of distillation is to improve inference speed, building a model that is fast to run in production without sacrificing much quality compared to the bigger model.

Fig. 2. Simple illustrations of the model and knowledge distillation.

Knowledge distillation does not add noise during training (e.g., data augmentation or model regularization) and typically involves a smaller inference model. In contrast, the self-training approach above can be thought of as a process of "knowledge expansion".
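The four self-training steps above can be sketched with a toy nearest-centroid classifier standing in for the real networks. All names and data here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def fit_centroids(X, y):
    """'Train' a nearest-centroid classifier: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    """Assign each point the class of its nearest centroid."""
    classes = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

# Step 1: train on a small labeled set.
X_lab = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_lab = np.array([0, 0, 1, 1])
teacher = fit_centroids(X_lab, y_lab)

# Step 2: infer pseudo-labels on a larger unlabeled set.
X_unlab = np.array([[0.1, -0.1], [1.1, 0.9], [0.0, 0.2], [0.8, 1.0]])
pseudo = predict(teacher, X_unlab)

# Step 3: retrain on labeled + pseudo-labeled data, adding input noise.
rng = np.random.default_rng(0)
X_all = np.vstack([X_lab, X_unlab + rng.normal(0, 0.05, X_unlab.shape)])
y_all = np.concatenate([y_lab, pseudo])
student = fit_centroids(X_all, y_all)
```

Step 4 would simply repeat steps 2 and 3 with `student` in place of `teacher`.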
One strategy for training production models is to apply training twice (Fig. 2): first to obtain a larger inference model T′, and then to derive a smaller model S. In some cases, training may require data augmentation; yet in certain applications, e.g., natural language processing, such input noise is not readily available. For those applications, the training can be simplified to use no noise. In that case, the above two-stage process becomes a simpler method:
- First, the bigger model infers pseudo-labels on the unlabeled dataset, from which a new model T′, of equal or larger size than the original, is trained.
- This self-training phase is then followed by knowledge distillation to produce a smaller model S for production.
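The distillation step can be illustrated with the standard softened-softmax objective: the student is trained to match the teacher's output distribution rather than hard labels. The temperature parameter T and the helper names are assumptions for illustration:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T softens the distribution."""
    z = np.exp((logits - logits.max()) / T)
    return z / z.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """Cross-entropy between the teacher's softened output distribution
    and the student's: the quantity a distilled model minimizes."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -np.sum(p_teacher * np.log(p_student + 1e-12))
```

The loss is smallest when the student reproduces the teacher's distribution exactly, which is what lets a small production model approach the bigger model's quality.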
3. SOURCES OF HARM IN ML

This section explores each potential source of harm in depth. Each subsection details where and how in the ML pipeline problems might arise, along with a characteristic example. These categories are not mutually exclusive; however, identifying and characterizing each one as distinct makes them easier to tackle.

3.1. Historical Bias

Historical bias arises, even if data is perfectly measured and sampled, when the world as it is or was leads to a model that produces harmful outcomes. Such a system, even if it reflects the world accurately, can still inflict harm on a population. Considerations of historical bias often involve evaluating the representational harm (such as reinforcing a stereotype) to a particular group.

3.2. Representation Bias

Representation bias occurs when the development sample under-represents some part of the population, and subsequently fails to generalize well for a subset of the use population. Representation bias can arise in several ways:
(1) When defining the target population, if it does not reflect the use population. Data that is representative of Boston, for example, may not be representative if used to analyze the population of Indianapolis.
(2) When defining the target population, if it contains under-represented groups. Say the target population for a particular medical dataset is defined to be adults aged 18-40. There are minority groups within this population: for example, people who are pregnant may make up only 5% of the target population.
(3) When sampling from the target population, if the sampling method is limited or uneven. For example, the target population for modeling an infectious disease might be all adults, but medical data may be available only for the sample of people who were considered serious enough to bring in for further screening.
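A simple numerical check for under-representation, as in case (2) above, is to compare each group's share of the development sample with its share of the target population. The helper and the numbers below are hypothetical:

```python
def representation_gap(sample_counts, population_shares):
    """Compare each group's share of the development sample against its
    share of the target population. Large negative gaps flag
    under-representation."""
    total = sum(sample_counts.values())
    return {g: sample_counts[g] / total - population_shares[g]
            for g in population_shares}

# Hypothetical numbers: pregnant people are 5% of the target population
# but only 1% of the development sample.
gaps = representation_gap({"pregnant": 10, "other": 990},
                          {"pregnant": 0.05, "other": 0.95})
```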
As a result, the development sample will represent a skewed subset of the target population. In statistics, this is typically referred to as sampling bias.

3.3. Measurement Bias

Measurement bias occurs when choosing, collecting, or computing the features and labels to use in a prediction problem. For example, "creditworthiness" is an abstract construct that is often operationalized with a measurable proxy like a credit score. Proxies become problematic when they are poor reflections of the target construct and/or are generated differently across groups, which can happen when:
(1) The proxy is an oversimplification of a more complex construct. Consider the prediction problem of deciding whether a student will be successful (e.g., in a college admissions context). Algorithm designers may resort to a single available label such as "GPA" [28], which ignores different indicators of success present in different parts of the population.
(2) The method of measurement varies across groups. For example, consider factory workers at several different locations who are monitored to count the number of errors that occur
(i.e., the observed number of errors is used as a proxy for work quality). This can also lead to a feedback loop wherein the group is subjected to further monitoring because of the apparent higher rate of mistakes [5, 17].
(3) The accuracy of measurement varies across groups. For example, in medical applications, "diagnosed with condition X" is often used as a proxy for "has condition X." However, structural discrimination can lead to systematically higher rates of misdiagnosis or underdiagnosis in certain groups [23, 32, 35].

3.4. Aggregation Bias

A particular dataset might represent people or groups with different backgrounds, cultures, or norms, and a given variable can mean something quite different across them. Aggregation bias can lead to a model that is not optimal for any group, or a model that is fit to the dominant population (e.g., if there is also representation bias).

3.5. Learning Bias

Learning bias arises when modelling choices amplify performance disparities across different examples in the data [24]. For example, an important modelling choice is the objective function that an ML algorithm learns to optimize during training. Typically, these functions encode some measure of accuracy on the task (e.g., cross-entropy loss for classification problems or mean squared error for regression problems).

3.6. Evaluation Bias

Evaluation bias occurs when the benchmark data used for a particular task does not represent the use population. Evaluation bias ultimately arises from a desire to quantitatively compare models against each other. Such generalizations are often not statistically valid [38], and can lead to overfitting to a particular benchmark.

3.7. Deployment Bias

Deployment bias arises when there is a mismatch between the problem a model is intended to solve and the way in which it is actually used.
This often occurs when a system is built and evaluated as if it were fully autonomous, while in reality it operates in a complicated socio-technical system moderated by institutional structures and human decision-makers (Selbst et al. [39] refer to this as the "framing trap").
4. SUGGESTED MODEL AS A MITIGATION TECHNIQUE

Fig. 3. Recommended Model

Figure 3 provides an overview of the suggested mitigation technique. As shown in Figure 3, the data transformation sequence can be abstracted into a general process 𝐴. Let 𝑋 and 𝑌 be the underlying feature and label constructs we wish to capture, where 𝑠 : 𝑋𝑁 → 𝑋𝑛 is the sampling function. 𝑋′ and 𝑌′ are the measured feature and label proxies that are chosen to build a model, where 𝑟 and 𝑡 are the projections from constructs to proxies, i.e., 𝑋 → 𝑋′ and 𝑌 → 𝑌′. The function 𝑓ideal : 𝑋 → 𝑌 is the target function, learned using the ideal constructs from the target population, while 𝑓actual : 𝑋′ → 𝑌′ is the actual function that is learned using proxies measured from the development sample. The function 𝑘 then computes some evaluation metric(s) 𝐸 for 𝑓actual on data 𝑋′𝑚, 𝑌′𝑚 (possibly generated by a different process, e.g., 𝐴eval in Figure 3). Given the learned function 𝑓actual, a new input example 𝑥, and any external, environmental information 𝑧, a function ℎ governs the real-world decision 𝑑 that will be made (e.g., a human decision-maker taking a model's prediction and making a final decision).

Historical bias is defined by inherent problems with the distribution of 𝑋 and/or 𝑌 across the entire population. Therefore, solutions that try to adjust 𝑠 by collecting more data (which then undergoes the same transformation to 𝑋′) will likely be ineffective for these issues. However, it may be possible to combat historical bias by designing 𝑠 to systematically over- or under-sample 𝑋 and 𝑌, leading to a development sample with a different distribution that does not reflect the same undesirable historical biases. In contrast, representation bias stems either from the target population definition (𝑋𝑁, 𝑌𝑁) or the sampling function (𝑠).
In this case, methods that adjust 𝑟 or 𝑡 (e.g., choosing different features or labels) or 𝑔 (e.g., changing the objective function) may be misguided. Importantly, solutions that do address representation bias by adjusting 𝑠 implicitly assume that 𝑟 and 𝑡 are acceptable and that therefore, improving 𝑠 will mitigate the harm.
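What adjusting 𝑠 might look like in code: stratified resampling of the development sample toward chosen group shares. The helper name, the shares, and the data are assumptions for illustration:

```python
import numpy as np

def resample_to_shares(X, groups, desired_shares, n_out, seed=0):
    """Redesign the sampling function s: draw a development sample whose
    group shares match desired_shares rather than the historical ones."""
    rng = np.random.default_rng(seed)
    out = []
    for g, share in desired_shares.items():
        pool = np.flatnonzero(groups == g)
        take = int(round(share * n_out))
        out.append(rng.choice(pool, size=take, replace=True))
    idx = np.concatenate(out)
    return X[idx], groups[idx]

# Historically skewed sample: 90% group "a", 10% group "b".
groups = np.array(["a"] * 90 + ["b"] * 10)
X = np.arange(100).reshape(-1, 1)
X_bal, g_bal = resample_to_shares(X, groups, {"a": 0.5, "b": 0.5}, 100)
```

Note that this only changes the distribution of the sample; it cannot repair proxies, so it assumes 𝑟 and 𝑡 are acceptable, as discussed above.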
Learning bias is an issue with the way 𝑓 is optimized, and mitigations should target the defined objective(s) and learning process [24]. In addition, some sources of harm are connected: e.g., learning bias can exacerbate performance disparities on under-represented groups, so changing 𝑠 to represent different groups/examples more equally could also help prevent it.

Deployment bias arises when ℎ introduces unexpected behaviour affecting the final decision 𝑑. Dealing with deployment bias is challenging since the function ℎ is usually determined by complex real-world institutions or human decision-makers. Mitigating deployment bias might involve instituting a system of checks and balances in which users balance their faith in model predictions with other information and judgements [26]. This might be facilitated by choosing an 𝑓 that is human-interpretable, or by developing interfaces that help users understand model uncertainty and how predictions should be used.

Finally, there is a risk of exploitation by bad actors. Those who intentionally and willfully post misleading or dangerous material will not be deterred by an algorithmic warning. Instead, they could use the warnings to help them craft harmful posts that fall just below the threshold of algorithmic detection.

5. RECOMMENDATIONS

Here is an overview of some challenges and potential solutions regarding the development and deployment of AI models.

5.1. Simple Models are Effective

If an application only requires detecting the difference between a few different objects with high certainty, even simple detectors can do the task. Users can benefit greatly once they realize that their applications can be solved for a fraction of the computational complexity with much simpler models than what is at the forefront of research.

5.2. Leverage Existing Models

As trained architectures already exist for almost every application, rather than reinventing the wheel, it is often much easier to start with a network based on one of these architectures. Moreover, starting with a known model reduces the amount of time, data, and effort needed for training, since it is possible to retrain existing models in a process called 'transfer learning.'

5.3. Integrate Quantization Early

Quantizing a model down from multi-byte precision to a single byte can multiply inference speed with little to no degradation in accuracy. Frameworks such as PyTorch expose their own methods for quantizing models, but these methods are not always compatible with each other. Regardless of the approach taken, the aim should be to quantize from the outset of model development in a consistent way.

6. CONCLUSION

This paper provides a framework for understanding the sources of downstream harm caused by ML systems, in order to facilitate productive communication around potential issues. By framing sources of downstream harm through the data generation, model building, evaluation, and deployment processes, we encourage application-appropriate solutions rather than relying on broad notions of
what is fair. Fairness is not one-size-fits-all; knowledge of an application and engagement with its stakeholders should inform the identification of these sources. In practice, ML is an iterative process with a long and complicated feedback loop. This paper highlighted problems that manifest through this loop, from historical context to the process of benchmarking models to their final integration into real-world processes.

REFERENCES

[1] Agre, P. E. (1994). Surveillance and capture: Two models of privacy. The Information Society, 10(2), 101–127.
[2] Allen, J. (2016). Topologies of power: Beyond territory and networks. Routledge.
[3] Bratton, B. (2015). The Stack: On software and sovereignty. MIT Press.
[4] Bucher, T. (2018). If...then: Algorithmic power and politics. Oxford University Press.
[5] Castañeda, L., & Selwyn, N. (2018). More than tools? Making sense of the ongoing digitizations of higher education. International Journal of Educational Technology in Higher Education, 15(1).
[6] Decuypere, M. (2019a). Open Education platforms: Theoretical ideas, digital operations and the figure of the open learner. European Educational Research Journal, 18(4), 439–460.
[7] Decuypere, M. (2019b). Researching educational apps: ecologies, technologies, subjectivities and learning regimes. Learning, Media and Technology, 44(4), 414–429.
[8] Decuypere, M. (2019c). STS in/as education: where do we stand and what is there (still) to gain? Some outlines for a future research agenda. Discourse: Studies in the Cultural Politics of Education, 40(1), 136–145.
[9] Dieter, M., Gerlitz, C., Helmond, A., Tkacz, N., van der Vlist, F., & Weltevrede, E. (2018). Store, interface, package, connection: Methods and propositions for multi-situated app studies. CRC Media of Cooperation Working Paper Series No 4.
[10] Drucker, J. (2020).
Visualization and Interpretation: Humanistic Approaches to Display. MIT Press.
[11] Fedorova, K. (2020). Tactics of Interfacing: Encoding Affect in Art and Technology. MIT Press. Goriunova, O. (2019). The Digital Subject: People as Data as Persons. Theory, Culture & Society, 36(6), 125–145.
[12] Ruppert, E. (2020). Population Geometries of Europe: The Topologies of Data Cubes and Grids. Science, Technology, & Human Values, 45(2), 235–261.
[13] Gulson, K. N., Lewis, S., Lingard, B., Lubienski, C., Takayama, K., & Webb, P. T. (2017). Policy mobilities and methodology: a proposition for inventive methods in education policy studies. Critical Studies in Education, 58(2), 224–241.
[14] Gulson, K. N., & Sellar, S. (2019). Emerging data infrastructures and the new topologies of education policy. Environment and Planning D: Society and Space, 37, 350–366.
[15] Hartong, S. (2020). The power of relation-making: insights into the production and operation of digital school performance platforms in the US. Critical Studies in Education, 1–16.
[16] Hartong, S., & Förschler, A. (2019). Opening the black box of data-based school monitoring: Data infrastructures, flows and practices in state education agencies. Big Data & Society, 6(1).
[17] Lash, S. (2012). Deforming the Figure: Topology and the Social Imaginary. Theory, Culture & Society, 29(4-5), 261–287.
[18] Latour, B. (1986). Visualization and cognition: Thinking with eyes and hands. Knowledge & Society, 6, 1–40. Retrieved from http://hci.ucsd.edu/10/readings/Latour(1986).pdf
[19] Law, J. (2004). After Method: Mess in Social Science Research. Psychology Press.
[20] Lewis, S. (2020). Providing a platform for "what works": Platform-based governance and the reshaping of teacher learning through the OECD's PISA4U. Comparative Education, 56(4).
[21] Lewis, S., & Hardy, I. (2017).
Tracking the Topological: The Effects of Standardised Data Upon Teachers' Practice. British Journal of Educational Studies, 65(2), 219–238.
[22] Light, B., Burgess, J., & Duguay, S. (2018). The walkthrough method: An approach to the study of apps. New Media and Society, 20(3), 881–900.
[23] Lindh, M., & Nolin, J. (2016). Information We Collect: Surveillance and Privacy in the Implementation of Google Apps for Education. European Educational Research Journal, 15(6).
Lury, C., & Day, S. (2019). Algorithmic Personalization as a Mode of Individuation. Theory, Culture & Society, 36(2), 17–37.
[24] Lury, C., Fensham, R., Heller-Nicholas, A., & Lammes, S. (2018). Routledge Handbook of Interdisciplinary Research Methods. Routledge.
[25] Lury, C., Parisi, L., & Terranova, T. (2012). Introduction: The Becoming Topological of Culture. Theory, Culture & Society, 29(4-5), 3–35.
[26] Lury, C., Tironi, M., & Bernasconi, R. (2020). The Social Life of Methods as Epistemic Objects: Interview with Celia Lury. Diseña, 16, 32–55.
[27] Lury, C., & Wakeford, N. (2012). Introduction: A perpetual inventory. Inventive Methods (pp. 15–38). Routledge.
[28] Martin, L., & Secor, A. J. (2014). Towards a post-mathematical topology. Progress in Human Geography, 38(3), 420–438.
[29] Piattoeva, N., & Saari, A. (2020). Rubbing against data infrastructure(s): methodological explorations on working with(in) the impossibility of exteriority. Journal of Education Policy, 1–21.
[30] Plantin, J. C., Lagoze, C., Edwards, P. N., & Sandvig, C. (2018). Infrastructure studies meet platform studies in the age of Google and Facebook. New Media and Society, 20(1), 293–310.
[31] Prince, R. (2017). Local or global policy? Thinking about policy mobility with assemblage and topology. Area, 49(3), 335–341.
[32] Ratner, H. (2019). Topologies of Organization: Space in Continuous Deformation. Organization Studies, 1–18.
[33] Ratner, H., & Gad, C. (2019). Data warehousing organization: Infrastructural experimentation with educational governance. Organization, 26(4), 537–552.
[34] Ratner, H., & Ruppert, E. (2019). Producing and projecting data: Aesthetic practices of government data portals. Big Data & Society, 6(2), 1–16.
[35] Ruppert, E., Law, J., & Savage, M. (2013).
Reassembling Social Science Methods: The Challenge of Digital Devices. Theory, Culture & Society, 30(4), 22–46.
[36] Suchman, L. (2012). Configuration. In C. Lury & N. Wakeford (Eds.), Inventive Methods: The Happening of the Social (pp. 48–60). Taylor and Francis.
[37] Thompson, G., & Cook, I. (2015). Becoming-topologies of education: deformations, networks and the database effect. Discourse: Studies in the Cultural Politics of Education, 36(5), 732–748.
[38] Thompson, G., & Sellar, S. (2018). Datafication, testing events and the outside of thought. Learning, Media and Technology, 43(2), 139–151.
[39] van de Oudeweetering, K., & Decuypere, M. (2019). Understanding openness through (in)visible platform boundaries: a topological study on MOOCs as multiplexes of spaces and times. International Journal of Educational Technology in Higher Education, 16(1).
[40] van de Oudeweetering, K., & Decuypere, M. (2020). In between hyperboles: forms and formations in Open Education. Learning, Media and Technology, advance online publication, 1–18.
[41] Williamson, B. (2017). Learning in the "platform society": Disassembling an educational data assemblage. Research in Education, 98(1), 59–82.

AUTHORS

Ayse received her MSc in Internet Studies from the University of Oxford in 2006. She participated in various research projects for the UN, NATO, and the EU regarding HCI (human-computer interaction). She completed her doctorate in user experience design at Oxford while working as an adjunct faculty member at Bogazici University in her home town, Istanbul. Ayse also holds a degree in Tech Policy from Cambridge University. Currently, Ayse lives in Silicon Valley, where she works as a visiting scholar for Google on human-computer interaction design.