Policy-As-Code: Keeping Up With The Pace Of Change
What is it? What can I do with it? How can I implement it?
It is no fun encountering a hallucinating system whose rules are inconsistent with reality!
IT Operations are evolving.
AI for IT operations (#AIOps) combines non-stationary #IT components, including Operations, #AI, and #Data. The processes are constantly in flux, and their context evolves, meaning organizations must curate the models and the supporting data to remain relevant across given timeframes. Keeping pace with this constant change requires deliberate policies.
AIOps leverages existing AI, Big Data, and IT operations infrastructure and practices to gain insights and augment human-in-the-loop judgments. How do organizations guarantee good data and security hygiene?
AIOps demands a strategy that accepts the exploratory components of AI and socio-technical solutions. The resulting outcomes and insights strengthen policies and support continuous maintenance.
Traditional IT roles must change to handle the transformations.
How does a software-intensive system comply with regulatory concerns? How can a system learn to be fair?
What Is Policy-As-Code?
Organizations are collectives motivated by common interests, working together to achieve a common objective.
A #policy is a set of ideas or plans, agreed upon by such a collective, that maps out what to do in particular situations.
Policies are implemented as statements of intent and formulated as executable procedures or protocols. They are generally studied and advanced by a governance body. Policies add structure to understanding and assist in making subjective and objective decisions.
The policies are codified in a set of syntaxes that form a parsable formal language expressing the underlying intentions. The cost of consistency and repeatability is learning a specific syntax.
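As a minimal sketch of that idea, assuming Python and purely illustrative rule names (nothing here comes from a specific policy tool), a codified policy can be expressed as parsable data and interpreted by a small evaluation engine:

```python
# A minimal sketch of a policy codified as parsable data plus a tiny
# evaluation engine. Rule names and fields are illustrative assumptions.
from typing import Any, Dict, List

# Each rule declares an intent: a condition over a request and a verdict.
POLICY: List[Dict[str, Any]] = [
    {"name": "block-unsigned-builds",
     "condition": lambda req: not req.get("signed", False),
     "verdict": "deny"},
    {"name": "require-owner-label",
     "condition": lambda req: "owner" not in req.get("labels", {}),
     "verdict": "deny"},
]

def evaluate(request: Dict[str, Any]) -> str:
    """Apply the codified policy; deny on the first matching rule."""
    for rule in POLICY:
        if rule["condition"](request):
            return f'deny ({rule["name"]})'
    return "allow"

if __name__ == "__main__":
    print(evaluate({"signed": True, "labels": {"owner": "team-a"}}))  # allow
    print(evaluate({"signed": False, "labels": {}}))                  # deny
```

The cost of consistency here is exactly the one mentioned above: the rules only work if everyone learns and respects the agreed syntax.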
Circa 2007, before Hudson and, later, Jenkins were household names, it was common for software-intensive organizations to implement a home-baked CI/CD pipeline. I was part of an in-house team developing software tools and figuring out how to build consistent data workflows for an enterprise-wide CI/CD pipeline. The pipeline was affectionately called Daily Build Daily Test (DBDT).
Note that the Daily part of the name dated from a time when software builds took a long time. The thinking and the processes slowly evolved, and the temporal reference in the DBDT name lost its meaning; daily was never the target endpoint. By the time the system matured, there were hundreds of builds per day in different verticals across the organization.
A few colleagues and I were tasked with turning handbook-based policies into a firmwide automated CI/CD pipeline. I specifically focused on designing and developing the Firmware (Update) Over The Air (FOTA) delta generation methods and tools. I was tasked with seamlessly automating and integrating these tools into the DBDT pipeline.
One of the challenges was how to compare firmware labels. Even though there were centralized deliveries to the customer-facing organization, each domain could generate its own firmware builds and potentially label them as desired. However, a central representative steering team developed common standards to which the domain teams strictly adhered.
In the case of the delta generation algorithm, comparisons along aspects of time are only meaningful for tags that carry a temporal component. Where the handbook policy used a naming structure that is not specific to time, the edit distance between the labels might still yield a valid signal.
In most cases, though, the firmware labels combine structured naming and a granular time component, allowing the metadata to be categorized according to originating domains.
One can safely assume the smallest firmware binary deltas arose from builds comparable in size and aligned in both naming and time. The algorithm also dealt with the arbitration of any collisions encountered.
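For illustration only, here is a rough Python sketch of such a pairing heuristic. The label format, regular expression, and scoring weights are my assumptions for this article, not the original tooling:

```python
# Illustrative sketch of pairing candidate firmware builds for delta
# generation. Label format, weights, and thresholds are assumptions.
import re
from datetime import datetime
from difflib import SequenceMatcher

LABEL_TIME = re.compile(r"(\d{8}T\d{4})")  # e.g. DOMAIN_R1.2_20070412T0930

def label_distance(a: str, b: str) -> float:
    """Prefer a temporal comparison when both labels carry a timestamp;
    otherwise fall back to a normalized edit-style distance."""
    ta, tb = LABEL_TIME.search(a), LABEL_TIME.search(b)
    if ta and tb:
        da = datetime.strptime(ta.group(1), "%Y%m%dT%H%M")
        db = datetime.strptime(tb.group(1), "%Y%m%dT%H%M")
        return abs((da - db).total_seconds()) / 3600.0   # hours apart
    return 1.0 - SequenceMatcher(None, a, b).ratio()      # 0 = identical

def pick_pair(builds):
    """builds: list of (label, size_bytes). Return the pair expected to
    yield the smallest delta: closest in time/name and in size."""
    best, best_score = None, float("inf")
    for i, (la, sa) in enumerate(builds):
        for lb, sb in builds[i + 1:]:
            score = label_distance(la, lb) + abs(sa - sb) / max(sa, sb)
            if score < best_score:
                best, best_score = (la, lb), score
    return best
```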
These delta generation tools automatically selected the most appropriate firmware binaries and generated deltas between them for V&V and production. This process packaged several complex subprocesses with defined quality gates into a tangible component with structured borders. The firmware metaheuristics, including the epistemic context and declared characteristics of the incoming builds, were used to predict which pairs would most likely produce a small delta to shorten the testing times.
The entry criteria must form a stringent quality gate to the intake process. Quantifiable quality metrics formed part of the onboarding verification enabled by deliberate upstream designs. Policies remove the need for human actors to assert the degree of quality and trust. The onboarding rules are reduced to passing through these quality gates in a #schema.
Self-describing manifests captured the metadata as the belief-set forward-declaring the minimum information required to form an opinion about the components. A similar technique is used across multiple domains where the head summarizes the valuable features of the payload.
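A minimal sketch of such a forward-declared manifest passing a schema-defined quality gate, assuming Python and the jsonschema package; the field names and the coverage threshold are illustrative, not the original schema:

```python
# Sketch of a self-describing manifest passing a quality gate defined by
# a schema. Field names are assumptions; requires the `jsonschema` package.
from jsonschema import validate, ValidationError

MANIFEST_SCHEMA = {
    "type": "object",
    "required": ["component", "version", "built_at", "signed", "test_coverage"],
    "properties": {
        "component": {"type": "string"},
        "version": {"type": "string"},
        "built_at": {"type": "string"},
        "signed": {"type": "boolean"},
        "test_coverage": {"type": "number", "minimum": 0.8},
    },
}

def quality_gate(manifest: dict) -> bool:
    """Onboarding is reduced to passing the schema-defined gate."""
    try:
        validate(instance=manifest, schema=MANIFEST_SCHEMA)
        return True
    except ValidationError as err:
        print(f"Rejected: {err.message}")
        return False
```

The point is that no human has to assert quality or trust at intake; the gate is the schema.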
Know Thy Data! Typically, an entire family of epistemic states obeys a set of system constraints. Methods for selecting the available options must be clear about how they arrive at the choices. Can the processing agent establish how sufficiently close to the truth the forward-declared metadata is?
A whole body of theory exists dealing with knowledge representation and how to reason about it. Effectively, models must represent what is possible in the restricted universe, which ideally must not overlap with the non-normal universe. Unlike propositional logic, which only expresses facts, autoepistemic logic can express both knowledge and lack of knowledge about facts.
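As a loose illustration only (not a formal autoepistemic reasoner), a processing agent can at least distinguish what it knows to be true, what it knows to be false, and what it simply does not know about the forward-declared metadata:

```python
# A loose, three-valued illustration of expressing both knowledge and
# lack of knowledge about facts. Fact names are illustrative assumptions.
from enum import Enum

class Knowledge(Enum):
    KNOWN_TRUE = "known true"
    KNOWN_FALSE = "known false"
    UNKNOWN = "unknown"

def knows(belief_set: dict, fact: str) -> Knowledge:
    """Plain propositional facts cannot say "I do not know";
    this agent can report the gap explicitly."""
    if fact not in belief_set:
        return Knowledge.UNKNOWN
    return Knowledge.KNOWN_TRUE if belief_set[fact] else Knowledge.KNOWN_FALSE

beliefs = {"manifest_signed": True, "coverage_above_threshold": False}
print(knows(beliefs, "manifest_signed"))        # Knowledge.KNOWN_TRUE
print(knows(beliefs, "delta_size_acceptable"))  # Knowledge.UNKNOWN
```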
As a well-known platform developer, the organization had developed a culture of not compromising on quality. They invested a lot in ensuring that everyone understood how their part mattered to the delivery of the whole. More importantly, the development workflow was clearly defined and communicated, so creating the triggers for each quality gate was relatively straightforward. The constituent entities were self-describing, self-registering, and discoverable.
They would update their presence information and status for the client systems to decide on availability and capabilities. The workflow involved iterating over a set of rules and constraints describing an execution policy.
The CI/CD pipeline featured rules and workflow schemas implemented using IBM Rational’s ClearQuest Test Manager (CQTM) as the core technology for managing the test infrastructure components. CQTM builds workflows and triggers around Test Plans, Test Cases, Test Requirements, Test Configurations, Test Scripts, and Test Results.
The CQTM schema implemented the policy. Familiarity with self-describing languages such as #XML and #JSON provides an intuitive understanding of the problem, reducing its critical parts to a set of concepts without recourse to the dialects introduced by specific tools. This separation allows one to step away from optimizing examples and focus on the classes of problems instead.
Handbook policies, rules, and constraints captured in the schemas restricted the possible paths through the given context. The schema strictly validates the iterable object structures in the instance files. The automation involved building a parser to navigate this codified policy in a self-describing document.
For completeness, in psychology, a schema is a knowledge structure that allows organisms to interpret and understand their world. In the technology context, a schema is an outline, diagram, or model used to describe the structure of different types of data.
Automating Well-Behaved Workflows
Most software developers will, at one point, be expected to automate systems designed to control or influence a structured workflow.
#RPA, including e-Procurement systems, falls into this category, where monitoring and suitable control structures are automated to enforce specific procedures and checkpoints. This type of automation allows the implementation of policy-as-code.
Handbook-based policies are typically applied inconsistently.
They rely on parties reading, comprehending, and remembering how to use or enforce the said policies in their operations. On the other hand, automating procedures brings consistency but may risk making the processes too impersonal. An ideal automated approach must adapt to the nuances of non-stationary environments. To attain equipoise across all controllable components of the system, the automating agent must usually balance mutually opposing objectives, a feat for which human actors are not especially well designed.
The schema implements the rules and steps as a chain within a document that enforces precedence. Unlike rules repeatedly enforced by human actors, the selection and parsing engine checks and validates the codified policies for correctness and consistency before execution.
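A rough sketch of that idea, with illustrative step names of my own choosing: the chain is checked for consistency and declared precedence before any step executes:

```python
# Sketch of a rule chain whose precedence and consistency are validated
# before execution. Step names and dependencies are illustrative assumptions.
WORKFLOW = [
    {"step": "validate-manifest", "requires": []},
    {"step": "generate-delta", "requires": ["validate-manifest"]},
    {"step": "run-vnv-suite", "requires": ["generate-delta"]},
    {"step": "publish", "requires": ["run-vnv-suite"]},
]

def check_consistency(workflow) -> None:
    """Fail fast if a step references an unknown or later step."""
    seen = set()
    for entry in workflow:
        for dep in entry["requires"]:
            if dep not in seen:
                raise ValueError(f'{entry["step"]} depends on {dep}, '
                                 "which has not run yet")
        seen.add(entry["step"])

def execute(workflow, runner) -> None:
    check_consistency(workflow)          # validate before execution
    for entry in workflow:
        runner(entry["step"])            # enforce declared precedence

execute(WORKFLOW, lambda step: print(f"running {step}"))
```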
Policy-as-code is code authored in a high-level language to manage and automate policies. By representing policies as code in text files, proven software development (#DevOps) and security (#DevSecOps) best practices can be adopted, such as version control, granular levels of trust, automated testing, and automated deployment. Where the rules are separated from the execution engine, the concern shifts to maintaining their integrity; in return, the separation allows flexible adaptation of the schema and the executable instances.
In ClearQuest and Jenkins workflows, events trigger when the agent executing the workflow satisfies the necessary conditions.
In #XML, #YAML, and #JSON, the body of the text document implements the pertinent rules, allowing human inspection and providing the needed transparency. The schema holds the document metadata, informing the parser how to consume the document if it conforms to the defined structure.
Of course, depending on the level of sensitivity, it is prudent to encrypt the schema and the XML or JSON document so that only authorized parties can view them.
Automation tools allow stitching together services with little or no code.
Developers may realize simple system event triggers using predicate logic in the format, “IF condition x is met, THEN execute packaged service y” (where x may be a composition of conditions). While y needs to be exact, x may not have a crisp boundary. Crisp boundaries may exclude data and do not represent the entire field of possible values for most real-life conditions. Most real-world problems require a #fuzzy rule-based or probabilistic decision boundary. Professor Lotfi A. Zadeh and others have advanced rigorous mathematics on how to map a probabilistic decision boundary to a fuzzy decision boundary. In our case, however, the key is knowing and mapping the side effects of action y and finding the mathematical description of the outside of condition x. Outcome y may influence the environment in a way that skews observations or leads to an endless loop. (Translation: stuff that breaks The Internet!)
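A small sketch contrasting a crisp trigger with a fuzzy one; the CPU-load condition, thresholds, and membership function are assumptions made purely for illustration:

```python
# Crisp versus fuzzy trigger boundaries for "IF condition x THEN run y".
# Thresholds and the membership function are illustrative assumptions.

def crisp_trigger(cpu_load: float) -> bool:
    """Crisp boundary: everything below 0.90 is ignored outright."""
    return cpu_load >= 0.90

def fuzzy_membership(cpu_load: float, low: float = 0.70, high: float = 0.95) -> float:
    """Degree (0..1) to which the system is considered overloaded."""
    if cpu_load <= low:
        return 0.0
    if cpu_load >= high:
        return 1.0
    return (cpu_load - low) / (high - low)

def fuzzy_trigger(cpu_load: float, activation: float = 0.5) -> bool:
    """Fire service y once the membership degree crosses an activation level."""
    return fuzzy_membership(cpu_load) >= activation

print(crisp_trigger(0.88), fuzzy_trigger(0.88))  # False True
```

The crisp rule silently ignores the 0.88 reading; the fuzzy rule recognizes it as mostly inside the condition, which is closer to how real-life conditions behave.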
The challenge of identifying and optimizing these decision boundaries, coupled with the copious amounts of data and a large set of parameters to manipulate, motivates the need for AI and AIOps. AI allows organizations to discover various automation opportunities and practical process mining.
Who Needs It?
Robotic Process Automation (#RPA) applications can implement robust pipelines.
Processes are automated using virtual software robots. RPA, together with AI, opens up enormous potential for businesses. #AI models, including machine learning, natural language processing (#NLP), and computer vision, dramatically expand the ability of robots to handle cognitive processes.
Organizations may augment vanilla RPA with NLP categories, including Named Entity Recognition, Speech Comprehension, Document Understanding, and Document Summarization. As an active area of research, the use cases and implementation for NLP and other cognitive services keep evolving. In other words, tactical solutions can only go so far.
A structured and extensible architectural guideline must be in place to handle this organic evolution and maintain security hygiene.
Organizations may opt to apply RPA to embed Machine Intelligence into day-to-day operations enabling automated decision-making processes and analyses. With the recent advances in Large Language Models, for example, the front-end operations are screaming for a rapid rethink. Moving the categorization and automatic labeling closer to the data collection point reduces upstream burdens.
We have talked about smart contracts and the Data Mesh in previous articles. Both these technologies implement policies. The organization may choose to automate those policies to enforce consistent procedures and motivate adoption. Using mature tools and codifying best practices removes the focus on undifferentiated work and bridges the knowledge gaps.
Some Barriers To Adoption
Critical considerations for transforming policy into code must include checking validity. What is the definition of the policy? How can an organization guarantee that policy compliance is consistent?
The validity of policies and their assignments is measured against the expected conditions and non-compliant resources to mitigate the impact of unknown and undefined effects. The surface to be covered by remediation tasks is therefore far more extensive than scripted tests alone can address. The test regime must include exploratory testing.
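A minimal sketch of such a compliance check, with resource fields and expected conditions that are illustrative assumptions; in practice, the remediation surface is far larger than a loop like this:

```python
# Sketch of checking resources against expected conditions and queueing
# remediation for non-compliant ones. Resource fields are assumptions.
EXPECTED = {"encryption": True, "public_access": False}

def audit(resources):
    """Return resources violating the expected conditions, with reasons."""
    findings = []
    for res in resources:
        reasons = [key for key, want in EXPECTED.items()
                   if res.get(key) != want]
        if reasons:
            findings.append({"id": res["id"], "violations": reasons})
    return findings

resources = [
    {"id": "bucket-a", "encryption": True, "public_access": False},
    {"id": "bucket-b", "encryption": False, "public_access": True},
]
for finding in audit(resources):
    print(f'remediate {finding["id"]}: {finding["violations"]}')
```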
An objective analysis of business requirements will usually highlight that not all good intentions translate into executable code; those that do not must be removed.
In other words, since policy-as-code is the definition, management, and enforcement of policies using code, software engineering best practices, such as version control, must apply.
The dependency structure must be mapped and tracked. The delivery organization must understand the impact of code changes and their magnitude. Does the deployment handle sensitive data? A general lack of context leads to technical debt that is only easy to fix in the early stages.
The organization must understand the socio-technical aspects of effecting policy.
If stakeholders are not part of the conversation early enough, they may distrust automation they cannot thoroughly examine or one that takes away their center of control and influence. The speed and efficiency of applying policy-as-code depend on a collaborative development approach allowing visibility and traceability of business purpose and design decisions.
Generally, policy-as-code intends to remove hard dependencies on a single IT team.
Further, policies, the environment, and the organization’s needs evolve. Policy validation must therefore be deliberately designed as part of #DevOps and #DevSecOps to discover violations and inconsistencies while they are still cheap to fix. Policy-as-code provides the needed #agility to address the evolution of standards and regulatory concerns.
Where does one get started?
Understanding the environment and the business-level problem the organization is trying to solve is a good start. There is a standard format for articulating and effecting these concepts as architectural models. Adopting the C4 architectural framework, for example, helps to communicate to stakeholders the intention of the innovation. What is the context in which the execution containers and components will be interacting, and what technologies will give the desired results?
AIOps applies Big Data and ML techniques to analyze data and discover causal relationships, potential risks, trends, and insights.
Separating AIOps from policy-as-code would lose the value of their symbiotic relationship.
For as long as policies implement organizational intent, they will always exist. Turning the policies into code improves the efficiency of their implementation and validation.