Using AI in Industrial Systems: User Interfaces
This is a continuation of the Using AI in Industrial Systems: A High Level Architecture Overview article, where I'll be diving deeper into the UI components of the whole system, why they are important and what impact they have on the system as a whole.
If you don't remember the high level diagram overview from the main article, here's a refresher:
I'll be describing the UI components: not just the one that ran in the local environment, but also the other UIs that were part of the whole system.
User Interface
So, the main purpose of the user interface (UI) is to be the point of connection between the user and the computer. It allows the computer to output/present data to the user, and the user to input data into the computer, thus controlling it in one way or another. The overall user interaction with the computer is also called user experience (UX) and plays an important part when designing a UI.
Our UIs could be roughly split into local and cloud UIs, depending on where they run: either locally on an edge device, or in the cloud. I'll go through both.
Local UI
The local UI can be further split into the device controller (DC) UI and the monitoring UI.
Device controller UI (DCUI)
The main interface for controlling our devices was the device controller UI or DCUI.
It was developed as a full-stack containerized app, but I consider it mainly a UI component.
Backend
The "backend" part was written in Go and was mainly just an interface to get the data to or from the DC.
Frontend
The frontend part was done in Angular and was fully configurable. The UI ran in a Chrome browser in kiosk mode, meaning it only showed our UI, and the user had no access to the underlying operating system that everything was running on.
Access to the home screen could be protected with a PIN code, which provided quick and secure access to the DCUI.
The home and subsequent screens had a grid-like layout that let us place widgets of various types (text and time displays, sync/async buttons, toggle switches, number inputs...) on the grid, allowing us to fully customise the UI to fit the client's needs.
The number of screens and types of widgets on each screen's grid were configured via a JSON file.
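To give a feel for what such a configuration might look like, here is a simplified, hypothetical sketch in TypeScript. The widget types, field names and values are illustrative only, not our actual schema:

```typescript
// Hypothetical, simplified types for a screen configuration.
// Widget types and fields are illustrative, not the real DCUI schema.
type WidgetType =
  | 'text'         // static or bound text display
  | 'clock'        // time display
  | 'button'       // sync/async action button
  | 'toggle'       // on/off switch
  | 'numberInput'  // numeric input field
  | 'iframe';      // embedded view, e.g. an AI camera stream

interface Widget {
  type: WidgetType;
  // Position and size on the screen's grid.
  row: number;
  col: number;
  colSpan?: number;
  // Widget-specific settings (label key, bound signal, URL for iframes...).
  options?: Record<string, unknown>;
}

interface Screen {
  id: string;
  titleKey: string;        // translation key, so the label follows the UI language
  pinProtected?: boolean;  // e.g. the home screen could require a PIN
  widgets: Widget[];
}

// Example: a home screen with a start button and an embedded AI view.
const config: Screen[] = [
  {
    id: 'home',
    titleKey: 'screens.home.title',
    pinProtected: true,
    widgets: [
      { type: 'button', row: 0, col: 0, options: { labelKey: 'actions.start', command: 'start' } },
      { type: 'iframe', row: 0, col: 1, colSpan: 2, options: { url: '/ai/stream' } },
    ],
  },
];
```

Treating the screens as data like this is what makes it possible to tailor the layout per client without touching the frontend code.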
This UI also supported showing popups so that various info, warning or error messages could be shown to the user.
The home screen also had a special DC bar at the top, allowing the user to start, pause, resume or stop the device and to see what state the device was currently in.
Since the DCUI was developed as a web page, it was also accessible remotely via our VPN and SSH port forwarding, giving us the option of remote monitoring and troubleshooting.
We could also add HTML iframes into the DCUI's screens, meaning we could extend its functionality even further. For instance, we could show what the AI was seeing (images or video stream) to make sure the model was performing as it should. Or, we could add a different AI view where the user could help the AI learn on and classify new cases, giving the model feedback for improvement. Another option was showing the user a 3D view of the device and its current position with annotations showing what part of the device might be at fault.
Another benefit of the DCUI being set up as a web page was that we could update the UI on the fly (deploy a new version of its Docker container) and just refresh the page in the browser, without the need to stop the device from working. This vastly improved and simplified our UI deployment/updating process.
Since people of different nationalities and thus different native languages were using the DCUI, we also added support for instantaneous language change, without the need to restart or refresh anything.
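In an Angular app, this kind of on-the-fly language switching is typically handled by a translation library. A minimal sketch using ngx-translate (an assumed library for illustration, not necessarily the one we used) could look like this:

```typescript
import { Component } from '@angular/core';
import { TranslateService } from '@ngx-translate/core';

// Minimal sketch of runtime language switching with ngx-translate.
// The library choice and language codes are assumptions for illustration.
@Component({
  selector: 'app-language-switcher',
  template: `
    <button *ngFor="let lang of languages" (click)="switchTo(lang)">
      {{ lang }}
    </button>
  `,
})
export class LanguageSwitcherComponent {
  languages = ['en', 'de', 'sl'];

  constructor(private translate: TranslateService) {
    this.translate.setDefaultLang('en');
  }

  switchTo(lang: string): void {
    // use() loads the new translation file and re-renders all translated
    // strings immediately, with no restart or page refresh needed.
    this.translate.use(lang);
  }
}
```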
Device Monitoring UI (DMUI)
The device monitoring UI was written in Angular and gave us access to the current device's metadata, logs, position (via a 3D model and 2D graphs) and video feeds. It also captured anomalies detected by the DC and allowed us to replay them, replaying all of the above data from around the time of the anomaly.
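To give a feel for what such a replay needs, here is a hypothetical, simplified shape of the data captured around an anomaly. The field names are illustrative only, not our actual data model:

```typescript
// Hypothetical, simplified shape of an anomaly replay record.
// Field names are illustrative; the real model was more detailed.
interface AnomalyReplay {
  anomalyId: string;
  deviceId: string;
  detectedAt: string;     // ISO timestamp of the DC-detected anomaly
  windowSeconds: number;  // how much data before/after the event is kept for replay

  metadata: Record<string, unknown>;              // device metadata snapshots
  logs: { ts: string; level: string; message: string }[];
  positions: { ts: string; joints: number[] }[];  // feeds the 3D model and 2D graphs
  videoClips: { camera: string; url: string }[];  // short clips around the event
}
```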
In order to access the UI, you had to log in first. At first, authentication was done via MS Azure, but later we migrated it to Keycloak because we wanted to give our clients the option to manage their own users.
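For context, protecting a web UI with Keycloak usually comes down to initializing its JavaScript adapter before the app loads. A minimal sketch with keycloak-js (the URL, realm and client ID are placeholders, not our actual configuration):

```typescript
import Keycloak from 'keycloak-js';

// Minimal sketch of protecting a web UI with Keycloak's JS adapter.
// url, realm and clientId are placeholders for illustration only.
const keycloak = new Keycloak({
  url: 'https://auth.example.com',
  realm: 'device-monitoring',
  clientId: 'dmui',
});

async function bootstrap(): Promise<void> {
  // 'login-required' redirects unauthenticated users to the Keycloak login page.
  const authenticated = await keycloak.init({ onLoad: 'login-required' });
  if (!authenticated) {
    window.location.reload();
    return;
  }

  // keycloak.token can now be attached to API calls,
  // e.g. via an Angular HTTP interceptor.
  console.log('Authenticated, token expires at', keycloak.tokenParsed?.exp);
}

bootstrap();
```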
Since the monitoring UI was developed as a web page, it was also accessible remotely via our VPN, giving us the option of remote monitoring and troubleshooting.
Operating System and AI Monitoring UIs
We were monitoring various devices' OS and AI stats to see if our deploys were running as they should, and to try to detect possible issues as soon as possible.
We used netdata for monitoring various devices' OS metrics, which was good for the devices that were online, but wasn't really useful if a device went offline, because we couldn't diagnose why (did it run out of memory, were there issues with the OS, was there a hardware fault...?). We were, however, able to spot some memory and CPU load issues using netdata and fix them before they became a bigger problem.
For AI monitoring we used Prometheus, at first together with Elasticsearch, but later we switched to Grafana.
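Exposing AI stats to Prometheus usually means having the inference service publish a /metrics endpoint that Prometheus scrapes. A minimal Node/TypeScript sketch using prom-client and express (metric names and labels are illustrative, not our actual exporter):

```typescript
import express from 'express';
import client from 'prom-client';

// Minimal sketch of exposing AI metrics for Prometheus to scrape.
// Metric names and labels are illustrative only.
const register = new client.Registry();
client.collectDefaultMetrics({ register });

const inferenceLatency = new client.Histogram({
  name: 'ai_inference_duration_seconds',
  help: 'Time spent running one inference',
  registers: [register],
});

const detections = new client.Counter({
  name: 'ai_detections_total',
  help: 'Number of detections by predicted class',
  labelNames: ['class'],
  registers: [register],
});

const app = express();

app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(9100);

// Elsewhere, in the inference loop:
//   const end = inferenceLatency.startTimer();
//   ... run the model ...
//   end();
//   detections.inc({ class: predictedClass });
```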
Cloud UI
The cloud UI was mainly a monitoring UI, but allowed us to manage fleets of our devices as well.
Fleet management UI
Since we were operating with multiple devices stationed at different locations, we needed a way to see all of their current statuses in a single UI, and to allow our clients to see and monitor the devices in their own fleets as well.
Backend
The backend was used to store the current metadata coming from our devices, some basic device info, users and user-device groups. The latter (groups) allowed us to limit users to seeing only the devices that were in the same group as them.
The BE was written in NodeJS and was running on our web server hosted on GCP.
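The group-based visibility check is simple in principle. A simplified sketch of how such filtering could look on a NodeJS backend (the db object, its methods and the field names are hypothetical stand-ins, not our actual data layer):

```typescript
// Simplified sketch of group-based device visibility on the backend.
// The db object and all names are hypothetical, for illustration only.
interface Device {
  id: string;
  name: string;
  groupIds: string[];
  lastReportedStatus?: Record<string, unknown>;
}

interface User {
  id: string;
  role: 'superuser' | 'admin' | 'user';
  groupIds: string[];
}

async function listVisibleDevices(
  user: User,
  db: { getAllDevices(): Promise<Device[]> },
): Promise<Device[]> {
  const devices = await db.getAllDevices();

  // Superusers (us) see everything; everyone else only sees devices
  // that share at least one user-device group with them.
  if (user.role === 'superuser') {
    return devices;
  }
  return devices.filter(d => d.groupIds.some(g => user.groupIds.includes(g)));
}
```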
Frontend
The frontend showed the current devices' metadata (also useful in cases when a device went offline, since we could see its last reported status) and allowed for editing some basic device info, as well as managing users and user-device groups. It was written in Angular and was also running on our web server hosted on GCP.
The UI also had three access tiers: superuser (us), admin (the client's admin) and user (the client's users). Each tier allowed for a different level of management and access to devices.
In order to access the UI, you had to log in first. As with the device monitoring UI, authentication was initially done via MS Azure, but we later migrated it to Keycloak because we wanted to give our clients the option to manage their own users.
Data monitoring UI (DMUI)
We were also sending some of the data, like some devices' logs, to Elasticsearch to be processed and stored temporarily to enable remote monitoring and troubleshooting, especially in cases where the device was offline either due to connection issues or because of hardware malfunction.
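A minimal sketch of shipping a batch of device logs to Elasticsearch with the official JavaScript client, just to illustrate the idea (the endpoint, index name and document shape are placeholders, not our actual pipeline):

```typescript
import { Client } from '@elastic/elasticsearch';

// Minimal sketch of shipping device logs to Elasticsearch.
// Endpoint, index name and document shape are placeholders.
const es = new Client({ node: 'https://elastic.example.com:9200' });

interface DeviceLog {
  deviceId: string;
  ts: string;      // ISO timestamp
  level: string;
  message: string;
}

async function shipLogs(logs: DeviceLog[]): Promise<void> {
  if (logs.length === 0) return;

  // Bulk-index the batch; an index lifecycle policy can handle the
  // "stored temporarily" part by deleting old data automatically.
  const operations = logs.flatMap(doc => [
    { index: { _index: 'device-logs' } },
    doc,
  ]);

  const response = await es.bulk({ operations });
  if (response.errors) {
    console.error('Some log entries failed to index');
  }
}
```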
The DMUI mainly consisted of various Elasticsearch dashboards and machine learning (ML) graphs.
The dashboards showed various data like the numbers and rates of different error states the devices were in, their current and past statuses and some of the devices' logs.
The machine learning was used mainly for detecting anomalies in the devices' data, helping us detect issues before they became real problems, and sometimes giving our clients insight into issues with their production lines that they couldn't detect with their own monitoring processes.
Cloud monitoring UIs
We used various cloud services from AWS, GCP, MS Azure and Elasticsearch, to name a few, and we used their dashboards to monitor stats and statuses and to try to spot any issues with our deploys.
Learnings
Even though we thought our UIs were (very) intuitive, we had to go through quite a few iterations to actually make them more intuitive for our users, mainly the not-that-technically-savvy device operators. It was an interesting and humbling experience trying to put myself in the shoes of a device operator, whose way of thinking about and interacting with the device's UI doesn't really make sense until you understand their priorities and workflows. It turns out that "intuitive" is a relative term when you don't share the same experience and ways of thinking as the other person.
In the end, we didn't manage to polish our UI to the point where it was so intuitive that no one had problems with it or caused problems by misunderstanding it. What helped reduce user errors and false input (because those are going to happen regardless of how well your UI is designed) was setting up more checks that prevented the device from operating if we suspected things weren't set up as they should be.
AI was also used to help in such cases. Adding it to our system increased overall complexity and opened a new can of worms by way of false positive detections. Sometimes the AI wouldn't let the device continue operating because it thought there was a problem with its input or output, when in fact there was none, just a false positive detection by the AI. That could happen because the model wasn't 100% accurate, or because some environmental change, like a ray of sunlight hitting the production line just the right way, distorted the lighting of the image the AI processed too much. We even had cases of device operators shaking, moving or tilting our cameras, which messed up the AI's "regions of interest", resulting in incorrect classifications.
Such false positive detections started a discussion about introducing an "AI bypass" option (via the UI), but based on what we knew about how the device operators were using our devices, that could have meant the AI would always be bypassed, making it not just useless but a liability: a lot of resources had been spent on the AI prevention system, making our solution more complex and expensive, but not any better for it. At that time, we decided to try to improve the AI model instead of implementing the "AI bypass" option, but the option never really went away. It was always present in the back of our minds, ready to be implemented if something really had to be done about this and no better solution was possible.
Conclusion
UIs are an important component of any system, since they are usually the way the system is controlled. The UIs described here were mainly used either to control our devices (these were also used externally by our clients) or to monitor them and their support services (mostly just for internal use).
All of the customer-facing UIs were developed by our contractors, whom I managed and coordinated to make sure that they delivered what we needed and that everything was according to our specs. The "web app" nature of these UIs allowed us remote access to them and updates with no downtime.
And UIs, regardless of how perfect and intuitive they seem to us, may be none of that to the rest of their users. Keep that in mind when developing and testing them, do so with feedback from the actual end users, and maybe you won't need an AI prevention system to cover for a "less than perfect" UI 😉