A deliberate approach is essential for AI to be a net positive, and human-in-the-loop (HILT) and its natural successor, reinforcement learning from human feedback (RLHF), are essential components of that approach.

What is HILT?  

To understand HILT, we must first understand machine learning. Machine learning is a computer’s ability to learn from data rather than relying only on hand-written rules. LLM (large language model) chatbots are made possible by machine learning. A concrete, easy-to-understand example of machine learning can be found in computer vision. If you want to teach the computer to recognize cars, you give it 10,000 pictures of cars and 50,000 pictures that are not cars (boats, buildings, mailboxes, etc.), and you tell the computer, “Sort it out.” Thanks to machine learning, it learns to recognize cars, and it does so better than it ever could without machine learning.

But here’s the problem: once a computer is taught something wrong, or draws a wrong conclusion on its own, it will keep making the same mistake unless a human intervenes. For example, suppose you train a computer to recognize faces by telling it, “Put all these different faces into clusters, and we’ll call each cluster a different person.” Then, one day, the camera captures your face from a funny angle, so that you look a bit like someone else in your office. Suddenly, you and this other person can end up in the same cluster, and the computer may conclude that the two of you are the same person. Seems like a minor problem…until that person is charged with a crime, and you are picked up by police.

That’s why machine learning needs human supervision to be effective. When the model’s confidence score is low, meaning the computer isn’t sure whether the face is yours or someone else’s, a human can be brought in to make the call. The computer now has important feedback and will do a better job of telling you and your coworker apart in the future.
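The escalation step can be sketched in a few lines of Python. This is a minimal illustration, not any particular product’s logic. It assumes the model returns a label plus a confidence score between 0 and 1. The `Prediction` type, the `route` function and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str         # who the model thinks is in the picture
    confidence: float  # the model's confidence in that label, from 0.0 to 1.0


# Hypothetical cut-off: predictions below it go to a human reviewer.
REVIEW_THRESHOLD = 0.8


def route(pred: Prediction, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return "auto" to accept the model's call, or "human" to escalate it."""
    return "auto" if pred.confidence >= threshold else "human"


for p in [Prediction("you", 0.97), Prediction("your coworker", 0.55)]:
    print(f"{p.label}: {p.confidence:.2f} -> {route(p)}")
```

Where to set the threshold is a trade-off. A higher value sends more cases to people, which is slower but catches more of the funny-angle mix-ups.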

This kind of human supervision is HILT. Although the concept is widely recognized as a crucial safeguard, AI systems are becoming increasingly complex and operating at breakneck speeds, which calls into question how effectively humans can monitor them and respond to potential malfunctions.

Who watches the watchers?

In certain scenarios, the reaction time of human overseers may simply be too slow to prevent costly errors or unintended consequences. This has led to the suggestion that AI may need to help police itself and other AI systems, especially in high-responsibility environments like security.

While this idea has merit, it is not without its own risks and challenges. Relying too heavily on AI to monitor itself could lead to compounding errors, biases or blind spots, so it is crucial to maintain a balance between AI-driven oversight and human judgment.

Supervision means safety

Humans provide an essential safety net, offering judgment, intuition and domain expertise that can catch issues AI might miss, given the black-box nature of current deep learning models and the risk of unintended biases from underlying data. Put more simply, as Michael Polanyi observed, human knowledge has a tacit dimension: “we can know more than we can tell.” Human experts have valuable insights that are difficult to fully codify into AI. They can also adapt more flexibly to novel situations and exercise common-sense reasoning in ways AI struggles to match.

A hybrid approach combining AI automation with human oversight can be optimal in many cases. AI can handle most monitoring tasks and do the bulk of the work, while humans provide high-level guidance, handle edge cases and make final decisions in high-stakes scenarios. This requires developing human-aligned AI systems. Techniques like runtime monitoring, formal verification and interpretability methods can help build trust and catch flaws. We also need to train human overseers to monitor AI effectively.

Security is a high-responsibility use case that necessitates HILT

Security is a high-responsibility use case for AI because it involves decisions about human liberty, property and even life safety. HILT is already visible in this industry in the form of remote guarding, in which human guards work alongside AI video surveillance. Teaming up humans with AI in this way provides the best of both worlds.

Computers can watch all the video all the time without any degradation in performance, whereas humans inevitably tire of the monotonous task. The AI can detect when something potentially concerning occurs, e.g., people in an apartment parking garage after midnight. The AI, however, is not good at determining whether those people are there legitimately. Unlocking a car door with a key and prying it open with a crowbar can look too similar for AI to reliably tell apart. This is where humans are the best final arbiters. If they determine that the people in the garage are in fact trying to break into cars, they can “talk down” to them over networked speakers, escalate to police, or intervene in other ways that AI cannot be trusted to always handle appropriately.
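The division of labor in this example can be sketched as a triage step. The AI flags detections that match a simple rule, and a person decides what, if anything, to do about each one. This is a hypothetical sketch. The `Detection` type, the "person" class label from an unspecified object detector, and the midnight-to-5 a.m. window are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Detection:
    camera: str
    object_class: str  # e.g. "person" or "car", from a hypothetical detector
    seen_at: time      # time of day the object was seen


# Hypothetical "quiet hours" for the garage: midnight up to 5 a.m.
QUIET_START = time(0, 0)
QUIET_END = time(5, 0)


def is_concerning(d: Detection) -> bool:
    """A person in the garage during quiet hours is worth a human look."""
    return d.object_class == "person" and QUIET_START <= d.seen_at < QUIET_END


def triage(detections: list[Detection]) -> list[Detection]:
    """AI side: keep only the detections to put in front of a human operator.

    The AI does not decide whether anyone is breaking in. The operator does,
    and chooses the response (ignore, "talk down", or call police).
    """
    return [d for d in detections if is_concerning(d)]


events = [
    Detection("garage-1", "person", time(0, 40)),
    Detection("garage-1", "person", time(14, 5)),
    Detection("garage-2", "car", time(1, 15)),
]
for d in triage(events):
    print(f"Review {d.camera} at {d.seen_at}")
```

The rule is deliberately crude. It only decides what is worth a human look, which is the part machines do tirelessly. The judgment about key versus crowbar stays with the operator.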

So, what comes next?

The next step in the remote guarding example above is to feed the humans’ judgments and actions back into the computer, so it can automate more and more of these tasks over time. This is, broadly, the idea behind RLHF: the system gets smarter over time by learning from human feedback.
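A toy version of this loop is sketched below in Python. It is a simplification, not RLHF proper. Full RLHF typically trains a reward model from human preference judgments and then optimizes the AI with reinforcement learning. This sketch simply refits a decision threshold from human labels, which is closer to plain supervised learning on reviewed cases. The `FeedbackLoop` class and the single "suspicion score" per event are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLoop:
    """Learns a break-in threshold from operators' verdicts on reviewed events.

    Hypothetical setup: the AI summarizes each event as one suspicion score
    in [0, 1], and a human labels it as a break-in (True) or not (False).
    """
    threshold: float = 0.5
    examples: list[tuple[float, bool]] = field(default_factory=list)

    def record(self, score: float, is_break_in: bool) -> None:
        """Store one human verdict as a training example."""
        self.examples.append((score, is_break_in))

    def refit(self) -> float:
        """Choose the threshold that agrees with the most human verdicts."""
        if not self.examples:
            return self.threshold

        def agreements(t: float) -> int:
            return sum((score >= t) == label for score, label in self.examples)

        candidates = sorted({score for score, _ in self.examples})
        # max() keeps the first best candidate, i.e. the lowest tied threshold.
        self.threshold = max(candidates, key=agreements)
        return self.threshold

    def predict(self, score: float) -> bool:
        """Automated call once enough feedback has been absorbed."""
        return score >= self.threshold


loop = FeedbackLoop()
for score, verdict in [(0.2, False), (0.4, False), (0.6, True), (0.9, True)]:
    loop.record(score, verdict)
print(loop.refit())  # 0.6
```

Recording each new verdict and refitting periodically is one simple way for the automated side to take on more of the routine calls as human feedback accumulates.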

Systems that incorporate RLHF efficiently from the start are likely to be the ones that win. By learning from human input, these systems will get smarter faster than systems that don’t. In high-stakes areas like security, human oversight will remain critical for longer. Maintaining a balance between autonomy and oversight will be key to ensuring the safety, reliability and alignment of these ever-evolving AI systems.

Is my AI racist?

Human-generated labels also play a crucial role in reducing bias in AI systems. By carefully curating and annotating training data with human perspectives, we can help ensure that the AI models learn from a more balanced and representative set of examples. This HILT approach to data preparation can help mitigate the risk of AI systems perpetuating or amplifying biases present in the original data. However, it is important to recognize that humans themselves can introduce their own biases.

For example, say you want to build a system that can detect criminal behavior, such as someone breaking into a car versus someone jiggling a stuck key. Now assume you are training on real security footage. It might be that one race is overrepresented in the footage you collected. If so, the system might inadvertently learn to treat race as a predictor of criminal behavior. That could cause more people of that race to be questioned, arrested, tried or convicted than others for the same behavior.

The twist is that if the humans in the loop hold racial biases, RLHF might make things worse, not better. Say the humans tend to mark subjects of one race as engaging in criminal behavior more often than others for the same behavior, because of their own racial biases. In this case, RLHF would feed those human biases back into the model rather than weed them out.
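The mechanism can be shown with a deliberately tiny Python example. The data is invented for illustration, not drawn from any real deployment. Reviewers see the same innocent behavior from two groups, "A" and "B", but flag group B more often. A model that is allowed to condition on group membership learns that disparity as if it were signal. Here, a simple label-frequency count stands in for a real classifier. The `fit_flag_rates` and `flag_rate_gap` helpers are hypothetical.

```python
from collections import Counter


def fit_flag_rates(examples):
    """Learn P(flagged | behavior, group) by counting human labels.

    Stands in for any model that can see the group attribute.
    """
    totals, flagged = Counter(), Counter()
    for behavior, group, label in examples:
        totals[(behavior, group)] += 1
        flagged[(behavior, group)] += label
    return {key: flagged[key] / totals[key] for key in totals}


def flag_rate_gap(rates, behavior):
    """Largest difference in flag rate between groups for the same behavior."""
    group_rates = [r for (b, _), r in rates.items() if b == behavior]
    return max(group_rates) - min(group_rates)


# Invented labels: identical behavior (jiggling a stuck key), but biased
# reviewers flag 1 in 10 group-A subjects and 4 in 10 group-B subjects.
labels = (
    [("key_jiggle", "A", 1)] * 1
    + [("key_jiggle", "A", 0)] * 9
    + [("key_jiggle", "B", 1)] * 4
    + [("key_jiggle", "B", 0)] * 6
)

rates = fit_flag_rates(labels)
print(rates)  # {('key_jiggle', 'A'): 0.1, ('key_jiggle', 'B'): 0.4}
print(f"gap: {flag_rate_gap(rates, 'key_jiggle'):.2f}")
```

Measuring gaps like this on the human labels themselves, before they are fed back into the model, is one way to catch the problem. It does not fix biased reviewers. It only makes their bias visible.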

Staying on the right track

Is AI going to be an evil Skynet or a benign Oracle? Arguably, that is up to us and the approach we take right now: what we develop as engineers, and what AI we as consumers choose to buy and use. HILT, and its derivative RLHF, will help keep us on the right track.

Ultimately, by combining the strengths of AI automation with HILT and including RLHF, we will likely see the fastest and most responsible AI development across all industries.