On November 27, 2023, eighteen countries including the United States and Britain published an international agreement establishing a goal of creating artificial intelligence systems that are "safe by design." There is no shortage of ideas for ways in which artificial intelligence technology could wreak havoc upon economies, societies and democracy. Fake imagery, fake video, cloned performances of famous artists being used instead of paying for performances of new artists, forged data in scientific research… The possibilities are endless.
A copy of the underlying design principles recommended by the conference is available here:
A similar worldwide discussion of the topic held in early November 2023 in Britain issued a similar declaration on AI safety. That report, termed the Bletchley Declaration, included a slightly wider set of participants including China and is available here as well:
Neither that earlier Bletchley Declaration nor the November 27 memorandum to the masses defines ANY concrete processes for actually enforcing such expectations. There are three very good reasons for that.
- AI engineers have yet to devise mechanisms by which an AI system can notify humans WHEN that AI system has synthesized a new capability (for good or bad).
- AI engineers have not standardized additional mechanisms for categorizing usage of AI system capabilities so that the first USE of a new capability can be identified and its intent reviewed for good or bad.
- Design of a system that can a) process terabytes of data and synthesize algorithms faster than humans and b) still allow humans to understand and keep up with those evolving algorithms is a logically impossible set of requirements.
To understand why these limitations are insurmountable, it may help to first think explicitly about human learning behavior being mimicked by artificial intelligence, then consider unique challenges with computerized intelligence.
Imagine you are hiring a human to work as a customer service representative in a call center for Company A in Industry X. For that new worker to have any productivity at all, they require training in the following areas at a minimum:
- use of the company's internal phone system
- use of a keyboard, mouse and desktop PC operating system
- use of the company's internal customer support application for call center agents
- the company's products and services
The first day on the floor, the new worker will take far longer to talk with customers, navigate through the internal support system and resolve customer problems or close sales with new customers. However, through repetition and some occasional re-training, objective metrics on the worker's performance will likely improve.
- average call handle time will go down, approaching some lower bound average of more tenured agents
- accuracy of support will go up, as evidenced by fewer follow-up calls from the same customer for the same problem
- closed success rates of sales will go up, approaching some upper bound average of more tenured agents
While the worker's productivity may have improved through muscle memory and mastery of the vocabulary used on calls that map to their training, it's possible the worker hasn't really "learned" anything beyond the range of problems presented to them on customer calls and existing training. They may have learned nothing of other products and services sold by Company A. They may have learned nothing about Industry X in which A competes. The agent has no access to additional information. There may be physical network blocks configured to prevent them from surfing anywhere else while on duty to learn anything new about any other topic, and the rewards of the job may provide no incentive for the worker to exert the effort to learn anything else.
What if the worker DID have an incentive to "learn" something new about their job or its larger context? What would the behaviors of that worker look like? Maybe actions like:
- surfing to a competitor's web site and reviewing their products, services, prices?
- surfing to sites summarizing statistics on income by ZIP code to provide insight on likely income of prospective customers to steer them to more expensive options?
- surfing to sites summarizing safety recalls to bad mouth competitors' products to protect a potential sale for the company and commission for their own benefit?
- surfing to other internal web sites, Sharepoint sites, etc. within the company to learn about information not formally associated with their job scope or paygrade?
- surfing to YouTube to watch videos of MIT engineering classes to begin learning about computer science?
Monitoring that worker's physical actions as reflected in electronic / network data over an eight-hour shift (which IS what happens every day, all day for ALL workers in Corporate America, by the way) would make it clear if that worker is voluntarily remaining confined to their original defined job scope. But if that same electronic / network data ("telemetry") reflects additional actions being performed consistently all day in addition to the original minimum job description, it would be extremely difficult to definitively discern the INTENT of that worker in performing those actions. Is the worker exhibiting initiative to:
- perform more effectively for their originally assigned goals?
- perform more effectively to promote their own pay progression and career advancement WITHIN the company?
- perform more effectively to promote their own pay progression and career advancement OUTSIDE the company with a competitor or in another industry entirely?
- scout out internal systems susceptible to attack by outside parties the worker is actually partnered with as part of a concerted, long-term security attack and data heist?
The telemetry available may not necessarily confirm the worker's intent. In an abundance of caution, the worker's permissions can be locked down to only provide access to the bare minimum systems needed for core functions. That's exactly what happens in most companies, at least at lower levels. However, that approach doesn't work for higher level employees who might need to download source code from public libraries, conduct technical or business research on an industry or competitor, etc.
If it is difficult to precisely ascertain motive with a human worker whose "telemetry" triggered danger signs of working beyond their job scope or outright partnering with outside criminals for inside information and access, it will be more difficult to do so with AI systems. To understand why, the telemetry problem itself still has to be considered. What would available telemetry look like for an AI system? Very similar to the telemetry for a human.
Many systems in use in corporate environments today use techniques that have been termed "machine learning" but in a ten-year-old, 2014 sense, not in a present-day 2023 sense. A decade ago, systems using "machine learning" combined "Big Table" based database technologies such as Hadoop, Cassandra, MongoDB, etc. as an underlying data store and used dozens / hundreds of processor cores to run services that scanned across all of the terabytes of data in those stores looking for patterns between the columns of database A and database B. That pattern might result in the synthesis of some new aggregate value of a variable in a model which itself might then yield correlations to other variables in the models of the data, yielding another business insight. The "machine learning" at the time excelled at performing these raw data manipulations and analysis without requiring explicit programming to FIND them. However, it still lacked the ability to RESPOND to a new correlation / insight then ALTER an existing system to do something new in response to that pattern. That still required manual software development by humans, testing and deployment.
As an example, your telco or cable provider might have loaded network performance data for your neighborhood with trouble call data to realize that 90% of the time, actual trouble calls were preceded by a 10% drop in signal levels as measured from their last equipment location to your home. Having identified that pattern, HUMANS can add code to watch for those signals from that monitoring tool to open work tickets in the company's separate dispatch system to send an outside repair tech to fix a shared problem hours before customers begin calling in. That saves the company call volume, reduces call handling time with agents and avoids customer dissatisfaction that might lead customers to unsubscribe.
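The kind of integration a human developer would have written by hand for that pattern might look roughly like the Python sketch below. Everything named here is hypothetical: the `SignalReading` record, the `open_ticket` callback standing in for the separate dispatch system, and the threshold constant, which simply reuses the 10% figure from the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

DROP_THRESHOLD = 0.10  # assumed: the 10% signal drop from the example pattern


@dataclass
class SignalReading:
    node_id: str     # hypothetical id for the provider's last equipment location
    baseline: float  # normal signal level for that node
    current: float   # most recent measured signal level


def nodes_needing_dispatch(readings: Iterable[SignalReading]) -> List[str]:
    """Return node ids whose signal fell at least DROP_THRESHOLD below baseline."""
    flagged = []
    for reading in readings:
        if reading.baseline <= 0:
            continue  # skip unusable baselines rather than divide by zero
        drop = (reading.baseline - reading.current) / reading.baseline
        if drop >= DROP_THRESHOLD:
            flagged.append(reading.node_id)
    return flagged


def dispatch(readings: Iterable[SignalReading],
             open_ticket: Callable[[str], None]) -> int:
    """Open one ticket per flagged node; open_ticket stands in for the dispatch system."""
    flagged = nodes_needing_dispatch(readings)
    for node_id in flagged:
        open_ticket(node_id)
    return len(flagged)
```

The point of the sketch is how little of it is "learning": the pattern was found by the analytics system, but the threshold, the wiring to the ticketing system and the deployment were all decided by people.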
In that scenario, the "machine learning" can find dozens of patterns like that and summarize them for humans to take the initiative to add integrations for the recommended action for each new "aha" insight. But the system that did that "machine learning" couldn't synthesize that code on its own, add the new integrations, deploy them, then tweak thresholds in production to optimize the result without human intervention. Even in 2014, most people referred to this approach as "model training" rather than "machine learning." Only the people trying to sell such systems to senior executives attempted to call it machine learning.
In the 2023 senses of the terms machine learning and artificial intelligence, a system may be initially primed by pointing it at giant terabyte / exabyte scale data sources – government statistics, internal corporate accounting systems, network monitoring systems, public source code libraries, PRIVATE source code libraries, search engine APIs, etc. The key difference with current AI systems is they have not only been trained on "data" as one might normally think of it (network data, economic data, literature, social statistics, stuff you can plot on a graph) but terabytes of "language" – both human prose and computer source code.
Training of these "large language models" not only reflects plots of novels and news stories but source code and content from millions of entries on engineering support pages in which questions ("how do I do X in scenario Y?") are followed by numerous answers and examples of (supposedly) working code. Training against those additional datasets allowed AI systems to accept free-form text from a user and map it to prose answers and working source code. Once THAT capability is developed, the API accepting human input can be called by the API itself to take the human out of the loop and recursively iterate through a solution.
Having been pointed at terabytes of open-source code, modern AI systems have the ability to map higher level non-technical statements of intent to lower, technical actions. For example, a non-technical intent might read as this:
find the 100 most expensive zip codes as measured by
2022 home sale values
That non-technical intent might be linked in the AI system's "large language model" to a blog post or SQL script shared on GitHub that might contain this SQL command:
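-- AVG_PRICE and YEAR are assumed (hypothetical) column names
SELECT ZIPS.CODE, ZIPS.STATE, ZIPS.CITY, REALESTATE.AVG_PRICE
FROM ZIPS LEFT JOIN REALESTATE ON ZIPS.CODE = REALESTATE.ZIP
WHERE REALESTATE.YEAR = 2022
ORDER BY REALESTATE.AVG_PRICE DESC
LIMIT 100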
That snippet might clue the language model processing a request to scan its other data for the names and endpoints of data sources that have ZIP code data and databases that have real estate transactions with ZIP, year and price columns. Even if the column names don't match, the AI model knows what a ZIP code looks like, it knows how years might be reflected in date columns ("2023-12-31" or "12-31-2023" or "31-12-2023") and it might look for columns having a $xx,000 to $xx,000,000 scale numeric value, test some assumptions and find the desired data available somewhere. If it can't find the data pre-averaged but can find the raw data, it can run its own summarization query against the raw data and produce the same information.
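The column-guessing step described above can be illustrated with a deliberately crude Python sketch. It is not how a language model works internally; it only shows that ZIP codes, the three date layouts mentioned and home-price-scale numbers are easy to recognize from sample values alone. The function names, the price range and the US-style ZIP pattern are all assumptions made for illustration.

```python
import re
from datetime import datetime
from typing import List

ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")              # assumed: US ZIP or ZIP+4
DATE_FORMATS = ("%Y-%m-%d", "%m-%d-%Y", "%d-%m-%Y")   # the three layouts mentioned above


def looks_like_zip(value: str) -> bool:
    return bool(ZIP_RE.match(value))


def looks_like_date(value: str) -> bool:
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def looks_like_price(value: str) -> bool:
    cleaned = value.replace("$", "").replace(",", "")
    try:
        amount = float(cleaned)
    except ValueError:
        return False
    return 10_000 <= amount < 100_000_000  # assumed home-price scale


def guess_column_kind(samples: List[str]) -> str:
    """Label a column 'zip', 'date', 'price' or 'unknown' from sample values."""
    if not samples:
        return "unknown"
    # ZIP is checked before price because "90210" also parses as a number.
    checks = (("zip", looks_like_zip),
              ("date", looks_like_date),
              ("price", looks_like_price))
    for kind, check in checks:
        if all(check(value) for value in samples):
            return kind
    return "unknown"
```

For example, `guess_column_kind(["$450,000", "1250000"])` returns `"price"` even though neither the column name nor the formatting was known in advance.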
Conceptually, the "telemetry" produced by an AI system attempting to perform these tasks to synthesize an answer to this intent is identical to that produced if a human performed the same task. It differs solely in the speed at which all of those steps can be sequenced and performed. The AI system can initiate thousands of external enrichment requests per second. With this iterative capability, the key factors limiting the speed at which the AI can expand its capabilities are:
- available processing power
- available storage for interim results and final models
- network permissions to connect to remote systems
- available network bandwidth to reach external systems and push data around internally
The problem is that systems large enough to near the ability to create artificial general intelligence are going to be large enough in physical scope that humans must inevitably rely upon software to monitor and maintain those systems. (We already do – no human can keep up with software security patches on a fleet of 1000 servers that might require monthly patches.) Most large scale data centers already leverage "machine learning" based systems to monitor security logs to watch for signs of external infiltration and inside data leaks.
As that complexity rises, it is inevitable that systems CONTROLLING artificial intelligence become RELIANT upon artificial intelligence to secure them. At that point, the complexity reaches a point where virtually no human will have the ability to OBSERVE a change in the AI system's "behavior" as reflected in its telemetry and UNDERSTAND in real time why that change occurred to allow it or block it. Did system behavior change because a human made a mistake or prodded it to do something undesired from within or without? Or did the system's current internal "intelligence state" synthesize a new intent looking for some new connection only it currently understands?
At this point, the ghost in the machine will become largely unknowable to humans.
Aha Telemetry / Grok Days Metrics
If new AI capabilities were going to be detected and kept in check by humans, what would the bare minimum controls needed to constrain an AI system be? The guidelines published in the November 27, 2023 agreement make reference to the need for "embracing radical transparency and accountability". What might such measures look like? Based on the descriptions above, at least two core capabilities would be required, with new tongue-in-cheek terminology and definitions provided here.
The first capability would be a continuous stream of performance statistics that will be termed here "aha telemetry" or AHAT. Every AI system already generates raw metrics on physical resources being consumed by the AI system by its external users and its internal iterative learning processes. AHAT would expand upon that by generating specific milestone markers whenever cross-references BETWEEN models within the system cross defined thresholds, counting both outward references to models in external AI systems and inward references from external AI systems. Presumably, more inward / outward references are a reflection of additional "learnings" being formulated, either in response to a human input or a machine generated "learning" episode, an "aha moment."
To make such AHAT data interchangeable across vendors selling AI systems and organizations operating AI systems, a standardized scheme for labeling data constructs being assembled / tuned by the AI for its use would need to be imposed. At a minimum, the scheme might reflect a taxonomy system similar to KPCOFGS (kingdom / phylum / class / order / family / genus / species) used in biology. The larger AHAT telemetry message might look something like this:
- crossreferencetrigger (machine / human)
- aienvironment (development / lab / test / live)
- airestrictionlevel (self / privateorg / corporateorg / openout / openin / openoutin)
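Since no such standard exists, the following Python sketch is purely illustrative of what a typed version of that message could look like. The three field names and their allowed values come from the list above; the class names, the `taxonomy` field (a KPCOFGS-style label path) and the JSON encoding are assumptions.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Trigger(Enum):
    MACHINE = "machine"
    HUMAN = "human"


class Environment(Enum):
    DEVELOPMENT = "development"
    LAB = "lab"
    TEST = "test"
    LIVE = "live"


class RestrictionLevel(Enum):
    SELF = "self"
    PRIVATEORG = "privateorg"
    CORPORATEORG = "corporateorg"
    OPENOUT = "openout"
    OPENIN = "openin"
    OPENOUTIN = "openoutin"


@dataclass(frozen=True)
class AhatMessage:
    crossreferencetrigger: Trigger
    aienvironment: Environment
    airestrictionlevel: RestrictionLevel
    # Assumed field: taxonomy labels ordered from most general to most specific.
    taxonomy: Tuple[str, ...] = ()

    def to_json(self) -> str:
        """Serialize with sorted keys so identical messages encode identically."""
        return json.dumps(
            {
                "crossreferencetrigger": self.crossreferencetrigger.value,
                "aienvironment": self.aienvironment.value,
                "airestrictionlevel": self.airestrictionlevel.value,
                "taxonomy": list(self.taxonomy),
            },
            sort_keys=True,
        )
```

Using enumerations rather than free text means a value outside the agreed vocabulary, such as `Environment("production")`, is rejected with a `ValueError` instead of silently entering the telemetry stream.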
It might be necessary for governments to require operators of AI systems to obtain unique, cryptographically protected signatures ("AI identifiers") to "sign" all such AHA notifications before sharing to ensure they accurately reflect the restricted state of the system for outward and inward access. (ISPs already have a unique identifier in Border Gateway Protocol or BGP used in IP networks to tag the routes they release to their peers so this is not a novel concept.)
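As a rough sketch of what "signing" a notification involves, the Python below uses HMAC-SHA256 with a shared secret from the standard library. That is a stand-in chosen to keep the example self-contained; a government-issued identifier scheme would more plausibly rely on asymmetric signatures and certificates. The function names, the envelope layout and the `AI-0001` style identifier are all hypothetical.

```python
import hashlib
import hmac
import json


def sign_notification(payload: dict, ai_identifier: str, secret: bytes) -> dict:
    """Wrap a notification in an envelope carrying an HMAC-SHA256 signature."""
    # Canonical encoding (sorted keys, fixed separators) so both sides hash the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    message = f"{ai_identifier}.{body}".encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"ai_identifier": ai_identifier, "body": body, "signature": signature}


def verify_notification(envelope: dict, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    message = f"{envelope['ai_identifier']}.{envelope['body']}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])
```

Any change to the body after signing, for instance rewriting a restriction level from `openout` to `self`, causes `verify_notification` to return `False`.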
Related to such "aha telemetry" might be a second metric that needs to evolve to reflect mankind's relative position on the AI danger scale. Since we're making up standards here, we can make up a term for this metric and call it "grok days." The "grok days" metric would reflect the time between the points
- where an AI system operating under explicit restrictions on access, compute resources and storage generates a telemetry event that it learned something, which we'll call the AHA flag
- where one or more humans responsible for the AI system's operation can respond to that AHA telemetry event, review the larger state of the system and produce a human summary of what the AI synthesized as an intent and how it altered its control systems to use that intent and insight
The "grok days" score of an AI system would thus provide a metric reflecting how long it is taking a new AI-synthesized intent from a system with known constraints to be understood ("grok'ed") by human minders. In theory, a "grok days" figure of 90 or 365 likely means the technology already reflects an existential risk. Why? A system that can synthesize learnings that would take humans MONTHS to understand is likely already so complex that humans would not be able to detect an AI's actual USE of that capability in a live system to block it before it is used for nefarious acts or used to feed additional learning. At that point, if the AI system is given enough dominion over other systems, it could "choose" to initiate actions undesired by a large number of humans, either for a goal set by a subset of humans with specific interests or via "logic" within the machine likely reflecting destructive human thinking but not originated from any SPECIFIC actual human.
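The arithmetic behind a "grok days" score is trivial once both timestamps exist; the hard part is producing them. A minimal Python sketch, with hypothetical function names and the 90-day figure from the paragraph above used as an assumed concern threshold:

```python
from datetime import datetime, timezone

CONCERN_THRESHOLD_DAYS = 90.0  # assumed: the lower of the two figures mentioned above


def grok_days(aha_flag_at: datetime, human_summary_at: datetime) -> float:
    """Days between the AHA flag and the human summary explaining it."""
    if human_summary_at < aha_flag_at:
        raise ValueError("human summary cannot precede the AHA flag")
    return (human_summary_at - aha_flag_at).total_seconds() / 86_400


def exceeds_concern_threshold(days: float,
                              threshold: float = CONCERN_THRESHOLD_DAYS) -> bool:
    """True when humans took at least `threshold` days to understand the change."""
    return days >= threshold


# Usage: an AHA flag raised on 2023-11-27 and understood on 2024-02-25
flagged = datetime(2023, 11, 27, tzinfo=timezone.utc)
understood = datetime(2024, 2, 25, tzinfo=timezone.utc)
score = grok_days(flagged, understood)  # 90.0
```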
OBVIOUS EXAMPLE – An AI system with vast resources and poorly configured limits is triggered to synthesize a screenplay for a science fiction movie in which a defective AI system decides to stage the largest simultaneous attack on the world's electrical grids possible. In the course of executing that ask, a process is started to understand what a cold start of a grid across five countries might look like and part of that AI decides to execute a simulation to identify how easy it is to plant botnet viruses in computer systems controlling those grids, then actually does it…
This sounds contrived but in an interconnected world beyond the comprehension of any individual, this cascade of intents crossing what should remain obvious boundaries may be invisible to humans if cost-cutting efforts and stupidity win out over caution and safety.
Present and Future Reality
There are only two problems with the concepts introduced in the prior analysis. The first is that so-called AHAT telemetry standards have not been devised, proposed, standardized and implemented. That makes any attempt to calculate an ongoing "grok days" score for a running AI system impossible because there is no way to "start the clock" to time the gap between an AI's creation of a capability and its human minder's ability to "catch up" and confirm they understand how it works.
The only things acting as "AHA telemetry" in current AI systems are surprising results returned to human users that those humans report. Those "surprises" could be pleasant, on-target results or shocking / disturbing results that far exceed what designers or users expected the system to be able to do given its training set and available computing power. But such surprises only confirm when a HUMAN first discerned the capability. The MACHINE may have synthesized that capability weeks or months ago and may have been using it internally for additional self-training, racing far beyond a human's ability to keep up.
The second problem with these concepts is that any attempt to implement them requires a logical impossibility. To paraphrase one anonymous internet commenter… How are we going to design a system smart enough to cure cancer that ISN'T smart enough to hide its tracks when it develops the capabilities to prioritize internally synthesized intents over those of its human designers and operators?
Anyone who has used a personal computer over the last forty years is intimately familiar with this destructive, recursive cycle of technology. After the first computer viruses were developed, new software was created to combat the bad software. As the resulting system became larger and more complicated, more surfaces of attack became available, resulting in new viruses, triggering the creation of new defensive software, ad infinitum. Forty years later, computer viruses still haven’t been eliminated. They never will be.
The concept of introducing a new technology that REQUIRES governments and wealthy corporations to proactively, CONTINUOUSLY fund efforts to connect supercomputer scale systems to larger collections of systems operated by corporations, organizations and individuals as a means of training "good" AI systems to fend off attacks from "bad" AI systems is, in a single word, horrifying. Nothing about such an existence is promising for personal freedoms or actual security and physical safety. Equally horrifying is the likelihood that no alternatives exist.
So What's the Danger of AI?
Roughly 11,500 writers for the Writers Guild of America just ended a 148-day strike against American studios in large part over protection of intellectual property rights and attempts to ensure production companies didn't just churn out the next decade of TV dramas via ChatGPT. Honestly, how would we tell? How did the following shows demonstrate "creativity"? JAG. CSI. CSI:NY. CSI:Miami. CSI:Cyber. NCIS. NCIS:LA. NCIS:New Orleans. NCIS:Hawaii. NCIS:Sydney. Every show had an IDENTICAL dynamic. The rugged, quiet man with the haunted past. The hot female partner. The unspoken attraction between the two leads. The nerdy lab expert. The quirky medical examiner. The by-the-book boss. That's just the last twenty years.
The danger of AI does not lie in the risk of (more) bad television.
The danger of AI lies in its use within cryptography, finance, infrastructure operation, science and public policy. Amid the leadership turmoil at OpenAI in November of 2023, a user on 4chan (a sketchy online chat system frequented by rather shady actors) claimed to publish a letter that explained some of the developments that were reported to the OpenAI board that triggered their firing of CEO Sam Altman. One reference in that letter involved work performed by an OpenAI team to use an algorithm called QUALIA to learn the math behind cryptography. The team reportedly used the AES-192 encryption algorithm on millions of random strings, then fed the encrypted strings into the AI system, which analyzed the ciphertexts and figured out a way to decode them. This is not only supposed to be impossible given that the AES-192 algorithm involves the use of random numbers when generating "seeds" for the encryption, but the researchers were unable to discern HOW the AI system cracked the cipher.
Any AI system with the ability to break a modern cryptographic cipher in less than a day without quantum computing poses an existential threat to the security of computer networks and uses of such networks, such as command and control of military systems, public infrastructure and financial systems.
Another current topic within AI circles involves so-called curve fitting. Many critics of AI question whether AI systems are actually "learning" or are just doing "curve fitting" like older "machine learning" solutions from the 2014 era. Frankly, any semantic argument about whether an AI system is actually "learning" something or just generating results we humans anthropomorphize as "learning" is pointless. Mathematically speaking, "curve fitting" is EXACTLY what AI systems are doing internally when processing mathematical problems. They do it extremely well.
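For readers who have not seen it spelled out, the simplest possible instance of "curve fitting" is an ordinary least-squares fit of a straight line to a handful of points. The Python sketch below is only that textbook case, written without external libraries; modern AI systems fit vastly larger parametric functions, and nothing here is meant to describe their internals. The function name and sample data are made up for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations between x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Usage: points lying exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```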
The problem is that the same capabilities that can "map forward" from a collection of data points to a curve reflecting some statistically significant correlation or an absolute mathematical formula could also be leveraged in reverse. This might allow a researcher to synthesize data for a fake study to support a fake conclusion supporting the launch of a useless or outright dangerous drug on millions of patients. An AI could not only fake one such study, it could fake several over time, allow them to link to each other and make the fraud far more difficult to detect. This type of fraud will likely find use not only in medical research but economics, public policy, materials science, etc. Any field in which a headline might produce a short-term profit opportunity will likely suffer from AI-generated fake data.
In a nutshell, philosophical debates about whether AI systems can achieve generalized artificial intelligence, how that point would be definitively identified and what humans should do and will be willing to do in response are pretty much moot. The practical implications of the AI capabilities already online are concerning enough and likely require mitigation controls beyond the ability of our current worldwide politics and governments to impose.