AI - Mathematical Formulation of Intelligence
Motivates aspiring AI researchers to master the maths. It is about how AI is, at its core, mathematics run by a computer, and about how ML is one way to implement AI.
Ideas presented in this blog are from my own understanding. Feel free to email me for any corrections.
Introduction
There is an aspiring AI researcher named Aire (‘e’ appended because it makes the name sound like a person’s name; pronounce it as “air”). He once came to know about AI, and it has fascinated him ever since. The fact that it is a revolutionary technology and can significantly improve human lives in so many ways excites him. To say the least, AI has already begun to positively disrupt areas such as healthcare, education, agriculture, science and engineering, waste management, and driving assistance. And to say a bit more, AI will also help with far-fetched problems like finding habitable planets.
AI is a revolutionary concept because it combines the power of computers with the power of humans. A computer with AI has the efficiency, accuracy, tirelessness, and perfect recall of computers, and the intelligence of human beings. So it can do what humans can do, with all the qualities of a computer just mentioned. It can also easily achieve goals that are comparatively hard for humans, like linear regression.
But there is one important issue with the use of AI: its significant carbon footprint. Aire, however, knows that AI has the potential to turn the tables. It can do more good than harm. It can help the environment by creating appropriate materials (e.g., for carbon capture), by creating clean energy (e.g., fusion energy), by reducing its own carbon footprint through more efficient AI, and in many other ways.
This article explains why maths is required to create AI and concludes that, in order to become an AI researcher, one must master the required mathematics. It also explains the relation between AI and ML.
Defining Intelligence
Let’s first define intelligence for the purpose of this blog. There is an agent or doer (could be a human or computer). An agent has intelligence when it has two functions - learning and action. An agent with intelligence can achieve goals, and it achieves them through the “action” function. If an agent does not have enough knowledge to achieve a goal, then it first acquires the required knowledge through the “learning” function (e.g., humans learn to drive before driving a car).
The “action” function takes a goal, existing knowledge, its previous outputs, and the environment as inputs, and outputs a value which is used to achieve the goal. An agent achieves the goal by iterating over the action function: it takes the state of the environment and the previous outputs of the function, and creates a new output such that it progresses towards the goal. This sequence of outputs is used to achieve the goal. The number of times the action function is used may vary from one to many, depending on the goal. When the “action” function is called, we can say that the agent did a thoughtful action. Why thoughtful? Because the function creates an output that moves the agent closer to its goal.
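The loop described above can be sketched in code. This is a minimal toy illustration of my own (the names `achieve`, `act`, and `done` are made up, not from any library); the agent repeatedly creates an output from the goal, its knowledge, its previous outputs, and the environment, until the goal is reached.

```python
# A minimal sketch of the "action" function loop. All names are illustrative.

def achieve(goal, knowledge, act, env, done, max_steps=100):
    outputs = []                        # previous outputs feed into the next call
    while not done(env, goal) and len(outputs) < max_steps:
        out = act(goal, knowledge, outputs, env)  # create the output, don't fetch it
        outputs.append(out)
        env = env + out                 # acting changes the environment
    return env, outputs

# Toy instantiation: the environment is a number, the goal is to reach 5.
# "Knowledge" here is just the step size the agent uses.
act = lambda goal, knowledge, outputs, env: knowledge if env < goal else -knowledge
done = lambda env, goal: env == goal

env, outputs = achieve(goal=5, knowledge=1, act=act, env=0, done=done)
print(env, len(outputs))  # the final state and how many actions it took
```

The point of the sketch is only the shape of the iteration: each call sees the current environment and the history of its own outputs, and produces one more thoughtful step.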
Let’s understand the action function with respect to human intelligence. Humans achieve goals by taking input (through the senses) from the environment and giving it to the action function inside the brain. The action function then outputs electrical signals to the external organs, making them act in order to achieve the goal. If the goal is not achieved, the action is taken again until it is. If the agent has enough knowledge, then every action moves the agent closer to the goal.
E.g., for the goal of folding a t-shirt, your brain outputs signals which move your hand and arm. Folding a t-shirt requires multiple movements, so the action function is called multiple times. Another example is writing a maths proof, where the action function determines which equation to write next, given the context of what is already written; hence the action function is called for each step of the proof. Goals that require hand movements, like writing a maths proof, will always require more than one action. But the goal of classifying an object requires only one call to the action function: when you look at an apple, you immediately recognise it as an apple.
We want an agent to achieve a goal. If an agent has the required knowledge, it will use the “action” function to achieve it successfully. If existing knowledge is not enough, then the agent will fail to achieve the goal. So, it must first acquire the required knowledge. Learning is the process of acquiring the required knowledge and is done via the “learning” function.
Learning happens in two phases. In the first phase, an agent acquires enough knowledge to achieve a goal. After acquiring enough knowledge, the second phase begins, where the agent learns while successfully achieving the goal. In the second phase, it updates the knowledge such that it achieves the same goal next time with better efficiency and accuracy.
An agent learns the hard way. That is, it learns through experience of applying the action function multiple times. For a goal, if there is no risk in applying the action function with insufficient knowledge, then learning happens in the actual environment. An example of such a goal is folding a t-shirt. If there is risk in applying the action function, then first-phase learning happens in a dummy environment, and second-phase learning happens in an actual environment. E.g., humans learn to drive in a controlled environment before driving in a city.
The “learning” function takes the environment, existing knowledge, the output from the action function, and feedback as inputs, and outputs updated knowledge. An agent learns by iterating over this function and the action function alternately. Updated knowledge is existing knowledge plus the knowledge gain. Knowledge gain is created in three major ways. The first is by considering the action function’s output together with feedback in the form of supervision or reward. E.g., you learn to drive under the supervision of an instructor, or you learn on your own by feeling good (as a reward) when you successfully pass a hurdle. The second is by considering how the environment changes when an action is performed. E.g., the harder you press the accelerator, the faster the car goes. Third but not least is learning by simply observing the environment, e.g., learning by observing a driver. An agent may depend on some or all three of these ways of learning.
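The alternation between acting and learning can also be sketched. This is a hedged toy of my own, not a real ML algorithm: the knowledge update rule below (nudge the current knowledge towards the feedback) is purely an illustrative assumption, standing in for the first way of learning, i.e. feedback as supervision.

```python
# A toy of the "learning" function alternating with the action function.
# The update rule is an illustrative assumption, not a real algorithm.

def learn(knowledge, output, feedback, rate=0.5):
    # knowledge gain comes from comparing the agent's output with the feedback
    return knowledge + rate * (feedback - output)

def act(goal, knowledge, env):
    return knowledge            # the agent acts using what it currently knows

# Toy goal: learn to output the value 10, with a "supervisor" giving 10 as feedback.
knowledge = 0.0
for _ in range(20):             # iterate action and learning alternately
    out = act(goal=10, knowledge=knowledge, env=None)
    knowledge = learn(knowledge, out, feedback=10)
print(round(knowledge, 3))      # converges towards 10
```

Phase one of learning in the blog's terms is the early iterations, where the knowledge is still far from sufficient; phase two is the later ones, where each update only refines it.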
This is an abstract description of intelligence, which is enough for this blog. But every aspect of human-level intelligence can be explained (which I will explain in a dedicated blog) through this two-function framework.
About Computers
A computer represents everything in numbers and can do mathematical operations on these numbers, like addition, multiplication, etc.
Mathematics is Required
Aire wants to be an AI researcher. Artificial Intelligence (AI) is really a concept or abstract idea of a computer having intelligence. The aim of AI researchers is to make this concept a reality or implement this concept by writing a computer program. To implement AI, they need to implement the two functions of intelligence.
What these functions do is create (not fetch) the output from the input. That is the purpose of the “learning” function: to create the required knowledge which the agent does not already have, relying on the function’s inputs. Let’s understand the creation of output by the action function through an example. Suppose you have the goal of writing an essay on some topic. The action function in your brain receives inputs and sends output in the form of electrical signals about “what to write” and “how to write” (describing how to move the pen with your hand). Whether we have written an essay on the same topic before or are writing it for the first time, we will write it differently every time because the situation differs. Every call of the action function creates signals (about what words to write and how to write them) which depend on the inputs of the function: the goal (write an essay on topic X for a school exam or for a national competition), the environment (different inspirations depending on the place of writing), what is already written and, of course, your knowledge (which varies with time). The space of all possible inputs is large, and we can’t store the output for each input configuration. Therefore, the action function creates the output for the given input configuration and does not just fetch it. That is, an essay we write is not stored somewhere in the brain; we create it.
The outputs and inputs (e.g., environment, goal, knowledge, feedback) of these functions would be represented as numbers in a computer. And in a computer, mathematical operations are required to create numbers, e.g., output numbers from input numbers. Fetching numbers (data) from memory, on the other hand, requires no maths operations: the computer uses the address of the data and brings the data over the bus from that memory address.
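The create-versus-fetch contrast can be made concrete with a trivial example of my own. Fetching only works for inputs whose outputs were stored in advance; a mathematical operation creates the output for any input.

```python
# Fetching: the output must already exist in memory, keyed by the input.
stored = {(2, 3): 5}
fetched = stored[(2, 3)]        # works only for the one stored input

# Creating: a maths operation produces the output for any input configuration.
def create(a, b):
    return a + b

print(fetched, create(2, 3), create(1000, 7))
```

A lookup table for every possible essay (or every possible pair of numbers) is impossible to store, which is exactly why the functions of intelligence must be mathematical functions rather than tables.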
So the point I am making here is that a researcher has to implement the functions of intelligence in the form of mathematical functions. These functions can then be written in a programming language, which is the easy part. The real difficulty for AI researchers is how to formulate the two functions of intelligence in mathematics.
To come up with these maths functions, an AI researcher first has to gain knowledge of intelligence and of mathematics, much like a physicist who gains knowledge of the world via observation and intuition, and then learns the maths required to convey that knowledge in the mathematical language (Einstein at least once had to learn some topics of maths before he could formulate his theory).
To gain knowledge of intelligence, a researcher could observe himself, other human beings, and animals; rely on intuition; or learn from cognitive scientists, neuroscientists, and others who study the human (and animal) brain and intelligence. After learning about intelligence, a researcher will have some idea of how to formulate the functions, which allows him/her to search for appropriate maths to formulate them with.
Existing Maths Formulations
I will quickly go over this section.
In the 20th century, a group of researchers developed enough understanding of intelligence to find appropriate mathematics to formulate it. The topics of mathematics they found appropriate were linear algebra, probability, and calculus. After the formulation, they were able to easily write the corresponding computer algorithms. Their way of formulating the two functions of intelligence in mathematics is known as Machine Learning (ML), and the corresponding computer programs are known as ML algorithms.
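To see what this maths looks like in practice, here is a small, self-contained sketch of linear regression (the example the blog itself mentions) fitted by gradient descent. The data and learning rate are made up for illustration; calculus supplies the gradient of the squared error, and iterating the update is the “learning”.

```python
# Fitting y = w*x by gradient descent on the mean squared error.
# The data below is generated by the true rule y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0                             # the knowledge: a single parameter
lr = 0.02                           # learning rate (an arbitrary choice)
for _ in range(200):
    # gradient of (1/n) * sum((w*x - y)^2) with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad                  # update the knowledge against the gradient
print(round(w, 3))                  # close to the true slope, 2
```

The loop is the learning function (it updates knowledge from feedback, the error), and evaluating `w * x` for a new `x` is the action function: exactly the two-function framework, written in maths.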
Deep Learning (DL) algorithms (a class of ML algorithms whose mathematical formulation is inspired by the human brain) became successful in 2012. With DL, researchers were able to implement Artificial Narrow Intelligence (ANI); that is, a DL algorithm can learn to achieve a single well-specified human-level goal. E.g., an image classification algorithm can only do image classification. With this success, computers became intelligent assistants of humans and helped humans in unimaginable ways. To name a few, they helped humans in language translation, medical diagnosis and treatment, leaf disease classification, driving, science, and so many other areas. But it is still narrow intelligence. ANI is inherently limited in many ways (it cannot do robust prediction, continual learning, generalisation, reasoning, planning, or learning with small sample complexity). To solve these problems, researchers have to create an AI algorithm which can achieve multiple human-level goals, which is only possible when they formulate proper intelligence, more commonly called Artificial General Intelligence (in contrast to ANI). AGI means computers having human-level intelligence.
Recently, researchers have created Large Language Models (LLMs) and Multi-Modal Models (I will refer to both simply as “LLMs”) which feel like human-level intelligence. Although the mathematical formulation of LLMs is different from and better than previous formulations of intelligence, the main success of LLMs comes from a larger number of parameters (more storage for knowledge), larger training data (a richer environment), and large compute power. Each subsequent LLM version was trained at a larger scale of each of these three factors; hence the accuracy and generalisation capability of LLMs kept increasing with each version. But the other limitations (continual learning, etc.) of DL algorithms are still unsolved. Moreover, these three factors are finite, and researchers cannot keep increasing them indefinitely. Also, physical understanding (computer vision) of the world is still far behind the progress in language-level intelligence. AI researchers have to find, and are finding, drastically better and newer mathematical formulations of intelligence. So AI is still very much an active research area.
In the 20th century, there was another group who implemented the concept of AI by implementing only the action function. The knowledge required for the goal was fed to the computer manually; there was no learning function. This works for simpler goals where the required knowledge can be fed easily. This implementation is called Symbolic AI or GOFAI (Good Old-Fashioned AI).
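The GOFAI approach can be sketched with a toy of my own (not a historical system): the knowledge is hand-written, human-readable rules, and the action function merely applies them; there is no learning function anywhere.

```python
# An illustrative GOFAI-style classifier: knowledge is fed manually as rules.

knowledge = [  # hand-written, human-readable rules (condition, label)
    (lambda animal: animal["legs"] == 8, "spider"),
    (lambda animal: animal["legs"] == 4 and animal["barks"], "dog"),
    (lambda animal: animal["legs"] == 4, "cat"),
]

def act(observation):
    for condition, label in knowledge:   # apply the first matching rule
        if condition(observation):
            return label
    return "unknown"

print(act({"legs": 4, "barks": True}))   # classified by a hand-fed rule
```

The readability of the rules is exactly the advantage discussed below; the cost is that someone has to write a rule for every case, which is hopeless for complex goals like image classification.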
But for complex goals like image classification, feeding the required knowledge is such a laborious task that a researcher would rather think about how humans learn and then try to formulate it in maths. The world is full of complex goals. It seems easier to find the learning algorithm that would allow the computer to learn to achieve all the complex goals than to do the laborious task of manually feeding the knowledge for each complex goal.
There is one advantage of Symbolic AI over ML: with Symbolic AI, a computer can do better reasoning than with an ML algorithm. This is because of the difference in the way knowledge is represented in the two implementations. In Symbolic AI, knowledge is represented in a human-readable way (i.e., in words), which makes it easy for the researcher to implement how humans reason in the action function by analysing the thoughts of humans (in contrast to understanding how neurons in the brain work). That is, while writing an essay, we think and write. We can analyse our thinking while we are writing and figure out how we decide what sentence to write next, out of all the thoughts of potential next sentences. Or, when we write a mathematical proof, we can analyse our thoughts about how we decide what step to write next. This analysis of thoughts gives insights into how humans reason, which allows the researcher to implement it in the action function.
After the popularity of DL, it was demonstrated that the learning function works, making DL scalable in contrast to GOFAI. But adding reasoning to DL is not as natural or straightforward as in Symbolic AI. So Symbolic AI researchers have proposed the new concept of Neuro-Symbolic AI, which combines the best of both worlds.
Road to Becoming an AI Researcher
As I mentioned above, during the nascent age of AI, researchers first understood intelligence and then tried to formulate it in maths. They didn’t have any literature about how to formulate it. They formulated it for the first time and got some results. Then they kept improving the formulation, and here we are in the age of LLMs.
It has been decades since the inception of AI as a concept and more than one decade since the success of DL. In these years, a lot of great researchers have studied AI and its formulations. So, aspiring AI researchers like Aire have rich literature at their disposal. By understanding the literature, they would be able to access all the knowledge of past and current AI researchers. Aire would know about their understanding of intelligence, what they have already tried, what formulations worked, what didn’t work, the limitations of current formulations and potential ways of improvement. And then it is easier (difficult per se but easier compared to starting from scratch) for Aire to propose novel and better formulations.
But since AI is formulated in maths, AI literature is written in the language of mathematics. So Aire should first master the required maths (as mentioned above: linear algebra, probability, etc.) before he can master the literature. This is in contrast to the early AI researchers, who first understood intelligence and then worked on the maths.
Conclusion
AI is a concept. To make it real, a researcher has to formulate it in maths. So aspiring AI researchers like Aire must master the required mathematics, which allows them to master the current literature and to propose newer algorithms by formulating their own understanding of intelligence.