3 AI - Artificial Intelligence

Smith

Limits of Science

A whole lot of complex phenomena have so far defied the approach that gave us the laws of physics and genetics. Language, cognition, society, economics, complex ecologies — these things so far don’t have any equivalent of Newton’s Laws, and it’s not clear they ever will.

This problem has been recognized for a very long time, and thinkers tried several approaches to get around it. Some hoped that all complex phenomena would be governed by emergent properties — that simplicity would emerge at higher levels of complexity, allowing us to discover simple laws for things like psychology and economics even without connecting those laws to the underlying physics. Indeed, this idea is implicit (or, occasionally, explicit) in the way economists try to write down simple mathematical laws of collective human behavior. People make fun of this approach as “physics envy”, but sometimes it really works; auction theory isn’t derived from physics, but it has been able to make very effective predictions about how much people will pay for Google ads or spectrum rights. Ditto for “gravity models” of trade, migration, retail shopping, etc. Sometimes emergence works.
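A gravity model of the kind mentioned above can be written in a few lines: predicted flow between two places is proportional to the product of their sizes divided by the distance between them. The sketch below uses made-up GDP and distance figures purely for illustration; real gravity models are estimated with exponents and many controls.

```python
# Minimal sketch of a "gravity model" of bilateral trade:
#   trade_ij = G * (gdp_i * gdp_j) / distance_ij
# (illustrative constants and country figures, not real data)

def gravity_trade(gdp_i, gdp_j, distance, G=1.0):
    """Predicted trade flow between two economies."""
    return G * (gdp_i * gdp_j) / distance

# Hypothetical inputs: GDP in $tn, distance in km.
usa_china = gravity_trade(gdp_i=25.0, gdp_j=18.0, distance=11_000)
usa_canada = gravity_trade(gdp_i=25.0, gdp_j=2.1, distance=700)

# Despite China's far larger GDP, proximity makes Canada the larger
# predicted trading partner in this toy calculation.
print(usa_china, usa_canada)
```

No physics is involved, yet the same functional form generalizes across trade, migration, and retail shopping, which is exactly the kind of emergent regularity the paragraph describes.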

Sometimes, though, it doesn’t — or at least, it doesn’t yet. But in psychology, in macroeconomics, in natural language processing, and many other domains, the search for laws of nature has been mostly stymied so far, and it’s not clear when real progress might ever be made. Wigner goes so far as to postulate that some domains of human knowledge might never be described by such simple, generalizable principles.

Other approaches for getting around the problem of complexity — chaos theory, complexity theory — yielded interesting insights, but ultimately didn’t succeed in giving us substantially more mastery of the phenomena they dealt with. In the late 20th century, the problem of complexity was like a looming wall up ahead — as scientists found more and more of the laws that could be found, a larger and larger percentage of the remaining problems were things where laws seemed very hard or potentially even impossible to find.

Control without understanding, power without knowledge

In 2001, the statistician Leo Breiman wrote an essay called “Statistical Modeling: The Two Cultures”, in which he described an emerging split between statisticians who were interested in making parsimonious models of the phenomena they studied, and others who were more interested in predictive accuracy. He demonstrated that in a number of domains, what he calls “algorithmic” models (early machine learning techniques) were yielding consistently better predictions than what he calls “data models”, even though the former were far harder, or even impossible, to interpret.
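Breiman's split can be shown on a toy problem (assumed data, not one of his examples): a parsimonious "data model" (a straight line, fully interpretable) against an "algorithmic" model (a nearest-neighbor lookup, accurate but with nothing to interpret) on a nonlinear relationship.

```python
# Sketch of Breiman's "two cultures" on a toy quadratic dataset.

def fit_line(xs, ys):
    """Least-squares slope and intercept: the interpretable 'data model'."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

def nearest_neighbor(xs, ys, x):
    """Prediction by raw lookup: the 'algorithmic' model."""
    return min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]

xs = [x / 10 for x in range(-20, 21)]
ys = [x ** 2 for x in xs]            # true relationship is quadratic

slope, intercept = fit_line(xs, ys)
x_test = 1.55
linear_err = abs((slope * x_test + intercept) - x_test ** 2)
nn_err = abs(nearest_neighbor(xs, ys, x_test) - x_test ** 2)
print(linear_err, nn_err)            # the black-box lookup wins
```

The line yields a tidy equation and a large error; the lookup yields a small error and no equation, which is Breiman's trade-off in miniature.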

This raises an important question: What is the goal of human knowledge? As I see it — and as Breiman sees it — the fundamental objective is not understanding but control. By recording which crops grow in which season, we can feed our families. By understanding that germs cause disease, we can know to wash our hands or get a vaccine, and lower our risk of death. In these situations, knowledge and understanding might be intrinsically satisfying to our curiosity, but that satisfaction ultimately pales in importance to our ability to reshape our world to our benefit. And the “algorithmic” learning models that Breiman talks about were better able to deliver their users the power to reshape the world, even if they offered less promise of understanding what they were predicting.

Why should we care about understanding the things we predict? To most of us, raised and inculcated in the age of science, that might seem like a laughable question, but there actually is a good reason. “Understanding”, in the scientific sense, means deriving a simple, generalizable principle that you can apply in other domains. You can write down Kepler’s laws of planetary motion, but Newton’s laws of motion and gravitation let you generalize from planetary orbits to artillery shells. Collapsing observed phenomena to simple, generalizable laws and then expanding these laws again in some other domain to allow you to control other phenomena is fundamental to the awesome power of science. So because you and I sit at the end of 400 years of science being the most powerful tool in the world, we have naturally been taught that it is very, very important to understand things.

But what if, sometimes, there are ways to generalize from one phenomenon to another without finding any simple “law” to intermediate between the two? Breiman sadly never lived to see his vision come to fruition, but that is exactly what the people who work in machine learning and artificial intelligence are increasingly doing. In 2009 — just before the deep learning revolution really kicked off — the Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira wrote an essay called “The Unreasonable Effectiveness of Data” that picked up the argument where Breiman left off. They argued that in the cases of natural language processing and machine translation, applying large amounts of data was effective even in the absence of simple generalizable laws. A few excerpts:

Sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics…An informal, incomplete [list of the grammatical rules that define] the English language runs over 1,700 pages. Perhaps when it comes to natural language processing and related fields, we’re doomed to complex theories that will never have the elegance of physics equations. But if that’s so, we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data…

So, follow the data…Represent all the data with a nonparametric model rather than trying to summarize it with a parametric model, because with very large data sources, the data holds a lot of detail. For natural language applications, trust that human language has already evolved words for the important concepts. See how far you can go by tying together the words that are already there, rather than by inventing new concepts with clusters of words. Now go out and gather some data, and see what it can do.

The basic idea here is that many complex phenomena like language have underlying regularities that are difficult to summarize but which are still possible to generalize.
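A toy version of that idea: learn word-to-word regularities of English purely from co-occurrence counts in a corpus, with no grammar rules at all. The corpus below is a tiny made-up example, standing in for the web-scale data Halevy, Norvig, and Pereira had in mind.

```python
# Learn which word tends to follow which, from data alone.
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat ate the fish .").split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def most_likely_next(word):
    """Predict the next word from raw co-occurrence counts."""
    return bigrams[word].most_common(1)[0][0]

print(most_likely_next("the"))   # learned from data, not programmed
print(most_likely_next("sat"))
```

Scaled up by many orders of magnitude, this counting-instead-of-theorizing approach is the lineage that leads from n-gram models to modern language models.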

The ability to write down farming techniques is power. The ability to calculate the path of artillery shells is power. And the ability to have a machine reliably and consistently write paragraphs as clear and helpful as the one above [ChatGPT example] is power, even if we don’t really understand the principles of how it’s doing what it does.

This power is hardly limited to natural language processing and chatbots. In recent years, Google’s AlphaFold algorithm has outpaced traditional scientific methods in predicting the shapes of folded proteins.

We are almost certainly going to call this new type of prediction technique “science”, at least for a while, because it deals with fields of inquiry that we have traditionally called “science”, like protein folding. But I think this will obscure more than it clarifies. I hope we eventually come up with a new term for this sort of black-box prediction method, not because it’s better or worse than science, but because it’s different.

A big knock on AI is that because it doesn’t really let you understand the things you’re predicting, it’s unscientific. And in a formal sense, I think this is true. But instead of spending our effort on a neverending (and probably fruitless) quest to make AI fully interpretable, I think we should recognize that science is only one possible tool for predicting and controlling the world. Compared to science, black-box prediction has both strengths and weaknesses.

One weakness — the downside of being “unscientific” — is that without simple laws, it’s harder to anticipate when the power of AI will fail us. Our lack of knowledge about AI’s internal workings means that we’re always in danger of overfitting and edge cases. In other words, the “third magic” may be more like actual magic than the previous two — AI may always be powerful yet ineffable, performing frequent wonders, but prone to failure at fundamentally unpredictable times.
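The failure mode described above can be made concrete with a stand-in for a black-box learner: a polynomial forced through noisy training points (all numbers here are made up). It looks flawless on familiar data and fails badly just outside it.

```python
# A flexible model that is perfect in-sample but unpredictable at the edges.

def lagrange(points, x):
    """Polynomial passing exactly through every training point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Underlying truth: y = x, observed with small "noise".
train = [(0, 0.0), (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]

in_sample_err = max(abs(lagrange(train, x) - y) for x, y in train)
edge_case_err = abs(lagrange(train, 7) - 7)   # extrapolate to x = 7

print(in_sample_err)   # ~0: looks flawless on familiar data
print(edge_case_err)   # large: fails where the training data ran out
```

Nothing in the in-sample performance warns you about the edge-case behavior, which is the sense in which black-box prediction can fail at unpredictable times.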

But even wild, occasionally-uncontrollable power is real power.

AI and Economics

In the past few decades, as economics has moved away from theory and toward empirics, the most important innovation has been the use of natural experiments — situations where some policy change or seemingly random difference allows you to tell yourself that you’re looking at causation, rather than just correlation. This is different than what I call “history”, because you’re doing more than just documenting facts; you’re verifying causal links. But it’s also different from science, because a lot of the time you don’t exactly know why the causal links are there. In a way, a natural experiment is its own sort of black-box prediction algorithm.
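One standard natural-experiment design is difference-in-differences: compare a treated group's change to a control group's change, so that any shared trend cancels out. The sketch below uses hypothetical averages (say, employment before and after a policy change in one region but not another).

```python
# Minimal difference-in-differences sketch of a "natural experiment".

def diff_in_diff(treated_before, treated_after,
                 control_before, control_after):
    """Estimated effect = treated change minus control change."""
    return (treated_after - treated_before) - (control_after - control_before)

# Hypothetical: both regions drift upward, but the treated one more so.
effect = diff_in_diff(treated_before=20.0, treated_after=24.0,
                      control_before=21.0, control_after=22.5)
print(effect)   # 2.5 — the change not explained by the common trend
```

Note how this verifies a causal link without explaining it, which is the sense in which a natural experiment is its own sort of black box.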

A number of subfields of econ, however, are so complex, with so many feedback systems, that they’ve largely resisted the natural experiment approach. These include not just the study of business cycles (what most people call “macro”), but also the study of economic growth, international finance, and a number of others. In these fields, theory (including “structural estimation”) still rules, but predictive power is very low.

Might we apply AI tools to these hard problems, in order to predict vast economic forces without needing to understand them? A recent paper by Khachiyan et al. argues that the answer is “yes”. The authors use deep neural nets (i.e., AI) to look at daytime satellite imagery, in order to predict future economic growth at the hyper-local level. The results they achieve are nothing short of astonishing:

For grid cells with lateral dimensions of 1.2km and 2.4km (where the average US county has dimension of 55.6km), our model predictions achieve R² values of 0.85 to 0.91 in levels, which far exceed the accuracy of existing models, and 0.32 to 0.46 in decadal changes, which have no counterpart in the literature and are 3-4 times larger than for commonly used nighttime lights.

This isn’t yet AlphaFold, but being able to predict the economic growth of a few city blocks 10 years into the future with even 30% or 40% accuracy is leaps and bounds ahead of anything I’ve ever seen. It suggests that rather than being utterly incomprehensible chaos, some economic systems have patterns and regularities that are too complex to be summarized with simple mathematical theories, but which nevertheless can be captured and generalized by AI.
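The R² figures quoted above measure the share of variance in the outcome that the model's predictions explain. A minimal version of the metric, on toy numbers rather than the paper's data:

```python
# Coefficient of determination: 1 - (residual variance / total variance).

def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

growth = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy observed decadal changes
preds  = [1.2, 1.8, 3.1, 4.3, 4.6]   # toy model predictions
print(round(r_squared(growth, preds), 3))
```

An R² of 0.32 to 0.46 in decadal changes means the model accounts for roughly a third to a half of the variation in local growth, which is the "30% or 40%" referred to above.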

Khachiyan et al.’s paper raises the possibility that in a decade or two, macroeconomics might go from being something we simply theorize about to something we can anticipate — and therefore, something we can control. The authors suggest place-based policies, transportation infrastructure construction, and disaster relief as three possible applications of their work.

Comment to Noah

“…some economic systems have patterns and regularities that are too complex to be summarized with simple mathematical theories, but which nevertheless can be captured and generalized by AI” - yes indeed. AI may be helpful in demonstrating that our capitalist economic system grows with increased consumption of nature - no decoupling whatsoever. Applying convolutional neural networks to daytime satellite imagery predicts microspatial changes in income and population at a decadal frequency. What are these predictors? Visible asphalt and cement! To cite from the paper: a convolutional neural network (CNN) extracts economic information that is latent in spectral data. Asphalt, cement, gravel, soil, water, vegetation, and other materials vary in their reflectance intensity across the light spectrum. The presence of these materials varies enormously within an urban area: more vegetation and loose soil in green spaces; more asphalt and cement around motorways; more steel and wood, together with concrete, in houses and buildings. That’s economic growth! (Comment to Noah Smith on AI in economics: https://noahpinion.substack.com/p/the-third-magic/comments)

Noah Smith (2023) The third magic - A meditation on history, science, and AI

3.1 AI Economics Applications

Khachiyan Abstract

We apply deep learning to daytime satellite imagery to predict changes in income and population at high spatial resolution in US data. For grid cells with lateral dimensions of 1.2km and 2.4km (where the average US county has dimension of 55.6km), our model predictions achieve R² values of 0.85 to 0.91 in levels, which far exceed the accuracy of existing models, and 0.32 to 0.46 in decadal changes, which have no counterpart in the literature and are 3-4 times larger than for commonly used nighttime lights. Our network has wide application for analyzing localized shocks.

Khachiyan Excerpts

Recent work in remote sensing and computer science uses convolutional neural networks (CNNs) to predict outcomes from multi-spectral daytime satellite imagery at high spatial resolutions.

In our context, a CNN extracts economic information that is latent in spectral data. Asphalt, cement, gravel, soil, water, vegetation, and other materials vary in their reflectance intensity across the light spectrum (e.g., De Fries et al., 1998). The presence of these materials varies enormously within an urban area: more vegetation and loose soil in green spaces; more asphalt and cement around motorways; more steel and wood, together with concrete, in houses and buildings (Zha et al., 2003). The shapes of these materials exhibit similarly wide variation: irregular edges in green spaces, intermittent grids of grass and roofing material in suburbs, larger rectangular clusters in apartment complexes and shopping malls, and compact, interconnected grids in urban centers. It is this complexity that makes a neural network powerful—the network learns the mapping of materials and shapes to the level of economic activity and changes in materials and shapes to changes in economic activity. As an empirical regularity, the features learned by the network are often organized into a hierarchy of complexity (Zeiler and Fergus, 2014), in which early layers learn to identify simple features, such as edges or basic shapes, and subsequent layers learn to compose these simple features into complex objects, such as office buildings, industrial parks, suburban developments.
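The "early layers learn to identify simple features, such as edges" point can be illustrated with a single convolution filter. The toy "image" below (a bright region next to a dark one, standing in for satellite pixel intensities) is not from the paper; it just shows the kind of low-level edge response that deeper layers compose into roads and rooftops.

```python
# A vertical-edge kernel fires exactly at the brightness boundary.

image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
kernel = [[-1, 1]]   # responds to left-to-right brightness jumps

def convolve(img, ker):
    """Valid (no-padding) 2D cross-correlation."""
    kh, kw = len(ker), len(ker[0])
    out = []
    for r in range(len(img) - kh + 1):
        row = []
        for c in range(len(img[0]) - kw + 1):
            row.append(sum(ker[i][j] * img[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

print(convolve(image, kernel)[0])   # nonzero only at the edge column
```

A trained CNN learns many such kernels from data rather than having them hand-specified, but each one is doing this same sliding dot product.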

We show that applying convolutional neural networks to daytime satellite imagery predicts microspatial changes in income and population at a decadal frequency.

Khachiyan et al. (2021) Using Neural Networks to Predict Micro-Spatial Economic Growth

Kevin Kelly

So far, out of the perhaps dozen cognitive modes operating in our minds, we have managed to synthesize two of them: perception and pattern matching. Everything we’ve seen so far in AI is because we can produce those two modes. We have not made any real progress in synthesizing symbolic logic and deductive reasoning and other modes of thinking.

It is those “others” that are so important because as we inch along we are slowly realizing we still have NO IDEA how our own intelligences really work, or even what intelligence is.

A major byproduct of AI is that it will tell us more about our minds than centuries of psychology and neuroscience have.

Kelly (2023) Interview with Noah Smith