A Reckoning for Moore’s Law

Why upgrading your computer every two years no longer makes sense.

It was hard to tell at the time, amid the distraction of the Y2K bug, the explosion of reality television, and the popularity of post-grunge music, that the turn of the millennium was also the beginning of the end of easy computing improvements. A golden age of computing, which had powered data-intensive and computational science for decades, would soon begin drawing to a close. Even with novel ways of assembling computing systems, and new algorithms that take advantage of those architectures, the performance gains predicted by Moore’s law were bound to come to an end, but in a way few people expected.

Moore’s law is the observation that the number of transistors in dense integrated circuits doubles roughly every two years. Before the turn of the millennium, all a computational scientist needed to do to get a computer more than twice as fast was to wait two years. Calculations that would have been impractical became accessible to desktop users. It was a time of plenty, and many problems could be solved by brute-force computing, from the quantum interactions of particles to the formation of galaxies. Giant lattices could be modeled, and enormous numbers of particles tracked. Improved computers enabled the analysis of genomic variations in entire communities and facilitated the advent of machine-learning techniques in AI.
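The compounding behind that "time of plenty" is easy to underestimate. The short Python sketch below works it out under the idealized assumption of a clean doubling every two years; the function name and the default period are illustrative, not taken from any library.

```python
def transistor_growth(years, doubling_period_years=2.0):
    """Factor by which the transistor count grows after `years`.

    Assumes an idealized, perfectly regular doubling; real chips
    only followed this trend approximately.
    """
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (2, 10, 20):
        factor = transistor_growth(years)
        print(f"after {years} years: about {factor:,.0f} times as many transistors")
```

Twenty years of doubling every two years is ten doublings, a factor of roughly a thousand, which is why waiting was so often a better strategy than optimizing.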

Fundamental physical limits will ultimately put an end to the transistor shrinkage behind Moore’s law, and we are close to getting there. Today, chip production creates structures in silicon that are 14 nanometers wide and decreasing, and seven-nanometer elements are coming to market. At these sizes, thousands of these elements would fit in the width of a human hair. Feature sizes of less than five nanometers will probably be impossible because of quantum tunneling, in which electrons undesirably leak across such narrow gaps.

But the actual reckoning began in 2005, when clock-speed improvements plateaued. With Moore’s law came so-called Dennard scaling, the observation that smaller transistors can be switched faster and operated at lower voltages, thereby keeping overall power consumption stable. Thus, the clock frequency of CPUs (the rate at which transistors can be switched on and off) doubled roughly every two years, too, while the electrical power consumed per CPU was largely constant. The principle that the increased power draw of a faster transistor would be offset by its decrease in size broke down when small became too small. Given that all the power consumed by a CPU is eventually converted into heat and that cooling CPUs with something other than air was expensive, clock frequency largely did not change, even though transistors continued to shrink.
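The bookkeeping behind Dennard scaling can be sketched in a few lines of Python. The sketch assumes the common textbook approximation that the dynamic power of a switching transistor is proportional to capacitance times voltage squared times frequency; that formula, the function name, and the scaling factor k are assumptions made for illustration and do not appear in the article.

```python
def power_density_ratio(k, voltage_scales=True):
    """Power per unit chip area after shrinking by a factor k,
    relative to the power density before the shrink.

    Textbook approximation (an assumption here): P = C * V**2 * f.
    """
    capacitance = 1.0 / k                      # smaller transistor, less charge to move
    voltage = 1.0 / k if voltage_scales else 1.0
    frequency = k                              # smaller transistor switches faster
    power_per_transistor = capacitance * voltage**2 * frequency
    transistors_per_area = k**2                # k times denser in each direction
    return power_per_transistor * transistors_per_area


if __name__ == "__main__":
    print("voltage keeps scaling :", power_density_ratio(2.0))
    print("voltage stops scaling :", power_density_ratio(2.0, voltage_scales=False))
```

With the voltage scaling along with the size, the ratio stays at 1: the chip runs faster at the same heat per unit area. Once the voltage can no longer be lowered, the same shrink at a higher clock multiplies the power density by k squared, which is the wall described above.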

To keep the spirit of Moore’s law alive, engineers used the extra transistors to implement new operations that could do more per ‘tick’ of the clock. The transistors also improved the speed of memory access and eventually enabled engineers to place multiple CPUs on a single computer chip. What used to be thought of as a CPU was now called a core, and there are now chips with as many as 32 cores. Machines as capable as a mainframe are now on a single piece of silicon. We are in the age of parallel programming: from individual operations acting on multiple operands to parallel workflows.

[Photo: a person using a computer in a room with lots of wires.]
CALCULATING SCIENCE: 7,000 cores were installed in the basement of the Flatiron Institute when it first opened in 2016.

To make more powerful computers, manufacturers are putting even more cores together, essentially tiling regions of silicon with cores and running high-speed interconnects between the individual components. The combined cores have the same density and clock frequency as existing chips but are physically larger and so use more total power. Hence, this plan for continued improvement will lead to hotter chips. Modern CPU chips already dissipate more power than electric cigarette lighters. Nonetheless, they can be cooled simply and effectively by fans blowing chilled air over them. There are many materials more efficient than air for cooling, but few are as clean, nontoxic, and nondamaging if they leak out. To handle the next generation of chips, the general trend is a move to direct water cooling. Water carries away roughly 4,000 times as much heat by volume as air does, but the resulting systems are more expensive, more difficult to maintain, and more damaging when they fail.
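The comparison between water and air comes from how much heat a given volume of each can absorb per degree of warming. The Python sketch below estimates that ratio from round, room-temperature textbook values; the numbers and names are assumptions for illustration, and the exact result shifts with temperature and pressure.

```python
# Round textbook values near room temperature (assumed for illustration).
WATER_DENSITY = 1000.0        # kg per cubic meter
WATER_SPECIFIC_HEAT = 4186.0  # joules per kg per kelvin
AIR_DENSITY = 1.2             # kg per cubic meter at about sea-level pressure
AIR_SPECIFIC_HEAT = 1005.0    # joules per kg per kelvin


def volumetric_heat_capacity(density, specific_heat):
    """Heat absorbed by one cubic meter per kelvin of warming (J / m^3 / K)."""
    return density * specific_heat


def water_to_air_ratio():
    water = volumetric_heat_capacity(WATER_DENSITY, WATER_SPECIFIC_HEAT)
    air = volumetric_heat_capacity(AIR_DENSITY, AIR_SPECIFIC_HEAT)
    return water / air


if __name__ == "__main__":
    print(f"water holds about {water_to_air_ratio():,.0f} times as much heat per volume as air")
```

With these inputs the ratio comes out at a few thousand, the same order of magnitude as the figure quoted in the text.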

The slowdown of technology improvements is manifesting itself in industry and in the scientific community. During the decades of rapid improvement, systems were often retired after three years because new equipment might be at least four times faster for the same amount of power. This, in turn, drove the computing industry to bring new systems to market quickly. With the slowed pace of improvements, computing nodes are now often sold with five years of support, reflecting expected service life. Concerns about power consumption and longevity sometimes result in a user’s high-end laptop being faster, on a core-by-core basis, for solving some problems than a computing node of a high-performance cluster, although the latter in aggregate is thousands of times more powerful than the former.

The need to upgrade facility infrastructure to support higher-power systems and the general slowdown of computing improvements are leading to some stagnation in the computing resources available, although other technologies, such as graphics processing units (GPUs), are still improving rapidly. It will require a concerted effort to improve efficiency to maintain the speed of scientific progress in the absence of raw hardware improvements.

As interesting scientific questions continue to be asked, the volume of data continues to grow. The days of effectively free, large, yearly performance gains are over, so we will need to address the data crunch with the resources we have. Computing systems are becoming more parallel and structured. The growth in the number of transistors has opened the possibility of not only more cores but more layers of caching. Efficient coding will require higher levels of parallelism to take full advantage of the breadth of resources and a deeper understanding of memory caching and input-output subsystems. New algorithmic approaches will be more important than ever.
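What "higher levels of parallelism" means in practice can be sketched with a simple pattern: split the data into chunks, hand each chunk to a worker, and combine the partial results. The Python example below uses the standard library's `concurrent.futures`; the function names and the sum-of-squares workload are hypothetical stand-ins for a real calculation.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, n_chunks):
    """Split `values` into at most n_chunks contiguous slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(chunk):
    """The per-worker task: an independent piece of the whole job."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Map the task over the chunks, then reduce the partial sums."""
    chunks = chunked(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```

This shows only the decomposition. In standard CPython, threads do not speed up pure-Python arithmetic like this, so real gains would need a process pool, a compiled numerical library, or another language; the structure of the split, the independent work, and the final combination stays the same.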

The news may sound dire. But resource constraints can be constructive, forcing people to be more creative and efficient and increasing the importance of deep technical understanding. Together they may well yield results that more than make up for what has been lost to the slowdown.

Ian Fisk is co-director of the Flatiron Institute’s Scientific Computing Core.

This article is part of the “Five Things We Used to Think Were True” section of the foundation’s 25th anniversary book.