Software scalings

Software complexity

How does the size of software scale with the number of features? Some folks assume that the scaling should be linear:

With his newfound power, he built his operating system on top of his own hardware from scratch in 12K SLOC, with a footprint of 200 kilobytes. For comparison, OSX runs in on ~86M SLOC with a footprint of 3 gigabytes, built by one of the wealthiest companies in the world. Now, perhaps OSX is more feature complete than Oberon, but certainly not by a factor of ~40 000X. Something was lost along the way.

Fredrik Holmqvist, Brooks, Wirth and Go.

This might be true if each feature were modular or independent, like adding a new separate application. In that case, adding a feature only requires adding new code to support that particular feature, without changing other code for preexisting features.

However, there are other ‘features’ which aren’t modular. For instance, consider adding the ‘security’ feature. Security is one of those attributes that can’t realistically be slapped onto an existing system — it may require a total rethinking of many core design choices. Now consider a feature list such as “multi-user, secure, multi-language, multi-processor, performant, …”. These are all attributes that a homebrew OS probably doesn’t have, but a mainstream OS needs.

The majority of your complexity budget is burned on these things interacting.

Hillel Wayne, Reject Simplicity, Embrace Complexity

Adding each of these ‘features’ might multiply the difficulty of the problem and the complexity of the solution, because each feature interacts with the others. That would be an exponential scaling. Taking the ratio of the logs of the sizes of the two systems (their footprints, in bytes) yields 1.78, which would imply that OSX should have 78% more ‘features’ than the homebrew OS. This is super hand-wavy and likely pessimistic. Maybe the scaling should be sub-exponential, because presumably not every feature interacts with every other one. The point is that OSX only looks bloated if you assume linear scaling, which is almost certainly too optimistic. It’s possible that most of the complexity is inherent complexity, not accidental complexity.
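As a quick check of that arithmetic, here is a minimal sketch. It assumes that ‘size’ means the memory footprint in bytes and that the quoted figures use decimal units (200 kB and 3 GB); both are assumptions, since the quote doesn’t say. Under the exponential model size = b^features with the same base b for both systems, the ratio of the logs is the ratio of the feature counts.

```python
import math

# Figures taken from the quote above; decimal units are an assumption.
OBERON_BYTES = 200e3   # 200 kilobytes
OSX_BYTES = 3e9        # 3 gigabytes
OBERON_SLOC = 12e3     # 12K lines of code
OSX_SLOC = 86e6        # ~86M lines of code


def log_ratio(big: float, small: float) -> float:
    """Ratio of logs; under size = b**features this is the ratio of feature counts."""
    return math.log(big) / math.log(small)


footprint_ratio = log_ratio(OSX_BYTES, OBERON_BYTES)
sloc_ratio = log_ratio(OSX_SLOC, OBERON_SLOC)
print(f"footprint: {footprint_ratio:.2f}, SLOC: {sloc_ratio:.2f}")
```

With these inputs the footprint ratio comes out just under 1.79 and the line-count ratio a little under 1.95. The answer also depends on the unit used to measure size (bytes versus kilobytes, say), which is part of why the estimate is only hand-waving.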

Probability of a software project running late

In an interesting blog post (Why software projects take longer than you think: a statistical model), Erik Bernhardsson shows that to a decent approximation, the time to complete a software project follows a log-normal distribution around the estimated completion time. That is, the completion times behave as if you sampled from a normal distribution, took the exponential of that number, and scaled the estimated completion time by that random factor. This is somewhat surprising, for the following reason.

One way to estimate the time a project will take is to break the work into chunks, estimate the time to complete each chunk, and then sum the times for the chunks. If the chunks are small and numerous enough, and their estimation errors are independent with finite variance, the central limit theorem says that the error in the overall completion time should be approximately normally distributed, regardless of the shape of the error distribution for the individual chunks. So it might come as a surprise that the error distribution is log-normal instead.
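The contrast can be illustrated with a small simulation. This is a sketch under made-up assumptions: chunk durations are exponential draws, the multiplicative ‘blow-up’ factors are uniform on [0.5, 2], and all draws are independent. None of these choices come from Bernhardsson’s post. Summing chunks gives a nearly symmetric total, while multiplying factors gives a strongly right-skewed total whose logarithm is nearly symmetric.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness; close to 0 for a symmetric (e.g. normal) distribution."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)


def project_by_sum(rng, n_chunks):
    """Additive model: total time is a sum of independent, skewed chunk times."""
    return sum(rng.expovariate(1.0) for _ in range(n_chunks))


def project_by_product(rng, n_factors):
    """Multiplicative model: each factor rescales the running total."""
    return math.prod(rng.uniform(0.5, 2.0) for _ in range(n_factors))


rng = random.Random(0)  # fixed seed so the run is repeatable
sums = [project_by_sum(rng, 100) for _ in range(5000)]
products = [project_by_product(rng, 25) for _ in range(5000)]

skew_sum = skewness(sums)                                   # small
skew_product = skewness(products)                           # large and positive
skew_log_product = skewness([math.log(p) for p in products])  # small
print(skew_sum, skew_product, skew_log_product)
```

The central limit theorem is at work in both cases; in the multiplicative case it applies to the logarithms, which is how a log-normal shape arises.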

Bernhardsson notes the issue I raised in the previous paragraph in a footnote, but doesn’t speculate about the implications. It might be related to the exponential nature of the complexity of software as a function of the number of features. If that is correct, then the normal distribution inside the exponential would represent the number of features that the project needs to deliver.

At a lower level, software projects are more aptly thought of as tree structures of tasks and subtasks than lists of steps. Recall the parable of yak-shaving: there are many seemingly-irrelevant details that become critical as one zooms in on the reality of accomplishing even a mundane task. See also Reality has a surprising amount of detail. My thought is that the estimation error isn’t due to estimating the duration of the steps. Rather, it’s caused by adding or neglecting certain tasks (along with all their subtasks, if any). Examples of neglecting a major task could include an “unknown-unknown” technical problem that crops up during development, or failure to identify all the stakeholders and their requirements.

A project can be modeled as a random recursive tree, where each node is a task or subtask. The estimation task would be modeled as adding or dropping branches at random, to produce the estimated project structure from the actual project. We would like to know the probability distribution for the size of the actual project given the size of the estimated project, which also requires us to know the true distribution of project sizes. It might be easier to work the other way, by postulating a model for the process of going from the estimated project to the real project by adding branches with some probability.

It would be interesting to run some Monte Carlo simulations (for instance, using RandomTree.jl) to see what assumptions about the trees and the pruning/growing probabilities it would take to reproduce a lognormal-like distribution. It would also be interesting to get real-world data on the estimated and actual work breakdowns for comparison. Fields with less uncertainty in the project structure (residential construction?) might have something closer to a normal distribution for completion times.
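A starting point for such a simulation might look like the sketch below. It is written in plain Python rather than with RandomTree.jl, and every parameter in it (depth cap, branching, and the chance that the estimator notices a subtask) is a hypothetical choice made only for illustration. The actual project is a random tree; the estimate is the same tree with unnoticed subtasks, and everything beneath them, dropped.

```python
import random

MAX_DEPTH = 6      # assumed cap on how deeply subtasks nest
MAX_CHILDREN = 3   # assumed maximum number of subtasks per task
P_CHILD = 0.5      # assumed chance that each potential subtask exists
P_NOTICED = 0.8    # assumed chance that the estimator notices a subtask


def grow(rng, depth=0):
    """Return (actual_size, estimated_size) for one task and its subtree.

    A subtask the estimator fails to notice takes its whole subtree with it.
    """
    actual, estimated = 1, 1  # the task itself is always counted
    if depth < MAX_DEPTH:
        for _ in range(MAX_CHILDREN):
            if rng.random() < P_CHILD:
                sub_actual, sub_estimated = grow(rng, depth + 1)
                actual += sub_actual
                if rng.random() < P_NOTICED:
                    estimated += sub_estimated
    return actual, estimated


def simulate(n_projects, seed=0):
    """Blow-up ratios (actual size / estimated size) for random projects."""
    rng = random.Random(seed)
    return [a / e for a, e in (grow(rng) for _ in range(n_projects))]


ratios = simulate(2000)
print(min(ratios), sum(ratios) / len(ratios), max(ratios))
```

A histogram of the ratios, or of their logarithms, would be the thing to compare against a log-normal curve. Whether any choice of these parameters reproduces that shape is exactly the open question, and this sketch makes no claim either way.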

Is public or private funding better for research and innovation?

I’ve collected a pile of quotes on risk-tolerance and how best to fund breakthroughs. There’s a dominant theme: big-government sponsorship is inimical to the truly-novel research that is necessary to generate breakthroughs. The argument goes like this:

  • Progress in research entails doing truly novel things, which means there are risks. Conversely, this implies that low-risk activities won’t yield major breakthroughs.
  • Structural incentives for federal bureaucrats, for grant-approving committees, and hence for individual scientists ensure that low-risk, short-sighted activities are favored.
  • Therefore, the progress of publicly-funded research is hobbled.
  • In the private sector, the inefficient are winnowed out by natural selection. In contrast, the government is an assured stream of funding, so ineffective processes are not culled.

The counterpoints I’ve found are mostly to the effect that private funding isn’t necessarily immune to perverse incentives, nor does it necessarily have better ways of selecting winners (in fact, private-sector due-diligence is probably scientifically less rigorous). The upshot seems to be that private funding is the worst kind — except for all the others.

The following two sections are my thoughts; skip them if you want to go straight to the quotes.

Should we give up on public funding?

Writing off public-sector funding of science & research is a poor strategy for the long term. The government is the natural vehicle for socializing the inherent riskiness of scientific research, because:

  • It can most effectively “average out” the fluctuations in success (i.e., it can afford a larger portfolio & longer time-scale than private firms)
  • The benefits of scientific research accrue to the whole of society, so it makes sense for society as a whole to sponsor it, rather than relying on the enlightened self-interest of corporations, which may result in a suboptimal expenditure (although perhaps over-eager investors are subsidizing the common good)

I question whether risk-aversion is inherent to the government. There are counterexamples (the Manhattan & Apollo projects, DARPA, …), but interestingly they arose primarily in the context of wartime (WW2 and the Cold War). My hypothesis is that a lack of existential competition on the international scale is a precondition for the stagnation of federally-sponsored research. It’s not that natural selection actually happened at the national scale, but that the possibility of it was internalized as a willingness to take big risks and swing for the fences. However, there are interesting alternate hypotheses:

  • The financial sector infected the government with a risk-management philosophy starting around the 1980s. This is particularly interesting because it ties into society-wide/global trends. This would implicate the private sector as well.
  • Every organization ossifies with old age. State-sponsored science was young and effective once upon a time, but it has inevitably aged.

What to do?

Aside from a war or power-cycling the entire enterprise, is there any way to improve the system? Perhaps. I don’t claim to have a solution; I’m just trying to stimulate discussion. It’s a hard problem, no doubt. Without the feedback mechanism of the market, the selection pressure on grantmakers is not natural selection. All that remains are subjective evaluations, or artificial quantitative measures (like publication count) that are subject to being gamed. Any feedback loop is also limited in effectiveness by the timescale it operates on. As the timescales approach the duration of a career, the chance for feedback goes to zero. Unfortunately, the issues are too complex to rely on anything less than human intelligence (such as algorithms that could be back-tested).

The DARPA model and NASA’s COTS (Commercial Orbital Transportation Services) program are two interesting ideas for how to change the incentive structures of government-sponsored R&D. In both, a key insight is to reduce the amount of control exercised by the funding agency over how the funding recipient spends the money. However, there’s a tendency for reversion to the mean: unless the upstream incentives or selection pressures for the bureaucrats are modified, controls will creep back in after the inevitable screw-ups (e.g., Solyndra). DARPA is supposed to have addressed this with term limits, such that there is no pressure to justify the decisions down the road because there’s no career path for the program managers to maintain. It’s not clear that this has been successful long-term: there’s no pressure to avoid failures, but correspondingly little reward for success. With COTS, the fixed-price contract absolutely changes the incentives compared to the cost-plus contracts typical of defense. While the program has been successful so far (SpaceX delivering cargo & astronauts to the ISS), the real question is whether it will survive its first major failure.

On to the quotes:

Government-funded research is doomed to stagnation

It is time for corporations and entrepreneurs to recognize that they can no longer rely on governments to fund and on academia to conduct fundamental research. Instead of doing translational research and simply bringing academia’s fruits to market, they have to become bolder and take responsibility for what the future will look like, fund fundamental research, and bring to life a new vision for philanthropy.


Firstly, since academia lacks the mechanism for competitive displacement, bloat accumulating over time and inevitably rising risk-aversion can grow without bound [11]. If a firm becomes inefficient, it collapses and is replaced by another one. If science becomes inefficient… it continues to take in money and people and, well, scientific progress slows down. Secondly, as I pointed out above, science is severely afflicted by the problem of misaligned timescales. Grantmakers’ planning horizons (note that I’m talking about specific individuals who make specific decisions, not abstract institutions that theoretically care about the long-term) are severely limited by their own career planning horizons and by their understanding of what it takes to work on fundamental problems with little short-term payoff.

Alexey Guzey, Reviving Patronage and Revolutionary Industrial Research

[S]oftware […] has a convex performance-to-value curve, as with creative fields. Most fields have concave curves, in that not screwing up is more important than hitting the top bars. This convexity means two things. First, it means that managing to the middle decreases total performance. Second, it means that risk (measured by the increasing first derivative of a convex function) is positively rather than negatively correlated with expected return.

michaelochurch on reddit

Now, the Valley tech scene does just fine with a 90% failure rate, because the win associated with backing successful execution of a really disruptive idea is much more than a 10x return on investment. The good ones return 50x, and the home runs deliver 100x or more.

Jocelyn Goldfein, The Innovation Dead End

“So why am I not an academic? There are many factors, […] but most of them can be summarized as “academia is a lousy place to do novel research”. [….] My supervisor cautioned me of the risks of doing work which was overly novel as a young academic: Committees don’t know what to make of you, and they don’t have any reputational prior to fall back upon. […] In many ways, starting my own company has given me the sort of freedom which academics aspire to. […A]cademic institutions systemically promote exactly the sort of short-term optimization of which, ironically, the private sector is often accused. Is entrepreneurship a trap? No; right now, it’s one of the only ways to avoid being trapped.

Colin Percival, On the Use of a Life

Why not just have the government, or some large almost-government organization like Fannie Mae, do the venture investing instead of private funds?
I’ll tell you why that wouldn’t work. Because then you’re asking government or almost-government employees to do the one thing they are least able to do: take risks.
As anyone who has worked for the government knows, the important thing is not to make the right choices, but to make choices that can be justified later if they fail.

Paul Graham, Inequality and Risk

In today’s university funding system, you need grants (well, maybe you don’t truly need them once you have tenure, but they’re very nice to have). So who decides which people get the grants? It’s their peers, who are all working on exactly the same things that everybody is working on. And if you submit a proposal that says “I’m going to go off and work on this crazy idea, and maybe there’s a one in a thousand chance that I’ll discover some of the secrets of the universe, and a 99.9% chance that I’ll come up with bubkes,” you get turned down. But if a thousand really smart people did this, maybe we’d actually have a chance of making some progress. […] This is roughly how I discovered the quantum factoring algorithm.

Peter Shor comment on Peter Woit’s blog

Shor concludes with an appeal for individual scientists to sneak off and do ‘bootleg research’ or use their spare time. I think this is misguided — we should fix the systematics, not rely on individuals to buck the incentive structure. Individual researchers working in their spare time are limited in the scope of what they can address. While quantum information theory requires little more than pencil and paper, fusion energy research, for example, is not amenable to this approach.

Well-meaning but disastrous government initiatives to support the startup ecosystem relentlessly pump money into the startup scene. This money is advertised as “free” or “non-dilutive”, but in reality it’s the most expensive kind of money you can imagine: it’s distracting, it begs justification, it kills creativity, and it turns your startup into a government work program.

Alex Danco, Why the Canadian Tech Scene Doesn’t Work

You cannot simply add money and create a tech scene. If you do, then either that money will be too freely available and attract the wrong kind of opportunists, or it’ll be like grant money that takes up so much of the founder’s time and energy that it distracts them from actually starting and running the business in the first place.

Alex Danco, The Social Subsidy of Angel Investing

There’s a tremendous bias against taking risks. Everyone is trying to optimize their ass-covering.

Elon Musk interview, in the context of major aerospace government contractors

There are some very interesting tangents here as well: Alex Danco’s remark about “money will be too freely available and attract the wrong kind of opportunists” raises an interesting question about the moral hazard of easy VC money and the Pareto front for funding effectiveness vs selectiveness.

Private-sector funding isn’t perfect

The private sector can also suffer from the same perverse incentives. Style is just as important as substance in raising capital. Investors don’t have the technical expertise for due diligence on highly technical topics. Individual researchers have inherent incentives to be risk-averse, regardless of the funding source.

Because, you know, when it comes down to it, the pointy-haired boss doesn’t mind if his company gets their ass kicked, so long as no one can prove it’s his fault.

Paul Graham, Revenge of the Nerds

As a friend of mine said, “Most VCs can’t do anything that would sound bad to the kind of doofuses who run pension funds.” Angels can take greater risks because they don’t have to answer to anyone.

Paul Graham, The Hacker’s Guide to Investors

Private money is in some sense philanthropic, in that most VCs are not able to effectively pick winners, and angel investors are motivated less by expected financial reward than by social credit. This harkens back to Renaissance patronage of the sciences (see also Alexey Guzey’s article).

[T]he median VC loses money. That’s one of the most surprising things I’ve learned about VC while working on Y Combinator. Only a fraction of VCs even have positive returns. The rest exist to satisfy demand among fund managers for venture capital as an asset class.

Paul Graham, Angel Investing

… angel investing fulfils [sic] a completely different purpose in Silicon Valley than it does elsewhere. It’s not just a financial activity; it’s a social status exercise. [….] From the outside, angel investing may look like it’s motivated simply by money. But there’s more to it than that. To insiders, it’s more about your role and reputation within the community than it is about the money. The real motivator isn’t greed, it’s social standing.

Alex Danco, The Social Subsidy of Angel Investing

Much of what counts for successful funding in the private sector is personality, not substance:

A lot of what startup founders do is just posturing. It works. VCs themselves have no idea of the extent to which the startups they like are the ones that are best at selling themselves to VCs. [6] It’s exactly the same phenomenon we saw a step earlier. VCs get money by seeming confident to LPs, and founders get money by seeming confident to VCs.

Paul Graham, What Startups Are Really Like

The Silicon Valley tech ecosystem is a world of pattern matching amidst uncertainty. We pattern match ideas, we pattern match companies, but most of all we pattern match people. […] If you’re too different, you won’t fit the pattern at all, so people will ignore you.

Alex Danco, Social Capital in Silicon Valley

This could be related to the fact that venture capital doesn’t have a great substitute for peer review:

First-rate technical people do not generally hire themselves out to do due diligence for VCs. So the most difficult part for startup founders is often responding politely to the inane questions of the “expert” they send to look you over.

Paul Graham, How to Fund a Start-up

As an aside, I suspect that the issue of hiring experts for due diligence is probably more on the demand side. First, part of the prestige of being an investor lies in the appearance of discernment. Outsourcing that would reduce the social/emotional returns. Second, it creates a principal-agent problem. Third, there’s probably some value to NOT having technical experts involved, because they would decrease the diversity of what gets funded.

There’s no silver bullet

Innovative people gravitate to startups, so it’s hard to tell cause from effect:

There’s no magic incentive structure that fixes this problem. If you think startups encourage innovation because early employees are rewarded disproportionately for success, think again. Startups benefit from selection bias — people looking to play it safe don’t take a job with a startup in the first place.

Jocelyn Goldfein, The Innovation Dead End

To the extent that you only live once, you can’t socialize the risk of taking a big gamble on your project:

As individuals, we have no portfolio strategy […] When we fail, most rational people respond by trying to avoid dumb ideas and pick smart bets with clear impact the next time. […T]he self-same employees who are innovating and taking risks with you today are going to become risk averse and careful once they start failing — and sooner or later, they (and you) will fail.

Jocelyn Goldfein, The Innovation Dead End

Stagnation is a function of the age of an organization. (Cf. Collapse: How Societies Choose to Fail or Succeed – complex societies respond to challenges by increasing in complexity, until diminishing returns set in.)

I have a hypothesis that a lot of the useless bureaucracy isn’t a function of government/private, or size of organization, it’s time. As an organization gets older, it’s gotten burned more times, and created more rules to deal with it. For example, all the “normal” space launch companies and NASA […] killed astronauts, and had a huge number of rockets explode […] This resulted in the creation of rules to attempt to reduce risk. […] SpaceX last year had a failure where they destroyed a customer’s satellite during an on-pad test where the satellite didn’t really need to be mounted. […] I’ll bet SpaceX is going to be a lot more cagey in the future about test firing rockets with payload aboard.

Eventually, a private company will go out of business, but the Government won’t declare bankruptcy. So there’s less limit on governments getting bloated with stuff like this.

CatCube on SlateStarCodex

In practice, there’s no surefire algorithm to tell the difference between truly novel good ideas and truly novel bad ideas.

[I]nnovative ideas are roughly indistinguishable from dumb ideas. If something seems like a clearly good idea, it is also an obvious idea. So everyone is doing it! The big ideas that fundamentally change the status quo usually have fatal-seeming flaws that keep (almost) everyone from pursuing them….

OK, OK, so picking good ideas is hard, but that’s what defines great innovators, right? The ability to tell the “crazy” idea from the “crazy awesome” idea? It would be nice to believe that — but empirically speaking, there’s no evidence to support it. The best track record of repeated technical innovation in high tech is the collective Silicon Valley startup scene. And guess what — 9 out of 10 VC-backed startups fail. The absolute best-of-the-best, most profitable VCs? Maybe a 25% success rate.

Jocelyn Goldfein, The Innovation Dead End

Maybe this mistaken attitude toward risk actually came from the business world and infected the government (which would explain why the government used to be able to do innovative things).

[T]he mantra of this technocratic system of management is the word “risk”, which if you do a word analysis, didn’t really exist in political coverage until the mid 80s. It comes from finance, but as economics colonised the whole of politics, that word spread everywhere, and everything becomes about risk-analysis and how to stop bad things happening in the future.

Politics gave up saying that it could change the world for the better and became a wing of management, saying instead that it could stop bad things from happening.

Adam Curtis, quoted in an interview The antidote to civilisational collapse

Intuitionism, probability, and quantum

This is a bit speculative. If you have expertise in this area, I’d love to hear your thoughts. The basic idea is that one can get something like intuitionist logic as a meta-logic derived by relaxing from a functional truth-valuation to a relational one. This logic, when extended to probability, might form a useful basis for modeling common-sense reasoning. A slight reinterpretation might form the basis for a novel perspective on quantum probabilities.

Intuitionism by ‘lifting’ classical logic

Classical logic considers truth-valuation to be a function from statements to a Boolean (denoted as 0,1 here). Relaxing the valuation from a function to a relation leads to a new valuation that is a function into the power-set of the Booleans: { {}, {1}, {0}, {0,1} }. Let’s say that the relation is “there exists a proof that.” We can have a proof-valuation that says “statement A is related to 0 and 1,” which is read as saying that there exists a proof that A is true and also a proof that A is false, which means we have derived a contradiction. If the proof-valuation is {}, then we say that there does not exist a proof about the truth value of A, which means that A is undecidable. Relating a statement to exactly one of {1} or {0} means that we can prove that A is true (and not false) or vice versa, which is what is meant by constructivist T/F values. Denote the proof-valuation {1} as T and {0} as F, {0,1} as C for ‘contradiction’, and {} as N for ‘undecidable’. Let’s denote the proof-valuation function as V.

We can also talk about membership of 0,1 in the return value of V: “1 in V(A) and not 0 in V(A)” is equivalent to V(A)={1}=T, which means that A is proven to be true and not false. This is stronger than in classical logic, where we might derive a contradiction without knowing it. That is, in classical logic, proving that A “is” true is not incompatible with also proving that it is false; the axioms are inconsistent in that case. The proof-valuation thus seems a bit too strong: V(A)=T is equivalent to “A is proved true and this system is consistent.”

We might wish to make weaker statements that do not assert consistency of the system. In that case, we could state that “1 is in V(A)”, which means that it is possible to prove that A is true, but without making any claim about whether A can also be proved to be false. This corresponds to the classical-logic proof that a statement is true, at least in a system subject to the incompleteness theorem. Another way to state this: V(A) is in {{0,1},{1}}.

We can make other ‘vague’ statements like this as well. Let’s also denote the set { {1}, {0} } as R for a ‘real’ statement. All statements in classical logic fall into this category, so that the law of the excluded middle holds. Denote { {}, {0,1} } as U for ‘unreal.’ Denote { {}, {0}, {0,1} } as D for ‘not true’ and { {}, {1}, {0,1} } as G for ‘not false’, and denote { {1}, {0,1} } and { {0}, {0,1} } as M and L for ‘possibly true’ and ‘possibly false’ (but also possibly inconsistent). It’s important to note that saying V(A) is in D (‘not true’) is not inconsistent with saying that V(A) is in G (‘not false’): if V(A) is in their intersection, then V(A) is in { {}, {0,1} } = U, which means that either A is undecidable or the system is inconsistent.
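These definitions are small enough to check mechanically. The sketch below encodes the four proof-values as Python frozensets and the ‘vague’ statements as sets of proof-values. The name F for {0} and the helper `powerset` are conveniences introduced here; the other names follow the text.

```python
from itertools import chain, combinations


def powerset(items):
    """All subsets of items, returned as a set of frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


# The four proof-values: the elements of the powerset of the Booleans.
N = frozenset()        # undecidable: no proof either way
T = frozenset({1})     # proved true (and not false)
F = frozenset({0})     # proved false (and not true); name assumed here
C = frozenset({0, 1})  # contradiction: proofs of both

PROOF_VALUES = powerset((0, 1))

# The 'vague' statements: sets of proof-values.
R = {T, F}     # real
U = {N, C}     # unreal
D = {N, F, C}  # not true
G = {N, T, C}  # not false
M = {T, C}     # possibly true
L = {F, C}     # possibly false

print(D & G == U)                        # 'not true' and 'not false' overlap in U
print(M == {v for v in PROOF_VALUES if 1 in v})
print(len(powerset(PROOF_VALUES)))       # size of the powerset of the powerset
```

All six named sets are elements of the sixteen-element powerset of the proof-values.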

These values live in the powerset of the powerset of the Booleans: we’ve now climbed up two meta-levels above classical logic. We might want a third valuation, distinct from the truth- and proof- valuations, to discuss this level. Let’s call this the justification-valuation, J. Thus, J(A) = R is read as “there is justification that the proof-value of A is R.” One can continue up this meta-ladder indefinitely, but for now this is as high as I want to go. At this level, we can already discuss many interesting topics.

For instance, we can see that classical logic, as noted, is tantamount to the assumption that all statements have J(A) = R: all are well-formed AND the system of axioms is consistent. Proof-by-contradiction is reified: the proof-value of A is exactly C = {0,1}. Arriving at a contradiction means that the system is not closed in R. We also have a more natural way to express statements that are undecidable, or might be undecidable. Providing a constructive proof takes the justification-value of a statement from a more-vague to a less-vague state. Framing intuitionist logic as simply dropping the law of the excluded middle prevents discussing the reasons why a proved truth-value may not be forthcoming (i.e., distinguishing undecidability from contradictions).

Intuitionism and probability

In this section, I’m going to slightly reinterpret the meaning of the valuation V: rather than proof, I’d like to talk about V as “having empirical evidence (+ reasoning) that demonstrates.”

Suppose Q(A) is a ‘metaprobability’ of A: the probability distribution over the possible values of V(A). The ‘regular’ probability P(A) for a statement to be either true or false becomes the conditional probability given that V(A) is in R: that is, assuming the question is ‘real,’ what are the relative probabilities of the truth values? This corresponds to the common notion of probabilities, but it’s not the whole story because intuitionist logic seems to be somewhat more natural for folks. Constructing probability theory for intuitionist logic might resolve some unintuitive effects of using standard probability theory for everyday reasoning.

Let L be the indicator function (not to be confused with the set L above) that denotes whether the question implied by A is a real question or not, i.e., L(A) = {1 if V(A) in R, 0 if V(A) in U}. Q(L(A)=1) denotes the probability of the statement being proved at all, and the total probability is Q(A) = Q(A | L(A)=1) * Q(L(A)=1). The conditional metaprobability Q(A | L(A)=1) is all that exists in classical probability theory. This new theory relates to Dempster-Shafer theory, in which probabilities need not sum to 1: in this case, it is because the metaprobability does sum to 1, but there are additional states other than T/F that the probability can ‘leak’ into.
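A small numerical sketch may make the decomposition concrete. The metaprobability values below are invented for illustration, and the function names are hypothetical. The metaprobability is split into the mass on ‘real’ values (T or F) and the classical probability of truth conditioned on the question being real.

```python
# Hypothetical metaprobability over the four proof-values of a statement A.
Q = {"T": 0.30, "F": 0.10, "N": 0.50, "C": 0.10}


def prob_real(q):
    """Mass on the 'real' values, i.e. the chance that V(A) is in R = {T, F}."""
    return q["T"] + q["F"]


def classical_prob(q):
    """Probability that A is true, conditioned on V(A) being in R."""
    return q["T"] / prob_real(q)


p_real = prob_real(Q)       # mass that stays in the classical T/F states
p_true = classical_prob(Q)  # the 'regular' probability P(A)
leaked = 1.0 - p_real       # mass that has 'leaked' into N and C

print(p_real, p_true, leaked)
print(p_true * p_real, Q["T"])  # the product recovers the mass on T
```

Here the classical probability of truth is 0.75, yet only 40% of the total mass sits in the T/F states at all, which is information the classical number alone cannot carry.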

If Q is high for V(A)=C, this indicates we have strong belief that a contradiction exists — we’ve assumed contradictory information. This results in revising our model, questioning assumptions. By operating at this meta-level, our reasoning system can cope with contradictions without breaking down.

If Q is high for V(A) = N, this means we don’t have enough information to trust our answer. This kind of situation drives us to gather more data (observation or experiments).

If Q=0.5 for each of V(A)=T and V(A)=F, then we are very confident that our reasoning+evidence is correct, but we cannot answer the question better than chance. This allows us to distinguish this case from the situation where we have no evidence, in which Q is large for V(A)=N. In that situation, we might still assign a small but nonzero metaprobability to each of V(A)=T and V(A)=F, according to Laplace’s principle of insufficient reason. But more importantly, now we can also encode that we have very little confidence in our determination that these two probabilities are equal! This is the key motivation for Dempster-Shafer theory. That theory gets hung up on how to interpret the missing probability, much the same as the difficulties with the use of ‘Null’ in programming & databases. Here we have two distinct possibilities (for under/over-constrained truth values, respectively N/C) with very clear meanings.

As far as betting goes, if an agent’s metaprobabilities for T and F are both small, the agent might prefer not to bet, even if the odds ratio between T & F is large. Alternatively, perhaps one could assign the remaining metaprobability equally between T & F, for the purposes of taking the expectation. In that case, the effective log-odds would be roughly zero, even though the log-odds of the T/F split is quite large. Then the risk/reward tradeoff looks very different. This needs to be fleshed out.
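To put numbers on the betting case, here is a worked sketch with invented values: 0.09 of the metaprobability on T, 0.01 on F, and the remaining 0.90 spread over N and C. The equal split of the leftover mass is the rule suggested in the paragraph above; the function names are hypothetical.

```python
import math


def log_odds(p_true, p_false):
    """Natural-log odds of true versus false."""
    return math.log(p_true / p_false)


def effective_log_odds(q_true, q_false):
    """Split the leftover metaprobability (on N and C) equally between T and F."""
    leftover = 1.0 - q_true - q_false
    return log_odds(q_true + leftover / 2, q_false + leftover / 2)


q_true, q_false = 0.09, 0.01  # hypothetical metaprobabilities for T and F

raw = log_odds(q_true, q_false)                 # 9:1 odds within the T/F split
effective = effective_log_odds(q_true, q_false)  # odds after spreading the rest

print(f"raw log-odds: {raw:.3f}, effective log-odds: {effective:.3f}")
```

The T/F split alone gives 9:1 odds (log-odds of about 2.2), while the effective odds are 0.54 to 0.46 (log-odds of about 0.16), which is close to a coin flip.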

Intuitionistic extensions to Bayesian reasoning

Bayesian probability theory is an extremely powerful tool, but it has limitations. The following quote speaks to a key limitation, one I’ve experienced first-hand:

It is very important to understand the following point. Probability theory always gives us the estimates that are justified by the information that was actually used in the calculation. Generally, a person who has more relevant information will be able to do a different (more complicated) calculation, leading to better estimates. But of course, this presupposes that the extra information is actually true. If one puts false information into a probability calculation, then the probability theory will give optimal estimates based on false information: these could be very misleading. The onus is always on the user to tell the truth and nothing but the truth; probability theory has no safety device to detect falsehoods.

G. L. Bretthorst, “Bayesian Spectrum Analysis and Parameter Estimation,” Springer-Verlag series ‘Lecture Notes in Statistics’ # 48, 1988, p30-31. [Emphasis mine]

I should disclaim this by noting that the log of the evidence (that is, the Bayesian generalization of the ‘chi-squared’ value) does indicate when an event is extremely improbable according to a model. Nevertheless, there are pitfalls here, which intuitionist probability theory could avoid, by providing a way to explicitly represent contradictory and insufficient information in a direction orthogonal to the odds ratio between the probabilities for truth and falsity. It seems to me that probability is overloaded, but not because probability theory needs to be modified. Rather, common-sense reasoning is more akin to intuitionist logic, while probability theory is the natural extension of classical logic. Thus, creating a probability theory appropriate for intuitionism might prove valuable for modeling common-sense reasoning.

As a concrete example, consider the following scenario posed by N. Taleb: you are told that a coin is fair, but then you watch as the coin is flipped and results in 100 ‘heads’ in a row. You are offered a bet on the next toss. How would you bet? A naive application of probability theory would result in still believing the initial assertion that the coin was fair, while a more sophisticated application would conclude that the coin toss had been rigged: we would be less surprised to have been lied to than to see an event with 30 orders of magnitude of improbability. However, if we had rounded our initial probability of being lied to down to zero (suppose we had reason to trust the person tossing the coin), then we would never be able to update our probability to a nonzero value, because Bayesian updating is always multiplicative.
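The coin example can be checked with a few lines of Python. This is only a sketch under simplifying assumptions of my own: there are exactly two hypotheses, a ‘rigged’ coin always lands heads, and a fair coin lands heads half the time. The function name is hypothetical.

```python
from fractions import Fraction

def posterior_rigged(prior_rigged, n_heads):
    """Posterior probability that the coin is rigged after n_heads heads in a row.

    Assumes a rigged coin always lands heads and a fair coin lands heads
    half the time. Exact rational arithmetic keeps (1/2)**n from being
    lost to floating-point rounding.
    """
    prior = Fraction(prior_rigged)
    like_rigged = Fraction(1)
    like_fair = Fraction(1, 2) ** n_heads
    evidence = prior * like_rigged + (1 - prior) * like_fair
    return prior * like_rigged / evidence

# A one-in-a-million prior of being lied to is overwhelmed by 100 heads...
print(float(posterior_rigged(Fraction(1, 10**6), 100)))  # essentially 1.0
# ...but a prior of exactly zero stays at zero, however long the streak.
print(float(posterior_rigged(Fraction(0), 100)))         # 0.0
```

The second call is the multiplicative trap: the prior multiplies the likelihood, so zero times anything is still zero.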

We have some freedom to determine the updating rules for intuitionist probabilities. In particular, one might desire that probability is transferred away from ‘classical’ T/F values and toward C when we have contradictory evidence/assumptions. Also, by reserving probability for N when we have little/no information, we could set T and/or F to zero and still allow them to become finite later. I haven’t worked this all out yet.

Intuitionism and quantum probability?

I have a hunch that one could interpret the complex probability amplitudes in quantum mechanics by a similar process of ‘lifting’ the state space from classical truth values to some other meta-valuation, possibly the ‘observation’ valuation. That is, one would say “the probability that statement A is observed to be true is 50%”, rather than saying “the probability that the statement is true is 50%.” This is in the spirit of the Copenhagen interpretation of QM. Note that, before an event has occurred, the probability of its outcome having been observed to be anything is zero! That is, the N-value (not enough information) comes into play naturally. This also takes care of the counterfactual difficulties: since the measurement was not conducted, the probability of the outcome will always be 100% on the N-value.

On the other hand, we can rule out the possibility of contradictory observations: the C-value always has zero probability, because objective reality is single-valued. I suspect that this, together with the usual normalization condition, suffices to determine the Born rule. Together they reduce the degrees of freedom of the meta-probability from 4 to 2. This allows mapping the meta-probability to a complex number, which also has two degrees of freedom. I haven’t figured out how to do this mapping yet.

Possible interesting consequences: wavefunction collapse might be viewed as the transition from indeterminate state (N) to definite state (T/F). There could be a change in the reason that the marginal probabilities are 50/50 for the outcome of one of the spin measurements in a Bell entanglement experiment when the polarizers are perpendicular: initially, it would be due to lack of evidence (ie, all metaprobability is concentrated on the N-value), but once the measurement on the other side has been made and the result transmitted, the probability switches to being 50% true and 50% false, because now the outcome is a matter of classical probability. Strange quantum-information effects such as negative conditional entropy and superdense coding might be easier to comprehend in this interpretation.

This idea was partly inspired by the two-state-vector formalism, which holds that the wavefunction is not a complete determination of the quantum state. Rather, two wavefunctions are necessary, one from the past and one from the future. Representing the incompleteness of the state via constructive/intuitionistic logic seemed natural: the ‘truth’ of theorems is contingent upon the actual historical progress of mathematics. Rather than being a Platonic ideal that is true or false for all time, theorems only assume a ‘truth’ value once a proof (or disproof) exists. This same contingency (or contextuality, if you will) naturally applies to statements deriving their truth from the observed outcomes of experiments.

[Edit: This was inspired in part by Ronnie Hermens’ thesis: “Quantum Mechanics: From Realism to Intuitionism – A mathematical and philosophical investigation”]

IDL gotchas and misfeatures

IDL is a popular language in the plasma physics & astrophysics world. It has a number of poor design choices that lead to nasty pitfalls and gotchas. Here’s a list of the ones I’ve encountered. Maybe this will save some poor grad student a headache.
“Versions of IDL prior to version 5.0 used parentheses to indicate array subscripts. Because function calls use parentheses as well, the IDL compiler is not able to distinguish between arrays and functions by examining the statement syntax.”
This causes all kinds of problems. One nasty one is that a call to a function that IDL can’t find, made with keyword arguments, gives a non-descriptive ‘syntax error,’ because it is interpreted as an attempt to index an array using a keyword argument! This is a very common problem when dealing with a new codebase: if your bash config file doesn’t export the correct directories in the IDL_PATH environment variable, IDL can’t find the libraries where utility functions are defined, and you’ll get dozens of errors like this. It took me forever to realize what was happening! 😡
There is no ‘try.’ Error catching works by jumping backward in the file to the place where the error-handler is located, executing whatever is in that block, and then trying the failed statement again. There is no way to go around a statement if it doesn’t work.

You can ‘declare’ variables by passing them to a function! They are essentially null variables, and then, in lieu of the procedure being turned into a function (with a return value), the output of the function is placed into the null variable as a ‘side effect.’ This is extremely confusing when reading someone else’s code, if you are coming from another language that makes any kind of sense. Because a function can only produce a single output, some people prefer this backward method of returning data from functions.

The ‘help’ command only helps you if you already know whether an argument is a function, procedure, or data structure. Because you can index an array using parentheses rather than brackets, sometimes you can’t tell if an object is a function or an array by looking at the context!

Common blocks: if you ‘reference’ a common block (or load a save file for that matter) it pulls in the variables as named in the block/file and dumps them straight into the local namespace! This can overwrite other variables of the same name. It’s like python’s “from numpy import *” but for data structures. You can optionally supply a list of aliases for each of the variables you want to ‘import’ from the common block, but if you want the n-th one only, you have to assign junk names to the other n-1 variables that come before it! You can’t just index in, or even address it by name!

IDL automatically leaves you at whatever place the error occurred in your code. This is good for debugging, but terribly confusing when you aren’t used to it and expect to be back at global scope.

IDL imports modules on the fly when you call a function, unlike python, where you must name the module first (although you can be sloppy in python and import * from a package, which has a similar effect).

In conjunction with the problem of being able to declare a variable by passing it, you are able to pass an undefined variable, and the traceback indicates that the problem is occurring inside the called function, not the calling one! This one will gaslight you hardcore, because you assume that the incoming variable exists or otherwise your call would have failed, not the function evaluation!

The .compile command doesn’t work inside a .pro file, unlike an import statement in python. You have to use .compile at the interpreter or something @’ed into the interpreter. It’s not clear if there is a way to get around this restriction and specify which file you want to compile a procedure from, other than messing with the IDL path. So the scripting environment and the programming environment don’t behave the same.

If you slice an array down to length one in some dimension, that dimension only goes away if it is on the end (ie, for an array x[6,9,7], if you do x[*,i,*] you get back x[6,1,7], but x[*,*,i] = x[6,9] and x[i,*,*] = x[1,9,7]).

If you make a mistake when broadcasting arrays, you won’t know it, because array combination just truncates to the smallest length along each dimension.
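Python’s built-in zip happens to truncate 1-D sequences silently in the same way, so it makes a convenient stand-in for showing how the mistake hides. The sketch below is Python, not IDL, and it only covers the one-dimensional case; both function names are made up for illustration.

```python
def truncating_add(a, b):
    """Add element-wise, silently stopping at the shorter input (the pitfall)."""
    return [x + y for x, y in zip(a, b)]

def checked_add(a, b):
    """Add element-wise, but refuse inputs whose lengths differ."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return [x + y for x, y in zip(a, b)]

# Two elements of the first list are dropped without any warning.
print(truncating_add([1, 2, 3, 4, 5], [10, 20, 30]))  # [11, 22, 33]
```

A defensive habit in either language is to compare the array sizes explicitly before combining them, as checked_add does.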

If your plots all show up red instead of whatever color they were supposed to be, make sure to state “device, decompose=0.” This allows color numbers to be interpreted in terms of a color table, rather than as hex representations of RGB colors. You’re probably using “loadct,39” to load the jet-like color table. Set “!p.color=0” to set the text/axis color to black, then set “!p.background=255” to set the field of the plot to white, and finally, “!p.multi=0” to get rid of subplots.

Suppose you make an anonymous structure s that contains an array x[5,10], among others, and then you make an array of those structures: ss[i]=s for i=0 to 20. Then you look at ss.x and see a [5,10,20] array! However, you can’t index this array properly: the last dimension is an illusion. If you select any one of the elements from the first two dimensions (ie, ss.x[4,5]) you get back an array that is 20 long. But, you can’t do this: ss.x[*,*,0], because that last dimension is a phantom. You have to do this: ss[0].x[*,*] to access the array that way. So confusing!

Also, I just got an overflow from computing 201*201: the default type for an integer literal is a 16-bit signed integer, which is too short to hold 40401!
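Assuming the overflow comes from a 16-bit signed integer type (maximum 32767), the wraparound can be reproduced in Python. The helper below is a hypothetical illustration of two’s-complement wrapping, not IDL code.

```python
def wrap_int16(value):
    """Wrap an integer into the signed 16-bit range [-32768, 32767]."""
    return (value + 2**15) % 2**16 - 2**15

print(201 * 201)              # 40401, which exceeds the 16-bit maximum of 32767
print(wrap_int16(201 * 201))  # -25135
```

In IDL itself, the usual way around this is to make at least one operand a long integer (for example by writing 201L).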

If you do a contour overplot, you have to specify the levels for the second contour plot explicitly (can’t use nlevels!) or else it will default to the set of level values from the previous contour, which means you generally don’t see the second set of contours and have no idea why not (could have been the x/y range either!, or a color issue, or…)

Fusion power density

In this post, I’m going to consider the power density issue for fusion energy. The focus for now is on the technical obstacles, rather than the economic motivation for increasing power density. I’ll be referring back to the previous post summarizing the Jassby & Lidsky critiques.

Fuel choice

The deuterium-tritium (D-T) fusion reaction involves significant disadvantages due to tritium & high-energy neutrons. If we eliminated radioactivity and tritium, that would nullify all the political/societal issues, two of the three economic issues, and one of the purely technical issues, while simplifying another. Lidsky proposed changing research policy to look for reactor concepts that could use alternative fuels. So, what’s the catch?

The maximum fusion reaction power density (at a given reactor pressure) is ~50x more for D-T than for any other fuel choice. This is not only bad for the reactor economics, but may prevent the reactor from functioning at all. Lowering the power density at given conditions also makes it that much harder to achieve net energy gain. For instance, if a reactor using D-T had a gain of 25, switching to aneutronic fuel would make the gain <1, so the reactor would not produce power.

The next best fuel choice, deuterium – helium-3, requires mining the moon or the gas giants – this rules it out for the foreseeable future. It also only yields a 10x reduction in neutrons (or maybe more, but at a further cost to the power density). This reaction would eliminate concerns about tritium leakage and breeding. It might relieve some of the difficulty of materials selection, replacement rates, and radioactive waste, but the reactor would still be too radioactive to repair easily, and could still produce weapons material easily.

The deuterium-deuterium reaction has the 3rd highest reaction rate. Like deuterium – helium-3, this choice also avoids tritium breeding and handling. Better yet, it doesn’t require exotic space mining. However, neutron production is only reduced by a factor of ~2 compared to D-T. (Possibly the neutron production could be lowered somewhat further, but again this would come at the expense of power density.) Pure deuterium reactors might be useful in hundreds/thousands of years if lithium (used for breeding tritium) becomes scarce – the oceans have enough deuterium for millions of years of consumption.

The 4th reaction is hydrogen-boron. The peak power density is about 500x lower than D-T. In fact, it’s just barely above the power losses due to X-ray radiation – leaving very little room for any other losses to be allowed.[1] However, this reaction produces very little neutron radiation & activation, and does not involve tritium. Boron & hydrogen are both plentiful. This would be the ideal fusion fuel – if massive breakthroughs in plasma confinement could be made.

The up-shot is that in order to maximize the power density, D-T is the best choice. The next questions are, how much power density can we get, and how much do we need?

Power density: getting it

The thermal output power density of a light water reactor is around 50-100MW/m3, considering the volume of the pressure vessel — can fusion match this? The answer is ‘yes’ — at least in principle. This doesn’t imply that fusion can compete economically with fission, nor that pushing the power density this high optimizes the economics of fusion. Nonetheless, it’s an area where Lidsky’s critique doesn’t hold up any more.

Fusion power density scales as the plasma pressure squared: at the optimum temperature, the DT reaction yields 0.34 MW/(m3 bar2). For a magnetically-confined plasma, the plasma pressure is less than or equal to the magnetic field ‘pressure’ (energy density) — the ratio of the two pressures is called ‘beta.’ The magnetic pressure scales as the magnetic field squared, with a coefficient of about 4 bar/T2.

The ‘beta’ ratio needs to be as high as possible, which favors concepts like the FRC (100%), Z-pinch (100%), magnetic mirror (40-60%), spheromak (~40%), or reversed-field pinch (~25%), compared to tokamaks, which top out around 10%, and stellarators (1-5%). (The Z-pinch is unique because it doesn’t have external magnetic coils – a current flowing through plasma supplies the magnetic field. The maximum achievable field strength is not limited by the capability of superconductors.)

To put some numbers on the power density: Assuming the ‘beta’ ratio is 100%, then for a 5 T magnetic field, the maximum possible fusion power density is around 3400 MW/m3. (5 T is about the limit with existing ‘low temperature’ superconductors.) However, when averaging over the plasma volume, the achievable power density is perhaps 20% of this number, because the pressure rises gradually from the plasma edge to the center. Still, that’s around 680 MW/m3. Suppose the plasma is cylindrical with radius about 1 meter, and it has a shield of about 1.5 meters thickness surrounding it (to breed tritium, extract heat, and protect the magnets). The power averaged over the volume of the outer cylinder would be around 100 MW/m3.
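The arithmetic above can be reproduced in a short Python sketch. The D-T coefficient (0.34 MW/(m3 bar2)), the 20% volume-averaging factor, and the geometry are taken from the text; the magnetic pressure is computed from the standard formula B^2/(2*mu0), and the function names are my own.

```python
import math

MU0 = 4e-7 * math.pi   # vacuum permeability, T*m/A
DT_COEFF = 0.34        # MW/(m^3 bar^2), D-T at optimum temperature (from the text)

def magnetic_pressure_bar(b_tesla):
    """Magnetic pressure B^2 / (2 mu0), converted from Pa to bar."""
    return b_tesla**2 / (2 * MU0) / 1e5

def peak_power_density(b_tesla, beta=1.0):
    """Peak D-T fusion power density in MW/m^3 for a given field and beta."""
    plasma_pressure = beta * magnetic_pressure_bar(b_tesla)
    return DT_COEFF * plasma_pressure**2

peak = peak_power_density(5.0)                # roughly 3400 MW/m^3
averaged = 0.20 * peak                        # roughly 680 MW/m^3 over the plasma volume
r_plasma, shield = 1.0, 1.5                   # meters
dilution = (r_plasma / (r_plasma + shield))**2  # plasma volume / outer cylinder volume
print(peak, averaged, averaged * dilution)    # last value is roughly 100 MW/m^3
```

The dilution factor is the ratio of cross-sectional areas, (1/2.5)^2 = 0.16, since both cylinders have the same length.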

Thus, it’s possible in principle to have comparable power density to a fission reactor, even using magnetically-confined fusion. Lidsky assumed a tokamak with ~5 T magnetic field (the limit given the existing superconductors at the time), which only has about ~10% ‘beta.’ Thus, the power density would be 100x lower, around 1 MW/m3.

The high-beta approach is one way to attack the power density problem. The higher magnetic fields possible with new REBCO superconductors are another avenue. If 16 T is possible, as seems to be the case, then the power density of a 10%-beta tokamak would be the same as that of a linear device with 5 T field and 100% beta – around the 100MW/m3 mark. Combining high-field superconductors with a 100%-beta reactor could potentially allow advanced fuels to reach power density near 100MW/m3 as well.
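Since fusion power density goes as plasma pressure squared, and plasma pressure is beta times a magnetic pressure proportional to B^2, the comparisons above reduce to the scaling (beta * B^2)^2. A minimal sketch, with a hypothetical function name and the 5 T, 100%-beta device as the reference case:

```python
def relative_power_density(beta, b_tesla, ref_beta=1.0, ref_b_tesla=5.0):
    """Power density relative to a reference device, using P ~ (beta * B^2)^2."""
    return (beta * b_tesla**2) ** 2 / (ref_beta * ref_b_tesla**2) ** 2

print(relative_power_density(0.10, 5.0))   # about 0.01: the 100x penalty at 10% beta
print(relative_power_density(0.10, 16.0))  # about 1.05: 16 T makes up the difference
```

Because of the fourth-power dependence on field, raising B from 5 T to 16 T buys a factor of about 105, which almost exactly cancels the factor of 100 lost by dropping beta from 100% to 10%.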

Power density: dealing with it

In the majority of this section, I’m assuming we stuck with D-T fuel. I’ll address hydrogen-boron at the end of this section. For the D-T reaction, 20% of the fusion power is released as charged particles (helium nuclei), which heat the plasma. For our hypothetical 1-meter radius cylindrical plasma column, the ratio of the charged-particle power to wall surface area is about 20 MW/m2. This is comparable to the heat load on re-entry, and only 1/3 of the heat flux at the surface of the sun! Beam dumps and divertors for tokamaks are required to withstand 10-20 MW/m2. This appears to be close to the limits of what is achievable with known materials. Also, the thermal conductivity of materials tends to degrade under neutron radiation, as the crystal structures become disorganized. Thus, even if we can produce comparable power density to a fission reactor, we may not be able to cope with the resulting heat flux. Is there a work-around?

For toroidal devices (tokamak, stellarator, RFP), the plasma is topologically trapped inside the coils, so the charged-particle portion of the power must[2] exit through the wall. Linear systems like the FRC and mirror get a free pass — the magnetic field lines can extend out of the cylindrical vessel and flare out, so that the heat is deposited over a larger surface area. Some of the power will still be radiated onto the vessel wall, but it might be as little as 10% of the total heat flux for D-T plasmas. (Note that in tokamak designs, the heat flux is concentrated at the divertor, so the problem is even worse than if the heat were distributed uniformly. It may be possible to spread the heat uniformly by intentionally introducing impurities to increase the X-ray radiation from the plasma, however.)

However, there’s another problem beyond heat flux. For our hypothetical D-T reactor, the neutron flux escaping the plasma is 80 MW/m2. This translates to about 900 displacements per atom (dpa) per year at the first surface, for steel. Steel is likely to only survive about 100-200 dpa before needing replacement. Replacing the first wall several times per year is probably a show-stopper, as it: (1) eats into the capacity factor, (2) increases operations & maintenance costs, and (3) results in a large volume of (low-level) radioactive waste.
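To make ‘several times per year’ concrete, here is a small Python sketch using only the figures quoted above (900 dpa per year, and a 100-200 dpa limit for steel). The function name is hypothetical.

```python
def wall_lifetime_years(dpa_per_year, dpa_limit):
    """Years until the accumulated damage reaches the material's dpa limit."""
    return dpa_limit / dpa_per_year

for limit in (100, 200):
    years = wall_lifetime_years(900, limit)
    print(f"limit {limit} dpa: {years * 365:.0f} days, "
          f"{1 / years:.1f} replacements per year")
```

With these inputs the wall lasts roughly 40 to 80 days, that is, somewhere between 4.5 and 9 replacements per year.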

Stellarator & tokamak designs typically call for ship-in-a-bottle robotic assembly of the blanket & plasma-facing components inside the cage formed by the magnet coils. The estimated time to repair/replace the first wall is in the range of months; clearly this cannot be done every month! Hence, existing tokamak designs are driven to low power density in order to prolong the life of the first wall. (The ARC study proposed disassembling the magnets & lifting the inner components out in one piece, which is still far from simple & quick.)

Linear devices have an advantage from a maintenance perspective, compared to toroidal designs. However, even if replacement is quick & simple, it’s better to maximize the lifetime of the wall components, to reduce radioactive waste. Some materials may survive longer than others, but there are not many elements to choose from, in order to avoid producing high-level waste.

An obvious way forward is to replace the solid material walls with flowing liquid metal or liquid salt. The liquid should contain lithium for breeding tritium. A layer of 50 cm of liquid FLiBe salt would reduce the flux from 900 to around 10 dpa/yr, allowing a 20-year lifetime for the first solid surface. Another option is lead/lithium alloy. Liquid first walls solve both the neutron damage problem and the heat flux problem, if the flow is fast enough. There are several drawbacks, however:

  • Splashing of droplets into the plasma must be prevented — splashes could extinguish the plasma unexpectedly
  • Plasma sensors and actuators (RF antennas, particle beam or pellet injection, etc) would be hard to accommodate
  • If the liquid is metal, the magnetic field can increase drag and result in large energy consumption for pumping the liquid
  • There may be corrosion problems, especially for liquid metals, but also for salts
  • The coolant temperature must be kept low enough not to poison the plasma due to heightened vapor pressure — this restricts the thermodynamic efficiency of the turbines used to produce electricity.

Power handling: with hydrogen-boron

For the hydrogen-boron reaction, essentially all the charged-particle heat flux would emerge as X-rays hitting the wall. At 100MW/m2, a liquid first surface is probably a necessity for this fuel as well, to handle the heat flux. The choice of liquid is more flexible, since tritium breeding is not required.

Fusion ‘Fuel Rods’

Fission reactors reduce the heat flux challenges they face by splitting the fuel up into many long, thin rods to reduce the volume-to-surface-area ratio. In principle, this could be done for fusion as well. The problem with splitting up the fusion plasma is that fusion gain is dependent on having good thermal insulation of the plasma, and reducing the volume-to-surface-area ratio reduces the insulation value, so to speak. For fission reactors, this effect is actually beneficial, as it keeps the temperature at the center of the fuel rods below the melting point of the fuel, whereas fusion reactions need to be kept hot at the center. I don’t want to dive into the physics of plasma transport at this point in the series, but for now I’ll say that it seems unlikely that the ‘fuel rod’ approach would work.
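The geometry behind the ‘fuel rod’ idea is simple enough to sketch. For a long cylinder (ignoring the end caps) the volume-to-surface ratio is r/2, and splitting one column into N equal rods with the same total cross-section shrinks each radius by sqrt(N). The function names and the choice of 100 rods are purely illustrative.

```python
import math

def volume_to_surface(radius):
    """Volume-to-surface ratio of a long cylinder, end caps ignored: r / 2."""
    return radius / 2

def split_radius(radius, n_rods):
    """Radius of each of n equal rods that together keep the total cross-section."""
    return radius / math.sqrt(n_rods)

r_single = 1.0                        # meters, the plasma column used in this post
r_rod = split_radius(r_single, 100)   # 0.1 m per rod
print(volume_to_surface(r_single), volume_to_surface(r_rod))
```

Splitting into 100 rods cuts the volume-to-surface ratio (and hence the surface heat flux at fixed total power) by a factor of 10, but it shortens the distance heat has to travel to escape by the same factor, which is the insulation penalty described above.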


All of engineering is trade-offs. Optimizing individual components of a system in isolation doesn’t generally lead to the optimal system. The optimal fusion reactor might not involve pushing the power density all the way to the maximum. Nonetheless, it appears to be possible in theory to achieve power density comparable to a fission reactor, contrary to the assertions of critics. The trade-off is that liquid first walls would probably be required, to cope with the extreme neutron &/or heat fluxes produced. Liquid first walls have their own disadvantages. Power densities around 10 MW/m3 or less would be more feasible. It remains to be seen if this is sufficient to make fusion economical.


[1] (See Fig. 4 of “Fusion reactivity of the p-B11 plasma revisited” by S.V. Putvinski et al., Nuclear Fusion 59, 076018 (2019))

[2] It might be possible to convert most of the plasma heat exhaust to some form of directed energy (the Carnot efficiency of a heat engine operating at thermonuclear temperature is > 99%), but it’s not been demonstrated for a thermal plasma.

Summary of fusion critiques

This is the third post in a series on nuclear fusion. The goal of the series is to address assumptions made in critiques of fusion energy, with an eye toward solutions. Lidsky’s and Jassby’s critiques overlap a bit. Here’s a combined list of the main arguments, broken into 3 categories. In this series, I’ll focus on the issues that impact the economics. The technical & societal/safety issues do feed into the economics, so I’ll comment on those places as they come up.

Summary of critiques

  • Economic issues:
    1. Fusion reactors will have lower power density while being more complex, compared to fission reactors. Therefore, they will be larger, more expensive, and slower to construct, hence uneconomical.
    2. Radiation damage will require frequent replacement of core parts of the reactor. This replacement, being complicated & robotic/remote, will be slow. Therefore, the down-time fraction of the plant will be large, and the operations & maintenance cost will be high, making it uneconomical.
    3. The complexity of a fusion reactor makes accidents more likely. An accident that releases radioactivity would lead to decommissioning of the plant because repair would be impossible. Therefore fusion plants are economically risky.
  • Safety/societal issues:
    1. Although fusion reactors will (with appropriate material choices) have less high-level radioactive waste, they will produce much more lower-level waste (as compared to fission reactors).
    2. Fusion reactors can be used to breed plutonium (weapons material) from natural uranium. Tritium is also a weapons material.
    3. Accidents & routine leakage will release radioactive tritium into the environment.
  • Primarily technical issues:
    1. Tritium breeding has narrow margins & probably won’t be successful.
    2. Fusion reactors require siphoning off some of the electricity output to sustain/control the reaction. This ‘drag’ will consume most/all of the output electricity (unless the plant is very large).
    3. Materials that can withstand large heat fluxes and large neutron fluxes and don’t become highly radioactive are hard to find.

Assumptions used in critiques

There are a number of assumptions made in the critiques, some of which are unstated. The most consequential ones are:

  1. The deuterium-tritium reaction will be used.
  2. The reaction will be thermonuclear (ie, no cold fusion, muon catalysis, highly non-Maxwellian distributions, etc)
  3. Reactors will be steady-state magnetic confinement devices.
  4. Specifically, the magnetic confinement device will be a tokamak.
  5. Magnetic field coils will be limited to about 5 Tesla field strength.
  6. The reactor first wall will be solid material.

Not all the critiques depend on all the assumptions. I’ll indicate which assumptions are involved in each critique item. Notably, 7 of the 9 critiques involve the radioactivity & tritium usage of fusion. This motivates considering ‘aneutronic’ fusion reactions. However, aneutronic reactions produce orders of magnitude lower power density (all else being equal) versus deuterium-tritium. Thus, most fusion efforts focus on deuterium-tritium despite the radioactivity & tritium concerns. My next post will discuss the fuel cycle trade-off as part of the power density discussion.

Anti-perfectionism advice

I used to subscribe to the Mark Twain philosophy:

It is better to keep your mouth closed and let people think you are a fool than to open it and remove all doubt.

Mark Twain

It was an excuse I used to justify my perfectionism. Here are some antidotes that have helped me overcome this tendency & put my thoughts out here.

A book is never finished. It is only abandoned.

Honoré De Balzac

Ideas get developed in the process of explaining them to the right kind of person. You need that resistance, just as a carver needs the resistance of the wood.

Paul Graham, “Ideas

See also: rubber duck debugging

Writing is nature’s way of letting you know how sloppy your thinking is.

Leslie Lamport

Constantly seek criticism. A well thought out critique of whatever you’re doing is as valuable as gold.

Elon Musk

The best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer.

Cunningham’s law

It is better to be interesting and wrong than boring and right.

Fred Hoyle

Don’t worry about people stealing your ideas. If your ideas are any good, you’ll have to ram them down people’s throats.

Howard Aiken, quoted by Paul Graham in “Googles

Don’t worry about being perfect. Make it bad, then make it better.

Robert Glazer

If you want to be certain then you are apt to be obsolete.

Richard Hamming, You and Your Research (1986) p187

We have a habit in writing articles published in scientific journals to make the work as finished as possible, to cover up all the tracks, to not worry about the blind alleys or describe how you had the wrong idea first, and so on. So there isn’t any place to publish, in a dignified manner, what you actually did in order to get to do the work.

Richard P. Feynman, Nobel lecture 1965

Survey of energy costs for comparison with fusion

My previous post introduced Lidsky’s The Trouble With Fusion, which claims to show that fusion won’t be economically competitive with nuclear fission power. Of course, beating fission on cost isn’t even setting the bar very high. In this post, I’m taking a quick look at the competition & where the market is heading.

TL;DR: fusion needs to compete directly on wholesale electricity costs in the future. Current prices are around $40-50/(MW*hr), but fusion should be shooting for $20-25/(MW*hr) in order to take over rapidly and stay ahead of renewables. Fusion is capital intensive, like fission. To achieve this low target, fusion plants need to cost less than $2/W (which is about twice what combined-cycle gas turbine plants cost). A study of several innovative low-cost fusion approaches found costs in the range of $5-13/W, maybe reaching $2-6/W when scaled up.
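To see roughly how a capital cost in $/W maps onto an energy cost in $/(MW*hr), here is a back-of-the-envelope Python sketch. It counts only the capital contribution (no fuel, operations, or maintenance), and the 8% per year fixed charge rate and 90% capacity factor are illustrative assumptions of mine, not figures from this post.

```python
HOURS_PER_YEAR = 8760

def capital_lcoe_per_mwh(capex_per_watt, fixed_charge_rate=0.08, capacity_factor=0.9):
    """Capital-only contribution to LCOE in $/(MW*hr).

    fixed_charge_rate: assumed fraction of the capital cost paid off per year.
    capacity_factor: assumed fraction of the year the plant runs at full power.
    """
    capex_per_mw = capex_per_watt * 1e6
    annual_cost_per_mw = capex_per_mw * fixed_charge_rate
    annual_mwh_per_mw = HOURS_PER_YEAR * capacity_factor
    return annual_cost_per_mw / annual_mwh_per_mw

for capex in (2, 5, 13):
    print(f"${capex}/W -> ${capital_lcoe_per_mwh(capex):.0f}/(MW*hr)")
```

Under these assumptions $2/W works out to about $20/(MW*hr), while $5-13/W lands at roughly $50-130/(MW*hr). Different financing assumptions would shift these numbers proportionally.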

The on-ramp

Peter Thiel lays out an argument that diving directly into the wholesale electricity market is not wise:

Any big market is a bad choice, and a big market already served by competing companies is even worse. This is why it’s always a red flag when entrepreneurs talk about getting 1% of a $100 billion market. In practice, a large market will either lack a good starting point or it will be open to competition, so it’s hard to ever reach that 1%.

Zero to One, p54

It’s better to have a so-called “Tesla Roadster” or a “hair-on-fire use case:” some smaller market where your initially-more-expensive first-generation products can compete. This market needs to be sealed-off from the broader market. In other words, your product must have some special feature that prevents generic products from being substitutes. In terms of distinguishing features, neither fusion’s safety advantage over fission, nor its zero-carbon advantage over fossil fuels, are enough to induce utilities to pay a premium for fusion above the market rate. Even if a carbon tax were levied, fusion would still be on the same playing field as fission and renewables. Fusion, like fission, is primarily suited for baseload, not dispatchable power.

Thiel specifically addresses clean energy’s on-ramp:

Cleantech companies face the same problem: no matter how much the world needs energy, only a firm that offers a superior solution for a specific energy problem can make money.

Finding small markets for renewable energy solutions will be tricky — you could aim to replace diesel as a power source for remote islands, or maybe build modular reactors for quick deployment at military installations in hostile territories.

Zero to One, p170-171

Currently-envisioned fusion reactors are not suitable for remote locations — they are too large to be shipped pre-assembled, and the demanding assembly process calls for a high-tech supply chain that doesn’t exist in remote locations. Consequently, fusion start-ups typically target the wholesale electricity market directly (see for instance the Deployment section at Type One Energy’s home page). Some exceptions: Phoenix Nuclear is using low-gain fusion reactions to produce neutron sources for medical isotope production or neutron imaging. Similarly, TAE Technologies has a medical physics spin-off: boron neutron capture therapy. These low-flux applications could partly bridge the gap to fusion, but I suspect another intermediate will be necessary. Fusion-fission hybrid reactors have been explored, both for power as well as for breeding fission fuel or burning fission waste. The bottom line is that they are susceptible to melt-downs (being fission reactors), so they don’t have fusion’s safety advantage, and they have no advantages over fission breeder reactors. Industrial process heat and thermochemical fuel production are other uses for fusion power output, but fission could address these — yet it has not.

To summarize: fusion needs to compete on the wholesale energy market.

Energy costs

Here is a good source for energy costs. From this, I gather that the median LCOE of new nuclear energy is $80/(MW*hr), whereas natural gas combined cycle turbines (CCGT) is around $50/(MW*hr). Advanced nuclear might be as low as $40-60/(MW*hr). Renewable energy + storage is also in this territory, however. In sunny areas, LCOE as low as $30/(MW*hr) for photovoltaic-based electricity is already possible. Projections along the learning curve indicate that LCOE for PV as low as $10/(MW*hr) may be achieved by 2030. According to this article, storage might add $50/(MW*hr) on top. However, in the face of seasonal variations in renewables, it is cheaper to overproduce renewable energy rather than store it in batteries long-term. This opens the door for low-cost membrane-less hydrogen electrolyzers to step in. This online simulator is a fun way to explore zero-carbon energy possibilities. In particular, underground hydrogen storage looks like a promising way to deal with seasonal fluctuations. While the round-trip efficiency of electrolysis/compression/combustion of hydrogen is low, it doesn’t matter much since the energy will be inexpensive during times of over-production. The upshot is that while current electricity prices are in the $40-50/(MW*hr) range, renewables could put downward pressure on prices starting in the next decade.

This report shows that existing coal and nuclear can operate even if the cost of new plants is twice the going rate — but of course new plants won’t be built. This is the case right now. Basically, existing plants already have paid down their up-front costs, so they can operate below the cost of new generation. Because the capital costs of CCGT plants are only a small part of the LCOE (fuel is the dominant contribution), the gap between new & existing CCGT plants is smaller. In order to displace existing generation, the LCOE of new fusion generation would need to be around half the LCOE of new coal or nuclear — that is, around $40/(MW*hr). It would be better to significantly undercut even that price point, to stay ahead of renewables, to make a compelling argument for significant investment, and to allow some margin of error in case projections prove to be overly optimistic. Let’s set a goal of $20/(MW*hr) for fusion. What implications does that have?

Cost structure of nuclear energy

The fuel cost for fission is small: the capital cost dominates, followed by operations & maintenance (O&M). Fusion will probably be similar, because fusion fuel is even less expensive than fission fuel. Using the formula in the above document and assuming a 25-year payback period, a 7%/yr financing rate, and 95% capacity factor, a capital cost of $1/W adds about $10/(MW*hr) to the LCOE. The cost estimates of nuclear fission plants are around $6/W, so the capital cost adds $60/(MW*hr) to a new plant, while the O&M expenses are around $20-30/(MW*hr). This is consistent with estimates of the LCOE of new nuclear.
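The $10/(MW*hr) rule of thumb can be checked with a short calculation. This is a minimal sketch which assumes that the formula in question is the standard capital recovery factor for a level-payment loan (an assumption, since the linked document isn't reproduced here). The function names are my own, for illustration only.

```python
def capital_recovery_factor(rate: float, years: int) -> float:
    """Fraction of the up-front capital repaid each year on a level-payment loan."""
    growth = (1.0 + rate) ** years
    return rate * growth / (growth - 1.0)


def capital_lcoe(capital_cost_per_watt: float, rate: float = 0.07,
                 years: int = 25, capacity_factor: float = 0.95) -> float:
    """Capital contribution to the LCOE, in $/(MW*hr), for a capital cost in $/W."""
    dollars_per_mw = capital_cost_per_watt * 1e6      # $/W -> $/MW
    annual_payment = dollars_per_mw * capital_recovery_factor(rate, years)
    mwh_per_mw_year = 8760 * capacity_factor          # MW*hr generated per MW per year
    return annual_payment / mwh_per_mw_year


# Assumed inputs from the text: 25 years, 7%/yr, 95% capacity factor.
for cost in (1.0, 2.0, 6.0):
    print(f"${cost:.0f}/W -> ${capital_lcoe(cost):.1f}/(MW*hr)")
```

Under these assumptions the result is linear in the capital cost, so each additional $1/W adds a little over $10/(MW*hr).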

A large combined-cycle gas turbine plant costs about $1/W or thereabouts — this is the lower limit for any turbine-based power plant. Suppose that we could produce a fusion power plant for $2/W. The resulting LCOE contribution is then around $20/(MW*hr), before we add any O&M costs. So, we’re hoping to get a fusion power plant that costs only about twice what it costs to build a CCGT plant, and has low maintenance costs, more like a gas plant than a fission plant. This is consistent with a DOE study that found that $2.2/W would be low enough to rapidly decarbonize the electricity market.

Cost studies of innovative fusion concepts

ARPA-E has spawned a program aiming to get the cost of a 1st-generation fusion plant down to a few billion dollars (see slide 16 of this presentation). A 2017 costing study was performed by Bechtel & others on behalf of ARPA-E. The costing study report estimated $0.7 to 1.9 billion for 150 MW plants, or about $5 to 13/W. Although the cost and size of the experiments are smaller than equivalent tokamak-based ones, the economics don’t work out because the power output is also low. The report speculates that $2-6/W might be possible if these designs are scaled up. Interestingly, the reactor itself only comprises 15-30% of the total plant cost in this study.
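The per-watt figures are just plant cost divided by output. Here is a trivial sketch of that conversion, using the numbers quoted from the costing study; the helper name is hypothetical.

```python
def cost_per_watt(plant_cost_dollars: float, output_megawatts: float) -> float:
    """Plant cost divided by power output, in $/W."""
    return plant_cost_dollars / (output_megawatts * 1e6)


low = cost_per_watt(0.7e9, 150)    # low end of the study's range
high = cost_per_watt(1.9e9, 150)   # high end of the study's range
print(f"${low:.1f}/W to ${high:.1f}/W")   # prints "$4.7/W to $12.7/W"
```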

For comparison, the ARC design study found a cost of $5.3 billion for a reactor (not the full plant) that would produce 250 MW — about $20/W. They calculated $4.6 billion for the magnet support structure alone — this seems like an overestimate to me. They applied a factor of 100x in cost for that component compared to the cost of the raw steel. They estimated a cost of around $400 million for the raw materials, which is around $1.4/W.

I’m not aware of many recent cost estimates on fusion power plants. If you know of more, please send them my way.

Doing the impossible

All things which are proved to be impossible must obviously rest on some assumptions, and when one or more of these assumptions are not true then the impossibility proof fails—but the expert seldom remembers to carefully inspect the assumptions before making their “impossible” statements.

Richard Hamming, You and Your Research (1986), p182

We’re all familiar with the many dignitaries, including Kelvin and Edison, who declared heavier-than-air flying machines were impossible just a few years prior to the Wright brothers’ success at Kitty Hawk. In the field of nuclear fusion, there’s been a tendency recently to make references or analogies to Kitty Hawk. The question arises: is fusion at a Wright brothers moment, or a Da Vinci moment? (Da Vinci’s ideas for flight were ahead of their time — the Wright brothers’ were punctual.)

Rather than speculating on the future, here are a few examples of failed impossibility proofs related to nuclear fusion. I’ll tie it all together at the end with some further thoughts about the value of impossibility proofs as pointers to solutions.

Although it may not seem like it, the levitron magnetic top is quite relevant. The levitron consists of a small spinning ceramic magnet hovering above a larger one. There are no strings, no superconductors, and no electronics involved. According to Earnshaw’s theorem, static levitation using only permanent magnets is impossible. The inventor of the levitron, Roy Harrigan, was told by physicists that what he was trying to do was impossible, just another perpetual motion machine. Harrigan persisted, and eventually perfected the device. This is remarkable, because the levitron requires careful tuning; a great deal of patience must have been required. Once Harrigan demonstrated his invention, physicists were able to explain it: Earnshaw’s theorem is correct, but it only applies to static magnets, and the levitron is actually in an orbit. (Source: Spin stabilized magnetic levitation, by Simon, Heflinger, & Ridgway, Am. J. Phys. 65, 4, 1997)

There’s a close analogy between the levitron and the field-reversed configuration (FRC) concept for nuclear fusion. Both the FRC and the levitron can be thought of as magnetic dipoles oriented the ‘wrong way’ in an external magnetic field. If it weren’t for rotation, the dipole would flip end-over-end to align with the external field. (In the FRC, the rotation in question is in the form of individual particles with orbits that are roughly the same diameter as the plasma itself.) The FRC has several advantages over the tokamak. For instance, for a given external magnetic field strength, the FRC can support approximately 10 times the plasma pressure, leading to 100 times the fusion power density. TAE Technologies (my current employer) is pursuing this concept, as are others (such as Helion Energy and Compact Fusion Systems).
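The step from 10 times the pressure to 100 times the power density comes from the usual scaling of fusion power with density squared. The sketch below assumes the two plasmas are compared at the same temperature, with equal electron and ion temperatures (assumptions not stated above). Here n is the plasma density, T the temperature, p the plasma pressure, ⟨σv⟩ the fusion reactivity, and P_fus the fusion power density.

```latex
p = 2 n k_B T, \qquad
P_{\mathrm{fus}} \propto n^2 \langle \sigma v \rangle(T)
\quad\Longrightarrow\quad
P_{\mathrm{fus}} \propto \frac{p^2}{T^2}\, \langle \sigma v \rangle(T)
```

At fixed T the power density goes as p², so a factor of 10 in pressure gives a factor of 100 in power density.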

Here’s another example where a static plasma is unstable, but a moving one is stabilized: the shear-flow stabilized Z-pinch, pursued by Zap Energy. The Z-pinch is literally the textbook example of an unstable plasma configuration, because it is very simple to analyze, and it was one of the first attempted configurations for a nuclear fusion reactor. It has the advantage of not requiring external magnetic field coils — the magnetic field is generated by a current flowing through the plasma itself. The caveat is that the plasma must be in direct contact with electrodes at both ends which create the current — this is bad news for both the plasma and the electrodes! (Side note / self-promotion: the physics of the shear-flow Z-pinch is very similar to a phenomenon I discovered – flow stabilization of a cylindrical soap bubble. You can try this one at home.)

The violation of seemingly-innocuous assumptions is what links these examples. Earnshaw’s theorem assumes static levitation. The theorem is true — it just doesn’t apply to the levitron. Similarly, the ‘proof’ of instability of the FRC assumes that particle orbits are very small — a good assumption in other magnetic confinement configurations, but not for the lower magnetic fields involved in the FRC approach. In the case of the Z-pinch, the static plasma assumption seemed reasonable — the stabilizing effect of sheared flow was not appreciated for many years, so static plasma assumptions appeared to simply make the math easier.

…what is proved, by impossibility proofs, is lack of imagination.

J. S. Bell, On the Impossible Pilot Wave, Ref.TH.3315-CERN (1982)

Can we turn Hamming’s insight into a recipe for achieving the impossible? Bell’s quip gets to the heart of the problem: recognizing which assumptions to break, and in what manner, requires creativity, imagination, and perhaps intuition and luck as well. Not every assumption is critical, in that violating it will yield a solution. Also, there may be many ways in which an assumption could be violated or modified — not all of them productive. The productive paths are not always obvious, either. There’s also the question of which assumptions can be violated, practically-speaking.

Impossibility proofs are not without value — they rule out certain areas of search space, which can save time and effort exploring fruitless avenues. Assumptions simplify reasoning by removing variables — we can’t always work from first principles. Removing assumptions makes reasoning and calculations more difficult, so it’s important to be selective about which ones to remove. A useful impossibility proof is one whose assumptions are few, explicit, and necessary for the conclusion (that is, without the assumption a solution would become possible).

Returning to fusion energy: Lawrence Lidsky, former director of the MIT fusion research program, spelled out an ‘impossibility proof’ for economically-competitive fusion energy in 1983. In a nutshell, Lidsky argues that fusion devices using the conventional deuterium-tritium fuel won’t be economically competitive with nuclear fission. For both fusion and fission, the dominant contribution to the cost of electricity is the cost of capital; the fuel costs are small, due to the enormous energy density of the fuel. On the other hand, fusion reactors will have lower power density (by 10 or 100 times) compared to fission, so they will be proportionately larger than an equivalent fission reactor core. Fusion reactors are also a more complicated technology. Fusion reactions also cause more neutron damage to solid materials, requiring frequent maintenance. Therefore, fusion will be more expensive than fission, which is already not very economically competitive.

Lidsky’s critique of fusion has a large number of assumptions, some of them implicit, and not all of them necessary — in other words, it could be improved. I’m working on a series of posts addressing his assumptions, to see what can be learned from the arguments. My next post will be a look at contemporary energy economics, followed by consideration of how to achieve high power density in a fusion reactor.

Antifragile software?

Software systems are often fragile. One of the causes of software rot is changes to dependencies. In theory, updated versions of dependencies should bring improvements to the systems that use them. In practice, the result is often to introduce bugs.

The Unison programming language seeks to address the robustness of software by freezing definitions.

Unison definitions are identified by content. Each Unison definition is some syntax tree, and by hashing this tree in a way that incorporates the hashes of all that definition’s dependencies, we obtain the Unison hash which uniquely identifies that definition. This is the basis for some serious improvements to the programmer experience: it eliminates builds and most dependency conflicts, allows for easy dynamic deployment of code, typed durable storage, and lots more.

When taken to its logical endpoint, this idea of content-addressed code has some striking implications. Consider this: if definitions are identified by their content, there’s no such thing as changing a definition, only introducing new definitions. That’s interesting. What may change is how definitions are mapped to human-friendly names.

A tour of Unison
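To make the idea in the quote concrete, here is a toy sketch of content-addressed definitions. It is not Unison’s actual hashing algorithm: the `Definition` class, the `$0` placeholder convention for dependencies, and the `names` table are all hypothetical stand-ins. The point is only that the hash depends on the body and on the hashes of the dependencies, never on human-friendly names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    """Toy definition: a body (stand-in for a syntax tree) plus its dependencies."""
    body: str            # dependencies are referenced positionally as $0, $1, ...
    deps: tuple = ()     # the Definitions that the placeholders refer to


def content_hash(defn: Definition) -> str:
    """Hash the body together with the hashes of all dependencies."""
    h = hashlib.sha256()
    h.update(defn.body.encode())
    for dep in defn.deps:
        # Only the dependency's hash is mixed in, not any name it may have.
        h.update(content_hash(dep).encode())
    return h.hexdigest()[:12]


sum_sq = Definition("sum(x * x for x in xs)")
rms = Definition("sqrt($0(xs) / len(xs))", deps=(sum_sq,))

# Human-friendly names live in a separate table that maps name -> hash.
names = {"sum_sq": content_hash(sum_sq), "rms": content_hash(rms)}
hash_before = content_hash(rms)

# Renaming touches only the name table, so no definition changes.
names["sum_of_squares"] = names.pop("sum_sq")
assert content_hash(rms) == hash_before

# "Changing" a definition really introduces a new one, and every
# dependent definition gets a new hash as well.
sum_sq_v2 = Definition("sum(x ** 2 for x in xs)")
rms_v2 = Definition("sqrt($0(xs) / len(xs))", deps=(sum_sq_v2,))
print(content_hash(rms) != content_hash(rms_v2))   # prints True
```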

What if it were possible to go beyond code that is merely robust to changes in the external environment, to code that improves monotonically as the environment changes? Can we have software that is antifragile? I think so. My idea is to hybridize the dispatch mechanism of Julia with a declarative language, taking some cues from Unison and automated/interactive theorem provers.

Julia’s dispatch system allows automatic use of more specific methods that are more computationally-efficient for particular problems — there’s potential here for software that grows better with time. The shortcoming of Julia’s approach: it’s up to subsequent developers to ensure semantic coherence of the set of methods that share a function name.
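For readers who haven’t used Julia, here is a rough illustration of the idea in Python, using the standard library’s `functools.singledispatch`. This is only an analogy: `singledispatch` looks at the type of the first argument alone, whereas Julia dispatches on all arguments. The function `total` is a made-up example.

```python
from functools import singledispatch


@singledispatch
def total(xs):
    """Generic fallback: works for any iterable of numbers, in O(n) time."""
    result = 0
    for x in xs:
        result += x
    return result


@total.register
def _(xs: range):
    """Specialised method: a range is an arithmetic series, so use the O(1) closed form."""
    n = len(xs)
    if n == 0:
        return 0
    return n * (xs[0] + xs[-1]) // 2


print(total([1, 2, 3, 4]))          # uses the generic method
print(total(range(1, 1_000_001)))   # uses the specialised method
```

Callers of `total` get the faster method automatically once it is registered. Note that nothing checks that the specialised method computes the same thing as the generic one, which is exactly the shortcoming described above.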

In order to ensure semantic correctness, functions must be identified by declarative specifications, like a Hoare triple. If this is done, then automatic substitution of methods can take place without causing errors. User-friendly names in the code are mapped to the function (declarative specification) using something like Unison’s hashing method; I call this hash table the ‘catalogue.’ Dispatch is generalized: the dispatcher receives the specifications and the tuple of argument types. It then selects the method (AST) that implements the function (specification) for the specific types. There would be a ‘library’ of available methods, something like the ‘formal digital library’ of Nuprl or other theorem-provers. The library would include proofs (or some type of certificate) showing that methods obey the specifications associated to them. Annotations such as computational complexity could be added, to help the dispatcher select between various implementations according to performance.
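Here is a very rough sketch of how the catalogue, library, and dispatcher might fit together. Everything in it is hypothetical: specifications are plain strings rather than checked pre/post-conditions, the certificate is just a label, and `cost_rank` is a crude stand-in for a complexity annotation.

```python
import bisect
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Spec:
    """A declarative specification, written here as pre/post-condition strings."""
    pre: str
    post: str

    def key(self) -> str:
        # Content-derived identifier for the specification.
        text = f"pre:{self.pre}|post:{self.post}"
        return hashlib.sha256(text.encode()).hexdigest()[:12]


@dataclass(frozen=True)
class Method:
    """One implementation of a specification for particular argument types."""
    arg_types: Tuple[type, ...]
    impl: Callable
    cost_rank: int       # lower is better (stand-in for a complexity annotation)
    certificate: str     # "proof" or "IOU"


catalogue: Dict[str, str] = {}          # user-friendly name -> spec key
library: Dict[str, List[Method]] = {}   # spec key -> available methods


def register(name: str, spec: Spec, method: Method) -> None:
    catalogue[name] = spec.key()
    library.setdefault(spec.key(), []).append(method)


def select(name: str, *args) -> Method:
    """Pick the cheapest method whose declared types match the arguments."""
    candidates = [
        m for m in library[catalogue[name]]
        if len(m.arg_types) == len(args)
        and all(isinstance(a, t) for a, t in zip(args, m.arg_types))
    ]
    if not candidates:
        raise LookupError(f"no method for {name!r} with these argument types")
    return min(candidates, key=lambda m: m.cost_rank)


def dispatch(name: str, *args):
    return select(name, *args).impl(*args)


def linear_contains(xs, x):
    return x in xs


def binary_contains(xs, x):
    i = bisect.bisect_left(xs, x)
    return i < len(xs) and xs[i] == x


contains = Spec(pre="xs is sorted ascending", post="result == (x in xs)")
register("contains", contains, Method((list, int), linear_contains, 2, "IOU"))
first_choice = select("contains", [1, 3, 5], 3).impl.__name__

# Later, someone adds a faster method for the same specification.
register("contains", contains, Method((list, int), binary_contains, 1, "IOU"))
second_choice = select("contains", [1, 3, 5], 3).impl.__name__

print(first_choice, "->", second_choice)
print(dispatch("contains", [1, 3, 5], 4))
```

The calling code only ever says `dispatch("contains", ...)`, so it picks up the binary search without being modified.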

You could program in either declarative or imperative style, when appropriate. In declarative style, you define a function using pre/post-conditions. If your specification matches an existing one (perhaps even partially), you could get a prompt with the name of the function(s). If not, you can assign a new function definition to the catalogue.

If the function is pre-existing, there are probably pre-existing methods also. In the case where there are no methods in the library matching a function definition & type signature, you could: revert to imperative programming and write the specific method, or invoke an automated (or interactive) implementation tool. The latter is made easier because the specifications of all existing functions are available.

The imperative approach proceeds as usual, but what happens at the end is different. Rather than the resulting AST being mapped directly to the user-friendly name, it is entered into the library along with the specifications it obeys and the type signature. (Also, either a proof or an ‘IOU’ stating that the new method satisfies the pre/post-conditions must be added to the library.) Then, in order to invoke the new method, a user-friendly name is associated to the function specifications in the catalogue. You then invoke the function by name, which refers to the specification, and then the dispatcher identifies the new implementation.

In theory, you never have to worry about regressions — it doesn’t matter if the dispatcher selects a different implementation (method) from the library in the future, because the specification ensures that the new method does the same thing as the old one. In fact, it’s likely that your code will improve in performance over time without you modifying it, because if someone comes up with a better implementation for your particular use case, then it will get used automatically. This is the real potential for ‘antifragile’ code.

In practice, this could go off the rails in two ways. First of all, the specifications might be incomplete. This would allow the dispatcher to substitute a non-equivalent method. If this occurs, it can be remedied by refining the specification such that it discriminates between the ‘old’ method that was giving the correct behavior, and any ‘new’ ones that do not. This provides a mechanism to progressively formalize your code — it’s not necessary to start out with a full formal specification. You can start out with something that works, written imperatively, and discover the specification as you go. This bypasses one of the major hurdles for adoption of formal methods for programming. Note that the refinement of the definition happens by the Unison mechanism, so that there are no unintended side-effects.
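A small sketch of what refining an incomplete specification might look like. The post-conditions here are ordinary Python predicates checked on a few sample inputs, which is only a stand-in for a real proof (closer to an ‘IOU’ than a certificate). All of the names are made up for illustration.

```python
from collections import Counter


def is_sorted(_inp, out):
    """Post-condition: the output is in ascending order."""
    return all(a <= b for a, b in zip(out, out[1:]))


def is_permutation(inp, out):
    """Post-condition: the output contains exactly the input's elements."""
    return Counter(inp) == Counter(out)


def satisfies(method, posts, samples):
    """Check a method against every post-condition on every sample input."""
    return all(post(s, method(list(s))) for s in samples for post in posts)


def old_sort(xs):      # the 'old' method that gave the correct behaviour
    return sorted(xs)


def bogus_sort(xs):    # a 'new' method that slips through the incomplete spec
    return []


samples = [[3, 1, 2], [5, 5, 1], []]

incomplete_spec = [is_sorted]                  # forgot to say what gets sorted
refined_spec = [is_sorted, is_permutation]     # now discriminates old from new

for label, spec in (("incomplete", incomplete_spec), ("refined", refined_spec)):
    accepted = [m.__name__ for m in (old_sort, bogus_sort)
                if satisfies(m, spec, samples)]
    print(label, accepted)
```

With the incomplete specification both methods are accepted; after the refinement only `old_sort` remains eligible.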

A second way regressions could occur: a library method doesn’t obey the specifications it claims to. This can happen if a certificate is accepted in place of a proof; demanding a proof eliminates this possibility. Allowing certificates is a compromise to ease adoption of the language, since the proof (or a disproof!) can be supplied later. Proofs could exist at varying degrees of abstraction.

Other interesting side-effects of this programming system include the following. First, there is the possibility for automatic code ‘reuse.’ Imagine that you wrote a method to calculate the standard deviation of a set of numbers. You might have defined it directly in terms of summing over the squares of the deviations from the mean, normalizing by the cardinality of the set, and subsequently taking the square root. Later, you might define the ‘sum of squares’ function, calling it ‘L2_norm’ for instance. If you return to the definition of the standard deviation, you could ‘refresh’ the representation of the definition, and the expression ‘L2_norm’ would appear in place of the longhand form. The method would have the same hash, because the AST would remain the same; the shorthand representation would be the only change.

Other benefits: program synthesis would become easier. The availability of the pre/post-conditions in the library of methods provides a synthesizer with a formalized semantics to grab onto. Human programmers could benefit as well from the ability to look at the list of properties satisfied by functions. Interactive tools like ‘semantic autocomplete’ for functions or methods might be possible.