Risk & innovation in science

Peter Shor made a remark on Sabine Hossenfelder’s blog about groupthink in physics:

It’s not just that scientists don’t want to move their butts, although that’s undoubtedly part of it. It’s also that they can’t. In today’s university funding system, you need grants (well, maybe you don’t truly need them once you have tenure, but they’re very nice to have).

So who decides which people get the grants? It’s their peers, who are all working on exactly the same things that everybody is working on. And if you submit a proposal that says “I’m going to go off and work on this crazy idea, and maybe there’s a one in a thousand chance that I’ll discover some of the secrets of the universe, and a 99.9% chance that I’ll come up with bubkes,” you get turned down.

But if a thousand really smart people did this, maybe we’d actually have a chance of making some progress. (Assuming they really did have promising crazy ideas, and weren’t abusing the system. Of course, what would actually happen is that the new system would be abused and we wouldn’t be any better off than we are now.)

So the only advice I have is that more physicists need to not worry about grants, and go hide in their attics and work on new and crazy theories, the way Andrew Wiles worked on Fermat’s Last Theorem.

He added:

Let me make an addendum to my previous comment, that I was too modest to put into it. This is roughly how I discovered the quantum factoring algorithm. I didn’t tell anybody I was working on it until I had figured it out. And although it didn’t take years of solitary toil in my attic (the way that Fermat’s Last Theorem did), I thought about it on and off for maybe a year, and worked on it moderately hard for a month or two when I saw that it actually might work.

So, people, go hide in your attics!

It’s true that many great innovations came about from working ‘in the attic’: Einstein working as a patent clerk, Shannon developing information theory in secret at Bell Labs, J. S. Bell proving his theorem on sabbatical, Shor’s work, and many more. While this may be the best move for an individual scientist given the current system, it is a suboptimal solution from a societal perspective — and we should not take the status quo as a boundary condition! Here’s my initial response:

Like you, I’d lay the blame on the poor quality of science management we have, not the scientists. The problem is the evidently risk-averse strategy being pursued, squashing innovation, combined with an unwillingness to take an appropriate amount of responsibility in the decision-making process on the direction of research. The solution is to replace some of the present set of bureaucrats with people (such as venture capitalists) who have the experience and temperament to manage high-risk, high-reward endeavors (which is exactly what science is). Doing bootleg research may be the best strategy for individual scientists to pursue innovation given the current climate, but we need to treat the root of the problem, which is the current climate preventing innovation.

My inspiration came from the following article: The ‘feel-good’ horror of late-stage capitalism. Here’s the gist:

In the feel-good feel-bad story, irrefutable proof of an institutional failure is sold as a celebration of individual triumph. And it’s the desperate, cloying attempts to trumpet the latter as a means of obscuring the former that gives these pieces their distinct, acrid aftertaste.

We don’t need higher wages; just have an amazing CEO give you his car! Who cares if you can’t support a family on one job? The fix is simple: Get two more jobs!

Shor’s remarks constitute the same refrain, transposed to a different key.  Our society is pushing risk onto individuals, when we should be socializing it, individual success stories to the contrary notwithstanding. We could benefit from a more risk-tolerant approach to science management, to supplement the more ‘business-as-usual’ approach.

I also take issue with Shor’s straw-man version of an alternative to the present system of grant review.  Presently the fox is guarding the hen-house, but the solution isn’t to just fling the doors of the coop wide open.  Instead, we need independent review from outside physics. Yes, this implies reviewers who aren’t likely to fully grasp the theory, which does create an information asymmetry and the potential for abuse — so it’s important to perform some due diligence.

However, there will be dead-ends no matter what. Even the brightest of us may disagree about whether an approach will pan out — it’s research after all! The cost of funding a few wacky ideas along with one breakthrough may be worth it compared to the present approach of funding relatively staid approaches that almost certainly won’t result in a breakthrough.

I’m also concerned because the plausibility of claims of groupthink dovetails with the desires of ‘climate skeptics’ who would like to portray the scientific consensus on climate change as merely a result of a liberal echo chamber.  To be frank, it’s not too difficult for me to imagine how that could happen, although I think in the case of climate change the evidence is really there.

Jason Gorman on managing software complexity

Don’t Succumb To “Facebook Envy”. Solve The Problem In Front Of You
by Jason Gorman, Dec. 1, 2017
Gorman says the key to future-proofing code is not to anticipate what you may want to do with it in the future and build in parts for just that purpose, but to make your code easy to change. This makes a ton of sense: you can’t tell in what direction you will need to change the code in the future; otherwise you would just write the code that way! So don’t build speculatively in some particular direction: just follow general principles that make your code easy to change.

Part of that means making it easy to understand, and reducing the amount of connascence that links all the parts together in a fragile way. Good code + good coders = adaptability.  That’s better than trying to make the code ‘robust’ from the start by designing it to do all the things out of the box. Don’t add features unless there is a known need. Once something gets into the code, it’s hard to eliminate: people complain, and dependencies get built on top of it. So write as little as possible from the get-go.
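Connascence can be made concrete with a toy example (the function and field names here are invented for illustration): the positional version couples every caller to an argument order, while the named version leaves room to change.

```python
from dataclasses import dataclass

# Connascence of position: every caller must agree on the argument order.
# Reordering (amount, rate) silently breaks all call sites.
def charge_positional(amount, rate, rounding):
    return round(amount * (1 + rate), rounding)

# Weaker connascence (of name): callers depend only on field names, so
# adding fields or reordering them does not break existing call sites.
@dataclass
class Charge:
    amount: float
    rate: float
    rounding: int = 2

def charge(c: Charge):
    return round(c.amount * (1 + c.rate), c.rounding)

print(charge_positional(100.0, 0.2, 2))        # 120.0
print(charge(Charge(amount=100.0, rate=0.2)))  # 120.0
```

The behavior is identical; what differs is how much each caller has to know, and therefore how cheaply the signature can change later.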

Other good points that Gorman has: My Solution To The Dev Skills Crisis: Much Smaller Teams

The way to deal with the complexity of code is to break the functionality into appropriately-sized chunks with weak interactions through well-defined, limited interfaces. Each chunk needs to be small enough that a single developer/pair can comprehend and build it. Breaking the chunks down too far and distributing them over many people increases the cost of coordinating the people — this is like trying to get 9 women to have 1 baby in 1 month.

On the other hand, if you assign too large a chunk to one person or set of people, the complexity will be too great to comprehend, and your developers will get bogged down. Adding new people to speed things up will not work, because they will get confused and make mistakes. The key point is that the boundaries between chunks need to be aligned with the domain of responsibility of a sufficiently cohesive group of developers (probably no more than two). If you have too many people on a chunk, you effectively start to blur responsibility for changes. This gets really bad with a large chunk, because people then need to understand the changes made in other parts of the chunk (they are all interlinked due to the lack of interface boundaries), and they have to comprehend a larger system to make their own changes.

Not Gorman’s work, but the ‘mythical man-month’ is evidently the assumption that the rate of progress scales linearly with the number of people working on a chunk of code, and that the effort required scales linearly with the size of the code. This is obviously false.
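A back-of-the-envelope sketch of why the linear assumption fails: if each pair of developers on a chunk carries some coordination cost, throughput stops scaling linearly and eventually turns over. The numbers below are purely illustrative.

```python
def effective_rate(n, unit_rate=1.0, pair_overhead=0.05):
    """Toy Brooks-style model: n developers produce n*unit_rate of work,
    minus a coordination cost for each of the n*(n-1)/2 communication
    pairs.  (The coefficients are illustrative, not empirical.)"""
    return n * unit_rate - pair_overhead * n * (n - 1) / 2

for n in (1, 5, 10, 20, 25):
    print(n, round(effective_rate(n), 2))
```

With these made-up coefficients, output peaks around twenty developers and then declines: adding the twenty-fifth person makes the team slower than the twentieth did.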

What Do I Think of “Scaled Agile”?

Basically, Gorman is debunking a bunch of fads in software development management. This post touches on the subject of knowledge beyond the human scale, and also perverse incentives.

Methods like SAFe, LeSS and DAD are attempts to exert top-down control on highly complex adaptive organisations. As such, in my opinion and in the examples I’ve witnessed, they – at best – create the illusion of control. And illusions of control aren’t to be sniffed at. They’ve been keeping the management consulting industry in clover for decades.
The promise of scaled agile lies in telling managers what they want to hear: you can have greater control. You can have greater predictability. You can achieve economies of scale. Acknowledging the real risks puts you at a disadvantage when you’re bidding for business.  That is, there is money to be made helping people stay in denial about unpredictability. Well, that’s nothing new: look at religion.


Iterating is THE Requirements Discipline

When we iterate our designs faster, testing our theories about what will work in shorter feedback loops, we converge on a working solution sooner. We learn our way to Building The Right Thing. … So ask your requirements analyst or product owner this question: ‘What’s your plan for testing these theories?’ I’ll wager a shiny penny they haven’t got one.

Another idea I’m getting from Gorman’s blog: the idea that user requirements are dumb. If you want to intelligently solve the user’s problem, you can’t expect them to explain it to you like you are a computer yourself — precisely, that is. You’ve got to grasp the concept, put yourself in their shoes. Because humans can’t communicate the way machines do: they communicate by inference, not by specification, due to bandwidth limits. In principle, you could use a ‘POV gun’ to inject someone with your perspective, but we aren’t there yet.

This is exactly the same problem as trying to teach AI how to solve problems in a way that humans would find acceptable: it’s going to fail unless the AI figures out how to read human minds via inference, because human communication just isn’t up to the task of transmitting that kind of information. Making your code easy to change is mandatory because you will develop it iteratively rather than monolithically in one go: you need to test hypotheses about users’ requirements experimentally, which means implementing a thing in order to see whether it’s what they wanted. Until they have an implementation in front of them, you can’t get at all the requirements.  Evidence-based business.

Software Craftsmanship is a Requirements Discipline

Try as we might to build the right thing first time, by far the most valuable thing we can do for our customers is allow them to change their minds. Iterating is the ultimate requirements discipline. So much value lies in empirical feedback, as opposed to the untested hypotheses of requirements specifications.

Crafting code to minimise barriers to change helps us keep feedback cycles short, which maximises customer learning. And it helps us to maintain the pace of innovation for longer, effectively giving the customer more “throws of the dice” at the same price before the game is over.

It just so happens that things that make code harder to change also tend to make it less reliable (easier to break) – code that’s harder to understand, code that’s more complex, code that’s full of duplication, code that’s highly interdependent, code that can’t be re-tested quickly and cheaply, etc.

And it just so happens that writing code that’s easy to change – to a point (that most teams never reach) – is also typically quicker and cheaper.


Understanding and beyond

This is another ‘compendium’ post… mostly just musings and interesting articles I found on the topic of what happens when we deal with scientific knowledge that is too complex for any one person to entirely comprehend. Is it really knowledge, then?

Considering the ‘science market’: http://michaelnielsen.org/blog/science-beyond-individual-understanding/
The market can be ‘wrong’ (i.e., bubbles), which Nielsen compares to similar effects in the scientific community (fads in research).  Also, unlike central planning, the market doesn’t require any one actor to have the task of knowing the supply and the demand and making them meet; the market price, to the extent that it exists, serves as a kind of force balance in an active optimization problem. The price is a nonlocal variable that balances everything. Interesting to think about.

Here’s an example of something similar in computer science: formal verification of a (minimal!) operating-system kernel.  Very cool stuff.  Basically, they used a semi-automated, interactive theorem prover to demonstrate that the kernel does what it is supposed to, and is also secure.  (Lots of caveats, of course, about assumptions, but still really impressive.  Also, an astonishing figure: the typical cost to check mission-critical code is $1000/line.  Software seems easier than it actually is.)  They make the point that a kernel, when considered as a mathematical object, is orders of magnitude more complex than the structures that mathematicians like to study.

What do we mean by physics ‘intuition’ and ‘insight’?
This article makes me think about the mind poised in a metastable state in the free energy landscape of the brain, until some thermal jitter lobs you over the threshold and down into a new, better minimum. I wonder if this can be seen in jumps in training performance in neural networks?

As for sudden recognition of a pattern that had not been identified before, it indicates a relaxation process where something that was highly surprising becomes unsurprising *given the identification of the new pattern as a pattern (as such).* The idea is that you can shorten the description length by saying this is an instance of pattern X, with (y,z) parameters. In other words, a pattern is an identifier (state) of some new state space (which, at a minimum, distinguishes between things that are representatives of this pattern and things that are not). Then, the surprisal of the world (when it is in fact instantiating the pattern) given that this variable takes on the value is_pattern_instance=True, is much smaller than the unconditional surprisal, which means the variable is_pattern_instance is highly informative about whether the world is in this pattern.

This would explain away the surprisal of the world when it forms the pattern. At the meta-level, this explains explicit coding: whatever variable takes it upon itself to become the carrier/embodiment of the information element is_pattern_instance needs to be more-or-less coherent & integrated. You couldn’t have such a condensation/reconnection if the input data was partitioned and sent to two or more non-interacting subsystems.
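The shortened-description-length idea can be sketched numerically. The encoding and its one-bit header below are made up for illustration; the point is just that naming the pattern collapses the surprisal of data that instantiates it.

```python
from math import log2

def raw_cost(bits):
    # Without a recognized pattern, every bit is maximally surprising:
    # surprisal = -log2(1/2) = 1 bit per symbol.
    return len(bits) * 1.0

def pattern_cost(bits, block):
    # If the data is k repetitions of `block`, describe it as
    # (is_pattern_instance flag, block, repeat count) instead of
    # listing every bit.  The 1-bit header is an illustrative choice.
    k = len(bits) // len(block)
    assert block * k == bits, "not an instance of this pattern"
    header = 1.0
    return header + len(block) + log2(k)

data = "0110" * 8  # 32 bits that instantiate a simple repetition pattern
print(raw_cost(data))              # 32.0
print(pattern_cost(data, "0110"))  # 1 + 4 + log2(8) = 8.0
```

Conditioned on the pattern variable, the same data costs 8 bits instead of 32: the world stops being surprising once the pattern is identified as such.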

Seems like experiences of unconscious/subconscious processing are direct experiences of neural networks at work. Intuitive hunches are predictions that may or may not be correct, but they are fast because they are neural-net approximations. Developing intuition means training your neural network on things it isn’t used to. Understanding means going beyond intuition to use your explicit reasoning/language facilities to step through the inference process, which can be more accurate and flexible. We might be on the verge of understanding what it means to think.

Modeling and analogy: seeing a phenomenon as like an instance of a pattern can lead to trying to reason about it (or just having expectations about it) based on the behavioral properties of the pattern. Perhaps there is an expansion of the pattern: “It’s like a field, but without commutativity!” or “It’s like a set, but with a definite order!” Either a generalization, or specialization.

Is the goal of science understanding (having the ability to simulate or mentally manipulate) or modeling (having a tool that can produce the answer)? What happens when, for instance with chaos theory, you get into territory where you have to choose? You use the tool to produce answers, and then you start developing an intuition about the results of the tool, and try to see patterns. Similarly with QM or stat mech. At some point, you may develop new intuitions.  Whether you have understanding or not, I’m not sure.

I would say that human understanding/knowledge has been the goal of science, because there have not been any other intelligent systems capable of processing information. Modeling capability and human understanding have been degenerate up to the invention of computers.  Knowledge consists of two parts: information in a specific form, combined with an intelligent system that is capable of reasoning with the information in that form.  An appropriate automated reasoning system could then be said to possess knowledge, to a limited degree.  I suspect that in the future, pragmatism will prevail: why should we discriminate against non-human knowledge?  Especially if it gets to be as useful as, or more useful than, human knowledge.  Two apt quotations along these lines:

Since computing has become cheaper than thinking, the reader should not be afraid to use a simple tool and pound the problem into submission.

— John F. Monahan, Numerical Methods of Statistics


The question of whether Machines Can Think… is about as relevant as the question of whether Submarines Can Swim.

— E. W. Dijkstra (1984) The threats to computing science

Here’s another good article along these lines that summarizes some of the points that had been flopping around in my head, and also makes a connection to the necessary evil of education in science: “The End of Science”: Can We Overcome Cognitive Limitations?

The entropy fallacy

Focus: Why We Can’t Remember the Future
Philip Ball, May 2, 2014:  Physics 7, 47

The author contemplates a canonical thought experiment: a gas is isolated in one of two chambers connected by a narrow channel with a rotor/turnstile, which is initially held fixed. The other chamber is evacuated. The claim is that entropy increases when the rotor is allowed to turn so that particles can move from one chamber to the other. Entropy does increase, but the illustration is totally wrong:

To see this, recall that the molecules started mostly in the left-hand chamber and are gradually equalizing their numbers on both sides of the rotor. Imagine “running the movie backward” (according to Newtonian equations) from the future reference time to the readout time and seeing the molecules collectively move back toward the left-hand chamber. That extremely improbable event can only occur from one very specific arrangement of the molecules at the future time. If, before running time backward, you made any small changes, say, in the molecules’ positions, new collisions would occur during the time reversal that would rapidly set the system on a completely different course. The molecules would take the much more probable path of equalizing the populations and would not get close to the original state of the system at the readout time.

This is wrong. There isn’t just one possible configuration that, when run forward, would result in all the particles being shoved into one half of the device.  For every possible configuration with all the atoms in one half of the device, if there is a forward track that achieves approximate equipartition, then there is a path backwards from that supposedly ‘equilibrium’ state to a very non-equilibrium state. By Liouville’s theorem, the phase-space densities of the two sets of states are the same.

The odds of finding the gas in one half to start with are just as low, a priori, as the odds of finding the gas in some state that is eventually going to spontaneously move into the left chamber — but these states are instantaneously indistinguishable from ‘random’ states. The reason it doesn’t seem that way intuitively is that we do have ways of sweeping all the gas from one side to the other (a piston), but we don’t have methods of setting up the positions and velocities so that the particles will all propagate from equipartition toward concentration on one side, or of measuring whether such a state has been handed to us. So it’s really a bias in our way of thinking, based on limitations in measuring and manipulating. We can produce certain kinds of highly (a priori) improbable states easily, but not others. So we don’t see that both are equally improbable, as measured by the fraction of phase space each situation occupies.
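To put one number on ‘equally improbable’: the fraction of configuration space with all N particles on one side is (1/2)^N, which is also where the familiar free-expansion entropy N k ln 2 comes from. A trivial sketch:

```python
from math import log

# Fraction of configuration space with all N ideal-gas particles in the
# left half of the box: (1/2)**N, computed in log form to avoid underflow.
def log_fraction_one_side(n_particles):
    return n_particles * log(0.5)

# The corresponding entropy increase of free expansion into both halves:
# delta-S = N * k_B * ln 2 (here in units of k_B).
def entropy_of_expansion(n_particles, k_B=1.0):
    return n_particles * k_B * log(2)

n = 100
print(entropy_of_expansion(n))   # ~69.3 k_B for a mere 100 particles
```

For a macroscopic N of order 10^23, the log-probability is so negative that ‘never happens spontaneously’ is an excellent summary, even though each individual microstate in the equilibrium-looking set is exactly as improbable.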

On a related note: suppose I send you a list of coordinates and velocities for the particles. It appears random. Are they random? Answer yes or no. Then I tell you that if you run the simulation forward for 10 seconds, all the particles wind up in the left half of the device. Are the values I sent you random? Answer the question again. The fraction of phase space occupied doesn’t change.  The configuration hasn’t changed, but your perception of it as random has been erased by the new information I gave you — the perception of randomness is subjective, because we prefer macroscopic patterns to microscopic ones.

To return to the correct explanation for the entropy increase, I’ll refer to a previous post: Statistical and quantum mechanics from information theory.  The entropy that is created is not microscopic entropy (which is neither created nor destroyed, because volume is preserved in phase space).  The information isn’t destroyed: you still know that the system is in a very particular subset of states — those that, if propagated backward, return to having all the gas in one chamber. The information just hides.  It gets converted from macroscopic degrees of freedom to microscopic ones.  We can’t use it unless we can compute the trajectories and then make predictions for the microscopic state, or reverse time.

There are a couple of interesting counterexamples to this general behavior: systems that appear to ‘randomize’ but then spontaneously return to an organized state.  One example is plasma wave echoes, where a damped-out wave reappears later, because the information was still concealed coherently in the motion of particles in phase space.  Another is the ‘discrete cat map’: a method for scrambling images that eventually unscrambles the image again, if you apply it enough times.   (I couldn’t very well not mention this at some point, given the name!) Basically, these systems live in a restricted space that eventually returns them to their initial configuration — there are extra conservation properties that are not obvious at first glance.  These systems are the ‘exceptions that prove the rule.’
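For the cat map, the recurrence is easy to demonstrate directly. This sketch uses the (2x+y, x+y) mod N variant of Arnold’s map on a small grid and iterates until the original configuration reappears:

```python
# Arnold's "cat map" on an N x N grid: (x, y) -> (2x + y, x + y) mod N.
# It scrambles an image thoroughly, yet because it is a bijection on a
# finite set of cells, iterating it must eventually restore the original.
def cat_map(grid):
    n = len(grid)
    out = [[None] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            out[(2 * x + y) % n][(x + y) % n] = grid[x][y]
    return out

n = 5
original = [[(x, y) for y in range(n)] for x in range(n)]

state, period = cat_map(original), 1
while state != original:
    state, period = cat_map(state), period + 1
print(period)   # recurrence time for a 5x5 grid
```

The hidden ‘conservation property’ here is just that the map is an invertible linear transformation modulo N, so its iterates form a finite cyclic group; the period grows irregularly with N, which is why large scrambled images look irreversibly randomized.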

Consequences of missing a rung on the ladder of abstraction

I’ve been inspired by another one of Bret Victor’s excellent presentations: Up and Down the Ladder of Abstraction. The gist: it helps to be able to visualize/manipulate a model simultaneously at multiple levels of abstraction.  I’ve been using the concept of a “ladder of abstraction” to think about a number of topics.  In particular, the consequences of “missing a rung” on the ladder, which generally results in an accident, both in the real world and in the metaphorical one.  Basically, the failure to distinguish between rungs on the ladder leads to mixing of different behavior into a single layer of implementation, killing modularity and creating incidental complexity.

Case 1: Digital scientific notation

Konrad Hinsen’s idea about distinguishing computable models from implementations essentially recognizes that we’ve been missing a rung on the ladder of abstraction.  The core issue is that we don’t really (yet) have a tool for creating representations that are intermediate between the informal-yet-highly-abstract mathematical notation used in scientific papers (i.e., LaTeX equations) and the actual implementation in a formal imperative programming language.  Therefore, we mix implementation details specific to a particular programming language together with refinements like boundary conditions, discretization schemes, algorithms, etc.  This prevents those refinements from being abstracted out of the particular representation (Python, C, etc).  If we were to write out these refinements in a portable representation, it could be extremely powerful.  It would create a degree of modularity between models and computer languages: the same model could then be implemented in multiple languages, and any given language would also be able to incorporate models originally implemented in any other language!
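Here is a minimal sketch of what that separation might look like (the mini-notation is entirely hypothetical, not Hinsen’s actual proposal): the model is plain data, the discretization scheme is a separate, swappable refinement, and only the final step is Python-specific.

```python
# The *model* is language-neutral data: an equation plus parameters.
# Nothing here commits to a discretization or to Python semantics.
model = {
    "equation": "dx/dt = r * x * (1 - x / K)",  # logistic growth
    "rhs": lambda x, p: p["r"] * x * (1 - x / p["K"]),
    "params": {"r": 0.5, "K": 10.0},
}

# A *refinement*: one discretization choice among many (RK4, implicit, ...),
# reusable across models and, in principle, across target languages.
def euler_step(rhs, x, p, dt):
    return x + dt * rhs(x, p)

# Only this last layer is an implementation in a particular language.
def integrate(model, x0, dt, steps, stepper=euler_step):
    x = x0
    for _ in range(steps):
        x = stepper(model["rhs"], x, model["params"], dt)
    return x

x_final = integrate(model, x0=0.1, dt=0.01, steps=2000)
print(round(x_final, 3))   # approaches the carrying capacity K = 10
```

Because the model and the discretization are separate objects, swapping `euler_step` for a better integrator, or emitting C instead of calling Python, would not touch the model at all.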

Case 2: multiple dispatch & subtypes

The Julia programming language has a really interesting type system: instead of using inheritance, the subtype/supertype relations define a ‘type lattice’ amongst ‘abstract types’ which may have ‘concrete types’ as implementations.  The type system, combined with multiple dispatch, allows you to do some really cool things, in particular some tasks that I found intractable in Python/C++.

Here’s the case I was working on: adding units (like meters per second) to numbers and adding coordinates to arrays (that is, naming the dimensions and associating 1-D coordinates with each dimension) in Python.  The xarray package defines arrays with coordinates, and the pint package wraps any multidimensional array with units.  Unfortunately, they cannot be used together: the arithmetic operations defined for pint don’t know how to handle xarrays, and vice versa. To make the two interoperate, one of them would have to ‘know’ about the other, which creates a nasty connascent mess full of wrapper functions. (The MxN problem.) This is a direct consequence of object-oriented programming (methods belong to one class) combined with Python’s lack of a way to talk about super/subtypes: Python relies on duck typing, which basically eliminates the ability to control types effectively.

Julia has no such problem: the AxisArrays package and the Unitful package cooperate straight out of the box, without ‘knowing’ about each other, because the dispatcher finds the correct methods for arithmetic based on the types of all the objects in question.  Very cool.  (Also, broadcasting is built into Julia more natively than into Python, so there’s no problem there either.)
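The contrast can be sketched in Python with an explicit two-argument dispatch table (toy `Unit` and `Labeled` types, not the real pint/xarray classes). The arithmetic rules live outside both classes, so neither author needs to know about the other; this is roughly what Julia’s dispatcher provides natively.

```python
# A minimal sketch of multiple dispatch dissolving the MxN wrapper problem.
class Unit:
    def __init__(self, value, unit):
        self.value, self.unit = value, unit

class Labeled:
    def __init__(self, value, label):
        self.value, self.label = value, label

_add_rules = {}  # (type_a, type_b) -> implementation of `add`

def defadd(ta, tb):
    def register(fn):
        _add_rules[(ta, tb)] = fn
        return fn
    return register

def add(a, b):
    # Dispatch on the types of *both* arguments, not just the first.
    return _add_rules[(type(a), type(b))](a, b)

@defadd(Unit, Unit)
def _(a, b):
    assert a.unit == b.unit, "incompatible units"
    return Unit(a.value + b.value, a.unit)

# A third party can add the cross-type rule without touching either class:
@defadd(Unit, Labeled)
def _(a, b):
    return Unit(a.value + b.value, a.unit)

print(add(Unit(3, "m"), Unit(4, "m")).value)     # 7
print(add(Unit(3, "m"), Labeled(4, "x")).value)  # 7
```

In single-dispatch OO, the `Unit + Labeled` rule has no natural home, so it ends up as wrapper glue inside one of the two packages; here it is just one more table entry.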

Case 3: design patterns

A design pattern is basically an abstraction that cannot be (effectively) expressed within a programming language, and which therefore has to be expressed outside of it, in style guides or in examples in books.  This makes the abstraction impossible to reuse, because it is mixed up with the implementation details of each particular instance.  Detecting a design pattern requires a bit of inductive inference: programmers keep needing to implement some feature that the language doesn’t provide.
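A classic instance: in a language without first-class functions, ‘parameterize the algorithm’ has to be hand-built as the Strategy pattern at every use site; where the abstraction is expressible, the pattern evaporates. A Python sketch:

```python
# The Strategy "pattern": a class hierarchy that exists only to pass
# behavior as a value, re-implemented by hand wherever it is needed.
class SortStrategy:
    def key(self, item):
        raise NotImplementedError

class ByLength(SortStrategy):
    def key(self, item):
        return len(item)

def sort_with_strategy(items, strategy):
    return sorted(items, key=strategy.key)

# In a language where the abstraction *is* expressible, the pattern
# disappears into a plain argument:
print(sort_with_strategy(["bbb", "a", "cc"], ByLength()))  # ['a', 'cc', 'bbb']
print(sorted(["bbb", "a", "cc"], key=len))                 # ['a', 'cc', 'bbb']
```

The two calls do the same thing; the first one is what the ‘pattern’ looks like when the language lacks the rung of abstraction the second one stands on.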

Here’s a nice summary of the problem (“Revenge of the Nerds” by Paul Graham, 2002):

 [T]here is a name for the phenomenon, Greenspun’s Tenth Rule:  “Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”

If you try to solve a hard problem, the question is not whether you will use a powerful enough language, but whether you will (a) use a powerful language, (b) write a de facto interpreter for one, or (c) yourself become a human compiler for one.

The entire article is a really fun read, by the way.

Perverse incentives & unintended consequences

This post is mostly a compendium of interesting articles I’ve read lately on the theme of unintended consequences of the incentive structure for various human behaviors (science, business, security) or even for machine behavior.

The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities by Lehman et al, 2018.

Researchers select fitness measures for artificial evolution or genetic algorithms, in order to promote certain behavior.  Sometimes, the computer finds a loophole or otherwise ‘games the system’ by exploiting a gap between the behavior that measure was supposed to be a proxy for, and the behavior that actually satisfies the measure.

Faulty Reward Functions in the Wild by OpenAI blog, Dec. 2016

In a racing-type game, the machine learned to go in a tight loop, hitting the same mile-markers repeatedly to rack up points.  In this way, it outscored even human drivers who followed the intended course. One view is to say that this is ‘cheating,’ or failure of the machine to understand the intention of the game. Another way to look at it, though, is to realize that the machine was more creative than humans in this case, who were bound by their preconceived notions about how to win a race, and weren’t able to think outside of the box.

[Edited to add this on 2018/08/25.]

The Natural Selection of Bad Science, by Paul E. Smaldino and Richard McElreath, Royal Society Open Science, 2016.

The basic premise is that, despite statisticians’ admonitions to scientists that it is important to use large sample sizes, two factors prevent it.  (1) Small sample sizes are easier.  (2) Small sample sizes are more likely to lead to spurious positive results that count as ‘statistically significant’ under the usual p-value threshold.  Because positive results are more publishable than negative results, scientists who get positive results get published and promoted faster than those who do not; therefore, the scientific community evolves in this direction, either because scientists learn that this method is effective, or because the ones who resist are more likely to be outcompeted.
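The selection pressure can be sketched with a toy simulation (stdlib only, using a z-test with known variance for simplicity): with no real effects anywhere and a fixed total sample budget, many small studies yield far more ‘significant’ findings than a few large ones, because each study is an independent draw at p < alpha.

```python
import math, random

def two_sided_p(xs):
    """z-test p-value for mean 0 with known sigma = 1 (keeps it stdlib-only)."""
    n = len(xs)
    z = (sum(xs) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
budget, alpha = 100_000, 0.05

def false_positives(n_per_study):
    # Spend the whole sample budget on studies of size n_per_study,
    # drawing from a null (no-effect) distribution every time.
    studies = budget // n_per_study
    return sum(
        two_sided_p([random.gauss(0, 1) for _ in range(n_per_study)]) < alpha
        for _ in range(studies)
    )

small, large = false_positives(10), false_positives(1000)
print(small, large)   # roughly alpha * (number of studies) each
```

Every ‘finding’ here is false, yet the small-sample lab publishes a couple orders of magnitude more of them per unit of data collected, which is exactly the trait the publication filter selects for.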

As an aside, it seems to me that even if the results turn out not to be reproducible, it tends not to matter, because (1) people tend not to report negative results, since they are harder to get published, so the failure may never come to light, and (2) it’s often not worth the effort to attempt to reproduce someone’s results, because this isn’t considered as meritorious as a ‘new’ result.  One thing that can happen is that a result becomes well-established despite being mistaken; then there is an incentive to overturn it, creating a ‘paradigm shift’ and ‘rewriting the textbooks,’ which looks good for whoever can pull it off.  But it’s not worth doing until the result is well-established — refuting an early-stage result isn’t ‘sexy’ enough.  This is pathological. It makes for very erratic science.  I suspect this is part of the reason people tend to (rightfully) ignore headlines like “Scientists show that coffee actually is good for your heart!” that come out about two years after a headline to the opposite effect.  (The other part of the problem is bad journalism, obviously.)

Why the Database Masters Fail Us, Jesper Larsson, Oct. 3, 2008.

The article is about why mathematical research in database theory hasn’t been very useful for practical applications.

According to popular view, scientists, or more specifically academic researchers, have the task of advancing human knowledge.  … Unfortunately, it does not quite work that way.

The primary concern of most academic researchers is to produce papers… In daily work, the immediate focus of most researchers is to impress peer reviewers. …[Consequently] Researchers tend to pick subjects that are currently in fashion, for which papers are in demand.

[Secondly,] Writers use language meant to be understood and judged as appropriate by other people like themselves.  …[P]ublications often seem impenetrable or irrelevant to people outside the field.

Third, once the papers are published, the work is finished as far as the researcher is concerned. Few researchers bother to take their findings any further.

Here’s a blog post about the negative unintended consequences of some of Google’s management practices: Why I quit Google. To summarize, there was a heavy emphasis on quantifiable measures of productivity as the key to getting promotions/raises, which led to:

(1) Gaming the system by subdividing every task into tiny pieces to inflate the number of tasks completed in the task system.

(2) Failing to do things that support others because altruism is penalized.

To the promotion committee, my teammate’s project was the big, important work that demanded coordination from multiple developers. If they hornswoggled me into helping them, it’s evidence of their strong leadership qualities.

(3) Failing to add tests for bugs, because this increased the number of detected bugs, which looked bad (even though it led to the elimination of bugs).

My quality bar for code dropped from, “Will we be able to maintain this for the next 5 years?” to, “Can this last until I’m promoted?”

(4) Separately, there was also a tendency to treat people as interchangeable from project-to-project, which effectively penalized the workers, not the managers, for inefficiency caused by switching people between projects.  This is an instance of the principal-agent problem: the people who make the decisions don’t have to face the consequences.

Tim Cook on quarterly earnings in the NYT:

Why would you ever measure a business on 90 days when its investments are long term?

Great question. My answer: because people are obsessed with quantifiability.  It’s a form of scientism, or scientifishness as I like to call it.

The No. 1 problem with computer security

Computer security policy is focused on making people feel more secure instead of actually making them more secure:

For example, in most environments, two attack vectors account for 99 percent of all successful attacks: unpatched software and social engineering. But instead of defending our environments in a risk-aligned way, we concentrate our efforts on almost everything else. [Scary-sounding contingencies that aren’t very likely.]

In general, most of us fear the wrong things too much. For example, most people fear dying in a plane crash or being bitten by a shark far more than they fear the car ride to the airport or the beach, though the car ride is thousands of times more likely to result in serious injury or death.

For example, the typical threat or vulnerability matrix report will tell you how many malware programs your antimalware program detected and cleaned and how many unpatched vulnerabilities a vulnerability scanner found. This is mostly useless information.

A far better metric is how many malware programs your antimalware software failed to detect and for how long. Now that’s useful.

Worrying about what an attacker did once domain admin credentials were obtained is like worrying about your brakes after your car was stolen. Only by better protecting the left flank and preventing the car’s theft in the first place can you begin to make a better defense. Everything else is accepting defeat.

Security theatre for information security.  Basically, it reflects a bug in human nature.

Statistical & quantum mechanics from information theory

If you are like me, it is a relief to read a quote from a famous physicist who claimed not to understand statistical mechanics either:

“Thermodynamics is a funny subject. The first time you go through it, you don’t understand it at all. The second time you go through it, you think you understand it, except for one or two small points. The third time you go through it, you know you don’t understand it, but by that time you are so used to it, it doesn’t bother you anymore.” -Arnold Sommerfeld

Source: Angrist, Stanley W. and Helper, Loren G. (1967). Order and Chaos – Laws of Energy and Entropy (pg. 215). New York: Basic Books.

My impression of the “derivation” of thermodynamics in an undergraduate-level textbook was much as though someone had started building a house by nailing shingles up in mid-air.  Then, the author suspended the roof, walls, and foundation, all hanging from the shingles.  Miraculously, the foundation was at precisely the right altitude to rest on the ground.  One suspects that this is not, in fact, how houses are built, precisely because of the magnitude of that coincidence. This kind of exposition seems to result from an author who has become accustomed to something without really understanding it.

However, E. T. Jaynes’s paper “Information Theory and Statistical Mechanics” explains everything!  Statistical mechanics is comprehensible, and even intuitive.  I urge you to take the time to read it.  Here’s my summary:

Assume there is a set of quantities with known values (energy, particle number, fraction of particles in one end of the apparatus, etc.) that apply to a mechanical system at some initial time. We would like to say what properties of the system we can be sure of at some later time.  The key idea is that only information about conserved quantities is guaranteed to be true at later times. (For instance, energy is conserved for a cloud of gas trapped in a cylinder with perfectly reflecting walls, but the number of particles in the left half of the cylinder is not conserved.)

To make predictions about the far-future of the system, forget any information you have about non-conserved quantities.  Then, construct the maximum-entropy (ie, closest-to-uniform) distribution for particles in phase-space, which is consistent with the conserved quantities being fixed at their known values.  This is equivalent to letting any non-conserved quantities ‘relax’ in order to reach ‘equilibrium.’ (In this case, the energy & particle number would be conserved, but the fraction of particles on one side of the cylinder is not assumed to be conserved.)  This relaxation/forgetting, by the way, corresponds to ‘entropy production’ because one is losing relevant information and increasing uncertainty. Equilibrium statistical mechanics is powerful because it skips the details of the relaxation process and jumps to the asymptotic limit where any information about non-conserved quantities becomes irrelevant.
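To make this concrete, here is a small numerical sketch (my own toy illustration, not from Jaynes’s paper): given discrete energy levels and a known mean energy, find the maximum-entropy distribution. The solution has the Boltzmann form p_i ∝ exp(−βE_i), so it suffices to solve for β, which I do here by bisection.

```python
import numpy as np

def max_entropy_dist(energies, e_target, lo=-50.0, hi=50.0, tol=1e-10):
    """Maximum-entropy distribution over `energies` with mean energy e_target.

    The constrained maximum has Boltzmann form p_i ~ exp(-beta * E_i);
    we find beta by bisection, using the fact that the mean energy is a
    monotonically decreasing function of beta.
    """
    energies = np.asarray(energies, dtype=float)

    def mean_energy(beta):
        # Shift by the minimum energy for numerical stability.
        w = np.exp(-beta * (energies - energies.min()))
        return (w / w.sum()) @ energies

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(mid) > e_target:
            lo = mid  # need a larger beta (colder) to lower the mean
        else:
            hi = mid
        if hi - lo < tol:
            break
    beta = 0.5 * (lo + hi)
    w = np.exp(-beta * (energies - energies.min()))
    return w / w.sum(), beta

# Four equally spaced levels, mean energy constrained to 1.0:
levels = [0.0, 1.0, 2.0, 3.0]
p, beta = max_entropy_dist(levels, e_target=1.0)
print(p, beta)  # Boltzmann weights: successive ratios p[i+1]/p[i] are equal
```

Any other distribution with the same mean energy has lower entropy, i.e. it smuggles in extra assumptions beyond the constraint.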

The justification for using maximum entropy as the criterion for choosing a probability distribution is this: assuming a distribution with less-than-maximal entropy is equivalent to assuming some information beyond the knowledge of the conserved quantities.  In general, this extra information will be more likely wrong than right, because it concerns some non-conserved property.  The maximum-entropy distribution is the least-biased one (in the statistical sense of bias of an estimator), given whatever constraints one has.
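For a discrete system with energy as the only constraint, the maximization is a short Lagrange-multiplier calculation (standard textbook material, sketched here, not specific to Jaynes’s paper):

```latex
% Maximize  S = -\sum_i p_i \ln p_i
% subject to  \sum_i p_i = 1  and  \sum_i p_i E_i = \langle E \rangle.
\mathcal{L} = -\sum_i p_i \ln p_i
  - \alpha \Big( \sum_i p_i - 1 \Big)
  - \beta \Big( \sum_i p_i E_i - \langle E \rangle \Big)

% Setting  \partial \mathcal{L} / \partial p_i = 0  for each i:
-\ln p_i - 1 - \alpha - \beta E_i = 0
\quad\Longrightarrow\quad
p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_i e^{-\beta E_i}
```

This is just the Boltzmann distribution; the multiplier β is fixed by the energy constraint and plays the role of inverse temperature.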

I started wondering if the same situation applies with quantum mechanics. Lo and behold, I stumbled across an article on pilot wave theory. Here’s the gist: you can separate Schroedinger’s complex wave equation into two separate equations, one of which describes a wave, the other a particle.  The particle has a definite (but unknown) trajectory, which is the so-called ‘hidden variable.’  The trajectory follows Newton’s laws with the addition of a ‘quantum force’ term resulting from the wave, which causes things like the two-slit interference effect.  The wave, in turn, responds to the particle (but in an explicitly non-local manner, due to Bell’s theorem!).
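The splitting in question is the standard Madelung/Bohm polar decomposition (sketched here from memory; see the pilot-wave literature for details):

```latex
% Write the wavefunction in polar form, with R and S real:
\psi = R\, e^{iS/\hbar}

% Substituting into  i\hbar\,\partial_t \psi = -\tfrac{\hbar^2}{2m}\nabla^2\psi + V\psi
% and separating imaginary and real parts gives two coupled equations.

% Imaginary part: a continuity equation transporting |\psi|^2 = R^2:
\frac{\partial R^2}{\partial t}
  + \nabla \cdot \Big( R^2\, \frac{\nabla S}{m} \Big) = 0

% Real part: a Hamilton--Jacobi equation with an extra `quantum potential' Q:
\frac{\partial S}{\partial t} + \frac{(\nabla S)^2}{2m} + V + Q = 0,
\qquad
Q = -\frac{\hbar^2}{2m} \frac{\nabla^2 R}{R}
```

The particle’s velocity is given by the guidance equation v = ∇S/m, and the gradient of Q supplies the ‘quantum force.’ The continuity equation is exactly the statement that an ensemble distributed as |ψ|² stays distributed as |ψ|² under that velocity field.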

The interesting bit is that one can do away with the wavefunction collapse, because the particle is always localized.  The probability current for the particle can be shown to have the following property: if the position is distributed initially according to Born’s rule then the distribution will evolve in such a way as to satisfy Born’s rule at all subsequent times (see this preprint.)  The initialization of the probability distribution is then of utmost importance, because a failure of the probability to obey Born’s rule would allow the uncertainty principle (and therefore, causality) to be violated.

One could adopt Jaynes’s statistical justification for the equilibrium hypothesis.  That is: (a) quantum interactions are such that we (as macroscopic systems) can only obtain a limited amount of information about a microscopic system where quantum effects are important (b) given that amount of information available from measurement, the maximum-entropy distribution turns out to be the one given by the Born rule (ie, the wavefunction is defined by the available information and nothing more). The implication is rather disturbing: causality is only guaranteed by ignorance.  Or, alternatively, it’s an ‘emergent’ property for macroscopic systems.  I’ll take this up again later.

There’s a subtlety that I have glossed over: the maximum-entropy distribution is well-defined only for a discrete phase-space.  In a continuous phase-space, in order for the entropy to be coordinate-system-independent, a measure or density of states needs to be defined.  This measure effectively selects out a preferred parameterization of phase-space.  In typical stat-mech, a Cartesian coordinate system in space & velocity is used, with the sole justification that it works.  There’s no a priori reason why the density of states couldn’t instead be uniform in, say, spherical polar coordinates.  This was bothering me in grad stat mech, but I didn’t have time to figure it out.  The answer seemed to be, “well, it works, get on with the calculation.”
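A toy calculation (my own, with made-up numbers) makes the coordinate-dependence concrete: the ‘uniform’ (maximum-entropy, no-constraints) distribution on an interval depends on which coordinate you discretize uniformly, so two equally ‘ignorant’ observers can make different predictions.

```python
import numpy as np

# Discretize [0, 1] into n cells of equal probability mass, two ways:
# uniformly in x, and uniformly in u = x**2.  "Equal mass per cell"
# implies a flat density in x in the first case, and a density
# proportional to du/dx = 2x in the second.
n = 1000

# Cells uniform in x:
x_edges = np.linspace(0.0, 1.0, n + 1)

# Cells uniform in u = x**2, mapped back to the x coordinate:
u_edges = np.linspace(0.0, 1.0, n + 1)
x_from_u = np.sqrt(u_edges)

# Predicted mean of x under each prescription (mass 1/n per cell,
# evaluated at cell midpoints):
mean_x_flat = np.mean(0.5 * (x_edges[:-1] + x_edges[1:]))
mean_x_u = np.mean(0.5 * (x_from_u[:-1] + x_from_u[1:]))

print(mean_x_flat)  # ~0.5
print(mean_x_u)     # ~2/3: same "no information", different answer
```

Both observers claim total ignorance, yet one predicts ⟨x⟩ ≈ 1/2 and the other ⟨x⟩ ≈ 2/3. Something beyond maximum entropy has to pick the measure, which is exactly the gap Jaynes’s invariance argument fills.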

Jaynes has a method to address this, which (necessarily) goes beyond maximum-entropy.  The idea is that if we translate the problem into an equivalent one by some change of coordinates, then we should get the same answers to all possible questions — in particular, the measure needs to have the same form.  Translation & rotation invariance requires a measure which is uniform in Cartesian coordinates.  As a proof by contradiction, imagine a measure that was uniform in spherical polar coordinates; in Cartesian coordinates it would have a radial dependence centered on the origin, which is obviously not translation-invariant.