Mathematical notation is broken

Having spent the last ten years teaching students mathematical notation (while simultaneously teaching the mathematical concepts described by these symbols), I have often reflected on how efficient and amazing it is, and how unfortunately broken it often is.

Some notation shows off some of the power of mathematical thinking (for example, algebra), but some notation has clearly not been designed for clarity. In fact, my suspicion is that much of mathematical notation has been invented to save space.

Of course, one reason to save space with mathematical symbols is that paper used to be expensive, but I suspect this is not the main reason mathematical symbols are so tightly packed with information. It is also time-consuming to write clearer mathematical notation, and mathematicians love to be concise. In fact, I have noticed that mathematicians often equate the brevity of a mathematical proof with its elegance, which over time may have supplied pressure to reduce the notation used to describe these proofs. A few mathematicians have contributed heavily to mathematical notation, most notably Leonhard Euler, and these few mathematicians’ desire for brevity has defined the notation we use today to communicate mathematics.

Look at sigma notation for example. What does the letter sigma from the Greek alphabet have to do with finding sums of things? Absolutely nothing as far as I can tell. According to Dave Radcliffe, Sigma (∑) is short for summa (probably because they start with the same sound), which is the Latin for sum. Euler invented the symbol to use for summation, and we’ve been using it ever since. Essentially, we are using ∑ to mean sum for historical reasons.

$$\sum_{i=3}^{6} i^2 = 3^2 + 4^2 + 5^2 + 6^2 = 86$$

The portion of this equation to the left of the leftmost equals sign is summation notation, which I have taught for years. I usually have to spend a class, sometimes two, explaining this specific set of notation. The brevity of the summation notation contributes little to the comprehensibility of this statement. It is essentially equivalent to the following:

Summation (i, 3, 6, i^2) = 3^2 + 4^2 + 5^2 + 6^2 = 86

Unfortunately this notation requires us to memorize the order of the parameters in the summation function, but this is functionally the same as the previous notation, except one more piece of information is given to us; we know we will be doing a sum of some kind without having to memorize the meaning of sigma. With some work, we may be able to improve upon this notation more, and provide even more clarity.

Summation (index: i, start: 3, end: 6, function: i^2) = 3^2 + 4^2 + 5^2 + 6^2 = 86

This notation is somewhat clearer than the second option I suggested, since the parameters are defined within the notation. It is significantly longer to write than the original notation (it takes up about twice as much space), but it has the huge benefit of being significantly clearer. Further, one could imagine that if I were entering this notation into a computer, the autocomplete function (which is common in code editors) could suggest parameters for me, as well as show me the definition of each parameter as I enter it. Finally, this notation is similar to how we define functions in computer programming (in some languages), and so when we teach mathematical notation, we will also be giving our students some ability to read computer programming code.
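As a rough sketch of how the labelled notation might look in an actual programming language, here is one possible Python version. The function name `summation` and its parameter names are illustrative choices that mirror the labels above, not an established library API, and the index is named inside the anonymous function rather than passed as a separate parameter.

```python
from typing import Callable


def summation(*, start: int, end: int, function: Callable[[int], int]) -> int:
    """Add up function(i) for every integer i from start to end, inclusive.

    The bare * makes every parameter keyword-only, so a caller has to
    write the labels out, just as in the labelled notation.
    """
    return sum(function(i) for i in range(start, end + 1))


# The index i is introduced by the anonymous function itself.
total = summation(start=3, end=6, function=lambda i: i**2)
print(total)  # 86, i.e. 9 + 16 + 25 + 36
```

Because the parameters are keyword-only in this sketch, a call such as `summation(3, 6, lambda i: i**2)` is rejected, which trades some brevity for the guarantee that the reader never has to remember the order of the arguments.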

This issue about notation is not a trivial concern. The notation used to explain mathematical ideas is often a barrier to some students learning how to communicate mathematical ideas. Quite often students (and sometimes teachers) confuse learning notation with learning mathematics.

Furthermore, notation which is excellent on paper may be somewhat less useful on a computer. I have spent many hours looking for solutions to make adding mathematical symbols to websites more convenient and have discovered that there is no easy way to do this. Every method has drawbacks, and no method is as convenient as adding the same symbols to paper. My conclusion in terms of using mathematical notation with computers is that one of two things (or both) will happen. Either computers will develop more touch-sensitive interfaces and software developers will create software that recognizes the current mathematical symbols, or we will start to change mathematical notation so that it is easier to input into a computer.

The one huge advantage of our current notation is that it is somewhat universal. Essentially the same notation is used around the world, and by choosing a more amateur-friendly notation, we would be creating localized versions of the notation for each language, which is obviously problematic. On a computer, this is easily resolved by making the names of mathematical objects translatable so that whoever is viewing a mathematical document can select their language of choice. In print, this is more of an issue, and so we should reluctantly continue to use our existing notation until we have more fully transitioned from our traditional print medium, but the more we use computers to communicate mathematics, the more likely it is that we should fix mathematical notation.


Here are a couple of critiques of this post: 



  • Kind of off topic with regards to what this post is about, but here’s another notation that I would suggest:

    sum(3, 6, (i -> i^2))

    When it comes to defining the index, you do so directly inside the function definition.

    Now look at the function definition itself.

    Looks ugly? Maybe. But it’s something that I have been working with when I’ve been programming lambdas in Python, and functions in CoffeeScript. Its notation looks very similar to the basics of lambda calculus.

    And if you teach your students the notation that I used here, they’ll already start learning some basic type theory, where functions can be tossed around here and there just like numbers, sets, and tuples.

    And type theory is one of those things that I find students overlooking all the time.

    A classical example is this:

    “In the equality f(x) = x + x, what is the function?” And students will respond f(x), which is wrong. f is the function, while f(x) is the expression.

  • David Wees wrote:

    Yeah, I knew there would be some more efficient way of expressing the notation, and you’ve certainly described it. The index is important because quite often we use summation notation on sequences where there is more than one variable.

    Typing is an important concept, and one I have struggled to explain to my students each year. They get the idea of whole numbers fairly easily, and they understand fractions, but decimals seem to be a bit hairy. I quite often find students who think that sqrt(2) is rational because the output they get on their calculator terminates. Anything we can do to help students get more opportunity to develop number type sense is worth it, IMHO.

    Also, a small quibble, but I would argue that f is the notation we use to denote the function; the function itself is the rule that we apply to any object. f is the label for that rule, and x is the index we use to help understand how it works. Still, I doubt many students could articulate that difference as you have suggested.

  • “Also, a small quibble, but I would argue that f is the notation we use to denote the function…”

    Oops, yeah, that’s what I meant.

    Speaking of which, students can also learn that there isn’t any need for the label f to begin with!

    Here’s an example:

    (x -> x + x)

    Now that’s a function right there. An anonymous function to be exact.

    And then, of course, we can assign that to f, like so.

    f = (x -> x + x)

    which is equivalent to f(x) = x + x.

  • I find that the summation notation is a syntax used among people who have really, really delved deep into the field of mathematics.

    But for people that just want to express a summation, just for the sake of expressing, it should never hurt to do so in writing, and making up their own notations, if they wish.

  • Perhaps you could have pointed to a better example of “broken notation,” but I am not convinced that there is such a problem. Yes, the sigma notation is difficult for some students, but anyone who would actually end up using it would not have much trouble learning how it is set up. It is obviously useful in higher math, but it could be (and often is) left out of the high school math curriculum except for college-bound courses.

  • David Wees wrote:

    Well fair enough, I’ll see if I can find an example that is more likely to turn up in more schools. In any school that teaches calculus to all of their students (which as it turns out, we do), this issue may come up, but yes, summation notation is more typically covered in a college setting for the first time.

  • I agree with Greg. You might want to consider how math is unlocalizable. Why? Because it doesn’t need to be localized!

    You can say that the notations used to express a lot of the notions in math are kind of a language (although, not really). Now imagine if everyone started using their own notations. This would be chaotic.

    The capital Sigma notation is going to remain forever, and there is nothing one can do.

    Personally, I don’t find it any more difficult than the “function” notation that you proposed. And I am fairly optimistic that people learn fast once they really dive into it.

  • Just to play devil’s advocate, you could make the argument that instead of writing ‘squared’ it might be clearer to write ‘multiplied by itself’, or even ‘added to itself the same number of times as the number itself’ – but language, like maths notation, is about expressing things as clearly and succinctly as possible.

    For an example of really broken notation, I’d nominate the fact that cos^2(x) means (cos (x))^2, but cos^(-1) (x) doesn’t mean (cos (x))^(-1).

  • David Wees wrote:

    Yeah, when I was writing this post the first time, the point I was trying to make was really clear in my mind. Now, I’m less sure of my point, or I feel that I have not articulated my argument well enough.

    Your point about cos^2(x) meaning something completely different than cos^(-1)(x) is probably better than my example. I wonder how many other examples are like this one?

  • Yes, mathematical notation is broken. In the late 1980s there was a fashion for something called Z, which was a style of mathematics used for specifying computer systems. Although Z has not made a huge impact on the world of computers it is still a great way to write mathematics. In particular, it has a great style for writing functions and a great way to show the types of mathematical objects. Applying Z thinking to school mathematics reveals some oddities.

    Take the so-called ‘expectation operator’ used in statistics and the mis-named ‘random variable’. A random variable is a function from outcomes (in an experiment) to real numbers. It is useful to do this so that outcomes become numbers and you can take averages and so on. However, this kind of function is not random, and it’s not a variable either. It’s a function whose type is something like OUTCOME –> REAL.

    The expectation of a random variable is then written as E[X] (using square brackets in some books for reasons that are never explained!). What is the type of E? Again, it is a function and it returns a real number, but what is its argument? X is of type OUTCOME –> REAL and contains no information at all about the probability distribution of the outcomes. That means it does not supply what is needed for calculation of the expected value.

    After some hours of internet research I still don’t know what to suggest as the correct type for E, but two candidate types emerge, representing three possible interpretations:

    E : (REAL –> REAL) –> REAL
    E : (P REAL –> REAL) –> REAL

    This needs a bit more explanation to make sense. The first is the idea that E takes as input a probability distribution (another function in fact) that takes individual real numbers as input and returns either a probability or a probability density. In usual terminology, the inputs to E are either a PMF or a PDF.

    The second idea is that E takes as input a probability distribution of a different type, one that takes as input a set of real numbers and returns a probability.

    All these possibilities would supply enough information for E to do its job.

    I suspect that the first type is the correct one (if there is such a thing) and that in fact E is ‘overloaded’ as computer people call it. This means that the name E is being used for two different functions with analogous actions, one for PMFs and one for PDFs, and which is used has to be divined from the context.

    Yes, mathematical notation is broken in many places and it starts to hurt people at school.

    What can be done? The first thing to understand very clearly is that there is no need for a universal agreement on better notation. We don’t need an international standard, for example, for beneficial change to occur.

    Each person can decide, on their own, what notation they will use. Most of the howlers in maths can be replaced by notations that are already in use and, usually, that are so easy to understand that little or no explanation is needed when writing for others.

    Unfortunately, one still has to deal with confusing notation by others.

    However, that leads to one of the main reasons a person might individually decide to clean up the notation they personally use: it gives you an edge. Your writing becomes easier for others to read and understand. You make fewer mistakes yourself. Translating ambiguous notation into nice notation is a great way to understand new mathematical ideas and techniques.

    So, if you want to reform the notation you use, then just start, but take care to choose notation that’s going to be easy for others. Most of this is about avoiding bad ways to write things.

    For example, with expectation, write E[Px] instead of E[X], where Px is already introduced and defined as a probability distribution (whose type is REAL –> REAL) for a variable called X. This idea is well established in probability theory and will be the probability distribution ‘induced’ by the random variable (and the underlying probability function on the sample space). So, it’s ‘legal’. It’s also quite intuitive that E would work on a probability distribution, and if you’ve introduced the distribution in a nice, simple way then your readers do not need to know the stuff about ‘induced’ distributions and you need never mention the phrase ‘random variable’ at all.

    A few years ago I wrote a long guide to writing maths well and section 5 focuses on this kind of issue. If anything, when I wrote that article I had underestimated the extent of broken notation.
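
The typing argument in the last comment, that E needs a probability distribution as its input and cannot be computed from the random variable X alone, can be sketched in Python. Everything here is an illustrative assumption: the names `induced_pmf`, `expectation`, and `number_of_heads` are hypothetical, the experiment (two fair coin flips) is invented for the example, and a dictionary stands in for a discrete distribution of type REAL –> REAL.

```python
from fractions import Fraction
from typing import Callable, Dict


def induced_pmf(outcome_probs: Dict[str, Fraction],
                random_variable: Callable[[str], int]) -> Dict[int, Fraction]:
    """Build the distribution on values induced by X and the outcome probabilities."""
    pmf: Dict[int, Fraction] = {}
    for outcome, probability in outcome_probs.items():
        value = random_variable(outcome)
        pmf[value] = pmf.get(value, Fraction(0)) + probability
    return pmf


def expectation(pmf: Dict[int, Fraction]) -> Fraction:
    """E takes a distribution (value -> probability), not a random variable."""
    return sum(value * probability for value, probability in pmf.items())


# Assumed experiment: two fair coin flips, each outcome equally likely.
outcome_probs = {outcome: Fraction(1, 4) for outcome in ("HH", "HT", "TH", "TT")}


def number_of_heads(outcome: str) -> int:
    """The 'random variable' X: a plain function of type OUTCOME -> REAL."""
    return outcome.count("H")


p_x = induced_pmf(outcome_probs, number_of_heads)
print(expectation(p_x))  # 1
```

In this sketch `number_of_heads` carries no probabilities at all, so `expectation` cannot be computed from it directly; the probabilities enter only through `outcome_probs`, which is the comment's point about E[Px] versus E[X].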
