Scientific prose

Having recently moved country, with all the attendant difficulties imposed by the language barrier, I have begun to think a little about communication: how we inform one another of our thoughts and make ourselves understood. As a scientist I am particularly interested in the style of writing in scientific prose, since this needs both to convey very complex, novel ideas and, at the same time, to be understood by the international scientific community, many of whose members do not have English as a first language. I also feel that, ideally, a good scientific paper should be at least somewhat understandable to the interested layperson, particularly in areas of science such as medicine where personal, corporate and state decisions will be based on published research. I do confess, however, that when writing my papers I write for scientists and not the general public!

Personally I can find reading scientific papers, even within my own field, more challenging than many other styles of English. Why is this? I can think of at least four explanations:

  1. Unfamiliarity of language. If I am informed that “the slithy toves did gyre and gimble in the wabe”¹, I don’t have a clue what any of those words mean. (Although, curiously, I can identify their parts of speech…)
  2. Unfamiliarity of concepts. I know what the words “closed”, “metric” and “space” mean in everyday parlance, but it takes a bit of effort to recollect from my undergraduate days what exactly a “closed metric space” is.
  3. Complexity of language. Scientific papers typically contain many long complex and/or compound sentences such as this: “We show that when mass loss is slow, systems of two planets that are marginally stable can become unstable to close encounters, while for three planets the timescale for close approaches decreases significantly with increasing mass ratio.” (Debes & Sigurdsson, 2002) Here a short main clause “we show” introduces a huge subordinate clause 35 words long, which itself is composed of three nested levels of clauses.
  4. Complexity of concepts. The above quotation is describing the effects of a star losing mass (as happens before it becomes a white dwarf) on any orbiting planets. Whether these planets are stable depends on their masses relative to the star’s, which increase as the star loses mass. Systems of two and three planets behave somewhat differently, but in both cases the question is whether, or on what timescale, the planets will approach each other closely. This complex set of phenomena, and the causal relationships between them, are described by the authors in the quoted sentence.

Clearly, to a large degree 1 and 2 are interdependent. Wovon man nicht sprechen kann, darüber muss man schweigen (“Whereof one cannot speak, thereof one must be silent”, as Wittgenstein put it). We must have the necessary vocabulary to begin to discuss anything, for if we do not have words to describe something, how can we discuss it? On the other hand, scientific language tends to take familiar words and give them a very precise meaning. For example, one cannot properly understand a closed metric space simply by referring to the everyday meanings of the three component words. Increasing technical vocabulary and concepts go hand in hand when studying any field, sport as much as science, and together present one barrier to understanding a text. This is unavoidable, since it is not possible in any paper to explain all the technical terms used: that is a matter for the textbooks! It is, however, a reasonable assumption that anyone (scientist or layperson) interested in reading a paper will have some knowledge of the requisite background, and it is common to introduce less well-known terms. In my field, for example, it is customary to inform readers that a “debris disc” is an extra-Solar analogue of our Solar System’s Asteroid and Kuiper Belts: any professional astronomer, or layperson with an interest in astronomy, will understand that.

What I’ve been wondering is whether 3 is so necessarily dependent on 4; and, if so, whether that is a barrier to, or maybe rather an aid to, understanding. I.e., does the semantic complexity of scientific concepts—complex chains of causation, conditionals and counterfactuals, caveats etc.—necessarily entail a concomitant syntactic complexity when they are expressed in natural language? And if so, is this a hindrance to understanding them, or would lots of shorter, simpler sentences be harder to understand than one equivalent long one? To give an example, could the sentence from Debes & Sigurdsson quoted above: “We show that when mass loss is slow, systems of two planets that are marginally stable can become unstable to close encounters, while for three planets the timescale for close approaches decreases significantly with increasing mass ratio.” be rewritten something like this: “Some systems of two planets are marginally stable. We show that slow mass loss can render them unstable to close encounters. Systems of three planets have a timescale for close approaches to occur. We show that slow mass loss decreases this timescale significantly.” This is more verbose (43 words against 37), as there is some repetition (e.g., of “we show”), and breaking up the sentences means more pronouns had to be introduced. Question to you, readers: Do you find the original sentence, or my rewritten sentences, easier to understand?

While pondering that, it’s worth comparing scientific prose to other factual English prose written to inform at a relatively high level. To do this, I compared two texts: First, the abstract and introduction from the Debes & Sigurdsson paper from which the above sentence was taken; this had the triple advantage of being (1) close to hand, (2) related to my research, and (3) despite my complaints about the difficulty of scientific writing, actually a very well-written paper. Second, the opening paragraphs of “The History of Madrid” chapter from my Dorling Kindersley travel guide “Madrid”; this has the advantage of needing to introduce new vocabulary itself, in the form of Spanish words, persons etc., in the same manner as the introduction to a scientific paper.

The simplest measure of linguistic complexity might be simply to count the number of words in the sentences. This I did for each text, removing sentences containing semi- or full-colons (as they could just as easily be parsed as compound sentences or strings of simple sentences). For the scientific paper, I ignored any references in parentheses such as “…similar to the Solar system (Duncan & Lissauer 1998).”, but counted references essential to the meaning of a sentence such as “Duncan & Lissauer (1997) … found that…”, in this case, as 4 words.
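
For anyone wanting to repeat the count, here is a minimal Python sketch of the procedure described above. It is an illustration only, not a record of how the numbers in this post were obtained: the function names `sentence_lengths` and `mean_and_std` are made up for the example, the sentence splitter is deliberately naive (abbreviations such as “e.g.” will fool it), and I am assuming that a “+/-” spread means a sample standard deviation.

```python
import re
import statistics

# Parenthetical references such as "(Duncan & Lissauer 1998)": parentheses
# containing at least one letter followed later by a four-digit year.
# A bare "(1997)" after author names is left alone, so that
# "Duncan & Lissauer (1997)" still counts as 4 words.
PAREN_REFERENCE = re.compile(r"\s*\([^()]*[A-Za-z][^()]*\d{4}[^()]*\)")


def sentence_lengths(text):
    """Return the word count of each sentence, skipping any with ':' or ';'."""
    text = PAREN_REFERENCE.sub("", text)
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    lengths = []
    for sentence in sentences:
        if ":" in sentence or ";" in sentence:
            continue  # could equally be parsed as several sentences
        lengths.append(len(sentence.split()))
    return lengths


def mean_and_std(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


sample = (
    "Planets orbit stars (Smith & Jones 1998). "
    "Duncan & Lissauer (1997) found that orbits change. "
    "This has a colon: it is skipped. Is it short?"
)
print(sentence_lengths(sample))  # [3, 8, 3]
```

The sample text is invented purely to exercise the three rules: the first reference is dropped, the second is counted, and the sentence with a colon is excluded.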

There were 32 sentences in the Debes & Sigurdsson text, and 31 in the Dorling Kindersley. The former had an average length of 30.2 +/- 8.9 words, the latter 21.9 +/- 7.4. While the sentences in the scientific text are longer, with such a range of lengths the difference is probably not statistically significant, and I will need more samples of texts to say definitively whether the scientific paper’s sentences are longer. I should also draw from more than one source, since there might be great variations of individual style between authors.

On the other hand, just because a sentence is long, does that make it complicated? The sentence “The big red car, the small green car, and the huge black bus stopped at the traffic lights”, containing 18 words, I would say is less complicated than “The car which was big and red, and the bus indicating to turn left, stopped when they wanted”, despite having the same number of words. The former is a simple sentence with only one main clause, the latter a complex sentence with one main and two subordinate clauses, as well as a participial phrase. To attempt to measure this sort of complexity, I counted the number of finite verbs (those that can stand alone in a clause or sentence) in each sentence. This gives the average number of clauses per sentence, although it misses out constructions such as participial phrases, infinitive phrases, gerunds and the like. The results were: Debes & Sigurdsson, 2.25 +/- 1.14 finite verbs per sentence; Dorling Kindersley, 1.94 +/- 1.06 finite verbs per sentence. Again, we see a hint that the scientific text is indeed more complicated.

Although it needs to be confirmed with a larger study, this result suggests that scientific prose may be more syntactically complex than other informative factual writing. Whether this is a hindrance to understanding, and whether it is practical to simplify our language, are of course further questions to explore…

Oh, and I just read through the draft of this, and Damn! do I write some long sentences!

Edit (25/09/11 17:46 UT): A t-test tells me that the difference in mean sentence length between the two texts is statistically significant (p=0.00016, about three and a half sigma), but the difference in number of verbs is not (p=0.26).
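
The test statistic can be recomputed from the summary figures quoted in this post. The sketch below assumes Welch’s unequal-variance form of the t-test and assumes that the quoted “+/-” values are sample standard deviations; the post does not say which variant was actually run, and the function name `welch_t` is invented for the example. It uses only the standard library, so it stops at the t statistic and degrees of freedom. Turning those into a p-value needs the Student t distribution, for example via `scipy.stats.ttest_ind_from_stats`.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary stats."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Sentence length: Debes & Sigurdsson (32 sentences) vs Dorling Kindersley (31).
t_len, df_len = welch_t(30.2, 8.9, 32, 21.9, 7.4, 31)
# Finite verbs per sentence, same two texts.
t_verb, df_verb = welch_t(2.25, 1.14, 32, 1.94, 1.06, 31)

print(f"sentence length: t = {t_len:.2f}, df = {df_len:.1f}")
print(f"finite verbs:    t = {t_verb:.2f}, df = {df_verb:.1f}")
```

With these inputs the statistic comes out at about 4.0 for sentence length and about 1.1 for finite verbs, each with roughly 60 degrees of freedom.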

¹ OK, I know this isn’t science, but it’s a good example. Try this from last week’s Nature: “In contrast with previous assumptions, we report here that the nascent antizyme polypeptide is the relevant polyamine sensor that operates in cis to negatively regulate upstream RFS on the polysomes…”



Well, I’ve now got most of the hassle of moving jobs, country, language etc. sorted out. (OK, the last one was a lie; I still can’t communicate. I just about managed to get some throat sweets this morning though.) Unfortunately, to my great distress, my favourite and most useful textbook, Solar System Dynamics by Murray & Dermott, is MIA after the move! While it awaits recovery or replacement (I’d rather it were recovery: it’s got 4 years’ worth of marginalia inside!) here are some more photos of Madrid…

I was surprised to learn on moving here how modern the city is. As I’d done some reading up on Spanish history before I came, I had learned that it was relatively unimportant in the Mediæval period, only being adopted as the seat of the monarchy during the reign of Felipe II (him of Spanish Armada fame) in the late 16th Century. So there are precious few Mediæval buildings left. An example is San Jerónimo el Real, built in the early 16th Century but extensively restored in the 19th.

San Jerónimo el Real

The church of San Jerónimo el Real, close to the Prado museum.

However, what surprised me even more was how few of the important buildings date to the Siglo de Oro, the “Golden Age” of the 16th to 17th Centuries. An example is Plaza Mayor, laid out at the turn of the 17th Century, albeit with some restoration following later fires &c.

Plaza Mayor

Plaza Mayor, surrounded by tapas bars. Rather pricey, but very tasty!

Many of the most famous buildings are rather modern: The Palacio Real, for example, built in the mid 18th Century…

Palacio Real

The South facade of the Palacio Real.

…and the Prado, built at the end of the 18th Century:

The Prado

The North facade of the Prado.

In my naïveté I had imagined that this time period, which geopolitically would be regarded as the time of the decline of the Spanish Empire, would have engendered less civic architecture, but that is certainly not the case! Development continued throughout the 19th Century, including buildings such as the Biblioteca Nacional:

Biblioteca Nacional

Biblioteca Nacional, the National Library. To make this post tangentially related to celestial mechanics, the figure seated at front right is King Alfonso "The Wise" of Castile, who ordered the compilation of the Alphonsine Tables of planetary positions, used in Europe throughout the Middle Ages and into the Renaissance.

The 20th Century saw the transformation of Madrid’s skyline with the introduction of skyscrapers, one of the first being the Telefónica building:


The Telefónica building on the Gran Vía.

To one used to the more modest architecture of Cambridge, all the tall buildings here felt somewhat overbearing at first. I am, however, grateful for the shade they provide from the Sun here (note the sparsity of clouds in these pictures!).

What I did on my holidays

If there hasn’t been much activity here recently, that’s because I’ve been:

  1. On holiday, visiting my family in Sheffield, a city rich in industrial heritage

    Bessemer Converter at Kelham Island Industrial Museum, Sheffield

    and surrounded by lovely countryside

    Hope Valley, Derbyshire, seen from Mam Tor. The nearby village is Castleton; the Hope Valley Cement Works are behind it, and Winnats Pass is to the right.

  2. Preparing for and undergoing my Ph.D. viva in Cambridge, where I said farewell to my beautiful office building

    The Observatory Building, Institute of Astronomy, Cambridge

  3. Attending a meeting in the Spanish Sierra

    Sierra de Guadarrama, just north of Madrid

  4. And settling into my new job in Madrid

    The picturesque view from my new office window!

Back to Celestial Mechanisation soon!