1. Linguistic units

Linguistic texts are a form of meta-language, i.e. language about language. These texts often quote linguistic data:

[Figure: example from Norman (1988: 92)]

Typically, different kinds of data (speech sounds, meanings, etc.) need to be distinguished. Each of these is transcribed in its own recognizable way.

Transcription conventions are useful because they distinguish not just between different kinds of data, but also between these data and the main text in a piece of linguistic writing.


Without special conventions:

Instead of thus, you can use so. Both mean therefore, but thus sounds more formal, and so can also mean to that degree.

In linguistic transcription:

Instead of thus, you can use so. Both mean 'therefore', but thus sounds more formal, and so can also mean 'to that degree'.

The main principles are summarized below.

Linguistic transcription comes in different degrees of precision.

We distinguish two: a Strict transcription and the more popular Free transcription.

A. Strict transcription

Type of linguistic data; transcription   Examples from English

A referent (the thing in the world that an utterance refers to) is written in plain text, without any further convention.


door, game

A pronunciation (phonetic realization) is transcribed between square brackets.

One of the most widely used phonetic alphabets today is the International Phonetic Alphabet, or IPA. This is the system used here.


[dɔː], [dɔːɹ], [geɪm], [gɑɪm]

A form (phonological reality) is transcribed between slashes.

Perhaps confusingly, the symbols used to transcribe a form may also be IPA symbols, just as in phonetic transcription (above). But their value differs.

Phonological form is always defined for a single language or dialect. Each symbol represents a phoneme, and authors are free to choose their own symbols, as long as they document their choices.

This type of transcription requires a full phonemic analysis of the language or dialect in question.


/dɔr/, /gem/

A meaning (semantic reality) is transcribed in plain text between quotation marks.

To distinguish meanings from book and article citations, meanings can be transcribed between single quotation marks, and citations between double quotation marks.

But conventions vary, and you will find the opposite usage as well: single quotes for citations, and double quotes for meanings.


'door', 'game'

A linguistic sign (signe, i.e. form and meaning as a duality) is transcribed in italics.

This is the most common convention when quoting words, phrases and sentences from a given language.

In handwriting, italics are represented by underlining.


door, game


B. Free transcription

Less strictly, but more commonly, forms are transcribed as if they were signs: door, game.

The advantage of a Free transcription is that no phonological analysis is needed before forms can be transcribed: they can be written in regular orthography, in italics (or underlined).

The disadvantage is that this transcription offers no way to distinguish forms from signs.

2. Other data

Apart from the units that constitute language, the following types of data also appear frequently in linguistic texts.

Type of data; transcription   Examples

A citation from a book or article, i.e. the literal representation of another text, is placed between quotation marks.

To distinguish between cited text and transcribed meanings, the former may be written between double quotes and the latter between single quotes; or vice versa.


According to the Foreword in the Bloomsbury English dictionary (2004: x), the work celebrates "the richness and diversity of the many varieties of English encountered in daily life".

In the conclusion of Progress in language, Jespersen describes the relationship between emotional cries and the use of language. He was in no doubt about their relative chronology: "Men sang out their feelings long before they were able to speak their thoughts" (1894, 1992: 360).

Any change (deletion, correction or addition) in cited text needs to be marked by means of square brackets.

Deleted text is marked with three dots between square brackets.

(Square brackets are also used in phonetic transcriptions: see the examples above.)

According to Li (1981: 57), "to use an RVC [resultative verb compound], the agent must have initiated the primary action referred to by the compound [i.e. the meaning of the first element], while the use of néng 'can' only suggests the possibility of initiating the action".

Lyons (1977: 503) claims that "[e]very statement that can be made by uttering a simple sentence expresses a proposition, which, if it is informative [...], provides the answer to either an explicit or an implicit question".

A graphical form (orthographic reality) is transcribed between angle brackets.


The letter <ß> was replaced by <ss> in the German orthography of Switzerland ages ago.

The Rueb spelling of the dialect of The Hague has <E> to represent both the vowel /ɛ/ and the schwa vowel /ə/, as in <MAUDEL> 'model', <DE HAAG> 'The Hague' and <ENKELT> 'only'.

3. Examples

Each of the following sentences illustrates one or more conventions explained above.

Linguistic units are represented in a Free transcription whenever this is feasible.

1.  Instead of thus, you can use so. Both mean 'therefore', but thus sounds more formal, and so can also mean 'to that degree'.

2.  The word beard does not refer to hair on the upper lip, but to hair on the chin.

3.  In French, the masculine noun livre means 'book', while the feminine noun livre means 'pound': Un livre contient des lettres et des mots. 'A book contains letters and words.'; Une livre contient 3500 kilocalories. 'One pound contains 3500 kilocalories.'.

4.  For the vowel [ɪ] in bit, the jaw dips lower than for the vowel [i] in beat.

5.  In Finnish Menen metsään. 'I go into the forest.', metsään 'into the forest' is the illative case of metsä 'forest'.

6.  Adult speakers of Mandarin can refer to their father as wǒ bàba 'my father' without conveying the childish connotations associated with English my daddy.

7.  For Dutch Weet ik. 'I know.', the inversion between subject and verb signals the semantic presence of an implicit definite object 'it', which is lacking for Ik weet. 'I am cognizant.'.

8.  Comrie (1983: 100) describes the English form know as a "non-third person singular verb".

9.  According to Light's (1977: 35) comparison between the meanings of the Mandarin potential form and the auxiliary verb néng 'can', "in instances where external forces are at work or permission is required, néng is mandatory and the potential is excluded". This would make it inappropriate to say tiào bu guò qù 'cannot jump across' about someone with a broken ankle. But how do we establish that a broken ankle is an "external force"?

10.  In Dutch orthography, <a> in <dag> represents the phoneme /ɑ/, but in <dagen>, the same letter <a> represents the phoneme /a/.

4. Exercises

Now apply the conventions explained above to each of the following sentences.

For linguistic units, use a Free transcription whenever this is feasible.

1.  Spanish un millón one million is constructed with a following noun by means of the subordinative expression de of, as in un millón de personas one million individuals.

2.  In Swedish, the form kyrko- in the compound kyrkogård churchyard preserves the genitive case of kyrka church.

3.  For historical reasons, the Spanish phoneme b has two different representations in Spanish orthography: b and v.

4.  For vulgar Pekingese sā three items the place of articulation is the same as that of English th θ in think, but the Beijing sound has less friction and makes a more relaxed impression acoustically.

5.  According to Baxter and Sagart's (2014) Old Chinese: A new reconstruction, Mandarin xiū shame goes back to Old Chinese s-nu.

6.  Charles Hockett's article Peiping phonology of 1966 gives an accurate contemporary phoneme inventory of Peking Mandarin.

7.  Dutch children aged two sometimes separate labiality from continuity for the labial continuants m and w, leading to realizations such as pələˈzik̚ for muziek music and puləˈsej for w.c. toilet.

8.  The Mandarin particle a is used to communicate that the speaker expects a follow-up to match the preceding expression.

9.  The symbol く is read as ku in the Japanese hiragana script, and as qi in the Chinese Phonetic Spelling Alphabet.

10.  For Frisian roppe call, cry the doetiid or past tense is formed with the vowel ô: ik rôp I called, do rôpst you called, etcetera.

5. More exercises

Again, apply the conventions explained above to each of the following sentences.

For linguistic units, use a Free transcription whenever this is feasible.

1.  In Dutch, both boom tree and boon bean have a final nasal consonant.

2.  Native speakers of Mandarin may find it hard to distinguish between ear and year in English.

3.  Chat-en-Oeuf, which literally means cat in egg, may be a surprising name for a wine, since wines seem to bear no meaningful relation with cats, nor with eggs. However, the fact that the name sounds like Chateauneuf can hardly be a coincidence.

4.  Korean, according to Volume 18 of the English edition of Kim Il Sung's Works, is a very good language ... so rich that it is capable of expressing any complex thought or delicate feeling.

5.  Yuen Ren Chao's article 回想我在語言上犯過的錯誤 Huíxiáng wǒ zài yǔyán shang fàn guo de cuòwu has been translated by George Kao as Where I went wrong in matters of language.

6.  In her lecture 要充分注意虚词使用的语义背景 Yào chōngfèn zhùyì xūcí shǐyòng de yǔyì bèijǐng The significance of the semantic context in the use of function words, 马真 Mǎ Zhēn explains the differences between bìng actually, truly and què truly, indeed.

7.  In Mandarin, the high vowel i has a rounded counterpart y.

8.  The x in English orthography is sometimes pronounced as ks, sometimes as z.

9.  In the Pinyin transcription, the letter r represents both a retroflex consonant ʐ and a retroflex semivowel ɻ.

10.  The Chinese character 儿 is used not just for ér child, but also to write the suffix -r, as in jiēr today.


