Overview of Linguistics - Reading Notes on The Study of Language (Second Edition)

I came across the book The Study of Language on a social network (Qzone) and was lucky enough to borrow it from a schoolmate. Before diving into the field of natural language processing, I’d like to get a rough understanding of linguistics, and this book is a good choice. In this post I’ll take notes on the second edition of The Study of Language (though the book has reached its sixth edition by now).

The origins of language


We simply do not know how language originated. Some speculations about the origins of language: the divine source, the natural-sound source and the oral-gesture source.
Taking a different view, glossogenetics and physiological adaptation suggest that human teeth (upright), lips (intricate muscle interlacing), larynx (with pharynx) and brain (lateralized) make language possible.

Major functions of language use

Interactional function: has to do with how humans use language to interact with each other, socially or emotionally; how they indicate friendliness, co-operation or hostility, or annoyance, pain, or pleasure.
Transactional function: has to do with how humans use their linguistic abilities to communicate knowledge, skills and information.

The development of writing

Pictograms and ideograms

Pictograms: picture-writing
Ideograms: idea-writing
The distinction between pictograms and ideograms is essentially a difference in the relationship between the symbol and the entity it represents. The more ‘picture-like’ forms are pictograms; the more abstract, derived forms are ideograms. A key property of both pictograms and ideograms is that they do not represent words or sounds in a particular language.


When symbols come to be used to represent words in a language, they are described as examples of word-writing, or logograms.
Cuneiform writing: normally referred to when the expression “the earliest known writing system” is used.
Characters(Chinese writing): the longest continuous history of use as a writing system.

Syllabic writing and alphabetic writing

To avoid substantial memory load, some principled method is required to go from symbols which represent words(i.e. a logographic system) to a set of symbols which represent sounds(i.e. a phonographic system).
Syllabic writing: when a writing system employs a set of symbols which represent the pronunciations of syllables, it is described as syllabic writing. (There are no purely syllabic writing systems in use today.)
Alphabetic writing: the symbols can be used to represent single sound types in a language. An alphabet is essentially a set of written symbols which each represent a single type of sound.

The properties of language

Communicative versus informative

Communicative: intentionally communicating something
Informative: unintentionally sent signals

Unique properties

Displacement: It allows the users of language to talk about things and events not present in the immediate environment. It enables us to talk about things and places whose existence we cannot even be sure of.
Arbitrariness: A property of linguistic signs is their arbitrary relationship with the objects they are used to indicate. They do not, in any way, ‘fit’ the objects they denote.
Productivity(creativity/open-endedness): The potential number of utterances in any human language is infinite.
Cultural transmission: Humans are not born with the ability to produce utterances in a specific language; the language is acquired from other speakers and passed on from one generation to the next.
Discreteness: Each sound in the language is treated as discrete.
Duality: Language is organized at two levels or layers simultaneously. At one level, we have distinct sounds, and at another level, we have distinct meanings.

Other properties

The use of the vocal-auditory channel: Human linguistic communication is typically generated via the vocal organs and perceived via the ears.
Reciprocity: Any speaker/sender of a linguistic signal can also be a listener/receiver.
Specialization: Linguistic signals do not normally serve any other type of purpose, such as breathing or feeding.
Non-directionality: Linguistic signals can be picked up by anyone within hearing, even unseen.
Rapid fade: Linguistic signals are produced and disappear quickly.
Most of these are properties of the spoken language, but not of the written language.

The sounds of language

Phonetics: the general study of the characteristics of speech sounds
Articulatory phonetics: the study of how speech sounds are made, or ‘articulated’
Acoustic phonetics: deals with the physical properties of speech as sound waves ‘in the air’
Auditory(Perceptual) phonetics: deals with the perception, via the ear, of speech sounds
Forensic phonetics: has applications in legal cases involving speaker identification and the analysis of recorded utterances

Voiced and voiceless

When the vocal cords are spread apart, the air from the lungs passes between them unimpeded. Sounds produced in this way are described as voiceless.
When the vocal cords are drawn together, the air from the lungs repeatedly pushes them apart as it passes through, creating a vibration effect. Sounds produced in this way are described as voiced.

Place of articulation

Bilabials: sounds formed using both upper and lower lips.

Labiodentals: sounds formed with the upper teeth and the lower lip.

Dentals: sounds formed with the tongue tip behind the upper front teeth.
Includes [ð], the sound of th in the, there and then.

Alveolars: sounds formed with the front part of the tongue on the alveolar ridge.

Alveo-palatals: sounds formed with the tongue at the very front of the palate, near the alveolar ridge.

Velars: sounds formed with the back of the tongue against the velum.

Glottal: a sound produced without the active use of the tongue and other parts of the mouth.

Manner of articulation

Stops: consonant sounds resulting from a blocking or stopping effect on the airstream.

Fricatives: the airstream is almost blocked, and as the air is pushed through, a type of friction is produced.

Affricates: combine a brief stopping of the airstream with an obstructed release which causes some friction.

Nasals: the velum is lowered and the airstream is allowed to flow out through the nose.

Approximants: the articulation of each is strongly influenced by the following vowel sound.

(The sections on vowels and on the sound patterns of language are omitted.)

Words and word-formation processes

Coinage: the invention of totally new terms. The most typical sources are invented trade names for one company’s product which become general terms for any version of that product.
aspirin, nylon, zipper

Borrowing: the taking over of words from other languages.
alcohol, boss, piano
A special type of borrowing is described as loan-translation, or calque. In this process, there is a direct translation of the elements of a word into the borrowing language.

Compounding: the joining of two separate words to produce a single form.
bookcase, fingerprint, wallpaper

Blending: typically accomplished by taking only the beginning of one word and joining it to the end of the other word.
smog (smoke + fog), bit (binary + digit), brunch (breakfast + lunch)

Clipping: occurs when a word of more than one syllable is reduced to a shorter form, often in casual speech.
fax (facsimile), gas (gasoline), ad (advertisement)

Backformation: a word of one type (usually a noun) is reduced to form another word of a different type (usually a verb).
televise (television), donate (donation), opt (option)

Conversion: a change in the function of a word, without any reduction.
paper (noun -> verb), guess (verb -> noun), empty (adjective -> verb)

Acronyms: new words formed from the initial letters of a set of other words.
CD (compact disk), radar (radio detecting and ranging), ATM (automatic teller machine)

Derivation: accomplished by means of a large number of affixes which are not usually given separate listings in dictionaries.
prefix: added to the beginning of the word (un-)
suffix: added to the end of the word (-ish)
infix: incorporated inside another word (unfuckingbelievable)


Morphology, which literally means the ‘study of forms’, was originally used in biology, but, since the middle of the nineteenth century, has also been used to describe that type of investigation which analyzes all those morphemes which are used in a language.
The definition of a morpheme is “a minimal unit of meaning or grammatical function”.

  • morphemes
    • free
      • lexical
      • functional
    • bound
      • derivational
      • inflectional

Free and bound morphemes

Free morphemes

Morphemes which can stand by themselves as single words.

Lexical morphemes: a set of ordinary nouns, adjectives and verbs which we think of as the words which carry the ‘content’ of messages we convey.
boy, man, house, tiger, sad, long, yellow, sincere, open, look, follow, break
‘Open’ class of words(we can add new lexical morphemes to the language rather easily).

Functional morphemes: a set consisting largely of the functional words in the language, such as conjunctions, prepositions, articles and pronouns.
and, but, when, because, on, near, above, in, the, that, it
‘Closed’ class of words(we almost never add new functional morphemes to the language).

Bound morphemes

Morphemes which cannot normally stand alone, but are typically attached to another form.

Derivational morphemes: used to make new words in the language, often words of a different grammatical category from the stem. (When free morphemes are used with bound morphemes, the basic word-form involved is technically known as the stem.)
(-ness, -ful, -less, -ish, -ly, re-, pre-, ex-, dis-, co-, un-)

Inflectional morphemes: to indicate aspects of the grammatical function of a word.
Noun + -’s (possessive), -s (plural)
Verb + -s (3rd person present singular), -ing (present participle), -ed (past tense), -en (past participle)
Adjective + -est (superlative), -er (comparative)

An inflectional morpheme never changes the grammatical category of a word. A derivational morpheme can change the grammatical category of a word.
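
The classification above can be sketched as a small lookup. This is only an illustration: the tiny lexicon below reuses examples from these notes, the function names `classify` and `can_change_category` are made up for this sketch, and words are assumed to be segmented into morphemes by hand.

```python
# Illustrative mini-lexicon built from the examples in these notes.
# Real morphological analysis needs far more than set membership.
LEXICAL = {"boy", "man", "house", "sad", "long", "open", "look", "break"}
FUNCTIONAL = {"and", "but", "when", "because", "on", "near", "the", "that", "it"}
DERIVATIONAL = {"-ness", "-ful", "-less", "-ish", "-ly",
                "re-", "pre-", "ex-", "dis-", "co-", "un-"}
INFLECTIONAL = {"-'s", "-s", "-ing", "-ed", "-en", "-est", "-er"}


def classify(morpheme):
    """Return (free/bound, subtype) for a morpheme in the mini-lexicon."""
    if morpheme in LEXICAL:
        return ("free", "lexical")
    if morpheme in FUNCTIONAL:
        return ("free", "functional")
    if morpheme in DERIVATIONAL:
        return ("bound", "derivational")
    if morpheme in INFLECTIONAL:
        return ("bound", "inflectional")
    raise ValueError(f"morpheme not in the illustrative lexicon: {morpheme!r}")


def can_change_category(morpheme):
    """Only derivational morphemes can change a word's grammatical category."""
    return classify(morpheme) == ("bound", "derivational")


# Hand-segmented examples: sad + -ness (adjective -> noun), look + -ed (still a verb).
for word in (["sad", "-ness"], ["look", "-ed"]):
    print([(m, classify(m), can_change_category(m)) for m in word])
```

Keeping the four sets separate mirrors the tree of morpheme types; a form such as -er, which can also act derivationally (as in teacher), would need a richer representation than this sketch offers.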

Phrases and sentences: grammar

We need a way of describing the structure of phrases and sentences which will account for all of the grammatical sequences and rule out all ungrammatical sequences. Providing such an account involves us in the study of grammar.

The parts of speech
Nouns are words used to refer to people, objects, creatures, places, qualities, phenomena and abstract ideas as if they were all ‘things’.
Adjectives are words used, typically with nouns, to provide more information about the ‘things’ referred to. (happy, large, cute)
Verbs are words used to refer to various kinds of actions(run, jump) and states(be, seem) involving the ‘things’ in events.
Adverbs are words used to provide more information about the actions and events (slowly, suddenly). Some adverbs (really, very) are also used with adjectives to modify the information about ‘things’.
Prepositions are words(at, in, on, near, with, without) used with nouns in phrases providing information about time, place and other connections involving actions and things.
Pronouns are words(me, they, he, himself, this, it) used in place of noun phrases, typically referring to things already known.
Conjunctions are words(and, but, although, if) used to connect, and indicate relationships between events and things.

Traditional grammar

In addition to the terms used for the parts of speech, traditional grammatical analysis also gave us a number of other categories, including ‘number’, ‘person’, ‘tense’, ‘voice’ and ‘gender’.
Number: whether the noun is singular or plural.
Person covers the distinctions of first person(involving the speaker), second person(involving the hearer) and third person(involving any others).
Tense: present tense, past tense, future tense.
Voice: active voice, passive voice
Gender: in English, described in terms of natural gender, mainly derived from a biological distinction between male and female. (Grammatical gender is common in other languages but may not be as appropriate for describing English.)

The prescriptive approach

The view of grammar as a set of rules for the ‘proper’ use of a language is still to be found today and may be best characterized as the prescriptive approach.
Some familiar examples of prescriptive rules for English sentences:

  1. You must not split an infinitive.
  2. You must not end a sentence with a preposition.

The descriptive approach

Analysts collect samples of the language they are interested in and attempt to describe the regular structures of the language as it is used, not according to some view of how it should be used. This is called the descriptive approach and it is the basis of most modern attempts to characterize the structure of different languages.
Structural analysis
One type of descriptive approach is called structural analysis and its main concern is to investigate the distribution of forms (e.g., morphemes) in a language. The method employed involves the use of ‘test-frames’, which can be sentences with empty slots in them.
Immediate constituent analysis
An approach with the same descriptive aims is called immediate constituent analysis. The technique employed in this approach is designed to show how small constituents(or components) in sentences go together to form larger constituents.
The analysis of the constituent structure of the sentence can be represented in different types of diagrams. (Simple diagram, labeled and bracketed sentences, tree diagrams, discussed in the following chapter in detail)


The word ‘syntax’ came originally from Greek and literally meant ‘a setting out together’ or ‘arrangement’.

Generative grammar
Since the 1950s, there have been attempts to produce a particular type of grammar with a very explicit system of rules specifying which combinations of basic elements result in well-formed sentences.
Given a simple algebraic expression whose variables can take any value, the expression can generate an endless set of values by following the simple rules of arithmetic. The endless set of such results is ‘generated’ by the operation of the explicitly formalized rules. If the sentences of a language can be seen as a comparable set, then there must be a set of explicit rules which yield those sentences. Such a set of explicit rules is a generative grammar.

Some properties of the grammar

  • The grammar will generate all the well-formed syntactic structures(e.g. sentences) of the language and fail to generate any ill-formed structures.
  • The grammar will have a finite(i.e. limited) number of rules, but will be capable of generating an infinite number of well-formed structures.
  • Property of recursion: the capacity to be applied more than once in generating a structure.
  • The grammar will have to capture the fact that a sentence can have another sentence inside it, or a phrase can have another phrase of the same type inside it.
  • The grammar should be capable of revealing how superficially distinct sentences are closely related.
  • The grammar should be capable of revealing how some superficially similar sentences are in fact distinct.
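
To make the ‘finite rules, infinite structures’ and recursion properties concrete, here is a minimal sketch of a toy grammar. The rules and vocabulary in `RULES` are invented for illustration (they are not taken from the book), and `expand` is a hypothetical helper that lists every word sequence derivable within a given nesting depth.

```python
# A toy grammar, invented for illustration. NP can contain a PP, and PP
# contains an NP, so NP is recursive: a phrase inside a phrase of the same type.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Art", "N"], ["Art", "N", "PP"]],
    "VP": [["V", "NP"]],
    "PP": [["Prep", "NP"]],
    "Art": [["the"]],
    "N": [["dog"], ["park"]],
    "V": [["saw"]],
    "Prep": [["near"]],
}


def expand(symbol, depth):
    """Return all word sequences derivable from symbol with at most
    `depth` nested rule applications."""
    if symbol not in RULES:      # a plain word: nothing left to rewrite
        return [[symbol]]
    if depth == 0:               # out of budget before reaching words
        return []
    results = []
    for right_hand_side in RULES[symbol]:
        partials = [[]]
        for part in right_hand_side:
            sub_expansions = expand(part, depth - 1)
            partials = [p + s for p in partials for s in sub_expansions]
        results.extend(partials)
    return results


# The rule set never changes, yet each extra level of depth yields more sentences.
for depth in (4, 5, 6):
    print(depth, len(expand("S", depth)))
print(" ".join(expand("S", 4)[0]))
```

The depth limit exists only so the enumeration terminates; without it, the same eight rules would keep producing longer and longer noun phrases.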

Symbols and abbreviations in syntactic description

S sentence
N noun
Pro pronoun
PN proper noun
V verb
Adj adjective
Art article
Adv adverb
Prep preposition
NP noun phrase
VP verb phrase
PP prepositional phrase
* ungrammatical sequence
-> consists of
() optional constituent
{} one and only one of these constituents must be selected

(These may differ slightly from the symbols used in the post COMS W4705 Natural Language Processing Note, but that doesn’t matter. Tree diagrams are introduced in that post as well.)
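
As a rough illustration of how the () and {} symbols in the list above are read, the sketch below expands one rule into the concrete sequences it allows. The rule NP -> {Art (Adj) N, Pro, PN} is used as an example and does not appear in these notes; the encoding as (symbol, optional) pairs and the helper names are assumptions of this sketch.

```python
from itertools import product


def expand_optional(constituents):
    """Expand a sequence where some constituents are optional.

    Each item is (symbol, optional); optional=True stands for (symbol).
    """
    choices = [[(symbol,), ()] if optional else [(symbol,)]
               for symbol, optional in constituents]
    return [tuple(s for part in combo for s in part)
            for combo in product(*choices)]


def expand_choice(alternatives):
    """Expand {A, B, C}: exactly one of the alternatives is selected."""
    expansions = []
    for alternative in alternatives:
        expansions.extend(expand_optional(alternative))
    return expansions


# NP -> {Art (Adj) N, Pro, PN}
NP_ALTERNATIVES = [
    [("Art", False), ("Adj", True), ("N", False)],
    [("Pro", False)],
    [("PN", False)],
]

for sequence in expand_choice(NP_ALTERNATIVES):
    print("NP ->", " ".join(sequence))
```

Under this reading, a single rule with one optional constituent and three alternatives stands for four separate rewrite rules.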


Semantics is the study of the meaning of the words, phrases and sentences. Linguistic semantics deals with the conventional meaning conveyed by the use of words and sentences of a language.

Semantic features

Analyze meaning in terms of semantic features. Features such as +animate, -animate; +human, -human, for example, can be treated as the basic features involved in differentiating the meaning of each word in the language from every other word.
This approach gives us the ability to predict which nouns would make a sentence semantically odd.
However, for many words in a language it may not be so easy to come up with neat components of meaning. Part of the problem seems to be that the approach involves a view of words in a language as some sort of ‘containers’, carrying meaning-components.
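
The feature-based prediction described above can be sketched as follows. The feature table, the verb requirements and the function name `is_semantically_odd` are all invented for illustration; a real lexicon would not reduce to two binary features, which is exactly the limitation noted above.

```python
# Hypothetical feature assignments: True means +feature, False means -feature.
NOUN_FEATURES = {
    "boy": {"animate": True, "human": True},
    "dog": {"animate": True, "human": False},
    "table": {"animate": False, "human": False},
}

# Hypothetical requirements that a verb places on its subject noun.
SUBJECT_REQUIREMENTS = {
    "eat": {"animate": True},
    "read": {"human": True},
}


def is_semantically_odd(subject, verb):
    """Predict oddness: the subject lacks a feature value the verb requires."""
    features = NOUN_FEATURES[subject]
    required = SUBJECT_REQUIREMENTS[verb]
    return any(features[name] != value for name, value in required.items())


for subject, verb in [("boy", "read"), ("dog", "read"), ("table", "eat")]:
    label = "odd" if is_semantically_odd(subject, verb) else "fine"
    print(f"The {subject} {verb}s ... -> {label}")
```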

Semantic roles

Instead of thinking of the words as ‘containers’ of meaning, we can look at the ‘roles’ they fulfill within the situation described by a sentence.

agent: the entity that performs the action
theme: the entity that is involved in or affected by the action
instrument: if an agent uses another entity in performing an action, that other entity fills the role of instrument
experiencer: when a noun phrase designates an entity as the person who has a feeling, a perception or a state
location: where an entity is
source: where an entity moves from
goal: where an entity moves to

Lexical relations

Characterize the meaning of a word not in terms of its component features, but in terms of its relationship to other words. This procedure has also been used in the semantic description of languages and is treated as the analysis of lexical relations.

Synonymy: two or more forms with very closely related meanings, which are often, but not always, intersubstitutable in sentences.

Antonymy: two forms with opposite meanings.
Gradable antonyms, such as the pair big-small, can be used in comparative constructions like bigger than-smaller than. Also, the negative of one member of the gradable pair does not necessarily imply the other.
Non-gradable antonyms are also called ‘complementary pairs’. Comparative constructions are not normally used with them, and the negative of one member does imply the other.

Hyponymy: when the meaning of one form is included in the meaning of another, the relationship is described as hyponymy. (The meaning of animal is ‘included’ in the meaning of dog; in other words, dog is a hyponym of animal.)
When we consider hyponymous relations, we are essentially looking at the meaning of words in some type of hierarchical relationship.
In such a hierarchy, two or more terms which share the same superordinate (higher-up) term are co-hyponyms.
The relation of hyponymy captures the idea of ‘is a kind of’.
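
A small sketch of the ‘is a kind of’ hierarchy, assuming each term has a single superordinate. The hierarchy in `SUPERORDINATE` and the function names are made up for illustration.

```python
# Hypothetical hierarchy: each term maps to its superordinate (higher-up) term.
SUPERORDINATE = {
    "dog": "animal",
    "horse": "animal",
    "animal": "creature",
    "ant": "insect",
    "insect": "creature",
    "creature": "living thing",
}


def is_kind_of(term, ancestor):
    """True if `term` is a hyponym of `ancestor`, directly or indirectly."""
    current = SUPERORDINATE.get(term)
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERORDINATE.get(current)
    return False


def are_co_hyponyms(first, second):
    """True if two different terms share the same immediate superordinate."""
    parent = SUPERORDINATE.get(first)
    return first != second and parent is not None and parent == SUPERORDINATE.get(second)


print(is_kind_of("dog", "creature"))     # walks dog -> animal -> creature
print(are_co_hyponyms("dog", "horse"))   # both sit directly under animal
```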

Prototypes: the concept of a prototype helps explain the meaning of certain words not in terms of component features, but in terms of resemblance to the clearest exemplar. (For many American English speakers, the prototype of ‘bird’ is the robin.)

Homophony: two or more different (written) forms have the same pronunciation. (bare-bear, meat-meet)

Homonymy: one form (written and spoken) has two or more unrelated meanings. (bank: of a river; financial institution. race: contest of speed; ethnic group)

Polysemy: one form (written or spoken) has multiple meanings which are all related by extension. (head, foot)

(Some other lexical relations like meronyms and holonyms are introduced in this post)


Metonymy: a relationship between words based simply on a close connection in everyday experience. That close connection can be based on a container-contents relation (bottle-coke; can-juice), a whole-part relation (car-wheel; house-roof) or a representative-symbol relationship (king-crown; the President-the White House).
Many examples of metonymy are highly conventionalized and easy to interpret. However, many others depend on an ability to infer what the speaker has in mind.


Collocation: one way we seem to organize our knowledge of words is simply in terms of collocation, that is, which words frequently occur together.
(butter-bread, needle-thread, salt-pepper)


When we read or hear pieces of language, we normally try to understand not only what the words mean, but what the writer or speaker of those words intended to convey. The study of ‘intended speaker meaning’ is called pragmatics.


Linguistic context(co-text)
The co-text of a word is the set of other words used in the same phrase or sentence. This surrounding co-text has a strong effect on what we think the word means.

Physical context
Our understanding of much of what we read and hear is tied to the physical context, particularly the time and place, in which we encounter linguistic expressions.


There are some words in the language that cannot be interpreted at all unless the physical context, especially the physical context of the speaker, is known. Expressions, which depend for their interpretation on the immediate physical context in which they were uttered, are very obvious examples of bits of language which we can only understand in terms of speaker’s intended meaning. These are technically known as deictic expressions.
Person deixis: used to point to a person(me, you, him, them)
Place deixis: (here, there, yonder)
Time deixis: (now, then, tonight, last week)


Reference: an act by which a speaker (or writer) uses language to enable a listener (or reader) to identify something.
We can use names associated with things to refer to people and names of people to refer to things. The key process here is called inference. An inference is any additional information used by the listener to connect what is said to what must be meant.


When we establish a referent and subsequently refer to the same object, we have a particular kind of referential relationship. The second referring expression is an example of anaphora and the first mention is called the antecedent.
Anaphora can be defined as subsequent reference to an already introduced entity. Mostly we use anaphora in texts to maintain reference.


What a speaker assumes is true or is known by the hearer can be described as a presupposition.
Constancy under negation: although two sentences have opposite meanings, the underlying presupposition remains true in both.

Speech acts

In very general terms, we can usually recognize the type of ‘act’ performed by a speaker in uttering a sentence. The use of the term speech act covers ‘actions’ such as ‘requesting’, ‘commanding’, ‘questioning’ and ‘informing’.

Forms           Functions
Interrogative   Question
Imperative      Command (request)
Declarative     Statement

Direct speech act: one of the forms in the set above is used to perform the corresponding function
Indirect speech act: whenever one of the forms in the set above is used to perform a function other than the one listed beside it
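
The direct/indirect distinction above amounts to checking a form against its typical function. A minimal sketch, in which the labels are plain strings and the function name `speech_act_type` is made up for illustration:

```python
# Typical pairings of sentence form and function, following the table above.
TYPICAL_FUNCTIONS = {
    "interrogative": {"question"},
    "imperative": {"command", "request"},
    "declarative": {"statement"},
}


def speech_act_type(form, function):
    """Direct if the form performs its typical function, otherwise indirect."""
    return "direct" if function in TYPICAL_FUNCTIONS[form] else "indirect"


# "Can you pass the salt?" has interrogative form but is normally meant as a request.
print(speech_act_type("interrogative", "request"))
# "Pass the salt." has imperative form and is used as a command or request.
print(speech_act_type("imperative", "request"))
```

Deciding which function an utterance actually performs is the hard, context-dependent part; this sketch takes that judgment as given.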


Politeness is showing awareness of another person’s face. Face is a person’s public self-image: the emotional and social sense of self that every person has and expects everyone else to recognize.

face-threatening act: say something that represents a threat to another person’s self-image (use a direct speech act to order someone to do something)
face-saving act: say something that lessens the possible threat to another’s face (use an indirect speech act instead)

negative face: the need to be independent and to have freedom from imposition
positive face: the need to be connected, to belong, to be a member of the group

Discourse analysis

When we ask how it is that we, as language users, make sense of what we read in texts, understand what speakers mean despite what they say, recognize connected as opposed to jumbled or incoherent discourse, and successfully take part in that complex activity called conversation, we are undertaking what is known as discourse analysis.


Texts must have a certain structure which depends on factors quite different from those required in the structure of a single sentence. Some of those factors are described in terms of cohesion, or the ties and connections which exist within texts.
Analysis of cohesive links within a text gives us some insight into how writers structure what they want to say, and may be crucial factors in our judgments on whether something is well-written or not.


There must be some factor which leads us to distinguish connected texts which make sense from those which do not. This factor is usually described as coherence.
The key to the concept of coherence is not something which exists in the language, but something which exists in people. It is people who ‘make sense’ of what they read and hear.

The co-operative principle

An underlying assumption in most conversational exchanges seems to be that the participants are co-operating with each other.

Four maxims
Quantity: Make your contribution as informative as is required, but not more, or less, than is required
Quality: Do not say that which you believe to be false or for which you lack evidence
Relation: Be relevant
Manner: Be clear, brief and orderly

Background knowledge

We actually create what the text is about, based on our expectations of what normally happens. In attempting to describe this phenomenon, many researchers use the concept of a ‘schema’.
A schema is a general term for a conventional knowledge structure which exists in memory. We have many schemata which are used in the interpretation of what we experience and what we hear or read about.
One particular kind of schema is a ‘script’. A script is essentially a dynamic schema, in which a series of conventional actions takes place.


Parts of the brain

Broca’s area
Paul Broca, a French surgeon, reported in the 1860s that damage to this specific part of the brain was related to extreme difficulty in producing speech. It was noted that damage to the corresponding area on the right hemisphere had no such effect. This finding was first used to argue that language ability must be located in the left hemisphere and since then has been taken as more specifically illustrating that Broca’s area is crucially involved in the production of speech.

Wernicke’s area
Carl Wernicke was a German doctor who, in the 1870s, reported that damage to this part of the brain was found among patients who had speech comprehension difficulties. This finding confirmed the left-hemisphere location of language ability and led to the view that Wernicke’s area is part of the brain crucially involved in the understanding of speech.

The motor cortex
The motor cortex generally controls movement of the muscles. Close to Broca’s area is the part of the motor cortex that controls the articulatory muscles of the face, jaw, tongue and larynx. Evidence that this area is involved in the actual physical articulation of speech comes from the work, reported in the 1950s, of two neurosurgeons, Penfield and Roberts.

The arcuate fasciculus
The arcuate fasciculus is a bundle of nerve fibers. This was also one of Wernicke’s discoveries and forms a crucial connection between Wernicke’s area and Broca’s area.

The word is heard and comprehended via Wernicke’s area. This signal is then transferred via the arcuate fasciculus to Broca’s area, where preparations are made to produce it. A signal is then sent to the motor cortex to physically articulate the word. (This is a massively oversimplified version of what may actually take place.)

Tongue tips and slips

There is the tip-of-the-tongue phenomenon in which you feel that some word is just eluding you, that you know the word, but it just won’t come to the surface.
The experience which occurs with uncommon terms or names suggests that our ‘word-storage’ may be partially organized on the basis of some phonological information and that some words in that ‘store’ are more easily retrieved than others. When we make mistakes in this retrieval process, there are often strong phonological similarities between the target word and the mistake. Mistakes of this type are sometimes referred to as Malapropisms.
A slip-of-the-tongue often results in tangled expressions or word reversals. This type of slip is also known as a Spoonerism.
Most everyday slips-of-the-tongue are simply the result of a sound being carried over from one word to the next, or a sound used in one word in anticipation of its occurrence in the next word.
Slips-of-the-ear can result in mishearing, that is, misinterpreting what was actually said.


Aphasia is defined as an impairment of language function due to localized cerebral damage which leads to difficulty in understanding and/or producing linguistic forms.
Broca’s aphasia
The type of serious language disorder known as Broca’s aphasia(also called ‘motor aphasia’) is characterized by a substantially reduced amount of speech, distorted articulation and slow, often effortful speech. What is said often consists almost entirely of lexical morphemes(e.g. nouns and verbs). The frequent omission of functional morphemes(e.g. articles, prepositions, inflections) has led to the characterization of this type of aphasia as agrammatic. The grammatical markers are missing.
In Broca’s aphasia, comprehension is typically much better than production.

Wernicke’s aphasia
The type of language disorder which results in difficulties in auditory comprehension is sometimes called ‘sensory aphasia’, but is more commonly known as Wernicke’s aphasia. Someone suffering from this disorder can actually produce very fluent speech which is, however, often difficult to make sense of.
Difficulty in finding the correct words(sometimes referred to as anomia) is also very common and circumlocution may be used.

Conduction aphasia
This type of aphasia is identified with damage to the arcuate fasciculus and is called conduction aphasia. Individuals suffering from this disorder typically do not have articulation problems. They are fluent, but may have disrupted rhythm because of pauses and hesitations. Comprehension of spoken words is normally good. However, the task of repeating a word or phrase (spoken by someone else) will create major difficulty. What is heard and understood cannot be transferred to the speech production area.

Dichotic listening

An experimental technique which has demonstrated that, for the majority of subjects tested, the language functions must be located in the left hemisphere is called the dichotic listening test. This is a technique which uses the generally established fact that anything experienced on the right-hand side of the body is processed in the left hemisphere of the brain and anything on the left side is processed in the right hemisphere.
An experiment is possible in which a subject sits with a set of earphones on and is given two different sound signals simultaneously, one through each earphone. When asked to say what was heard, the subject more often correctly identifies the sound which came via the right ear. This has come to be known as the right ear advantage.

The explanation of this process proposes that a language signal received through the left ear is first sent to the right hemisphere and then has to be sent over to the left hemisphere (the language center) for processing. This non-direct route takes longer than that of a linguistic signal which is received through the right ear and goes directly to the left hemisphere. The first signal to get processed wins.

Language history and change


the historical study of languages
Cognates: within groups of related languages, we often find close similarities in particular sets of terms. A cognate of a word in one language is a word in another language which has a similar form and is, or was, used with a similar meaning.
Comparative reconstruction: the aim of this procedure is to reconstruct what must have been the original, or ‘proto’, form in the common ancestral language. It’s a bit like trying to work out what the great-grandmother must have been like on the basis of common features possessed by the set of granddaughters.
The majority principle: if, in a cognate set, three forms begin with a [p] sound and one form begins with a [b] sound, then our best guess is that the majority have retained the original sound(i.e. [p]), and the minority has changed a little through time.
The most natural development principle: based on the fact that certain types of sound-change are very common, whereas others are extremely unlikely.
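
The majority principle is simple enough to sketch as a vote over a cognate set. The function name `majority_sound` and the input format (one initial sound per cognate) are assumptions of this sketch; the most natural development principle is not modelled here, so a tie is simply left undecided.

```python
from collections import Counter


def majority_sound(sounds):
    """Return the most common sound in a cognate set, or None on a tie."""
    ranked = Counter(sounds).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no majority; another principle would have to decide
    return ranked[0][0]


# Three cognates begin with [p] and one with [b], as in the example above.
print(majority_sound(["p", "p", "p", "b"]))
```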

Language change

Sound changes
metathesis: involves a reversal in position of two adjoining sounds
epenthesis: involves the addition of a sound to the middle of a word
prothesis: involves the addition of a sound to the beginning of a word
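
The three sound changes above can be pictured as simple operations on a sequence of sounds. In this sketch, letters stand in for sounds and the example strings are simplified illustrations rather than careful transcriptions; the function names just mirror the terms above.

```python
def metathesis(sounds, i):
    """Reverse the two adjoining sounds at positions i and i + 1."""
    items = list(sounds)
    items[i], items[i + 1] = items[i + 1], items[i]
    return "".join(items)


def epenthesis(sounds, i, new_sound):
    """Add a sound in the middle of a word, before position i."""
    if not 0 < i < len(sounds):
        raise ValueError("epenthesis adds a sound inside the word")
    return sounds[:i] + new_sound + sounds[i:]


def prothesis(sounds, new_sound):
    """Add a sound to the beginning of a word."""
    return new_sound + sounds


print(metathesis("aks", 1))        # the k and s swap places
print(epenthesis("timr", 3, "b"))  # a b appears between m and r
print(prothesis("sta", "e"))       # an e is added at the front
```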

Lexical changes


In general terms, sociolinguistics deals with the inter-relationships between language and society. It has strong connections to anthropology, through the investigation of language and culture, and to sociology, through the crucial role that language plays in the organization of social groups and institutions. It is also tied to social psychology, particularly with regard to how attitudes and perceptions are expressed and how in-group and out-group behaviors are identified.

Social dialects

Varieties of language used by groups defined according to class, education, age, sex, and a number of other social parameters.
Factors include: social class and education; age and gender; ethnic background; idiolect; style, register and jargon; diglossia.

Language and culture

Linguistic determinism: language determines thought
The Sapir-Whorf hypothesis: we dissect nature along lines laid down by our native languages.