Voice or voicing is a term used in phonetics and phonology to characterize speech sounds (usually consonants). Speech sounds can be described as either voiceless (otherwise known as unvoiced) or voiced.
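The term, however, is used to refer to two separate concepts: an articulatory one, in which the vocal folds vibrate while a sound is produced, and a phonological one, in which sounds are classified as voiced or voiceless even when they are not voiced at the articulatory level.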
For example, voicing accounts for the difference between the pair of sounds associated with the English letters "s" and "z". The two sounds are transcribed as [s] and [z] to distinguish them from the English letters, which have several possible pronunciations, depending on the context. If one places the fingers on the voice box (i.e. the location of the Adam's apple in the upper throat), one can feel a vibration while zzzz is pronounced but not with ssss. (For a more detailed, technical explanation, see modal voice and phonation.) In most European languages, with a notable exception being Icelandic, vowels and other sonorants (consonants such as m, n, l, and r) are modally voiced.
The International Phonetic Alphabet has distinct letters for many voiceless and voiced pairs of consonants (the obstruents), such as [p b], [t d], [k ɡ], [q ɢ]. In addition, there is a diacritic for voicedness: ⟨◌̬⟩. Diacritics are typically used with letters for prototypically voiceless sounds.
In Unicode, the voicing diacritic is encoded as U+032C ◌̬ COMBINING CARON BELOW (HTML &#812;) and the corresponding voicelessness diacritic as U+0325 ◌̥ COMBINING RING BELOW (HTML &#805;).
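Because both diacritics are combining characters, a marked symbol is simply the base letter followed by the diacritic's code point. The following is a minimal sketch using Python's standard `unicodedata` module; the helper names `mark` and `html_ref` are illustrative and not part of any standard.

```python
import unicodedata

VOICED = "\u032C"     # COMBINING CARON BELOW: marks voicing
VOICELESS = "\u0325"  # COMBINING RING BELOW: marks voicelessness


def mark(base: str, diacritic: str) -> str:
    """Attach a combining diacritic by appending it to the base letter."""
    return base + diacritic


def html_ref(ch: str) -> str:
    """Return the decimal numeric character reference for one code point."""
    return f"&#{ord(ch)};"


# Print the code point, Unicode name, and HTML reference of each diacritic.
for d in (VOICED, VOICELESS):
    print(f"U+{ord(d):04X}", unicodedata.name(d), html_ref(d))

# A voiced [s] and a devoiced [z], each two code points long.
print(mark("s", VOICED), mark("z", VOICELESS))
```

Whether the diacritic is drawn correctly under the letter depends on the font and the rendering engine, not on the encoding.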
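The extensions to the IPA mark partial voicing and devoicing by placing parentheses around the diacritic:
|Notation||Meaning||Notation||Meaning|
|₍s̬₎||partial/central voicing of [s]||₍z̥₎||partial/central devoicing of [z]|
|₍s̬||initial voicing||₍z̥||initial devoicing|
|s̬₎||final voicing||z̥₎||final devoicing|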
Partial voicing can mean light but continuous voicing, discontinuous voicing, or discontinuities in the degree of voicing. For example, ₍s̬₎ could be an [s] with (some) voicing in the middle and ₍z̥₎ could be a [z] with (some) devoicing in the middle.
Partial voicing can also be indicated in the normal IPA with transcriptions like [ᵇb̥iˑ] and [ædᵈ̥].
The distinction between the articulatory use of voice and the phonological use rests on the distinction between phone (represented between square brackets) and phoneme (represented between slashes). The difference is best illustrated by a rough example.
The English word nods is made up of a sequence of phonemes, represented symbolically as /nɒdz/, or the sequence of /n/, /ɒ/, /d/, and /z/. Each symbol is an abstract representation of a phoneme. Awareness of these phonemes is an inherent part of speakers' mental grammar and allows them to recognise words.
However, phonemes are not sounds in themselves. Rather, phonemes are, in a sense, converted to phones before being spoken. The /z/ phoneme, for instance, can actually be pronounced as either the [s] phone or the [z] phone since /z/ is frequently devoiced, even in fluent speech, especially at the end of an utterance. The sequence of phones for nods might be transcribed as [nɒts] or [nɒdz], depending on the presence or strength of this devoicing. While the [z] phone has articulatory voicing, the [s] phone does not have it.
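The mapping from a phoneme sequence to one of its possible phone sequences can be sketched as a simple rewrite rule. This is a toy model only: it treats utterance-final devoicing as all-or-nothing, whereas the text describes it as varying in presence and strength. The `DEVOICE` table and the `realize` function are hypothetical names chosen for illustration.

```python
# Voiced obstruents and their voiceless counterparts (a subset, for illustration).
DEVOICE = {"b": "p", "d": "t", "ɡ": "k", "v": "f", "ð": "θ", "z": "s", "ʒ": "ʃ"}


def realize(phonemes: list[str], devoice_final: bool) -> str:
    """Return a bracketed phone transcription for a phoneme sequence.

    If devoice_final is true, each voiced obstruent in the run of voiced
    obstruents at the end of the utterance is replaced by its voiceless
    counterpart. Everything else is passed through unchanged.
    """
    phones = list(phonemes)
    if devoice_final:
        i = len(phones) - 1
        # Walk backwards from the end while the segment is a voiced obstruent.
        while i >= 0 and phones[i] in DEVOICE:
            phones[i] = DEVOICE[phones[i]]
            i -= 1
    return "[" + "".join(phones) + "]"


nods = ["n", "ɒ", "d", "z"]          # the phonemes of /nɒdz/
print(realize(nods, devoice_final=False))
print(realize(nods, devoice_final=True))
```

The input stays the same in both calls, which mirrors the point of the paragraph: the phonemes /d/ and /z/ are unchanged, and only their phonetic realization differs.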
What complicates the matter is that in English, consonant phonemes are classified as either voiced or voiceless even though voicing is not the primary distinctive feature between them. Still, the classification is used as a stand-in for phonological processes, such as the vowel lengthening that occurs before voiced consonants but not before unvoiced consonants, or the changes in vowel quality (the sound of the vowel) that occur in some dialects of English before unvoiced but not voiced consonants. Such processes allow English speakers to continue to perceive a difference between voiced and voiceless consonants when the devoicing of the former would otherwise make them sound identical to the latter.
English has four pairs of fricative phonemes that can be divided into a table by place of articulation and voicing. The voiced fricatives can readily be felt to have voicing throughout the duration of the phone especially when they occur between vowels.
|Articulation||Voiceless||Voiced|
|Pronounced with the lower lip against the teeth:||[f] (fan)||[v] (van)|
|Pronounced with the tongue against the teeth:||[θ] (thin, thigh)||[ð] (then, thy)|
|Pronounced with the tongue near the gums:||[s] (sip)||[z] (zip)|
|Pronounced with the tongue bunched up:||[ʃ] (Confucian)||[ʒ] (confusion)|
However, in the class of consonants called stops, such as /p, t, k, b, d, ɡ/, the contrast is more complicated for English. The "voiced" sounds do not typically feature articulatory voicing throughout the sound. The difference between the unvoiced stop phonemes and the voiced stop phonemes is not just a matter of whether articulatory voicing is present or not. Rather, it includes when voicing starts (if at all), the presence of aspiration (airflow burst following the release of the closure) and the duration of the closure and aspiration.
English voiceless stops are generally aspirated at the beginning of a stressed syllable, and in the same context, their voiced counterparts are voiced only partway through. In narrower phonetic transcription, the voiced symbols may be used only to represent the presence of articulatory voicing, and aspiration is represented with a superscript h.
|Articulation||Voiceless||Voiced|
|Pronounced with the lips closed:||[p] (pin)||[b] (bin)|
|Pronounced with the tongue near the gums:||[t] (ten)||[d] (den)|
|Pronounced with the tongue bunched up:||[tʃ] (chin)||[dʒ] (gin)|
|Pronounced with the back of the tongue against the palate:||[k] (coat)||[ɡ] (goat)|
When the consonants come at the end of a syllable, however, what distinguishes them is quite different. Voiceless phonemes are typically unaspirated and glottalized, and the closure itself may not even be released, which sometimes makes it difficult to hear the difference between, for example, light and like. However, auditory cues remain to distinguish between voiced and voiceless sounds, such as the length of the preceding vowel, described above.
Other English sounds, the vowels and sonorants, are normally fully voiced. However, they may be devoiced in certain positions, especially after aspirated consonants, as in coffee, tree, and play in which the voicing is delayed to the extent of missing the sonorant or vowel altogether.
There are two variables to degrees of voicing: intensity (discussed under phonation), and duration (discussed under voice onset time). When a sound is described as "half voiced" or "partially voiced", it is not always clear whether that means that the voicing is weak (low intensity) or if the voicing occurs during only part of the sound (short duration). In the case of English, it is the latter.
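The duration sense of "partially voiced" can be made concrete as the share of a segment during which voicing is present. The sketch below assumes hypothetical measurements in milliseconds and does not model the intensity variable at all; the function name `voicing_label` and its labels are illustrative, not an established classification.

```python
def voicing_label(voiced_ms: float, total_ms: float) -> tuple[str, float]:
    """Classify a segment by the fraction of its duration that is voiced.

    voiced_ms: time during which the vocal folds vibrate (hypothetical input).
    total_ms:  total duration of the segment.
    Returns a (label, fraction) pair.
    """
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    if not 0 <= voiced_ms <= total_ms:
        raise ValueError("voiced_ms must lie between 0 and total_ms")

    fraction = voiced_ms / total_ms
    if fraction == 0:
        return "voiceless", fraction
    if fraction == 1:
        return "fully voiced", fraction
    return "partially voiced", fraction


# Made-up example: voicing present for 40 ms of an 80 ms segment.
print(voicing_label(40, 80))
```

A segment with weak but continuous voicing would count as fully voiced under this measure, which is exactly the ambiguity the paragraph points out.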
Juǀʼhoansi and some of the neighboring languages are typologically unusual in having contrastive partially voiced consonants. They have aspirate and ejective consonants, which are normally incompatible with voicing, in voiceless and voiced pairs. The consonants start out voiced but become voiceless partway through, which allows normal aspiration or ejection. They are [b͡pʰ, d͡tʰ, d͡tsʰ, d͡tʃʰ, ɡ͡kʰ] and [d͡tsʼ, d͡tʃʼ] and a similar series of clicks.
There are languages with two sets of contrasting obstruents that are labelled /p t k f s x …/ vs. /b d ɡ v z ɣ …/ even though there is no involvement of voice (or voice onset time) in that contrast. That happens, for instance, in several Alemannic German dialects. Because voice is not involved, this is explained as a contrast in tenseness, called a fortis and lenis contrast.
There is a hypothesis that the contrast between fortis and lenis consonants is related to the contrast between voiceless and voiced consonants. That relation is based on sound perception as well as on sound production, where consonant voice, tenseness and length are only different manifestations of a common sound feature.