Gestural feature transcription

From SpeechWiki


Revision as of 05:01, 26 January 2010

Transcription guidelines

Checklist

Before starting an utterance
  • Make sure to have the phone-to-feature mappings, these instructions, and Heejin's Praat-specific guidelines handy.
  • Open the utterance in Praat.
  • Write down the time.
To keep in mind during transcription
  • The detailed guidelines below and Heejin's guidelines (to be merged into one document at some point).
  • The boundaries in the initial .wd transcription may have errors. Do not feel bound by them.
  • The final transcription should give enough information so that the speaker, by looking at the transcription, could recreate the acoustics exactly. More on this in the detailed guidelines below.
  • Go for accuracy over speed. But if you are very unsure about a segment, use "?" or multiple labels (e.g. "FRIC/APP").
  • Don't worry about exact boundaries up to +/- 20ms. I.e. if you find yourself unsure about a boundary location, but the uncertainty is within +/- 20ms, place the boundary somewhere in the appropriate range and move on.
  • Use the .cm tier to mark anything problematic or that should be discussed.
  • Use the menus to label feature tiers, rather than typing in, unless you are marking an unusual/uncertain segment (e.g. with "?" or multiple labels).
  • It's OK to use some deductive reasoning. E.g. if the intended sound is [b] but the actual segment is a fricative, it is probably a LAB, FRIC and not a [v] (L-D, FRIC).
When finishing an utterance
  • Write down the time.
  • Save the transcription.
  • Do these sanity checks:
    • Re-skim the guidelines below to make sure the transcription follows them.
    • Listen to each segment to make sure you believe the label for that segment.
  • Stretch, get a drink...
When doing a 2nd pass
  • Fix mistakes, not disagreements.
  • Once the 2nd pass is done, you should be able to "defend" each difference between your transcription and the other transcribers', i.e. "I chose this over that because...".
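One possible way to triage boundary differences before a 2nd pass is sketched below. It assumes, as a reading of the +/- 20ms tolerance in the checklist above and not as a stated rule, that boundaries agreeing within 20ms need no further attention. The function name and the representation of boundaries as lists of millisecond times are hypothetical.

```python
TOLERANCE_MS = 20  # boundary tolerance from the checklist above

def boundaries_to_review(mine_ms, theirs_ms, tol_ms=TOLERANCE_MS):
    """Return my boundaries that have no counterpart within tol_ms
    in the other transcriber's boundaries (all times in ms)."""
    return [b for b in mine_ms
            if all(abs(b - t) > tol_ms for t in theirs_ms)]
```

For example, `boundaries_to_review([100, 250, 400], [110, 300, 395])` returns `[250]`: the other two boundaries each have a counterpart within 20ms.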

Detailed guidelines

In no particular order...

When using a phone label to generate a feature vector
  • If a phone has an unspecified feature value ([hh], [q], [r], [er], [axr], [sh], [zh], [ch], [jh]), that feature must be entered explicitly. E.g. an [hh] must have its vowel tier specified by hand; it should be easy to tell what vowel shape the [hh] is in based on formants/listening. Similarly, when [q] is realized as IRR, the vowel shape is usually easy to tell and should be labeled in the .vow tier (if it's not easy to tell, label it '?'). For some phones with unspecified feature values, it may be hard to tell what the actual value is; e.g. an [r] may be rounded or unrounded; label it '?' if you can't tell.
The "recreation" rule
  • The final transcription should give enough information so that the speaker, by looking at the transcription, could recreate the acoustics exactly (assuming he/she could actually read the transcription).
  • For example, if a word is very reduced but with a hint of the original gestures, that should be indicated somehow in the transcription. E.g. if the word is "probably" and is produced like "pry" but with a hint of labial/lateral gestures in the middle, don't transcribe it as /pcl p r ay1 ay2/. Use place=LAB or LAT and degree=APP to indicate these hints of gestures.
[ah] vs. [ax], [ih] vs. [ix], [er] vs. [axr]
  • Use a schwa if the segment is unstressed and 50ms or shorter.
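The rule can be written as a small check. This is only a sketch: the function name and the `REDUCED` table are illustrative, and whether a segment is stressed is still the transcriber's judgment.

```python
# Full vowel -> reduced counterpart, taken from the pairs in the heading above.
REDUCED = {"ah": "ax", "ih": "ix", "er": "axr"}

def choose_vowel_label(full_label, duration_ms, stressed):
    """Return the reduced label if the segment is unstressed and
    50 ms or shorter; otherwise keep the full label."""
    if full_label in REDUCED and not stressed and duration_ms <= 50:
        return REDUCED[full_label]
    return full_label
```

For example, `choose_vowel_label("ah", 40, stressed=False)` returns `"ax"`, while the same segment at 60ms, or stressed, stays `"ah"`.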
APP degree
  • Used for both glides ([y], [w]) and other sounds realized as approximants. If there is any gesture towards an intended consonant, even if small, use APP degree. E.g. if "probably" is produced almost like [p r ay] but with some evidence of lip narrowing in the middle of the [ay]-like region, mark that as APP, LAB.
Two stop closures in a row
  • If you can't tell when the place of closure has changed (e.g. the "d g" in "would go"), just mark the boundary in the middle.
"GLO" place
  • Used only for glottal stops.
VOI vs. IRR in the .glo tier
  • If there are regular pitch periods, even with very low pitch, label them as VOI. Use IRR only when the pitch periods are not at regular intervals.
Boundaries in diphthongs
Where does the boundary between 1 and 2 go?
  • [aw1] should look/sound more like an [ae] (or [aa] in some dialects), [aw2] like an [uh] or [w]
  • [ay1] <--> [aa], [ay2] <--> [ih] or [y]
  • [ey1] <--> [eh], [ey2] <--> [ih] or [y]
  • [ow1] doesn't have a non-diphthong correlate, [ow2] <--> [uh] or [w]
  • [oy1] <--> [ao] or [ow1], [oy2] <--> [ih] or [y]
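The correspondences above can be kept at hand as a lookup table. The sketch below simply restates the list; the names `DIPHTHONG_HALVES` and `resembles` are hypothetical and not part of any existing tool.

```python
# For each diphthong, the sounds its first ("1") and second ("2") half
# should look/sound like, as listed above.
DIPHTHONG_HALVES = {
    "aw": {"1": ["ae", "aa"], "2": ["uh", "w"]},  # [aa] in some dialects
    "ay": {"1": ["aa"], "2": ["ih", "y"]},
    "ey": {"1": ["eh"], "2": ["ih", "y"]},
    "ow": {"1": [], "2": ["uh", "w"]},            # no non-diphthong correlate
    "oy": {"1": ["ao", "ow1"], "2": ["ih", "y"]},
}

def resembles(label):
    """Return the sounds a half-diphthong label such as 'ay2' should resemble."""
    base, half = label[:-1], label[-1]
    return DIPHTHONG_HALVES[base][half]
```

For example, `resembles("ay2")` returns `["ih", "y"]`, and `resembles("ow1")` returns an empty list because [ow1] has no non-diphthong correlate.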
Voiceless vowels
  • Mark them as "VL", not "ASP", in the glottal (.glo) tier, to differentiate them from [hh].
The vowel rule
  • If the .pl/.dg tiers are both "NONE/VOW", then there should be a vowel label in the .vow tier; otherwise, the .vow tier should be "N/A".
  • The only exception is for rhoticized vowels ([er], [axr]) and syllabics ([el], [em], [en]); these get a vowel label in the .vow tier and a constriction in the .pl1/.dg1 tier ("RHO/APP" for [axr], [er]; "LAT/CLO", "LAB/CLO", or "ALV/CLO" for [el], [em], [en]).
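A consistency check for the vowel rule might look like the sketch below. It assumes each segment is available as its place, degree, and vowel labels, plus (optionally) the phone label the transcriber had in mind; the function name and that `phone` argument are illustrative assumptions, not features of the labeling tools.

```python
# Constrictions required for the exceptional phones (rhoticized vowels
# and syllabics), as given in the rule above.
EXCEPTIONS = {
    "er": ("RHO", "APP"), "axr": ("RHO", "APP"),
    "el": ("LAT", "CLO"), "em": ("LAB", "CLO"), "en": ("ALV", "CLO"),
}

def vowel_rule_ok(pl, dg, vow, phone=None):
    """Return True if the .pl/.dg/.vow labels of one segment obey the vowel rule."""
    if phone in EXCEPTIONS:
        # Exception: a vowel label AND the listed constriction.
        return (pl, dg) == EXCEPTIONS[phone] and vow != "N/A"
    if (pl, dg) == ("NONE", "VOW"):
        # No constriction: a vowel label is required.
        return vow != "N/A"
    # Any other constriction: the vowel tier must be "N/A".
    return vow == "N/A"
```

For example, `vowel_rule_ok("NONE", "VOW", "iy")` is true, while `vowel_rule_ok("LAB", "CLO", "iy")` is false unless the segment is a syllabic [em].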
Laterals
  • Use "LAT" place for both light and dark [l]s. For an [l] with incomplete tongue tip closure, use "APP" in the degree tier. Example: the [l]s at the end of the words "all", "feel" would usually be marked as a "LAT/APP" segment followed by a "LAT/CLO" segment.
Stops
  • For unaspirated (voiced or voiceless) stops, the .dg is CLO during the closure, then FRIC during the frication.
  • For voiceless stops at the beginning of a stressed syllable, there may be aspiration. If there is a clear distinction between a frication portion and aspiration portion, use CLO for closure, FRIC for frication, APP during the aspiration. The aspiration should also be indicated as ASP in the .glo tier.
  • When entering a phone label which will be converted to feature tiers, the burst will not be separated into frication and aspiration; however, if there are feature changes during the aspiration (e.g. if the lateral in the word "place" occurs during the aspiration), then label the aspiration portion with feature labels.
  • If there is an utterance-initial or -final stop closure, it may not be possible to tell where the boundary between closure and silence is. In that case, mark a 100ms-long closure.
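The 100ms convention for utterance-initial and utterance-final closures can be sketched as follows. Times are in milliseconds; the function names are hypothetical, and clipping the closure to the start or end of the utterance is an added assumption that the guideline itself does not state.

```python
CLOSURE_MS = 100  # default closure length from the guideline above

def initial_closure(release_ms):
    """Closure interval (start, end) for an utterance-initial stop,
    ending at the release and starting 100 ms earlier (not before 0)."""
    return (max(0, release_ms - CLOSURE_MS), release_ms)

def final_closure(closure_start_ms, utterance_end_ms):
    """Closure interval (start, end) for an utterance-final stop,
    lasting 100 ms (not past the end of the utterance)."""
    return (closure_start_ms, min(utterance_end_ms, closure_start_ms + CLOSURE_MS))
```

For example, a stop released at 250ms gets the closure `(150, 250)`, and a final closure starting at 900ms in a 1200ms utterance gets `(900, 1000)`.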
Transitional periods between steady states
  • Don't label them as separate segments if they are natural/necessary transitional periods, e.g. the formant transitions between vowels and consonants. If there is an "extra" transitional sound beyond what is necessary, like "feel" --> [f iy ax l], label it as such.
Voicing onset/offset
  • When in doubt, open a waveform blow-up; the onset/offset of voicing is the point at which periodicity starts/stops.
Diphthongs realized as monophthongs
  • Label them as the monophthong, not as the first half of the diphthong (the "1" label), when possible. For example, an underlying [aw] that's produced as an [ae] should be marked [ae], not [aw1]. Similarly for [ey]. For [ow], there's no monophthong label corresponding to the first part, so mark it [ow1]. For [oy], the initial part of it may sound like an [ao], in which case label it as such; or it may sound more like the sound at the beginning of [ow] or "or", in which case label it [oy1].

Miscellaneous tips

  • If an utterance is particularly long or difficult, remember that you can always save a partial transcription and return to it later.
  • Most people seem to find it easier to label all tiers from left to right, rather than one tier at a time. However, feel free to label however you see fit.