Primary Sound Type One of the five primary types: Speech, Babble, Overlap, Music, Noise.
Language The language_locale code of each of the languages spoken in the segment.
Use "Unknown" for any language variety that you cannot confidently identify. Use
XX in place of the locale code if you can identify the language but you cannot
confidently determine the locale (e.g., en_XX = English from an unknown locale).
We will provide the list of valid language_locale codes to be used. Contact us if you
identify a variety in the file that is not on the provided list.
Speaker ID A string that uniquely identifies the speaker. The Speaker ID must be consistent
throughout the entire file.
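The Language rule above can be sketched as a small form check. This is only an illustration: the pattern (lowercase language code, underscore, uppercase locale) is an assumption inferred from the en_XX example, the function names are hypothetical, and the list of valid codes that will be provided remains the authority.

```python
import re

# Assumed shape of a language_locale code, inferred from the "en_XX" example:
# a lowercase language code, an underscore, and an uppercase locale (or XX).
# This checks form only; it does not replace the provided list of valid codes.
LANGUAGE_LOCALE = re.compile(r"[a-z]{2,3}_[A-Z]{2}")


def is_wellformed_language(label: str) -> bool:
    """Return True for "Unknown" or for a code shaped like en_XX."""
    return label == "Unknown" or LANGUAGE_LOCALE.fullmatch(label) is not None


def unknown_locale(language: str) -> str:
    """Build the code for an identified language whose locale is unknown."""
    return f"{language}_XX"
```

For example, `unknown_locale("en")` returns `"en_XX"`, which passes `is_wellformed_language`, while a free-text label such as `"english"` does not.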
3. Transcription Conventions
Transcription should represent all words as spoken – including hesitations, filler words, false starts, and
other verbal tics.
3.1. Characters and Special Symbols
Transcription should include only upper and lowercase letters, apostrophes, commas, exclamation
points, hyphens, periods, question marks, spaces, and a limited set of special mark-up symbols.
Don't use numerals (e.g., 1, IV) or special symbols (e.g., $, +, @) to transcribe spoken words.
· "I have like $0" = "I have like zero dollars."
· "It was great/weird" = "It was great slash weird."
· "6 + 6 = 12." = "six plus six equals twelve."
· "My email is m-golden@gmail.com" = "My email is M dash golden at gmail dot com."
Below is the set of special mark-up symbols used in the transcription to indicate certain features or
events within an audio file (e.g., unintelligible speech, code-mixing). Do not use these symbols for any
reason other than as mark-up language.
Symbol(s)   Name             Use
<>          Angle brackets   Around opening and closing tags, e.g., <initial>.
:           Colon            In conjunction with angle brackets and slash for the non-target language tag, e.g., <lang:Foreign></lang:Foreign>.
/           Slash            In conjunction with angle brackets for closing markup tags, e.g., </initial>.
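As a rough illustration of the character rules in this section, the sketch below strips markup tags and reports any remaining character that is not a letter or permitted punctuation. The function name and the tag pattern are assumptions. The allowed set also includes "#" and parentheses because the guidelines use them elsewhere for filler words (e.g., #uh) and double-parenthesis spans.

```python
import re

# Punctuation and space permitted by Section 3.1, plus "#" and parentheses,
# which appear elsewhere in the guidelines (filler words, ((...)) spans).
ALLOWED_SYMBOLS = set("',!-.? #()")

# Assumed tag shapes: <name>, </name>, <lang:Name>, and [noise-tag].
TAG = re.compile(r"</?[A-Za-z:]+>|\[[a-z-]+\]")


def disallowed_characters(transcription: str) -> list[str]:
    """Return characters that are not letters, allowed punctuation, or markup."""
    text = TAG.sub("", transcription)
    return sorted({ch for ch in text if not (ch.isalpha() or ch in ALLOWED_SYMBOLS)})
```

Calling `disallowed_characters("I have like $0")` returns `["$", "0"]`, while the corrected form "I have like zero dollars." returns an empty list.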
Use an abbreviation only if the speaker explicitly pronounces the word as abbreviated. Don't add a
period after an abbreviated word (unless it appears at the end of a sentence).
· "I live in Cambridge, Mass." = "I live in Cambridge, Mass."
· "Billie Jean King went to Cal State." = "Billie Jean King went to Cal State."
The titles Ms, Mrs, Mr, and Mx that prefix a person's name are considered words in their own right, not
abbreviations. When used as titles, transcribe them as Ms, Mrs, Mr, and Mx. When used as direct
addresses (without a following name), transcribe them as spelled-out forms (e.g., mister or missus).
· "Mr. Smith this way please." = "Mr Smith, this way please."
· "Hey mister can you help me with this survey?" = "Hey, mister, can you help me with this survey?"
3.5. Contractions
Standard contractions must be transcribed as they are pronounced (e.g., isn't, where's, y'all). Include the
apostrophe in the spelling.
Transcribe the following contractions as a single word:
· gimme
· gonna
· gotta
· lemme
· wanna
· watcha
· kinda
3.6. Interjections
Interjections are words or expressions that speakers use within an utterance to express affirmation,
surprise, or negation. Each language has its own specific set of interjections that speakers can use. When
transcribing interjections, use language-specific standardized spellings. Interjections do not require any
special mark-up symbols.
For English, we transcribe only the following interjections:
· eee · mm · uh-oh
· ew · mhm · whoa
· huh · nah · whew
· hmm · oh · yay
· jeez · uh-huh · yep
Notes:
· Interjections are not to be confused with filler words. See Section 3.11.2 for guidelines on filler words.
· In particular, the interjection "hmm" is not to be confused with the filler word "#hm". Use context to
disambiguate the two different uses.
3.7. Individual Spoken Letters
Transcribe individual spoken letters as capital letters, separated by a space.
· "My name is John – jay, oh, aitch, en." = "My name is John J O H N."
This does not apply to initialisms (e.g., IBM, FBI). Initialisms are covered in Section
3.10.
3.8. Numbers
Spell out numbers in full, not with numerals, according to how the speaker says them. This applies to
both cardinal (e.g., 0, 215) and ordinal numbers (e.g., 1st, 5th).
· "5" = "five"
· "5th" = "fifth"
· "306" = "three hundred and six", "three oh six", or "three zero six", depending on how it was
pronounced.
· "Play radio 109.4 FM" = "play radio one oh nine point four <initial>FM</initial>"
· "Beverly Hills, 90210" = "Beverly Hills nine oh two one oh"
When spelling out numbers, use hyphens as required by the rules of the language. In English, numbers
from twenty-one through ninety-nine are spelled with hyphens. Others are not hyphenated.
· "twenty-five"
· "three hundred"
· "five hundred fifty-two"
· "nineteen forty-five"
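The English hyphenation rule above can be illustrated with a short sketch that spells out cardinals from 0 to 999. The function name is hypothetical, and the sketch produces only one reading of each number; the transcription must still follow what the speaker actually said (e.g., "three oh six" or "three hundred and six").

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell_cardinal(n: int) -> str:
    """Spell out 0-999, hyphenating only twenty-one through ninety-nine."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0 through 999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        # Round tens (twenty, thirty, ...) take no hyphen.
        return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {spell_cardinal(rest)}"
```

With this sketch, `spell_cardinal(552)` gives "five hundred fifty-two" and `spell_cardinal(300)` gives "three hundred", matching the examples above.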
3.9. Punctuation
Only apostrophes, commas, exclamation points, hyphens, periods, and question marks should be used as
punctuation marks. Don't use any other English punctuation marks (e.g., semicolons or quotation marks).
Use the permitted punctuation marks as required by the grammar rules.
End Punctuations
Periods Use a period only at the end of a complete sentence that is a statement.
Question Marks Use a question mark only after a direct question or a tag question.
· Isn't that simple?
· You know the answer, don't you?
Exclamation Points Use an exclamation point at the end of a sentence when you feel or hear an
emphatic stress or intonation. An exclamation point usually marks an outcry or an
emphatic or ironic comment.
· That's the biggest pumpkin I have ever seen!
· When will I ever learn!
Sentence-Internal Punctuation
Commas Use commas to break up long stretches of speech. This is to facilitate reader
comprehension. Below are some suggestions of when a comma should be used:
· To separate items in a list of three or more, using the serial (aka Oxford) comma
(i.e., the comma before the conjunction that joins the last two elements):
· I enjoy skydiving, snowboarding, and mountain biking.
· To set off a direct address:
· Maryam, listen to me carefully.
· I'm not calling you, my friends, just to whine about my life.
· To break up compound and complex sentences:
· I would like to join you, but I'm afraid I have class at that time.
· Marcos and I couldn't go to the jazz concert, so we watched it on TV instead.
· To set off introductory words and phrases:
· Therefore, they cancelled their trip.
· After taking a break, the team resumed their meeting.
· Around parenthetical phrases:
· That report in the New York Times was, to say the least, a bombshell.
· Getting a hotel by the sea, like the one we stayed at last year, would be superb.
Word-Internal Punctuations
Hyphens Use hyphens according to standard orthographic rules of the language. If it is not
clear whether a compound word should be spelled with a hyphen, consult the
American Heritage Dictionary.
Here are a few examples of English compound words that can (or sometimes
must) use hyphens:
· a-line
· d-day
· ex-boyfriend, ex-drummer
· extra-loud
· self-aware
· t-shirt
· u-turn
· v-neck
· x-ray
For product names, only use hyphens if they are parts of the official product
names.
· "Let's go to Chick-fil-A" = "Let's go to Chick-fil-A."
For hyphens in numbers, see Section 3.8.
When transcribing a language other than English, use punctuation symbols and rules that are
appropriate for that language. This could happen when a speaker switches to a foreign language in the
middle of a segment. In this case, the foreign punctuation symbols should be within the foreign
language tags <lang:Foreign></lang:Foreign> described in Section 3.14.
Note: Some punctuation use is stylistic/subjective. Differences of opinion are not necessarily errors.
3.10. Acronyms and Initialisms
Acronyms refer to terms based on the initial letters of their various elements and are spoken as words.
They should be transcribed as words in upper case without white spaces or periods between the letters.
· "I work for NASA." = "I work for NASA."
· "AIDS has a great impact on society." = "AIDS has a great impact on society."
Initialisms refer to terms spoken as series of letters (e.g., IBM, IMDB, HTTP). Initialisms should be
written as upper case letters enclosed within the <initial> and </initial> tags.
· "I work for IBM." = "I work for <initial>IBM</initial>."
· "I like ZZ Top." = "I like <initial>ZZ</initial> Top."
· "http://www.amazon.com/" = "<initial>HTTP</initial> colon slash slash <initial>WWW</initial> dot
Amazon dot com."
Use periods only for initials standing for given names (e.g., E. B. White, George W. Bush). Otherwise, no
period is needed in initialisms.
· "George W Bush paints now" = "George <initial>W.</initial> Bush paints now."
Don't include plural markers (e.g., -s) or the possessive marker ('s) within the <initial></initial> tags.
· "Welcome to the Ordinary Wizarding Level Examinations. O. W. L.s. More commonly known as Owls." =
"Welcome to the Ordinary Wizarding Level Examinations. <initial>OWL</initial>s. More commonly
known as Owls."
· "George W's dog was a Scottish Terrier." = "George <initial>W.</initial>'s dog was a Scottish Terrier."
Initialisms are treated as words. So, don't break up an initialism with any tags and don't include any
other tags within the <initial></initial> tags.
· "I'll be taking my S (cough) AT next month." = "I'll be taking my [cough] <initial>SAT</initial> next
month."
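The constraints on the <initial></initial> tags can be sketched as a simple check: the enclosed text should be upper-case letters only, optionally followed by a single period, with no plural or possessive marker and no other tags inside. The function name is hypothetical, and the sketch does not verify that a period is used only for given-name initials.

```python
import re

INITIAL = re.compile(r"<initial>(.*?)</initial>")


def bad_initialisms(transcription: str) -> list[str]:
    """Return the contents of <initial> tags that break the rules above."""
    problems = []
    for content in INITIAL.findall(transcription):
        # Allow one trailing period (used for given-name initials such as W.).
        letters = content[:-1] if content.endswith(".") else content
        # Anything other than upper-case letters (a plural "s", a nested tag,
        # a space) is reported.
        if not (letters.isalpha() and letters.isupper()):
            problems.append(content)
    return problems
```

For instance, `bad_initialisms("<initial>OWLs</initial>")` returns `["OWLs"]`, whereas the correct form `<initial>OWL</initial>s` passes.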
Notes:
· #ah
· #er
· #hm
· #uh
· #um
Don't alter the spelling of filler words to reflect how the speaker pronounces the word. If the speaker
says a filler word that does not match any of the listed filler words, transcribe the filler word that is
closest in pronunciation.
Notes:
· Filler words are not to be confused with interjections. See Section 3.6 for guidelines on interjections.
· In particular, the filler word "#hm" is not to be confused with the interjection "hmm". Use context to
disambiguate the two different uses.
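A minimal sketch of a filler-word check, assuming that every token starting with "#" is meant to be one of the five listed filler words. The function name is hypothetical.

```python
import re

# The five filler words listed above.
FILLERS = {"#ah", "#er", "#hm", "#uh", "#um"}


def unknown_fillers(transcription: str) -> list[str]:
    """Return "#"-prefixed tokens that are not on the filler-word list."""
    return [tok for tok in re.findall(r"#\w+", transcription) if tok not in FILLERS]
```

A draft containing "#umm" or "#hmm" would be flagged, prompting the transcriptionist to pick the closest listed filler word (or the interjection "hmm" if that is what the context calls for).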
3.12.2. Media
For co-channel media audio files, when a foreground speaker (speaker of interest) is overlapping with
one or more background speakers, transcribe only the speech of the foreground speaker, and insert
the [bg-speech] tag at the start of the overlapping background speech as described in Section
3.12.1.
When there is intelligible overlapping speech between two foreground speakers, transcribe the speech
of each overlapping speaker as separate speech segments. For details on creating speech segments for
transcription, see Section 2.1.
For each transcribed speaker, place the opening <overlap> tag at the start of the overlapping speech
and the closing </overlap> tag at the end of the overlapping speech. Enclose any necessary
punctuation within the overlap tags.
Don’t break up a word with the <overlap></overlap> tags (and initialisms are treated as words). If the
overlap begins in the middle of a word, place the <overlap> tag before the word. If the overlap ends in the
middle of a word, place the </overlap> tag after the word. When a segment contains the opening
<overlap> tag, it must also contain the closing </overlap> tag.
Example:
Segment   Start time   End time   Speaker   Transcription Content
1         3.49         17.867     host01    [music] It's, it's unbelievably scary, #uh, because, you know, <overlap>you've got ((all these)) fights going on.</overlap>
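The two overlap-tag rules (every opening tag needs a closing tag in the same segment, and a tag must not split a word) can be sketched as follows. The function name and messages are hypothetical, and the sketch only counts tags; it does not check that they are correctly ordered or nested.

```python
import re


def overlap_problems(segment: str) -> list[str]:
    """Report unbalanced or word-splitting <overlap> tags in one segment."""
    problems = []
    # A segment with an opening tag must also contain the closing tag.
    if segment.count("<overlap>") != segment.count("</overlap>"):
        problems.append("unbalanced overlap tags")
    # An opening tag right after a letter, or a closing tag right before one,
    # means the tag sits in the middle of a word.
    if re.search(r"\w<overlap>", segment) or re.search(r"</overlap>\w", segment):
        problems.append("overlap tag splits a word")
    return problems
```

The transcription in the example above produces no problems, while a draft such as `unbe<overlap>lievably scary</overlap>` is reported as splitting a word.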
Notes:
· Don't transcribe overlapping speech between two or more background speakers (e.g., where speakers
are speaking behind a field reporter and their interviewee), even if it is intelligible.
· Don't transcribe overlapping speech between three or more foreground speakers, even if the
overlapping speech contains intelligible speech. In this case, label the segment as Overlap; no
language code, Speaker ID, or transcription is needed.
· For applying the <overlap></overlap> tags in conjunction with initialisms and non-target languages,
see Section 3.10 and Section 3.14 respectively.
1. Transcribe the word as it was pronounced using the respective standard orthography of each language.
2. Enclose both the root and the affix within the <lang:Foreign></lang:Foreign> tags.
Non-target language tags can be used in conjunction with other markup tags (e.g., <initial></initial> and
<overlap></overlap>):
· "The story is set in Belarus after the collapse of the СССР (pronounced [ɛsɛsɛsɛr]), well that's USSR in
Russian." = "The story is set in Belarus after the collapse of the
<lang:Russian><initial>СССР</initial></lang:Russian>. Well, that's <initial>USSR</initial> in Russian."
· "I'll sometimes start a sentence in English y termino-(another foreground speaker begins talking)-
en español (end of segment)." = "I'll sometimes start a sentence in English <lang:Spanish>y termino
<overlap>en español</overlap></lang:Spanish>."
3.15. Non-Speech
3.15.1. Non-Speech Noises
Indicate the following non-speech noises in the transcription by inserting the corresponding tag, in square
brackets, at the location where the noise occurs.
Tags Descriptions
[cry] Crying/sobbing
Non-speech/non-human noises
[applause] Clapping.
[prompt] IVR prompts or voice recordings commonly found at the beginning of calls.
Other noises
[bg-speech] Speech in the background that overlaps with the speech of the foreground speaker.
[music] Music that is one or more seconds long without anyone speaking in the foreground.
This includes on-hold music, songs, or singing.
Note: Don't use this tag for music playing in the background while someone's
speaking.
[noise] Other miscellaneous noises not covered on the list above (e.g., screaming, raining,
punching).
Don't insert a non-speech tag in the middle of a word. If a non-speech sound occurs in the middle of a
word, add the tag immediately before the word in which it occurred.
· "I will abso-(ring)-lutely open it" = "I will [ring] absolutely open it."
If a non-speech sound occurs repeatedly, represent it only once.
· "Wait … click click click click there" = "Wait [click] there."
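The "represent it only once" rule can be sketched as a clean-up step, assuming a draft in which the same tag was typed once per occurrence and the copies are separated only by whitespace. The function name is hypothetical.

```python
import re

# The same square-bracket tag repeated back to back, separated by whitespace.
REPEATED_TAG = re.compile(r"(\[[a-z-]+\])(?:\s+\1)+")


def collapse_repeated_tags(transcription: str) -> str:
    """Keep a single copy of any non-speech tag that repeats consecutively."""
    return REPEATED_TAG.sub(r"\1", transcription)
```

Applied to "Wait [click] [click] [click] [click] there." this yields "Wait [click] there."; different adjacent tags, such as "[ring] [click]", are left untouched.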
3.15.2. Silence/Pauses
Despite your best effort to create tight segments as required by Section 2.1, a speech segment may still
contain long pauses and periods with no actual speech.
Use the [no-speech] tag to indicate pauses or silence of one or more seconds, even in cases when there
are some foreground noises mixed in with the pause.
· "They're not (pause) (breath) (pause) coming." = "They're not [no-speech] coming."
4. Metadata Labelling
In addition to segment labelling and speech transcription described in Sections 2 and 3, each transcribed
file should contain a set of required metadata labels. This section calls out some of the specific labelling
required.
4.1. Labelling the Transcribed File
4.1.1. File-level Values
For each transcribed file, the following file-level values (objects) must be provided:
File-level Values Description
Domains A string (or a list of strings) that describes the domain(s) covered in the transcribed
file. We will provide the list of valid Domains to be used.
Topics A string (or a list of strings) that describes the topic(s) or scenario(s) covered in the
transcribed file. We will provide the list of valid Topics and/or Scenarios to be used.
Primary Language The language_locale code of the single most frequently spoken language in the
transcribed file. We will provide the list of the valid language_locale codes to be
used. Contact us if you identify a variety in the file that is not on the provided list.
Primary Variety A string that describes the specific variety of the Primary Language (e.g., "AAE",
"Spanish-accented"). We will provide the list of valid Variety labels to be used.
Use "N/A" if no variety has been specified for the Primary Language.
Other Language(s) A list of the language_locale codes for all the non-primary languages in the
transcribed file. Use XX in place of the locale code for languages whose locales
cannot be confidently determined (e.g., en_XX = English from an unknown locale).
We will provide the list of the valid language_locale codes to be used. Contact us if
you identify a variety in the file that is not on the provided list.
Annotator ID A string that uniquely identifies the transcriptionist of the file. The AnnotatorID
must be consistent throughout the entire delivery.
Speaker ID A string that uniquely identifies the speaker. It should correspond to a Speaker ID that
has already been used in one or more segments.
Gender One of the three labels that specifies the gender of the speaker: Male, Female,
Unknown.
· Use the label that corresponds to the speaker's self-identification whenever that
information is available. Don’t override the speaker’s self-identification. If the speaker's
self-identification is not available, it's OK to rely on your perception.
· Use Unknown whenever you cannot confidently determine the speaker's gender.
When Gender is Unknown, Gender Source below will always be AnnotatorIdentified.
Gender Source One of the two labels that describes how the gender label of the speaker was
assigned: SpeakerIdentified, AnnotatorIdentified.
Nativity One of the three labels that specifies the proficiency of the speaker in the primary
language specified for the data: Native, NonNative, Unknown.
· Use the label that corresponds to the speaker's self-identification if that information
is available. Don’t override the speaker’s self-identification.
· If the speaker's self-identification is not available, it's OK to rely on your perception
while following these general rules of thumb:
· Native: Use this when the speaker speaks the primary language with no or a slight
foreign accent, and their speech contains few non-native grammatical features and
word choices. IMPORTANT: Note that speakers speaking with grammatical patterns or
an accent of a regional or ethnic dialect (e.g. Southern English, African American
English, or Chicano English in the US) should be labeled as Native.
· NonNative: Use this when the speaker speaks the primary language with a
discernible foreign accent, and their speech contains non-native grammatical features
and word choices.
· Use Unknown whenever you cannot confidently determine whether the speaker is a
native speaker of the primary language or not. When Nativity is Unknown, Nativity
Source below will always be AnnotatorIdentified.
Nativity Source One of the two labels that describes how the Nativity label of the speaker was
assigned: SpeakerIdentified, AnnotatorIdentified.
Languages A list of all the languages spoken by this speaker, including "Unknown". We will
provide the list of valid language_locale codes to be used. Contact us if you identify a
variety in the file that is not on the provided list.
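The speaker-level labels and their consistency rules can be sketched as a small record with a validation method. The class name, field names, and messages are assumptions made for illustration; they are not the delivery format.

```python
from dataclasses import dataclass, field

GENDERS = {"Male", "Female", "Unknown"}
NATIVITIES = {"Native", "NonNative", "Unknown"}
SOURCES = {"SpeakerIdentified", "AnnotatorIdentified"}


@dataclass
class SpeakerMetadata:
    # Field names are hypothetical; the label values come from the table above.
    speaker_id: str
    gender: str
    gender_source: str
    nativity: str
    nativity_source: str
    languages: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return a list of rule violations; an empty list means consistent."""
        found = []
        if self.gender not in GENDERS:
            found.append("invalid Gender")
        if self.nativity not in NATIVITIES:
            found.append("invalid Nativity")
        if self.gender_source not in SOURCES:
            found.append("invalid Gender Source")
        if self.nativity_source not in SOURCES:
            found.append("invalid Nativity Source")
        # An Unknown label can only have been assigned by the annotator.
        if self.gender == "Unknown" and self.gender_source != "AnnotatorIdentified":
            found.append("Unknown Gender requires AnnotatorIdentified")
        if self.nativity == "Unknown" and self.nativity_source != "AnnotatorIdentified":
            found.append("Unknown Nativity requires AnnotatorIdentified")
        return found
```

A record such as `SpeakerMetadata("host01", "Unknown", "SpeakerIdentified", "Native", "AnnotatorIdentified", ["en_XX"])` would be reported, because an Unknown gender cannot be speaker-identified.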
5. Appendix A: The Complete Set of Non-Speech Tags and Other Markup Tags
This section lists all the non-speech tags and other markup tags introduced in the Transcription
Conventions section for ease of reference. See the Transcription Conventions section for the exact use
case and example(s) of each tag.
Markup tags
<initial></initial>
<lang:Foreign></lang:Foreign>
<lang:X></lang:X>
where X can be replaced by any commonly accepted language names with the first letter capitalized
(e.g., Arabic, Korean, Spanish)
<overlap></overlap>
Noise tags
[applause]
[beep]
[bg-speech]
[breath]
[click]
[cough]
[cry]
[dtmf]
[laugh]
[lipsmack]
[music]
[no-speech]
[noise]
[prompt]
[ring]
[sta]
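The noise-tag list above lends itself to a simple inventory check. The sketch below, with a hypothetical function name, reports any square-bracket tag in a transcription that is not one of the sixteen listed tags; matching is case-sensitive because the tags are listed in lower case.

```python
import re

# The noise tags listed in this appendix.
NOISE_TAGS = {
    "applause", "beep", "bg-speech", "breath", "click", "cough", "cry",
    "dtmf", "laugh", "lipsmack", "music", "no-speech", "noise", "prompt",
    "ring", "sta",
}


def unknown_noise_tags(transcription: str) -> list[str]:
    """Return square-bracket tag names that are not on the list above."""
    found = re.findall(r"\[([^\]]*)\]", transcription)
    return [name for name in found if name not in NOISE_TAGS]
```

For example, a draft containing "[clicks]" or "[Music]" would be flagged, while "[click]" and "[music]" pass.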
· The Licensor should provide generic Metadata that is useful for us in the following format:
JSON Schema