
Loudness Level      One of the three loudness levels: Loud, Normal, or Quiet. Use "Normal" if not
                    known.

Primary Sound Type  One of the five primary types: Speech, Babble, Overlap, Music, Noise.

2.3.2. Speech Segments Only


Additionally, for Speech segments only, the following objects must be present and filled:
Segment Object      Description

Language            The language_locale code of each of the languages spoken in the segment.
                    Use "Unknown" for any language variety that you cannot confidently identify. Use
                    XX in place of the locale code if you can identify the language but you cannot
                    confidently determine the locale (e.g., en_XX = English from an unknown locale).
                    We will provide the list of valid language_locale codes to be used. Contact us if you
                    identify a variety in the file that is not on the provided list.

Speaker ID          A string that uniquely identifies the speaker. The Speaker ID must be consistent
                    throughout the entire file.

Transcription Data  Transcription of the speech signals, following the Transcription Conventions
                    in Section 3.

3. Transcription Conventions
Transcription should represent all words as spoken – including hesitations, filler words, false starts, and
other verbal tics.
3.1. Characters and Special Symbols
Transcription should include only upper and lowercase letters, apostrophes, commas, exclamation
points, hyphens, periods, question marks, spaces, and a limited set of special mark-up symbols.
Don't use numerals (e.g., 1, IV) or special symbols (e.g., $, +, @) to transcribe spoken words.
· "I have like $0" = "I have like zero dollars."
· "It was great/weird" = "It was great slash weird."
· "6 + 6 = 12." = "Six plus six equals twelve."
· "My email is m-golden@gmail.com" = "My email is M dash golden at gmail dot com."
Below is the set of special mark-up symbols used in the transcription to indicate certain features or
events within an audio file (e.g., unintelligible speech, code-mixing). Do not use these symbols for any
reason other than as mark-up language.
Symbol(s)  Name                Use

<>         Angle brackets      Around opening and closing tags, e.g., <initial>.
:          Colon               In conjunction with angle brackets and slash for the non-target
                               language tag, e.g., <lang:Foreign></lang:Foreign>.
(())       Double parentheses  Around unintelligible speech or overlapping speech of three or
                               more speakers.
#          Hashtag             In front of filler words (aka filled pauses).
/          Slash               In conjunction with angle brackets for closing markup tags, e.g.,
                               </initial>.
[]         Square brackets     Around non-speech tags such as [cough].
~          Tilde               To indicate truncated speech.
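
The character rules in Section 3.1 lend themselves to an automated check. The following is a minimal sketch, not part of the guidelines: it strips the mark-up symbols from the table above and reports any remaining character that is not a letter or one of the permitted punctuation marks. The function name and the mark-up patterns are assumptions made for illustration. Foreign punctuation inside <lang:...> tags (e.g., ¡ or ¿) would also be reported, so a real checker may want to exempt those spans.

```python
import re

# Mark-up recognised by this sketch. The patterns are assumptions derived
# from the symbol table, not an official grammar.
MARKUP = re.compile(
    r"</?(?:initial|overlap|lang:[A-Za-z]+)>"  # angle-bracket tags
    r"|\[[a-z-]+\]"                            # non-speech tags such as [cough]
    r"|\(\(|\)\)"                              # double parentheses
    r"|#(?=[a-z])"                             # hashtag in front of a filler word
    r"|~"                                      # truncation marker
)

# Apostrophe, comma, exclamation point, hyphen, period, question mark, space.
ALLOWED_PUNCTUATION = set("',!-.? ")


def disallowed_characters(transcription: str) -> list[str]:
    """Return the sorted set of characters that Section 3.1 does not permit."""
    stripped = MARKUP.sub("", transcription)
    return sorted(
        {ch for ch in stripped if not (ch.isalpha() or ch in ALLOWED_PUNCTUATION)}
    )
```

For instance, `disallowed_characters("I have like $0")` reports the dollar sign and the numeral, while the corrected transcription "I have like zero dollars." yields an empty list.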


3.2. Spelling and Grammar
Use standard orthography rather than phonetic spelling to transcribe what the speaker says.
3.2.1. Dialectal Pronunciations
Transcribe dialectal pronunciations using the spellings of the "standard" forms, unless such dialectal
pronunciations are codified in an accepted written version of the dialect.
· "Issall well n' good darlin'." = "It's all well and good darling."
· "I'm from the wes' side." = "I'm from the west side."

3.2.2. Mispronounced Words


Transcribe mispronunciations using the standard spelling.

· "Call your representive." = "Call your representative."

3.2.3. Non-Standard Usage


Transcribe a speaker's utterances verbatim, even in cases when the speaker's utterances do not
conform to the standard grammar of the language. Do not correct grammatical "mistakes" or variations
made by the speaker.
· "He been done work." = "He been done work."
· "We be playing basketball after work." = "We be playing basketball after work."
The same goes for non-standard or unexpected word choice. Transcribe the words as they are spoken,
not as what is expected.
· "The volcano said: I lava you." = "The volcano said I lava you."
Spell-check all transcription files after transcription is complete. When in doubt about the spelling of a
word or name, consult the American Heritage Dictionary: https://ahdictionary.com/. To reference the
names of song titles, movies, TV shows, brands, etc. use http://amazon.com/ or, if
necessary, http://google.com/.
3.3. Capitalization
Transcription should follow the accepted capitalization patterns. For example, capitalize the first word of
a sentence, proper names (e.g., Jeff Bezos, France, iPad, eBay), acronyms (e.g., POTUS), initialisms (e.g.,
IBM), and so on.
· "I want to visit Oregon" = "I want to visit Oregon."
· "I work at NASA" = "I work at NASA."
· "I'm going to Mexico on Thursday" = "I'm going to Mexico on Thursday."
3.4. Abbreviations
Do not introduce abbreviations in the transcription. Always spell out the full word when pronounced as
such.
· "He's 6 ft 2!" = "He's six foot two!"
· "Talk to Doctor Smith immediately." = "Talk to Doctor Smith immediately."

Use an abbreviation only if the speaker explicitly pronounces the word as abbreviated. Don't add a
period after an abbreviated word (unless it appears at the end of a sentence).
· "I live in Cambridge, Mass." = "I live in Cambridge, Mass."
· "Billie Jean King went to Cal State." = "Billie Jean King went to Cal State."
The titles Ms, Mrs, Mr, and Mx that prefix a person's name are considered words in their own right, not
abbreviations. When used as titles, transcribe them as Ms, Mrs, Mr, and Mx. When used as direct
addresses (without a following name), transcribe them as spelled-out forms (e.g., mister or missus).
· "Mr. Smith this way please." = "Mr Smith, this way please."
· "Hey mister can you help me with this survey?" = "Hey, mister, can you help me with this survey?"
3.5. Contractions
Standard contractions must be transcribed as they are pronounced (e.g., isn't, where's, y'all). Include the
apostrophe in the spelling.
Transcribe the following contractions as a single word:
· gimme
· gonna
· gotta
· lemme
· wanna
· watcha
· kinda
3.6. Interjections
Interjections are words or expressions that speakers use within an utterance to express affirmation,
surprise, or negation. Each language has its own specific set of interjections that speakers can use. When
transcribing interjections, use language-specific standardized spellings. Interjections do not require any
special mark-up symbols.
For English, we transcribe only the following interjections:
· eee · mm · uh-oh
· ew · mhm · whoa
· huh · nah · whew
· hmm · oh · yay
· jeez · uh-huh · yep

Notes:

· Interjections are not to be confused with filler words. See Section 3.11.2 for guidelines on filler words.
· In particular, the interjection "hmm" is not to be confused with the filler word "#hm". Use context to
disambiguate the two different uses.
3.7. Individual Spoken Letters
Transcribe individual spoken letters as capital letters, separated by a space.
· "My name is John – jay, oh, aitch, en." = "My name is John J O H N."
This does not apply to initialisms (e.g., IBM, FBI). More on transcribing initialisms follows in Section
3.10.
3.8. Numbers
Spell out numbers in full, not with numerals, according to how the speaker says them. This applies to
both cardinal (e.g., 0, 215) and ordinal numbers (e.g., 1st, 5th).
· "5" = "five"
· "5th" = "fifth"

· "306" = "three hundred and six", "three oh six", or "three zero six", depending on how it was
pronounced.
· "Play radio 109.4 FM" = "play radio one oh nine point four <initial>FM</initial>"
· "Beverly Hills, 90210" = "Beverly Hills nine oh two one oh"
When spelling out numbers, use hyphens as required by the rules of the language. In English, numbers
from twenty-one through ninety-nine are spelled with hyphens. Others are not hyphenated.
· "twenty-five"
· "three hundred"
· "five hundred fifty-two"
· "nineteen forty-five"
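
The hyphenation rule above can be illustrated with a short sketch. It is only one possible reading of a number (the "five hundred fifty-two" style, without "and"); the transcription must still follow what the speaker actually said, so this is not a substitute for listening. The function names are hypothetical, and the range is limited to 0 through 999 to keep the example small.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell_under_hundred(n: int) -> str:
    """Spell 0..99; twenty-one through ninety-nine take a hyphen."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spell_cardinal(n: int) -> str:
    """Spell 0..999 without 'and'; only the sub-hundred part is hyphenated."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0 through 999")
    if n < 100:
        return spell_under_hundred(n)
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {spell_under_hundred(rest)}"
```

A year spoken as two pairs, such as "nineteen forty-five", can be built by spelling each pair separately and joining them with a space.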
3.9. Punctuation
Only apostrophes, commas, exclamation points, hyphens, periods, and question marks should be used as
punctuation marks. Don't use any other English punctuation marks (e.g., semicolons and quotation marks).
Use these punctuation marks as required by the grammar rules.
End Punctuations

Periods             Use a period only at the end of a complete sentence that is a statement.
                    · That city is safe.

Question Marks      Use a question mark only after a direct question or a tag question.
                    · Isn't that simple?
                    · You know the answer, don't you?

Exclamation Points  Use an exclamation point at the end of a sentence when you feel or hear an
                    emphatic stress or intonation. An exclamation point usually marks an outcry or an
                    emphatic or ironic comment.
                    · That's the biggest pumpkin I have ever seen!
                    · When will I ever learn!

Sentence-Internal Punctuation
Commas              Use commas to break up long stretches of speech. This is to facilitate reader
                    comprehension. Below are some suggestions of when a comma should be used:
                    · To separate items in a list of three or more, using the serial (aka Oxford) comma
                      (i.e., the comma before the conjunction that joins the last two elements):
                        · I enjoy skydiving, snowboarding, and mountain biking.
                    · To set off a direct address:
                        · Maryam, listen to me carefully.
                        · I'm not calling you, my friends, just to whine about my life.
                    · To break up compound and complex sentences:
                        · I would like to join you, but I'm afraid I have class at that time.
                        · Marcos and I couldn't go to the jazz concert, so we watched it on TV instead.
                    · To set off introductory words and phrases:
                        · Therefore, they cancelled their trip.
                        · After taking a break, the team resumed their meeting.
                    · Around parenthetical phrases:
                        · That report in the New York Times was, to say the least, a bombshell.
                        · Getting a hotel by the sea, like the one we stayed at last year, would be superb.

Word-Internal Punctuations

Apostrophes         Use apostrophes in contractions, possessives of individual letters, possessive "s",
                    or as part of a person's name.
                    · "That's where it's at" = "That's where it's at."
                    · "Project Q's timeline" = "Project Q's timeline."
                    · "Sinead O'Connor" = "Sinead O'Connor."
                    · "Eleven o'clock" = "Eleven o'clock."
                    · "Read Jess' email" = "Read Jess' email."

Hyphens             Use hyphens according to the standard orthographic rules of the language. If it is
                    not clear whether a compound word should be spelled with a hyphen, consult the
                    American Heritage Dictionary.
                    Here are a few examples of English compound words that can (or sometimes
                    must) use hyphens:
                    · a-line
                    · d-day
                    · ex-boyfriend, ex-drummer
                    · extra-loud
                    · self-aware
                    · t-shirt
                    · u-turn
                    · v-neck
                    · x-ray
                    For product names, only use hyphens if they are part of the official product
                    names.
                    · "Let's go to Chick-fil-A" = "Let's go to Chick-fil-A."
                    For hyphens in numbers, see Section 3.8.

When transcribing a language other than English, use punctuation symbols and rules that are
appropriate for that language. This could happen when a speaker switches to a foreign language in the
middle of a segment. In this case, the foreign punctuation symbols should be within the foreign
language tags <lang:Foreign></lang:Foreign> described in Section 3.14.

· Hey, y'all. <lang:Spanish>¡Hola! ¿Cómo estás?</lang:Spanish> Sorry I'm late.

Note: Some punctuation use is stylistic/subjective. Differences of opinion are not necessarily errors.
3.10. Acronyms and Initialisms
Acronyms refer to terms based on the initial letters of their various elements and are spoken as words.
They should be transcribed as words in upper case without white spaces or periods between the letters.
· "I work for NASA." = "I work for NASA."
· "AIDS has a great impact on society." = "AIDS has a great impact on society."
Initialisms refer to terms spoken as series of letters (e.g., IBM, IMDB, HTTP). Initialisms should be
written as upper case letters enclosed within the <initial> and </initial> tags.
· "I work for IBM." = "I work for <initial>IBM</initial>."
· "I like ZZ Top." = "I like <initial>ZZ</initial> Top."
· "http://www.amazon.com/" = "<initial>HTTP</initial> colon slash slash <initial>WWW</initial> dot
Amazon dot com."
Use periods only for initials standing for given names (e.g., E. B. White, George W. Bush). Otherwise, no
period is needed in initialisms.
· "George W Bush paints now" = "George <initial>W.</initial> Bush paints now."
Don't include plural markers (e.g., -s) or the possessive marker ('s) within the <initial></initial> tags.
· "Welcome to the Ordinary Wizarding Level Examinations. O. W. L.s. More commonly known as Owls." =
"Welcome to the Ordinary Wizarding Level Examinations. <initial>OWL</initial>s. More commonly
known as Owls."
· "George W's dog was a Scottish Terrier." = "George <initial>W.</initial>'s dog was a Scottish Terrier."
Initialisms are treated as words. So, don't break up an initialism with any tags and don't include any
other tags within the <initial></initial> tags.

· "I'll be taking my S (cough) AT next month." = "I'll be taking my [cough] <initial>SAT</initial> next
month."

Notes:

· The word "OK"/"okay" is always transcribed as "okay."


· Spoken individual letters (e.g., proper names that are spelled out) are not initialisms and don't require
the <initial></initial> tags. See Section 3.7 for an example.
· For transcribing initialisms in a non-target language, see Section 3.14.
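
The rules for <initial></initial> tags in Section 3.10 can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the check accepts any tag body made up of upper case letters only, optionally followed by a single period (for given-name initials such as "W."). Plural markers, possessive markers, and nested tags inside the body are reported.

```python
import re

# Captures the body of every <initial>...</initial> pair.
INITIAL = re.compile(r"<initial>(.*?)</initial>", re.DOTALL)


def bad_initialisms(text: str) -> list[str]:
    """Return the bodies of <initial> tags that break the Section 3.10 rules."""
    bad = []
    for body in INITIAL.findall(text):
        # A trailing period is tolerated for initials of given names.
        letters = body[:-1] if body.endswith(".") else body
        if not (letters.isalpha() and letters.isupper()):
            bad.append(body)
    return bad
```

Because the check relies on `str.isupper`, initialisms in other scripts (such as the Cyrillic example in Section 3.14) pass as long as they are in upper case.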
3.11. Disfluent Speech
Disfluent speech refers to any interruption of the normal flow of speech. Speakers may stumble over
their words, repeat themselves, utter truncated words, restart phrases or sentences, and use hesitation
sounds (i.e., filler words).
3.11.1. Stumbled Speech, Repetitions, and Truncated Words
Make your best effort to transcribe stumbled speech and repetitions according to what you hear after
listening to the segment a few times.
· "Directions to the… to the… the hotel" = "Directions to the to the the hotel."
Use tildes to indicate truncated words, whether at the beginning or the end.
· "Ale… alexa … stop the mu… the music." = "Ale~ Alexa, stop the mu~ the music."
· "...lexa play Janet Jackson… no wait…" = "~lexa, play Janet Jackson. No, wait."
· "N… n… no. It's Ch… Chom… Chomsky who said that." = "N~ n~ no. It’s Ch~ Chom~ Chomsky who said
that."

3.11.2. Filler Words


Filler words are "words" that speakers use to indicate hesitation or fill a pause in order to maintain
control of a conversation while thinking of what to say next.
Each language has a limited set of filler words that speakers can use. For English, transcribe only the
following fillers, preceded by the hashtag:

· #ah
· #er
· #hm
· #uh
· #um

Don't alter the spelling of filler words to reflect how the speaker pronounces the word. If the speaker
says a filler word that does not match any of the listed filler words, transcribe the filler word that is
closest in pronunciation.
Notes:

· Filler words are not to be confused with interjections. See Section 3.6 for guidelines on interjections.
· In particular, the filler word "#hm" is not to be confused with the interjection "hmm". Use context to
disambiguate the two different uses.
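
Since the English filler set in Section 3.11.2 is closed, a transcription can be screened for fillers outside it. This is a minimal sketch with a hypothetical function name; it treats any hashtag followed by word characters as a filler token.

```python
import re

# The five English fillers listed in Section 3.11.2.
ENGLISH_FILLERS = {"#ah", "#er", "#hm", "#uh", "#um"}


def unknown_fillers(text: str) -> list[str]:
    """Return hashtag-prefixed tokens that are not on the English filler list."""
    return [tok for tok in re.findall(r"#\w+", text) if tok not in ENGLISH_FILLERS]
```

A token such as "#umm" would be reported, and the transcriptionist would then replace it with the closest listed filler, "#um".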

3.12. Overlapping Speech


3.12.1. Conversational Telephony
For split-channel audio files of conversational telephony where there is only one foreground speaker
(speaker of interest) in each channel, transcribe only the speech of the foreground
speaker. Don't transcribe overlapping speech in the background (e.g., where people nearby or in the
same room are speaking), even if it is intelligible.
When transcribing the foreground speaker, insert the [bg-speech] tag at the start of the overlapping
background speech. If the overlapping background speech spans multiple segments, insert the
[bg-speech] tag in each segment that contains background speech. Don’t break up a word with the
[bg-speech] tag. If the overlapping background speech begins in the middle of the word, place the
[bg-speech] tag before the word.
· "You're definitely a Raven-(speech from an interferer)-claw." = "You're definitely a [bg-speech]
Ravenclaw."

3.12.2. Media
For co-channel media audio files, when a foreground speaker (speaker of interest) is overlapping with
one or more background speakers, transcribe only the speech of the foreground speaker, and insert
the [bg-speech] tag at the start of the overlapping background speech as described in Section
3.12.1.
When there is intelligible overlapping speech between two foreground speakers, transcribe the speech
of each overlapping speaker as separate speech segments. For details on creating speech segments for
transcription, see Section 2.1.
For each transcribed speaker, place the opening <overlap> tag at the start of the overlapping speech
and the closing </overlap> tag at the end of the overlapping speech. Enclose any necessary
punctuation within the overlap tags.
Don’t break up a word with the <overlap></overlap> tags (and initialisms are treated as words). If the
overlap begins in the middle of a word, place the <overlap> tag before the word. If the overlap ends in the
middle of a word, place the </overlap> tag after the word. When a segment contains the opening
<overlap> tag, it must also contain the closing </overlap> tag.
Example:
Segment  Start time  End time  Speaker  Transcription Content

1        3.49        17.867    host01   [music] It's, it's unbelievably scary, #uh, because, you
                                        know, <overlap>you've got ((all these)) fights going
                                        on.</overlap>

2        3.49        17.867    guest01  [music] [no-speech] <overlap>(())</overlap> [no-speech]

Notes:

· Don't transcribe overlapping speech between two or more background speakers (e.g., where speakers
are talking behind a field reporter and their interviewee), even if it is intelligible.
· Don't transcribe overlapping speech between three or more foreground speakers, even if the
overlapping speech contains intelligible speech. In this case, label the segment as Overlap; no
language code, Speaker ID, or transcription is needed.
· For applying the <overlap></overlap> tags in conjunction with initialisms and non-target languages,
see Section 3.10 and Section 3.14 respectively.

3.13. Unintelligible Speech


Use double parentheses (()) to mark stretches of speech that are difficult or impossible to understand or
transcribe (such as when a speaker is speaking too softly or when a speaker is speaking over another
foreground speaker). There should be a space before and after the double parentheses, but not within
the parentheses themselves.
· "Alexa play ???? on spotify." = "Alexa, play (()) on Spotify."
If the transcriptionist has a guess about the speaker's words, transcribe what they think they hear within
the double parentheses.
· "Alexa read ????? from audible." = "Alexa, read ((Cat In The Hat)) from Audible."
· "Alexa turn the ????" = "Alexa, turn the ((lights off))."
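
The "no space within the parentheses" rule can be screened with a one-line pattern. The sketch below (function name hypothetical) only checks for whitespace directly inside the double parentheses. It deliberately does not enforce a space outside them, because the examples above place punctuation and tags directly after the closing parentheses.

```python
import re

# Whitespace right after "((" or right before "))".
PADDED = re.compile(r"\(\(\s+|\s+\)\)")


def has_padded_parentheses(text: str) -> bool:
    """True if any double parentheses contain leading or trailing whitespace."""
    return bool(PADDED.search(text))
```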
3.14. Non-Target Languages
When a speaker switches to a language other than English, place the tag <lang:Foreign> at the location
where the switch between languages begins and </lang:Foreign> where the switch ends. When a
segment contains the opening <lang:Foreign> tag, it must also contain the closing </lang:Foreign> tag.
If the transcriptionist can unambiguously identify the non-target language, replace "Foreign" with the
language name in the tags. Capitalize the first letter of the language name.
Transcribe the speech of the non-target language, using the standard orthography of the non-target
language, if the transcriptionist understands the language. Otherwise, transcribe the non-target
language as (()).
· "You have to finish todo esto, porque. I have other things to do." = "You have to finish
<lang:Spanish>todo esto, porque</lang:Spanish>. I have other things to do."
· "I'd like to tell her que ya no la quiero." = "I'd like to tell her <lang:Foreign>(())</lang:Foreign>."
Words of non-target language origin adopted into common use in the target language (i.e. loanwords)
should be transcribed using the standard orthography of the target language. Don't use the
<lang:Foreign></lang:Foreign> tags around loanwords that have been grammaticalized and fully
adopted into common use in English. If it is unclear whether a word is a loanword or not, consult a
dictionary like the American Heritage Dictionary: https://www.ahdictionary.com/. A listing in the
dictionary is strong grounds for treating a word as an established loanword, even if it is of foreign origin.
· "There was a tsunami in Indonesia." = "There was a tsunami in Indonesia."
· "Alexa… recipe for tacos" = "Alexa, recipe for tacos."
· "Remind me to spritz the flowers at eight." = "Remind me to spritz the flowers at eight."
Don't break up a word with the foreign language tags. Mixing languages within a single word is rare in
English, but when it happens (for example, a root word in the non-target language with an affix in the
target language):

1. Transcribe the word as it was pronounced using the respective standard orthography of each language.
2. Enclose both the root and the affix within the <lang:Foreign></lang:Foreign> tags.

Non-target language tags can be used in conjunction with other markup tags (e.g., <initial></initial> and
<overlap></overlap>):

· "The story is set in Belarus after the collapse of the СССР (pronounced [ɛsɛsɛsɛr]), well that's USSR in
Russian." = "The story is set in Belarus after the collapse of the
<lang:Russian><initial>СССР</initial></lang:Russian>. Well, that's <initial>USSR</initial> in Russian."
· "I'll sometimes start a sentence in English y termino-(another foreground speaker begins talking)-
en español (end of segment)." = "I'll sometimes start a sentence in English <lang:Spanish>y termino
<overlap>en español</overlap></lang:Spanish>."
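
The requirement that a segment containing an opening <overlap> or <lang:...> tag must also contain its closing tag can be verified with a small stack-based check. This is a sketch under stated assumptions: the function name is hypothetical, and it additionally assumes that tags are properly nested (as in the examples above), which is stricter than what the guidelines state explicitly.

```python
import re

# Matches opening and closing forms of the three paired markup tags.
TAG = re.compile(r"<(/?)(initial|overlap|lang:[A-Za-z]+)>")


def tags_are_balanced(segment: str) -> bool:
    """True if every paired tag in the segment is closed in nesting order."""
    stack = []
    for closing, name in TAG.findall(segment):
        if not closing:
            stack.append(name)       # opening tag: remember it
        elif not stack or stack.pop() != name:
            return False             # closing tag with no matching opener
    return not stack                 # leftover openers mean a missing close
```

A segment that ends inside an overlap, such as "you know, <overlap>you've got", would fail the check until the closing tag is added.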

3.15. Non-Speech
3.15.1. Non-Speech Noises
Indicate the following non-speech noises in the transcription by inserting the corresponding tag, in
square brackets, at the location where the noise occurs.
Tags          Descriptions

Human vocal noises

[breath]      Inhalation and exhalation between words, yawning
[cough]       Coughing, throat clearing, sneezing
[cry]         Crying/sobbing
[laugh]       Laughing, chuckling
[lipsmack]    Lipsmacks, tongue-clicks

Non-speech/non-human noises

[applause]    Clapping.
[beep]        The beep sound that replaces profanity or classified information.
[click]       Machine or phone click.
[dtmf]        Noise made by pressing a telephone keypad.
[ring]        Telephone ring.
[sta]         Continuous static.
[prompt]      IVR prompts or voice recordings commonly found at the beginning of calls.

Other noises

[bg-speech]   Speech in the background that overlaps with the speech of the foreground speaker.
[music]       Music that is one or more seconds long without anyone speaking in the foreground.
              This includes on-hold music, songs, or singing.
              Note: Don't use this tag for music playing in the background while someone's
              speaking.
[noise]       Other miscellaneous noises not covered on the list above (e.g., screaming, raining,
              punching, etc).

Don't insert a non-speech tag in the middle of a word. If a non-speech sound occurs in the middle of a
word, add the tag exactly before the word in which it occurred.
· "I will abso-(ring)-lutely open it" = "I will [ring] absolutely open it."
If a non-speech sound occurs repeatedly, represent it only once.
· "Wait … click click click click there" = "Wait [click] there."
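
The "represent it only once" rule can be applied automatically when the same tag has been typed several times in a row. The following is a minimal sketch with a hypothetical function name; it collapses only identical tags separated by whitespace and leaves different adjacent tags untouched.

```python
import re

# A bracketed tag followed by one or more whitespace-separated copies of itself.
REPEATED_TAG = re.compile(r"(\[[a-z-]+\])(?:\s+\1)+")


def collapse_repeated_tags(text: str) -> str:
    """Replace runs of the same non-speech tag with a single occurrence."""
    return REPEATED_TAG.sub(r"\1", text)
```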

3.15.2. Silence/Pauses
Despite your best effort to create tight segments as required by Section 2.1, a speech segment may still
contain long pauses and periods with no actual speech.
Use the [no-speech] tag to indicate pauses or silence of one or more seconds, even in cases when there
are some foreground noises mixed in with the pause.
· "They're not (pause) (breath) (pause) coming." = "They're not [no-speech] coming."
4. Metadata Labelling
In addition to segment labelling and speech transcription described in Sections 2 and 3, each transcribed
file should contain a set of required metadata labels. This section calls out some of the specific labelling
required.
4.1. Labelling the Transcribed File
4.1.1. File-level Values
For each transcribed file, the following file-level values (objects) must be provided:
File-level Values   Description

Domains             A string (or a list of strings) that describes the domain(s) covered in the transcribed
                    file. We will provide the list of valid Domains to be used.

Topics              A string (or a list of strings) that describes the topic(s) or scenario(s) covered in the
                    transcribed file. We will provide the list of valid Topics and/or Scenarios to be used.

Primary Language    The language_locale code of the single most frequently spoken language in the
                    transcribed file. We will provide the list of the valid language_locale codes to be
                    used. Contact us if you identify a variety in the file that is not on the provided list.

Primary Variety     A string that describes the specific variety of the Primary Language (e.g., "AAE",
                    "Spanish-accented"). We will provide the list of valid Variety labels to be used.
                    Use "N/A" if no variety has been specified for the Primary Language.

Other Language(s)   A list of the language_locale codes for all the non-primary languages in the
                    transcribed file. Use XX in place of the locale code for languages whose locales
                    cannot be confidently determined (e.g., en_XX = English from an unknown locale).
                    We will provide the list of the valid language_locale codes to be used. Contact us if
                    you identify a variety in the file that is not on the provided list.

4.1.2. Annotator Information


For each transcribed file, the following annotator information must be provided:
Annotator Info      Description

Annotator ID        A string that uniquely identifies the transcriptionist of the file. The Annotator ID
                    must be consistent throughout the entire delivery.

4.2. Labelling Speakers in the Transcribed File


For each speaker whose speech has been transcribed, the following speaker information (objects) must
be provided:
Speaker Object      Description

Speaker ID          A string that uniquely identifies the speaker. It should correspond to a Speaker ID that
                    has already been used in one or more segments.

Gender              One of the three labels that specifies the gender of the speaker: Male, Female,
                    Unknown.
                    · Use the label that corresponds to the speaker's self-identification whenever that
                      information is available. Don't override the speaker's self-identification. If the
                      speaker's self-identification is not available, it's OK to rely on your perception.
                    · Use Unknown whenever you cannot confidently determine the speaker's gender.
                      When Gender is Unknown, Gender Source below will always be AnnotatorIdentified.

Gender Source       One of the two labels that describes how the gender label of the speaker was
                    assigned: SpeakerIdentified, AnnotatorIdentified.

Nativity            One of the three labels that specifies the proficiency of the speaker in the primary
                    language specified for the data: Native, NonNative, Unknown.
                    · Use the label that corresponds to the speaker's self-identification if that information
                      is available. Don't override the speaker's self-identification.
                    · If the speaker's self-identification is not available, it's OK to rely on your perception
                      while following these general rules of thumb:
                        · Native: Use this when the speaker speaks the primary language with no or a slight
                          foreign accent, and their speech contains few non-native grammatical features and
                          word choices. IMPORTANT: Note that speakers speaking with grammatical patterns or
                          an accent of a regional or ethnic dialect (e.g., Southern English, African American
                          English, or Chicano English in the US) should be labeled as Native.
                        · NonNative: Use this when the speaker speaks the primary language with a
                          discernible foreign accent, and their speech contains non-native grammatical
                          features and word choices.
                    · Use Unknown whenever you cannot confidently determine whether the speaker is a
                      native speaker of the primary language or not. When Nativity is Unknown, Nativity
                      Source below will always be AnnotatorIdentified.

Nativity Source     One of the two labels that describes how the Nativity label of the speaker was
                    assigned: SpeakerIdentified, AnnotatorIdentified.

Languages           A list of all the languages spoken by this speaker, including "Unknown". We will
                    provide the list of valid language_locale codes to be used. Contact us if you identify a
                    variety in the file that is not on the provided list.
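
The speaker labels in Section 4.2 come with two cross-field constraints: an Unknown Gender or Nativity implies an AnnotatorIdentified source. The sketch below shows one way such a record could be validated. The class, field names, and error messages are hypothetical, since the delivery schema itself is not reproduced in this document; only the label values are taken from the table above.

```python
from dataclasses import dataclass

GENDERS = {"Male", "Female", "Unknown"}
NATIVITIES = {"Native", "NonNative", "Unknown"}
SOURCES = {"SpeakerIdentified", "AnnotatorIdentified"}


@dataclass
class Speaker:
    # Field names are illustrative, not an official schema.
    speaker_id: str
    gender: str
    gender_source: str
    nativity: str
    nativity_source: str
    languages: list[str]


def speaker_errors(s: Speaker) -> list[str]:
    """Return a list of rule violations for one speaker record."""
    errors = []
    for label, value, allowed in [
        ("Gender", s.gender, GENDERS),
        ("Gender Source", s.gender_source, SOURCES),
        ("Nativity", s.nativity, NATIVITIES),
        ("Nativity Source", s.nativity_source, SOURCES),
    ]:
        if value not in allowed:
            errors.append(f"{label} has invalid value {value!r}")
    # Unknown labels can only have been assigned by the annotator.
    for label, value, source in [
        ("Gender", s.gender, s.gender_source),
        ("Nativity", s.nativity, s.nativity_source),
    ]:
        if value == "Unknown" and source != "AnnotatorIdentified":
            errors.append(
                f"{label} is Unknown, so {label} Source must be AnnotatorIdentified"
            )
    return errors
```

A record with Gender set to Unknown and Gender Source set to SpeakerIdentified would produce one error, while a fully consistent record produces an empty list.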

5. Appendix A: The Complete Set of Non-Speech Tags and Other Markup Tags
This section lists all the non-speech tags and other markup tags introduced in the Transcription
Conventions section for ease of reference. See the Transcription Conventions section for the exact use
case and example(s) of each tag.
Markup tags

<initial></initial>

<lang:Foreign></lang:Foreign>

<lang:X></lang:X>
where X can be replaced by any commonly accepted language name with the first letter capitalized
(e.g., Arabic, Korean, Spanish)

<overlap></overlap>

Noise tags

[applause]

[beep]
[bg-speech]

[breath]

[click]

[cough]

[cry]

[dtmf]

[laugh]

[lipsmack]

[music]

[no-speech]

[noise]

[prompt]

[ring]

[sta]
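
Because the list of noise tags above is complete, any other square-bracket tag in a transcription can be flagged. This is a minimal sketch with a hypothetical function name; the tag set is copied from the list in this appendix.

```python
import re

# The complete set of square-bracket tags from this appendix.
NOISE_TAGS = {
    "applause", "beep", "bg-speech", "breath", "click", "cough", "cry", "dtmf",
    "laugh", "lipsmack", "music", "no-speech", "noise", "prompt", "ring", "sta",
}


def unknown_noise_tags(text: str) -> list[str]:
    """Return the names of bracketed tags that are not on the approved list."""
    return [t for t in re.findall(r"\[([^\[\]]*)\]", text) if t not in NOISE_TAGS]
```

A tag such as [sneeze] would be reported; per Section 3.15.1, sneezing is covered by [cough].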

· The Licensor should provide generic Metadata that is useful for us in the following format:
JSON Schema
