Information

How to cite

This website is described in Witte et al. (2021). As the website is built around the methods and materials developed in Witte and Köbler (2019), we recommend citing that source as well.

Word metric calculator settings

Temporary spelling changes

When calculating orthographic transparency, a set of temporary spelling changes may be used. For example, a digit such as 3 may be temporarily replaced with a letter representation such as "tre". To use such temporary replacements, add the desired replacements in the appropriate text area on the calculation page. Enter one replacement per row, with the original spelling and the desired replacement separated by a tab character.
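
As a rough illustration of the replacement format, the following Python sketch parses tab-separated rows and applies them to a spelling. The function names and the simple substring substitution are assumptions made for illustration only; how the website applies the replacements internally is not described here and may differ.

```python
def parse_replacements(text):
    """Parse rows of 'original<TAB>replacement' into a dict (hypothetical helper)."""
    replacements = {}
    for row_number, row in enumerate(text.splitlines(), start=1):
        if not row.strip():
            continue  # ignore empty rows
        parts = row.split("\t")
        if len(parts) != 2 or not parts[0]:
            raise ValueError(
                f"Row {row_number}: expected 'original<TAB>replacement'"
            )
        original, replacement = parts
        replacements[original] = replacement
    return replacements


def apply_replacements(spelling, replacements):
    """Assumed behaviour: replace every occurrence of each original string."""
    for original, replacement in replacements.items():
        spelling = spelling.replace(original, replacement)
    return spelling


rules = parse_replacements("3\ttre\n7\tsju")
print(apply_replacements("3", rules))  # tre
```

A row without exactly one tab character is rejected in this sketch, which mirrors the one-replacement-per-row format described above.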

Optional checking and correction of manually entered phonetic transcriptions

Manually entered phonetic transcriptions can be checked automatically by the website. This check is performed before the word-metric calculations, and any transcription errors detected are displayed in an error-messages box on the results page.

The AFC-list

The AFC-list is a word metric database for the Swedish language, published in Witte and Köbler (2019) under a CC-BY 4.0 license. For a total of 816 404 Swedish words, the AFC-list contains word spellings, phonetic transcriptions, word frequencies, and a number of additional types of data that are arguably important for the perception of speech.

Column descriptions

The list below contains short descriptions of each column heading used on the SWM website. For further description of the word metrics, see Witte and Köbler (2019) and Witte et al. (2021).

OrthographicForm: The orthographic form / spelling of the word
PhoneticForm: The standard phonetic transcription, according to the convention described in Witte and Köbler (2019)
PhonotacticType: The phonotactic type
SyllableCount: The number of syllables in the phonetic transcription
PhoneCount: The number of phonetic segments in the phonetic transcription
ZipfValue: The Zipf-scale value
RawWordTypeFrequency: The total number of occurrences of the spelling in the internet blog corpora in Witte and Köbler (2019)
RawDocumentCount: The total number of different internet blogs in which the word occurs, based on the internet blog corpora in Witte and Köbler (2019)
PLD1WordCount: The number of phonetic neighbors
OLD1WordCount: The number of orthographic neighbors
PLDx_Average: The average edit distance to the x number of closest phonetic neighbors in the AFC-list (cf. Yarkoni, Balota, & Yap, 2008)
OLDx_Average: The average edit distance to the x number of closest orthographic neighbors in the AFC-list (cf. Yarkoni, Balota, & Yap, 2008)
PNDP: The Zipf-scale weighted phonetic neighborhood density probability
ONDP: The Zipf-scale weighted orthographic neighborhood density probability
GIL2P_OT_Average: The average grapheme-initial letter-to-pronunciation orthographic transparency
GIL2P_OT_Min: The minimum grapheme-initial letter-to-pronunciation orthographic transparency
PIP2G_OT_Average: The average pronunciation-initial phone-to-grapheme orthographic transparency
PIP2G_OT_Min: The minimum pronunciation-initial phone-to-grapheme orthographic transparency
G2P_OT_Average: The average grapheme-to-pronunciation orthographic transparency
SSPP_Average: The average normalized stress and syllable based phonotactic probability
SSPP_Min: The minimum normalized stress and syllable based phonotactic probability
PSP_Sum: The summed positional segment probability
PSBP_Sum: The summed position specific bi-phone probability
S_PSP_Average: The average standardized positional segment probability
S_PSBP_Average: The average standardized position specific bi-phone probability
OrthographicIsolationPoint: The (zero-based) index of the letter at which a particular word can be uniquely discriminated from all other words in the AFC-list
PhoneticIsolationPoint: The (zero-based) index of the phone at which a particular word can be uniquely discriminated from all other words in the AFC-list (cf. the COHORT model of speech perception; Marslen-Wilson & Welsh, 1978)
LetterCount: The number of letters in the orthographic form
GraphemeCount: The number of graphemes in the sonograph array
DiGraphCount: The number of two-letter graphemes in the sonograph array
TriGraphCount: The number of three-letter graphemes in the sonograph array
LongGraphemesCount: The number of graphemes longer than three letters in the sonograph array
SpecialCharacter: The existence of special characters in the orthographic form
UpperCase: The proportion of times the word begins with an upper case letter
MostCommonPoS: The most common word-class assignment in the internet blog corpora in Witte and Köbler (2019)
MostCommonLemma: The most common lemma assignment in the internet blog corpora in Witte and Köbler (2019)
ForeignWord: The foreign word marking
Abbreviation: The abbreviation marking
Acronym: The acronym marking
HomographCount: The number of homographs
HomophoneCount: The number of homophones
NumberOfSenses: The total number of senses of all lemmas as described in Witte and Köbler (2019)
ReducedTranscription: The reduced phonetic transcription, as defined in Witte and Köbler (2019)
TemporarySyllabification: The phonetic transcription re-syllabified by the syllabification tool used for calculating SSPP.
PLD1Transcriptions: The reduced phonetic transcriptions and raw word frequencies (delimited by colons) of phonetic neighbors (delimited by vertical lines), sorted according to word frequency. The first word is the look-up word.
OLD1Spellings: The orthographic forms and raw word frequencies (delimited by commas) of orthographic neighbors (delimited by vertical lines), sorted according to word frequency. The first word is the look-up word.
PLDx_Neighbors: The x number of closest phonetic neighbors in the AFC-list (cf. Yarkoni, Balota, & Yap, 2008)
OLDx_Neighbors: The x number of closest orthographic neighbors in the AFC-list (cf. Yarkoni, Balota, & Yap, 2008)
Sonographs: The sonographs (as defined in Witte and Köbler, 2019)
Homographs: The reduced phonetic transcriptions of homographs, as defined in Witte and Köbler (2019)
Homophones: All homophone spellings
AllPoS: The word-class assignments, and their relative distributions in the internet blog corpora in Witte and Köbler (2019)
AllLemmas: The lemma assignments, and their relative distributions in the internet blog corpora in Witte and Köbler (2019)
SSPP: The normalized stress and syllable based phonotactic probability for each phoneme combination
PSP: The positional segment probability for each phoneme
PSBP: The position specific bi-phone probability for each bi-phone
S_PSP: The standardized positional segment probability for each phoneme
S_PSBP: The standardized position specific bi-phone probability for each bi-phone
GIL2P_OT: The grapheme-initial letter-to-pronunciation orthographic transparency of each sonograph
PIP2G_OT: The pronunciation-initial phone-to-grapheme orthographic transparency of each sonograph
G2P_OT: The grapheme-to-pronunciation orthographic transparency of each sonograph
Tone: The pitch accent
MainStressSyllable: The primary stressed syllable (1-based index)
SecondaryStressSyllable: The secondary stressed syllable (1-based index)
PhoneCountZero: The number of phones when empty word-initial syllable onsets and word-final codas are counted as phones
PossiblePoSCount: The number of possible word classes
PossibleLemmaCount: The number of possible lemmas
ManualEvaluations: A field that may specify detailed comments concerning the word-metric calculations
ManualEvaluationsCount: The number of comments stored for each word in ManualEvaluations
IPA: A phonetic transcription without syllable boundaries
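
To make the PLDx_Average and OLDx_Average columns more concrete, the following Python sketch computes the mean Levenshtein (edit) distance from a word to its x closest neighbors, in the spirit of Yarkoni, Balota, and Yap (2008). The function names, the toy lexicon, and the choice to exclude the look-up word itself are assumptions for illustration; the sketch is not the code used to build the AFC-list.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of phones)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (item_a != item_b),   # substitution
            ))
        previous = current
    return previous[-1]


def old_x_average(word, lexicon, x=20):
    """Mean edit distance to the x closest other words in the lexicon."""
    distances = sorted(levenshtein(word, other)
                       for other in lexicon if other != word)
    closest = distances[:x]
    if not closest:
        raise ValueError("The lexicon contains no other words")
    return sum(closest) / len(closest)


toy_lexicon = ["katt", "hatt", "katten", "bil"]  # hypothetical mini-lexicon
print(old_x_average("katt", toy_lexicon, x=2))  # 1.5
```

Because levenshtein accepts any sequence, the same function can be applied to lists of phone tokens to approximate the phonetic (PLD) counterpart.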

Phonetic transcription convention

This website uses the AFC-list phonetic transcription convention described in Witte and Köbler (2019).

In short, phonetic transcriptions must be in IPA format and adhere to the following principles:

The transcription should be surrounded by square brackets, e.g. [ˈ uː ɖ].
Transcriptions should be phonetic rather than phonemic.
Phonetic length should only be used in stressed syllables.
Syllable boundary marks must be added between syllables; however, exactly correct syllabification is not vital.
Items within the phonetic transcriptions (except for phonetic length markings) should be separated from each other by a blank space.
Only phonetic characters displayed in the box below should be used.

_ ! . * ˈ ˌ ∅ ¤ ² a a͡u a͡uː ɑ ɑː b bː ɕ d dː ɖ ɖː e e̞ eː e͡ʉ e͡ʉː ə ɛ ɛ̝ ɛː f fː ɡ ɡː h ɧ ɧː i iː ɪ ʝ ʝː k kː l lː ɭ ɭː m mː ɱ ɱː n nː ɳ ɳː ŋ ŋː o oː ɔ ɵ p pː r rː s sː ʂ ʂː t tː ʈ ʈː u uː ʉ ʉ̞ ʉː ʊ v vː y yː ʏ ø ø̞ øː
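
Some of the principles above can be checked mechanically. The Python sketch below tests only three of them: the surrounding square brackets, the single blank space between items, and the restriction to the characters listed in the box. It is a hypothetical helper written for illustration, not the checker used by the website, and it does not verify the rule about phonetic length or the syllabification.

```python
# Allowed items, copied from the box above.
ALLOWED_TEXT = (
    "_ ! . * ˈ ˌ ∅ ¤ ² a a͡u a͡uː ɑ ɑː b bː ɕ d dː ɖ ɖː e e̞ eː e͡ʉ e͡ʉː "
    "ə ɛ ɛ̝ ɛː f fː ɡ ɡː h ɧ ɧː i iː ɪ ʝ ʝː k kː l lː ɭ ɭː m mː ɱ ɱː "
    "n nː ɳ ɳː ŋ ŋː o oː ɔ ɵ p pː r rː s sː ʂ ʂː t tː ʈ ʈː u uː ʉ ʉ̞ ʉː "
    "ʊ v vː y yː ʏ ø ø̞ øː"
)
ALLOWED = set(ALLOWED_TEXT.split())


def check_transcription(transcription):
    """Return a list of error messages (empty if no problem was found)."""
    if not (transcription.startswith("[") and transcription.endswith("]")):
        return ["The transcription must be surrounded by square brackets."]
    errors = []
    inner = transcription[1:-1]
    if inner != inner.strip() or "  " in inner:
        errors.append("Items must be separated by exactly one blank space.")
    for token in inner.split(" "):
        if token and token not in ALLOWED:
            errors.append(f"Unknown item: {token!r}")
    return errors


print(check_transcription("[ˈ a lː . a . d ɪ n]"))  # []
print(check_transcription("[ˈ alː]"))  # ["Unknown item: 'alː'"]
```

The second call shows how a missing blank space surfaces as an unknown item, since "alː" is not a single entry in the list of allowed characters.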

Sample words (spellings / transcriptions):

Some sample spelling and transcription combinations are presented in the box below.

aktievinster 	[² a kː t . s ɪ . e̞ . v ˌ ɪ nː . s t e̞ r]
aladdin 	[ˈ a lː . a . d ɪ n]
algebraisk 	[a l . ɡ e̞ . b r ˈ ɑː . ɪ s k]
alkvetterns 	[² a lː . k v ˌ ɛ̝ tː . e̞ ɳ ʂ]
allmänbeblandelse 	[² a lː . m ɛ̝ m . b e̞ . b l ˌ a nː . d e̞ l . s e̞]
allvarliges 	[² a lː . v ˌ ɑː . ɭ ɪ . ɡ e̞ s]
alstringskraftig 	[² a lː . s t r ɪ ŋ s . k r ˌ a fː . t ɪ ɡ]
alvbottinas 	[² a lː v . b ˌ ɔ tː . ɪ . n a s]
ambivalensers 	[a m . b ɪ . v a . l ˈ ɛ̝ nː . s e̞ ʂ]
amoklöpandes 	[a . m ² uː k . l ˌ øː . p a n . d e̞ s]
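
The sample rows above can be used to illustrate how columns such as SyllableCount and PhoneCount relate to a transcription. The Python sketch below splits a row on the tab character, divides the transcription into syllables at the "." marks, and counts the remaining items. Treating ˈ, ˌ and ² as non-segmental marks is an assumption made for this illustration; the other special symbols in the character box are not interpreted, so the counts may deviate from the website's values for transcriptions that contain them.

```python
NON_SEGMENTS = {"ˈ", "ˌ", "²"}  # assumed stress and tone marks, not phones


def parse_sample_line(line):
    """Split 'spelling<TAB>[transcription]' into its two parts."""
    spelling, transcription = line.split("\t")
    return spelling.strip(), transcription.strip()


def split_syllables(transcription):
    """Return a list of syllables, each a list of space-separated items."""
    inner = transcription.strip()[1:-1]  # drop the square brackets
    syllables = [[]]
    for token in inner.split():
        if token == ".":
            syllables.append([])  # a syllable boundary starts a new syllable
        else:
            syllables[-1].append(token)
    return syllables


def syllable_count(transcription):
    return len(split_syllables(transcription))


def phone_count(transcription):
    return sum(1 for syllable in split_syllables(transcription)
               for token in syllable if token not in NON_SEGMENTS)


spelling, transcription = parse_sample_line("aladdin \t[ˈ a lː . a . d ɪ n]")
print(spelling, syllable_count(transcription), phone_count(transcription))
# aladdin 3 6
```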

References

Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 10(1), 29-63. doi:10.1016/0010-0285(78)90018-X
Witte, E., & Köbler, S. (2019). Linguistic Materials and Metrics for the Creation of Well-Controlled Swedish Speech Perception Tests. Journal of Speech, Language, and Hearing Research, 62(7), 2280-2294. doi:10.1044/2019_JSLHR-S-18-0454
Witte, E., Edlund, J., Jönsson, A., & Danielsson, H. (2021). Swedish Word Metrics: A Swe-Clarin resource for psycholinguistic research in the Swedish language. Paper presented at the CLARIN Annual Conference 2021.
Yarkoni, T., Balota, D., & Yap, M. (2008). Moving beyond Coltheart's N: a new measure of orthographic similarity. Psychonomic Bulletin & Review, 15(5), 971-979. doi:10.3758/PBR.15.5.971