Introduction and Rationale
Linguistics is benefiting greatly from computational methods and statistical models. Although the phonologies and phonetics of numerous individual languages have been described, most of these descriptions still reside in largely inaccessible formats, including paper, proprietary software, obsolete hardware, or non-interoperable encodings. To date there is no central repository for the sounds of all known languages that includes theoretical models of distinctive feature sets.
We are developing the Phonetics Information Base and Lexicon (PHOIBLE), a typological phonological database that aims to encompass the feature sets and sound systems of all known languages for which resources can be found.
Linguistic information has been collected from authoritative resources and integrated into PHOIBLE throughout the project. Moreover, many of the languages that have been added are understudied languages whose resources are not available electronically, such as paper grammars from the 18th, 19th and 20th centuries.
The current database includes 200 languages' phonemic and corresponding allophonic inventories from the Stanford Phonology Archive (Crothers et al. 1979), 451 languages' phonemic inventories from UPSID (Maddieson 1984, Maddieson & Precoda 1990), 200 African languages' phonemic and corresponding graphemic inventories (Hartell 1993, Chanard 2006) and hundreds more inventories, including phonemes, allophones and their conditioning environments, which we extracted from secondary resources such as grammars and phonological descriptions. The data set also includes genealogical and geographical information about each language.
The PHOIBLE project also integrates a theoretical model of distinctive features: an extended phonological feature set based on the International Phonetic Alphabet (IPA; International Phonetic Association 2005) and on Hayes 2009. This is accomplished by mapping each IPA segment to a set of features (Moran 2012). In this way, the IPA serves as a pivot for interoperability across all resources in PHOIBLE, because their contents are encoded in Unicode IPA.
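The segment-to-feature mapping described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the feature names and values are a toy subset chosen for illustration rather than PHOIBLE's actual feature table, the two inventories are invented, and the choice of Unicode NFD normalization is an assumption made here so that visually identical segments compare equal.

```python
import unicodedata

# Hypothetical, heavily simplified feature table (NOT PHOIBLE's real data).
# Keys are IPA segments in Unicode NFD form; values map feature -> "+"/"-".
FEATURES = {
    "p": {"consonantal": "+", "voice": "-", "labial": "+", "nasal": "-"},
    "b": {"consonantal": "+", "voice": "+", "labial": "+", "nasal": "-"},
    "m": {"consonantal": "+", "voice": "+", "labial": "+", "nasal": "+"},
    "t": {"consonantal": "+", "voice": "-", "labial": "-", "nasal": "-"},
    "a": {"consonantal": "-", "voice": "+", "labial": "-", "nasal": "-"},
    # "a" followed by U+0303 COMBINING TILDE (nasalized vowel)
    "a\u0303": {"consonantal": "-", "voice": "+", "labial": "-", "nasal": "+"},
}


def normalize(segment):
    """Return the segment in NFD form (an assumption of this sketch)."""
    return unicodedata.normalize("NFD", segment)


def features_for(segment):
    """Look up the feature bundle for one IPA segment."""
    return FEATURES[normalize(segment)]


def natural_class(inventory, spec):
    """Return the sorted segments of an inventory that match every feature in spec."""
    return sorted(
        normalize(seg)
        for seg in inventory
        if all(features_for(seg).get(f) == v for f, v in spec.items())
    )


def shared_segments(inv_a, inv_b):
    """Segments found in both inventories, compared via normalized IPA."""
    return sorted({normalize(s) for s in inv_a} & {normalize(s) for s in inv_b})


# Two invented inventories, as if drawn from different source databases.
# source_b encodes the nasalized vowel as precomposed U+00E3.
source_a = ["p", "b", "m", "a", "a\u0303"]
source_b = ["p", "t", "m", "a", "\u00e3"]

voiced_consonants = natural_class(source_a, {"consonantal": "+", "voice": "+"})
common = shared_segments(source_a, source_b)
```

Because both inventories are keyed on normalized IPA strings, the same feature table serves every source, which is the sense in which the IPA acts as a pivot.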
For a detailed description of PHOIBLE, see Moran 2012a. For examples of some of the research we are doing with PHOIBLE, see Moran, McCloy & Wright 2012; Cysouw, Dediu & Moran 2012; and McCloy, Moran & Wright 2013.
For questions or more information, contact us. For access to data, keep reading.
PHOIBLE was funded in 2009 by a grant from the Royalty Research Fund at the University of Washington. The data are available in several formats for research purposes only. For commercial use, contact us. If you use the data in a publication, please include the appropriate citation. (Notify us and we'll add you to the PHOIBLE bibliography.)
- Raw supplemental data for: Revisiting population size vs. phoneme inventory size.
- Phoneme level supplemental data for: Revisiting population size vs. phoneme inventory size.
- PHOIBLE SQL dump (XML; MySQL)
- PHOIBLE RDF Linked Data for the LLOD.
- Moran, Steven. 2012b. Using Linked Data to Create a Typological Knowledge Base. In Linked Data in Linguistics: Representing and Connecting Language Data and Language Metadata, Christian Chiarcos, Sebastian Nordhoff and Sebastian Hellmann (eds). Springer, Heidelberg. (BibTeX)
- Supplemental material for: Still No Evidence for an Ancient Language Expansion From Africa.
- Cysouw, Michael, Dan Dediu and Steven Moran. 2012. Still No Evidence for an Ancient Language Expansion From Africa. Science, 335: 657-b. (Full paper | BibTeX)
- Chanard, Christian. 2006. Systèmes Alphabétiques des langues africaines. Online: http://sumale.vjf.cnrs.fr/phono/.
- Crothers, John H., James P. Lorentz, Donald A. Sherman and Marilyn M. Vihman. 1979. Handbook of Phonological Data From a Sample of the World's Languages: A Report of the Stanford Phonology Archive.
- Cysouw, Michael, Dan Dediu and Steven Moran. 2012. Still No Evidence for an Ancient Language Expansion From Africa. Science, 335: 657-b.
- Hartell, Rhonda L. 1993. Alphabets des langues africaines. UNESCO and Société Internationale de Linguistique.
- Hayes, Bruce. 2009. Introductory Phonology. Blackwell.
- International Phonetic Association. 2005. International Phonetic Alphabet. http://www.arts.gla.ac.uk/IPA/.
- Maddieson, Ian and Kristin Precoda. 1990. Updating UPSID. In UCLA Working Papers in Phonetics, 74, 104-111.
- Maddieson, Ian. 1984. Patterns of Sounds. Cambridge University Press.
- McCloy, Daniel, Steven Moran and Richard Wright. 2013. Revisiting ‘The role of features in phonological inventories’. Paper presented at the CUNY Conference on the Feature in Phonology and Phonetics. (Slides | BibTeX)
- Moran, Steven. 2012a. Phonetics Information Base and Lexicon. PhD thesis. University of Washington.
- Moran, Steven. 2012b. Using Linked Data to Create a Typological Knowledge Base. In Linked Data in Linguistics: Representing and Connecting Language Data and Language Metadata, Christian Chiarcos, Sebastian Nordhoff and Sebastian Hellmann (eds). Springer, Heidelberg.
- Moran, Steven, Daniel McCloy and Richard Wright. 2012. Revisiting Population Size vs. Phoneme Inventory Size. Language, 88(4), 877-893.
- The above references in BibTeX.