Welcome to PHOIBLE Online

PHOIBLE Online is a repository of cross-linguistic phonological inventory data, which have been extracted from source documents and tertiary databases and compiled into a single searchable convenience sample. The 2014 edition includes 2155 inventories that contain 2160 segment types found in 1672 distinct languages.

A bibliographic record is provided for each source document; note that some languages in PHOIBLE have multiple entries based on distinct sources that disagree about the number and/or identity of that language’s phonemes.
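When two sources disagree about a language's inventory, the disagreement can concern either the number of phonemes or which phonemes are present. The sketch below shows one way to make both kinds of difference explicit with set operations. The two inventories are hypothetical illustrations, not actual PHOIBLE entries, and `compare_inventories` is a helper name introduced here for the example.

```python
# Two hypothetical inventories for the same language, taken from two sources.
# The segments are illustrative only, not actual PHOIBLE data.
source_a = {"p", "t", "k", "b", "d", "ɡ", "m", "n", "a", "i", "u"}
source_b = {"p", "t", "k", "b", "d", "ɡ", "m", "n", "ŋ", "a", "e", "i", "o", "u"}


def compare_inventories(a: set[str], b: set[str]) -> dict[str, object]:
    """Summarize how two phoneme inventories differ in size and in content."""
    return {
        "size_a": len(a),            # number of phonemes in source A
        "size_b": len(b),            # number of phonemes in source B
        "shared": sorted(a & b),     # phonemes both sources agree on
        "only_a": sorted(a - b),     # phonemes reported only by source A
        "only_b": sorted(b - a),     # phonemes reported only by source B
    }


summary = compare_inventories(source_a, source_b)
print(summary["size_a"], summary["size_b"])  # 11 14
print(summary["only_b"])                     # ['e', 'o', 'ŋ']
```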

Two principles guide the development of PHOIBLE, though it has proved challenging both theoretically and technologically to abide by them:

  1. Be faithful to the language description in the source document (now often called a ‘doculect’, for the reasons indicated above)
  2. Encode all character data in a consistent representation in Unicode IPA
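A consistent Unicode representation matters because the same IPA symbol can be stored as different code point sequences; for example, "ã" can be a single precomposed character or "a" followed by a combining tilde. The following is a minimal sketch, assuming canonical decomposition (NFD) as the target form; this page does not state which normalization form PHOIBLE uses. The `HOMOGLYPHS` table is a hypothetical illustration of look-alike characters that Unicode normalization alone cannot unify.

```python
import unicodedata

# Hypothetical table of look-alike characters that NFD does not unify,
# e.g. ASCII "g" (U+0067) versus IPA script g "ɡ" (U+0261).
HOMOGLYPHS = {"g": "\u0261"}


def normalize_segment(segment: str) -> str:
    """Decompose a segment to NFD, then replace known look-alike characters."""
    decomposed = unicodedata.normalize("NFD", segment)
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in decomposed)


precomposed = "\u00e3"  # ã as one code point (LATIN SMALL LETTER A WITH TILDE)
decomposed = "a\u0303"  # a followed by COMBINING TILDE

print(precomposed == decomposed)                                        # False
print(normalize_segment(precomposed) == normalize_segment(decomposed))  # True
print(normalize_segment("g") == "\u0261")                               # True
```

Normalizing every segment once, at import time, lets later comparisons between inventories use plain string equality.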

In addition to phoneme inventories, PHOIBLE includes distinctive feature data for every phoneme in every language. The feature system was created by the PHOIBLE developers to be descriptively adequate cross-linguistically. In other words, if two phonemes differ in their graphemic representation, then they necessarily differ in their featural representation as well (regardless of whether those two phonemes coexist in any known doculect). The feature system is loosely based on that of Hayes 2009, with some additions drawn from Moisik & Esling 2011.

However, the final feature system goes beyond both of these sources, and is potentially subject to change as new languages are added in subsequent editions of PHOIBLE.
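The adequacy requirement above amounts to saying that the mapping from segments to feature vectors must be one-to-one: no two distinct segments may share a feature vector. The sketch below checks this property by grouping segments by their feature vectors. The feature names and values are toy illustrations, not PHOIBLE's actual feature table, and `find_collisions` is a hypothetical helper.

```python
from collections import defaultdict

# A deliberately incomplete toy feature set (illustrative values only).
FEATURES = ("consonantal", "labial", "nasal")
feature_table = {
    "p": ("+", "+", "-"),
    "b": ("+", "+", "-"),
    "m": ("+", "+", "+"),
    "t": ("+", "-", "-"),
}


def find_collisions(table):
    """Return groups of distinct segments that share an identical feature vector."""
    by_vector = defaultdict(list)
    for segment, vector in table.items():
        by_vector[vector].append(segment)
    return [sorted(segs) for segs in by_vector.values() if len(segs) > 1]


print(find_collisions(feature_table))  # [['b', 'p']]: the feature set is inadequate

# Adding a "voice" feature distinguishes /p/ from /b/, so no collisions remain.
FEATURES_WITH_VOICE = FEATURES + ("voice",)
with_voice = {
    "p": ("+", "+", "-", "-"),
    "b": ("+", "+", "-", "+"),
    "m": ("+", "+", "+", "+"),
    "t": ("+", "-", "-", "-"),
}
print(find_collisions(with_voice))  # []
```

A check like this is one way to see why the feature system may need to grow: a newly added segment that collides with an existing one signals that a further feature, or a further feature value, is required.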

The 2014 edition includes inventories from the following contributors:

Christian Chanard and Rhonda L. Hartell (AA): 203 inventories
The inventories in Alphabets of Africa (AA) come from Christian Chanard's Systèmes alphabétiques des langues africaines, an online database of Alphabets des langues africaines, which was published in 1993 by the Regional Office in Dakar, Senegal, and edited by Rhonda L. Hartell. AA contains the phoneme inventories and orthographies of 200 languages. Incorrect ISO 639-3 language name identifiers and incorrect Unicode IPA characters were corrected before the inventories from the online version were added to PHOIBLE (see Moran 2012, chp 4 for details). Christopher Green verified the inventories' contents, and where Chanard and Hartell disagreed, additional resources were consulted to resolve the discrepancies (ibid.).

Christopher Green and Steven Moran (GM)
Christopher Green and Steven Moran extracted phonological inventories from secondary sources, including grammars and phonological descriptions, with the goal of attaining pan-African coverage. This is a work in progress.

PHOIBLE (PH): 389 inventories
Steven Moran, Daniel McCloy and Richard Wright.

Ramaswami, N. (RA): 100 inventories
These inventories come from Common Linguistic Features in Indian Languages: Phonetics, by N. Ramaswami. This source contains the phoneme inventories of 100 languages, compiled from various works on the languages of India.

South American Phonological Inventory Database (SAPHON): 355 inventories
The South American Phonological Inventory Database (SAPHON), compiled and edited by Lev Michael, Tammy Stark and Will Chang, is a comprehensive resource describing the phoneme inventories of languages spoken in South America. It contains over 300 data points and is available online at http://linguistics.berkeley.edu/~saphon/.

Stanford Phonology Archive (SPA): 197 inventories
The Stanford Phonology Archive (SPA) was the first computerized database of phonological segment inventories. It was inspired by Joseph Greenberg's research on universals and by his personal archive of data from notebooks and his memory (Crothers et al 1979, i-ii). The inventories in PHOIBLE Online come from the Handbook of Phonological Data From a Sample of the World's Languages, compiled and edited by Crothers et al 1979, and kindly provided to the Phonetics Lab (University of Washington) by Marilyn M. Vihman. The SPA inventories include descriptions of phonemes and allophones, and comments on phonological contexts, for 197 languages. The inventory descriptions were digitized, and each phoneme was mapped from its original written description (e.g. d-pharyngealized) to a Unicode IPA representation. Each inventory was also assigned an ISO 639-3 language name identifier. Details are given in Moran 2012, chp 4, and the SPA-to-Unicode IPA mappings are given in Moran 2012, appendix E.

UCLA Phonological Segment Inventory Database (UPSID): 451 inventories
In the early 1980s, Ian Maddieson developed the UCLA Phonological Segment Inventory Database (UPSID), a computer-accessible database of contrastive segment inventories (Maddieson 1984). The initial sample of 317 languages drew on the work of the Stanford Phonology Archive (Crothers et al 1979), but the compilers of the two databases did not always agree on the phonemic status and phonetic description of some segments, and these were therefore revised in UPSID (Maddieson 1984, pg 6). Maddieson and Precoda (1990) expanded the sample from 317 to 451 languages; both datasets are based on a quota sampling technique that aims to include one language from each small language family. UPSID inventories contain no descriptions of tone. The UPSID-451 data used in PHOIBLE Online were extracted from a DOS software package. Each segment description, originally given in an ASCII encoding (e.g. XW9:), was mapped to Unicode IPA, and each inventory was assigned an ISO 639-3 language name identifier. For details, see Moran 2012, chp 4; the UPSID-to-Unicode mappings are given in Moran 2012, appendix F.
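Converting a written segment description such as "d-pharyngealized" into Unicode IPA can be pictured as a base symbol plus diacritics appended in order. The sketch below assumes that "base-modifier" shape, which this page only illustrates with a single example. The lookup tables are small hypothetical excerpts, not the actual SPA mappings (those are in Moran 2012, appendix E), and `spa_to_ipa` is a name introduced here.

```python
# Hypothetical, partial lookup tables for illustration only.
BASES = {"d": "d", "t": "t", "s": "s"}
MODIFIERS = {
    "pharyngealized": "\u02e4",  # ˤ MODIFIER LETTER SMALL REVERSED GLOTTAL STOP
    "labialized": "\u02b7",      # ʷ MODIFIER LETTER SMALL W
    "aspirated": "\u02b0",       # ʰ MODIFIER LETTER SMALL H
}


def spa_to_ipa(description: str) -> str:
    """Map an assumed 'base-modifier-...' description to a Unicode IPA string."""
    base, *mods = description.split("-")
    try:
        return BASES[base] + "".join(MODIFIERS[m] for m in mods)
    except KeyError as err:
        # Fail loudly instead of guessing when a part has no mapping.
        raise ValueError(f"no mapping for {err.args[0]!r} in {description!r}") from err


print(spa_to_ipa("d-pharyngealized"))  # dˤ
print(spa_to_ipa("t-aspirated"))       # tʰ
```

Raising an error on unknown parts, rather than passing them through, makes gaps in the mapping table visible during conversion.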

The data set also includes additional genealogical and geographical information about each language from Glottolog.

PHOIBLE also integrates a theoretical model of distinctive features: an extended phonological feature set based on the International Phonetic Alphabet (International Phonetic Association 2005) and on Hayes 2009. This is accomplished by mapping each IPA segment to a set of features (Moran 2012). Because the contents of all resources in PHOIBLE are encoded in Unicode IPA, the IPA serves as a pivot for interoperability across them.

For a detailed description of PHOIBLE, see Moran 2012. For examples of some of the research we are doing with PHOIBLE, see Moran, McCloy & Wright 2012; Cysouw, Dediu & Moran 2012; McCloy, Moran & Wright 2013; and Moran & Blasi, Cross-linguistic comparison of complexity measures in phonological systems, forthcoming.

How to use PHOIBLE

Users can browse or search PHOIBLE's inventories by clicking on the tabs "Inventories", "Languages" or "Segments" above. Data can be downloaded by clicking the download button. If you use PHOIBLE in your research, please cite it following our recommended citation format.
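Once data has been downloaded, a common first step is to count how many segments each inventory contains. The sketch below assumes a CSV export with one row per phoneme and columns named `InventoryID`, `LanguageName` and `Phoneme`; the file name and column names are assumptions, so check them against the actual download before use. An in-memory sample stands in for the file so the example runs on its own.

```python
import csv
import io
from collections import Counter


def segments_per_inventory(rows):
    """Count phoneme rows per inventory, assuming one row per phoneme."""
    return Counter(row["InventoryID"] for row in rows)


# Stand-in for a downloaded file; with a real file you might use
# open("phoible.csv", newline="", encoding="utf-8") (hypothetical file name).
sample = io.StringIO(
    "InventoryID,LanguageName,Phoneme\n"
    "1,Language A,p\n"
    "1,Language A,a\n"
    "2,Language B,t\n"
)
counts = segments_per_inventory(csv.DictReader(sample))
print(counts["1"], counts["2"])  # 2 1
```

Opening the real file with `encoding="utf-8"` matters here, since the segment column contains Unicode IPA.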

How to cite PHOIBLE

If you are citing the database as a whole, or making use of the phonological distinctive feature systems in PHOIBLE, please cite as follows:

Moran, Steven & McCloy, Daniel & Wright, Richard (eds.) 2014. PHOIBLE Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://phoible.org, Accessed on 2014-09-19.)

If you are citing phoneme inventory data for a particular language or languages, please use the name of the language as the title, and include the original data source as an element within PHOIBLE:

UCLA Phonological Segment Inventory Database. 2014. Lelemi sound inventory (UPSID).
In: Moran, Steven & McCloy, Daniel & Wright, Richard (eds.) PHOIBLE Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://phoible.org/inventories/view/441, Accessed on 2014-09-19.)