Chapter 6.1 Introduction

Malcolm Ross, Andrew Pawley and Meredith Osmond

Map 1.1: Oceanic and non-Oceanic Austronesian languages

1. Aims1

This is the sixth and last of a set of volumes on the lexicon of the Proto Oceanic (POc) language.2 POc was the immediate ancestor of the Oceanic subgroup of the Austronesian language family. This subgroup consists of all the Austronesian languages of Melanesia east of 136˚ E, together with those of Polynesia and, with two exceptions, those of Micronesia—around 500 languages in all (see Map 1.1).3 Extensive arguments for the existence of Oceanic as a clearly demarcated branch of Austronesian were first put forward by Dempwolff (1927, 1937), and the validity of the subgroup is now recognised by probably all scholars working in Austronesian historical linguistics.

The development and break-up of the POc language and speech community were stages in a truly remarkable chapter in human prehistory—the colonisation by Austronesian speakers of the Indo-Pacific region in the period after about 2000 BC. The outcome was the largest of the world’s well-established language families and (until the expansion of Indo-European after Columbus) the most widespread. The Austronesian family comprises more than 1,000 distinct languages. Its eastern and western outliers, Madagascar and Easter Island, are two-thirds of a world apart, and its northernmost extensions, Hawai‘i and Taiwan, are separated by 70 degrees of latitude from its southernmost outpost, Stewart Island in New Zealand.

Map 1.2: Geographic limits of historically identified Oceanic speakers and presently documented Lapita sites

A strong school of opinion associates the subsequent break-up of POc with the rapid colonisation of Island Melanesia and the central Pacific by bearers of the Lapita culture between about 1200 and 900 BC (see Map 1.2 and vol. 2, chapter 2).

The present project brings together a large corpus of lexical reconstructions for POc, with supporting cognate sets, organised according to semantic fields and using a standard orthography for POc. We hope that it will be a useful resource for culture historians, archaeologists and others interested in the prehistory of the Pacific region. The comparative lexical material should also be a rich source of data for various kinds of purely linguistic research, e.g. on subgrouping (as in §1.8 and §1.9), phonological change, semantic change and semantic structure (e.g. colexification) in the 500 or so daughter languages.

Volume 1 of The lexicon of Proto Oceanic reconstructs terms associated with material culture. Volumes 2, 3 and 4 examine relevant sets of cognate terms that provide insights into how POc speakers viewed their environment. Volume 2 deals with the geophysical or inanimate environment, and volumes 3 and 4 treat plants and animals respectively. Volume 5 and the present volume return to terminologies centring on people. Volume 5 concerns gender and age, the body, and human conditions and physical and cognitive activities that arise from nature rather than nurture. The present volume concerns culturally learned structures, including social organisation, beliefs in the supernatural, the seasons of the year, counting and other elements of non-material culture.

A consideration of the totality of our reconstructions across volumes 1 to 5 has led to an unexpected reassessment of the origin of Proto Oceanic (§§1.8–1.9) together with a small revision to its phonology (§1.8.2.4).

The editors had intended to provide a seventh volume that would perform several functions. It would have treated closed classes of lexical roots, reviewed the project’s main findings concerning Proto Oceanic speakers’ culture and environment, and compared these findings with what archaeology tells us about the way of life and environment of the bearers of the Lapita culture. Some of these matters are partially folded into the chapters of the present volume, e.g. social anthropology into chapters 3 and 4, archaeology into chapter 5 and archaeogenetics briefly into chapter 15. Two factors have led to the decision not to proceed with volume 7 and to make this the last volume. Firstly, the editors are now octogenarians and would like to live somewhat less hectic lives. Secondly, and importantly, funds have been provided by the Australian Research Council’s Centre of Excellence for the Dynamics of Language to set up a publicly accessible electronic database of the reconstructions from the six volumes along with their supporting data, thereby fulfilling at least the purposes of the cumulative indexes intended for volume 7. The database will also provide a locus for updating the project’s findings and for additions by other scholars.

This introduction follows a similar path to that taken in earlier volumes, but deviates in §1.8 and §1.9 to outline the fresh insights into the prehistory of Proto Oceanic itself based on the reconstructions in volumes 1–5. Section 1.2 gives an overview of this volume’s contents, and §1.3 summarises its relationship to previous work. Section 1.4 examines the issues that arise in reconstruction. It falls into four main subsections. The first, §1.4.1, sketches our approach to reconstruction. The second, §1.4.2, is a brief introduction to sound correspondences. The third, §1.4.3, looks at the kinds of language grouping found in Oceania, as this bears on the validity of our reconstructions. The fourth, §1.4.4, sets out the criteria that we apply in making a reconstruction, and our answers to the challenges this raises. In §1.5 we briefly explain the conventions used in the cognate sets that make up much of this and previous volumes. Section 1.6 brings us to Proto Oceanic itself and presents its phonology as it has been understood until now, and the two orthographies that have been used to represent it. After a short note on POc morphology in §1.7, §1.8 takes us (we think appropriately in this our final volume) to the study of Proto Oceanic phonology and origins based on volumes 1–5 (Ross, in prep.). The results are summarised in §1.9.

2. The present volume

Inspection of the table of contents shows that the chapters in the present volume vary hugely in length. Each chapter concerns a semantic domain. For some of these domains—kinship, seasonal cycles, counting—we found a wealth of data and were dealing with internally structured closed classes of lexemes whose presentation required numerous tables (and diagrams in the case of kinship). For other domains—the spirit world, measurement—there was limited lexical material, and for yet others—*mana, *tabu—the author chose to limit domains to single concepts considered by others to be key cultural concepts in the Oceanic lexicon.

Chapters 2 to 5 of this volume are concerned with POc speakers’ social organisation. Chapter 2 is a detailed reconstruction of kinship terms and structures. Chapter 3, by the late Per Hage, is a slightly edited and abridged version of a paper first published in The Journal of the Polynesian Society in 2007. It complements chapter 2 by using evidence from disciplines other than linguistics to answer the question, “Was POc society matrilineal?” Chapter 4 returns to a much discussed issue, reconstructing terms associated with chieftainship and rank in POc and examining the consequences of this reconstruction.4 Chapter 5 uses reconstructed terms to investigate POc speakers’ settlement patterns.

Chapter 6 concerns the probable recreational activities of POc speakers, looking at music, song, dance and games.

In chapters 7 to 10 we turn to topics that have to do with belief systems and the supernatural. Chapter 7 concerns the beings that inhabited the POc spirit world. Chapters 8 and 9 both deal with human manipulation of the supernatural. Chapter 8 takes a broad look at magic, while chapter 9 focuses on the reconstruction of PEOc *mana, a term that has been much discussed by Pacific anthropologists and denotes the pervasive supernatural power given by ancestral spirits to certain powerful individuals to ensure their success. Chapter 10 analyses the meanings of the POc term *tabu, which has reflexes throughout Oceanic. It meant ‘prohibited’, but in certain EOc languages it also attributes an aura of sanctity to the ‘prohibited’ person or object.

Chapter 11 investigates in some detail the way that Oceanic speakers have referred to the cyclic nature of time and have used the sun, moon and stars to regulate the agricultural cycle.

The terms that POc speakers used to refer to various aspects of speech are the subject of chapter 12.

Chapter 13 reconstructs terms that had to do with trade and more generally with change of possession: giving, receiving and stealing. It also introduces the practice of ceremonial exchange, which plays a role in chapter 14 on counting. There it is argued that the POc decimal counting system and its associated complexities were kept alive by their use in ceremonial exchange feasts. Chapter 15 suggests that POc may also have had a digit-tally system used in everyday counting. One counting complexity covered in chapter 14 is the use of numeral classifiers, and chapter 16 deals with the subset of classifiers used in measurement.

Appendix A lists the data sources employed in this volume. Appendix B lists the languages from which data for this and previous volumes are drawn. It includes alternative names of languages, an index to languages, maps showing their approximate locations, and a list of their ISO codes, glottocodes and longitudes and latitudes.

3. The relation of the current project to previous work

Reconstructions of POc phonology and lexicon began with Dempwolff’s pioneering work in the 1920s and 1930s. Dempwolff’s dictionary of reconstructions attributed to Proto Austronesian (PAn) (Dempwolff 1938)—but equivalent in modern terms to Proto Malayo-Polynesian (PMP)—includes some 600 reconstructions with reflexes in Oceanic languages.

Since the 1950s, POc and other early Oceanic interstage languages have been the subject of a considerable body of research. However, relatively few new reconstructions safely attributable to POc were added to Dempwolff’s material until the 1970s. In 1969 George Grace made available as a working paper a compilation of reconstructions from various sources amounting to some 700 distinct items, attributed either to POc or to early Oceanic interstages. These materials were presented in a new orthography for POc, based largely on Biggs’ (1965) orthography for an interstage he called Proto Eastern Oceanic. Updated compilations of Oceanic cognate sets were produced at the University of Hawai‘i in the period 1977–1983 as part of a project directed by Grace and Pawley. These compilations and the supporting data are problematic in various respects and we have made only limited use of them.

Comparative lexical studies have been carried out for several lower-order subgroups of Oceanic: for Proto Polynesian by Biggs (resulting in Walsh & Biggs 1966, Biggs, Walsh & Waqa 1970 and subsequent versions of the POLLEX file, including Biggs & Clark 1993, Clark & Biggs 2006 and online as Greenhill & Clark 2011); for Proto Micronesian by scholars associated with the University of Hawai‘i (Bender et al. 1983, 2003a, 2003b); for the ancestor of the Banks and Torres languages by Alexandre François (several unpublished manuscripts); for Proto North and Central Vanuatu by Clark (2009); for Proto Southern Vanuatu by Lynch (2001c); for New Caledonia by Ozanne-Rivierre (1992), Haudricourt & Ozanne-Rivierre (1982) and Geraghty (1989); for Proto SE Solomonic by Levy (1980) and Lichtenberk (1988); for Proto Central Pacific by Hockett (1976), Geraghty (1983, 1986, 1996, together with a number of unpublished papers); for Proto Eastern Oceanic by Biggs (1965), Cashmore (1969), Levy (1980), and Geraghty (1990); and for Proto Central Papuan by Pawley (1975), Lynch (1978a, 1980), and Ross (1994a).

In a series of papers, Robert Blust (1970, 1980a, 1983-84a, 1986, 1989) of the University of Hawai‘i published extensive, alphabetically ordered lexical reconstructions (with supporting cognate sets) for interstages earlier than POc, especially for Proto Austronesian, Proto Malayo-Polynesian and Proto Eastern Malayo-Polynesian. He also wrote several papers investigating specific semantic fields (Blust 1980c, 1982b, 1987, 1994). Blust & Trussel had a major work in progress, the online Austronesian Comparative Dictionary (ACD), which brings together Blust’s reconstructions for Proto Austronesian and lower-order stages up to mid 2020, when the death of Steve Trussel, who was responsible for the web interface and data input, brought this work to a sudden halt. With the passing of Robert Blust in January 2022, the ACD was bequeathed to Alexander Smith and found a new home with the Cross-Linguistic Linked Data project, where it will hopefully continue to grow.5

Several papers predating our project systematically investigated particular semantic domains in the lexicon of POc, e.g. Milke (1958b), French-Wright (1983), Pawley (1982a, 1985), Pawley & Green (1984), Lichtenberk (1986), Walter (1989), and the various papers in Pawley & Ross (1994). Ross (1988) contained a substantial number of new POc lexical reconstructions, as well as proposed modifications to the reconstructed POc sound system and the orthography. However, previous Oceanic lexical studies were limited both by large gaps in the data, with a distinct bias in favour of ‘Eastern Oceanic’ languages, and by the technical problems of collating large quantities of data. Although many languages in Melanesia remain poorly described, there are now many more dictionaries and extended word lists, particularly for Papua New Guinea, than there were in the 1980s. And developments in computing hardware and software now permit much faster and more precise handling of data than was possible then. A list of sources is found in Appendix A.

Several compilations of reconstructions have provided valuable points of reference, both inside and outside the Oceanic group. We are indebted particularly to Bender et al. (2003a, 2003b), two editions of POLLEX (Biggs & Clark 1993 and Clark & Biggs 2006), Blust & Trussel (2020), Clark (2009) and Lynch (2001c).

In the course of planning the several volumes of the present project, we came to realise that the form in which preliminary publications were presented—namely as essays, each discussing cognate sets for a particular semantic field at some length—would also be the best form for the presentation of this set of volumes. A discursive treatment of individual terminologies, as opposed, say, to a dictionary-type listing of reconstructions with supporting cognate sets, makes it easier to relate the linguistic comparisons to relevant issues of culture history, language change, and methodology. Hence each of the present volumes has as its core a collection of analytic essays. Some of these have been published or presented elsewhere, but are included here in revised form.

In some cases we have updated the earlier versions in the light of subsequent research, and, where appropriate, have inserted cross-references between contributions. Authorship is in some cases hard to pin down, as a number of people have had a hand in collating the data, doing the reconstructions, and (re)writing for publication here. In most chapters, however, one person did the research which determined the structure of the terminology, and that person appears as the first or only author, and where another or others had a substantial part in putting together the chapter they appear as the second or further authors.

4. Reconstructing the lexicon

4.1. Terminological reconstruction

Our method of doing ‘terminological reconstruction’ is as follows. First, the terminologies of present-day speakers of Oceanic languages are used as the basis for constructing a hypothesis about the semantic structure of a corresponding POc terminology, taking account of (i) ethnographic evidence, i.e. descriptions of the lifestyles of Oceanic communities and (ii) the geographical and physical resources of particular regions of Oceania. For example, by comparing terms in several languages for parts of an outrigger canoe, or for growth stages of a coconut, one can see which concepts recur and so are likely to have been present in POc. Secondly, a search is made for cognate sets (§1.4.2), i.e. words from different languages that appear to be descended from the same protoform, from which forms can be reconstructed to match each meaning in this hypothesised terminology. The search is not restricted to members of the Oceanic subgroup; if a term found in an Oceanic language proves to have external (non-Oceanic) cognates, the POc antiquity of that term will be confirmed and additional evidence concerning its meaning will be provided. Thirdly, the hypothesised terminology is re-examined to see if it needs modification in the light of the reconstructions. There are cases, highlighted in the various contributions to these volumes, where we were able to reconstruct a term where we did not expect to do so and conversely, often more significantly, where we were unable to reconstruct a term where we had believed we should be able to. In each case, we have discussed the reasons why our expectations were not met and what this may mean for Oceanic culture history. We have set out to pay more careful attention to reconstructing the semantics of POc forms than has generally been done in earlier work, treating words not as isolates but as parts of terminologies.

Blust (1987:81) distinguishes between conventional ‘semantic reconstruction’, which asks, “What was the probable meaning of protomorpheme X?”, and Dyen and Aberle’s (1974) ‘lexical reconstruction’, where one asks, “What was the protomorpheme which probably meant ‘X’?” At first sight, it might appear that terminological reconstruction is a version of lexical reconstruction. However, there are sharp differences. Lexical reconstruction applies a formal procedure: likely protomeanings are selected from among the glosses of words in available cognate sets, then an algorithm is applied to determine which meaning should be attributed to each set. This procedure may have unsatisfactory results, as Blust points out. Reconstructions may end up with crude and overly simple glosses; or no meaning may be reconstructed for a form because none of the glosses of its reflexes is its protomeaning.

Terminological reconstruction is instead similar to the semantic reconstruction approach. In terminological reconstruction the meanings of protomorphemes are not determined in advance. Instead, cognate sets are collected and their meanings are compared with regard to:

  • their specific denotations, where these are known;
  • the geographic and genealogical distribution of these denotations (i.e. are the glosses from which the protogloss is reconstructed well distributed?);
  • any derivational relationships to other reconstructions;
  • their place within a working hypothesis of the relevant POc terminology (e.g., are terms complementary, as when ‘bow’ implies ‘arrow’ and ‘seine net’ implies ‘floats’ and ‘weights’? Are there different levels of classification: generic, specific, and so on?).

For example, it proved possible to reconstruct the following POc terms for tying with cords (vol.1:290–293):

  • POc *buku ‘tie (a knot); fasten’
  • POc *pʷita ‘tie by encircling’
  • POc *paqu(s), *paqus-i- ‘bind, lash; construct (canoe +) by tying together’
  • POc *pisi ‘bind up, tie up, wind round, wrap’
  • POc *kiti ‘tie, bind’

In each of the supporting cognate sets from contemporary languages there are a number of items whose glosses in the dictionaries or word lists are too vague to tell the analyst anything about the specific denotation of the item, and in the case of *kiti this prevents the assignment of a more specific meaning. The verb *buku can be identified as the generic term for tying a knot because of its derivational relationship (by zero derivation) with a noun whose denotation is clearly generic, *buku ‘node (as in bamboo or sugarcane); joint; knuckle; knot in wood, string or rope’ (vol.1:85–86). Other senses are extensions of this meaning (vol.2:50, vol.5:159, 175–176, 341). Reconstruction of the meaning of *pʷita as ‘tie by encircling’ is supported by the meanings of the Lukep, Takia and Longgu reflexes, respectively ‘tie by encircling’, ‘tie on (as grass-skirt)’, and ‘trap an animal’s leg; tie s.t. around ankle or wrist’: Lukep and Takia are North New Guinea languages, whilst Longgu is SE Solomonic. Reconstruction of the meaning of *paqu(s), *paqus-i- as ‘bind, lash; construct (canoe +) by tying together’ is supported by the meanings of the Takia, Kiribati and Samoan reflexes, respectively ‘tie, bind; construct (a canoe)’, ‘construct (canoe, house)’, and ‘make, construct (wooden objects, canoes +)’: Takia is a North New Guinea language, Kiribati is Micronesian, and Samoan is Polynesian. The meaning of *pisi is similarly reconstructed by reference to the meanings of its Mono-Alu, Mota, Port Sandwich, Nguna and Fijian reflexes.
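The distributional check applied to *pʷita and *paqu(s) above can be sketched in code. This is only an illustration, not the procedure used in these volumes: the function names, the substring match on glosses and the threshold of two groups are all assumptions introduced here, and a keyword match is a crude stand-in for the analyst's judgement about denotation. The glosses and group labels are those cited in the paragraph above.

```python
# Reflex glosses and group labels as cited in the paragraph above.
# NNG = North New Guinea, SES = SE Solomonic, Mic = Micronesian, Pn = Polynesian.
REFLEXES = {
    "*pʷita": [
        ("Lukep", "NNG", "tie by encircling"),
        ("Takia", "NNG", "tie on (as grass-skirt)"),
        ("Longgu", "SES", "trap an animal's leg; tie s.t. around ankle or wrist"),
    ],
    "*paqu(s)": [
        ("Takia", "NNG", "tie, bind; construct (a canoe)"),
        ("Kiribati", "Mic", "construct (canoe, house)"),
        ("Samoan", "Pn", "make, construct (wooden objects, canoes +)"),
    ],
}


def groups_attesting(reflexes, keyword):
    """Return the set of language groups with a reflex whose gloss contains `keyword`."""
    return {group for _language, group, gloss in reflexes if keyword in gloss}


def is_well_distributed(reflexes, keyword, min_groups=2):
    """Hypothetical criterion: the sense is attested in at least `min_groups` groups."""
    return len(groups_attesting(reflexes, keyword)) >= min_groups


for etymon, keyword in [("*pʷita", "tie"), ("*paqu(s)", "construct")]:
    groups = sorted(groups_attesting(REFLEXES[etymon], keyword))
    print(etymon, keyword, groups, is_well_distributed(REFLEXES[etymon], keyword))
```

On this toy data the sense ‘construct’ for *paqu(s) is attested in three groups, whereas the exact wording ‘encircling’ for *pʷita occurs in one gloss only. The latter result shows why the comparison has to be made on denotations rather than on the literal wording of dictionary glosses.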

Often, however, the authors have been less fortunate in the information available to them. For example, Osmond (vol.1:222–225) reconstructs six POc terms broadly glossed as ‘spear’. Multiple terms for implements within one language imply that these items were used extensively and possibly in specialised ways. Can we throw light on these specialised ways? Unfortunately, some of the word lists and dictionaries available give minimal glosses, e.g. ‘spear’, for reflexes of the six reconstructions. What we need to know for each reflex is: what is the level of reference? Is it a term for all spears, or perhaps all pointed projectiles including arrows and darts? Or does it refer to a particular kind of spear? Is it noun or verb or both? If a noun, does it refer to both the instrument and the activity? Most word lists are frustratingly short on detail. For this kind of detail, ethnographies have proven a more fruitful source of information than many word lists.

Another problem is inherent in the dangers of sampling from some 500 languages. The greater the number of languages, the greater are the possible variations in meaning of any given term, and the greater the chances of two languages making the same semantic leaps quite independently. Does our (sometimes quite limited) cognate set provide us with a clear unambiguous gloss, or have we picked up an accidental bias, a secondary or distantly related meaning? Did etymon x refer to fishhook or the material from which the fishhook was made? Did etymon y refer to the slingshot or to the action of spinning round?

4.2. Sound correspondences

Phonological changes, whereby one sound evolves into another, are mostly regular. For example, the initial consonant of the reflexes of the three words below is the same for all three items (and for numerous others).6 In each language all instances of initial *p- have evolved “regularly”, i.e., in the same way.

POc             *papine     *pisiko          *pat[i] 7    *p-
                ‘woman’     ‘meat, flesh’    ‘four’
Adm: Aua        pifine      pirio                         p-
Adm: Baluan     pein        pusio            pa-          p-
NNG: Numbami                wiso             wata         w-
PT: Kilivila    vivila      viliy-na         -vasi        v-
PT: Yamalele    vavine      viɣo                          v-
PT: Sinaugoro   vavine      vi-viɣo          vasi-vasi    v-
PT: Motu        hahine      hidio            hani         h-
MM: Tolai       vavina      vio              -vat         v-
MM: Vaghua      vavene      vəzəɣo           -vac         v-
SES: Arosi      haihine     hasiʔo           hai          h-
NCV: Mota       vavine      visoɣo-i         vat          v-
Mic: Woleaian   faifile     fitixo           faa-         f-
Fij: Wayan      vavine      viðiko                        v-
Pn: Samoan      fefine                                    f-

The grouping to which each language belongs is indicated by an abbreviation on the left (§1.5.1).

The “sound correspondence” that concerns us here, the initial consonant correspondence, is shown on the right. Reconstructing forms in a protolanguage depends on working out the systematic sound correspondences among cognate vocabulary in contemporary languages and on having a working hypothesis about how the sounds of POc have changed and are reflected in modern Oceanic languages. Working out sound correspondences even for twenty languages is a large task, and so we have relied heavily on the work of others and our own previous work. The sound correspondences we have used are as follows: Ross (1988) for Western Oceanic and Admiralties; Ross (1996a) for Yapese; Ross (1996b) for Oceanic languages of Indonesian Papua; Pawley (1972) for Eastern Oceanic; Levy (1979, 1980) for SE Solomonic and Lichtenberk (1988) for Cristobal-Malaitan; Pawley (1972) and Tryon & Hackman (1983) for SE Solomonic; Ross & Næss (2007) for Temotu; François (pers. comm.) for the Banks and Torres Islands of Vanuatu; Tryon (1976) and Clark (2009) for North and Central Vanuatu; Lynch (2001c) for Southern Vanuatu; Geraghty (1989), Haudricourt & Ozanne-Rivierre (1982), Ozanne-Rivierre (1992, 1995) and Lynch (2015) for New Caledonia; Jackson (1986) and Bender et al. (2003a, 2003b) for Micronesian; Geraghty (1986) for Central Pacific; and Biggs (1978) for Polynesian. We have also done additional work on North and Central Vanuatu and New Caledonia ourselves.

For non-Oceanic languages we have referred to sound correspondences given by Tsuchida (1976) for Formosan languages; by Zorc (1977, 1986) and Reid (1982) for the Philippines; by Adelaar (1992b) and Nothofer (1975) for Malay and Javanese; by Sneddon (1984) for Sulawesi; by Collins (1983) for central Maluku; by Grimes & Edwards (in prep.) for what is conventionally known as CMP; and by Blust (1978a) and Kamholz (2014) for SHWNG.

Regular sound correspondences can be interfered with in various ways: by phonetic conditioning that the analyst has not identified (see, e.g., Blust 1996a), by borrowing (for an extreme Oceanic case, see Grace 1996), or by the frequency of an item’s use (Bybee 1994). We have tried at least to note, and sometimes to account for, irregularities in cognate sets.
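The regularity illustrated by the reflexes of *papine, *pisiko and *pat[i] can be checked mechanically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the first character of a form is taken to be its first segment (which would fail for digraphs), and each form is assigned to an etymon on the basis of evident cognacy, with None marking a form that the table does not supply.

```python
# Reflexes of POc *papine 'woman', *pisiko 'meat, flesh' and *pat[i] 'four',
# in that order. None marks a form not supplied in the table.
REFLEXES = {
    "Aua": ("pifine", "pirio", None),
    "Baluan": ("pein", "pusio", "pa-"),
    "Numbami": (None, "wiso", "wata"),
    "Kilivila": ("vivila", "viliy-na", "-vasi"),
    "Yamalele": ("vavine", "viɣo", None),
    "Sinaugoro": ("vavine", "vi-viɣo", "vasi-vasi"),
    "Motu": ("hahine", "hidio", "hani"),
    "Tolai": ("vavina", "vio", "-vat"),
    "Vaghua": ("vavene", "vəzəɣo", "-vac"),
    "Arosi": ("haihine", "hasiʔo", "hai"),
    "Mota": ("vavine", "visoɣo-i", "vat"),
    "Woleaian": ("faifile", "fitixo", "faa-"),
    "Wayan": ("vavine", "viðiko", None),
    "Samoan": ("fefine", None, None),
}


def initial_segment(form):
    """Return the first character of a form, skipping a leading morpheme-boundary hyphen."""
    return form.lstrip("-")[0]


def initial_correspondence(reflexes):
    """Map each language to the set of initial segments found in its attested reflexes."""
    return {
        language: {initial_segment(form) for form in forms if form is not None}
        for language, forms in reflexes.items()
    }


correspondence = initial_correspondence(REFLEXES)

# A language is flagged as irregular if its reflexes do not agree on one initial segment.
irregular = {lang: segs for lang, segs in correspondence.items() if len(segs) != 1}

for language, segments in correspondence.items():
    print(f"{language:10s} *p- > {'/'.join(sorted(segments))}-")
```

With these three etyma no language is flagged. In a larger data set, a language with more than one initial segment would be a candidate for the kinds of interference listed above: unidentified conditioning, borrowing or frequency effects.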

4.3. The internal structure of the Oceanic subgroup of the Austronesian family

Figure 1.1 shows nine primary subgroups of Oceanic. Its rake-like structure indicates that no convincing body of shared innovations has been found to allow any of the nine subgroups to be combined into higher-order groupings. Section 1.4.3.1 explains the theory that underlies the formulation of Figure 1.1, which is important to the practice of reconstruction. Sections 1.4.3.2 and 1.4.3.3 offer some commentary on our subgrouping, and in §1.4.4 we explain how our criteria for making a reconstruction and attributing it to a protolanguage are related to subgrouping issues.

4.3.1. Subgroups and linkages

In Figure 1.1 each node is, with one minor exception, either a single language, usually a reconstructed protolanguage, or, in italics, a group of languages. The exception is the two very closely related languages Mussau and Tench.

Figure 1.1: Schematic diagram showing the subgrouping of Oceanic Austronesian languages.

Where a node is a protolanguage, its descendants form a subgroup. The only descendant languages shown in Figure 1.1 are reconstructed protolanguages, but Appendix B lists by grouping the descendant languages referred to in these volumes. A subgroup is identified by innovations shared by its member languages, i.e. it is ‘innovation-defined’ in the terminology of Pawley & Ross (1995). These innovations are assumed to have occurred just once, in the subgroup’s protolanguage, i.e. the exclusively shared ancestor of its members. Thus languages of the large Oceanic subgroup of Austronesian share a set of innovations relative to the earlier Austronesian stages shown in Figure 1.5. By inference these innovations occurred in their common ancestor, POc, and the claim that they are innovations is based on a comparison of reconstructed POc with reconstructed PMP. The phonological innovations of POc were identified by Dempwolff (1934), and have been somewhat modified by subsequent research (§1.8.1). POc also reflects morphosyntactic innovations (Lynch et al. 2002: ch.4), morphological innovations (e.g. POc acquired a morphological distinction between three kinds of alienable possessive relationship: food, drink and general; Lichtenberk 1985a), and lexical innovations (e.g. PMP *limaw ‘citrus fruit’ was replaced by POc *molis; Lynch 1984).

Italics are used in Figure 1.1 to indicate a group of languages that is not a subgroup, i.e. has no identifiable exclusively shared parent. Thus Southern Oceanic linkage in Figure 1.1 indicates a collection of languages descended from POc (Ross 1988). They comprise the languages of Vanuatu, the Loyalties and New Caledonia, but they do not form a subgroup. There was no “Proto Southern Oceanic”, as no convincing innovation has been identified that is reflected by all Southern Oceanic languages. Nonetheless, there are innovations which chain various, sometimes overlapping, groups of Southern Oceanic languages together (§1.4.3.2). Some of these innovations are inherited, i.e. they define smaller subgroups within Southern Oceanic. Of these, Southern Vanuatu is the best known example (Lynch 2001c:181–184). Others are probably the result of contact between fairly similar languages. The recently discovered fact that there were multiple immigrations by, we take it, speakers of early Oceanic languages probably gave rise to this kind of contact (see the discussion in §15.8.1).

The term “linkage” occurs in several of the italicised labels in Figure 1.1. The distinction between a subgroup and a linkage is important in reconstruction.8

A subgroup is defined by a set of coterminous innovations that are inferred to have occurred in its common ancestor (its protolanguage).9 By “coterminous” is meant that all the innovations are shared by all the languages of the group.10 This is the situation in Figure 1.2.11 Languages A and B share a set of innovations and form one subgroup. Languages C–J share another set of innovations and form another subgroup. The processes of language change that give rise to innovations are continuous, meaning that subgroup formation is recursive. Within the subgroup CDEFGHJ are two (sub)subgroups CDE and FG, alongside two languages H and J. This situation can be represented in two ways: by a tree (left) or a maplike representation (right). The tree, like Figure 1.1, also displays the protolanguages from which the languages of each subgroup are inferred to be descended.

Figure 1.2: Schematic diagram of a subgroup

Figure 1.4 shows the same subgroup AB as Figure 1.2, but languages C–J display a pattern of intersecting subgroups.12 Languages CDEF form a “subgroup” on the basis of a set of coterminous innovations, and languages CDE form one on the basis of a further set. But E and F also share innovations with G, H and J, forming a subgroup EFGHJ that intersects with CDEF. What is more, E and F share further innovations with H and G respectively; that is, E and F each reflect innovations that are coterminous neither with those that define CDEF, nor with those that define EFGHJ. This intertwining of groups formed by intersecting innovation domains is a linkage (an ‘innovation-linked group’ in Pawley & Ross 1995). Its boundary can be defined, but no tree that accounts for all innovations can be drawn. If no tree can be drawn, then no protolanguage can be posited, and, since a reconstruction must belong to a protolanguage, strictly speaking no reconstructions can be made. We return to this matter in §1.4.3.2.
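The difference between the two patterns can be stated in terms of sets. If each innovation is represented by the set of languages that reflect it, a tree can be drawn only when every pair of such sets is either nested or disjoint. The sketch below is an illustration of that condition, not a method used in these volumes. The function names are hypothetical, and the innovation domains are limited to those named in the text for the subgroup and linkage diagrams.

```python
from itertools import combinations


def crossing_pairs(domains):
    """Return pairs of innovation domains that overlap without one containing the other."""
    return [
        (a, b)
        for a, b in combinations(domains, 2)
        if (a & b) and not (a <= b or b <= a)
    ]


def is_tree_compatible(domains):
    """True if every pair of domains is nested or disjoint, so that a tree can be drawn."""
    return not crossing_pairs(domains)


# Innovation domains named in the text for the subgroup pattern.
subgroup_pattern = [frozenset(s) for s in ("AB", "CDEFGHJ", "CDE", "FG")]

# Innovation domains named in the text for the linkage pattern.
linkage_pattern = [frozenset(s) for s in ("AB", "CDEF", "CDE", "EFGHJ", "EH", "FG")]

print(is_tree_compatible(subgroup_pattern))
print(is_tree_compatible(linkage_pattern))
for a, b in crossing_pairs(linkage_pattern):
    print("".join(sorted(a)), "intersects", "".join(sorted(b)))
```

The check reports which domains intersect, but it cannot say which of them reflects inheritance from a protolanguage and which reflects later contact. That question remains a matter of historical inference.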

Figure 1.4: Schematic diagram of a linkage

Innovations begin as changes that occur in the language of an individual speaker, and some of these changes spread across the community. As long as languages are mutually intelligible, changes continue to spread. Their places of origin, and directions and extents of spread, may differ, so that the resulting innovations are not coterminous but instead intersect. And over time, social relationships may change, so that changes arrive from new origins. The outcome of these processes is a linkage.

However, untangling the history of a linkage is difficult, and sometimes impossible. In the “worst-case” scenario one or more innovations spread right across the languages of the linkage. In this case it becomes virtually impossible to distinguish the linkage from a subgroup. But returning to Figure 1.4, perhaps EFGHJ in fact reflects innovations that occurred in Proto EFGHJ. If so, then we cannot posit Proto CDEF or Proto CDE. Instead, we infer that at some date relationships were realigned so that speakers of pre-C and pre-D came into intimate enough contact with speakers of Proto EFGHJ or one of its descendants for innovations to pass between them, creating the illusion of a subgroup CDEF. But with a little thought one could come up with a good number of scenarios that result in the pattern in Figure 1.4, and determining which reflects the actual history can be difficult.

Map 1.3: Oceanic language groups in northwest Melanesia: the Admiralties and St Matthias groups and the subgroups of Western Oceanic.

It is tempting to see a subgroup and a linkage as opposing patterns, but comparison of Figure 1.4 with the righthand diagram of Figure 1.2 shows that a subgroup is a subtype of a linkage, one in which the ranges of innovations happen not to intersect (François 2014:171). Nonetheless, we maintain the distinction between a subgroup and a linkage, as the former reflects a reconstructable protolanguage but the latter does not (or sometimes, as emerges below, does so more weakly).

4.3.2. Oceanic linkages

A number of Oceanic linkages have been recognised by scholars. They include Fijian (Geraghty 1983), the Caroline Islands (Jackson 1983), Vanuatu (Tryon 1976; Clark 1985; Lynch 2000b; 2004d; François 2011b, 2014), NW Melanesia (Ross 1988), the SE Solomons (Lichtenberk 1988, 1994b; Pawley 2011) and E Polynesian (Walworth 2014). In some of these there is evidence for events that would further complicate the description of a linkage in §1.4.3.1.

One such event sequence is indicated in Figure 1.1 by a dashed line around the relevant groups of languages. These are instances of a group of languages undergoing a division and then coming back into contact to form a grouping in a different constellation from before. The best researched of these is the Fijian linkage, which represents the partial resynthesis of the Fiji-based descendants of earlier Western Central Pacific and Eastern Central Pacific linkages after Rotuman and Polynesian had split off from them (Geraghty & Pawley 1981; Geraghty 1983; Pawley 1996c).13 Geraghty reconstructed the history of the Fijian linkage by painstaking analysis of innovations from at least two stages in its history. From the earlier period Western Fijian languages share innovations with Rotuman and Eastern Fijian with Polynesian. From a more recent period Western Fijian and Eastern Fijian languages share innovations with each other, reflecting their reintegration into a single linkage, within which the present Western/Eastern boundary has shifted relative to the (fuzzy) boundary of the earlier period. This kind of process also forms part of the history of the Guadalcanal-Gelic subgroup within SE Solomonic (Pawley & Green 1984).

A linkage sometimes consists of some but not all of the languages descended from a single parent. The Western Oceanic (WOc) linkage reflects the innovations of POc, but no innovation is exclusive to the whole of Western Oceanic (although the merger of POc *r and *R comes close). However, the languages of its three component linkages (Map 1.1), namely North New Guinea, Papuan Tip and Meso-Melanesian, display complex patterns of intersecting innovations.14 The WOc linkage is evidently descended from the dialects of POc that were left behind in the Bismarck Archipelago after speakers of the languages ancestral to the other eight primary subgroups in Figure 1.1 had moved away to the north or east (Ross 2014, 2017). After these departures various innovations occurred. Each arose somewhere in the Western Oceanic dialect network and spread to neighbouring dialects without reaching every dialect in the network.

The Southern Oceanic linkage as proposed by Lynch (1999, 2000b, 2001b, 2004d) is characterised by complex overlapping innovations, but by none that are reflected in all its member languages and would qualify it as a subgroup (see discussion in Lynch et al. 2002:112–114).

4.3.3. Oceanic subgroups

Figure 1.1 also shows a number of Oceanic groups for which a protolanguage is reconstructable. By definition these are subgroups. They are Admiralty (Ross 1988: ch.9), SE Solomonic (Pawley 1972:98–110; Levy 1979, 1980, n.d.; Tryon & Hackman 1983; Lichtenberk 1988), Temotu (Ross & Næss 2007; Næss & Boerger 2008; Lackey & Boerger 2021), S Vanuatu (Lynch 2001c:181–184), Micronesian (Jackson 1983, 1986; Bender et al. 2003a), and Papuan Tip (Ross 1992b).

Central Pacific is also a subgroup, but one defined by only a handful of shared innovations, indicating that the period of unity was short (Geraghty 1996). The high-order subgrouping of Central Pacific is due to Geraghty (1983), except for the position of Rotuman (Pawley 1996b). Within Central Pacific is another long recognised subgroup, Polynesian, for which Pawley (1996a) lists diagnostic innovations.

4.4. Criteria for reconstruction

4.4.1. The distributional criterion

The strength of a lexical reconstruction rests crucially on the distribution of the supporting cognate set across language groups. The distribution of cognate forms and agreements in their meanings is much more important than the number of cognates. It is enough to make a secure reconstruction if a cognate set occurs in just two languages in a family, with agreement in meaning, with two provisos. The first is that the two languages belong to different primary groups, and the second that there is no reason to suspect that the resemblances are due to borrowing or chance. The PMP term *(h)abij ‘twins’ is reflected in several western Malayo-Polynesian languages (e.g. Batak apid ‘twins, double (fused) banana’) but, when the reconstruction was made, only one Oceanic reflex was known,15 namely Roviana avisi ‘twins of the same sex’ (vol. 5, §2.6). Because Roviana belongs to a different first-order branch of Malayo-Polynesian from the western Malayo-Polynesian witnesses (cf Figure 1.5) and because there is virtually no chance that the agreement is due to borrowing or chance similarity, this distribution was enough to justify the reconstruction of PMP *(h)abij, POc *apic ‘twins’.

4.4.2. Which protolanguage? Handling the Oceanic tree’s rakelike structure

Here we deal with two issues relating to the question, To which protolanguage should a reconstruction be assigned? In this section we explain how we handle the rake-like structure of the Oceanic tree in Figure 1.1. In §1.4.4.3 we respond to the fact that a linkage has no identifiable protolanguage (§1.4.3.2).

The rake-like form of Figure 1.1 almost certainly reflects the very rapid settlement of Oceania out of the Bismarcks,16 but it confronts us with a methodological question. If we follow the standard rubric that we make a reconstruction if a cognate set occurs in languages of just two primary language groups (§1.4.4.1), then reflexes of an etymon in, say, a SE Solomonic language and a Micronesian language would be sufficient evidence for a POc reconstruction and the absence of reflexes in Admiralty and Western Oceanic would be irrelevant. Given what we know about the location of the POc homeland (in the Bismarcks; vol.2, ch.2) and the early eastward spread of Oceanic speakers, this is too loose a criterion. Instead, we assume two hypothetical nodes not shown in the tree in Figure 1.1.17 These are

  • Remote Oceanic, comprising Southern Oceanic, Micronesian and Central Pacific;
  • Eastern Oceanic, comprising SE Solomonic and Remote Oceanic.18

If a cognate set occurs in two or all three of the groups in Remote Oceanic, the reconstruction is attributed to Proto Remote Oceanic (PROc). If a cognate set occurs in one or more of the groups in Remote Oceanic and in SE Solomonic, it is attributed to Proto Eastern Oceanic (PEOc). In this way we acknowledge that such reconstructions may represent an innovation that postdates the spread of the early Oceanic speech community. There are enough PROc and PEOc reconstructions to suggest that such lexical innovations indeed occurred. This in turn provides evidence for Remote Oceanic and Eastern Oceanic subgroups, but evidence that is too weak to be relied on, for at least two reasons. First, it is quite possible that some of our PROc and PEOc reconstructions will be promoted to POc as more Admiralty and Western Oceanic data become available. Second, it is reasonable to assume that some of our PROc and PEOc etyma are of POc antiquity but happen to have been lost in Proto Admiralty and Proto Western Oceanic. Without supporting phonological or morphological evidence we are unwilling to treat PROc or PEOc as anything other than convenient hypothetical groups which allow us to retain conservative criteria for a POc reconstruction.
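
The attribution rule for the two hypothetical nodes can be sketched as a small decision function. This is an illustrative reading of the rule just stated, not a tool used in compiling these volumes. The function and constant names are hypothetical, and the function is meant only for cognate sets with no Admiralty or Western Oceanic reflexes.

```python
# Groups making up the hypothetical Remote Oceanic node, as listed above.
REMOTE_OCEANIC = {"Southern Oceanic", "Micronesian", "Central Pacific"}
SE_SOLOMONIC = "SE Solomonic"


def eastern_interstage(groups):
    """Label a cognate set confined to Eastern Oceanic groups.

    Returns "PEOc", "PROc", or None when the distribution is too narrow
    for either hypothetical protolanguage.
    """
    groups = set(groups)
    remote = groups & REMOTE_OCEANIC
    # One or more Remote Oceanic groups plus SE Solomonic: PEOc.
    if remote and SE_SOLOMONIC in groups:
        return "PEOc"
    # Two or all three Remote Oceanic groups, no SE Solomonic: PROc.
    if len(remote) >= 2:
        return "PROc"
    return None


print(eastern_interstage({"Micronesian", "Central Pacific"}))
print(eastern_interstage({"SE Solomonic", "Micronesian"}))
print(eastern_interstage({"Central Pacific"}))
```

A set confined to a single group returns `None` here, on the assumption that such a set would be assigned to that group's own protolanguage.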

A reconstruction that lacks SE Solomonic reflexes was labelled ‘PEOc’ in volumes 1 and 2 but is labelled ‘PROc’ in volumes 3–6. Two factors have led to the distinction between PEOc and PROc in more recent volumes. One is that the historical separateness of SE Solomonic both from Western Oceanic and from groups treated as Remote Oceanic has become increasingly clear through recent research (Pawley 2009). The other, especially relevant to volume 3 on plants and to volume 4 on animals, is that the primary biogeographic divide in Oceania is between Near and Remote Oceania (see vol. 2, Map 5), i.e. between the main Solomons archipelago and the Temotu islands. Whether or not a plant or animal name has a SE Solomonic reflex is thus significant. Many plant names do not, and are thus attributed in volume 3 to PROc.

Our criterion for attributing a reconstruction to POc is that the cognate set must include data from at least two out of three criterial groupings: Admiralties (or Yapese or Mussau), Western Oceanic, and our hypothetical Eastern Oceanic. Both here and at the hypothetical interstages defined above, no reconstruction is made if there are grounds to infer borrowing from one of these groupings to another.19 We also reconstruct an etymon to POc if it is reflected in just one of the four criterial groupings and in a non-Oceanic Austronesian language (a member of one of the lefthand branches in Figure 1.5), as illustrated above by the reconstruction of POc *apic ‘twins’.

There are indications that Yapese (a single-language “subgroup”) and Mussau and Tench (a subgroup with two closely related languages) may be more closely related to Admiralty than to any other Oceanic subgroup,20 and for this reason they are tentatively treated as Admiralty languages for the purposes of reconstruction. That is, the presence of a reflex in one or more of these languages and in Admiralty does not support a POc reconstruction, but the presence of a reflex in one or more of these languages and one of Western Oceanic or Eastern Oceanic does support one.
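
The POc criterion, including the treatment of Yapese and Mussau/Tench, can be sketched as follows. This is a sketch only, assuming the groupings named in the preceding paragraphs. The names `CRITERIAL` and `supports_poc` are hypothetical, and the judgement about borrowing, which in practice rests on philological evidence, is reduced here to a flag supplied by the user.

```python
# Map each group onto its criterial grouping. Yapese and Mussau/Tench are
# tentatively counted with Admiralty, as explained above.
CRITERIAL = {
    "Admiralty": "Admiralties",
    "Yapese": "Admiralties",
    "Mussau/Tench": "Admiralties",
    "Western Oceanic": "Western Oceanic",
    "Eastern Oceanic": "Eastern Oceanic",
}


def supports_poc(groups, non_oceanic_cognate=False, borrowing_suspected=False):
    """Decide whether a cognate set's distribution supports a POc etymon."""
    if borrowing_suspected:
        return False
    criterial = {CRITERIAL[g] for g in groups if g in CRITERIAL}
    # Reflexes in at least two criterial groupings.
    if len(criterial) >= 2:
        return True
    # One criterial grouping plus a non-Oceanic Austronesian cognate.
    return len(criterial) == 1 and non_oceanic_cognate


print(supports_poc({"Yapese", "Admiralty"}))
print(supports_poc({"Mussau/Tench", "Western Oceanic"}))
print(supports_poc({"Western Oceanic"}, non_oceanic_cognate=True))
```

The last call mirrors the distribution behind POc *apic ‘twins’: a single Oceanic witness supported by non-Oceanic cognates.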

In chapter 2 (§4) of volume 2 Pawley discusses Blust’s (1998a) proposal that the primary split in Oceanic divides Admiralty from a subgroup embracing all other Oceanic languages. Pawley dubs the latter ‘Nuclear Oceanic’. If Blust’s subgrouping were accepted, then an etymon which lacked cognates outside Oceanic would need to be reflected both in an Admiralties language and in a non-Admiralties language for a POc reconstruction to be made. Etyma with reflexes in both Western and Eastern Oceanic, but not in the Admiralties, would be reconstructed as Proto Nuclear Oceanic. Under the criteria outlined above, however, we attribute these reconstructions to POc. These criteria were used in volumes 1 and 2, and we have thought it wise to maintain them throughout the volumes of this work. The reader who wishes to single out reconstructions attributable to a putative Proto Nuclear Oceanic (rather than to POc) can easily recognise them. They are those POc reconstructions for which (i) there are no Admiralties reflexes, and (ii) there is no higher-order reconstruction (i.e. PEMP, PCEMP, PMP or PAn).
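
For a reader working from a machine-readable copy of the reconstructions, the two conditions just given amount to a simple filter. The sketch below assumes a hypothetical record layout (`Reconstruction`) that is not the format of these volumes. Whether ‘Yap’ should be counted alongside ‘Adm’ is left to the user through the `admiralty_labels` parameter.

```python
from dataclasses import dataclass, field

HIGHER_ORDER = {"PEMP", "PCEMP", "PMP", "PAn"}


@dataclass
class Reconstruction:
    level: str                      # e.g. "POc"
    form: str                       # e.g. "*pale"
    reflex_groups: set              # group labels of the supporting reflexes
    higher_order_levels: set = field(default_factory=set)


def putative_pnoc(rec, admiralty_labels=frozenset({"Adm"})):
    """True if a POc reconstruction could be read as Proto Nuclear Oceanic."""
    return (
        rec.level == "POc"
        # (i) no Admiralties reflexes
        and not (rec.reflex_groups & admiralty_labels)
        # (ii) no higher-order reconstruction
        and not (rec.higher_order_levels & HIGHER_ORDER)
    )


pale = Reconstruction(
    "POc", "*pale",
    {"Adm", "NNG", "MM", "SES", "NCV", "Mic", "Fij", "Pn"},
    {"PMP"},
)
# A purely hypothetical record, for illustration only.
dummy = Reconstruction("POc", "*hypothetical", {"MM", "SES"})

print(putative_pnoc(pale))
print(putative_pnoc(dummy))
```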

4.4.3. Which protolanguage? Handling linkages

The languages of a linkage have no identifiable exclusively shared parent. Yet we have found many instances in which a cognate set is limited to one of the linkages in Figure 1.1: Western Oceanic, New Guinea Oceanic, Southern Oceanic or the reintegrated North and Central Vanuatu linkage. By the logic of §1.4.3.2 a form reconstructed from a cognate set restricted to a linkage should be reconstructed to the next protolanguage node up the tree. For a Western Oceanic cognate set, for example, this would mean reconstructing it to POc, which would defy the condition that a POc cognate set must be spread over at least two out of the four criterial groupings (§1.4.4.2).

As with PEOc and PROc (§1.4.4.2), we think it is more realistic to attribute these reconstructions to a hypothetical protolanguage rather than to a higher node in the tree. Hence there are reconstructions labelled PWOc and so on. Again these apparent lexical innovations offer only weak evidence for the protolanguage to which they are attributed. In addition to the explanations of the kinds offered above for PEOc and PROc etyma, it is possible, for example, that an innovatory ‘PWOc’ etymon arose when the Western Oceanic dialect network was still close-knit, and spread from dialect to dialect before the network broke into the two networks ancestral to its present-day first-order subgroups.

It is probable that the NNG and PT linkages form a grouping within WOc, separate from MM. We call this grouping the New Guinea Oceanic linkage, and so etyma reflected only in NNG and PT languages are attributed to a weakly supported Proto New Guinea Oceanic (Milke 1958, Pawley 1978), and etyma reflected in either NNG or PT (or both) and in MM are labelled PWOc.
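
The labelling of cognate sets confined to Western Oceanic can be sketched in the same way. The function name is hypothetical, and only the three group labels NNG, PT and MM are considered. A set restricted to a single one of these groups returns `None`, on the assumption that it would be assigned to a lower-order interstage.

```python
WESTERN_OCEANIC = {"NNG", "PT", "MM"}


def western_interstage(groups):
    """Label a cognate set confined to Western Oceanic groups."""
    groups = set(groups)
    # Sets with reflexes outside Western Oceanic are handled elsewhere.
    if not groups or not groups <= WESTERN_OCEANIC:
        return None
    # Reflected only in NNG and PT: Proto New Guinea Oceanic.
    if groups == {"NNG", "PT"}:
        return "PNGOc"
    # Reflected in NNG or PT (or both) and in MM: Proto Western Oceanic.
    if "MM" in groups and groups & {"NNG", "PT"}:
        return "PWOc"
    return None


print(western_interstage({"NNG", "PT"}))
print(western_interstage({"PT", "MM"}))
```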

5. Conventions common to the series

5.1. Presentation of reconstructions

Each of the contributions to these volumes concerns a particular POc ‘terminology’. Generally, each contribution begins with an introduction to the issues raised by the reconstruction of its particular terminology, and the rest consists of reconstructed etyma with supporting data and a commentary on matters of meaning and form.

The reconstruction of POc *pale below, abbreviated from Chapter 5, shows how reconstructions and supporting cognate sets are presented. Above it is a superordinate (PMP) reconstruction drawn from published sources. Below it are supporting reflexes. Sometimes a lower-order reconstruction like PMic *fale below is included, either in acknowledgment of others’ work, or because it reflects a significant change in form or meaning.

PMP *balay ‘public building’ (Blust 1987); ‘unwalled building’ (Waterson 1993)
POc *pale ‘building for storage or public use, open-sided building, shed’
Adm: Lou pal ‘canoe hut’
Adm: Mussau ale ‘house’
NNG: Yabem ale ‘house’
NNG: Lukep (Pono) para ‘yam house’
MM: Tolai pal ‘house, room’
MM: Mono-Alu hale-hale ‘public building’
SES: Arosi hare ‘shed for yams’ (E Arosi); ‘house with side of roof only, made in garden’ (W Arosi)
SES: Bauro hare ‘canoe house, men’s house’
SES: Sa’a hale ‘yam shed outside a garden’
SES: Kwaio fale ‘hut for childbirth’
SES: Gela hale ‘house’
NCV: Raga vale ‘house, hut, garden house’
NCV: Nokuku vale ‘shelter’
NCV: Nokuku val-val ‘garden shelter’
PMic *fale ‘meeting house’ (Bender et al. 2003a)
Mic: Puluwat fǣl ‘meeting house’
Mic: Woleai fal, fale- ‘men’s house, club house’
Fij: Bauan vale ‘house’
Pn: Samoan fale ‘house’
Pn: Hawaiian hale ‘house’

In putting together cognate sets, we have sometimes found apparent or uncertain reflexes which do not quite ‘fit’ the set: either they display a phonological irregularity or their meaning is just a little too different from the rest of the set for us to assume cognacy. Rather than eliminate them, we often include them below the cognate set under the rubric ‘cf. also’.

Because our supporting data are drawn from such a wide range of languages, the convention is adopted of prefixing each language name with the abbreviation for the genealogical or geographic group to which the language belongs, so that the distribution of a cognate set is more immediately obvious. The abbreviations are:

Yap: Yapese (one language)
Adm: Admiralty and Mussau/Tench
NNG: North New Guinea
SJ: Sarmi/Jayapura
PT: Papuan Tip
MM: Meso-Melanesian
SES: Southeast Solomonic
TM: Temotu
NCV: North/Central Vanuatu
SV: South Vanuatu
NCal: New Caledonia and Loyalties
Mic: Micronesian
Fij: Fijian and Rotuman
Pn: Polynesian

We have sought to be consistent in always listing these groups in the same order, but contributors vary in the ordering of languages within groups.
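
The fixed group order can be applied mechanically to an unordered list of reflexes. The following is a minimal sketch with hypothetical names. It relies on the fact that Python's `sorted` is stable, so the order of languages within a group is left as the contributor gave it.

```python
# Group labels in the order in which they are listed above.
GROUP_ORDER = [
    "Yap", "Adm", "NNG", "SJ", "PT", "MM", "SES",
    "TM", "NCV", "SV", "NCal", "Mic", "Fij", "Pn",
]
RANK = {group: i for i, group in enumerate(GROUP_ORDER)}


def sort_reflexes(reflexes):
    """Sort (group, language, form) tuples into the standard group order."""
    return sorted(reflexes, key=lambda reflex: RANK[reflex[0]])


# A few reflexes of POc *pale from the set above, deliberately shuffled.
shuffled = [
    ("Pn", "Samoan", "fale"),
    ("MM", "Tolai", "pal"),
    ("Adm", "Lou", "pal"),
    ("Pn", "Hawaiian", "hale"),
]
for group, language, form in sort_reflexes(shuffled):
    print(f"{group}: {language} {form}")
```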

Map 1.4: Groups of Oceanic languages used in cognate sets

Lynch’s research on Southern Oceanic (§1.4.3.2) renders the NCV group mildly anomalous, although there is no doubt that it reflects an integrated dialect network. There are a number of etyma whose reflexes are confined to North and Central Vanuatu, and so we continue to include ‘Proto North/Central Vanuatu’ reconstructions. These perhaps represent a Southern Oceanic term that has been lost in southern Vanuatu and New Caledonia. Where the distribution of reflexes requires it, the chapters in this volume include reconstructions for PROc and for PSOc. Etyma with these distributions were attributed to PEOc in volumes 1 and 2, but the distributions are transparent, thanks to the presence of the group labels in cognate sets (cf §1.4.4.2).

In the interests of space we do not give the history of the reconstructions themselves, as this would often require commentary on the modifications made by others and by us, and on why we have made them. Where a reconstruction is not new, we have tried to give its earliest source, e.g. ‘Blust 1987’ above, but this is difficult when earlier reconstructions differ in form and meaning and when their sources are not reported.

In general, the contributions to these volumes are concerned with items reconstructable in POc, PWOc, PEOc, PROc and occasionally Proto New Guinea Oceanic (PNGOc). Etyma for PWOc, PNGOc and PEOc are reconstructed because these may well also be POc etyma for which known reflexes are not well distributed (see discussion in §1.4.4). Reconstructions for lower-order interstages are decreasingly likely to reflect POc etyma and may be the results of cultural change as Oceanic speakers moved further out into the Pacific.

Contributors to these volumes have usually not made fresh reconstructions at interstages superordinate to POc. What they have done, however, is to cite other scholars’ reconstructions for higher-order interstages, as these represent a summary of the non-Oceanic evidence in support of a given POc reconstruction. These interstages are shown in Figure 1.5.

Sometimes non-Oceanic evidence has been found to support a POc reconstruction where no reconstruction at a higher-level interstage has previously been made. In this case a new higher-order reconstruction is made, and the non-Oceanic evidence is given in a footnote.

Whilst we have tried to use the internal organisation of the lexicons of Oceanic languages themselves as a guide in setting the boundaries of each terminology, we have inevitably taken decisions which differ from those that others might have made. There are, obviously, overlaps and connections between various semantic domains and therefore between the contributions here. We have done our best to provide cross-references, but we have sometimes duplicated information rather than ask the reader repeatedly to look elsewhere in the book. Indexes at the end of each volume and in the final volume are intended to make it easier to use the volumes collectively as a work of reference.

5.2. Data

Data sources are listed in Appendix A.

For some reconstructed etyma only a representative sample of reflexes is given. We have endeavoured to ensure, however, that in each case this sample not only is geographically and genealogically representative, but also provides evidence to justify the reconstruction’s shape and gloss. Where only a few reflexes are known to us, this is usually noted.

Although there are accepted or standard orthographies for a number of the languages from which data are cited here, all data are transcribed as far as possible into a standard phonemic orthography based on that used by Ross (1988:3–4) in order to facilitate comparison.21 This means, for example, that the j of the German-based orthographies of Yabem and Gedaged becomes y, Yabem c becomes ʔ, Gedaged z becomes ɬ and so on; the ng of English-based orthographies becomes ŋ; and Fijian g, q and c become ŋ, g and ð respectively.
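
The Fijian part of this conversion shows why the substitutions have to be applied simultaneously rather than one after another: g becomes ŋ while q becomes g, so a sequential replacement would turn original q into ŋ. The sketch below handles only the three letters named above, in lower case. The function name is hypothetical and the input words are ordinary Standard Fijian spellings chosen for illustration.

```python
# Single-pass character mapping: Standard Fijian spelling to the
# standard orthography used for data in these volumes.
FIJIAN_TO_STANDARD = str.maketrans({"g": "ŋ", "q": "g", "c": "ð"})


def fijian_to_standard(word):
    """Convert a lower-case Standard Fijian spelling, letter by letter."""
    return word.translate(FIJIAN_TO_STANDARD)


for spelling in ["yaqona", "moce", "sega", "vale"]:
    print(spelling, "->", fijian_to_standard(spelling))
```

Because `str.translate` maps each character exactly once, an original q is not passed on through the g rule.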

The following symbols have more or less their usual IPA (International Phonetic Association) values: ð, ɢ, ɣ, h, k, l, ʟ, ɬ, ʎ, m, n, ŋ, ñ, p, q, χ, ɾ, r, s, t, w, x, z, ʔ, a, æ, e, ɛ, ə, i, ɨ, o, œ, ɔ, ʌ, u, ɯ. As far as possible, however, our orthography is phonemic and does not show allophonic variation, so that there are instances where a symbol does not have its usual phonetic value. For example, Wayan Fijian k is a voiceless stop word-initially but [k] is in free or stylistic variation with [ɣ] word-medially. The voiced stops b, d, g and the voiced bilabial trill ʙ are prenasalised in some languages, but prenasalisation is not written unless it is phonemically distinctive. Where a language has just one rhotic, we usually write r, despite the fact that that rhotic is sometimes a flap. Other orthographic symbols (with values in IPA) are:

f [ɸ, f] voiceless bilabial or (less often) labio-dental fricative
v [β, v] voiced bilabial or (less often) labio-dental fricative
c [ts], [ʧ] voiceless alveolar or palatal affricate
j [ʣ], [ʤ] voiced alveolar or palatal affricate
y [j] palatal glide
dr [ⁿr] prenasalised voiced alveolar trill (as in Fijian)
ö [ø] rounded mid front vowel
ü [y] rounded high front vowel

Other superscripts and diacritics are as follows:

  • contrastive long vowels are represented by a macron, e.g. ā;
  • contrastive vowel nasalisation is represented by a tilde, e.g. ã;
  • high and low tone are represented respectively by an acute and a grave accent, e.g. é, è;22
  • labialisation is marked by a superscript w, e.g. pʷ;
  • velarisation is marked by a superscript ɯ, e.g. pᵚ;
  • contrastive aspiration is marked by a superscript h;
  • contrastive devoicing is marked by a small circle beneath;
  • apicolabials are represented by the corresponding apical symbol and the linguolabial diacritic (the ‘seagull’);
  • retroflexes are represented by the corresponding apical symbol with a dot beneath.

Except for inflexional morphemes, non-cognate portions of reflexes, i.e. derivational morphemes and non-cognate parts of compounds, are shown in parentheses (…). Where an inflexional morpheme is an affix or clitic and can readily be omitted, its omission is indicated by a hyphen at the beginning or end of the base. This applies particularly to possessor suffixes on directly possessed nouns (see §2.2). Where an inflexional morpheme cannot readily be omitted, it is separated from its base by a hyphen. This may happen because of complicated morphophonemics or because the morpheme is always present, like the attributive -n in some NNG and Admiralties languages and prefixed reflexes of the POc article *na in scattered languages. When a reflex is itself polymorphemic (i.e. the morphemes reflect morphemes present in the reconstructed etymon) or contains a reduplication, the morphemes or reduplicates are also separated by a hyphen.

Languages from which data are cited in this volume are listed in Appendix B in their subgroups or linkages, together with an index allowing the reader to find the subgroup to which a given language belongs. Appendix B also includes alternative language names. The difficulty of deciding where the borderline between dialect and language lies, combined with the fact that these volumes contain work by a number of contributors, has resulted in some inconsistency in the way dialects are labelled in cognate sets. Some occur in the form ‘Lukep (Pono)’, i.e. the Pono dialect of the Lukep language, whilst others are represented simply by the dialect name, e.g. Iduna, noted in Appendix B as ‘Iduna (= dialect of Bwaidoga)’.

5.3. Conventions used in representing reconstructions

Reconstructions are marked with an asterisk, e.g. *Rumaq ‘dwelling house’, in keeping with the standard convention in historical linguistics. POc reconstructions, and also PWOc and PNGOc reconstructions, are given in the orthography of §1.7. For reconstructions at higher-order interstages the orthographies are those used by Blust in his various publications and the ACD. Reconstructions at lower-order interstages are given in the standard orthography adopted for data (§1.5.2). Geraghty’s (1986) PCP orthography, for example, is based on Standard Fijian spelling, and is converted into our standard orthography in the same way as Fijian spelling is. In practice, this means that the orthographies for PEOc, PROc and PCP are the same as for POc, except that a distinction between *p and *v is recognised and *R is generally absent from PCP.23 Biggs and Clark’s PPn reconstructions are in any case written in an orthography identical to our standard. Bracketing and segmentation conventions in protoforms are shown in Table 1.1.

PMP final consonants are usually retained in POc in absolute word-final position. In many cases decisive evidence for retention or loss can be found in those Oceanic languages that usually retain final consonants. However, there are some cases where it is uncertain whether POc kept a PMP final, as when a PMP etymon is not attested in an Oceanic language that consistently retains POc final consonants. An example is *-d in PMP *palahud ‘go down to the sea or coast’, a term reflected in Oceanic only in languages that regularly lose POc final consonants. In such cases the consonant is reconstructed in parentheses, e.g. POc *palau(r) ‘go to sea, make a sea voyage’.

Table 1.1. Bracketing and segmentation conventions in protoforms
(x) it cannot be determined whether x was present
(x,y) either x or y was present
[x] the item is reconstructable in two forms, one with and one without x
[x,y] the item is reconstructable in two forms, one with x and one with y
x-y x and y are separate morphemes
x- x takes an enclitic or a suffix
⟨x⟩ x is an infix
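
The bracketing conventions of Table 1.1 can be unpacked mechanically into the concrete shapes that a protoform allows. The following is a minimal sketch with a hypothetical function name. It treats (x) and [x] alike as ‘with or without x’, although the two differ in what they claim about the protolanguage, and it leaves hyphens and infix brackets untouched. It applies to protoforms only, not to the parentheses used for non-cognate material in reflexes.

```python
import re
from itertools import product

# A parenthesised or square-bracketed segment without nesting.
BRACKET = re.compile(r"\(([^()]*)\)|\[([^\[\]]*)\]")


def expand(protoform):
    """List the concrete forms covered by a bracketed protoform."""
    body = protoform.lstrip("*")
    parts = []  # each entry is a list of alternatives for one stretch
    pos = 0
    for match in BRACKET.finditer(body):
        if match.start() > pos:
            parts.append([body[pos:match.start()]])
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        options = inner.split(",")
        # A single bracketed segment may be present or absent.
        if len(options) == 1:
            options = ["", options[0]]
        parts.append(options)
        pos = match.end()
    if pos < len(body):
        parts.append([body[pos:]])
    return sorted({"".join(choice) for choice in product(*parts)})


print(expand("*palau(r)"))
print(expand("*(h)abij"))
```

Comma-separated alternatives such as (x,y) yield one form per alternative, so each bracket contributes a factor to the number of candidate forms.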

In presenting words that display anomalies of form, it is often necessary to posit an expected form. For example, in §14.6.5.1, the Banoni term raus ‘100’ is accompanied by the note “metathesis of †rasu”, i.e. ‘metathesis of expected rasu’. In this volume we use a less widely employed convention and mark expected forms with a dagger, to distinguish an expected form both from reconstructions and real data.24 Sometimes we need to refer to a reconstructed form that one would expect as the regular reflex of an established POc etymon, but which does not occur because an irregular sound change has occurred. In such cases the dagger and asterisk conventions are used together. For example, in vol.5:99, we reconstruct PNCV *kaRo ‘vine, rope; vein’. It is descended, however, from POc *waRo(c) ‘vine, creeper; string, rope; vein, tendon’, and the expected PNCV form, referred to in our discussion there, would be †*waRo. The dagger marks it as expected but unattested.

When historical linguists compile cognate sets they commonly retain word for word the glosses given in the sources from which the items are taken. However, again in the interests of standardisation, we have often reworded (and sometimes abbreviated) the glosses of our sources, while preserving the meaning. Where glosses were in a language other than English we have translated them. In the interests of space and legibility, and because data often have multiple sources, we have given the source of a reflex only when it is not included in the listings in Appendix A.

Sometimes our authors use the convention of providing no gloss beside the items in a cognate set whose gloss is identical to that of the POc (or other lower-order) reconstruction at the head of the set, i.e. the reconstruction which they reflect. Where necessary, we use ‘(N)’ to indicate that a gloss is a noun, and ‘(V)’, ‘(VI)’, ‘(VT)’ or ‘(VSt)’ to indicate that it is a verb, intransitive verb, transitive verb or stative verb. Because in many environments transitive verbs were regularly formed from the intransitive stem by adding the suffix *-i- (vol.5:24), in many cases the intransitive and transitive verbs are simply shown in sequence, e.g. POc *qalo(p), *qalop-i- ‘beckon with the palm downward, wave’. In such cases, the first verb is always intransitive, the second (in *-i-) transitive.

Within glosses we use the conventional abbreviations ‘k.o.’ (as in ‘k.o. yam’) for ‘kind of’, ‘s.o.’ for ‘someone’ and ‘s.t.’ for ‘something’.

Table 1.2. POc consonants used in reconstructions in the six volumes of this work.
labialised bilabial bilabial dental alveolar palatal velar labialised velar uvular
stop voiceless *pʷ *p *t *c *k *kʷ *q
stop voiced *bʷ *b *d *j *g
trill *r
prenasalised trill *dr
nasal *mʷ *m *n
fricative *s
lateral *l
approximant *w *y

6. Proto Oceanic bound morphology

Proto Oceanic bound morphology is not discussed in this volume, other than in §2.2, as the use of possessor suffixes with inalienably possessed nouns plays a role in reconstructions in chapter 2.

An account of aspects of POc morphology, especially verbal derivational morphology, is given in vol.5:21–26, where it is followed by some comments on the fossilisation of earlier morphology in POc forms (vol.5:26–30).

7. Proto Oceanic phonology and orthography

Work based on the sound correspondences of both Oceanic and non-Oceanic languages has resulted in the reconstructed paradigm of POc consonants shown in Table 1.2. A number of Oceanic (and non-Oceanic) languages attest to the fact that *t was dental and *d alveolar. This is significant in the prehistory of POc discussed below (§1.8.2.3). The POc vowels that occur in our reconstructions are *i, *e, *a, *o, *u.

In the light of recent work it is likely that both the consonant and vowel sets require some revision. We return to this in sections 1.8.2 and 1.8.3.

Lynch (2000a) concludes that POc stress fell on the penultimate mora. Each vowel counted as one mora, and so did the final consonant if there was one. Hence the stress of a word that ended in a vowel like *ku̱tu̱ ‘head louse’ (a mora is indicated by an underscore) fell on its penultimate syllable: *kútu. The stress of a word that had a final consonant, like *ma̱nu̱ḵ ‘bird’, fell on the final syllable: *manúk. Note that an inalienably possessed noun (§2.2) took a possessor suffix, and that this must have resulted in stress shift: *máta ‘eye’, but *matá-gu ‘my eye’. Inalienably possessed nouns are marked with a final hyphen in our reconstructions: *mata- ‘eye’.
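
The mora-counting rule can be sketched as a short function. This is an illustration of the rule as summarised above, not Lynch's own formalisation. The function name is hypothetical, and the sketch assumes that the only vowel symbols are *i, *e, *a, *o and *u and that the form contains at least two morae.

```python
VOWELS = "ieaou"
ACUTE = dict(zip("ieaou", "íéáóú"))


def mark_stress(protoform):
    """Place an acute accent on the vowel that carries the penultimate mora."""
    word = protoform.lstrip("*")
    # Each vowel counts as one mora.
    morae = [i for i, ch in enumerate(word) if ch in VOWELS]
    # A word-final consonant counts as one further mora.
    bare = word.rstrip("-")
    if bare and bare[-1] not in VOWELS:
        morae.append(len(bare) - 1)
    if len(morae) < 2:
        raise ValueError("form needs at least two morae")
    target = morae[-2]
    return word[:target] + ACUTE[word[target]] + word[target + 1:]


for form in ["*kutu", "*manuk", "*mata", "*mata-gu"]:
    print(form, "->", mark_stress(form))
```

Hyphens are kept in the output, so the stress shift under suffixation can be read off directly by comparing the outputs for `*mata` and `*mata-gu`.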

Table 1.3. POc orthographies after Grace (1969) and Ross (1988)
Grace oral grade *p *t *d/*r *s *j *k
Ross oral grade *p *pʷ *t *r *s *c *k *kʷ
Grace nasal grade *mp *ŋp/*mpw *nt *nd *nj *ŋk
Ross nasal grade *b *bʷ *d *dr/*nr *j *g
Grace *m *ŋm/*mw *n *w *y *l *q *R *i *e *a *o *u
Ross *m *mʷ *n *w *y *l *q *R *i *e *a *o *u

Table 1.3 shows two POc orthographies. The first was established by Biggs (1965) for PEOc and applied to POc by Grace (1969). It was used with a number of variants, separated by a slash in Table 1.3. The second orthography, used here and in the POc reconstructions in these volumes, is from Ross (1988, 1989b), with the addition of *pʷ (introduced without comment by Blust 1984) and *kʷ (Ross 2011). The terms “oral grade” and “nasal grade” belong to the terminology of Oceanic historical phonology (§1.8.1 and §1.8.2).

8. The phonological prehistory of Proto Oceanic

In section 1 we expressed the hope that the material would be a rich source of data for historical linguistics. Section 1.8.2 and its subsections, along with §1.9, report on research based on the POc reconstructions in volumes 1–5. First, however, we recapitulate the currently conventional view of POc phonology.

The widely accepted hypothesis about the provenance of Proto Oceanic is shown in Figure 1.5. It is due to Robert Blust, originally presented in Blust (1977b) and repeated with modifications and accumulated supporting evidence in subsequent publications (Blust 1978a, 1982, 1983–84b, 1993, 2009b). New research based on the reconstructions in volumes 1–5, summarised in §1.8.2 and its subsections and in §1.9.1, proposes that this hypothesis (we will call it the “accepted hypothesis”) should be retired. The fresh research confronts us with the need to reassess the part of the tree that is headed by Proto Central/Eastern Malayo-Polynesian. This leads to a re-evaluation in §1.9.3 of where Proto Oceanic came from.

The conventions used in Figure 1.5 are those outlined in §1.4.3.1 for Figure 1.1. Thus Formosan languages in Figure 1.5 indicates a collection of languages descended (along with PMP) from PAn (Blust 1999). They are spoken in Taiwan, but do not form a subgroup. There was no “Proto Formosan”, as Formosan languages and language groups are all descended directly from PAn. Despite references to “Proto Western Malayo-Polynesian”, Western Malayo-Polynesian languages have never been seriously considered a subgroup of Austronesian (Ross 1995b; Adelaar 2004). Smith (2017) provides a set of hypotheses about the groups that make up WMP.25 Their common ancestor is PMP. Recent years have seen renewed research into the Central Malayo-Polynesian languages and those of South Halmahera/West New Guinea, and we turn to this in §1.9.

Figure 1.5: Schematic diagram showing the widely accepted genealogy of the Austronesian family

8.1. The Proto Austronesian and Proto Malayo-Polynesian antecedents of Proto Oceanic phonology

First, though, it is noteworthy that much research on the prehistory of the POc lexicon has focussed on phonological changes that occurred between PMP and POc. This is because PMP and POc are protolanguages clearly defined by shared innovations, the bedrock of the linguistic comparative method, whereas Blust’s two proposed interstages, PCEMP and PEMP (Blust 1978), are only weakly defined.

We give here a conventional account of POc innovations, before revising this history in §1.8.2 in the light of research based on the reconstructions in volumes 1–5.

Map 1.5: The Austronesian language family and the major subgroups according to the standard hypothesis

Table 1.4. Correspondences between PMP and POc protophonemes as currently understood. Shadings are explained in §1.8.2

PAn *p, *b *t, *C *d, *r *s, *z *j *k, *g
PMP *p, *b *t *d, *r *s, *z *j *k, *g
POc oral grade: *p *pʷ *t *r *s *c *k *kʷ
nasal grade: *b *bʷ *d *dr *j *g
PAn *m *n, *-L(-) *w *y *l, *L- *q *R *S
PMP *m *n *w *y *l *q *R *h
POc *m *mʷ *n *w *y *l *q *R *∅
PAn/PMP *i, *-uy(-) *e [ə], *-aw *-ay *a *u
POc *i *o *e *a *u

The Oceanic subgroup is defined by a set of shared innovations relative to PMP. It was on the basis of some of these that Dempwolff (1927, 1937) first recognised his Urmelanesisch (‘Proto Melanesian’) as a major Austronesian subgroup. In the 1937 work he also recognised that Polynesian languages shared the innovations of Urmelanesisch, and so the concept of an Oceanic subgroup entered the literature. However, naming it took a while. Grace (1955) defined the borders of the new subgroup and called it “Eastern Malayo-Polynesian”.26

Meanwhile, Milke (1958) made frequent reference to ozeanisch-austronesische Sprachen (‘Oceanic-Austronesian languages’) and in 1961 finally adopted the terms ozeanische Sprachen and proto-Ozeanisch (‘Oceanic languages’, ‘Proto Oceanic’), which were soon adopted by his colleagues.

Correspondences between PAn, PMP and POc protophonemes are shown in Table 1.4. PAn protophonemes are shown for reference, as the volumes of this work cite PAn reconstructions fairly often.

Certain POc innovations exclusive to Oceanic languages are immediately visible in the form of a number of mergers and splits, highlighted in colour in Table 1.4.

  a. The PMP voiced/voiceless pairs *p/*b, *k/*g and *s/*z and the PMP pair *d/*r each merged respectively as *p, *k, *s and *r in an interstage that we label ‘Proto X’.
  b. Proto X *p, *k, *s and *r then split to give POc “oral-grade” *p, *k, *s and *r and “nasal-grade” *b, *g, *j and *dr (the “grade” terms are explained in §1.8.2). Although *t did not participate in the merger in (a), *t did participate in the split, with POc oral-grade *t and nasal-grade *d.
  c. A small complication is that PMP *j did not participate in the merger in (a), but did participate in the split in (b), its POc nasal grade merging with that of *s.

Ozanne-Rivierre (1992) suggests that the corresponding *t/*d merger was hindered by the mismatch in point of articulation between dental *t and alveolar *d, a mismatch attested in many non-Oceanic Austronesian languages.

Table 1.5 is a corrected and expanded version of the table in Blust (2013:599) showing examples of PMP reconstructions and their POc continuations. It illustrates the combined effect of (a) and (b): each of the PMP pairs *p/*b, *k/*g, *s/*z and *d/*r first merged and then split. The set of changes in (a) and (b) alone is unusual enough to be strong evidence for the integrity of the Oceanic subgroup.

Table 1.5. Examples of PMP reconstructions and their POc continuations showing the effects of the mergers and splits giving rise to POc consonant grade
segment PMP POc grade gloss
*p- pitu pitu oral seven
*p- punay bune nasal pigeon
*-p- hapuy api oral fire
*-mp- t-umpu tubu nasal ancestor
*b- bulan pulan oral moon
*b- beRek boRok nasal pig
*-b- qabu qapu oral ashes
*-mb- ambit abit nasal hold in hand
*t- taqun taqun oral year
*t- (nasal)
*-t- qutin qutin oral penis
*-nt- -nta -da nasal P:1INC.PL
*-nt- punti pudi nasal banana
*d- duha rua oral two
*d- daRaq draRaq nasal fresh water
*-d- kuden kuron oral cooking pot
*-nd- pandan padran nasal pandanus
*s-s- susu susu oral breast
*s- siRi jiRi nasal a shrub: Cordyline
*-s- ŋusuq ŋuju- nasal lips, snout, beak
*z- zaqat saqat oral bad
*-z- quzan qusan oral rain
*z- zalan jalan nasal path, road
*-z- tazim tajim nasal sharp
*k- kali kali oral dig
*k- kumuR gumu nasal gargle, rinse mouth
*-k- seka soka oral pierce, stab
*-ŋk- laŋkaw lago nasal tall, long
*g- gaway kawe oral octopus tentacle
*g- gemgem gogom nasal hold in fist
*-g- liget likot oral turn, rotate
*-g- (nasal)
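The combined effect of the merger and the split can be expressed as a lookup from each PMP obstruent to its oral-grade and nasal-grade POc reflexes, which makes it easy to check the word-initial rows of Table 1.5 mechanically. The sketch below is illustrative only: the names `POC_REFLEXES`, `INITIAL_ROWS` and `row_is_consistent` are ours, forms are written without the asterisk, and the lookup does not predict which grade a given etymon takes, since the conditioning of the split is not known.

```python
# PMP consonant -> (oral-grade, nasal-grade) POc reflex, following Table 1.4.
POC_REFLEXES = {
    "p": ("p", "b"), "b": ("p", "b"),
    "t": ("t", "d"),
    "d": ("r", "dr"), "r": ("r", "dr"),
    "s": ("s", "j"), "z": ("s", "j"),
    "j": ("c", "j"),   # *j skipped the merger but took part in the split
    "k": ("k", "g"), "g": ("k", "g"),
}

# (PMP initial, POc form, grade): the word-initial rows of Table 1.5.
INITIAL_ROWS = [
    ("p", "pitu", "oral"), ("p", "bune", "nasal"),
    ("b", "pulan", "oral"), ("b", "boRok", "nasal"),
    ("t", "taqun", "oral"),
    ("d", "rua", "oral"), ("d", "draRaq", "nasal"),
    ("s", "susu", "oral"), ("s", "jiRi", "nasal"),
    ("z", "saqat", "oral"), ("z", "jalan", "nasal"),
    ("k", "kali", "oral"), ("k", "gumu", "nasal"),
    ("g", "kawe", "oral"), ("g", "gogom", "nasal"),
]

def row_is_consistent(pmp_initial: str, poc_form: str, grade: str) -> bool:
    """True if the POc form begins with the reflex expected for its grade."""
    oral, nasal = POC_REFLEXES[pmp_initial]
    expected = oral if grade == "oral" else nasal
    return poc_form.startswith(expected)

for row in INITIAL_ROWS:
    print(row, "ok" if row_is_consistent(*row) else "MISMATCH")
```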

Another set of innovations is the introduction of the labiovelars *pʷ, *bʷ, *mʷ and *kʷ into Proto Oceanic (Blust 1981a; Lynch 2002e; Ross 2011). Many items containing a labiovelar lack non-Oceanic cognates, and some, at least, must have been borrowed into POc from neighbouring Papuan languages. For example, *mʷapo(q) ‘taro’ was apparently borrowed by POc speakers as they copied taro-growing techniques from Papuan speakers (vol.3:267). In some inherited items a labial became a labiovelar next to a round vowel, but it is not clear whether the labiovelar actually occurred in POc. Thus a number of Oceanic languages reflect *tamʷata ‘man, husband’, derived from *tau ‘body, person’ + *mataq ‘unripe, immature, young’, but we cannot be sure whether *tamʷata or *taumata(q) was the POc form (vol.5:43–44).

Collectively, innovations affecting the vowels are also exclusive to Oceanic, although individually each of them occurs in various non-Oceanic languages:

  1. PMP *e, phonetically [ə], became POc *o.
  2. PMP word-final diphthongs *-uy(-), *-aw and *-ay were simplified to POc *-i, *-o and *-e respectively, the first two thereby merging with plain vowels.27
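The two vowel innovations can be sketched as a pair of string rewrites. This is a simplified illustration under stated assumptions: PMP schwa is written e as in the reconstructions, only word-final diphthongs are handled (medial *-uy- is not), consonant changes are ignored so that the output is not a full POc form, and the function name `apply_vowel_innovations` is ours.

```python
# Word-final PMP diphthongs and their POc outcomes.
FINAL_DIPHTHONGS = (("uy", "i"), ("aw", "o"), ("ay", "e"))

def apply_vowel_innovations(pmp_form: str) -> str:
    """Apply PMP *e > POc *o and simplify a word-final diphthong."""
    # Schwa (written e) becomes o. This must come first, so that the new e
    # produced from final -ay below is not turned into o as well.
    form = pmp_form.replace("e", "o")
    for diphthong, vowel in FINAL_DIPHTHONGS:
        if form.endswith(diphthong):
            return form[:-len(diphthong)] + vowel
    return form

for pmp in ("beRek", "seka", "punay", "hapuy", "laŋkaw", "gaway"):
    print(pmp, "->", apply_vowel_innovations(pmp))
```

Comparing the output with the POc column of Table 1.5 shows which part of each change is vocalic: for example, the sketch turns punay into pune, and the remaining difference from POc *bune is the consonant grade of the initial.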

A further innovation that has come to light during work on these volumes concerns certain PMP trisyllabic roots with *-e- (*[ə]) as the nucleus of their penultimate syllable. These trisyllables lost *-e- in POc, along with one consonant of the resulting consonant cluster:

PMP *biseqak POc *pisa(k)~*pisak-i- ‘split’ (vol.1:261)
PMP *ma-udehi POc *muri ‘be behind’ (vol.2:251)
PMP *tuqelan POc *tuqan ‘bone’ (vol.5:85)
PMP *baReqaŋ POc *paRa(ŋ) ‘molar tooth’ (vol.5:133)
PMP *buteliR POc *putiR ‘wart’ (vol.5:344)
PMP *buqeni POc *puni ‘ringworm, Tinea imbricata’ (vol.5:346)
PMP *ma-heyaq POc *maya(q) ‘shy, embarrassed; ashamed’ (vol.5:585)

The conditioning of this change remains unclear, as it did not affect the etyma below:

PMP *maqesak POc *maosak ‘ripe, cooked’ (vol.1:157)
PMP *baqeRu POc *paqoRu ‘new’ (vol.2:203)
PMP *qateluR POc *qatoluR ‘egg’ (vol.4:278)
PMP *qulej-an POc *quloc-a(n) ‘maggoty’ (vol.4:415)

PMP *qalejaw/POc *qaco ‘daylight, sun’ (vol.2:153–155) appears exceptionally to have lost the first consonant of the cluster, but there is evidence that a PAn variant *qajaw was ancestral to POc *qaco.

8.2. Reinterpreting the origins and distribution of POc oral- and nasal-grade consonants

This section presents a revision of the history sketched in §1.8.1, as promised there.

Figure 1.6 diagrams three accounts of the history of POc *p and *b. In the first two accounts ‘(N)’, “nasal grade”, implies that POc *b reflected an earlier nasal + obstruent sequence (*mp, *mb) and was perhaps prenasalised (POc *[ᵐb]). The terms “oral grade” and “nasal grade” were coined by Grace (1959:27) to refer to the pairs of POc obstruents that had been recognised by Dempwolff (1927).

Figure 1.6: Three analyses of the phonological history of POc *p and *b

Dempwolff inferred that PMP *p and *b, for example, merged as POc *p, while PMP *mp and *mb merged as POc *b.28 He made parallel assumptions about PMP *k/*g versus PMP *ŋk/*ŋg, and PMP *s/*z/*j versus PMP *ns/*nz/*nj.29 He also assumed that, e.g., PMP *p and *mp, or *b and *mb, were in free variation and that they became fossilised randomly in each Oceanic daughter-language, such that a word might begin with a reflex of *p in one daughter-language but a reflex of *mp in another.

Despite the obvious improbability of this assumption and the frequent discussions of consonant grade, reviewed by Grace (1990), the randomness assumption was maintained in some form until the publication of Ross (1988).30 The latter found that in the vast majority of POc etyma with one or more “graded” consonants, the grade of each consonant can be reconstructed unambiguously because its Oceanic reflexes agree in grade, a finding supported by the cognate sets in the present work. The illusion of randomness had two sources. First, although Milke (1968) had correctly identified POc *j (his *nj) as the nasal-grade consonant paired with oral-grade *s, most scholars assumed that various lenited reflexes of *s reflected the nasal grade, so that the pair of *s grades seemed almost chaotic (Ross 1988:71–93; 1989b). Second, various regular local processes such as Admiralties secondary nasal grade (Ross 1988:337–341) and Eastern Fijian apical prenasalisation (Geraghty 1983:74–96) had masked consonant grade in some languages.

The fact that consonant grade can be reconstructed without ambiguity in most POc etyma largely rids POc of Dempwolff’s posited randomness, but, as the middle panel in Figure 1.6 indicates, PMP *p and *b must have merged as Proto X *p, which then split into POc *p and *b. Similar processes applied to PMP *k/*g and *s/*z/*j. This is the position adopted in the introductions to volumes 1 to 5 of this work. Ross (1988) retained the assumption that the POc voiced obstruents were “nasal grade”, i.e. reflected nasal + obstruent sequences. He attempted unsatisfactorily to explain the splits as the effects of derivational morphology (Reid 2000).

This still leaves two questions about the origin of POc consonant grade unanswered:

  a. How did the POc splits come about?
  b. Do POc “nasal-grade” consonants have a nasal origin?

As a result of new research based on the POc reconstructions in volumes 1–5, we have a partial answer to (a) and a definitive answer to (b), shown in the righthand panel of Figure 1.6. Following Proto X (§1.8.1), this panel shows two further interstages, “ePOc” and POc. “POc” denotes the language reconstructed in these volumes, equated with its state immediately before its break-up into daughter-languages (Pawley 2008); and “ePOc” denotes “early POc”, a stage sometime before POc, but after its speakers settled in the Bismarck Archipelago.

Comparing reconstructions in previous volumes with their ancestral PMP forms in the ACD, we find that ePOc had three grades of obstruent: voiceless, voiced and prenasalised. Its voiceless obstruents are Grace’s oral-grade segments, but a majority of his “nasal-grade” segments reflect plain voiced obstruents. The prenasalised obstruents are true nasal-grade obstruents, reflecting inherited nasal + obstruent clusters. They may be inherited from PMP or from a more recent ancestor. This is the situation depicted in the righthand diagram of Figure 1.6, where the grey of the prenasalised obstruents indicates their rarity.

8.2.1. The POc voiceless and voiced obstruents

Our database of POc reconstructions from volumes 1–5, along with their PMP ancestral forms (drawn directly from the ACD), contains 729 etyma.31 In total these reconstructions contain 429 initial and medial instances of the PMP obstruents listed in the leftmost column of Table 1.6. The columns headed ‘> POc’ show the voiceless and voiced outcomes of the PMP phonemes (prenasalised ePOc outcomes are discussed in the next subsection). To the right of each POc obstruent in Table 1.6 are shown its number of instances as an absolute figure and as a percentage of the PMP obstruent in the leftmost column.

Table 1.6. Instances of PMP obstruents and their POc voiceless and voiced reflexes
POc voiceless reflexes POc voiced reflexes
PMP total > POc total % > POc total %
*p 94 *p 82 87.2 *b 12 12.8
*b 128 *p 101 78.9 *b 27 21.1
*s 75 *s 69 92.0 *j 6 8.0
*z 14 *s 10 71.4 *j 4 28.6
*-j- 17 *-c- 13 76.5 *-j- 4 23.5
*k 93 *k 91 97.8 *g 2 2.2
*g 8 *k 8 100.0 (*g) 0
*C 429 *Cvoiceless 374 87.2 *Cvoiced 55 12.8

The table tells a somewhat unexpected story. Only 13 per cent of the instances of PMP obstruents end up as POc voiced obstruents. It is also unclear whether Proto X *k actually split into POc *k and *g. PMP *p/*b, *k/*g and *s/*z each evidently merged as the Proto X phonemes *p, *k and *s. Proto X *p and *s then split into POc *p/*b and *s/*j respectively. If Proto X *k split, the outcome is inconsequential. Only eight instances of PMP *g occur in the first place, against 93 instances of PMP *k. No instances of PMP *g end up as POc *g, and just two instances of PMP *k do so.
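The percentages in Table 1.6 follow directly from the counts, and the short Python sketch below recomputes them. The counts are copied from the table; the variable and function names are ours and purely illustrative.

```python
# PMP obstruent -> (instances with voiceless POc reflex, instances with voiced POc reflex)
TABLE_1_6 = {
    "*p": (82, 12),
    "*b": (101, 27),
    "*s": (69, 6),
    "*z": (10, 4),
    "*-j-": (13, 4),
    "*k": (91, 2),
    "*g": (8, 0),
}

def voiced_share(voiceless: int, voiced: int) -> float:
    """Percentage of instances reflected as a POc voiced obstruent."""
    return 100 * voiced / (voiceless + voiced)

total_voiceless = sum(vl for vl, _ in TABLE_1_6.values())
total_voiced = sum(vd for _, vd in TABLE_1_6.values())

for pmp, (vl, vd) in TABLE_1_6.items():
    print(f"{pmp}: {vl + vd} instances, {voiced_share(vl, vd):.1f}% voiced")

print(f"all: {total_voiceless + total_voiced} instances, "
      f"{voiced_share(total_voiceless, total_voiced):.1f}% voiced")
```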

As noted above, PMP *t (129 instances) did not participate in these processes and is always reflected as POc *t. PMP *r, with 27 instances, is omitted from the table because all its POc outcomes are *r. PMP *d probably underwent a split, but the split was in prenasalisation, not in voicing (§1.8.2.3).

8.2.2. The POc prenasalised obstruents

POc reflexes of PMP nasal + obstruent clusters are omitted from Table 1.6, as the numbers of reflexes are generally few and would skew the table’s percentages. Instead, POc reflexes of these PMP clusters are shown separately in Table 1.7. The instances are all in etyma drawn from the ACD (and found among the POc reconstructions in volumes 1–5). Instances of nasal + obstruent clusters that arose sometime between the break-up of PMP and the break-up of POc are not shown in Table 1.7, as they would obscure the relationship between PMP and POc.

PMP nasal + obstruent clusters are reflected as POc unitary phonemes. In fact their POc outcomes appear to be the same as those of PMP voiceless and voiced obstruents in Table 1.6, but we argue below in §1.8.2.4 that this is incorrect, and reconstruct ePOc prenasalised rather than voiced outcomes in Table 1.7. The PMP clusters are shown in the table as *-Np-/*-Nb- etc., as there are instances where the cluster is not homorganic. Some are the result of reduplication of a monosyllable, e.g., PAn/PMP *demdem ‘dark, gloomy, overcast’, attested with -md- in Formosan and many Philippine reflexes (ACD), but becoming *dendem at some interstage and thence POc *rodrom (vol.2:308). POc *-dr- is a unitary phoneme reflecting earlier *-nd- (PCEMP *-nd- according to Blust 1977a).

Table 1.7. Instances of PMP nasal + obstruent clusters and their POc reflexes
POc voiceless reflexes ePOc prenasalised reflexes
PMP total > POc total > ePOc total
*-Np- 4 *-p- 2 *-ᵐb- 2
*-Nb- 12 *-p- 5 *-ᵐb- 7
*-Nk- 13 *-k- 8 *-ᵑg- 5
*-Ng- 2 *-k- 0 *-ᵑg- 2
*-Nt- 6 *-t- 3 *-ⁿd- 3
*-Nd- 3 *-r- 1 *-ⁿr- 2
*-Ns- 2 *-s- 1 *-ñj- 1
*-Nz- 1 *-s- 1 *-ñj- 0
totals 43 21 22
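As a check on the figures in Table 1.7, the column totals can be recomputed from the individual rows. The sketch below copies the row counts from the table; the names are ours and purely illustrative.

```python
# PMP cluster -> (total instances, voiceless POc reflexes, prenasalised ePOc reflexes)
TABLE_1_7 = {
    "*-Np-": (4, 2, 2),
    "*-Nb-": (12, 5, 7),
    "*-Nk-": (13, 8, 5),
    "*-Ng-": (2, 0, 2),
    "*-Nt-": (6, 3, 3),
    "*-Nd-": (3, 1, 2),
    "*-Ns-": (2, 1, 1),
    "*-Nz-": (1, 1, 0),
}

# Each row total should equal the sum of its two reflex counts.
rows_consistent = all(
    total == voiceless + prenasalised
    for total, voiceless, prenasalised in TABLE_1_7.values()
)

# Sum each column across all rows: (instances, voiceless, prenasalised).
column_totals = tuple(sum(column) for column in zip(*TABLE_1_7.values()))

print("rows consistent:", rows_consistent)
print("column totals (instances, voiceless, prenasalised):", column_totals)
```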

Blust (2022) shows that homorganic nasal + obstruent clusters were present in PMP, but were rare, as Table 1.7 confirms. Their very rarity has meant that scholars have paid little attention to them as a discrete category (Collins 1983 and Mills 1991 are exceptions). Further, reconstructions in the ACD for PCEMP, the next node below PMP in Blust’s tree (Figure 1.5), show little sign of acquiring nasal + obstruent clusters, other than those resulting from reduplications.

The ACD includes just four PCEMP items which contain nasal + obstruent clusters and have no cognates outside CEMP. They are:32

PCEMP *tambu POc *tabu ‘forbidden, taboo’ (this volume, chapter 10)
PCEMP *kandoRa POc *kadroRa ‘cuscus’ (vol.4:225)
PCEMP *waŋka POc *waga ‘canoe’ (vol.1:178)
PCEMP *mans[ə,a]r POc *mʷaja(r,R) ‘bandicoot’ (vol.4:228)

Table 1.5 illustrates the fact that voiced and prenasalised obstruents are conventionally treated as a single—nasal-grade—POc category, as their reflexes in almost all Oceanic languages are identical. Of the POc medial nasal-grade items in that table, those reflecting PMP *t-umpu, *ambit, *-nta, *punti, *pandan and *laŋkaw ancestrally had a nasal + obstruent cluster, while those reflecting *ŋusuq and *tazim did not. Only 22 POc “nasal-grade” consonants in our database were descended from nasal + obstruent clusters (Table 1.7). Fifty-five reflect PMP plain voiceless or voiced obstruents (Table 1.6).

Table 1.6 allows us finally to understand where POc voiced initial consonants came from. Ever since Dempwolff (1927) the default assumption has been that they reflected nasal + obstruent clusters, with scholars trying, and failing, to find grounds to reconstruct ancestral initial nasal + obstruent clusters (Milner 1965; Ross 1988:39–43; Grace 1990; Reid 2000). The reason for the failure is now evident: POc initial “nasal-grade” obstruents actually reflect PMP plain voiceless or voiced obstruents (Table 1.6). PMP nasal + obstruent clusters were always medial (Table 1.7). They never occurred initially.

8.2.3. PMP *t, *d and *r

We have seen that PMP *t and *d did not form a voiceless/voiced pair, as they had different points of articulation.

With regard to PMP *t, there is a mismatch between the findings reported in Table 1.6 and Table 1.7. The former reports that PMP *t did not undergo the merger-and-split sequence that affected PMP *p and *s, and therefore did not give rise to POc “nasal-grade” (voiced) reflexes. Hence PMP initial *t is never reflected as POc *d. But Table 1.7 reports three POc etyma reflecting PMP *-nt-, namely the P:1INC.PL suffix *-ⁿda (< PMP *-nta < *=ni-ta; Blust 1977b), *puⁿdi ‘banana’ (< PMP *punti) and *maⁿdala(q) ‘the morning star’ (< PMP *mantalaq-). This was the sole source of “nasal-grade” reflexes of *t, and the overall rarity of earlier nasal + obstruent sequences explains why POc has so few reflexes of *-nt-.

POc *r and *dr, outcomes of the split of PMP/Proto X *d, have conventionally been treated as one of the POc oral-/nasal-grade phoneme pairs (§1.8.2.1). Within the earlier framework this characterisation was correct, as the POc phonological contrast was evidently *[r] vs *[n(d)r].33 However, we have above recast the conventional POc oral-/nasal-grade pairings as voiceless/voiced pairings. But the feature that distinguishes *dr from *r is prenasalisation, not voicing, so it does not belong to this pair set.

Our database has 40 instances of PMP *d, of which 33 are reflected as POc *r and seven as POc *dr. PMP *r, with 27 instances, is omitted from Table 1.6 because all its POc outcomes are *r. At some point the *r reflexes of PMP *d and *r merged as POc *r.

8.2.4. More evidence for POc prenasalised obstruents

In most Oceanic languages the proposed POc voiced (§1.8.2.1) and prenasalised (§1.8.2.2) phonemes at each point of articulation have merged. The evidence that they were once separate is based primarily on the different sources of each and on the fact that the theory accounts neatly for the relative rarity of reflexes of PMP *-nt-. Had they already merged in POc? In this section we propose that they had not, because there is evidence from five Western Oceanic languages that the distinction between voiced and prenasalised obstruents posited for ePOc was retained in POc.

We know of five Western Oceanic languages that contrast voiceless, plain voiced and prenasalised voiced obstruents. They are Mangap (now better known as Mbula), Sio, Tami, Numbami and Sudest. The only close examination of the contrasts that persist in one of these languages is Bradshaw (1978) on Numbami. The first four languages are located in the area of greatest diversity within the North New Guinea cluster, and are not especially closely related, making them possible candidates for retaining an ancient feature. Sudest is a Papuan Tip language. Contra Ross (1988:192) the immediate ancestor of Sudest and Nimowa now appears to have been the first language to break away from the rest of the early Papuan Tip family, making Sudest another candidate for ancient retentions.34 We refer to these five languages as the “distinction-retaining languages”.

The obstruent series in the distinction-retaining languages are:

Mangap
p t k
b d g
ᵐb ⁿd ᵑg
Sio
p t k
b d g
ᵐbʷ ᵐb ⁿd ᵑg
Tami
p t s k
b d j g
ᵐbʷ ᵐb ⁿd nj ᵑg ᵑgʷ
Numbami
p t s k
b d z g
-ᵐb- -ⁿd- -ⁿz- -ᵑg-
Sudest
p t s k
b d j g
ᵐbʷ ᵐb ⁿd nj ᵑg ᵑgʷ

A preliminary search for cognate sets reflecting POc etyma that include prenasalised consonants reveals an interesting pattern. A small group of etyma is almost always reflected with the prenasalised consonant intact, while a larger collection of etyma is reflected unpredictably with a mixture of plain voiced and prenasalised voiced reflexes. This larger collection suggests that in these items, plain and prenasalised consonants are gradually falling together into a single category. The membership of the small group of cognate sets is significant, as its members include some sets that reflect POc etyma that on independent evidence contained prenasalised obstruents in PMP or PCEMP.

Thus Blust (1977b) reconstructs PMP possessor suffixes that were prenasalised because they consisted of the morph ni + pronoun. They retain their prenasalised obstruents in ePOc:

*-ᵑgu P:1SG < PMP *-ŋku < *=ni-ku
*-ⁿda P:1INC.PL < PMP *-nta < *=ni-ta
*-ⁿra P:3PL < PMP *-nda < *=ni-da

The first two of these are reflected in the distinction-retaining languages. The P:3PL suffix was replaced by PWOc *-dri.35 At some point prenasalisation has been copied onto this etymon.

P:1SG P:1INC.PL P:3PL
PMP *-ŋku *-nta *-nda
ePOc *-ᵑgu *-ⁿda *-ⁿra
POc *-gu *-da *-dra, PWOc *-dri
Mangap -ndV -n
Sio -ŋgu -nda -nzi
Tami -n -n
Numbami -ŋgi -ndi -ndi
Sudest -ŋgu -nda -nji

Further etyma with independent evidence of PMP or PCEMP prenasalised obstruents and reflected in the distinction-retaining languages are given below. A few comments are necessary. The blanks represent cases where, as far as we know, the etymon is not reflected in the relevant language. This pattern reflects the level of lexical replacement in Oceanic languages around the coasts of New Guinea.

‘pandanus’ ‘sago’ ‘canoe’ ‘betelnut’ ‘banana’
PMP/PCEMP *paŋdan *R(a,u)mbia *waŋka *buaq *punti
ePOc *paⁿran *Raᵐbia *waᵑga *ᵐbuaq *puⁿdi
POc *padran *Rabia *waga *buaq *pudi
Mangap pānda wōŋgo mbu pin
Sio ponda rambia woŋga
Tami lambi waŋ mbu pun
Numbami waŋga buwa undi
Sudest mbi waŋga

PMP *paŋdan acquired its nasal + stop sequence by losing *-u- from PAn *paŋudaN, leaving no doubt that the POc form had a prenasalised consonant. The evidence for the other forms above is less pressing, but they all have so many WMP reflexes with a nasal + stop cluster that one can be confident that the PMP or PCEMP form had the cluster, which was inherited into ePOc as a prenasalised obstruent (*waŋka is PCEMP). This is true of *punti, but if the argument about PMP *-nt- in §1.8.2.3 holds, then the POc form can only be a prenasalised stop.

POc *ᵐbuaq appears to be unique in having a prenasalised initial. The story of this form is difficult to reconstruct. According to the ACD’s version, PAn *buaq continued until POc, where it split into oral-grade-initial *puaq ‘fruit (including betelnut)’ and nasal-grade-initial *buaq ‘betelnut’. The mechanism of the split is unknown, but evidence shows that it occurred earlier than POc, as it is reflected in some Wallacean languages.36

These cognate sets attest to the presence of ePOc *ᵐb, *ⁿd and *ᵑg in addition to the consonants in Table 1.2. Given that this preliminary search in distinction-retaining languages was confined to the 200-word lists in the Austronesian Basic Vocabulary Database (Greenhill et al. 2008) with some small additions from single-language sources,37 the result is quite telling.

How do we account for the data from the distinction-retaining languages, four belonging to NNG, one to PT? More research is needed, but the account with the best fit says that they retain a distinction that was present in early POc, but lost in the vast majority of its daughter-languages. This represents drift, i.e. independent parallel innovation, probably due to the paucity of lexical items containing a prenasalised obstruent. Because almost all Oceanic languages lack the distinction between plain voiced and prenasalised voiced obstruents, researchers, including ourselves, have reconstructed POc without it. But since a few WOc languages retained the distinction at the time POc broke up, it should be reconstructed for POc. That is, “ePOc” and “POc” in the righthand panel of Figure 1.6 need to be recalibrated. “ePOc” is the real Proto Oceanic, and “POc” reflects the merger that by the time of its break-up had probably occurred in the dialects ancestral to all non-WOc languages, and in many WOc dialects too.

8.3. Revising the history of Proto Oceanic vowels?

Lynch (2022) argues entirely on the basis of Oceanic evidence that the POc vowel system was not the neat conventionally accepted five-vowel system shown in §1.8.1, but a system partway between the PMP four-vowel system of *i, *e [ə], *a, *u and the five-vowel system that emerged later in most Oceanic languages. We showed in §1.8.1 that in the conventional view the sources of POc vowels were as follows:

  • POc *i < PMP *i, *-uy(-)
  • POc *u < PMP *u
  • POc *a < PMP *a
  • POc *-e < PMP *-ay
  • POc *o < PMP *e [ə], *-aw

Lynch suggests that the POc system of non-final vowels (i.e. discounting POc *-e and *-o from PMP *-ay and *-aw) was one of the following three:

(A) *i *u
*a
(B) *i *u
*o
*a
(C) *i *u
*e *o
*a

Lynch’s revision suggests no change to the origins of high *i and *u or low *a. It is the mid vowels that changed, but he is uncertain when. His system A implies that there had been no change in the PMP system by the time POc broke up. Systems B and C both assume that PMP *e [ə] was in the process of becoming *o when POc dispersed, and C assumes that it also became POc *e under certain conditioning.

9. Where did Proto Oceanic come from?

The conventional answer to the question, “Where did Proto Oceanic come from?”, is the accepted hypothesis in Figure 1.5. It says that POc is the sibling of PSHWNG, and the two are the only children of PEMP (Blust 1978). PEMP in its turn is a sibling of the CMP languages, and they are all children of PCEMP (Blust 1982, 1983–84b, 1993). The latter is a sibling of WMP languages and a child of PMP. To our knowledge, no scholar disputes the claim that POc is descended from PMP. However, two recent pieces of research raise the need to look more closely at the intervening stages between PMP and POc.

The first, Kamholz (2014), uses a much larger body of evidence to establish the integrity of Blust’s (1978) PSHWNG on the basis of shared innovations. Kamholz does not examine the validity of PEMP, but the innovations that define his PSHWNG are different enough from those defining POc to invite a re-examination of the PEMP hypothesis.

The other work is Grimes & Edwards’ (in prep.) analysis of available CMP data. They identify eight CMP subgroups, mostly on the basis of shared phonological innovations. They find areal similarities, some of them probably consequences of one or more Papuan substrates (see also Schapper 2015; 2018), but no significant exclusively shared innovations across subgroups, and thus no evidence for a putative Proto Central Malayo-Polynesian.

Blust (1993, 2009b) views the CMP languages as a linkage on the basis of innovations that chain (§1.4.3.1) various groups together,38 but Grimes & Edwards find little evidence to support such an analysis. Blust’s arguments for PCEMP have evoked vigorous criticism (Donohue & Grimes 2008; Schapper 2011) and responses (Blust 2009b, 2012). The lack of evidence for Proto Central Malayo-Polynesian logically entails abandoning PCEMP as well, and this leaves a gap in the prehistory of POc according to the accepted hypothesis.

Kamholz and Grimes & Edwards indirectly prompt a further look at two POc-related questions:

  a. Are the SHWNG languages the closest relatives of Oceanic?
  b. How are SHWNG and Oceanic related to CMP groups?

Our answer to (a) is, no, the SHWNG languages are probably not the closest relatives of Oceanic. Our answer to (b) is that SHWNG appears more closely related to some of the CMP groups than to Oceanic, while the relationship of Oceanic to CMP languages is ambiguous, implying that it may have branched off the Austronesian tree separately from CMP, perhaps at a node from which various CMP groups branched, or perhaps at a higher node. We can only give a summary of findings here (for more detail see Ross, in prep.).

One other answer to the question, “Where did Proto Oceanic come from?” is implicit in the literature, and it would be remiss of us not to mention it. Bellwood (2011) suggests that Lapita pottery displays a likeness to contemporaneous pottery from the Marianas Islands in Micronesia. As far as we know, the only language then spoken in the Marianas was an earlier form of Chamorro, which originated in the northern Philippines (Blust 2000a). Bellwood’s hypothesis might imply a flow of early Chamorro speakers into the Bismarck archipelago, but there is no linguistic indication of such a presence in POc or its descendants.39

9.1. Blust (1978) on PEMP

Much of Blust (1978), the seminal work on PEMP, is devoted to demonstrating the integrity of SHWNG. Kamholz’s (2014) analysis agrees. A smaller part of Blust’s paper is devoted to PEMP, i.e. to innovations shared by SHWNG and POc. Blust offers 53 shared lexical innovations, but no shared phonological or morphosyntactic innovation.

Claiming an exclusively shared lexical innovation carries with it an inherent risk. Might not the next dictionary of a non-EMP language include a cognate that renders the innovation non-exclusive and thereby non-probative? Of the 53 innovations, Ross (in prep.) rejects 32, or 60%, for the following reasons:

  • Eight are also found in one of the CMP groups to the west and south of SHWNG. The groups are, in Grimes & Edwards’ terminology, Seram-Tanimbar-Bomberai (6 innovations), Ambon-Seram (2), and Sula-Buru (1) (Map 1.6).40
  • Seven have cognates in WMP languages.
  • For 14, Ross was unable to verify the supporting data. Their PMP reconstructions are absent from the ACD, implying that Blust later abandoned them.
  • One, *ma- ‘directional particle’, is likely to be the result of drift, i.e. independent parallel innovation.
  • One, *dui ‘dugong’, is interpreted by Blust as an idiosyncratic innovation in the word form, but it is in fact the outcome of regular phonological changes.
  • One, *mawa ‘enclosed space’, appears to be a chance resemblance.
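The tally behind the 60% figure can be checked with a few lines of Python. This is only a sketch: the counts are those given in the list above, while the category labels and variable names are our own shorthand and not part of Ross's analysis.

```python
# Counts are taken from the list above; the labels are illustrative shorthand.
rejected = {
    "cognates in CMP groups": 8,
    "cognates in WMP languages": 7,
    "supporting data not verifiable": 14,
    "drift (*ma-)": 1,
    "regular sound change (*dui)": 1,
    "chance resemblance (*mawa)": 1,
}

total_proposed = 53  # lexical innovations offered in Blust (1978)
total_rejected = sum(rejected.values())
share_percent = round(100 * total_rejected / total_proposed)
remaining = total_proposed - total_rejected

print(total_rejected, share_percent, remaining)
```

On these figures, 21 of the 53 proposed innovations are left unchallenged by the reasons listed.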

Map 1.6: Grimes & Edwards’ Wallacean groups mentioned in the text

9.2. Phonological innovations in Oceanic and Wallacean languages

It is convenient to refer to CMP and SHWNG languages together as the Austronesian languages of “linguistic Wallacea” (Schapper 2016), or, more simply in the present context, as Wallacean.

Table 1.8 shows innovations in consonants in the protolanguages of Oceanic and various Wallacean subgroups including SHWNG and others clustered close to it.41 The table makes no reference to innovations that occur in smaller subgroups within those shown. Often one or more of the innovations listed in the table does not occur in a subgroup’s parent language but does occur in lower-order subgroup(s) within it. This is part and parcel of the Wallacean pattern of shared innovations whereby isoglosses intersect, forming possible linkages. However, close inspection of the innovations shows that they affect certain PMP consonants across two or more Wallacean groups, suggesting that drift resulting from pressures on similar consonant systems is as likely a cause as shared inheritance.

Table 1.8. Consonant innovations in the parent languages of Oceanic and Wallacean subgroups (key beneath table)
PMP > Oceanic SHWNG Ambon-Seram Seram-Tanimbar-Bomberai Aru Sula-Buru
*p > *f yes yes yes init
*p > *f > *h med
*p > *b some
*b > *p some yes yes yes
*t > *s/__*i_ yes
*mp/*mb > *ᵐb some yes yes ?
*mp/*mb > *ᵐp yes
*nt/*nd > *ⁿd yes yes yes yes
*d > *d-r- some yes yes
*d > *r some yes yes yes
*d > *dr [ⁿr] some
*d/*z > *d yes
*d/*z > *r yes
*d/*l > *r yes
*-j-/*s > *s yes
*-j-/*s > *j [ɟ] some
*-j-/*l > *l yes some
*-j- > 0̸ some
*-j-/*R > *R yes yes
*z/*s > *s yes
*z/*y merge yes
> *n yes
*q > *0̸ yes some yes yes
*qa- etc lost yes some yes yes
  • ‘some’ indicates that the change unpredictably applies to some etyma but not others;
  • an empty cell means ‘no’.

The innovation listed as ‘*qa- etc lost’ in the bottom row of Table 1.8 needs an explanation. It refers to the fact that words of three or more syllables whose first PMP syllable was *qa- or *ha- regularly lose that syllable in most Wallacean languages. This loss is probably associated with the loss of *q- or *h-, which is almost universal in Wallacean languages. Just one language, Watubela of the Seram-Tanimbar-Bomberai group, clearly retains *q as k, meaning that its retention must be reconstructed to Proto Seram-Tanimbar-Bomberai. Thus, for example, PMP *qateluR ‘egg’ is regularly reflected as POc *qatoluR (vol. 4:278–279) and Watubela katlu, but as PSHWNG *tolo (Taba tolo, Mayá tól, Umar tor), Uyir tuli (Aru), Maswiang tolin (STB), Paulohi terur (AS).
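The conditioning of this loss can be illustrated with a toy rule in Python. The sketch rests on assumptions of our own: syllables are approximated by counting vowel letters, the function names are hypothetical, and only the loss of the first syllable is modelled, not the later changes that yield forms such as PSHWNG *tolo.

```python
VOWELS = set("aeiou")


def count_syllables(form: str) -> int:
    """Approximate the syllable count as the number of vowel letters (an assumption)."""
    return sum(1 for ch in form if ch in VOWELS)


def drop_initial_qa(form: str) -> str:
    """Drop initial qa-/ha- from forms of three or more syllables; leave others intact."""
    if count_syllables(form) >= 3 and form.startswith(("qa", "ha")):
        return form[2:]
    return form


# PMP *qateluR 'egg' has three syllables, so the first syllable is dropped.
print(drop_initial_qa("qateluR"))  # teluR

# "qata" is an invented two-syllable string, not a reconstruction; the rule does not apply.
print(drop_initial_qa("qata"))  # qata
```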

What mainly concerns us in Table 1.8 is not the details of the innovations but their patterning and particularly the considerable differences between Oceanic and the Wallacean groups. It is immediately clear that SHWNG innovations pattern more closely with those of other Wallacean subgroups, and barely at all with Oceanic.

As for the innovations of Oceanic, only one, the merger of PMP *s and *z as POc *s, is shared with a Wallacean group, Central Timor, far away from Oceanic. This is presumably a case of independent parallel innovation.

An obvious feature of POc in Table 1.8 is the number of cells containing ‘some’, indicating that the change applied to only some etyma. These refer to the obstruent splits noted in Table 1.6 and the associated discussion in §1.8.2.1 and §1.8.2.3.

Their significance here is that the merger-then-split pattern that gave rise to POc obstruent pairs has not occurred in the history of any Wallacean group. Table 1.9 shows PMP obstruents along with their PSHWNG and POc reflexes. The PSHWNG column shows one reflex for each PMP obstruent and for each PMP pair of nasal + obstruent clusters. This organisation is representative of all Wallacean groups as Grimes & Edwards (in prep.) reconstruct their histories. The POc column, however, shows the pairs of reflexes discussed earlier.

As an example, Figure 1.7 sets out the changes in PMP *p and *b, as they are reflected in PSHWNG and in POc. The PSHWNG changes are simple, and are similar to those in other Wallacean languages. The POc changes are more complex. Both PSHWNG and ePOc have three labial consonants, but they have developed along different routes.42

Table 1.9. PMP obstruents and their PSHWNG and POc reflexes
Place     PMP            PSHWNG  POc
Bilabial  *p             *f      *p/*b
          *b             *p      *p/*b
          *-Np-/*-Nb-    *b      *p/*ᵐb
Dental    *t             *t      *t
          *-Nt-          *d      *ⁿd
Alveolar  *d             *r      *r/*dr
          *-Nd-          *d      *dr
Alveolar  *s             *s      *s/*j
          *z             *z      *s/*j
          *-Ns-/*-Nz-    ?       *s/*ñj ?
Velar     *k             *k      *k/*g ?
          *g             ?       *k
          *-Nk-/*-Ng-    *g      *ᵑg

Figure 1.7: The phonological histories of PSHWNG and POc reflexes of PMP *p and *b

9.3. Conclusion: so where did Proto Oceanic come from?

Where then did Proto Oceanic come from? The phonological history that gave rise to the patterns in Table 1.9 is unlike that of the Wallacean languages and significantly more complicated. No Wallacean language, and as far as we know no WMP language, underwent a set of obstruent mergers like those that gave rise to Proto X, followed by the set of splits that gave rise to POc. Wallacean languages other than the Sula-Buru group, however, display a merger of PMP *-Nt- and *-Nd-, where POc has no merger. This implies that the ancestor of POc was separate from the ancestor(s) of the Wallacean languages when the Wallacean merger occurred.
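The contrast between the two reflex patterns can be read off Table 1.9 mechanically. The Python sketch below transcribes the rows of the table that carry no question mark and reports, for each protolanguage, which PMP sources have identical reflexes and which have more than one reflex. The data structure and function names are our own, and treating identical reflex sets as evidence of a merger is a simplifying assumption.

```python
from itertools import combinations

# Reflexes transcribed from Table 1.9; rows marked "?" in the table are omitted.
REFLEXES = {
    "*p": {"PSHWNG": {"*f"}, "POc": {"*p", "*b"}},
    "*b": {"PSHWNG": {"*p"}, "POc": {"*p", "*b"}},
    "*t": {"PSHWNG": {"*t"}, "POc": {"*t"}},
    "*-Nt-": {"PSHWNG": {"*d"}, "POc": {"*ⁿd"}},
    "*d": {"PSHWNG": {"*r"}, "POc": {"*r", "*dr"}},
    "*-Nd-": {"PSHWNG": {"*d"}, "POc": {"*dr"}},
    "*s": {"PSHWNG": {"*s"}, "POc": {"*s", "*j"}},
    "*z": {"PSHWNG": {"*z"}, "POc": {"*s", "*j"}},
}


def merged_pairs(lang: str) -> list:
    """Pairs of PMP sources whose reflex sets in `lang` are identical."""
    return [
        (a, b)
        for a, b in combinations(REFLEXES, 2)
        if REFLEXES[a][lang] == REFLEXES[b][lang]
    ]


def split_sources(lang: str) -> list:
    """PMP sources that show more than one reflex in `lang`."""
    return [src for src, row in REFLEXES.items() if len(row[lang]) > 1]


for lang in ("PSHWNG", "POc"):
    print(lang, merged_pairs(lang), split_sources(lang))
```

On these rows, PSHWNG shows identical reflexes only for *-Nt- and *-Nd- and no source with more than one reflex, whereas POc keeps *-Nt- and *-Nd- apart but shows paired reflexes for *p, *b, *d, *s and *z.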

These differences, along with those in Table 1.8, indicate that POc has a history that is markedly different from those of the Wallacean languages, including SHWNG, and that Blust’s PEMP hypothesis is not valid, even though it was perfectly reasonable when it was proposed forty-five years ago. The question is, what do we replace it with? It is now clear that POc is not a Wallacean offshoot, so where did it come from, genealogically? We don’t know.

Figure 1.8: Schematic diagram showing the implications of our analysis for the genealogy of the Austronesian family.

Figure 1.8 shows our dilemma. Do the Wallacean languages and POc have a common ancestor? There is some lexical evidence that they do, in the shape of the PCEMP etyma in the ACD and the 1978 PEMP etyma that are now known to have Wallacean cognates (§1.9.1), but, as we have observed, using lexical data in this way has disadvantages. These are matters for future research.

Meanwhile, we can say that using the lexical reconstructions in volumes 1–5 as sources for phonological history has proven to be a fitting conclusion to the present work.

Notes