4.1. Materials and Methods
The Danish data for relative clause extractions were collected using KorpusDK. In light of the large quantity of relative clause extraction sentences that we found on KorpusDK, we decided to restrict the search to KorpusDK and did not extend the investigation to include BySoc, SamtaleBank or Google for the purpose of finding Danish relative clause extractions. Note also that
Jensen (
1998,
2002) has previously searched BySoc for extraction instances and retrieved the cases of relative clause extraction occurring in BySoc (see
Section 2.1). The English part of the search was once again carried out using the corpora BNC and COCA, as well as Google. Because we were not able to find any instances of relative clause extraction on BNC or COCA, we extended the search to also include the corpus of Global Web-based English (
Davies 2013, GloWbE, 1.9 billion words), which contains material from various websites in 20 different English-speaking countries.
For the search on KorpusDK, the search strings described in
Section 3.1 were reused with some adaptations to target relative clause extraction instead of adjunct clause extraction. Thus, the position specifying the adjunct clause subordinator (e.g.,
hvis) was replaced by the relative pronouns
som and
der and preceded by 1–4 optional words (rather than 0–3) to allow for a head noun of the relative clause; see the example search string in (21), which targets topicalization from a relative clause.
21. | [ortho = “(Den|Det|De)”] []{0, 2} [pos = “N”] [pos = “V”] [pos = “PERS”] []{1, 4}[ortho = “(som|der)”] |
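The adaptation described above amounts to changing two slots in an otherwise fixed token sequence. The following Python sketch only illustrates that composition. The helper names (token, any_tokens, topicalization_query) are hypothetical, the output uses straight quotes and uniform spacing rather than the exact typography of (21), and nothing here queries KorpusDK itself.

```python
def token(attr, value):
    """One token position constrained by an attribute, e.g. [pos = "N"]."""
    return f'[{attr} = "{value}"]'


def any_tokens(lo, hi):
    """Between lo and hi unspecified token positions, e.g. []{1, 4}."""
    return f"[]{{{lo}, {hi}}}"


def topicalization_query(subordinators, gap):
    """Assemble a search string shaped like (21).

    subordinators: words allowed in the final position (e.g. som, der)
    gap: (min, max) number of unspecified words before that position
    """
    parts = [
        token("ortho", "(Den|Det|De)"),   # sentence-initial determiner
        any_tokens(0, 2),
        token("pos", "N"),
        token("pos", "V"),
        token("pos", "PERS"),
        any_tokens(*gap),                 # room for the head noun
        token("ortho", "(" + "|".join(subordinators) + ")"),
    ]
    return " ".join(parts)


# Relative clause version, as in (21)
print(topicalization_query(["som", "der"], (1, 4)))

# ASSUMPTION: adjunct clause counterpart, reconstructed only from the
# description above (hvis, 0-3 words); the actual string is in Section 3.1.
print(topicalization_query(["hvis"], (0, 3)))
```

Keeping the subordinator list and the gap size as parameters makes explicit that these are the only two positions the description above reports as changed between the two studies.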
For English, too, the search strings used to target adjunct clause extractions (described in
Section 3.1) were reused, to the extent that this was possible, and adapted to target relative clauses instead of adjunct clauses. For instance, relativization from relative clauses was targeted with strings like (22a–b), searching for a noun phrase followed by one of the relative complementizers
that,
which or
who. Instead of an adjunct clause subordinator like
if, the search string ends on a relative clause complementizer (
who or
that), which in turn is preceded by a pronoun position (_p), since the head noun in the relative clause extractions reported in the literature is often an indefinite pronoun of some kind.
22. | a. | NOUN that|which|who _pp _vv _p who |
| b. | NOUN that|which|who _pp _vv _p that |
As in the adjunct clause search, these strings were augmented with additional unspecified positions in the matrix clause in subsequent queries. Additional searches were carried out targeting specific matrix constructions that appear to be common in relative clause extraction sentences, e.g., by using strings that target relative clauses headed by NPs with
the only (23a) or
a lot of (23b), and modifications of these strings.
23. | a. | that|which|who _pp [be] the only * who|that |
| b. | that|which|who there [be] a lot of * who|that |
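The strings in (22) share all but their final slot, and the subsequent queries add unspecified positions to them. As a rough illustration, the Python sketch below assembles such strings from a list of slots. The function name english_query is hypothetical, and the placement of the extra * positions (directly after the matrix verb) is an assumption, since the text only states that they were added in the matrix clause.

```python
def english_query(final_comp, extra_slots=0):
    """Assemble a search string shaped like (22a-b).

    final_comp: complementizer of the targeted relative clause (who/that)
    extra_slots: number of additional unspecified positions, written
        with the * wildcard that also appears in (23)
    """
    slots = ["NOUN", "that|which|who", "_pp", "_vv"]
    # ASSUMPTION: extra positions go right after the matrix verb.
    slots += ["*"] * extra_slots
    slots += ["_p", final_comp]
    return " ".join(slots)


# The two base strings in (22a) and (22b)
for comp in ("who", "that"):
    print(english_query(comp))

# One possible augmented variant with a single unspecified position
print(english_query("who", extra_slots=1))
```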
Again, Google was used to carry out further searches, mostly targeting relativization from relative clauses in constructions that appear to be common in the extraction sentences found in the literature or in our Danish material, e.g., “which I don’t know anyone who” or “which there are many people who”. As with the adjunct clause study, the English search was restricted to wh-extraction and relativization and did not include topicalization.
4.3. Discussion
The quantity of sentences featuring relative clause extraction that we found in Danish (940 cases on KorpusDK alone) allows us to conclude that relative clause extraction, especially with topicalization, is a commonly produced construction in naturally occurring Danish. In fact, the frequency of relative clause extraction is roughly comparable to the rate of extraction from declarative
at (‘that’)-clauses that we found using equivalent search strings, with 1250 cases found on KorpusDK (see
Section 3.3). Note that this does not take into account the base frequency of complement clauses vs. relative clauses in Danish.
The considerable number of relative clause extractions found on KorpusDK seems to support the previous observations that relative clauses do not behave like strong islands in Danish. However, it is striking that the vast majority of the relative clause extractions we found (more than 92%) feature a relative clause embedded under an existential construction introduced by
der er ‘there is’. Relative clause extraction in Danish thus seems to be particularly productive in this specific environment. This is in line with similar observations for Swedish, where relative clause extraction is also reported to be most common in existential environments (e.g.,
Engdahl 1997;
Lindahl 2017). In light of the few extraction examples that we found featuring other matrix verbs (e.g.,
kende ‘know’,
møde ‘meet’, or
finde ‘find’), our data are compatible with the claims that MSc. relative clause extraction is in principle also possible with other (non-existential) predicates (see also
Lindahl 2017). However, the production of such cases in written language appears to be exceedingly rare (given that fewer than 1% of the cases in our sample featured a matrix verb other than
være ‘be’).
One possible explanation for the uneven distribution of matrix verbs is that there is a syntactic difference between relative clauses embedded under an existential construction and other relative clauses, such that the latter form islands for extraction and the former do not.
McCawley (
1981) makes a proposal along these lines, by suggesting that extraction from relative clauses is possible when the relative clause is embedded in an existential or negative existential clause, as the extraction domain in that case is not a regular relative clause, but a
pseudo-relative. Extraction from a pseudo-relative may not violate an island constraint if it is assumed that pseudo-relatives are not actually complex NPs (e.g.,
if pseudo-relatives are analyzed as a species of small clauses, see Casalicchio 2016).
However, this approach would not cover the cases where the matrix verb is not an existential verb, such as
finde ‘find’, see example (26e). See also
Lindahl (
2017) for a review of Swedish extraction examples that cannot be covered by
McCawley’s (
1981) proposal. In other words, the preference for an existential matrix verb in MSc. relative clause extractions seems to be more of a strong tendency than an absolute restriction.
This tendency can more likely be explained by a pragmatic account of islands, as suggested by
Chaves and Putnam (
2020). Chaves and Putnam propose that many island constraints traditionally assumed to be syntactic in nature, including the Complex NP Constraint responsible for relative clause islands, can be reduced to
Relevance Islands: The referent that is singled out by the extraction must be sufficiently relevant for the main action described by the utterance. Chaves and Putnam’s proposal builds on a line of other accounts that derive island effects from information-structural factors (e.g.,
Erteschik-Shir 1973;
Deane 1991;
Goldberg 2006,
2013;
Van Valin 1994,
1996,
2005). Generally, these accounts share the assumption that extraction is only felicitous if it occurs from a constituent that is in some sense prominent or relevant in the discourse. For example,
Goldberg (
2006) suggests that extraction is illicit from
backgrounded domains, since extracted phrases are typically in discourse-prominent positions, and extraction from a backgrounded domain thus causes a pragmatic clash.
Chaves and Putnam (
2020) recast the account by
Goldberg (
2006,
2013) and other related pragmatic accounts of islands in terms of the concept of
relevance: An extracted referent “must be highly relevant (e.g., part of the evoked conventionalized world knowledge) relative to the main action that the sentence describes” (
Chaves and Putnam 2020, p. 206). According to
Chaves and Putnam (
2020, p. 68), this relevance constraint can account for the difficulty of extracting from most relative clauses: Because relative clauses tend to express presupposed or backgrounded information, a referent belonging to a relative clause typically cannot be construed as sufficiently relevant for the main event. However, extraction may be acceptable if the relative clause is embedded under an existential
there is/
are or other matrix predicates that are low in semantic content, such as
know or
have, since in those cases, the embedded clause can be deemed to be more informative than the matrix clause, and drawing attention to a referent from it by extraction thus does not violate the pragmatic relevance principle (
Chaves and Putnam 2020, p. 68). Indeed, this explanation seems to accommodate not just the cases of relative clause extraction under
there is/
are, but also the examples involving other matrix predicates that we found: All other matrix verbs attested in our extraction examples (
kende ‘know’,
møde ‘meet’,
have ‘have’,
blive ‘become’, and
finde ‘find’) are semantically rather abstract and are thus compatible with the embedded relative clause expressing the main assertion in the utterance.
As a way to identify relevant or prominent constituents,
Chaves and Putnam (
2020) and
Goldberg (
2006,
2013) both suggest that the distinction between discourse-prominent (relevant) and backgrounded content aligns with the distinction between asserted and presupposed content, such that asserted information tends to correspond to the main action (or in Goldberg’s terms, the
potential focus domain of a sentence), whereas presupposed clauses are backgrounded (
Chaves and Putnam 2020, pp. 71, 208;
Goldberg 2006, p. 130). Assertions in turn can be identified by testing whether a proposition can be negated by sentential negation. This negation test correctly predicts that existential/presentational relative clauses as well as predicative relative clauses (which together represent the bulk of relative clauses involved in our examples) should allow extraction, as they express assertions and can thus successfully be negated by negating the matrix (see
Lindahl 2017, p. 160; Kush et al. 2021, p. 38). However, as
Lindahl (
2017, pp. 160–61) points out, the negation test runs into difficulties with cleft structures, which appear to allow extraction in the MSc. languages, but are incorrectly identified as islands by the negation test, since they are presupposed (see also
Kush et al. 2021, p. 38). While extraction from cleft relative clauses was rare in our material, we did find 8 instances of extraction from a cleft clause in the Danish corpus that would be left unaccounted for under the proposal that islands correspond to presupposed clauses. Consider, for instance, example (31a), which involves extraction from a cleft clause. As (31b) shows, it is not possible to negate the proposition expressed in the cleft relative clause by negating the matrix of the non-extracted version of this sentence.
31. | a. | [Dem]i | er | det | primært | centrum-venstre-partierne, | [der | har | svaret | på __i]. | |
| | them | is | it | primarily | center-left-parties.the | that | have | answered | to | |
| | ‘It is primarily the center-left parties that have answered them.’ | |
| | [KorpusDK 2021] |
| b. | Det | er | ikke | primært | centrum-venstre-partierne, | [der | har | svaret | på | dem]. |
| | it | is | not | primarily | center-left-parties.the | that | have | answered | to | them |
| | ‘It is not primarily the center-left parties that have answered them.’ |
| | → Someone has replied to them. |
However,
Kush et al. (
2021) point out that another related information-structural factor seems to make the right cut between the types of relative clauses that allow for extraction and those that do not, viz. whether or not the clause in question conveys new information: “The [relative clauses] that allow movement are those that contribute wholly or partially new information to the discourse (or at least information that need not be known to the hearer)” (
Kush et al. 2021, p. 39). This proposal can account for the possibility of extracting from cleft structures, since cleft clauses (despite being presupposed) may convey new information. For example, it is possible to utter the sentence in (31a) even if the information provided in the cleft clause (that someone has answered them) is new in the discourse.
Kush et al. (
2021) further suggest that clauses can be said to provide new information when they contain the sentence’s
main point of utterance (
MPU, Simons 2007). Since the matrix predicate in (31a) and in other cleft sentences is almost void of semantic content, the embedded relative clause necessarily constitutes the MPU in cleft sentences like (31a). An account of transparent relative clauses in terms of new information or MPU can arguably also cover our extraction examples where the matrix predicate is not
være ‘be’. As pointed out above, the other matrix verbs found in our extraction examples (e.g.,
kende ‘know’,
have ‘have’, and
blive ‘become’) are also low in semantic content and are thus compatible with the relative clause constituting the MPU. Finally, this proposal also has the potential to account for some of the Swedish examples of relative clause extraction that
Lindahl (
2017) has shown to be problematic for an account in terms of backgrounded or presupposed constituents, viz. examples with
beundra ‘admire’ or
störa sig på ‘be annoyed by’ as matrix predicates (see 6a–b). As
Lindahl (
2017, p. 161) shows, the clauses embedded under these verbs fail the negation test and are therefore considered to be backgrounded, yet they seem to allow extraction. However, it seems possible that clauses in the complement of these verbs still may constitute a sentence’s MPU: According to
Simons (
2007), an embedded clause can be the MPU if the matrix predicate conveys the speaker’s emotional orientation towards the information in the embedded clause, as the matrix verb is considered parenthetical in that case. This seems to fit the function of the matrix verbs in the Swedish examples mentioned above. Even though these examples are quite different from the ones discussed in
Simons (
2007), the predicates ‘admiring’ and ‘being annoyed’ can also be argued to express an emotional orientation towards the content of the relative clause.
Although we acknowledge that these remarks are of a very preliminary nature and require further investigation, we tentatively suggest (along with Kush et al. 2021) that a pragmatic account in terms of new information or MPU could make the relevant distinction between relative clauses that allow for extraction and those that do not (to the extent that a language allows extraction from relative clauses at all). However, more work is required to flesh out how exactly
Simons’ (
2007) proposal could be adapted to relative clauses.
Unlike in adjunct clause extraction, the extraction dependency that was by far the most frequent in our sample of Danish relative clause extractions was topicalization, with relativization in second place. However, as in the adjunct clause study, none of the extraction instances we found in the relative clause study involved
wh-extraction. This finding thus further strengthens the conjecture that topicalization and relativization from islands appear to be easier than
wh-extraction. Our results regarding the distribution of different extraction dependencies in relative clause extraction match
Lindahl’s (
2017, p. 47) finding that topicalization was by far the most common extraction dependency among her sample of Swedish relative clause extractions, whereas relativization occurred only in a few cases and
wh-extraction from relative clauses remained unattested.
Finally, our finding that not just arguments, but also adverbial or PP adjuncts are extracted from relative clauses in our sample (see examples 27 and 28) demonstrates that relative clause extraction in Danish apparently is not restricted to argument DPs, but that phrases of different categories can be topicalized (see
Lindahl 2017 for a similar observation for Swedish relative clauses). If MSc. relative clauses indeed allow for extraction of (some) adjuncts, as these observations suggest, this would distinguish MSc. relative clauses not just from strong islands, but also from traditional weak islands that generally permit extraction of arguments, but not adjuncts (
Huang 1982;
Szabolcsi 2006).
As for English, the extraction instances that we found on Google and GloWbE seem to support the anecdotal evidence that naturally produced cases of extraction from English relative clauses do occasionally appear in authentic language use. However, the low frequency of the results prevents a clear conclusion as to whether relative clause extraction in English is a phenomenon that extends beyond constructed examples and natural cases that are found only sporadically. It should, however, also be mentioned that the frequency of relative clause extractions cannot be directly compared between our Danish and English studies, since the search methods we could employ for the English corpus study were somewhat restricted in comparison to the Danish study, in part because of limits on the length and generality of the search strings that can be used to query BNC, COCA and GloWbE, and in part due to the polysemy of English that. We were thus not able to conduct a search for English relative clause extractions that was as systematic and extensive as the Danish part of this study.
It is, however, remarkable that no cases of extraction from relative clauses in existential environments were found in English, given that a large part of our search methods specifically targeted such constructions, and that extraction from existential relative clauses was so common in the Danish material. Instead, the majority of the examples we found involved VP-extraction from a relative clause embedded under know and headed by anyone or anybody. In light of the low number of cases that we found in English, we refrain from any further analysis and conclusions regarding potential patterns or trends in English relative clause extraction.