Monday, September 10, 2007

I love the BNC

I was looking through the British National Corpus to try to find instances of constructions like The couch needs a cleaning. The last result of my most recent search:

No erm but I 'm sorry but whoever did that needs a fucking good kick in the head you know .

Corpus linguistics is where it's at.
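For the curious, the search itself was nothing fancy. Here's a minimal sketch of the idea in Python, assuming a hypothetical one-sentence-per-line plain-text export of the corpus (the real BNC ships as XML and has its own query tools; this is just the shape of the search):

```python
import re

# Hypothetical path: assumes a one-sentence-per-line plain-text export.
BNC_PATH = "bnc_sentences.txt"

# "The couch needs a cleaning": need(s) + a(n) + optional modifiers + V-ing.
# Loose on purpose; it happily catches "needs a fucking good kick" too,
# which is how hits like the one above turn up.
pattern = re.compile(r"\bneeds? an? (?:\w+ ){0,3}\w+ing\b", re.IGNORECASE)

with open(BNC_PATH, encoding="utf-8") as corpus:
    for sentence in corpus:
        if pattern.search(sentence):
            print(sentence.strip())
```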

Thursday, August 16, 2007

Commercials Annoy Me

I don't want to reveal too much of my personal life, of course, but I have to admit that from watching two-hour blocks of Daily Show/Colbert Report/Scrubs re-runs each weekday, I've seen this commercial for Astrive student loans somewhere on the order of twenty times. (Somewhat less than the Ditech commercial that admonishes me with "people are smart", but somewhat more than the Best Buy commercial where the dad hides his daughter's backpack to prevent her from going to college.)

One non-linguistic thing that bothers me about the commercial first: one of its claims is that an Astrive loan is better than borrowing from a "high-interest credit card". Nothing like informing us that your offer is better than the worst possible solution. Might as well say "better than paying for college by running small jobs for the Mob". Or "eating our hamburgers is more nutritious than subsisting on Crisco."

Returning to the linguistic point I wanted to make originally, the friendly narrator who keeps on talking down to me says at one point that college costs "major dollars... GRANDE dollars." This seems weird in a few ways:

1. It's highly nonstandard to use major to modify a plural noun.
2. It's highly nonstandard to use grande to modify a plural noun.
3. There is a standardized Spanish borrowing into English with the same meaning as grande dollars: mucho dinero.

So it's sort of a neologism, one built from two nonstandard modifiers when a perfectly good borrowing already exists.

Tuesday, August 14, 2007

A Cyclical Progression

As I was walking around yesterday, randomly taking pictures of things in the background with other things out-of-focus in the foreground, I started thinking about whether I am approaching linguistics correctly.

Early linguists did descriptive linguistics, and the whole field up to the Chomskyian revolution was, by and large, a bunch of people pointing out different neat language anomalies to each other and saying "Well, isn't that neat?", without any major theoretical framework emerging. Kind of, in my opinion, a waste.

Then along comes Chomsky to introduce some rigor to the field, and it worked. Suddenly people were combining grammar, logic, math, computer science, set theory, (a teensy bit of) psychology and cognitive science, and a bunch of other jazz together and actually getting a pretty nice little theoretical framework out. A lot of the success of this revolution came from abstracting away from language and reducing all of the beautiful neat idiosyncrasies of language to categories, rules, and various cleanly-defined abstract concepts. It worked.

But not perfectly. The problem is that language isn't quite the same as logic. Our linguistic theories work really well on these abstractions, but the problem is that these abstractions don't really translate back into real, observed language so well. Take, for instance, the abstract category verb. There are tons of things that are sort of verbs, like passive participles, gerunds, nominalizations, etc., that vary in how verb-like they are from language to language. Likewise, as my current attempt to label corpus subjects as singular/plural/mass nouns is showing me, there're some grey areas even in abstractions that aren't all that abstract (it's usually pretty clear whether there is one or more of something, but for abstract and mass nouns, it can be unclear whether something is countable). This is the sort of thing that has been shunted off for years with the old refrain "We'll let pragmatics take care of that."
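As an aside on that labelling task: here's a minimal sketch of the kind of heuristic I mean, assuming Penn-Treebank-style POS tags on each subject's head noun. The toy mass-noun list is a stand-in for a real lexical resource, and it's exactly where the grey areas show up:

```python
# Toy mass-noun list: a stand-in for a real lexical resource.
KNOWN_MASS = {"information", "furniture", "advice", "rice", "music"}

def label_subject(head_noun, pos_tag):
    """Label a subject's head noun as 'plural', 'mass', or 'singular',
    given a Penn-Treebank-style POS tag."""
    if pos_tag in ("NNS", "NNPS"):
        return "plural"        # morphologically plural: usually the easy case
    if head_noun.lower() in KNOWN_MASS:
        return "mass"          # only as good as the list
    if pos_tag in ("NN", "NNP"):
        return "singular"      # default assumption for the rest
    return "unclear"

print(label_subject("couches", "NNS"))  # plural
print(label_subject("advice", "NN"))    # mass
print(label_subject("honesty", "NN"))   # comes out 'singular', but is it countable?
```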

But pragmatics has not taken care of these problems, which is why a lot of linguists are switching over to what is, in some ways, a less abstract approach to linguistics. I am in this camp, but the question that bugged me as I was walking yesterday was whether this is justified. Basically, we're turning back toward descriptive linguistics. We're not going all the way back there, but at the same time (and perhaps with a twinge of guilt in my math-major heart), I worry that we shouldn't go back toward descriptivism at all.

I think the loss of abstraction is justified, for two reasons: 1) the lack of progress in connecting real language usage, the sort that humans produce so effortlessly, to abstractions that are becoming increasingly tenuous and complex, and 2) the fact that we now have the computational tools to make something of consequence out of a more descriptivist, less abstract approach. We can say with confidence that animate subjects prefer certain constructions, or that longer subjects favor others. I think that even if we ended up back at truly descriptive linguistics, we'd still be way ahead of the game by being able to state statistically significant tendencies and such. At worst, we'd pave a better road for a new Chomskyian revolution.
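To make that concrete, here's a minimal sketch of what such a tendency claim looks like in practice. The counts and the construction labels are invented for illustration, not real corpus data:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of two competing constructions by subject animacy.
#               construction A   construction B
counts = [[310, 190],   # animate subjects
          [120, 280]]   # inanimate subjects

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
# A tiny p-value is what licenses "animate subjects prefer construction A"
# as a statistically significant tendency rather than an anecdote.
```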

I feel much better now.

Friday, August 3, 2007

Speech v. Writing

Kate's post last week got me thinking about a lot of stuff, and coupled with part of a book I'm reading on self-organizing systems, I think there're some other relevant divisions in the goals of linguistics that need to be addressed. One that's gnawing at me is the distinction between spoken and written language. I don't think that there's a qualitative distinction in the underlying theory of how people construct sentences in the two modalities. But something's going on.

For instance, it's commonly agreed that spoken English is not always grammatical. People seeing transcripts often report that they surely did not say what was transcribed. And as any corpus linguist will tell you, spoken corpora are full of ungrammatical sentences. But what's interesting is that the spoken stuff seems to be locally coherent.

So here's my thought. Written stuff, thanks to the ability to see clearly what preceded the current point in the sentence, is based on global information. Spoken stuff, on the other hand, is based on what you can recall in a complicated setting where you're trying to formulate a novel thought in a stimulating environment with a reactive audience. In such situations, you should expect to have imperfect recall even of what specific words were at the start of your sentence. Rather, you could just remember the gist of what was said before and the last few spoken words, and assume that this is what your listener is doing as well. In that case, you can build the rest of your sentence based on local coherence with the recent words and the general sentence gist.

If that's how speaking and writing work, then it looks like we need different models for the grammars of the two modalities - one with rules/constraints that depend on pure global information, and the other with rules/constraints that depend almost solely on local information. This doesn't imply separate grammar types for written and spoken language, but rather a different set of constraints (or perhaps a different ranking of the same constraints, if you're particularly enamoured of OT). Alternatively, it may be that written English is subject to grammaticality judgments, and spoken English is subject to acceptability judgments, and that we're really honestly using different measures.
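Here's a minimal sketch of the distinction I have in mind. The scoring function is a hypothetical stand-in for whatever grammar or constraint system you prefer; the only thing that changes between the "written" and "spoken" modes is how much left context it gets to see:

```python
def score_sentence(words, score_word, window=None):
    """window=None: full prefix visible (the written, 'global' case).
    window=k: only the last k words, plus a crude gist (the spoken case)."""
    total = 0.0
    gist = set()  # crude stand-in for "the gist of what was said before"
    for i, word in enumerate(words):
        if window is None:
            context = words[:i]                    # everything so far
        else:
            context = words[max(0, i - window):i]  # just the recent words
        total += score_word(word, context, gist)
        gist.add(word.lower())                     # gist grows; exact order is lost
    return total

# Usage with a trivial scorer that only checks local word-pair plausibility:
plausible_pairs = {("the", "couch"), ("couch", "needs"), ("needs", "a")}
def toy_scorer(word, context, gist):
    return 1.0 if not context or (context[-1], word) in plausible_pairs else 0.0

print(score_sentence("the couch needs a cleaning".split(), toy_scorer, window=2))
```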

I don't know if this is totally the right direction, but I think the time will come (if it's not already here) when we need to address the differences in grammaticality judgments in written and spoken language.

Thursday, August 2, 2007

if you're ever asked for the difference between "DRT" and "File Change Semantics"...

A quote from David Beaver I found in my class notes from the LSA:

"Hmm, representing discourse. Well, Gilles Fauconnier's theory of discourse representation, called "Mental Spaces", uses circles with lines connecting them, like this. [draws circles]

Hans Kamp's theory of discourse representation, called "Discourse Representation Theory" (DRT), uses rectangles inside of other rectangles. [draws rectangles]

Irene Heim's theory of discourse representation, called "File Change Semantics," uses skinnier rectangles than in DRT. [draws skinnier rectangles]

Those are basically the differences, except mental spaces doesn't have a model theoretic interpretation, so forget that. Ok, back to the class material..."

Wednesday, July 25, 2007

Formal Model vs. Human Mind

Yikes - I'm a bit overwhelmed with all the thinking going on in my head this week! In a discussion tonight I argued rather strongly for a position I had never clearly understood before: that linguists should very clearly divide themselves into two camps, computer scientists and cognitive scientists. What I mean by this is that I think there are two possible goals for linguistics, and we shouldn't get them confused:

1) create a formal model/grammar of language: the more accurately it predicts what humans actually do, the better
2) understand what is going on in our mind when we use language

It seems to me that confusing these leads to syntacticians making theories of traces, which work (goal 1), and then predicting that traces will lead to increased processing time (goal 2), which turns out not to be true. Or in another case of confusion, Montague grammar and lambdas are nice model-theoretic tools for talking about meaning, but no one wants to say that this is what goes on in our heads... bad things happen when we start trying to say that it is.
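To see what I mean about lambdas as model-theoretic tools, here's a toy sketch in the Montague spirit. The two-entity model and the mini-lexicon are invented for illustration, and none of this is a claim about what's in anyone's head; the point is just that composition is function application:

```python
# A two-entity model, invented for illustration.
domain = {"kim", "sandy"}
admires_pairs = {("kim", "sandy")}

# Meanings as curried functions; composition is function application.
admires  = lambda y: lambda x: (x, y) in admires_pairs   # type <e,<e,t>>
everyone = lambda p: all(p(x) for x in domain)           # type <<e,t>,t>
someone  = lambda p: any(p(x) for x in domain)           # type <<e,t>,t>

print(admires("sandy")("kim"))        # "Kim admires Sandy" -> True
print(everyone(admires("sandy")))     # "Everyone admires Sandy" -> False
print(someone(admires("sandy")))      # "Someone admires Sandy" -> True
```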

Anyway, I think most people pursue either goal 1 or goal 2, but aren't always clear about it. This makes it hard for them and their readers. From my one conversation with Roger L., he gave me the impression that computational psycholinguists try to answer questions of goal 2 using tools developed for goal 1... is this a sensible way to interpret the sub-field? I think interactions in general between the two camps are good, as long as people respect what I see as two very distinct goals. I'd really love to know what you all think about this, especially because our department seems to have people at the far ends of each of these camps... at least it seems so to me.

What the hell is grammar anyway?

I feel like a real academic today, as I just now indirectly (with Roger) won a bet over whether a sentence like "He needs showing the way" could be acceptable. The answer is that, at least according to Google, it can in British English:

kewell needs showing the exit door


as a resource for someone who's really getting to grips with CSS and needs 'showing the light' then this is an ideal purchase.

But this raises an important point, one that keeps coming up these days: what does it mean to say "This sentence isn't ungrammatical; I found it on Google"?

(1) Discussion of Columbus, his men and the food the ate.
(2) i end up with this undertaking, floured thoroughly the cloth made the little pre done seed blocks.

First off, we obviously can't just make this claim without a little bit of analysis. (1) is a sentence I found on Google, from an academic site no less. More relevant to this blog, perhaps, is sentence (2), generated by the TPS in an earlier post. Both of these could be found on Google, though neither would be considered grammatical. So in making the "found online = grammatical" claim, one must obviously impose some sort of sanity check on the data to make sure it is not a typo (as I presume the first sentence is) or a blog-making robot's sentence (as I am almost certain the second sentence is). So let's set a ground rule:

An internet example is valid if an unbiased native speaker (or better, a few of them) does not consider the sentence ungrammatical.

This is sort of a combination of Labov's (1975) Consensus Principle (unless you have reason to think otherwise, assume one native speaker agrees with all the rest) and Experimenter Principle (in unclear cases, trust the judgment of someone unfamiliar with the theory over someone familiar with the theory).

Those sentences would definitely not be accepted by unbiased native speakers. So with this principle in hand, we don't have to worry about absurd sentences being argued for on the strength of their appearance in Google. (This is, as far as I can tell, the gist of Joan Bresnan's response to Ivan Sag's comment that teh appears millions of times on the Web but should not be considered a word of English.)

Finding something on Google, then, is not evidence in and of itself of the grammaticality or ungrammaticality of a sentence. Rather, a Google search points out instances of a construction or sentence that may be valid. The Google search, like any other corpus search, just gives us a direction to go. If one of the found sentences is clearly grammatical, then it answers our question. If there is no clear valid example of a construction/sentence, then the question remains open.
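Put together, the workflow might look like this minimal sketch. The judgments would come from real informants; here they're just hypothetical booleans, with True meaning "this speaker does not consider it ungrammatical":

```python
def is_valid_example(judgments, min_judges=3):
    """Following the ground rule (plus the Consensus and Experimenter
    principles): require a few unbiased native speakers, none of whom
    rejects the sentence, before a web hit can settle anything."""
    return len(judgments) >= min_judges and all(judgments)

print(is_valid_example([True, True, True]))    # a usable example
print(is_valid_example([True, False, True]))   # stays an open question
```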

Does this seem like a reasonable framework to employ for online searches? I feel like this isn't controversial, but at the same time I suspect we could push further on what a Google search lets us accept as "grammatical". And what do you think is a good meaning for "grammatical"? I'm having an awfully hard time formulating a definition and would be interested in what you guys think.