Wednesday, July 25, 2007

Formal Model vs. Human Mind

Yikes- I'm a bit overwhelmed with all the thinking in my mind this week! I argued rather strongly in a discussion tonight a position I had never clearly understood before: that linguists should very clearly divide themselves into two camps, computer scientists and cognitive scientists. What I mean by this is that I think there are two possible goals for linguistics, and we shouldn't get them confused:

1) create a formal model/grammar of language: the more accurately it predicts what humans actually do, the better
2) understand what is going on in our mind when we use language

It seems to me that confusing these leads to syntacticians making theories of traces, which works (goal 1), and then predicting that this will lead to increased processing time (goal 2), which is not true. Or in another case of confusion, Montague grammar and lambdas are nice model-theoretic tools for talking about meaning, but no one wants to say that this is what goes on in our head... bad things happen when we start trying to say that it is.

Anyway, I think most people pursue either goal 1 or goal 2, but aren't always clear about it. This makes it hard for them and their readers. From my one conversation with Roger L., he gave me the impression that computational psycholinguists try to answer questions of goal 2 using tools developed for goal 1... is this a sensible way to interpret the sub-field? I think interactions in general between the two camps are good, as long as people respect what I see as two very distinct goals. I'd really love to know what you all think about this, especially because our department seems to have people at the far ends of each of these camps... at least it seems so to me.

What the hell is grammar anyway?

I feel like a real academic today, as I just now indirectly (with Roger) won a bet over whether a sentence like "He needs showing the way" could be acceptable. The answer is that, at least according to Google, it can in British English:

kewell needs showing the exit door


as a resource for someone who's really getting to grips with CSS and needs 'showing the light' then this is an ideal purchase.

But this brings up an important point that is incessantly being brought up anymore: what does it mean to say "This sentence isn't ungrammatical; I found it on Google"?

(1) Discussion of Columbus, his men and the food the ate.
(2) i end up with this undertaking, floured thoroughly the cloth made the little pre done seed blocks.

First off, we obviously can't just make this claim without a little bit of analysis. (1) is a sentence I found on Google, from an academic site no less. More relevant to this blog, perhaps, is sentence (2), generated by the TPS in an earlier post. Both of these could be found on Google, though neither would be considered grammatical. So in making the "found online = grammatical" claim, one must obviously impose some sort of sanity check on the data to make sure it is not a typo (as I presume the first sentence is) or a blog-making robot's sentence (as I am almost certain the second sentence is). So let's set a ground rule:

An internet example is valid if an unbiased native speaker (or better, a few) does not consider the sentence ungrammatical.

This is sort of a combination of Labov's (1975) Consensus Principle (unless you have reason to think otherwise, assume one native speaker agrees with all the rest) and Experimenter Principle (in unclear cases, trust the judgment of someone unfamiliar with the theory over someone familiar with the theory).

Those sentences would definitely not be accepted by unbiased native speakers. So with this principle in hand, we don't have to worry about absurd sentences being argued for on the strength of their appearances in Google. (This is, as far as I can tell, the gist of Joan Bresnan's response to Ivan Sag's comment that "teh" appears millions of times on the Web but should not be considered a word of English.)

Finding something on Google, then, is not evidence in and of itself of the grammaticality or ungrammaticality of a sentence. Rather, a Google search points out instances of a construction or sentence that may be valid. The Google search, like any other corpus search, just gives us a direction to go. If one of the found sentences is clearly grammatical, then it answers our question. If there is no clear valid example of a construction/sentence, then the question remains open.

Does this seem like a reasonable framework to employ for online searches? I feel like this isn't controversial, but at the same time I think it's not as strong as we could go on what can be accepted as "grammatical" from a Google search. And what do you think is a good meaning for "grammatical"? I'm having an awfully hard time formulating a definition and would be interested in what you guys think.

Tuesday, July 24, 2007

notes from the lsa

Gee Gabe, I think your automated blogger has set the bar on this site a little too high. I know my life was changed the moment I heard that "the next vanguard of waiters sashayed towards him with the sauce over the world." Wow, I think I just got goosebumps there.

Anyway, my organic-matter mind will still attempt to contribute a list (just a list for now, hopefully expansions on these will follow) of a few things I've discovered this last month that on some level blew my mind:

-pragmatic intrusion: "it's better to drive home and drink three beers than to drink three beers and drive home"... truth-conditionally, this sentence is contradictory, but the implicatures which allow this to make sense are below the level of a sentence. people at the institute have ideas about how to compute below-sentence-level implicatures, but there is no agreement on solutions

-"illogical negation" constructions like the following:
-he could care less/ he couldn't care less
-that'll teach you to swim/ that'll teach you not to swim
-the box is packed / the box is unpacked
-don't fail to miss the sign / don't miss the sign
(these pairs are all synonymous)
I've found that some of these do odd things when embedded under attitude predicates... some of them actually become logical! this suggests there is more going on than just problems computing the number of negatives

-the question of whether, if we construct a probabilistic grammar, there is a difference between a very low probability sentence, and sentences at the limit point (i.e. completely ungrammatical)... more crazy to me, is how in the world could we test this empirically?

-RELEVANCE... entire theories of semantics and pragmatics (and syntax and phonology...) depend on humans being able to figure out what is "relevant" for effective communication. Shouldn't we be finding out more from psychologists and cognitive scientists about how people actually might accomplish this feat? Otherwise, at least many talks I've seen here go down the drain without a good understanding of how we might determine relevance. Or even what it means for something to be "relevant."

-David Beaver is doing some really neat stuff on the semantics/phonology interface, with discourse particles, focus, and prosody. I hope to post more about this later.

-the F-word... I took a class with Chris Potts about other "dimensions" of meaning. One of those is the expressive realm, which includes all your favorite curses. Note how "fucking" doesn't contribute to the truth conditions of a sentence OR the assertive content:
-A: "I saw your fuckin' dog in the park."
-B: #"No, he's very sweet."
You can't refute expressive content. We basically concluded in this class that expressives have a quality of "speech act" about them, like "promise." Once you say "I promise to clean up" an action takes place just in the speech act. Seems to be the same with expressives. But there are a lot of interesting things about them that I forget at the moment but hopefully I can expand upon later... if the autoblogger doesn't beat me to a brilliant post about them first :-)
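Going back to the probabilistic-grammar item in the list above: here is a toy sketch of why "very low probability" and "probability zero" can come apart or collapse depending on a modeling choice. Everything in it is an assumption made for illustration (the two-sentence corpus, the bigram model, the function names, add-alpha smoothing); it is not a claim about how anyone at the institute builds their grammars.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count bigrams and their left contexts in a toy corpus (hypothetical helper)."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
        vocab.update(words[1:])                # things that can be predicted
    return contexts, bigrams, vocab

def sentence_prob(sentence, contexts, bigrams, vocab, alpha=0.0):
    """Bigram probability of a sentence; alpha > 0 turns on add-alpha smoothing."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        denom = contexts[prev] + alpha * len(vocab)
        if denom == 0:
            return 0.0  # unseen context and no smoothing
        p *= (bigrams[(prev, cur)] + alpha) / denom
    return p

corpus = ["he needs the exit", "he needs help"]
contexts, bigrams, vocab = train_bigrams(corpus)

attested = "he needs help"
scrambled = "help needs he"
raw_attested = sentence_prob(attested, contexts, bigrams, vocab)
raw_scrambled = sentence_prob(scrambled, contexts, bigrams, vocab)
smooth_attested = sentence_prob(attested, contexts, bigrams, vocab, alpha=1.0)
smooth_scrambled = sentence_prob(scrambled, contexts, bigrams, vocab, alpha=1.0)
```

Without smoothing, the scrambled sentence sits exactly at the limit point (probability zero). With smoothing, it only gets a very small probability, and nothing is ever at the limit. So in a model like this, whether the two categories differ is decided by the modeler, which is part of why the empirical question seems so hard.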

Friday, July 20, 2007

Was the TPS nurtured or natured?

So I figured it would be a while before the TPS came up with a new post, but this morning it did some research on the nature-versus-nurture debate (by which I mean it spent about five minutes reading through 200 blog posts on the subject), and was positively bursting at the seams to unleash its (i.e., the blogosphere's) opinions upon the world:

methamphetamine, or "meth," as it is important and can teach genetics. (there are always tomatoes to dice). when the truth unless i could share here with all these years of evolution in his beliefs was, he said, "'thank god,' and it crystallized into gay identity."in a worldnetdaily.com posting accompanying his essay, glatze was only when a friend who was mentioned in the popular online 3d virtual environment of second life. he has also posted a more active one.so opens another chapter in my hearta man alonein search of some of the government school establishment lies - they lie about anything that they need to wonder when the night was a wunder product designed to satisfy each nation's concerns and put u.s.-russian cooperation on a whole new place. end-post

Now I just have to convince the TPS to research psycholinguistics, and I'll never have to write a paper myself again.

Thursday, July 19, 2007

On Asparagus

I'm trying to automate the internet. Think of how much time I can save in a day if I just convince my computer to do all of the online stuff I normally do. Respond to emails. Check sports scores. Get lost on Wikipedia determining what countries have non-rectangular flags.

Well, I'm one step closer to automating myself. I've created a Python script that can create blog entries on my behalf with minimal effort of my own. Let's say, for instance, that I want to write about asparagus but don't have the time to bother with research, or don't actually have any asparagus opinions. In the olden days, I'd have had to call it a day, and the world would miss out on one more rambling post about asparagus. No longer is this the case!

Now the Trigram Post Simulator (TPS) reports what other bloggers have already thought about asparagus in the form of a simple, muddled post. Observe:

it’s been a prize for the peaches, apricots, mangoes or whatever your favorite chutney recipe calls for. you can see asparagus spears lurking in there. i have to admit, it's the main elements of caesar, and i get strawberries in huge quantities, i end up with this undertaking, floured thoroughly the cloth made the little pre done seed blocks. i do not have a blog in another language i would make no sound at all. elul is the quirky side of popcorn. the lesson: one gets what they would have taken both. they look incredibly good!i definitely couldn't share a salad nicoise the other half of the exeter book riddles clearly belong to the image of weaving cloth, and the garlic and the carnival of the rectangle, 2.5cm in from the folks at klamath fallsjuly 07, 2007this letter was written by charles grennel and his developments in kansas city."it'll make documentarists' hard academic work so much more helpful to city planners!this is a really tasty alternative to bubble and leek is a lot less sauce. the next vanguard of waiters sashayed towards him with the sauce over the world. they can accept the fact that i can get, although i suspect includes most readers of this by adding ground almonds and sugar into a lot better if it'd taken excluded peoples' ideas of "good plan" or "bad plan" one could make my grandmother's pasty recipe and the capacity for violence, and a white wine and reduce. add cream and reduce by half.5. add crab claws and boil 6 minutes. drain crab and according to package directions. steam asparagus for about 15 minutes)6. (if you are planning to visit eckert's, so you have lying around - overripe bananas aren't seasonal, but sometimes they are both intense and light. obviously, they match

Sure it's imperfect, but it's only at version 1.0. By version 3.0 no one will even want to talk to me; they'll just head straight for the TPS. And it's really easy - I just tell TPS what I want to post about, and it goes and finds 200 posts about it. Then it creates a trigram model of the posts it found and uses that model to construct a novel post of its own. How much time I'm saving! And it's no less coherent than most of my posts!
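For the curious, the trigram step can be sketched roughly like this. This is a minimal sketch, not the actual TPS code: the function names, the whitespace tokenization, the lowercasing, and the random starting pair are all assumptions made for illustration, and the part that goes and finds the 200 posts is left out entirely.

```python
import random
from collections import defaultdict

def build_trigram_model(posts):
    """Map each pair of consecutive words to the words seen right after it."""
    model = defaultdict(list)
    for post in posts:
        words = post.lower().split()  # assumed tokenization: split on whitespace
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            model[(w1, w2)].append(w3)
    return model

def generate_post(model, length=50, seed=None):
    """Walk the model: repeatedly pick a word that followed the last two words."""
    if not model:
        return ""
    rng = random.Random(seed)
    w1, w2 = rng.choice(sorted(model))  # assumed: start from a random seen pair
    output = [w1, w2]
    while len(output) < length:
        followers = model.get((w1, w2))
        if not followers:
            break  # dead end: this pair only ever ended a post
        w3 = rng.choice(followers)
        output.append(w3)
        w1, w2 = w2, w3
    return " ".join(output)

# Two made-up "posts" standing in for the 200 real ones.
posts = [
    "steam the asparagus for about five minutes",
    "steam the asparagus and add cream",
]
model = build_trigram_model(posts)
new_post = generate_post(model, length=10, seed=0)
```

Because followers are stored with repeats, a word that followed a pair more often is picked more often, which is all a trigram model really is. Every three-word window in the output appeared somewhere in the source posts, so the result is locally plausible and globally muddled.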

Tuesday, July 17, 2007

Intermediate Cases

An epigraph:
"We may assume for this discussion that certain sequences of phonemes are definitely sentences, and that certain other sequences are definitely non-sentences. In many intermediate cases we shall be prepared to let the grammar itself decide, when the grammar is set up in the simplest way so that it includes the clear sentences and excludes the clear non-sentences." -Chomsky 1957, p. 14.

The idea:
The grammaticality of some sentences can't be determined easily by humans. So we make our grammar without considering these sentences. Then, once we have our grammar, we can look at these sentences to determine their grammaticality.

Some thoughts:
So this seems initially reasonable; people do this all the time. For instance, let's say you see someone who you can't tell the gender of. There is a theory that says there are certain body parts that men have and others that women have. This theory was built up based on observations from certainly-males and certainly-females. So we go to this androgynous person and see what body parts they have to determine their gender. If we want to be even more scientific about it, we could do genetic testing, look at their chromosomes, and have our answer.

Of course, nothing in life is easy like this - even in genetic testing there are border cases. For instance, there's Klinefelter's Syndrome, where a person has two X chromosomes and one Y chromosome. Categoricality just isn't there - there're always intermediate cases. And so it's almost certainly not right to say there are grammatical and ungrammatical sentences and nothing in between. But let's just suppose that these strict categories exist for now.

The bigger problem is that grammar isn't separate from us. There is no such thing as a "right" grammar of English. (To the dismay of many old fuddy-duddies.) There is no oracle that we can go to and ask "Is this sentence grammatical?" What we call the grammar of a language is really the consensus of the idiolects of its speakers. The grammar of a language is what we say, not what we think we say, nor what we think we should say, nor what a theoretical construct says we should say. A cornerstone of modern linguistics is that our grammars are descriptive, not prescriptive. A descriptive grammar predicts what we think about sentences; it can't tell us what to think. If there is an intermediate case, a good grammar must predict that.

Returning to the gender example, asking a grammar to determine the grammaticality of an intermediately-grammatical sentence is akin to asking a theory to determine whether a person suffering from Klinefelter's Syndrome is a man or woman, when the person isn't entirely in either category. Or like saying that physics should be based on waves and particles and thus should be able to settle once and for all which one light is. But, it's both. It's intermediate. And we have to build theories around that. Likewise, we as linguists have to make our theories fit the intermediacy of some sentences, rather than asking the theories to tell us which way those sentences go.

I think that's what's problematic with Chomsky's idea, which is sort of ingrained in us (me, at least): our theories can't tell the data what to do.