Monday, September 10, 2007

I love the BNC

I was looking through the British National Corpus to try to find instances of constructions like The couch needs a cleaning. The last result of my most recent search:

No erm but I 'm sorry but whoever did that needs a fucking good kick in the head you know .

Corpus linguistics is where it's at.

Thursday, August 16, 2007

Commercials Annoy Me

I don't want to reveal too much of my personal life, of course, but I have to admit that from watching two-hour blocks of Daily Show/Colbert Report/Scrubs re-runs each weekday, I've seen this commercial for Astrive student loans somewhere on the order of twenty times. (Somewhat less than the Ditech commercial that admonishes me with "people are smart", but somewhat more than the Best Buy commercial where the dad hides his daughter's backpack to prevent her from going to college.)

One non-linguistic thing that bothers me about the commercial first: one of its claims is that an Astrive loan is better than borrowing from a "high-interest credit card". Nothing like informing us that your offer is better than the worst possible solution. Might as well say "better than paying for college by running small jobs for the Mob". Or "eating our hamburgers is more nutritious than subsisting on Crisco."

Returning to the linguistic point I wanted to make originally, the friendly narrator who keeps on talking down to me says at one point that college costs "major dollars... GRANDE dollars." This seems weird in a few ways:

1. It's highly nonstandard to use major to modify a plural noun.
2. It's highly nonstandard to use grande to modify a plural noun.
3. There is a standardized Spanish borrowing into English with the same meaning as grande dollars: mucho dinero.

So it's sort of a neologism,

Tuesday, August 14, 2007

A Cyclical Progression

As I was walking around yesterday, randomly taking pictures of things in the background with other things out-of-focus in the foreground, I started thinking about whether I am approaching linguistics correctly.

Early linguists did descriptive linguistics, and the whole field up to the Chomskyian revolution was, by and large, a bunch of people pointing out different neat language anomalies to each other and saying "Well, isn't that neat?", without any major theoretical framework emerging. Kind of, in my opinion, a waste.

Then along comes Chomsky to introduce some rigor to the field, and it worked. Suddenly people were combining grammar, logic, math, computer science, set theory, (a teensy bit of) psychology and cognitive science, and a bunch of other jazz together and actually getting a pretty nice little theoretical framework out. A lot of the success of this revolution came from abstracting away from language and reducing all of the beautiful neat idiosyncrasies of language to categories, rules, and various cleanly-defined abstract concepts. It worked.

But not perfectly. The problem is that language isn't quite the same as logic. Our linguistic theories work really well on these abstractions, but the problem is that these abstractions don't really translate back into real, observed language so well. Take, for instance, the abstract category verb. There are tons of things that are sort of verbs, like passive participles, gerunds, nominalizations, etc., that vary in how verb-like they are from language to language. Likewise, as my current attempt to label corpus subjects as singular/plural/mass nouns is showing me, there're some grey areas even in abstractions that aren't all that abstract (it's usually pretty clear whether there is one or more of something, but for abstract and mass nouns, it can be unclear whether something is countable). This is the sort of thing that has been shunted off for years with the old refrain "We'll let pragmatics take care of that."

But pragmatics has not taken care of these problems, which is why a lot of linguists are switching over to what is, in some ways, a less abstract approach to linguistics. I am in this camp, but the question that bugged me as I was walking yesterday was whether this is justified. Basically, we're turning back toward descriptive linguistics. We're not going all the way back there, but at the same time (and perhaps with a twinge of guilt in my math-major heart), I worry that we shouldn't go back toward descriptivism at all.

I think the loss of abstraction is justified, for two reasons: 1) the lack of progress in connecting real language usage, the sort that humans use so effortlessly, to the abstractions that are becoming increasingly tenuous and complex, and 2) we have the computational tools to make something of consequence out of a more descriptivist, less abstract approach now. We can say with confidence that animate subjects prefer certain constructions, or longer subjects favor others. I think that even if we ended up back at truly descriptive linguistics, we'd still be way ahead of the game by being able to state statistically significant tendencies and such. At worst, we'd pave a better road for a new Chomskyian revolution.

I feel much better now.

Friday, August 3, 2007

Speech v. Writing

Kate's post last week got me thinking about a lot of stuff, and coupled with part of a book I'm reading on self-organizing systems, I think there're some other relevant divisions in the goals of linguistics that need to be addressed. One that's gnawing at me is the distinction between spoken and written language. I don't think that there's a qualitative distinction in the underlying theory of how people construct sentences in the two modalities. But something's going on.

For instance, it's commonly agreed upon that spoken English is not always grammatical. People seeing transcripts often report that they surely did not say what was transcribed. And as any corpus linguist will tell you, spoken corpora are full of ungrammatical sentences. But what's interesting is that the spoken stuff seems to be locally coherent.

So here's my thought. Written stuff, thanks to the ability to see clearly what preceded the current point in the sentence, is based on global information. Spoken stuff, on the other hand, is based on what you can recall in a complicated setting where you're trying to formulate a novel thought in a stimulating environment with a reactive audience. In such situations, you should expect to have imperfect recall even of what specific words were at the start of your sentence. Rather, you could just remember the gist of what was said before and the last few spoken words, and assume that this is what your listener is doing as well. In that case, you can build the rest of your sentence based on local coherence with the recent words and the general sentence gist.

If that's how speaking and writing work, then it looks like we need different models for the grammars of the two modalities - one with rules/constraints that depend on pure global information, and the other with rules/constraints that depend almost solely on local information. This doesn't imply separate grammar types for written and spoken language, but rather a different set of constraints (or perhaps a different ranking of the same constraints, if you're particularly enamoured of OT). Alternatively, it may be that written English is subject to grammaticality judgments, and spoken English is subject to acceptability judgments, and that we're really honestly using different measures.

I don't know if this is totally the right direction, but I think the time will come (if it's not already here) when we need to address the differences in grammaticality judgments in written and spoken language.

Thursday, August 2, 2007

if you're ever asked for the difference between "DRT" and "File Change Semantics"...

A quote from David Beaver I found in my class notes from the LSA:

"Hmm, representing discourse. Well, Gilles Fauconnier's theory of discourse representation, called "Mental Spaces", uses circles with lines connecting them, like this. [draws circles]

Hans Kamp's theory of discourse representation, called "Discourse Representation Theory" (DRT), uses rectangles inside of other rectangles. [draws rectangles]

Irene Heim's theory of discourse representation, called "File Change Semantics," uses skinnier rectangles than in DRT. [draws skinnier rectangles]

Those are basically the differences, except mental spaces doesn't have a model theoretic interpretation, so forget that. Ok, back to the class material..."

Wednesday, July 25, 2007

Formal Model vs. Human Mind

Yikes- I'm a bit overwhelmed with all the thinking in my mind this week! I argued rather strongly in a discussion tonight a position I had never clearly understood before: that linguists should very clearly divide themselves into two camps, computer scientists and cognitive scientists. What I mean by this is that I think there are two possible goals for linguistics, and we shouldn't get them confused:

1) create a formal model/grammar of language: the more accurately it predicts what humans actually do, the better
2) understand what is going on in our mind when we use language

It seems to me that confusing these leads to syntacticians making theories of traces, which works (goal 1), and then predicting this will lead to increased processing time (goal 2), which is not true. Or in another case of confusion, Montague grammar and lambdas are nice model theoretic tools for talking about meaning, but no one wants to say that this is what goes on in our head... bad things happen when we start trying to say that it is.

Anyway, I think most people pursue either goal 1 or goal 2, but aren't always clear about it. This makes it hard for them and their readers. From my one conversation with Roger L., he gave me the impression that computational psycholinguists try to answer questions of goal 2 using tools developed for goal 1... is this a sensible way to interpret the sub-field? I think interactions in general between the two camps are good, as long as people respect what I see as two very distinct goals. I'd really love to know what you all think about this, especially because our department seems to have people at the far ends of each of these camps... at least it seems so to me.

What the hell is grammar anyway?

I feel like a real academic today, as I just now indirectly (with Roger) won a bet over whether a sentence like "He needs showing the way" could be acceptable. The answer is that, at least according to Google, it can in British English:

kewell needs showing the exit door


as a resource for someone who's really getting to grips with CSS and needs 'showing the light' then this is an ideal purchase.

But this brings up an important point that is incessantly being brought up anymore: what does it mean to say "This sentence isn't ungrammatical; I found it on Google"?

(1) Discussion of Columbus, his men and the food the ate.
(2) i end up with this undertaking, floured thoroughly the cloth made the little pre done seed blocks.

First off, we obviously can't just make this claim without a little bit of analysis. (1) is a sentence I found on Google, from an academic site no less. More relevant to this blog, perhaps, is sentence (2), generated by the TPS in an earlier post. Both of these could be found on Google, though neither would be considered grammatical. So in making the "found online = grammatical" claim, one must obviously impose some sort of sanity check on the data to make sure it is not a typo (as I presume the first sentence is) or a blog-making robot's sentence (as I am almost certain the second sentence is). So let's set a ground rule:

An internet example is valid if an (or better, a few) unbiased native speaker does not consider the sentence ungrammatical.

This is sort of a combination of Labov's (1975) Consensus Principle (unless you have reason to think otherwise, assume one native speaker agrees with all the rest) and Experimenter Principle (in unclear cases, trust the judgment of someone unfamiliar with the theory over someone familiar with the theory).

Those sentences would definitely not be accepted by unbiased native speakers. So with this principle in hand, we don't have to worry about absurd sentences being argued for on the basis of appearances in Google. (This is, as far as I can tell, the gist of Joan Bresnan's response to Ivan Sag's comment that teh appears millions of times on the Web but should not be considered a word of English.)

Finding something on Google, then, is not evidence in and of itself of the grammaticality or ungrammaticality of a sentence. Rather, a Google search points out instances of a construction or sentence that may be valid. The Google search, like any other corpus search, just gives us a direction to go. If one of the found sentences is clearly grammatical, then it answers our question. If there is no clear valid example of a construction/sentence, then the question remains open.

Does this seem like a reasonable framework to employ for online searches? I feel like this isn't controversial, but at the same time I think it's not as strong as we could go on what can be accepted as "grammatical" from a Google search. And what do you think is a good meaning for "grammatical"? I'm having an awfully hard time formulating a definition and would be interested in what you guys think.

Tuesday, July 24, 2007

notes from the lsa

Gee Gabe, I think your automated blogger has set the bar on this site a little too high. I know my life was changed the moment I heard that "the next vanguard of waiters sashayed towards him with the sauce over the world." Wow, I think I just got goosebumps there.

Anyway, my organic-matter mind will still attempt to contribute a list (just a list for now, hopefully expansions on these will follow) of a few things I've discovered this last month that on some level blew my mind:

-pragmatic intrusion: "it's better to drive home and drink three beers than to drink three beers and drive home"... truth-conditionally this sentence is contradictory, but the implicatures which allow this to make sense are below the level of a sentence. people at the institute have ideas about how to compute below-sentence-level implicatures, but no agreement on solutions

-"illogical negation" constructions like the following:
-he could care less/ he couldn't care less
-that'll teach you to swim/ that'll teach you not to swim
-the box is packed / the box is unpacked
-don't fail to miss the sign / don't miss the sign
(these pairs are all synonymous)
I've found that some of these do odd things when embedded under attitude predicates... some of them actually become logical! this suggests there is more going on than just problems computing the number of negatives

-the question of whether, if we construct a probabilistic grammar, there is a difference between a very low probability sentence, and sentences at the limit point (i.e. completely ungrammatical)... more crazy to me, is how in the world could we test this empirically?

-RELEVANCE... entire theories of semantics and pragmatics (and syntax and phonology...) depend on humans being able to figure out what is "relevant" for effective communication. Shouldn't we be finding out more from psychologists and cognitive scientists about how people actually might accomplish this feat? Otherwise, at least many talks I've seen here go down the drain without a good understanding of how we might determine relevance. Or even what it means for something to be "relevant."

-David Beaver is doing some really neat stuff on the semantics/phonology interface, with discourse particles, focus, and prosody. I hope to post more about this later.

-the F-word... I took a class with Chris Potts about other "dimensions" of meaning. One of those is the expressive realm, which includes all your favorite curses. Note how "fucking" doesn't contribute to the truth conditions of a sentence OR the assertive content:
-A: "I saw your fuckin' dog in the park."
-B: #"No, he's very sweet."
You can't refute expressive content. We basically concluded in this class that expressives have a quality of "speech act" about them, like "promise." Once you say "I promise to clean up" an action takes place just in the speech act. Seems to be the same with expressives. But there are a lot of interesting things about them that I forget at the moment but hopefully I can expand upon later... if the autoblogger doesn't beat me to a brilliant post about them first :-)

Friday, July 20, 2007

Was the TPS nurtured or natured?

So I figured it would be a while before the TPS came up with a new post, but this morning it did some research on the nature-versus-nurture debate (by which I mean it spent about five minutes reading through 200 blog posts on the subject), and was positively bursting at the seams to unleash its (i.e., the blogosphere's) opinions upon the world:

methamphetamine, or "meth," as it is important and can teach genetics. (there are always tomatoes to dice). when the truth unless i could share here with all these years of evolution in his beliefs was, he said, "'thank god,' and it crystallized into gay identity."in a worldnetdaily.com posting accompanying his essay, glatze was only when a friend who was mentioned in the popular online 3d virtual environment of second life. he has also posted a more active one.so opens another chapter in my hearta man alonein search of some of the government school establishment lies - they lie about anything that they need to wonder when the night was a wunder product designed to satisfy each nation's concerns and put u.s.-russian cooperation on a whole new place. end-post

Now I just have to convince the TPS to research psycholinguistics, and I'll never have to write a paper myself again.

Thursday, July 19, 2007

On Asparagus

I'm trying to automate the internet. Think of how much time I can save in a day if I just convince my computer to do all of the online stuff I normally do. Respond to emails. Check sports scores. Get lost on Wikipedia determining what countries have non-rectangular flags.

Well, I'm one step closer to automating myself. I've created a Python script that can create blog entries on my behalf with minimal effort of my own. Let's say, for instance, that I want to write about asparagus but don't have the time to bother with research, or don't actually have any asparagus opinions. In the olden days, I'd have had to call it a day, and the world would miss out on one more rambling post about asparagus. No longer is this the case!

Now the Trigram Post Simulator (TPS) reports what other bloggers have already thought about asparagus in the form of a simple, muddled post. Observe:

it’s been a prize for the peaches, apricots, mangoes or whatever your favorite chutney recipe calls for. you can see asparagus spears lurking in there. i have to admit, it's the main elements of caesar, and i get strawberries in huge quantities, i end up with this undertaking, floured thoroughly the cloth made the little pre done seed blocks. i do not have a blog in another language i would make no sound at all. elul is the quirky side of popcorn. the lesson: one gets what they would have taken both. they look incredibly good!i definitely couldn't share a salad nicoise the other half of the exeter book riddles clearly belong to the image of weaving cloth, and the garlic and the carnival of the rectangle, 2.5cm in from the folks at klamath fallsjuly 07, 2007this letter was written by charles grennel and his developments in kansas city."it'll make documentarists' hard academic work so much more helpful to city planners!this is a really tasty alternative to bubble and leek is a lot less sauce. the next vanguard of waiters sashayed towards him with the sauce over the world. they can accept the fact that i can get, although i suspect includes most readers of this by adding ground almonds and sugar into a lot better if it'd taken excluded peoples' ideas of "good plan" or "bad plan" one could make my grandmother's pasty recipe and the capacity for violence, and a white wine and reduce. add cream and reduce by half.5. add crab claws and boil 6 minutes. drain crab and according to package directions. steam asparagus for about 15 minutes)6. (if you are planning to visit eckert's, so you have lying around - overripe bananas aren't seasonal, but sometimes they are both intense and light. obviously, they match

Sure it's imperfect, but it's only at version 1.0. By version 3.0 no one will even want to talk to me; they'll just head straight for the TPS. And it's really easy - I just tell TPS what I want to post about, and it goes and finds 200 posts about it. Then it creates a trigram model of the posts it found and uses that model to construct a novel post of its own. How much time I'm saving! And it's no less coherent than most of my posts!
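For the curious, the trigram step described above can be sketched roughly as follows. This is a minimal illustration, not the actual TPS script: the function names `build_trigram_model` and `generate_post` are hypothetical, the step that finds the 200 posts is left out entirely, and the sketch assumes the posts are already available as plain strings split on whitespace.

```python
import random
from collections import defaultdict


def build_trigram_model(posts):
    """Map each pair of consecutive words to the list of words seen after it.

    Hypothetical helper: assumes `posts` is a list of plain-text strings.
    """
    model = defaultdict(list)
    for post in posts:
        words = post.lower().split()
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            # Repeated followers stay in the list, so frequent
            # continuations are proportionally more likely to be sampled.
            model[(w1, w2)].append(w3)
    return model


def generate_post(model, length=50, seed=None):
    """Generate up to `length` words by repeatedly sampling the next word
    given the previous two. Stops early at a word pair with no followers."""
    if not model:
        return ""
    rng = random.Random(seed)
    # Start from a randomly chosen word pair that occurred in the posts.
    w1, w2 = rng.choice(sorted(model))
    output = [w1, w2]
    while len(output) < length:
        followers = model.get((w1, w2))
        if not followers:
            break
        w3 = rng.choice(followers)
        output.append(w3)
        w1, w2 = w2, w3
    return " ".join(output)


if __name__ == "__main__":
    sample_posts = [
        "steam asparagus for about 15 minutes",
        "steam asparagus for dinner tonight",
    ]
    print(generate_post(build_trigram_model(sample_posts), length=10, seed=0))
```

Because each word is chosen by looking only at the two words before it, the output is locally plausible but wanders globally, which is about what the sample posts look like.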

Tuesday, July 17, 2007

Intermediate Cases

An epigram:
"We may assume for this discussion that certain sequences of phonemes are definitely sentences, and that certain other sequences are definitely non-sentences. In many intermediate cases we shall be prepared to let the grammar itself decide, when the grammar is set up in the simplest way so that it includes the clear sentences and excludes the clear non-sentences." -Chomsky 1957, p. 14.

The idea:
The grammaticality of some sentences can't be determined easily by humans. So we make our grammar without considering these sentences. Then, once we have our grammar, we can look at these sentences to determine their grammaticality.

Some thoughts:
So this seems initially reasonable; people do this all the time. For instance, let's say you see someone who you can't tell the gender of. There is a theory that says there are certain body parts that men have and others that women have. This theory was built up based on observations from certainly-males and certainly-females. So we go to this androgynous person and see what body parts they have to determine their gender. If we want to be even more scientific about it, we could do genetic testing, look at their chromosomes, and have our answer.

Of course, nothing in life is easy like this - even in genetic testing there are border cases. For instance, there's Klinefelter's Syndrome, where a person has two X chromosomes and one Y chromosome. Categoricality just isn't there - there're always intermediate cases. And so it's almost certainly not right to say there are grammatical and ungrammatical sentences and nothing in between. But let's just suppose that these strict categories exist for now.

The bigger problem is that grammar isn't separate from us. There is no such thing as a "right" grammar of English. (To the dismay of many old fuddy-duddies.) There is no oracle that we can go to and ask "Is this sentence grammatical?" What we call the grammar of a language is really the consensus of the idiolects of its speakers. The grammar of a language is what we say, not what we think we say, nor what we think we should say, nor what a theoretical construct says we should say. A cornerstone of modern linguistics is that our grammars are descriptive, not prescriptive. A descriptive grammar predicts what we think about sentences; it can't tell us what to think. If there is an intermediate case, a good grammar must predict that.

Returning to the gender example, asking a grammar to determine the grammaticality of an intermediately-grammatical sentence is akin to asking a theory to determine whether a person suffering from Klinefelter's Syndrome is a man or woman, when the person isn't entirely in either category. Or like saying that physics should be based on waves and particles and thus should be able to settle once and for all which one light is. But, it's both. It's intermediate. And we have to build theories around that. Likewise, we as linguists have to make our theories fit the intermediacy of some sentences, rather than asking the theories to tell us which way those sentences go.

I think that's what's problematic with Chomsky's idea, which is sort of ingrained in us (me, at least): our theories can't tell the data what to do.

Friday, May 18, 2007

Season's End

An intriguing headline (subheadline, really) I noticed in yesterday's Guardian:

(1) "Men's tennis completes its most successful season despite losing in the NCAA Division II tournament."

It's a wild scopal resolution problem, I think. You can't conclude a season despite losing in a tournament. The loss in the tournament is the event that causes the season to end. I think you could say something like "Florida completed its season despite winning the NCAA basketball tournament", but that's questionable to me. And there are certainly cases where "concluding a season despite losing" can happen:

(2) "College of Charleston completed their season despite losing in the Southern Conference Championship."

Here, had they won the championship game, they could have entered the season-ending tournament, and given their showing in the conference tournament, one might have expected them to be selected for the season-ending tournament, thereby continuing their season.

But in (1), the despite clause doesn't make sense modifying "completes its most successful season", unless we read it as modifying "most successful season". Surely this sentence is relatively fine, though:

(3) "It was our most successful season, despite losing in the tournament."

Is this a general fact, that despite clauses can modify either the VP or NP? These sentences seem to support NP-modification by despite:

(4a) Bill ate a satisfying meal despite eating day-old pizza.
(4b) Carl satisfied his obligation to the mobsters, despite only smashing a couple storefront windows.
(4c) Davida finished her paper despite typing it on a computer.
(4d) Zillah pruned the topiary despite using dull clippers.

But this still strikes me as a bit odd.

Thursday, April 19, 2007

Doncha Wanna Do Good?

In my post-breakfast trek from my room back to the sink to clean off the plate that formerly hosted my peanut-butter-and-granola toast, I passed by a small blue pamphlet lying discarded on the floor. It's one of my roommate's textbook accompaniments, and I personally think it's an affront to the purchaser of the textbook.

I haven't looked inside of the pamphlet, because it's still in its shrink wrap, but it comes with some sort of math or science textbook that is exceedingly thick and costly. Its title (and basically the only thing on the cover at all) is "Doncha Wanna Do Good?" Inside, I think, it contains a bunch of tips, tricks, and shortcuts to do the math or science problems more quickly. But here are my problems with it:

1. Tone
If there's anything I hate, it's when advertising attempts to take on the tone of your friend. Like those commercials where they try so hard to create a natural conversation where one person does nothing but endorse a single product with unwavering resolve.

A: Hey Bob, how's it goin'?
B: Great, ever since I started drinking Triphoxyline brand dietary supplement!
A: What's Triphoxyline brand dietary supplement?
B: Glad you asked! I don't know how Triphoxyline brand dietary supplement works, but it's got 14 antioxidants, 23 grams of protein, and a great taste!
A: [chuckling, reaching for B's glass of Tri-] Hey, let me try some!
B: [also chuckling, pulling the glass to his chest] Get your own! It's available at most grocers, in the soup aisle. Also at drugstores.

So anyway, come on. You're a textbook. You're not supposed to sound like a friend with a hangover trying to convince me to come to a study group. ("No, Tom, I think I'll watch Lost tonight instead of studying." "C'mon, man, doncha wanna do good?" "Oh, good point. Forget the Others.") And, not that I generally care, but it's "well". Please don't dumb yourself down when you talk to me.

2. Presupposition
My second objection is the presupposition in the title question. It implies that if I do not purchase this overpriced accompaniment, I do not want to do "good". (Perhaps if I don't buy it, I want to do well.) Furthermore, it implicates that I am not actually capable of doing good without the supplement. Frankly, J. Wiley and Sons, I think I am capable of doing good, thank you, and I would argue that you, as publishers, are likely far less equipped to do good in this class than I am. So yes, I wanna do good, which is why I'll read the textbook and study my notes and do good on my own, despite your unfair implicature. Jerks.

3. Proliferation
Now let's suppose that the information contained in "Doncha Wanna Do Good?" actually would significantly increase my chances of doing good in this class. I've already shelled out over $100 for your textbook. You couldn't put this Holy Grail of information into the textbook? If it's really the key to doing good, why should I even get the textbook? I could save like $80 right there.

4. Pamphlets
I am inherently distrustful of pamphlets.

So, in summary, J. Wiley and Sons, I don't like you.

Tuesday, March 20, 2007

Gesture and Intonation: brothers!

I just wanted to post a link to a very cool dissertation concerning specifics about the relationship between gesture and intonation. I feel like at some point we've been discussing these two things, and this seems pretty well done (from the small part I've read so far):

http://www9.georgetown.edu/faculty/loehrd/pubs_files/Loehr04.pdf


Cool Stuff!

Monday, March 19, 2007

More than just a cool stencil

Our blog shares its name with a remarkable branch of cognitive science!

"Distributed Cognition," to quote the eponymous Wikipedia article, proposes that "human knowledge and cognition are not confined to the individual." In contrast to the widespread focus on the performance of the individual mind, DC understands cognition as a social practice.

And here's the real quietus -- knowledge is not even confined to the collective cognition of human participants. Tools, like this blog, can be repositories of cognition with which we all interact.

I don't know about you, but I am wiping the froth from my mouth already.

One of the fertile minds that hath sired this attractive baby (if you'll allow me some intellectual favoritism, which is quite un-DCish) is that of none other than UCSD's own Edwin Hutchins. A mighty fortress is our alma mater.

To where our distributed cognition takes us, friends.

Grant

Thursday, March 15, 2007

inceptum

Hey guys... just thought I'd take a break from reviewing the McCarthy article and claim this URL. If Grant wants to host the blog, that is totally cool... but I thought in case that doesn't work this would be easy.

The goal of this blog (wherever it is/whatever it is called):
to have a place where we can comment on linguistics... problems with what we've been studying, things that would be cool to study, things we're unclear about, things that seem wrong, experiments that need (to be) run

The idea is that 5 minds are better than one. Also, Gabe's hopes of establishing a "school of thought" could be realized, and maybe we could contribute to bringing a little more sense to this field full of very brilliant and very stupid ideas :-)

Also, if we do it on blogspot, all you need to edit the blog is a google account.