You can't understand the answer if you don't understand the question.
I think that the longer you spend thinking about language, the more of your time you spend
trying to understand what the question is.
I spend a lot of time working on algorithms that take relatively raw representations of
natural language and infer structure in the device that generated the data. Why?
Because there is no more direct way to understand the idea of structure in
the data. (If you don't think it's too metaphysical, we could even speak of structure
that is immanent in the data.) We live in a cognitive age, and the conventional wisdom
is that this structure exists only in a derivative sense: it depends on the
existence of a psychological faculty that inheres in speakers and hearers. I don't
think we need to go there. There is another way to understand the reality of
structure in the data.
So no sooner is the question clearly posed than it twists and turns into
a philosophical question before our very eyes.
And here's another sad fact: the only way you can understand the answers provided
by a generation of scholars is to go back to what they were learning when they were
students, so that you can figure out what question they are answering.
We can trust scholars to give us their best understanding of the answers they have
found, but not to give us an equally good understanding of the question. And if you
don't understand the question, then you don't really understand the answer. So what's
to be done?
One answer is to read the work of the people who shaped the thinking of
the preceding generations. It's not just one answer; it's by far the most important
answer.
All of this goes some way toward explaining why my publications seem to be all
over the place: in the last ten years, I've written about unsupervised computational
learning of morphology, about the history of linguistics, about empiricism and what
it might mean for linguistics, about maximum likelihood models in phonology,
and about tone in Tonga, Shi, and Kirundi. They are just ways of understanding
the question, that's all.
Hmm. Here's how a U of C cartoonist saw me.
Here are slides from a talk at Ohio State University this coming week, entitled "Optimization
is the answer. Now, what is the question?"
Recent papers
Syllables. A chapter in the forthcoming second volume of the Handbook of Phonological Theory.
Theory, kernels, data, methods. Not a formal paper, but one that I read at the 17th Manchester Phonology Meeting, May 2009.
Segmentation and morphology. To appear in The Handbook of Computational Linguistics and Natural Language Processing. September 2008. An overview
of computational work on morphology and on the problem of learning to segment words and morphemes.
Information theoretic approaches to phonology: the case of Finnish vowel harmony. With Jason Riggle. November 2007.
Draft of: Your Turing Machine or Mine? June 2007.
A video presentation of this at UCL, at the meeting on Machine Learning and Cognitive Science of Language Acquisition.
Towards a new empiricism June 2007. This will appear as part of a book written with Alex Clark, Nick Chater, and Amy Perfors.
Generative phonology in the late 1940s. February 2007. This appeared more recently in Phonology.
Analogy in morphology. February 2007.
Learning phonological categories. December 2006. With Aris Xanthos. This paper appeared in Language, March 2009.
Probability for linguists. March 2007.
Generative phonology: its origins, its
principles, its successors. With Bernard Laks.
An algorithm for the unsupervised learning
of morphology
The Linguistica Project
See our Linguistica homepage at Linguistica.uchicago.edu
for executables, source code, and research papers. This project
is the development of a computer program that automatically performs
morphological analysis of a raw text corpus that you give it.
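As a toy illustration of what automatic morphological analysis of a raw word list can look like, the sketch below tries every stem + suffix split of each word and then groups stems that share exactly the same set of suffixes (in Linguistica-style work such a set is called a signature). It is only a sketch under stated assumptions and not the Linguistica program's algorithm, which evaluates competing analyses with Minimum Description Length rather than with the crude filter used here; the function name `toy_signatures` and the `min_stem` parameter are hypothetical.

```python
from collections import defaultdict


def toy_signatures(words, min_stem=3):
    """Group candidate stems by the exact set of suffixes they occur with.

    Hypothetical toy heuristic: try every split whose stem has at least
    `min_stem` letters, then keep only suffix sets that contain two or
    more suffixes and are shared by two or more stems.
    """
    stem_to_suffixes = defaultdict(set)
    for word in set(words):
        for i in range(min_stem, len(word) + 1):
            stem, suffix = word[:i], word[i:]
            # An empty suffix is written "NULL", as in "jump" vs. "jump+ed".
            stem_to_suffixes[stem].add(suffix or "NULL")

    # Collect stems under their (hashable) suffix set.
    groups = defaultdict(set)
    for stem, suffixes in stem_to_suffixes.items():
        if len(suffixes) >= 2:
            groups[frozenset(suffixes)].add(stem)

    # Discard suffix sets attested with only one stem.
    return {sig: stems for sig, stems in groups.items() if len(stems) >= 2}


if __name__ == "__main__":
    corpus = ["jump", "jumped", "jumping", "walk", "walked", "walking"]
    for sig, stems in toy_signatures(corpus).items():
        print(sorted(sig), sorted(stems))
```

On this six-word list the sketch prints a single group, `['NULL', 'ed', 'ing'] ['jump', 'walk']`. Spurious splits such as `jum` + {p, ped, ping} disappear only because no second stem happens to share them; a principled evaluation such as MDL is meant to make that kind of decision on firmer ground.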
Bantu tone
I'm working on two papers on tone in Bantu languages, one on Shi,
based on material in Louise Polak-Bynon's grammar (a brief handout on this), and one on KiHunde,
based on material I have gathered.
Observations
When you can measure what you are speaking about, and
express it in numbers, you know something about it; but when you cannot
measure it, when you cannot express it in numbers, your knowledge is of
a meagre and unsatisfactory kind. It may be the beginning of knowledge,
but you have scarcely, in your thoughts, advanced to the stage of
science. William Thomson, Lord Kelvin.
The History of Linguistics
Towards a new empiricism is a paper that has been in progress since the fall of 2006. Its point is to justify a view of linguistics
as a science that is neither a part nor a satellite of psychology, a view that places equal emphasis on
theoretical insight and on empirical coverage. Best of all, it provides an explicit means for testing and justifying
hypotheses, through the good offices of Bayesian reasoning, and Minimum Description Length analysis in particular.
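In the usual textbook formulation (the notation below is an assumption, not taken from the paper), two-part Minimum Description Length picks the grammar $G$ that minimizes the combined length, in bits, of the grammar itself and of the data $D$ encoded with the help of that grammar:

$$
\hat{G} = \arg\min_{G} \big[\, L(G) - \log_2 P(D \mid G) \,\big]
$$

If the prior probability of a grammar is defined as $P(G) = 2^{-L(G)}$, the bracketed quantity equals $-\log_2 \big[ P(G)\, P(D \mid G) \big]$, so minimizing description length is the same as choosing the grammar with the highest posterior probability, $P(G \mid D) \propto P(G)\, P(D \mid G)$. This is the sense in which MDL is a particular case of Bayesian reasoning.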
A paper (with Bernard Laks, Université de Paris X) on the history of generative phonology.
A recent review article on Bruce Nevin's book on Zellig Harris.
My 2004 CLS
paper on the role of the algorithm in generative grammar.
See also the reference to a paper below on information theory,
entropy, and phonology in the 20th century.
Geoff Huck and I wrote a book, Ideology
and Linguistic Theory (1995), on generative semantics
and interpretive semantics in the post-Aspects period. It shows
how many of the critical suggestions of generative semantics were
integrated into mainstream linguistic thought in later generations,
and that the professional battle waged during this period was often
disconnected from the intellectual issues it invoked.
One of our earlier papers is available here: Distributionalist
and Mediationalist Themes in the Development of Linguistic Theory.
A review
article on Robert Barsky's Noam Chomsky: A Life of Dissent.
Phonology
Along with Jason Riggle and Aris Xanthos, I have been working over the past couple of years on rethinking phonological theory
from a Bayesian point of view, asking the question: Can phonology be understood as the search for models that are
the most probable, given what we know of the phonological complexities of the world's languages? Out of this work have
come two papers so far, one focusing on learning phonological categories and the other on treating vowel harmony
in Finnish.
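To make "the most probable model" concrete, here is a minimal sketch under stated assumptions; it is not the model used in either paper. It scores two candidate models of vowel sequences: one treats each vowel independently (unigram), the other conditions each vowel on the preceding vowel in the word (bigram). Each is scored by a crude two-part description length: the data cost -log2 P(D | G) under maximum-likelihood estimates, plus (k/2)·log2 n bits for a model with k free parameters fit to n tokens (a BIC-style approximation swapped in for illustration). The toy strings, in which `a` stands for a back vowel and `e` for a front vowel, are invented and are not Finnish data; the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_bits(words):
    """Two-part cost (bits) of a model in which each vowel is independent."""
    tokens = [v for w in words for v in w]
    n = len(tokens)
    counts = Counter(tokens)
    # Data cost: -log2 P(D | G) with maximum-likelihood probabilities.
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    # Model cost: crude (k/2) log2 n bits for k free parameters.
    k = len(counts) - 1
    return 0.5 * k * math.log2(n) + data_bits


def bigram_bits(words):
    """Two-part cost (bits) of a model conditioning each vowel on the previous one.

    "#" marks the word-initial context.
    """
    pairs = [(prev, v) for w in words for prev, v in zip("#" + w, w)]
    n = len(pairs)
    pair_counts = Counter(pairs)
    context_counts = Counter(prev for prev, _ in pairs)
    # Data cost: -log2 of the ML conditional probability of each vowel.
    data_bits = -sum(
        c * math.log2(c / context_counts[prev])
        for (prev, _), c in pair_counts.items()
    )
    # One free distribution (|alphabet| - 1 parameters) per context.
    alphabet = {v for _, v in pairs}
    k = len(context_counts) * (len(alphabet) - 1)
    return 0.5 * k * math.log2(n) + data_bits


if __name__ == "__main__":
    harmonic = ["aaa", "eee"] * 10        # vowels agree within each word
    mixed = ["ae", "ea", "aa", "ee"] * 5  # no agreement pattern
    for name, data in [("harmonic", harmonic), ("mixed", mixed)]:
        print(name, round(unigram_bits(data), 2), round(bigram_bits(data), 2))
```

On the harmonic toy data the bigram model gives the shorter total description (about 28.9 bits against about 63.0), while on the mixed data its extra parameters do not pay for themselves and the unigram model wins (about 42.7 bits against about 48.0). The model cost is what keeps the richer model from winning automatically.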
Alan Yu, Jason Riggle, and I are editing a second volume of the Handbook of Phonological
Theory. I edited the first edition, which came out in 1995; the second volume should come out in late 2009 or so. A book that I edited, Essential Readings in Phonological Theory, was published by Blackwell in 1999.
What is phonology? First chapter of a book that I'm going to finish before too long, probably entitled What is Phonology? This chapter is primarily about flapping in American English.
Probabilistic
models of grammar: phonology as information minimization.
First paper on autosegmental phonology: pretty rough. November 1973.
My second paper on autosegmental phonology, but the first one with a theory and a name. Marginal comments from Noam Chomsky. Spring 1974. Strange to look at it now.
Bioinformatics
Work with Ridg Scott, Terry Clark, and Jing Liu on extending algorithms
that have been developed in computational linguistics to applications
in bioinformatics.