Seminar on Computational Linguistics
Practicum on Machine Translation with Morphological Analysis
John Goldsmith
Spring 2005
** Statistical Machine Translation: Final Report JHU Workshop 1999 (Al-Onaizan et al.) **
Thanks to Okan Kolak, Philip Resnik, and Gina Levow for helping us get Egypt up on lingua.cs -- JG
1. Some papers on statistical MT to start off with.
Knight, Kevin. 1997. Automating Knowledge Acquisition for Machine Translation. AI Magazine 18(4).
Knight, Kevin. Teaching Statistical Machine Translation.
*Knight, Kevin. 1999. A Statistical MT Tutorial Workbook. (See also Kevin Knight's list of publications.)
*Brown, P., J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, J. Lafferty, R. Mercer, and P. Roossin, "A Statistical Approach to Machine Translation," Computational Linguistics, 16(2).
*Brown, P., S. Della Pietra, V. Della Pietra, and R. Mercer, "The Mathematics of Statistical Machine Translation: Parameter Estimation," Computational Linguistics, 19(2).
Good overview PowerPoint by Bonnie Dorr and Christof Monz.
Good overview PowerPoint, with emphasis on text alignment, by Leila Kosseim.
My running notes on this.
Robert Moore. Improving IBM Word-Alignment Model 1. Be aware that page 2 seems to have a glitch that makes it impossible (for me) to print. I had to print the first page, then pages 3 through the last. If you don't have this problem, let me know!
2. Some material on the history of machine translation. I expect you to read this by yourself.
3. Evaluation procedures:
Bleu: A method for automatic evaluation of machine translation. Papineni et al., 2001.
4. Useful links for getting things set up:
Evaluation of two word alignment systems, by Xiaoyang Wang. 2004.
5. Morphology and MT for May 26
Normalizing German and English inflectional morphology to improve statistical word alignment. Simon Corston-Oliver and Michael Gamon. In R. E. Frederking and K. B. Taylor (eds.) Machine Translation: From Real Users to Research. Springer Verlag.
Statistical machine translation with scarce resources using morpho-syntactic information. Sonja Niessen and Hermann Ney. Computational Linguistics 30(2) 181-204. 2004. You can access this through the U of Chicago library links.
Reducing parameter space for word alignment. Herve Dejean, Eric Gaussier, Cyril Goutte, and Kenji Yamada. HLT-NAACL 2003 Workshop.
Word clustering for distributional categories
Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jenifer Lai, Robert L. Mercer. 1990. Class-based N-gram models of natural language. That is the original version; here is the link to the Computational Linguistics (journal) version (published 1992).
Reinhard Kneser and Hermann Ney. 1991. Forming word classes by statistical clustering for statistical language modeling.
Decoding
Link to ISI (Daniel Marcu and Ulrich Germann) ReWrite Decoder
Decoding Algorithm in Statistical Machine Translation. Ye-Yi Wang and Alex Waibel. 1997. Note that this precedes Giza by two years.
Fast decoding and optimal decoding for machine translation. Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. Artificial Intelligence 154 (1). 127-43. 2004. This paper is really 2001 (ACL meeting in Toulouse). It compares A* search with a greedy search and a TSP IP solution. The A* search is based on the 1995 IBM patent application. The greedy search is revised and improved in the following paper:
Greedy decoding for statistical machine translation in almost linear time. Ulrich Germann. 2003. Proceedings of HLT-NAACL. Edmonton, Canada.
An Efficient A* Search Algorithm for Statistical Machine Translation. Franz Josef Och, Nicola Ueffing, Hermann Ney. 2001.
Statistical Phrase-Based Translation. Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003.
Syntax in the model: please read these for May 19 and 24.
A syntax-based statistical translation model. Kenji Yamada and Kevin Knight. ACL 2001.
Syntax-based language models for statistical machine translation. 2003. Eugene Charniak, Kevin Knight, and Kenji Yamada. Proceedings of MT Summit IX.