Extending a Persian Morphological Analyzer to Blogs
Abstract
This paper describes a two-level morphological analyzer for Persian built with a system based on the Xerox finite-state tools. The Persian language presents certain challenges to computational analysis: there is a complex verbal conjugation paradigm that includes long-distance morphological dependencies; phonological alternations apply at morpheme boundaries; and word boundaries are difficult to define, since morphemes may be detached from their stems and distinct words can appear without an intervening space. In this work, we describe these problems and provide solutions within a finite-state morphology system. The paper also presents an overview of new issues that have arisen since the advent of blogs and the spread of informal Persian text on the web. This new mode of writing poses further challenges for the computational system. The paper proposes approaches for extending the current morphological system to analyze the material found in Persian blogs.