One reason for creating this tagger was the lack of a publicly available, accurate, multilingual, and domain-independent temporal tagger for the extraction and normalization of temporal expressions. HeidelTime was developed to fulfill the following requirements:
A. Extraction and normalization should be of high quality.
B. High-quality results should be achieved across domains.
C. Further languages should be integrable without modifying the source code.
D. The architecture should allow the integration of new modules, e.g., for additional implicit expressions.
E. When needed, adding and modifying rules should be simple.
HeidelTime is developed as a rule-based system, for the following reasons:
1. The divergence …
TIMEX3 and TIMEX2 share many similarities, and TIMEX3 tags can be converted to TIMEX2 tags even though some attributes are not supported. This is similar to the transformation from TIMEX2 to TIMEX3 described in [33], but in the opposite direction. This conversion method helps the temporal tagger to use TIMEX3 annotated corpora for evaluation.

HeidelTime’s Architecture

The most important feature of HeidelTime’s architecture is the strict separation between the algorithmic part, i.e., the source code, and the resources for patterns, rules, and normalization information. HeidelTime’s resources are organized in a modular manner. When a new resource built according to HeidelTime’s conventions is added to the system, HeidelTime loads it automatically.

Figure 3.1: HeidelTime’s system architecture with algorithm (source code) and resources

Extraction and normalization of temporal expressions are the two major tasks for temporal …
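The separation between source code and resources described above can be illustrated with a small sketch. It is not HeidelTime's actual loader. The directory names (`patterns`, `normalization`, `rules`), the `.txt` extension, the `//` comment marker, and the function `load_resources` are all assumptions made for this example. The point it shows is that the code only scans directories, so adding a resource file requires no change to the code.

```python
from pathlib import Path

# Hypothetical directory names, one per resource type (pattern, normalization, rule).
RESOURCE_TYPES = ("patterns", "normalization", "rules")


def load_resources(root, language):
    """Load every resource file found for one language.

    Returns {resource_type: {file_stem: [non-comment lines]}}.
    """
    resources = {}
    for resource_type in RESOURCE_TYPES:
        directory = Path(root) / language / resource_type
        items = {}
        if directory.is_dir():
            # Every file in the directory is picked up; nothing is hard-coded.
            for path in sorted(directory.glob("*.txt")):
                lines = path.read_text(encoding="utf-8").splitlines()
                # Assumed convention: blank lines and lines starting
                # with "//" are comments and are skipped.
                items[path.stem] = [
                    line.strip()
                    for line in lines
                    if line.strip() and not line.strip().startswith("//")
                ]
        resources[resource_type] = items
    return resources
```

Under these assumptions, supporting a further language would amount to adding a new language directory with the same three subdirectories, without touching `load_resources`.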
However, other constraints can be set as well, e.g., on the part-of-speech tag of a specific token within the expression itself, or of a token before or after the temporal expression. For normalization, HeidelTime uses normalization resources that contain mappings between an expression and its value in a standard format. Furthermore, linguistic clues are applied to normalize ambiguous expressions. For example, the tense of a sentence may indicate the temporal relation between an expression and its reference time.

HeidelTime’s Resources

HeidelTime’s resources are read and interpreted by the HeidelTime algorithm and are organized in a directory structure. For every language used there are three directories, one per resource type: (1) pattern resources, (2) normalization resources, and (3) rule resources. Within these directories, every resource item is represented as a file, in which one can easily modify the resource or include comments and examples without influencing the resource itself. The following paragraphs describe in detail the three