We use document annotation and n-gram analysis to extract, link, and retrieve information from corpora. A variety of clients have used this work, particularly to condense, summarise, and index large volumes of reports.
Among other tools, we’ve particularly enjoyed using GATE.
- Large-scale extraction of information (200+ documents collected over 5 years)
- Derivation of an index of issues with annotations in context for future reference
- Quantitative analysis of term and issue frequency
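
As a rough illustration of the frequency counting and in-context indexing listed above, here is a minimal sketch in plain Python. It does not use GATE and is not the pipeline used on client work. The function names (`tokenize`, `ngram_counts`, `contexts`) and the sample `reports` are hypothetical, chosen only for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(texts, n=2):
    """Count n-grams across an iterable of document strings.

    Assumption: n-grams are counted over the whole token stream of each
    document, so they may span sentence boundaries.
    """
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields each n-gram
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def contexts(documents, term, width=30):
    """Return (doc_id, snippet) pairs showing each match of `term` in context.

    `documents` maps a document id to its text. `width` is the number of
    characters kept on either side of the match.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for doc_id, text in documents.items():
        for match in pattern.finditer(text):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            hits.append((doc_id, text[start:end]))
    return hits


# Hypothetical sample data, for illustration only
reports = {
    "r1": "Water quality declined. Water quality monitoring resumed.",
    "r2": "Staff raised water quality concerns.",
}

bigrams = ngram_counts(reports.values(), n=2)
index = contexts(reports, "water quality")
```

Here `bigrams.most_common(5)` would list the most frequent two-word terms, and `index` gives each occurrence of an issue alongside the report it came from, which is the basic shape of an issue index with annotations in context.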
analysis information-extraction text-analysis semantic-analysis