The long-term goal of the research presented in this paper is to incorporate linguistic text analysis into a system for evaluating biological pathways. In this system, relations extracted from biomedical texts will be compared with pathways encoded in existing specialized databases. In this way, a biologist's conclusions about the plausibility and/or novelty of a given relation between genes, proteins, etc., can be supported by information fused from biological databases and the biological literature. We aim to overcome the shortcomings of existing information retrieval systems by proposing a method based on thorough linguistic analysis of a large text corpus. In this paper, we present a comparative analysis of two corpora: one consisting of biomedical texts from PubMed, the other of general English prose. The results stress the importance of taking multiword entries into account when constructing a system for extracting biological relations from texts.
ISBN: 0-9721844-6-5