his.se Publications
Dura, Elzbieta
Publications (6 of 6)
Dura, E. (2009). Term ambiguity and variation in biomedical nomenclature and literature - problems for information extraction. In: Zygmunt Vetulani (Ed.), Proceedings of the 4th Language and Technology Conference. Paper presented at 4th Language & Technology Conference, November 6-8, 2009, Poznań, Poland (pp. 492-496). Fundacja Uniwersytetu im A. Mickiewicza
2009 (English). In: Proceedings of the 4th Language and Technology Conference / [ed] Zygmunt Vetulani, Fundacja Uniwersytetu im A. Mickiewicza, 2009, p. 492-496. Conference paper, Published paper (Refereed)
Abstract [en]

Named entity recognition in life sciences is reported to achieve up to 0.9 F-score when tested on test corpora. The results which are obtained for casually chosen texts are usually not as good. We believe that the task may still be underestimated, together with the basic tasks of tokenization. We present here the problems which we have encountered in our attempt to identify gene names and chemical substance names in research articles. The two problems which information extraction has to cope with are language variation and ambiguity. Both are present not only in unstructured texts but also in the nomenclature of life sciences. We also note the discrepancies between the nomenclature registered in terminologies and the actual use of terms in texts. These problems are intimately entangled with text segmentation problems.

Place, publisher, year, edition, pages
Fundacja Uniwersytetu im A. Mickiewicza, 2009
National Category
Computer and Information Sciences
Research subject
Technology
Identifiers
urn:nbn:se:his:diva-3545 (URN), 978-83-7177-746-2 (ISBN)
Conference
4th Language & Technology Conference, November, 6-8, 2009, Poznań, Poland
Available from: 2010-01-07 Created: 2010-01-07 Last updated: 2018-01-12 Bibliographically approved
Dura, E. & Gawronska, B. (2008). Natural Language Processing in Information Fusion Terminology Management. In: Proceedings of the 11th International Conference on Information Fusion. Paper presented at 11th International Conference on Information Fusion, FUSION 2008; Cologne; 30 June 2008 through 3 July 2008 (pp. 1388-1395). IEEE
2008 (English). In: Proceedings of the 11th International Conference on Information Fusion, IEEE, 2008, p. 1388-1395. Conference paper, Published paper (Refereed)
Abstract [en]

 

The dynamic development of information fusion research implies introduction of new terms and concepts, which in turn requires tools and methods for terminology organization and standardization, as well as tools for creating domain-specific ontology. In this paper, we show how natural language processing and corpus technology tools applied for term extraction from texts in biomedicine can successfully be used for the field of information fusion. We demonstrate term and information extraction from a corpus of research articles in information fusion, showing how a vision of a combined text retrieval and information extraction service can be made real.

 

Place, publisher, year, edition, pages
IEEE, 2008
Keywords
Text databases, information extraction, term extraction, soft data, natural language processing
Research subject
Technology
Identifiers
urn:nbn:se:his:diva-3608 (URN), 2-s2.0-56749172493 (Scopus ID), 978-3-00-024883-2 (ISBN)
Conference
11th International Conference on Information Fusion, FUSION 2008; Cologne; 30 June 2008 through 3 July 2008
Available from: 2010-01-29 Created: 2010-01-29 Last updated: 2017-11-27
Dura, E. & Gawronska, B. (2007). Novelty Extraction from Special and parallel corpora. In: Proceedings of 3rd Language & Technology Conference 2007. Paper presented at 3rd Language and Technology Conference, LTC 2007; Poznan; 5 October 2007 through 7 October 2007 (pp. 305-309). Springer Berlin/Heidelberg
2007 (English). In: Proceedings of 3rd Language & Technology Conference 2007, Springer Berlin/Heidelberg, 2007, p. 305-309. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2007
Series
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; 5603 LNAI
National Category
Computer Sciences
Research subject
Technology
Identifiers
urn:nbn:se:his:diva-2222 (URN), 000270337100025 (), 2-s2.0-70349339535 (Scopus ID), 978-83-7177-407-2 (ISBN)
Conference
3rd Language and Technology Conference, LTC 2007; Poznan; 5 October 2007 through 7 October 2007
Available from: 2008-10-06 Created: 2008-10-06 Last updated: 2018-01-12
Gawronska, B., Olsson, B., Erlendsson, B., Lindlöf, A. & Dura, E. (2006). Automated text analysis of biomedical abstracts applied to the extraction of signaling pathways involved in plant cold-adaptation. In: the Fifth International Conference on Bioinformatics of Genome Regulation and Structure: vol 3 (pp. 296-299). Russian Academy of Sciences
2006 (English). In: the Fifth International Conference on Bioinformatics of Genome Regulation and Structure: vol 3, Russian Academy of Sciences, 2006, p. 296-299. Conference paper, Published paper (Other academic)
Abstract [en]

Motivation: Automated text analysis is an important tool for facilitating the extraction of knowledge from biomedical abstracts, thereby enabling researchers to build pathway models that integrate and summarize information from a large number of sources. Advanced methods of in-depth analysis of texts using grammar-based approaches developed within the field of computational linguistics must be adapted to the special requirements and challenges posed by biomedical texts, so that these methods can be made available to the bioinformatics and computational biology communities.

Results: Our system for automated text analysis and extraction of pathway information is here applied to a set of PubMed abstracts concerning the CBF signaling pathway, which is a key pathway involved in the cold-adaptation response of plants subjected to cold non-freezing temperatures. The system successfully and accurately re-discovers the main features of this pathway, while also pointing to interesting and plausible new hypotheses. The evaluation also reveals a number of issues which will be important targets in the continued development of the system, e.g. the need for an extended lexicon of taxonomic terms and an improved procedure for recognition of sentence boundaries.

Place, publisher, year, edition, pages
Russian Academy of Sciences, 2006
Identifiers
urn:nbn:se:his:diva-1928 (URN), 000243859500067 (), 5-7692-0848-1 (ISBN)
Available from: 2007-09-21 Created: 2007-09-21 Last updated: 2017-11-27
Dura, E., Gawronska, B., Olsson, B. & Erlendsson, B. (2006). Towards Information Fusion in Pathway Evaluation: Encoding Relations in Biomedical Texts. In: The 9th International Conference on Information Fusion: Florence, Italy, 10-13 July 2006 (pp. 240-247). IEEE Press
2006 (English). In: The 9th International Conference on Information Fusion: Florence, Italy, 10-13 July 2006, IEEE Press, 2006, p. 240-247. Conference paper, Published paper (Other academic)
Abstract [en]

The long-term goal of the research presented in this paper is to incorporate linguistic text analysis into a system for evaluation of biological pathways. In this system, relations extracted from biomedical texts will be compared with pathways encoded in existing specialized databases. In this way, the biologist's conclusions regarding the plausibility and/or novelty of a certain relation between genes, proteins, etc., can be supported by fused information from biological databases and biological literature. We aim at overcoming the shortcomings of existing systems for information retrieval by proposing a method based on thorough linguistic analysis of a large text corpus. In this paper, we present a comparative analysis of two corpora: one consisting of biomedical texts from PubMed, the other one of general English prose. The results stress the importance of taking multiword entries into account when constructing a system for extracting biological relations from texts.

Place, publisher, year, edition, pages
IEEE Press, 2006
Identifiers
urn:nbn:se:his:diva-1916 (URN), 10.1109/ICIF.2006.301666 (DOI), 000245998000106 (), 2-s2.0-50149093482 (Scopus ID), 0-9721844-6-5 (ISBN)
Available from: 2007-09-21 Created: 2007-09-21 Last updated: 2017-11-27
Dura, E. & Gawronska, B. (2005). Towards Automatic Translation of Support Verbs Constructions: the Case of Polish 'robić/zrobić' and Swedish 'göra'. In: Zygmunt Vetulani (Ed.), Human language technologies as a challenge for computer science and linguistics: 2nd Language & Technology Conference, April 21-23, 2005, Poznań, Poland: proceedings. Paper presented at 2nd Language & Technology Conference, April 21-23, 2005, Poznań, Poland (pp. 450-454). Poznań: Wydawnictwo Naukowe Uniwersytetu im. Adama Mickiewicza
2005 (English). In: Human language technologies as a challenge for computer science and linguistics: 2nd Language & Technology Conference, April 21-23, 2005, Poznań, Poland: proceedings / [ed] Zygmunt Vetulani, Poznań: Wydawnictwo Naukowe Uniwersytetu im. Adama Mickiewicza, 2005, p. 450-454. Chapter in book (Refereed)
Abstract [en]

Support verb constructions range from idiosyncratic to predictable. Lexical functions provide a solution to translation of idiosyncratic constructions only. Our corpus research aims to contribute to automatic translation of support verb constructions where the verb selects certain semantic groups of collocates, and where novel collocations can be expected. We investigate samples of support verb constructions with Polish robić/zrobić and Swedish göra. Nouns attested on the Internet as objects of these verbs are subdivided into semantic groups. Translation rules are then proposed for each group, and the similarities and differences in the behaviour of the verbs in both languages are discussed.

Place, publisher, year, edition, pages
Poznań: Wydawnictwo Naukowe Uniwersytetu im. Adama Mickiewicza, 2005
Identifiers
urn:nbn:se:his:diva-1701 (URN), 83-7177-341-2 (ISBN), 978-83-7177-341-9 (ISBN)
Conference
2nd Language & Technology Conference, April, 21-23, 2005, Poznań, Poland
Available from: 2007-08-13 Created: 2007-08-13 Last updated: 2017-11-27 Bibliographically approved