Named entity recognition in the life sciences is reported to achieve an F-score of up to 0.9 when evaluated on test corpora. The results obtained for arbitrarily chosen texts are usually not as good. We believe that the difficulty of the task, together with that of the basic task of tokenization, may still be underestimated. We present here the problems that we encountered in our attempt to identify gene names and chemical substance names in research articles. The two problems that information extraction has to cope with are language variation and ambiguity. Both are present not only in unstructured texts but also in the nomenclature of the life sciences. We also note discrepancies between the nomenclature registered in terminologies and the actual use of terms in texts. These problems are closely entangled with text segmentation problems.