Högskolan i Skövde

Wrapping XML-Sources to Support Update Awareness
University of Skövde, Department of Computer Science.
2000 (English). Independent thesis, Advanced level (degree of Master (One Year)). Student thesis.
Abstract [en]

Data warehousing is a generally accepted method of providing corporate decision support. Today, the majority of information in these warehouses originates from sources within a company, although relevant changes often originate outside it. Companies need to look beyond their own enterprises for valuable information that increases their knowledge of customers, suppliers, competitors, and so on.

The largest and most frequently accessed information source today is the Web, which holds more and more useful business information. The Web currently relies primarily on HTML, which makes mechanical extraction of information difficult. In the near future, XML is expected to replace HTML as the language of the Web, bringing more structure and a stronger focus on content.

One problem when considering XML-sources in a data warehouse context is their lack of update awareness capabilities, which restricts eligible data warehouse maintenance policies. In this work, we wrap XML-sources in order to provide update awareness capabilities.

We have implemented a wrapper prototype that provides update awareness capabilities for autonomous XML-sources, specifically change awareness, change activeness, and delta awareness. The prototype wrapper complies with recommendations and working drafts proposed by W3C, and is therefore compatible with most off-the-shelf XML tools. In particular, change information produced by the wrapper is based on methods defined by the DOM, so any DOM-compliant software, including most off-the-shelf XML processing tools, can be used to incorporate identified changes in a source into an older version of it.
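
As an illustration of the idea that change information expressed through DOM methods can be replayed on an older version, the following is a minimal sketch using Python's standard `xml.dom.minidom`. The delta format (a list of `(operation, path, payload)` tuples), the operation names, and the function `apply_delta` are assumptions made for this example and are not taken from the thesis.

```python
from xml.dom import minidom


def apply_delta(old_xml, delta):
    """Replay a hypothetical delta on old_xml using only standard DOM calls.

    Each delta entry is (operation, path, payload), where path is a list of
    child-element indexes counted from the document element. This format is
    an assumption for illustration, not the wrapper's actual output.
    """
    doc = minidom.parseString(old_xml)

    def resolve(path):
        # Walk down the tree, counting element children only.
        node = doc.documentElement
        for index in path:
            elements = [c for c in node.childNodes
                        if c.nodeType == c.ELEMENT_NODE]
            node = elements[index]
        return node

    for operation, path, payload in delta:
        target = resolve(path)
        if operation == "set_text":
            # Replace all existing children with a single text node.
            while target.firstChild is not None:
                target.removeChild(target.firstChild)
            target.appendChild(doc.createTextNode(payload))
        elif operation == "append_element":
            name, text = payload
            child = doc.createElement(name)
            child.appendChild(doc.createTextNode(text))
            target.appendChild(child)
        elif operation == "remove":
            target.parentNode.removeChild(target)
        else:
            raise ValueError("unknown operation: " + operation)

    return doc.documentElement.toxml()


old_version = ("<catalog><item><price>10</price></item>"
               "<item><price>20</price></item></catalog>")
delta = [
    ("set_text", [0, 0], "12"),                 # update the first price
    ("append_element", [], ("item", "new")),    # insert a new item
    ("remove", [1], None),                      # delete the second item
]
new_version = apply_delta(old_version, delta)
```

Because the sketch relies only on `createElement`, `createTextNode`, `appendChild`, and `removeChild`, the same sequence of calls could in principle be issued through any DOM-compliant library.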

For the delta awareness capability, we have investigated the possibility of using change detection algorithms proposed for semi-structured data. We have identified similarities and differences between XML and semi-structured data that affect delta awareness for XML-sources. As a result of this effort, we propose an algorithm for change detection in XML-sources. We also propose matching criteria for XML-documents, which documents must satisfy before the change awareness extension can be applied to them.
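
To make the notion of change detection between two versions of an XML-source concrete, here is a deliberately simple sketch. It is not the algorithm proposed in the thesis: it assumes a hypothetical matching criterion in which elements in the two versions correspond when they carry the same value of an `id` attribute, and it only reports inserted, deleted, and text-updated elements. The function name `detect_changes` and the sample documents are likewise assumptions.

```python
import xml.etree.ElementTree as ET


def detect_changes(old_xml, new_xml, key="id"):
    """Report (kind, key_value) pairs for elements matched by a key attribute.

    Assumption: elements are matched across versions by the value of the
    attribute named by `key`; elements without it are ignored.
    """
    def index(xml_text):
        root = ET.fromstring(xml_text)
        return {e.get(key): e for e in root.iter() if e.get(key) is not None}

    old_nodes = index(old_xml)
    new_nodes = index(new_xml)
    changes = []

    # Elements present only in the old version were deleted.
    for value in old_nodes:
        if value not in new_nodes:
            changes.append(("delete", value))

    # Elements present only in the new version were inserted;
    # matched elements whose text differs were updated.
    for value, element in new_nodes.items():
        if value not in old_nodes:
            changes.append(("insert", value))
        else:
            old_text = (old_nodes[value].text or "").strip()
            new_text = (element.text or "").strip()
            if old_text != new_text:
                changes.append(("update", value))

    return sorted(changes)


old_doc = "<r><p id='a'>1</p><p id='b'>2</p><p id='d'>9</p></r>"
new_doc = "<r><p id='a'>1</p><p id='b'>3</p><p id='c'>4</p></r>"
changes = detect_changes(old_doc, new_doc)
```

Documents whose elements lack a usable key would fall outside this sketch, which is one way to see why some form of matching criteria is needed before change detection can be applied.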

Place, publisher, year, edition, pages
Skövde: Institutionen för datavetenskap, 2000, p. 112
Keywords [en]
Update Awareness, XML, Change Detection, Data Warehousing, Wrapping
National Category
Information Systems
Identifiers
URN: urn:nbn:se:his:diva-488
OAI: oai:DiVA.org:his-488
DiVA id: diva2:2867
Presentation
(English)
Uppsok
Social and Behavioural Science, Law
Supervisors
Available from: 2008-01-11 Created: 2008-01-11 Last updated: 2018-01-12

Open Access in DiVA

Full text: FULLTEXT02.pdf (application/pdf, 8737 kB)
Checksum (SHA-512): 8118d79b8edadb97d511aa2afe075f327e93b3f12e3c55a7e71457311f40a56b794fe7fd9da06228af65e0ed9269101d976845426006d713d68df421d0a2aeb8
