Wrapping XML-Sources to Support Update Awareness
University of Skövde, Department of Computer Science.
2000 (English). Independent thesis, advanced level (degree of Master (One Year)). Student thesis.
Abstract [en]

Data warehousing is a generally accepted method of providing corporate decision support. Today, the majority of information in these warehouses originates from sources within a company, although the changes that matter often originate outside it. Companies need to look outside their enterprises for valuable information, increasing their knowledge of customers, suppliers, competitors, and so on.

The largest and most frequently accessed information source today is the Web, which holds more and more useful business information. Today, the Web primarily relies on HTML, making mechanical extraction of information a difficult task. In the near future, XML is expected to replace HTML as the language of the Web, bringing more structure and content focus.

One problem when considering XML-sources in a data warehouse context is their lack of update awareness capabilities, which restricts eligible data warehouse maintenance policies. In this work, we wrap XML-sources in order to provide update awareness capabilities.

We have implemented a wrapper prototype that provides update awareness capabilities for autonomous XML-sources, specifically change awareness, change activeness, and delta awareness. The prototype wrapper complies with recommendations and working drafts proposed by W3C and is therefore compatible with most off-the-shelf XML tools. In particular, the change information produced by the wrapper is expressed in terms of methods defined by the DOM, so any DOM-compliant software, including most off-the-shelf XML processing tools, can be used to incorporate the changes identified in a source into an older version of it.

For the delta awareness capability we have investigated the possibility of using change detection algorithms proposed for semi-structured data. We have identified similarities and differences between XML and semi-structured data, which affect delta awareness for XML-sources. As a result of this effort, we propose an algorithm for change detection in XML-sources. We also propose matching criteria for XML-documents, to which the documents have to conform to be subject to change awareness extension.
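
To illustrate the idea of delta awareness expressed through DOM methods, the following is a minimal sketch and not the algorithm or the matching criteria proposed in the thesis. It assumes, purely for illustration, that every child element of the root carries a unique `id` attribute that can be used to match elements between two versions. The helper names `items_by_id`, `compute_delta`, and `apply_delta` and the sample `catalog` documents are hypothetical. The sketch uses Python's standard `xml.dom.minidom` module.

```python
from xml.dom import minidom


def items_by_id(doc):
    """Map the id attribute of each child element of the root to its node.

    Assumption (not from the thesis): a unique id attribute is the
    matching criterion between document versions.
    """
    return {
        node.getAttribute("id"): node
        for node in doc.documentElement.childNodes
        if node.nodeType == node.ELEMENT_NODE
    }


def compute_delta(old_doc, new_doc):
    """Return a list of (dom_method_name, id, new_node_or_None) operations."""
    old_items, new_items = items_by_id(old_doc), items_by_id(new_doc)
    delta = []
    for key, node in old_items.items():
        if key not in new_items:
            delta.append(("removeChild", key, None))
        elif node.toxml() != new_items[key].toxml():
            delta.append(("replaceChild", key, new_items[key]))
    for key, node in new_items.items():
        if key not in old_items:
            delta.append(("appendChild", key, node))
    return delta


def apply_delta(old_doc, delta):
    """Bring old_doc up to date using only standard DOM methods."""
    root = old_doc.documentElement
    for method, key, new_node in delta:
        if method == "removeChild":
            root.removeChild(items_by_id(old_doc)[key])
        elif method == "replaceChild":
            imported = old_doc.importNode(new_node, True)
            root.replaceChild(imported, items_by_id(old_doc)[key])
        elif method == "appendChild":
            root.appendChild(old_doc.importNode(new_node, True))


# Two versions of a hypothetical source document.
old = minidom.parseString(
    '<catalog><item id="1">A</item><item id="2">B</item></catalog>'
)
new = minidom.parseString(
    '<catalog><item id="2">B2</item><item id="3">C</item></catalog>'
)

delta = compute_delta(old, new)
apply_delta(old, delta)
```

Because the delta names only DOM operations (`removeChild`, `replaceChild`, `appendChild`), any DOM implementation could replay it against the older version. The sketch ignores sibling order and nested matching, which a real change detection algorithm for XML would have to address.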

Place, publisher, year, edition, pages
Skövde: Institutionen för datavetenskap, 2000, p. 112
Keywords [en]
Update Awareness, XML, Change Detection, Data Warehousing, Wrapping
National subject category
Systems science, information systems and informatics
Identifiers
URN: urn:nbn:se:his:diva-488
OAI: oai:DiVA.org:his-488
DiVA, id: diva2:2867
Presentation
(English)
Uppsök
Society/Law
Supervisors
Available from: 2008-01-11 Created: 2008-01-11 Last updated: 2018-01-12

Open Access in DiVA

fulltext (8737 kB), 138 downloads
File information
File name: FULLTEXT02.pdf, File size: 8737 kB, Checksum: SHA-512
8118d79b8edadb97d511aa2afe075f327e93b3f12e3c55a7e71457311f40a56b794fe7fd9da06228af65e0ed9269101d976845426006d713d68df421d0a2aeb8
Type: fulltext, Mimetype: application/pdf

Total: 185 downloads
The number of downloads is the sum of the downloads of all full texts. It may include, for example, earlier versions that are no longer available.

Total: 211 hits