Högskolan i Skövde

Publications
1 - 14 of 14
  • 1.
    Bae, Juhee
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Riveiro, Maria
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Helldin, Tove
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Falkman, Göran
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Evaluating Multi-Attributes on Cause and Effect Relationship Visualization. 2017. In: Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017): Volume 3: IVAPP / [ed] Alexandru Telea; Jose Braz; Lars Linsen, SciTePress, 2017, pp. 64-74. Conference paper (Refereed)
    Abstract [en]

    This paper presents findings about visual representations of the direction, strength, and uncertainty of cause and effect relationships, based on an online user study. While previous research focuses on accuracy and few attributes, our empirical user study examines accuracy and the subjective ratings on three different attributes of a cause and effect relationship edge. The cause and effect direction was depicted by arrows and tapered lines; causal strength by hue, width, and a numeric value; and certainty by granularity, brightness, fuzziness, and a numeric value. Our findings point out that both arrows and tapered cues work well to represent causal direction. Depictions with width showed higher conjunct accuracy and were preferred over those with hue. Depictions with brightness and fuzziness showed higher accuracy and were marked as more understandable than those with granularity. In general, depictions with hue and granularity performed less accurately and were not preferred compared to the ones with numbers or with width and brightness.

  • 2.
    Bae, Juhee
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Riveiro, Maria
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Torra, Vicenç
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    On the Visualization of Discrete Non-additive Measures. 2018. In: Aggregation Functions in Theory and in Practice AGOP 2017 / [ed] Vicenç Torra; Radko Mesiar; Bernard De Baets, Springer, 2018, pp. 200-210. Conference paper (Refereed)
    Abstract [en]

    Non-additive measures generalize additive measures, and have been utilized in several applications. They are used to represent different types of uncertainty and also to represent importance in data aggregation. As non-additive measures are set functions, the number of values to be considered grows exponentially. This makes their definition difficult, and also their interpretation and understanding. In order to support understandability, this paper explores the topic of visualizing discrete non-additive measures using node-link diagram representations.

  • 3.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Big Data programming with Apache Spark. 2019. In: Data science in Practice / [ed] Alan Said; Vicenç Torra, Springer, 2019, pp. 171-194. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    In this chapter we give an introduction to Apache Spark, a Big Data programming framework. We describe the framework’s core aspects as well as some of the challenges that parallel and distributed computing entail.

  • 4.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    On Making Machine Learning Accessible for Exploratory Data Analysis Through Visual Analytics. 2017. Report (Other academic)
    Abstract [en]

    Visual Analytics is a field of study that seeks to aid human cognition in the process of analyzing data. It aims at doing so through visual interfaces and automated computations. Such automated computations often translate to Machine Learning algorithms. Users can leverage these algorithms in order to find interesting patterns in massive and unstructured data. These algorithms are, however, still often regarded as black boxes, i.e., mathematical models which are hard to interpret and difficult to interact with. In a single Machine Learning library more than 40 variants of algorithms can be found, some with up to 14 different parameters. The Visual Analytics community has pointed out a lack of research in helping users set parameters and compare results between algorithms. Moreover, it is argued that most technology in the field which has aimed at bringing transparency to black boxes is embedded in domain-specific systems, thus making it hard to reach and use for other users and in other domains. In general, Machine Learning has an accessibility challenge: it is hard to use, to interpret and, if not either of these, to reach. This research proposal motivates the need for research in making Machine Learning more accessible for exploratory data analysis, reviews existing work in the field, presents a research plan and, finally, describes preliminary results.

  • 5.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Visualizing and Explaining Cluster Patterns: A Framework for the Exploratory Analysis of Large Multidimensional Datasets. 2019. Report (Other academic)
    Abstract [en]

    Large quantities of data are being collected and analyzed by companies and institutions, with the intention of drawing knowledge and value. Advances in storage, computation, automated analysis and visual and interactive techniques have facilitated this process. It is, however, not always clear how these can be brought together for the effective and efficient exploration, monitoring and/or processing of data. That is, knowing which automated techniques and frameworks to use for a given task, how to deploy and integrate them, how to interpret their results and processes, and how to ease their use through visual and interactive techniques.

    In the context of exploratory data analysis, where users approach data without preconceived hypotheses, this thesis proposal argues for the benefit of a framework describing the components, techniques and relations that contribute to the visualization and explanation of cluster patterns in large and multidimensional datasets. That is, a Visual Analytics framework aimed at supporting data scientists in their first steps towards understanding the overall structure of a dataset. The problem area is large and, therefore, the scope is limited to the visualization of cluster patterns in table-like datasets with meaningful attributes.

    This thesis proposal motivates the relevance of conducting research for the development of such a framework. It formalizes a set of research questions and presents a research plan based on the design science research method. Moreover, it provides a description of preliminary results as well as related background theory and state-of-the-art research.

  • 6.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Visualizing Cluster Patterns at Scale: A Model and a Library. 2021. Doctoral thesis, comprising articles (Other academic)
    Abstract [en]

    Large quantities of data are being collected and analyzed by companies and institutions, with the aim of extracting knowledge and value. When little is known about the data at hand, analysts engage in exploratory data analysis to achieve a better understanding. One approach in doing so is through the modeling and visualization of a dataset's structure, i.e., the neighborhood relations among its data points, and their distribution in the multidimensional space. Such a process allows users to disclose and discover neighborhoods, outliers and cluster patterns - insights that enable more informed subsequent analytical decisions.

    Visualizing the structure of multidimensional data (i.e., with four or more dimensions or features) is generally done via two steps: modeling distance or neighborhood relations among data points, and visually encoding those modeled relations. As datasets grow in size (number of data points) and dimensionality (number of features), different scalability challenges arise. High-dimensional datasets, on the one hand, are more sparse in the multidimensional space, making it more difficult to make meaningful assessments about distances during the modeling, which, in turn, hinders the meaningfulness of the visual representations. Large datasets, on the other hand, make it more difficult to maintain the usability of a system in terms of the effectiveness of the visual representation (due to clutter or overplotting), or the efficiency of the solution (time and memory-wise). Different approaches have been proposed to overcome these challenges, but they apply to a particular combination of modeling and visual encoding, and their usability still degrades when dealing with very large, potentially distributed, multidimensional datasets.

    Moreover, the availability or format of existing Visual Analytics solutions (i.e., solutions that aid data analysis through Machine Learning, visual and interactive techniques) for visualizing data structure - and, hence, cluster patterns - presents an accessibility challenge to the data science community. Namely, many solutions are either unavailable for use, thus requiring their re-implementation, or come as domain-dependent or standalone applications which are too rigid to use in other scenarios, or to integrate with other data analysis tools. 

    This thesis addresses these challenges and makes two contributions: a process model, describing a generic approach for the effective and efficient visualization of cluster patterns in large and multidimensional datasets; and an open source library for the interactive visualization of cluster patterns, even in distributed datasets, packaged in an accessible format that allows its integration with other tools within a data analysis environment. The process model suggests sampling and vector quantization to avoid cluttering and overplotting, as well as to improve the efficiency of the system in terms of memory and latency (i.e., times taken to produce visual feedback from the modeling process and from user interactions). The library instantiates one of the possible configurations of the model, using Apache Spark for distributed computations, the Growing Neural Gas for vector quantization, and Force-directed Placement for constructing the two-dimensional layout. Seven research publications provide empirical and theoretical groundings to the validity of both the model and the library.

  • 7.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Bae, Juhee
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Riveiro, Maria
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    Said, Alan
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningscentrum för Informationsteknologi.
    A Billiard Metaphor for Exploring Complex Graphs. 2017. In: Second Workshop on Supporting Complex Search Tasks / [ed] Marijn Koolen; Jaap Kamps; Toine Bogers; Nick Belkin; Diane Kelly; Emine Yilmaz, CEUR-WS, 2017, Vol. 1798, pp. 37-40. Conference paper (Refereed)
    Abstract [en]

    Exploring and revealing relations between the elements is a frequent task in exploratory analysis and search. Examples include that of correlations of attributes in complex data sets, or faceted search. Common visual representations for such relations are directed graphs or correlation matrices. These types of visual encodings are often, if not always, fully constructed before being shown to the user. This can be thought of as a top-down approach, where users are presented with a full picture for them to interpret and understand. Such a way of presenting data could lead to a visual overload, especially when it results in complex graphs with high degrees of nodes and edges. We propose a bottom-up alternative called Billiard where few elements are presented at first and from which a user can interactively construct the rest based on what s/he finds of interest. The concept is based on a billiard metaphor where a cue ball (node) has an effect on other elements (associated nodes) when struck against them.

  • 8.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Helldin, Tove
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Riveiro, Maria
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Bae, Juhee
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Boeva, Veselka
    Blekinge Institute of Technology, Department of Computer Science and Engineering.
    Falkman, Göran
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Lavesson, Niklas
    Jönköping University, School of Engineering.
    Towards a Taxonomy for Interpretable and Interactive Machine Learning. 2018. In: / [ed] David W. Aha, Trevor Darrell, Patrick Doherty, Daniele Magazzeni, 2018, pp. 151-157. Conference paper (Refereed)
    Abstract [en]

    We propose a taxonomy for classifying and describing papers which contribute to making Machine Learning (ML) techniques interactive and interpretable for users. The taxonomy is composed of six elements – Dataset, Optimizer, Model, Predictions, Evaluator and Goodness – where each can be made available for user interpretation and interaction. We give definitions to the terms interpretable and interactive in the context of user-oriented Machine Learning, describe the role of each of the elements in the taxonomy, and describe papers as seen through the lens of the proposed taxonomy.

  • 9.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Martins, Rafael M.
    Linnaeus University, Department of Computer Science and Media Technology.
    Paulovich, Fernando
    Dalhousie University, Faculty of Computer Science, Canada.
    Riveiro, Maria
    Jönköping University, School of Engineering.
    Progressive Multidimensional Projections: A Process Model based on Vector Quantization. 2020. In: Machine Learning Methods in Visualisation for Big Data / [ed] Daniel Archambault, Ian Nabney, Jaakko Peltonen, Eurographics - European Association for Computer Graphics, 2020, pp. 1-5. Conference paper (Refereed)
    Abstract [en]

    As large datasets become more common, so does the need for exploratory approaches that allow iterative, trial-and-error analysis. Without such solutions, hypothesis testing and exploratory data analysis may become cumbersome due to long waiting times for feedback from computationally intensive algorithms. This work presents a process model for progressive multidimensional projections (P-MDPs) that enables early feedback and user involvement in the process, complementing previous work by providing a lower level of abstraction and describing the specific elements that can be used to provide early system feedback, and those which can be enabled for user interaction. Additionally, we outline a set of design constraints that must be taken into account to ensure the usability of a solution regarding feedback time, visual cluttering, and the interactivity of the view. To address these constraints, we propose the use of incremental vector quantization (iVQ) as a core step within the process. To illustrate the feasibility of the model, and the usefulness of the proposed iVQ-based solution, we present a prototype that demonstrates how the different usability constraints can be accounted for, regardless of the size of a dataset.

  • 10.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Martins, Rafael M.
    Linnaeus University, Department of Computer Science and Media Technology.
    Paulovich, Fernando
    Dalhousie University, Faculty of Computer Science, Canada.
    Riveiro, Maria
    Jönköping University, School of Engineering.
    Scaling the Growing Neural Gas for Visual Cluster Analysis. 2021. In: Big Data Research, ISSN 2214-5796, E-ISSN 2214-580X, Vol. 26, article id 100254. Journal article (Refereed)
    Abstract [en]

    The growing neural gas (GNG) is an unsupervised topology learning algorithm that models a data space through interconnected units that stand on the populated areas of that space. Its output is a graph that can be visually represented on a two-dimensional plane, and be used as a means to disclose cluster patterns in datasets. GNG, however, creates highly connected graphs when trained on high-dimensional data, which in turn leads to highly cluttered representations that fail to disclose any meaningful patterns. Moreover, its sequential learning limits its potential for faster executions on local datasets, and, more importantly, its potential for training on distributed datasets while leveraging the computational resources of the infrastructures in which they reside.

    This paper presents two methods that improve GNG for the visualization of cluster patterns in large and high-dimensional datasets. The first one focuses on providing more meaningful and accurate cluster pattern representations of high-dimensional datasets, by avoiding connections that lead to high-dimensional graphs in the modeled topology, which may, in turn, lead to visual cluttering in 2D representations. The second method presented in this paper enables the use of GNG on big and distributed datasets with faster execution times, by modeling and merging separate parts of a dataset using the MapReduce model.

    Quantitative and qualitative evaluations show that the first method leads to the creation of lower-dimensional graph structures, which in turn provide more accurate and meaningful cluster representations; and that the second method preserves the accuracy and meaning of the cluster representations while enabling its execution in distributed settings.

  • 11.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Riveiro, Maria
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi. School of Engineering, University of Jönköping, Sweden.
    A comparative user study of visualization techniques for cluster analysis of multidimensional data sets. 2020. In: Information Visualization, ISSN 1473-8716, E-ISSN 1473-8724, Vol. 19, no. 4, pp. 318-338. Journal article (Refereed)
    Abstract [en]

    This article presents an empirical user study that compares eight multidimensional projection techniques for supporting the estimation of the number of clusters, k, embedded in six multidimensional data sets. The selection of the techniques was based on their intended design, or use, for visually encoding data structures, that is, neighborhood relations between data points or groups of data points in a data set. Concretely, we study: the difference between the estimates of k as given by participants when using different multidimensional projections; the accuracy of user estimations with respect to the number of labels in the data sets; the perceived usability of each multidimensional projection; whether user estimates disagree with k values given by a set of cluster quality measures; and whether there is a difference between experienced and novice users in terms of estimates and perceived usability. The results show that: dendrograms (from Ward's hierarchical clustering) are likely to lead to estimates of k that are different from those given with other multidimensional projections, while Star Coordinates and Radial Visualizations are likely to lead to similar estimates; t-Stochastic Neighbor Embedding is likely to lead to estimates which are closer to the number of labels in a data set; cluster quality measures are likely to produce estimates which are different from those given by users using Ward and t-Stochastic Neighbor Embedding; U-Matrices and reachability plots will likely have a low perceived usability; and there is no statistically significant difference between the answers of experienced and novice users. Moreover, as data dimensionality increases, cluster quality measures are likely to produce estimates which are different from those perceived by users using any of the assessed multidimensional projections. 
    It is also apparent that the inherent complexity of a data set, as well as the capability of each visual technique to disclose such complexity, has an influence on the perceived usability.

  • 12.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Riveiro, Maria
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi. Department of Computer Science and Informatics, School of Engineering, Jönköping University, Jönköping, Sweden.
    A Model for the Progressive Visualization of Multidimensional Data Structure. 2020. In: Computer Vision, Imaging and Computer Graphics Theory and Applications: 14th International Joint Conference, VISIGRAPP 2019, Prague, Czech Republic, February 25–27, 2019, Revised Selected Papers / [ed] Ana Paula Cláudio; Kadi Bouatouch; Manuela Chessa; Alexis Paljic; Andreas Kerren; Christophe Hurter; Alain Tremeau; Giovanni Maria Farinella, Cham: Springer, 2020, Vol. 1182, pp. 203-226. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    This paper presents a model for the progressive visualization and exploration of the structure of large datasets. That is, an abstraction on different components and relations which provide means for constructing a visual representation of a dataset’s structure, with continuous system feedback and enabled user interactions for computational steering, in spite of size. In this context, the structure of a dataset is regarded as the distance or neighborhood relationships among its data points. Size, on the other hand, is defined in terms of the number of data points. To prove the validity of the model, a proof-of-concept was developed as a Visual Analytics library for Apache Zeppelin and Apache Spark. Moreover, nine user studies were carried out in order to assess the usability of the library. The results from the user studies show that the library is useful for visualizing and understanding the emerging cluster patterns, for identifying relevant features, and for estimating the number of clusters.

  • 13.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Riveiro, Maria
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Visual Analytics Solutions as 'off-the-shelf' Libraries. 2017. In: 2017 21st International Conference Information Visualisation (IV): Computer Graphics, Imaging and Visualisation. Biomedical Visualization, Visualisation on Built and Rural Environments & Geometric Modelling and Imaging, IEETeL2017 / [ed] Ebad Banissi, Mark W. McK. Bannatyne, Fatma Bouali, Nuno Miguel Soares Datia, Georges Grinstein, Dennis Groth, Weidong Huang, Malinka Ivanova, Sarah Kenderdine, Minoru Nakayama, Joao Moura Pires, Muhammad Sarfraz, Marco Temperini, Anna Ursyn, Gilles Venturini, Theodor G. Wyeld, Jian J. Zhang, IEEE Computer Society, 2017, pp. 281-287. Conference paper (Refereed)
    Abstract [en]

    Visual Analytics has brought forward many solutions to different tasks such as exploring topics, understanding user and customer behavior, comparing genomes, or detecting anomalies. Many of these solutions, if not most, are standalone applications with technological contributions which cannot be easily taken for reuse in other domains, further improvement, benchmarking, or integration and deployment alongside other solutions. The latter can prove especially helpful for exploratory data analysis. This often leads researchers to re-implement solutions and thus to a suboptimal use of skills and resources. This paper further discusses the lack of off-the-shelf libraries for Visual Analytics, and proposes the creation of pluggable libraries on top of existing technologies such as Spark and Zeppelin. We provide an illustrative example of a pluggable Visual Analytics library using these technologies.

  • 14.
    Ventocilla, Elio
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Riveiro, Maria
    Högskolan i Skövde, Institutionen för informationsteknologi. Högskolan i Skövde, Forskningsmiljön Informationsteknologi.
    Visual Growing Neural Gas for Exploratory Data Analysis. 2019. In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications: Volume 3: IVAPP, 58-71, 2019, Prague, Czech Republic / [ed] Andreas Kerren; Christophe Hurter; Jose Braz, SciTePress, 2019, Vol. 3, pp. 58-71. Conference paper (Refereed)
    Abstract [en]

    This paper argues for the use of a topology learning algorithm, the Growing Neural Gas (GNG), for providing an overview of the structure of large and multidimensional datasets that can be used in exploratory data analysis. We introduce a generic, off-the-shelf library, Visual GNG, developed using the Big Data framework Apache Spark, which provides an incremental visualization of the GNG training process, and enables user-in-the-loop interactions where users can pause, resume or steer the computation by changing optimization parameters. Nine case studies were conducted with domain experts from different areas, each working on unique real-world datasets. The results show that Visual GNG contributes to understanding the distribution of multidimensional data; finding which features are relevant in such a distribution; estimating the number of clusters, k, to be used in traditional clustering algorithms, such as K-means; and finding outliers.
