Högskolan i Skövde

  • 1.
    Tavara, Shirin
    University of Skövde, School of Informatics. University of Skövde, Informatics Research Environment. Data Science and AI division, Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Sweden.
    Distributed and federated learning of support vector machines and applications. 2022. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Machine Learning (ML) has achieved remarkable success in solving classification, regression, and related problems over the past decade. In particular, the exponential growth of digital data makes the use of ML inevitable and necessary to exploit the wealth of information hidden inside the data. However, this poses new challenges for traditional serial learning methods concerning scalability, as the learning process becomes slow, if it is feasible at all. Distributed and parallel learning are natural approaches for improving the performance of ML algorithms in terms of running time. Support Vector Machines (SVMs) are successful and popular supervised machine learning models that benefit from the availability of parallel and distributed computational approaches to tackle the challenges arising from big data. However, efficient parallel and distributed implementation of SVMs is a complex endeavour. To make matters worse, distributed learning of SVMs may raise privacy concerns, particularly with models built from data distributed across multiple nodes where, due to confidentiality of data, intellectual property interests, or other reasons, the information cannot be easily shared. Despite the large amount of prior work, the problems regarding learning SVMs with respect to big data are not fully solved. Therefore, in this dissertation, we try to shed light on efficient parallel SVM implementations and on how they can be further improved. This dissertation is based on a collection of publications. The research presented here addresses some problems in parallel and distributed computing of SVMs through four research questions. The key contribution is to provide answers to these research questions through five research articles. For the first research question, we explore available parallel approaches for learning SVMs for large-scale problems.
    We investigate important factors such as algorithmic approaches, HPC tools, strategies, and heuristics used for effective parallel SVM implementations. All of these helped identify the state-of-the-art parallel SVMs and their pros and cons, and provide suggestions for potential avenues for future studies. We conclude that it is the responsibility of the user to make judicious choices to balance the trade-offs. To address the second research question, we explore the impact of changes in the problem size and the important SVM parameters that lead to significant performance improvements. It turns out that the problem size has an impact on the optimal choice of the important SVM parameters. In addition, we show the existence of a threshold on the number of cores after which the training time does not improve. The third research question investigates the effect of network topology on the performance of network-based distributed SVMs in terms of convergence. The three key contributions are to 1) show the effect of the expansion property and the connectivity of the underlying communication network on the convergence of the algorithm, 2) present a preferable network topology, and 3) supply an implementation that makes the theoretical advances practically available. The results suggest that graphs with higher node degrees and larger spectral gaps, and thus higher connectivity, exhibit accelerated convergence. The fourth research question investigates federated learning of SVMs in which data privacy is protected under limited and private communication between agents. The key contribution is incorporating differential privacy during the learning procedure. The results show that the learning process 1) respects data privacy, 2) achieves accuracy comparable to that of the algorithm without privacy preservation, and 3) yields tight empirical guarantees for privacy after convergence.
    Finally, we combine all the contributions in the dissertation and offer recommendations for implementing an efficient privacy-preserving framework for parallel computing of SVMs for large-scale problems.

  • 2.
    Tavara, Shirin
    University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre.
    High-Performance Computing For Support Vector Machines. 2018. Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    Machine learning algorithms are very successful in solving classification and regression problems; however, the immense amount of data created by digitalization slows down the training and prediction processes, if the problems are solvable at all. High-Performance Computing (HPC), and particularly parallel computing, are promising tools for improving the performance of machine learning algorithms in terms of time. The Support Vector Machine (SVM) is one of the most popular supervised machine learning techniques that benefit from the advancement of HPC to overcome the problems regarding big data; however, efficient parallel implementation of SVMs is a complex endeavour. While there are many parallel techniques to improve the performance of SVMs, there is no clear roadmap for every application scenario. This thesis is based on a collection of publications. It addresses the problems regarding parallel implementations of SVMs through four research questions, all of which are answered through three research articles. In the first research question, the thesis investigates the effect of important factors such as parallel algorithms, HPC tools, and heuristics on the efficiency of parallel SVM implementations. This leads to identifying the state-of-the-art parallel implementations of SVMs and their pros and cons, and to suggesting possible avenues for future research. It is up to the user to create a balance between the computation time and the classification accuracy. In the second research question, the thesis explores the impact of changes in problem size and the values of the corresponding SVM parameters that lead to significant performance improvements. This leads to addressing the impact of the problem size on the optimal choice of important parameters. In addition, the thesis shows the existence of a threshold on the number of cores beyond which the training time does not improve. In the third research question, the thesis investigates the impact of the network topology on the performance of a network-based SVM. This leads to three key contributions.
    The first contribution is to show how much the expansion property of the network impacts the convergence. The second is to show which network topology is preferable for using the computing power efficiently. The third is to supply an implementation that makes the theoretical advances practically available. The results show that graphs with large spectral gaps and higher degrees exhibit accelerated convergence. In the last research question, the thesis combines all contributions in the articles and offers recommendations towards implementing an efficient framework for SVMs regarding large-scale problems.

  • 3.
    Tavara, Shirin
    University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre. University of Borås.
    Parallel Computing of Support Vector Machines: A Survey. 2019. In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 51, no 6, p. 123:1-123:38, article id 123. Article, review/survey (Refereed)
    Abstract [en]

    The immense amount of data created by digitalization requires parallel computing for machine-learning methods. While there are many parallel implementations for support vector machines (SVMs), there is no clear suggestion for every application scenario. Many factors (including optimization algorithm, problem size and dimension, kernel function, parallel programming stack, and hardware architecture) impact the efficiency of implementations. It is up to the user to balance trade-offs, particularly between computation time and classification accuracy. In this survey, we review the state-of-the-art implementations of SVMs, their pros and cons, and suggest possible avenues for future research.

  • 4.
    Tavara, Shirin
    et al.
    Information Technology, University of Borås, Borås, Sweden.
    Schliep, Alexander
    Computer Science and Engineering, University of Gothenburg, Gothenburg, Sweden.
    Effect Of Network Topology On The Performance Of ADMM-based SVMs. 2018. In: Proceedings 2018 30th International Symposium on Computer Architecture and High Performance Computing SBAC-PAD 2018: Lyon, France 24-27 September 2018, IEEE Computer Society, 2018, p. 388-393. Conference paper (Other academic)
    Abstract [en]

    The Alternating Direction Method of Multipliers (ADMM) is one of the promising frameworks for training Support Vector Machines (SVMs) on large-scale data in a distributed manner. In a consensus-based ADMM, nodes may only communicate with one-hop neighbors, and this may cause slow convergence. In this paper, we investigate the impact of network topology on the convergence speed of ADMM-based SVMs using expander graphs. In particular, we investigate how much the expansion property of the network influences the convergence and which topology is preferable. In addition, we supply an implementation that makes these theoretical advances practically available. The results of the experiments show that graphs with large spectral gaps and higher degrees exhibit accelerated convergence.

  • 5.
    Tavara, Shirin
    et al.
    Data Science and AI division, Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden.
    Schliep, Alexander
    Data Science and AI division, Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden.
    Effects of network topology on the performance of consensus and distributed learning of SVMs using ADMM. 2021. In: PeerJ Computer Science, E-ISSN 2376-5992, Vol. 7, article id e397. Article in journal (Refereed)
    Abstract [en]

    The Alternating Direction Method of Multipliers (ADMM) is a popular and promising distributed framework for solving large-scale machine learning problems. We consider decentralized consensus-based ADMM in which nodes may only communicate with one-hop neighbors. This may cause slow convergence. We investigate the impact of network topology on the performance of ADMM-based learning of Support Vector Machines using expander and mean-degree graphs, and additionally some of the common modern network topologies. In particular, we investigate to what degree the expansion property of the network influences the convergence in terms of iterations, training time, and communication time. We furthermore suggest which topology is preferable. Additionally, we provide an implementation that makes these theoretical advances easily available. The results show that the performance of decentralized ADMM-based learning of SVMs in terms of convergence is improved using graphs with large spectral gaps and higher, homogeneous degrees.
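
The effect of one-hop communication on convergence can be illustrated with a much simpler process than ADMM. The following minimal sketch is not the authors' implementation: it uses plain neighbor averaging as a stand-in for a consensus step, and the helper names (`ring_neighbors`, `complete_neighbors`, `consensus_iterations`) are hypothetical. It only shows that a sparsely connected ring needs many more rounds to agree than a fully connected network.

```python
def ring_neighbors(n):
    """Each node is linked to its two adjacent nodes (sparse topology)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def complete_neighbors(n):
    """Each node is linked to every other node (dense topology)."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def consensus_iterations(neighbors, values, tol=1e-6, max_iter=10_000):
    """Count rounds of neighbor averaging until all nodes agree within tol.

    Assumption: plain averaging is used here instead of ADMM updates.
    """
    x = [float(v) for v in values]
    for iteration in range(1, max_iter + 1):
        # Each node replaces its value by the mean over itself
        # and its one-hop neighbors.
        x = [
            (x[i] + sum(x[j] for j in neighbors[i])) / (len(neighbors[i]) + 1)
            for i in range(len(x))
        ]
        if max(x) - min(x) < tol:
            return iteration
    return max_iter


n = 8
start = list(range(n))
ring_rounds = consensus_iterations(ring_neighbors(n), start)
complete_rounds = consensus_iterations(complete_neighbors(n), start)
print("ring:", ring_rounds, "complete:", complete_rounds)
```

In this toy setting the complete graph reaches agreement after a single round, because every node sees all values at once, while the ring needs many rounds. This is consistent in spirit with the reported result that better-connected graphs converge faster, but it makes no claim about the actual ADMM iteration counts.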

  • 6.
    Tavara, Shirin
    et al.
    CSE, University of Gothenburg ; Chalmers University of Technology, Gothenburg, Sweden.
    Schliep, Alexander
    CSE, University of Gothenburg ; Chalmers University of Technology, Gothenburg, Sweden.
    Basu, Debabrota
    Equipe Scool, Inria, UMR 9189-CRIStAL, CNRS, Univ. Lille, Centrale Lille, Lille, France.
    Federated Learning of Oligonucleotide Drug Molecule Thermodynamics with Differentially Private ADMM-Based SVM. 2022. In: Machine Learning and Principles and Practice of Knowledge Discovery in Databases: International Workshops of ECML PKDD 2021, Virtual Event, September 13-17, 2021, Proceedings, Part II / [ed] Michael Kamp; Irena Koprinska; Adrien Bibal; Tassadit Bouadi; Benoît Frénay; Luis Galárraga; José Oramas; Linara Adilova; Yamuna Krishnamurthy; Bo Kang; Christine Largeron; Jefrey Lijffijt; Tiphaine Viard; Pascal Welke; Massimiliano Ruocco; Erlend Aune; Claudio Gallicchio; Gregor Schiele; Franz Pernkopf; Michaela Blott; Holger Fröning; Günther Schindler; Riccardo Guidotti; Anna Monreale; Salvatore Rinzivillo; Przemyslaw Biecek; Eirini Ntoutsi; Mykola Pechenizkiy; Bodo Rosenhahn; Christopher Buckley; Daniela Cialfi; Pablo Lanillos; Maxwell Ramstead; Tim Verbelen; Pedro M. Ferreira; Giuseppina Andresini; Donato Malerba; Ibéria Medeiros; Philippe Fournier-Viger; M. Saqib Nawaz; Sebastian Ventura; Meng Sun; Min Zhou; Valerio Bitetta; Ilaria Bordino; Andrea Ferretti; Francesco Gullo; Giovanni Ponti; Lorenzo Severini; Rita Ribeiro; João Gama; Ricard Gavaldà; Lee Cooper; Naghmeh Ghazaleh; Jonas Richiardi; Damian Roqueiro; Diego Saldana Miranda; Konstantinos Sechidis; Guilherme Graça, Springer Nature Switzerland AG, 2022, Vol. 1, p. 459-467. Conference paper (Refereed)
    Abstract [en]

    A crucial step to assure drug safety is predicting off-target binding. For oligonucleotide drugs this requires learning the relevant thermodynamics from often large-scale data distributed across different organisations. This process will respect data privacy if distributed and private learning under limited and private communication between local nodes is used. We propose an ADMM-based SVM with differential privacy for this purpose. We empirically show that this approach achieves accuracy comparable to the non-private one, i.e. ∼86%, while yielding tight empirical privacy guarantees even after convergence. 

  • 7.
    Tavara, Shirin
    et al.
    University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre. Department of Information Technology, University of Borås, Borås, Sweden.
    Sundell, Håkan
    Department of Information Technology, University of Borås, Borås, Sweden.
    Dahlbom, Anders
    University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre.
    Empirical Study of Time Efficiency and Accuracy of Support Vector Machines Using an Improved Version of PSVM. 2015. In: Proceedings of the 2015 International Conference on Parallel and Distributed Processing Techniques and Applications: PDPTA 2015: Volume 1 / [ed] Hamid R. Arabnia; Hiroshi Ishii; Kazuki Joe; Hiroaki Nishikawa; Havaru Shouno, Printed in the United States of America: CSREA Press, 2015, Vol. 1, p. 177-183. Conference paper (Refereed)
    Abstract [en]

    We present a significantly improved implementation of a parallel SVM algorithm (PSVM) together with a comprehensive experimental study. The Support Vector Machine (SVM) is one of the most well-known machine learning classification techniques. PSVM employs the Interior Point Method, a solver for SVM problems that has a high potential for parallelism. We improve PSVM regarding its structure and memory management for contemporary processor architectures. We perform a number of experiments and study the impact of the reduced column size p and other important parameters, such as C and gamma, on the class-prediction accuracy and training time. The experimental results show that there exists a threshold on the number of computational cores beyond which the training time does not improve, and that choosing an appropriate value of p affects the choice of the C and gamma parameters as well as the accuracy.
