High-Performance Computing For Support Vector Machines
University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre. (Skövde Artificial Intelligence Lab (SAIL))
2018 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Machine learning algorithms are very successful in solving classification and regression problems, but the immense amount of data created by digitalization slows down training and prediction, if the problems remain solvable at all. High-Performance Computing (HPC), and particularly parallel computing, is a promising tool for improving the time performance of machine learning algorithms. The Support Vector Machine (SVM) is one of the most popular supervised machine learning techniques that benefit from advances in HPC to overcome problems with big data; however, efficient parallel implementation of SVMs is a complex endeavour. While there are many parallel techniques to improve the performance of SVMs, there is no clear roadmap for every application scenario. This thesis is based on a collection of publications. It addresses the problems surrounding parallel implementations of SVMs through four research questions, all of which are answered through three research articles.

In the first research question, the thesis investigates the influence of important factors, such as parallel algorithms, HPC tools, and heuristics, on the efficiency of parallel SVM implementations. This leads to identifying the state-of-the-art parallel implementations of SVMs, their pros and cons, and possible avenues for future research. It remains up to the user to balance computation time against classification accuracy. In the second research question, the thesis explores the impact of changes in problem size and the values of the corresponding SVM parameters that lead to significant performance. This addresses the impact of the problem size on the optimal choice of important parameters. In addition, the thesis shows the existence of a threshold between the number of cores and the training time. In the third research question, the thesis investigates the impact of the network topology on the performance of a network-based SVM. This leads to three key contributions. The first is to show how much the expansion property of the network impacts the convergence. The second is to show which network topology is preferable for using the computing power efficiently. The third is to supply an implementation that makes the theoretical advances practically available. The results show that graphs with large spectral gaps and higher degrees exhibit accelerated convergence. In the last research question, the thesis combines all contributions of the articles and offers recommendations towards implementing an efficient framework for SVMs for large-scale problems.
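As background to the abstract above, the following minimal sketch shows what training an SVM classifier amounts to. It is hypothetical and illustrative only, not the thesis's parallel implementation: a linear SVM fitted by stochastic subgradient descent on the hinge loss (Pegasos-style), using only NumPy.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Find w minimizing lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w)))
    by stochastic subgradient descent (labels y in {-1, +1})."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)      # decaying step size
            margin = y[i] * (X[i] @ w)
            w *= (1 - eta * lam)       # shrink from the regularizer
            if margin < 1:             # hinge-loss subgradient step
                w += eta * y[i] * X[i]
    return w

# Linearly separable toy data: two Gaussian blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(+2.0, 0.5, size=(100, 2)),
               rng.normal(-2.0, 0.5, size=(100, 2))])
y = np.array([1] * 100 + [-1] * 100)

w = train_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
```

The serial loop over samples is exactly the part that large-scale data makes expensive, which is what motivates the parallel formulations studied in the thesis.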

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2018, p. 115
Series
Dissertation Series; 26 (2018)
National Category
Computer Sciences
Research subject
Skövde Artificial Intelligence Lab (SAIL); INF301 Data Science
Identifiers
URN: urn:nbn:se:his:diva-16556
ISBN: 978-91-984187-8-1 (print)
OAI: oai:DiVA.org:his-16556
DiVA, id: diva2:1278490
Presentation
2019-01-28, G207, Skövde, 13:15 (English)
Opponent
Supervisors
Available from: 2019-01-22 Created: 2019-01-14 Last updated: 2019-02-14
Bibliographically approved
List of papers
1. Parallel Computing of Support Vector Machines: A Survey
2019 (English) In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 51, no 6, p. 123:1-123:38, article id 123. Article, review/survey (Refereed). Published
Abstract [en]

The immense amount of data created by digitalization requires parallel computing for machine-learning methods. While there are many parallel implementations for support vector machines (SVMs), there is no clear suggestion for every application scenario. Many factors, including the optimization algorithm, problem size and dimension, kernel function, parallel programming stack, and hardware architecture, impact the efficiency of implementations. It is up to the user to balance trade-offs, particularly between computation time and classification accuracy. In this survey, we review the state-of-the-art implementations of SVMs, their pros and cons, and suggest possible avenues for future research.
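One reason the factors above matter, especially problem size and kernel function, is that kernel SVMs work with an n-by-n Gram matrix, so memory and computation grow quadratically with the number of samples. A small illustrative sketch (assuming an RBF kernel; this is not code from the survey):

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Full RBF Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0))      # clamp tiny negative round-off

rng = np.random.default_rng(0)
for n in (500, 1000, 2000):
    K = rbf_kernel(rng.normal(size=(n, 3)))
    # n x n float64 entries: doubling n quadruples the memory footprint.
    print(n, K.shape, K.nbytes / 1e6, "MB")
```

At a few hundred thousand samples the full matrix no longer fits in memory, which is why the surveyed implementations rely on decomposition, low-rank approximation, or distribution across nodes.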

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
Dual optimization, primal optimization, decomposition, CPU parallelism, GPU parallelism, speedup, data movement
National Category
Engineering and Technology Computer Sciences Computer and Information Sciences Information Systems
Research subject
INF301 Data Science; Skövde Artificial Intelligence Lab (SAIL)
Identifiers
urn:nbn:se:his:diva-16554 (URN)
10.1145/3280989 (DOI)
000460376100014 ()
2-s2.0-85061196907 (Scopus ID)
Funder
Knowledge Foundation
Available from: 2019-01-14 Created: 2019-01-14 Last updated: 2019-07-10
Bibliographically approved
2. Empirical Study of Time Efficiency and Accuracy of Support Vector Machines Using an Improved Version of PSVM
2015 (English) In: Proceedings of the 2015 International Conference on Parallel and Distributed Processing Techniques and Applications: PDPTA 2015: Volume 1 / [ed] Hamid R. Arabnia, Hiroshi Ishii, Kazuki Joe, Hiroaki Nishikawa, Havaru Shouno, Printed in the United States of America: CSREA Press, 2015, Vol. 1, p. 177-183. Conference paper, Published paper (Refereed)
Abstract [en]

We present a significantly improved implementation of a parallel SVM algorithm (PSVM) together with a comprehensive experimental study. The Support Vector Machine (SVM) is one of the most well-known machine learning classification techniques. PSVM employs the Interior Point Method, a solver for SVM problems with a high potential for parallelism. We improve PSVM with respect to its structure and memory management for contemporary processor architectures. We perform a number of experiments and study the impact of the reduced column size p and other important parameters, such as C and gamma, on the class-prediction accuracy and training time. The experimental results show that there exists a threshold between the number of computational cores and the training time, and that choosing an appropriate value of p affects the choice of the C and gamma parameters as well as the accuracy.
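The reduced column size p refers to a low-rank factorization of the kernel matrix: PSVM itself uses incomplete Cholesky factorization, which keeps only p columns of the n-by-n Gram matrix. As an illustrative analogue (a hypothetical sketch, not the PSVM code), the related Nyström method shows the same trade-off, where approximation quality improves as p grows:

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """RBF kernel block between row sets X and Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0))

def nystrom(X, p, gamma=0.5, seed=0):
    """Rank-p Nystrom approximation K ~= C @ pinv(W) @ C.T built from
    p sampled columns -- an analogue of PSVM's reduced column size
    (PSVM uses incomplete Cholesky factorization instead)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=p, replace=False)
    C = rbf(X, X[idx], gamma)        # n x p sampled columns
    W = rbf(X[idx], X[idx], gamma)   # p x p block on the sample
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
K = rbf(X, X)
for p in (10, 50, 200):
    err = np.linalg.norm(K - nystrom(X, p)) / np.linalg.norm(K)
    print(f"p={p:3d}  relative error {err:.2e}")
```

Small p means less memory and faster solves but a coarser kernel, which is consistent with the paper's finding that p interacts with the choice of C and gamma and with the final accuracy.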

Place, publisher, year, edition, pages
Printed in the United States of America: CSREA Press, 2015
Keywords
parallel svm, processor technology, training time
National Category
Computer Sciences
Research subject
Technology; Skövde Artificial Intelligence Lab (SAIL)
Identifiers
urn:nbn:se:his:diva-11644 (URN)
1-60132-400-6 (ISBN)
1-60132-401-4 (ISBN)
1-60132-402-2 (ISBN)
Conference
PDPTA'15 - The 21st International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, July 27-30, 2015
Available from: 2015-10-30 Created: 2015-10-30 Last updated: 2019-01-14
Bibliographically approved
3. Effect Of Network Topology On The Performance Of ADMM-based SVMs
2018 (English) In: Proceedings 2018 30th International Symposium on Computer Architecture and High Performance Computing SBAC-PAD 2018: Lyon, France, 24-27 September 2018, IEEE Computer Society, 2018, p. 388-393. Conference paper, Published paper (Other academic)
Abstract [en]

The Alternating Direction Method of Multipliers (ADMM) is one of the promising frameworks for training Support Vector Machines (SVMs) on large-scale data in a distributed manner. In consensus-based ADMM, nodes may only communicate with one-hop neighbors, which may cause slow convergence. In this paper, we investigate the impact of network topology on the convergence speed of ADMM-based SVMs using expander graphs. In particular, we investigate how much the expansion property of the network influences the convergence and which topology is preferable. In addition, we supply an implementation making these theoretical advances practically available. The results of the experiments show that graphs with large spectral gaps and higher degrees exhibit accelerated convergence.
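The roles of the spectral gap and node degree can be made concrete on two extreme topologies: a ring (degree 2, poor expander) and a complete graph (maximal degree, ideal expander). The sketch below is illustrative only, not the paper's implementation; it compares the algebraic connectivity of the two graphs, a standard proxy for how fast consensus spreads among one-hop neighbors.

```python
import numpy as np

def laplacian_gap(A):
    """Algebraic connectivity: second-smallest eigenvalue of the
    graph Laplacian L = D - A. A larger gap means information
    mixes faster over one-hop communication."""
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))[1]

n = 16
# Ring: each node talks to exactly 2 neighbors.
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i + 1) % n] = ring[(i + 1) % n, i] = 1
# Complete graph: every node talks to every other node.
complete = np.ones((n, n)) - np.eye(n)

print("ring gap    :", laplacian_gap(ring))      # 2(1 - cos(2*pi/n)), small
print("complete gap:", laplacian_gap(complete))  # n, large
```

The complete graph's gap grows with n while the ring's shrinks toward zero, matching the paper's observation that large spectral gaps and higher degrees accelerate convergence; expander graphs are attractive precisely because they achieve large gaps at low, constant degree.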

Place, publisher, year, edition, pages
IEEE Computer Society, 2018
Series
International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), ISSN 1550-6533
Keywords
ADMM, SVMs, Expander Graphs, Distributed Optimization, Convergence
National Category
Engineering and Technology Computer and Information Sciences Information Systems Computer Sciences
Identifiers
urn:nbn:se:his:diva-16555 (URN)
10.1109/CAHPC.2018.8645857 (DOI)
2-s2.0-85063145524 (Scopus ID)
978-1-5386-7769-8 (ISBN)
978-1-5386-7770-4 (ISBN)
Conference
High Performance Machine Learning (HPML2018) workshop, September 24, 2018, Lyon, France
Available from: 2019-01-14 Created: 2019-01-14 Last updated: 2019-04-10
Bibliographically approved

Open Access in DiVA

fulltext (4329 kB)
File information
File name: FULLTEXT01.pdf
File size: 4329 kB
Checksum (SHA-512): 8739dbdbcbc667e0ffc181489f89894be44cc81be7c48b8bc635327075114af746741aa715db31ee3f2501bfe2aee50b2fd2fe6b7a58c09213d1bc47942bee08
Type: fulltext
Mimetype: application/pdf

Authority records

Tavara, Shirin