Högskolan i Skövde

Empirical Study of Time Efficiency and Accuracy of Support Vector Machines Using an Improved Version of PSVM
University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre. Department of Information Technology, University of Borås, Borås, Sweden. (Skövde Artificial Intelligence Lab (SAIL)) ORCID iD: 0000-0003-0669-9978
Department of Information Technology, University of Borås, Borås, Sweden.
University of Skövde, School of Informatics. University of Skövde, The Informatics Research Centre. (Skövde Artificial Intelligence Lab (SAIL))
2015 (English). In: Proceedings of the 2015 International Conference on Parallel and Distributed Processing Techniques and Applications: PDPTA 2015: Volume 1 / [ed] Hamid R. Arabnia; Hiroshi Ishii; Kazuki Joe; Hiroaki Nishikawa; Havaru Shouno, Printed in the United States of America: CSREA Press, 2015, Vol. 1, p. 177-183. Conference paper, Published paper (Refereed)
Abstract [en]

We present a significantly improved implementation of a parallel SVM algorithm (PSVM) together with a comprehensive experimental study. Support Vector Machines (SVMs) are among the best-known machine learning classification techniques. PSVM employs the Interior Point Method, a solver for SVM problems that has high potential for parallelism. We improve the structure and memory management of PSVM for contemporary processor architectures. We perform a number of experiments and study the impact of the reduced column size p and other important parameters, such as C and gamma, on class-prediction accuracy and training time. The experimental results show that there is a threshold on the number of computational cores beyond which the training time no longer improves, and that the choice of p affects both the appropriate choice of the C and gamma parameters and the accuracy.
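To illustrate the role of the reduced column size p: PSVM-style solvers typically replace the full n × n kernel matrix with a low-rank n × p factor obtained by incomplete Cholesky factorization. The abstract does not describe the procedure, so the following is only a minimal sketch under that assumption. The function names, the RBF kernel, and the sample points are hypothetical and are not taken from the paper.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def incomplete_cholesky(points, gamma, p):
    """Pivoted incomplete Cholesky factorization (illustrative, not the paper's code).

    Returns H with n rows and p columns such that H @ H.T approximates
    the n x n RBF kernel matrix of `points`.
    """
    n = len(points)
    diag = [rbf_kernel(x, x, gamma) for x in points]  # residual diagonal
    H = [[0.0] * p for _ in range(n)]
    for k in range(p):
        pivot = max(range(n), key=lambda i: diag[i])
        if diag[pivot] < 1e-10:
            break  # residual is numerically zero; remaining columns stay zero
        root = math.sqrt(diag[pivot])
        for i in range(n):
            k_ip = rbf_kernel(points[i], points[pivot], gamma)
            proj = sum(H[i][j] * H[pivot][j] for j in range(k))
            H[i][k] = (k_ip - proj) / root
        for i in range(n):
            diag[i] -= H[i][k] ** 2
    return H


def max_abs_error(points, gamma, H):
    """Largest entrywise gap between the kernel matrix and H @ H.T."""
    n = len(points)
    return max(
        abs(rbf_kernel(points[i], points[j], gamma)
            - sum(a * b for a, b in zip(H[i], H[j])))
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    # Hypothetical sample data: 30 deterministic points in the unit square.
    pts = [(i / 30.0, (i * 7 % 30) / 30.0) for i in range(30)]
    for p in (2, 5, 10, 30):
        H = incomplete_cholesky(pts, gamma=5.0, p=p)
        print(p, max_abs_error(pts, 5.0, H))
```

A smaller p means less memory and less work per solver iteration, at the price of a coarser kernel approximation, which is one plausible reason why p interacts with the choice of C and gamma.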

Place, publisher, year, edition, pages
Printed in the United States of America: CSREA Press, 2015. Vol. 1, p. 177-183
Keywords [en]
parallel svm, processor technology, training time
National Category
Computer Sciences
Research subject
Technology; Skövde Artificial Intelligence Lab (SAIL)
Identifiers
URN: urn:nbn:se:his:diva-11644; ISBN: 1-60132-400-6 (print); ISBN: 1-60132-401-4 (print); ISBN: 1-60132-402-2 (print); OAI: oai:DiVA.org:his-11644; DiVA, id: diva2:866010
Conference
PDPTA'15 - The 21st International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, July 27-30, 2015
Available from: 2015-10-30 Created: 2015-10-30 Last updated: 2023-02-01 Bibliographically approved
In thesis
1. High-Performance Computing For Support Vector Machines
2018 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Machine learning algorithms are very successful in solving classification and regression problems. However, the immense amount of data created by digitalization slows down the training and prediction processes, if the problems remain solvable at all. High-Performance Computing (HPC), and parallel computing in particular, are promising tools for improving the performance of machine learning algorithms in terms of time. Support Vector Machines (SVMs) are among the most popular supervised machine learning techniques, and they benefit from advances in HPC to overcome the problems posed by big data. However, implementing SVMs efficiently in parallel is a complex endeavour. While there are many parallel techniques for improving the performance of SVMs, there is no clear roadmap for every application scenario. This thesis is based on a collection of publications. It addresses the problems of parallel SVM implementation through four research questions, all of which are answered through three research articles. In the first research question, the thesis investigates how important factors such as parallel algorithms, HPC tools, and heuristics affect the efficiency of parallel SVM implementations. This leads to identifying the state-of-the-art parallel implementations of SVMs and their pros and cons, and to suggesting possible avenues for future research. It is up to the user to strike a balance between computation time and classification accuracy. In the second research question, the thesis explores the impact of changes in problem size and the values of the corresponding SVM parameters that lead to significant performance. This leads to addressing the impact of the problem size on the optimal choice of important parameters. In addition, the thesis shows the existence of a threshold between the number of cores and the training time. In the third research question, the thesis investigates the impact of the network topology on the performance of a network-based SVM. This leads to three key contributions. The first is to show how much the expansion property of the network impacts convergence. The second is to show which network topology is preferable for using the computing power efficiently. The third is to supply an implementation that makes the theoretical advances practically available. The results show that graphs with large spectral gaps and higher degrees exhibit accelerated convergence. In the last research question, the thesis combines all contributions in the articles and offers recommendations towards implementing an efficient framework for SVMs for large-scale problems.
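One way to picture the threshold between the number of cores and the training time is a toy cost model in which the parallel part of the work shrinks with the number of cores while a coordination cost grows with it. The abstract states no such model, so this is a sketch under stated assumptions: the cost formula, the function names, and all numbers below are hypothetical.

```python
def modelled_training_time(cores, serial=5.0, parallel=400.0, overhead_per_core=1.0):
    """Toy model (an assumption, not from the thesis):
    fixed serial work + parallel work divided across cores
    + a coordination cost that grows linearly with the core count."""
    return serial + parallel / cores + overhead_per_core * cores


def best_core_count(max_cores, **model_params):
    """Core count in 1..max_cores that minimises the modelled training time."""
    return min(
        range(1, max_cores + 1),
        key=lambda n: modelled_training_time(n, **model_params),
    )


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 40, 64):
        print(n, round(modelled_training_time(n), 2))
    print("best:", best_core_count(64))
```

Under this model, adding cores beyond the minimiser increases the modelled time again. Where the real threshold lies depends on the hardware and the problem size, which the model does not capture.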

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2018. p. 115
Series
Dissertation Series ; 26 (2018)
National Category
Computer Sciences
Research subject
Skövde Artificial Intelligence Lab (SAIL); INF301 Data Science
Identifiers
urn:nbn:se:his:diva-16556 (URN); 978-91-984187-8-1 (ISBN)
Presentation
2019-01-28, G207, Skövde, 13:15 (English)
Available from: 2019-01-22 Created: 2019-01-14 Last updated: 2023-01-02 Bibliographically approved
2. Distributed and federated learning of support vector machines and applications
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Machine Learning (ML) has achieved remarkable success in solving classification, regression, and related problems over the past decade. In particular, the exponential growth of digital data makes using ML inevitable and necessary to exploit the wealth of information hidden inside the data. However, this poses new scalability challenges for traditional serial learning methods, as the learning process becomes slow, if it is feasible at all. Distributed and parallel learning are natural approaches for improving the performance of ML algorithms in terms of running time. Support Vector Machines (SVMs) are successful and popular supervised machine learning models that benefit from the availability of parallel and distributed computational approaches to tackle the challenges arising from big data. However, efficient parallel and distributed implementation of SVMs is a complex endeavour. To make matters worse, distributed learning of SVMs may raise privacy concerns, particularly with models built from data distributed across multiple nodes where, due to confidentiality of data, intellectual property interests, or other reasons, the information cannot be easily shared. Despite the large amount of prior work, the problems of learning SVMs with respect to big data are not fully solved. Therefore, in this dissertation, we try to shed light on efficient parallel SVM implementations and on how they can be further improved. This dissertation is based on a collection of publications. The research presented here addresses some problems in parallel and distributed computing of SVMs through four research questions. The key contribution is to provide answers to these research questions through five research articles. For the first research question, we explore available parallel approaches for learning SVMs for large-scale problems. We investigate important factors such as algorithmic approaches, HPC tools, strategies, and heuristics used for effective parallel SVM implementations. All of these helped to identify the state-of-the-art parallel SVMs and their pros and cons, and to provide suggestions for potential avenues for future studies. We conclude that it is the responsibility of the user to make judicious choices to balance the trade-offs. To address the second research question, we explore the impact of changes in the problem size and the important SVM parameters that lead to significant performance improvements. It turns out that the problem size has an impact on the optimal choice of the important SVM parameters. In addition, we show the existence of a threshold on the number of cores after which the training time does not improve. The third research question investigates the effect of network topology on the performance of network-based distributed SVMs in terms of convergence. The three key contributions are to 1) show the effect of the expansion property and the connectivity of the underlying communication network on the convergence of the algorithm, 2) present a preferable network topology, and 3) supply an implementation that makes the theoretical advances practically available. The results suggest that graphs with higher node degrees and larger spectral gaps, and thus higher connectivity, exhibit accelerated convergence. The fourth research question investigates federated learning of SVMs in which data privacy is protected under limited and private communication between agents. The key contribution is incorporating differential privacy during the learning procedure. The results show that the learning process 1) respects data privacy, 2) achieves accuracy comparable to the algorithm without privacy preservation, and 3) yields tight empirical guarantees for privacy after convergence. Finally, we combine all the contributions in the dissertation and offer recommendations for implementing an efficient privacy-preserving framework for parallel computing of SVMs for large-scale problems.
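The effect of network connectivity on convergence can be illustrated with plain consensus averaging, used here as a simplified stand-in for the network-based distributed SVM solver, which the abstract does not specify. In this minimal sketch each node repeatedly moves its value towards its neighbours' values, and we count the rounds needed until all nodes agree. The graph constructors, the step-size rule, and the tolerance are assumptions made for illustration only.

```python
def ring_graph(n):
    """Each node is connected to its two ring neighbours (degree 2)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def complete_graph(n):
    """Each node is connected to every other node (degree n - 1)."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def consensus_iterations(neighbors, values, tol=1e-6, max_iters=100_000):
    """Run synchronous consensus averaging.

    Returns (rounds, final_values), where `rounds` is the number of update
    rounds performed until the spread max - min drops below `tol`
    (or `max_iters` if that never happens).
    """
    # Step size 1 / (max degree + 1) keeps the update a convex combination.
    step = 1.0 / (max(len(adj) for adj in neighbors.values()) + 1)
    x = list(values)
    for rounds in range(1, max_iters + 1):
        x = [
            x[i] + step * sum(x[j] - x[i] for j in neighbors[i])
            for i in range(len(x))
        ]
        if max(x) - min(x) < tol:
            return rounds, x
    return max_iters, x


if __name__ == "__main__":
    start = [float(i) for i in range(16)]  # hypothetical local values
    for name, graph in (("ring", ring_graph(16)), ("complete", complete_graph(16))):
        rounds, final = consensus_iterations(graph, start)
        print(name, rounds, round(final[0], 4))
```

The ring has a low node degree and a small spectral gap, so information spreads slowly. The complete graph is the opposite extreme and reaches agreement far sooner, at the cost of many more messages per round.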

Abstract [sv]

Machine Learning (ML) has had remarkable success over the past decade, among other things in solving classification and regression problems. Particularly given the exponential growth of digital data, using ML is inevitable and necessary to exploit the wealth of information hidden in the data. However, this poses new challenges for traditional serial learning methods in terms of scalability, since the learning process becomes slow, if it is possible at all. Distributed and parallel learning are promising methods for improving the performance of ML algorithms in terms of running time. Support Vector Machines (SVMs) are successful and popular supervised machine learning models that benefit from the availability of parallel and distributed computing to handle the challenges arising from big data. Nevertheless, efficient parallel and distributed implementation of SVMs is a complex problem. To make matters worse, distributed learning of SVMs can create privacy problems, particularly with models built from data distributed across multiple nodes, where the information cannot easily be shared because of data confidentiality, intellectual property interests, or other reasons. Despite the large amount of prior work, the problems of learning SVMs with respect to big data are not fully solved; in this dissertation we therefore try to shed light on efficient parallel SVM implementations and on how they can be improved further. This dissertation is based on a collection of publications. The research presented in this dissertation addresses some problems in parallel and distributed computing of SVMs through five research questions. The most important contribution of this dissertation is to answer its research questions through five research articles. For the first research question, we explore available parallel approaches for learning SVMs for large-scale problems. We investigate important factors such as algorithmic approaches, HPC tools, strategies, and heuristics used for effective parallel SVM implementations. All of these helped to identify the state-of-the-art parallel SVMs and their pros and cons, and to provide suggestions for potential avenues for future studies. We then conclude that it is the user's responsibility to make judicious choices to balance the trade-offs. To address the second research question, we explore the effects of changes in the problem size and the important SVM parameters that lead to significant performance. It turns out that the problem size has an impact on the optimal choice of important SVM parameters. In addition, we show that there is a threshold between the training time and the number of cores. The third research question investigates the effect of network topology on the performance of network-based distributed SVMs in terms of convergence. The three key contributions are to 1) show the effect of the expansion property and the connectivity of the underlying network on the convergence of the algorithm, 2) present a preferable network topology, and 3) supply an implementation that makes the theoretical advances practically available. The results suggest that graphs with higher degrees and larger spectral gaps, and thus higher connectivity, exhibit accelerated convergence. The fourth research question investigates federated learning of SVMs in which data privacy is protected under limited and private communication between agents. The most important contribution is incorporating differential privacy during the learning procedure. The results show that the learning process 1) respects data privacy, 2) achieves accuracy comparable to the non-private algorithm, and 3) yields tight empirical guarantees for privacy after convergence. Finally, the last research question is resolved by combining all the contributions in the dissertation and giving recommendations for implementing an efficient framework for parallel computing of SVMs for large-scale problems.

Place, publisher, year, edition, pages
Skövde: University of Skövde, 2022. p. xv, 180
Series
Dissertation Series ; 44
National Category
Computer Sciences; Computer Systems; Other Computer and Information Science
Research subject
Skövde Artificial Intelligence Lab (SAIL)
Identifiers
urn:nbn:se:his:diva-21776 (URN); 978-91-984919-8-2 (ISBN)
Public defence
2022-09-30, G110, Högskolan i Skövde, Skövde, 14:00 (English)
Available from: 2022-09-08 Created: 2022-09-08 Last updated: 2022-09-08 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Tavara, Shirin; Dahlbom, Anders
