
Examensarbeten och uppsatser / Final Theses

Framläggningar på IDA / Presentations at IDA


Se även framläggningar annonserade hos ISY och ITN i Norrköping / See also presentations announced at ISY and ITN in Norrköping (in Swedish)

If nothing is stated about the presentation language, the presentation is in Swedish.


WExUpp - kommande framläggningar / Upcoming presentations
2024-05-16 - ADIT
Analys, design och utvärdering av databasscheman i Azure Data Explorer
Angelica Ferlin, Linn Petersson
Avancerad (30hp)
kl 13:00, Babbage (In English)
[Abstract]
Data warehouses are today used to store large amounts of data. This thesis investigates the impact different database schema designs have on query execution time within the cloud platform Azure Data Explorer. Azure Data Explorer is a relatively new platform, and limited research exists on how database schemas should be designed for it. Further, the design of the database schema has a direct impact on query execution times, and the design should align with the use case of the data warehouse. This thesis conducts a requirements analysis, determines the use case, and designs three database schemas, which are then implemented and evaluated through a performance test. Schema 1 is designed utilizing results tables from stored functions, while schema 2 utilizes sub-functions divided by different departments or products, aimed at minimizing the data accessed per query. Finally, schema 3 uses the results tables from the sub-functions found in schema 2. The performance test shows that schema 3 yields the best overall improvement in query execution time compared to the other designs and the original design. The findings emphasize the critical role of database schema design in query performance. Additionally, the results indicate that combining more than one approach further increases the potential query performance.
2024-05-17 - AIICS
Automatic De-Identification of Magnetic Resonance Images
Victor Dahlsberg, Adam Sundberg
Avancerad (30hp)
kl 15:00, Alan Turing (In English)
[Abstract]
Magnetic resonance (MR) images of the head need to be de-identified to enable research and education while complying with rules and regulations such as HIPAA and GDPR. This thesis explores a new approach to de-identifying MR images by utilizing generative machine learning (ML) techniques. The presented solution combines a vector quantized variational autoencoder (VQ-VAE) with a latent diffusion model (LDM) featuring a modified reverse process to enable postconditional inpainting of 3D MR images. The solution takes two inputs: the image to be de-identified and a binary mask of the regions that should not be modified by the inpainting process. Given these, the VQ-VAE encodes the inputs into a latent representation, where the modified LDM blends the original image with newly sampled data. The output of the LDM is then decoded to produce a de-identified MR image that looks unmodified but contains synthesized data that hides the identity of the original subject.

Three different defacing tools are used to produce binary masks of different sizes to test the solution. The results show that the size of the mask has a large impact on how different an inpainted image is from the original, and on how different multiple inpainted images are from each other. Furthermore, downstream performance is measured by applying skull stripping tools to original, defaced, and inpainted images; inpainted images perform as well as or slightly better than defaced images in this task. Finally, the inpainted images are fed to a classifier trained to classify whether an image contains a face or not. The model predicts that the images contain a face more than 93% of the time.
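The mask-conditioned blending at the heart of the inpainting step can be illustrated in isolation. The sketch below is a deliberately simplified, hypothetical stand-in: it blends a "known" array with newly "sampled" data using a binary keep-mask in plain NumPy, whereas the thesis performs this on VQ-VAE latents inside the LDM's reverse process.

```python
import numpy as np

def blend(known, sampled, keep_mask):
    """Keep the original data where keep_mask == 1; elsewhere use
    the sampled (synthesized) content. This is the blending idea
    used at each step of diffusion-based inpainting, shown here
    on raw arrays rather than latent representations."""
    return keep_mask * known + (1 - keep_mask) * sampled

# Toy 2x2 "image": protect the diagonal, synthesize the rest.
known = np.array([[1.0, 2.0], [3.0, 4.0]])
sampled = np.zeros((2, 2))          # stand-in for freshly sampled data
mask = np.array([[1.0, 0.0], [0.0, 1.0]])
out = blend(known, sampled, mask)
```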
2024-05-20 - HCS
Damage Assessment on Remote Sensing Imagery with Foundation Models
Gustaf Lindgren
Avancerad (30hp)
kl 08:15, Alan Turing (In English)
[Abstract]
There is currently an ongoing paradigm shift in machine learning: instead of training task-specific models from scratch, foundation models, i.e. large pre-trained models, are adapted for various downstream tasks. Foundation models excel in zero- and few-shot learning, making them ideal for domains with limited labeled data, such as disaster assessment on remote sensing imagery (RSI).

This thesis explores how the foundation models CLIP and SAM can be utilized to classify RSI affected by natural disasters and segment intact and damaged infrastructure without extensive retraining. For the scene classifications, various text prompt techniques are tested as well as zero-shot prompting with images. Moreover, few-shot learning methods such as linear probing and prompt learning are explored. For the open vocabulary semantic segmentation task, "pipelines" are implemented that leverage the open vocabulary classification abilities of CLIP and zero-shot image segmentation capabilities of SAM.

This work demonstrates that foundation models can be used effectively for detecting flooding on RSI, and shows promising results on other disaster types as well. While handcrafted text prompts yielded the best accuracy, the zero- and few-shot learning methods with images offered a better trade-off between accuracy and consistency. Although the performance of the zero-shot segmentation pipelines was generally poor, they showcased the potential of SAM for accurate segmentations on disaster imagery when provided with prompts of sufficient quality.
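The core of CLIP-style zero-shot classification, matching an image embedding against text-prompt embeddings by cosine similarity, can be sketched without the real model. The toy embeddings below are random stand-ins for actual CLIP encoder outputs; only the similarity-and-argmax logic reflects the technique.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Pick the label whose text embedding is most cosine-similar
    to the image embedding (the essence of CLIP zero-shot use)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                       # cosine similarities, one per label
    return labels[int(np.argmax(sims))], sims

# Toy embeddings standing in for real encoder outputs.
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 8))
labels = ["flooding", "wildfire", "undamaged"]
# Fake "image" embedding constructed near the "flooding" prompt.
image_emb = text_embs[0] + 0.1 * rng.normal(size=8)
pred, _ = zero_shot_classify(image_emb, text_embs, labels)
```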
2024-05-20 - AIICS
Forecasting Patient Occupancy in Hospital Wards Using a Supervised Machine Learning Approach
Axel Falk, Philip Folkunger
Avancerad (30hp)
kl 09:15, John von Neumann (In English)
[Abstract]
The healthcare sector faces challenges in balancing resource allocation and meeting patient demand, especially in the Emergency Department (ED) and other wards. This study explores the potential of supervised machine learning models to predict occupancy rates across different hospital departments using data from a hospital in western Norway from 2020 to 2023. The research combines Fourier analysis, seasonal decomposition (STL), and cross-correlation techniques to identify cyclical patterns and dependencies within the data. Various supervised machine learning models, including Linear Regression, Random Forests, XGBoost, and neural networks, are evaluated using k-fold cross-validation and performance metrics such as MAPE and MAE. The results reveal distinct daily and weekly patterns in hospital occupancy rates, with notable anomalies during holidays and weekends. The study finds that occupancy rates are consistent over time, as the ED, Cardiology Ward (CW), and Total Patients (TP) series are stationary, with stable mean values and variances. Both TP and ED exhibit daily seasonality, while all three series display weekly seasonality. Machine learning models perform differently across wards. The smallest prediction errors using only time features were an MAE of 5.595 for the ED, an MAE of 1.794 for the CW, and a MAPE of 0.096 for TP. Cross-correlation analysis revealed strong correlations in daily cycles between ED and TP when lagged in time, suggesting that ED and TP occupancy rates are closely linked, while the CW shows slightly different patterns. The study concludes that simpler models, like Linear Regression, may offer a more efficient and effective approach for hospital occupancy forecasting.
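A minimal illustration of forecasting with time features alone, in the spirit of the study: fit a linear model on weekly-cycle features and score it with MAE. The data, feature choice, and noise level below are synthetic assumptions, not results from the thesis.

```python
import numpy as np

# Synthetic hourly "occupancy" with a weekly cycle, standing in for ward data.
rng = np.random.default_rng(1)
hours = np.arange(24 * 7 * 8)                        # 8 weeks of hourly samples
true = 20 + 5 * np.sin(2 * np.pi * hours / (24 * 7))
y = true + rng.normal(scale=0.5, size=hours.size)

# Time features only: intercept plus sine/cosine of the weekly cycle,
# a simple stand-in for day-of-week / hour-of-day features.
X = np.column_stack([
    np.ones_like(hours, dtype=float),
    np.sin(2 * np.pi * hours / (24 * 7)),
    np.cos(2 * np.pi * hours / (24 * 7)),
])

# Train on the first 6 weeks, test on the last 2.
train, test = slice(0, 24 * 7 * 6), slice(24 * 7 * 6, None)
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
pred = X[test] @ coef
mae = np.mean(np.abs(pred - y[test]))                # mean absolute error
```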
2024-05-22 - HCS
Säkerställande av förarsäkerhet vid interaktion med touchskärmar för arbetsverktyg i bilar / Ensuring driver safety when interacting with touch screens for in-car work tools
Johanna Lundin
Avancerad (30hp)
kl 10:15, IDA Alan Turing (In Swedish)
[Abstract]
The integration of devices within cars is continuously evolving, enabling us to interact with them to an increasingly greater extent. This has transformed the way we drive, communicate, and access information on the go. Despite this, there is a lack of research on how to guarantee driver safety while interacting with these systems, especially for in-car systems used in professional settings as work tools. This master's thesis was conducted in collaboration with NIRA Dynamics and aimed to investigate how the interface of in-car touch screen work tools can be designed to ensure usability and safety for the driver. The study included the development of a prototype in the form of a new touch screen interface for a data acquisition system used by NIRA's test drivers to test their products. The prototype design was developed iteratively based on the test drivers' opinions as well as theory about important design aspects related to designing in-vehicle systems for high safety and usability. The resulting prototype was evaluated using the System Usability Scale in order to compare it to the original system design and assess to what extent the new interface contributed to increased safety for the driver. The study revealed that some of the main issues that needed to be taken into account in the prototype design were prioritization of information, placement and grouping of elements, and a reduced number of clicks and scrolling. The final usability evaluation was conducted through user tests, and the results indicated that the usability of the prototype was higher than for the original system design, thereby indicating increased safety for the driver. Overall, this thesis contributes to the research on mitigating the risks drivers face when interacting with in-car software systems.
2024-05-23 - SaS-UND
Energy consumption of video streaming – A literature review and a model
John Lindström
Avancerad (30hp)
kl 15:00, Alan Turing (In English)
[Abstract]
Energy consumption and the correlated greenhouse gas emissions are a big global problem. They affect all parts of society, and each industrial sector must work toward reducing its carbon footprint. This thesis details the research on different methods to model the energy consumption of video streaming, and works towards creating a final model. The video streaming process is broken down into a core process consisting of head-end, distribution and transmission, and terminals. The process that contributes the most to energy consumption at the head-end is found to be video encoding. This thesis explores video encoding in depth and how it is affected by parameters such as hardware, codec choice, codec preset selection, and video details such as resolution, framerate, and duration, but these parameters are found to be insufficient to model the energy consumption of video encoding. In distribution and transmission, the highest contributor is found to be content delivery networks. The energy consumption of content delivery networks is investigated; however, no appropriate model is found. For terminals, the most important factor is the kind of terminal used. The energy consumption of televisions, desktop computers, laptops, and mobile terminals is investigated, and models are presented for each. The thesis also discusses the different models, their advantages, and their shortcomings. Additionally, an application to visualize features of the model is created and presented.
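The additive structure of such a model (head-end + distribution and transmission + terminal) can be sketched with placeholder numbers. Every value below is a made-up assumption for illustration, not a figure from the thesis.

```python
def streaming_energy_wh(duration_h, encode_wh_per_h, network_wh_per_gb,
                        data_gb_per_h, terminal_w):
    """Toy additive energy model for a streaming session, in Wh:
    head-end encoding + network transfer + terminal power draw."""
    head_end = encode_wh_per_h * duration_h
    distribution = network_wh_per_gb * data_gb_per_h * duration_h
    terminal = terminal_w * duration_h
    return head_end + distribution + terminal

# One hour of streaming on a hypothetical 50 W laptop at 3 GB/h,
# with invented per-hour encoding and per-GB network costs.
total = streaming_energy_wh(1.0, encode_wh_per_h=10.0,
                            network_wh_per_gb=20.0,
                            data_gb_per_h=3.0, terminal_w=50.0)
```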
2024-06-04 - ADIT
Evaluation of Unsupervised Anomaly Detection in Structured API Logs
Gabriel Hult
Avancerad (30hp)
kl 13:15, Charles Babbage (In English)
[Abstract]
With large quantities of API logs being stored, it becomes difficult to manually inspect them and determine whether the requests are benign or anomalous, indicating incorrect access to an application or perhaps actions with malicious intent. Today, companies can rely on third-party penetration testers who occasionally attempt various techniques to find vulnerabilities in software applications. However, for a company to be self-sustainable, implementing a system capable of detecting abnormal, potentially malicious traffic would be beneficial. By doing so, attacks can be prevented proactively, mitigating risks faster than waiting for third parties to detect the issues. A potential solution is machine learning, specifically anomaly detection: detecting patterns that deviate from normal behavior. This thesis covers the process of finding anomalies in structured log data. Various unsupervised anomaly detection models were evaluated on their capability to detect anomalies in API logs: K-means, Gaussian Mixture Model, Isolation Forest, and One-Class Support Vector Machine. The evaluation shows that the best baseline model, without tuning, reaches a precision of 63% and a recall of 72%, resulting in an F1-score of 0.67, an AUC score of 0.76, and an accuracy of 0.71. With tuning, the best model reaches a precision of 67% and a recall of 80%, resulting in an F1-score of 0.73, an AUC score of 0.83, and an accuracy of 0.75. The pros and cons of each model are presented and discussed, along with insights related to anomaly detection and its applicability in API log analysis and API security.
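One way to see unsupervised anomaly scoring in action is a single-Gaussian density model, a one-component simplification of the Gaussian Mixture approach named in the abstract. The "log features" below are synthetic stand-ins for real API log fields such as request size or response time.

```python
import numpy as np

def gaussian_anomaly_scores(X):
    """Score each row by its squared Mahalanobis distance from the
    mean of the data: a higher score means the point is less like
    the bulk of the data, i.e. more anomalous."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    inv = np.linalg.inv(cov)
    d = X - mu
    return np.einsum("ij,jk,ik->i", d, inv, d)

# 200 "normal" requests plus 3 obvious outliers at the end.
rng = np.random.default_rng(2)
normal = rng.normal(loc=[1.0, 2.0], scale=0.2, size=(200, 2))
outliers = np.array([[5.0, 9.0], [6.0, 0.0], [-4.0, 7.0]])
X = np.vstack([normal, outliers])
scores = gaussian_anomaly_scores(X)
top3 = set(np.argsort(scores)[-3:])   # indices of the 3 highest scores
```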
2024-06-05 - ADIT
How Does the Use of Autonomous Penetration Testing Strengthen the Continuous Integration Flow?
Jonatan Eshak
Avancerad (30hp)
kl 13:15, Herbert Simon (In English)
[Abstract]
The thesis introduces the problems developers face when creating, optimizing, and testing their systems. Its focus is the testing of a web application using autonomous penetration testing integrated into a GitLab CI/CD pipeline. The thesis asks whether the use of OpenAPI, a specification for documenting the APIs of a system, creates an environment where one can save time on integration and increase efficiency through knowledge of the performance of specific endpoints. It further measures in which applications autonomous penetration testing with OpenAPI could be preferred, and compares autonomous black box testing against white box testing to answer when one is preferable to the other and when it is helpful to have both. The thesis goes through the theory of penetration testing and how it is conducted, including common strategies and standard attack methods; the theory of autonomous versus manual penetration testing; the theory of web applications and APIs, including what OpenAPI is and what Swagger is as a tool; and the theory of continuous integration flows, their design, and how a developer builds one from scratch. It also brings up five significant related articles, such as one discussing the problems faced when designing a black box vulnerability scanner for web servers, articles on implementing continuous integration for automatic performance testing and for automating security scanning, and an article introducing continuous fuzzing to study the scalability of fuzzing in CI/CD pipelines. These related works support the purpose and method of this thesis and its goal to measure autonomous penetration testing in a CI pipeline.
The method to answer the research questions is to build a website with OpenAPI documentation to serve as a target, and to integrate it into a GitLab CI pipeline along with vulnerability scanning tools configured to perform black box, grey box, and white box testing. The results show that while black box testing is far more thorough, since it must search and test every discovered entry point, it comes at the cost of time. Grey box testing shows similar results, although it only focuses on finding vulnerabilities in API endpoints. White box testing uncovered more critical vulnerabilities, mainly in packages installed and stored in an environment directory. These vulnerabilities also differ from those found by the black box and grey box scans, showing the need to use both kinds of scans to discover as many unique vulnerabilities as possible.
2024-06-18 - ADIT
Run-Time Optimization of ElasticSearch for Medical Documents
Ludvig Bolin, Emil Carlsson
Avancerad (30hp)
kl 13:15, Babbage (In English)
[Abstract]
ElasticSearch is a database management system used to index and search documents, and as with all database management systems, performance is important. The aim of this thesis is to investigate whether the configuration of an ElasticSearch system can be tuned to improve either indexing or search performance using different optimization algorithms. With that goal in mind, this thesis evaluates three optimization algorithms as means to generate performance-improving ElasticSearch configurations: two local algorithms, Simulated Annealing and Simultaneous Perturbation Stochastic Approximation (SPSA), and one global algorithm, a Genetic Algorithm.
The benchmarking tool ESRally is used as the objective function for the local algorithms. Since the global algorithm requires near-instant evaluation, two machine learning models are instead trained to predict configuration performance in said benchmarks. The two models, Random Forest and Regression-Enhanced Random Forest, performed with similar accuracy: both could predict a configuration's indexing performance well, but could not predict its search performance to the same extent.
The configurations generated by the various optimization algorithms are then evaluated in a simulation replaying four hours of real traffic from an ElasticSearch instance used in a hospital for medical data indexing and searching. Unfortunately, most generated configurations failed to improve search performance. On the other hand, all the algorithms succeeded in generating configurations that outperform the default configuration in the simulation regarding indexing performance, with Simultaneous Perturbation Stochastic Approximation producing the best-performing configuration.
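Of the algorithms named above, Simulated Annealing is the simplest to sketch. The toy "configuration" and cost function below are hypothetical stand-ins for real ElasticSearch settings and ESRally benchmark scores; only the accept-worse-moves-with-decaying-probability logic reflects the algorithm itself.

```python
import math
import random

def simulated_annealing(objective, start, neighbor, steps=500, t0=1.0):
    """Minimize `objective` by random local moves, accepting worse
    neighbors with probability exp(-delta / temperature) so the
    search can escape local minima while the temperature cools."""
    current, best = start, start
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9        # linear cooling schedule
        cand = neighbor(current)
        delta = objective(cand) - objective(current)
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = cand
        if objective(current) < objective(best):
            best = current
    return best

# Toy stand-in for a two-knob configuration whose "benchmark cost"
# is minimized at (8, 32); a real run would call ESRally here.
def cost(cfg):
    a, b = cfg
    return (a - 8) ** 2 + (b - 32) ** 2

def neighbor(cfg):
    a, b = cfg
    return (a + random.choice([-1, 0, 1]), b + random.choice([-1, 0, 1]))

random.seed(0)
best = simulated_annealing(cost, start=(1, 1), neighbor=neighbor)
```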


Page responsible: Final Thesis Coordinator
Last updated: 2022-06-03