Log File Analysis with LLMs for Pre-Processing in ML-Based Intrusion Detection Systems

In cybersecurity, log files provide critical insights into system activities, making them essential for detecting anomalies and potential intrusions. However, the vast amount of unstructured data in log files can pose challenges for effective analysis. This BSc thesis explores the use of Large Language Models (LLMs) as a pre-processing step to improve log file analysis in Machine Learning (ML)-based Intrusion Detection Systems (IDS).

Student Target Groups:

  • Students of ICE/Telematics;
  • Students of Computer Science;
  • Students of Software Engineering.

Thesis Type:

  • Bachelor Thesis

Goal and Tasks:

The primary goal of this thesis is to explore how LLMs, known for their ability to handle natural language data, can extract meaningful information from complex, unstructured logs, enabling better feature extraction and data preparation for ML-based IDS models. Streamlining the pre-processing stage in this way is intended to improve the accuracy and efficiency of intrusion detection.

  • Conduct a literature review on LLM applications in log file analysis and ML-based intrusion detection systems;
  • Identify and select appropriate LLM techniques for processing log files;
  • Implement a pre-processing pipeline that uses LLMs to extract relevant features from log data;
  • Integrate the processed log data into an ML-based IDS and evaluate its performance;
  • Summarize the results in a written thesis and deliver an oral presentation.
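To make the implementation tasks above more concrete, the following is a minimal Python sketch of what such a pre-processing step could look like. It is an illustration under stated assumptions, not a prescribed design: the prompt wording, the field names (user, src_ip, event), the sample log lines, and the functions stub_llm, extract_fields and to_feature_vector are all hypothetical. In particular, stub_llm is a regex-based stand-in so the example runs without any model; in the thesis it would be replaced by a call to an actual LLM.

```python
import json
import re
from typing import Callable, Dict, List

# Hypothetical prompt; the wording and field names are assumptions for illustration.
PROMPT_TEMPLATE = (
    "Extract the fields 'user', 'src_ip' and 'event' from the log line below "
    "and answer with a single JSON object.\n\nLog line: {line}"
)

EMPTY_RECORD = {"user": None, "src_ip": None, "event": "other"}


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: parses the log line with a regex and returns JSON."""
    line = prompt.split("Log line: ", 1)[1]
    match = re.search(r"(Failed|Accepted) password for (\S+) from (\S+)", line)
    if match is None:
        return json.dumps(EMPTY_RECORD)
    outcome, user, src_ip = match.groups()
    event = "login_failed" if outcome == "Failed" else "login_ok"
    return json.dumps({"user": user, "src_ip": src_ip, "event": event})


def extract_fields(line: str, llm: Callable[[str], str]) -> Dict[str, object]:
    """Ask the (stubbed) LLM for structured fields; fall back if the answer is not JSON."""
    raw = llm(PROMPT_TEMPLATE.format(line=line))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return dict(EMPTY_RECORD)


def to_feature_vector(records: List[Dict[str, object]]) -> List[float]:
    """Aggregate a window of records into numeric features for a downstream ML model."""
    failed = sum(r["event"] == "login_failed" for r in records)
    ok = sum(r["event"] == "login_ok" for r in records)
    sources = {r["src_ip"] for r in records if r["src_ip"]}
    return [float(failed), float(ok), float(len(sources))]


# Invented sample lines in an sshd-like format (documentation IP ranges).
SAMPLE_LINES = [
    "Jan 10 12:00:01 host sshd[101]: Failed password for root from 203.0.113.5 port 4242 ssh2",
    "Jan 10 12:00:03 host sshd[101]: Failed password for admin from 203.0.113.5 port 4243 ssh2",
    "Jan 10 12:00:09 host sshd[102]: Accepted password for alice from 198.51.100.7 port 5050 ssh2",
    "Jan 10 12:01:00 host cron[200]: session opened",
]

records = [extract_fields(line, stub_llm) for line in SAMPLE_LINES]
features = to_feature_vector(records)
print(features)  # [2.0, 1.0, 2.0]: failed logins, successful logins, distinct source IPs
```

The resulting feature vector could then be passed to any ML-based IDS model, for example one built with PyTorch. Keeping the LLM behind a plain callable makes it easy to swap models and to compare the LLM-based extraction against a simple baseline during evaluation.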

Recommended Prior Knowledge:

  • Programming skills in Python;
  • Prior experience with deep learning frameworks is desirable (preferably PyTorch);
  • Interest in the topic.

Start:

  • As soon as possible

Contact: