As vision models like Vision Transformers (ViTs), CLIP, or SAM grow larger, they require significant memory and computation, limiting their deployment in resource-constrained settings. Traditional compression methods such as pruning, quantization, and distillation reduce model size but often compromise accuracy or require retraining. Model Folding is a recent technique that clusters similar neurons to reduce parameters while preserving data statistics, offering a new trade-off between size and performance. While the technique has proven effective on smaller tasks, its impact on large vision models remains unexplored.
This thesis aims to investigate model folding on large-scale vision architectures, evaluating its effectiveness both on its own and in combination with other compression techniques such as quantization.
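To give an intuition for the folding idea described above, the following is a minimal PyTorch sketch on a pair of fully connected layers: hidden neurons with similar weight rows are clustered (here with k-means) and replaced by cluster centroids, and the next layer's input weights are summed accordingly. The helper name fold_linear_pair, the use of scikit-learn's KMeans, and the omission of the data-statistics repair step used in the actual model folding method are simplifications for illustration only, not the exact procedure to be studied in the thesis.

```python
# Illustrative sketch of the core folding idea on a two-layer MLP.
# Not the exact model folding algorithm: the statistics-preservation step
# is omitted, and k-means on weight rows is one possible clustering choice.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def fold_linear_pair(fc1: nn.Linear, fc2: nn.Linear, n_clusters: int):
    """Cluster similar hidden neurons of fc1 and merge them, adjusting fc2 to match."""
    # Each row of fc1.weight defines one hidden neuron; cluster these rows.
    W1 = fc1.weight.detach().cpu().numpy()                       # (hidden, in)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(W1)

    # New first layer: one neuron per cluster, weights set to the centroids.
    folded_fc1 = nn.Linear(fc1.in_features, n_clusters, bias=fc1.bias is not None)
    folded_fc1.weight.data = torch.tensor(km.cluster_centers_, dtype=fc1.weight.dtype)
    if fc1.bias is not None:
        # Average the biases of the neurons assigned to each cluster.
        bias = torch.zeros(n_clusters, dtype=fc1.bias.dtype)
        counts = torch.zeros(n_clusters)
        for i, c in enumerate(km.labels_):
            bias[c] += fc1.bias.data[i]
            counts[c] += 1
        folded_fc1.bias.data = bias / counts

    # New second layer: sum the input columns of fc2 that read from merged
    # neurons, so each folded neuron carries the combined outgoing weight.
    folded_fc2 = nn.Linear(n_clusters, fc2.out_features, bias=fc2.bias is not None)
    W2_new = torch.zeros(fc2.out_features, n_clusters, dtype=fc2.weight.dtype)
    for i, c in enumerate(km.labels_):
        W2_new[:, c] += fc2.weight.data[:, i]
    folded_fc2.weight.data = W2_new
    if fc2.bias is not None:
        folded_fc2.bias.data = fc2.bias.data.clone()
    return folded_fc1, folded_fc2

# Usage: fold a 512-unit hidden layer down to 128 merged neurons.
fc1, fc2 = nn.Linear(768, 512), nn.Linear(512, 10)
small_fc1, small_fc2 = fold_linear_pair(fc1, fc2, n_clusters=128)
```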
Interested? Please contact us for more details!
Student Target Groups:
- Students of ICE;
- Students of Computer Science;
- Students of Software Engineering.
Thesis Type:
- Master Thesis / Master Project
Goals and Tasks:
- Conduct a literature review on model compression for large vision models;
- Select and analyze one or more large-scale vision models (e.g., ViT, CLIP, SAM);
- Implement model folding on selected models;
- Evaluate performance trade-offs in terms of accuracy, size, and compute on public datasets;
- Present your findings in a final presentation and written report.
Requirements:
- Solid knowledge of neural networks and model architectures;
- Programming skills in Python and experience with PyTorch or TensorFlow;
- (Optional) Familiarity with Vision Transformers, CLIP, or SAM models.
Used Tools & Equipment:
- A computation cluster at TU Graz
Start:
Contact: