Model folding on large vision models

As vision models such as Vision Transformers (ViTs), CLIP, or SAM grow larger, they require significant memory and computation, which limits their deployment in resource-constrained settings. Traditional compression methods such as pruning, quantization, and distillation reduce model size but often compromise accuracy or require retraining. Model Folding is a recent technique that merges clusters of similar neurons, reducing the parameter count while preserving the model's data statistics and offering a new trade-off between size and performance. While effective on smaller tasks, its impact on large vision models remains unexplored.
This thesis aims to investigate model folding on large-scale vision architectures, evaluating its effectiveness both on its own and in combination with other compression techniques such as quantization.
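To make the core idea concrete, the following is a minimal sketch of folding a pair of linear layers: similar hidden neurons (rows of the first weight matrix) are grouped by k-means, each cluster is replaced by its mean neuron, and the next layer's columns are summed accordingly so that exactly duplicated neurons fold away without changing the function. This is an illustrative toy, not the exact procedure of the published Model Folding method.

```python
import numpy as np

def fold_linear_pair(W1, b1, W2, k, iters=20):
    """Fold layer 1 (W1: [n, d], b1: [n]) followed by layer 2 (W2: [m, n])
    down to k hidden neurons by clustering similar neurons of layer 1.
    Illustrative sketch only, not the published Model Folding algorithm."""
    feats = np.concatenate([W1, b1[:, None]], axis=1)  # cluster on [w | b]
    # Farthest-point initialization, then plain k-means.
    centers = [feats[0]]
    for _ in range(1, k):
        d = np.min([((feats - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(feats[d.argmax()])
    centers = np.stack(centers)
    for _ in range(iters):
        d = ((feats[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = feats[assign == j].mean(0)
    # Merged neuron = cluster mean; downstream columns are summed.
    W1f = np.stack([W1[assign == j].mean(0) for j in range(k)])
    b1f = np.array([b1[assign == j].mean() for j in range(k)])
    W2f = np.stack([W2[:, assign == j].sum(1) for j in range(k)], axis=1)
    return W1f, b1f, W2f
```

For neurons that are exact duplicates, the folded network computes the same output with fewer parameters; for merely similar neurons, folding introduces an approximation error that the thesis would quantify on real models.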
Interested? Please contact us for more details!


Student Target Groups:

  • Students of ICE;
  • Students of Computer Science;
  • Students of Software Engineering.

Thesis Type:

  • Master Thesis / Master Project

Goals and Tasks:

  • Conduct a literature review on model compression for large vision models;
  • Select and analyze one or more large-scale vision models (e.g., ViT, CLIP, SAM);
  • Implement model folding on selected models;
  • Evaluate performance trade-offs in terms of accuracy, size, and compute on public datasets;
  • Present your findings in a final presentation and written report.

Requirements:

  • Solid knowledge of neural networks and model architectures;
  • Programming skills in Python and experience with PyTorch or TensorFlow;
  • (Optional) Familiarity with Vision Transformers, CLIP, or SAM models.

Used Tools & Equipment:

  • A computation cluster of TU Graz

Start:

  • a.s.a.p.

Contact: