Gemma-Prune: Compressing Gemma 3 4B Vision-Language Model for Mobile Devices
A seven-stage compression pipeline reduces the Gemma 3 4B vision-language model from 2.8 GB to 2.1 GB, achieving 22% faster text generation, 3.4x faster image processing, and 23% lower peak memory on Apple Silicon.
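For quick reference, the headline figures above can be restated as ratios. The minimal Python snippet below simply recomputes them from the numbers quoted in the summary; it introduces no new measurements and is purely illustrative.

```python
# Headline figures from the summary, restated as ratios.
# All values are taken directly from the abstract above; nothing here is measured.

original_size_gb = 2.8      # pre-compression model size
compressed_size_gb = 2.1    # post-compression model size

size_reduction = 1 - compressed_size_gb / original_size_gb  # ~0.25, i.e. 25% smaller on disk
text_gen_speedup = 1.22                                     # 22% faster text generation
image_proc_speedup = 3.4                                    # 3.4x faster image processing
peak_memory_reduction = 0.23                                # 23% lower peak memory

print(f"Model size: {original_size_gb} GB -> {compressed_size_gb} GB "
      f"({size_reduction:.0%} smaller)")
print(f"Text generation: {text_gen_speedup:.2f}x throughput")
print(f"Image processing: {image_proc_speedup}x throughput")
print(f"Peak memory: {peak_memory_reduction:.0%} lower")
```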