ICIP 2021
Paper Detail

Paper ID: IMT-CIF-2.9
Paper Title: BEYOND FLOPS IN LOW-RANK COMPRESSION OF NEURAL NETWORKS: OPTIMIZING DEVICE-SPECIFIC INFERENCE RUNTIME
Authors: Yerlan Idelbayev, Miguel Á. Carreira-Perpiñán; University of California, Merced, United States
Session: IMT-CIF-2: Computational Imaging 2
Location: Area I
Session Time: Wednesday, 22 September, 14:30 - 16:00
Presentation Time: Wednesday, 22 September, 14:30 - 16:00
Presentation: Poster
Topic: Computational Imaging Methods and Models: Sparse and Low Rank Models
IEEE Xplore: Open Preview available in IEEE Xplore
Abstract: Neural network compression has become an important practical step when deploying trained models. We consider the problem of low-rank compression of neural networks with the goal of optimizing the measured inference time. Given a neural network and a target device on which to run it, we want to find the matrix ranks and the weight values of the compressed model so that the network runs as fast as possible on the device while achieving the best task performance (e.g., classification accuracy). This is a hard optimization problem involving weights, ranks, and device constraints. To tackle it, we first implement a simple yet accurate model of the on-device runtime that requires only a few measurements. We then give a suitable formulation of the optimization problem involving the proposed runtime model and solve it using alternating optimization. We validate our approach on various neural networks and show that, by using our estimated runtime model, we achieve better task performance than FLOPs-based methods for the same runtime budget on the actual device.
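The paper's own code is not included on this page. As a minimal sketch of the low-rank compression step the abstract refers to, the snippet below truncates a dense layer's weight matrix with an SVD, the standard way to realize a chosen rank; the function name, shapes, and rank are illustrative choices, not taken from the paper, and the paper's contribution (selecting ranks against a measured runtime model) is not reproduced here.

```python
import numpy as np

def low_rank_compress(W, r):
    """Replace W (m x n) by factors U_r (m x r) and V_r (r x n) with W ~ U_r @ V_r.

    The dense layer y = W @ x then becomes y = U_r @ (V_r @ x), cutting the
    parameter/multiply count from m*n to r*(m + n) when r is small enough.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :r] * s[:r]   # fold the top-r singular values into the left factor
    V_r = Vt[:r, :]
    return U_r, V_r

# Example: compress a 64x32 weight matrix to rank 8.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
U_r, V_r = low_rank_compress(W, r=8)
approx = U_r @ V_r  # rank-8 approximation of W; 64*32=2048 params -> 8*(64+32)=768
```

The key point of the paper is that the rank r per layer should be chosen to meet a runtime budget measured on the target device, rather than a FLOP count such as r*(m + n), since measured runtime and FLOPs can diverge substantially on real hardware.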