Recent processors have instructions for multiplication and division of positive integers below 2^64. However, division is substantially slower than multiplication. You are given a large set S of integers between 0 and 2^64, all multiples of a fixed integer n, and you want to divide them all by n. How would you do this, using only multiplication and simple bit operations?
We can write n = 2^a · b, with b an odd integer. Since gcd(b, 2^64) = 1, we can find c between 0 and 2^64 with b · c ≡ 1 (mod 2^64). For each number in S, we divide it by 2^a (which is trivial in base 2: a right shift by a bits) and we multiply it by c, keeping only the low 64 bits, which is exactly what the hardware multiply does. Hence, if the number was previously k · n = k · 2^a · b, it is now k · b · c ≡ k (mod 2^64). Since k < 2^64, we have correctly performed the division.
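The recipe above can be sketched directly in code. The following is a minimal illustration (names like divide_all are ours, not from the text); Python integers are unbounded, so we mask with 2^64 − 1 to mimic the wrapping 64-bit hardware multiply:

```python
MASK = (1 << 64) - 1  # keep only the low 64 bits, like a hardware multiply

def divide_all(S, n):
    """Divide every element of S by n, assuming each element is a
    multiple of n, using only a shift and a wrapping multiplication."""
    # Write n = 2^a * b with b odd: a is the number of trailing zero bits.
    a = (n & -n).bit_length() - 1
    b = n >> a
    # One-time cost: c = b^{-1} mod 2^64, which exists since b is odd.
    c = pow(b, -1, 1 << 64)
    # Per element: one right shift and one multiplication mod 2^64.
    return [((x >> a) * c) & MASK for x in S]
```

For example, with n = 12 = 2^2 · 3, `divide_all([12, 24, 600], 12)` returns `[1, 2, 50]`.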
This requires one modular inversion in total (whose cost is negligible over a large set S), and per number it only requires one multiplication and a trivial shift (removing the a trailing zero bits), which is a lot cheaper than a division.
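The one-time inversion itself can also be done without any division. One standard way (our addition, not mentioned in the text) is Newton–Hensel lifting: if x · b ≡ 1 (mod 2^k), then x(2 − xb) ≡ 1 (mod 2^{2k}), so the precision doubles each step. Since b · b ≡ 1 (mod 8) for any odd b, five iterations already exceed 64 bits:

```python
MASK = (1 << 64) - 1

def inv_mod_2_64(b):
    """Inverse of an odd b modulo 2^64, via Newton-Hensel lifting,
    using only multiplications and subtractions."""
    assert b & 1, "b must be odd to be invertible mod 2^64"
    x = b  # correct to 3 bits: b*b = 1 (mod 8) for odd b
    for _ in range(5):  # precision: 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits
        x = (x * (2 - x * b)) & MASK
    return x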
This problem is a nice example of how modular arithmetic can yield a direct computational improvement in a setting where it seems unrelated at first, and without the need for deep theory or advanced preliminaries.