Overview

Minimalistic vector art of Knapsack problem with fruit

You're on a trip to the grocery store and as an environmentally conscious individual you've brought your favorite knapsack with you. But this knapsack can only fit so many items! This problem is concerned with a bag or knapsack that has a maximum weight capacity, and a set of items with associated weights and values. The goal is to maximize the value of the items you can fit in the knapsack without exceeding its maximum weight capacity.
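
Stated a bit more formally: given $n$ items with weights $w_i$ and values $v_i$ and a knapsack of capacity $W$, we want to choose a subset of items of maximum total value whose total weight stays within the capacity. In the standard 0/1-variable form (notation I'm adding here for precision, not something the post relies on later):

$$\max_{x \in \{0,1\}^n} \sum_{i=1}^{n} v_i x_i \quad \text{subject to} \quad \sum_{i=1}^{n} w_i x_i \le W.$$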

The classic dynamic programming algorithm for this problem runs in $\Theta(nW)$ time, where $W$ is the maximum weight capacity and $n$ is the number of items. This is actually a pseudo-polynomial time algorithm, since the running time is not polynomial in the number of bits needed to represent the input, but rather in the numeric values of the input.
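
As a point of reference, here's a minimal sketch of that weight-indexed DP in Python (the function name `knapsack_by_weight` and its interface are my own; the post doesn't pin down an implementation):

    def knapsack_by_weight(W, items):
        """items: list of (weight, value) pairs; returns the best achievable total value."""
        # dp[c] = best total value achievable with capacity c
        dp = [0] * (W + 1)
        for w, v in items:
            # iterate capacities downward so each item is used at most once
            for c in range(W, w - 1, -1):
                dp[c] = max(dp[c], dp[c - w] + v)
        return dp[W]

The table has $W + 1$ entries and each of the $n$ items touches every entry once, which is where the $\Theta(nW)$ bound comes from.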

Even though this knapsack optimization problem is NP-hard, there is a simple approximation algorithm that can be used to find a solution within a factor of 2 of the optimal solution, and it's built on friendly intuition!

An Observation

Let an instance of the problem be as follows, imagining the items are $n=3$ large fruits:

| item | weight | value |
|------|--------|-------|
| 🍎 | 25678 | 10001111 |
| 🍌 | 34939 | 20010001 |
| 🥭 | 10017 | 30020003 |

So we have $\{(w_i, v_i)\} = \{(25678, 10001111), (34939, 20010001), (10017, 30020003)\}$.

Using the regular DP algorithm, and taking the capacity to be $W = 34939$, the runtime for this instance would be $\Theta(nW) = \Theta(3 \cdot 34939) = \Theta(104817)$.

There's another DP algorithm for knapsack whose runtime depends instead on the maximum value $V$; for this instance that would be $\Theta(nV) = \Theta(3 \cdot 30020003) = \Theta(90060009)$.
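
For comparison, here is a minimal sketch of a value-indexed DP (again with a function name, `knapsack_by_value`, of my own choosing): the table ranges over achievable total values, and each entry stores the least weight needed to reach that value, which is exactly why large values make it expensive.

    def knapsack_by_value(W, items):
        """items: list of (weight, value) pairs; returns the best achievable total value."""
        INF = float("inf")
        total = sum(v for _, v in items)
        # best_weight[val] = least weight needed to collect a total value of exactly val
        best_weight = [0] + [INF] * total
        for w, v in items:
            # iterate values downward so each item is used at most once
            for val in range(total, v - 1, -1):
                if best_weight[val - v] + w < best_weight[val]:
                    best_weight[val] = best_weight[val - v] + w
        # the answer is the largest total value reachable without exceeding capacity W
        return max(val for val in range(total + 1) if best_weight[val] <= W)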

These look kind of hard to deal with. This might motivate the thought: if it's so troublesome to deal with large values, what if we just scaled them down? The values would keep their same relative relationships, and we could save quite a bit of computation. Then our table would look something like this:

| item | weight | value |
|------|--------|-------|
| 🍎 | 25678 | 10001111 $\approx 1 \times 10^7 \to 1$ |
| 🍌 | 34939 | 20010001 $\approx 2 \times 10^7 \to 2$ |
| 🥭 | 10017 | 30020003 $\approx 3 \times 10^7 \to 3$ |

These look more manageable! And indeed they are: the runtime for this instance would be $\Theta(nV) = \Theta(3 \cdot 3) = \Theta(9)$.
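
The scaling step itself is just a floored division by a scaling factor; the table above is implicitly using $B = 10^7$:

    B = 10**7  # the scaling factor implied by the table above
    values = [10001111, 20010001, 30020003]
    scaled = [v // B for v in values]
    print(scaled)  # [1, 2, 3]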

The Algorithm

Roughly,


    def approximate_knapsack(W, items, B):
        # items is the list of (w_i, v_i) pairs; n = len(items) is the number of items
        # B is our scaling factor

        # Scale all values v_i down by B: v_i' = floor(v_i / B)
        scaled_items = [(w_i, v_i // B) for (w_i, v_i) in items]

        # Solve this scaled instance optimally: use the standard DP algorithm
        out = knapsack(W, scaled_items)

        # Return the items found in the optimal solution to the scaled instance
        return out
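
To make the sketch above runnable, here is one possible `knapsack` helper and a quick run on the fruit instance. The table-plus-traceback implementation, the capacity $W = 34939$, and the scaling factor $B = 10^7$ are choices of mine for illustration, not something the post prescribes:

    def knapsack(W, items):
        """Standard Theta(n*W) DP; returns the indices of an optimal item subset."""
        n = len(items)
        # dp[i][c] = best total value using the first i items with capacity c
        dp = [[0] * (W + 1) for _ in range(n + 1)]
        for i, (w, v) in enumerate(items, start=1):
            for c in range(W + 1):
                dp[i][c] = dp[i - 1][c]
                if w <= c and dp[i - 1][c - w] + v > dp[i][c]:
                    dp[i][c] = dp[i - 1][c - w] + v
        # walk the table backwards to recover which items were taken
        chosen, c = [], W
        for i in range(n, 0, -1):
            if dp[i][c] != dp[i - 1][c]:
                chosen.append(i - 1)
                c -= items[i - 1][0]
        return chosen

    items = [(25678, 10001111), (34939, 20010001), (10017, 30020003)]
    picked = approximate_knapsack(W=34939, items=items, B=10**7)
    print(picked, sum(items[i][1] for i in picked))  # [2] 30020003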

Let $\hat{v}_i = \lfloor v_i / B \rfloor$ be the scaled-down values, let $\hat{S}$ be the optimal solution for the scaled instance $\{(w_i, \hat{v}_i)\}$ (which is what the algorithm returns), and let $S$ be the optimal solution for the original knapsack instance. Then the value the algorithm achieves, $\sum_{i\in \hat{S}} v_i$, satisfies

$$\begin{aligned}
\sum_{i\in \hat{S}} v_i &\geq \left(\sum_{i \in \hat{S}} \hat{v}_i\right)B \\
&\geq \left(\sum_{i \in S}\hat{v}_i\right)B \\
&\geq \left(\sum_{i \in S}\left(\frac{v_i}{B}-1\right)\right)B \\
&= \left(\sum_{i \in S}v_i\right)-|S|B \\
&\geq \left(\sum_{i \in S}v_i\right) - nB
\end{aligned}$$
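
The chain ends with an additive loss of at most $nB$, so everything hinges on how $B$ is chosen. One natural choice, an assumption on my part since the post leaves $B$ unspecified here, is $B = \frac{v_{\max}}{2n}$ with $v_{\max} = \max_i v_i$. Assuming every item fits in the knapsack on its own, the optimal value $\sum_{i \in S} v_i$ is at least $v_{\max}$, and so

$$\sum_{i \in \hat{S}} v_i \;\geq\; \sum_{i \in S} v_i - nB \;=\; \sum_{i \in S} v_i - \frac{v_{\max}}{2} \;\geq\; \sum_{i \in S} v_i - \frac{1}{2}\sum_{i \in S} v_i \;=\; \frac{1}{2}\sum_{i \in S} v_i,$$

which is exactly the factor-2 guarantee from the overview. With this $B$ the scaled values are at most $2n$, so the scaled DP's running time depends only polynomially on $n$.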