At Ramp, our customers upload two types of receipt images.

Photos:

And screenshots:

I was bestowed with the privilege of writing an algorithm that distinguishes between them.

However, there was one major limitation. I could not use any libraries other than Pillow. No OpenCV or Tesseract or classification models, just straight rawdogging it.

After a few failed approaches[1], I came up with a solution that uses color entropy:

import math
from PIL import Image
from typing import Literal

def is_image_photo_or_screenshot(
    img_path: str
) -> Literal["photo", "screenshot"]:
    with Image.open(img_path) as img:
        if img.mode != "RGB":
            img = img.convert("RGB")
        img.thumbnail(
            (100, 100),
            resample=Image.Resampling.NEAREST,
        )

        total_pixels = img.width * img.height
        colors = img.getcolors(maxcolors=total_pixels) or []

        entropy = 0.0
        for count, _ in colors:
            p = count / total_pixels
            entropy -= p * math.log(p, 2)

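        # Normalize to 0.0-1.0 range. A 1x1 image has a maximum
        # entropy of zero, so guard against dividing by zero.
        max_entropy = math.log(total_pixels, 2)
        normalized_entropy = entropy / max_entropy if max_entropy else 0.0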

        if normalized_entropy > 0.5:
            return "photo"
        return "screenshot"

The intuition behind this algorithm is simple. A photo has a large number of unique colors. Even the blank area of a physical receipt in a photo will have dozens of slightly different whites. The opposite is true of a screenshot -- most of the pixels neatly belong to a handful of colors. This algorithm just measures how evenly the pixels are spread across the colors.
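To make that intuition concrete, here is a minimal sketch that runs the same entropy calculation on two made-up pixel lists instead of real images. Everything in it is an assumption for illustration: the helper names `normalized_color_entropy` and `jitter`, the two base colors, and the noise spread are all hypothetical, and it uses only the standard library so the numbers are easy to check by hand.

```python
import math
import random
from collections import Counter


def normalized_color_entropy(pixels):
    """Shannon entropy of the color histogram, divided by log2(pixel count)."""
    total = len(pixels)
    if total < 2:
        return 0.0  # a single pixel has no color variety to measure
    counts = Counter(pixels)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(total)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

PAPER = (240, 238, 230)  # hypothetical off-white receipt paper
INK = (40, 40, 45)       # hypothetical dark ink

# "Screenshot": 10,000 pixels that use exactly two colors.
screenshot_like = [PAPER] * 9000 + [INK] * 1000


def jitter(color, spread=8):
    """Nudge each channel by a small random amount, like camera noise."""
    return tuple(ch + rng.randint(-spread, spread) for ch in color)


# "Photo": the same layout, but every pixel is a slightly different shade.
photo_like = [jitter(color) for color in screenshot_like]

screenshot_score = normalized_color_entropy(screenshot_like)
photo_score = normalized_color_entropy(photo_like)

print(f"screenshot-like: {screenshot_score:.3f}")
print(f"photo-like:      {photo_score:.3f}")
```

The two-color list scores about 0.035, far below the 0.5 cutoff, because nearly all of its pixels fall into one bucket. The jittered list spreads the same pixels over thousands of near-identical shades and lands well above 0.5. Real receipts are messier than this toy data, so treat it as an illustration of the idea rather than a reproduction of the test results.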

I ran a test on a small dataset of receipts I prepared:

Looking good! We only missed 6 screenshots[2] out of almost 3000 images, for 99.8% overall accuracy. It's also quite speedy, thanks to the trick of downsizing the image to 100x100 pixels[3].

I quickly shipped this algorithm, and it now runs on the hundreds of millions of receipts Ramp processes a year.

If this sounded interesting to you, Ramp is hiring!