
Paperless-ngx: Self-Hosted Document Management for Your Home Lab

Productivity 2026-02-09 · 4 min read paperless-ngx documents ocr self-hosted docker organization

Every household accumulates paper. Tax documents, medical records, insurance policies, receipts, warranties, letters — all of it piling up in folders, drawers, or (if you're honest) random stacks on your desk. Paperless-ngx is a self-hosted document management system that turns all of that into a searchable, tagged, organized digital archive.

The workflow is simple: scan or photograph a document, drop it into Paperless-ngx's consumption directory, and the system automatically OCRs it, extracts the text, and files it. You can then search the full text of every document you've ever scanned. Need that receipt from three years ago? Search for the vendor name and it appears in seconds.


Why Paperless-ngx

Paperless-ngx is the community-maintained successor to Paperless-ng, which was itself a fork of the original Paperless project. It is the actively maintained version and the one you should use.

Key features:

Deployment

The recommended setup uses Docker Compose with Redis for task management and PostgreSQL for the database (SQLite works too, but PostgreSQL handles larger collections better):

services:
  paperless-redis:
    image: redis:7
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - ./redis:/data

  paperless-db:
    image: postgres:16
    container_name: paperless-db
    restart: unless-stopped
    volumes:
      - ./pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: changeme

  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless
    restart: unless-stopped
    depends_on:
      - paperless-db
      - paperless-redis
    ports:
      - "8000:8000"
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://paperless-redis:6379
      PAPERLESS_DBHOST: paperless-db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: changeme
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_TIME_ZONE: America/New_York
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: changeme
      PAPERLESS_URL: https://paperless.home.lab

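After deploying, open the web interface on port 8000. The PAPERLESS_ADMIN_USER and PAPERLESS_ADMIN_PASSWORD variables above create the admin account on first start. If you omit them, create the account manually by running the createsuperuser management command inside the paperless container.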

Setting Up the Consumption Pipeline

The consume directory is where Paperless-ngx watches for new documents. Any file you drop here gets automatically imported, OCR'd, tagged, and filed. The original file is then removed (or moved, depending on your configuration).
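If you script the drop yourself, one cautious pattern is to copy the file into a staging folder first and then move it into the consume directory, so that it appears there in a single step. The sketch below is only an illustration: the function name `drop_into_consume` and the staging folder are hypothetical, and it assumes the staging and consume directories sit on the same filesystem.

```python
import os
import shutil
from pathlib import Path


def drop_into_consume(source: Path, staging_dir: Path, consume_dir: Path) -> Path:
    """Copy `source` into a staging folder, then move it into the consume folder.

    Assumption: staging_dir and consume_dir are on the same filesystem, so the
    final os.replace is a single rename rather than a second copy.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    consume_dir.mkdir(parents=True, exist_ok=True)

    # The slow copy happens outside the watched directory.
    staged = staging_dir / source.name
    shutil.copy2(source, staged)

    # The finished file shows up in the consume directory in one step.
    target = consume_dir / source.name
    os.replace(staged, target)
    return target
```

With the compose file above, a call might look like `drop_into_consume(Path("scan.pdf"), Path("./staging"), Path("./consume"))`, where `./staging` is a folder you create next to `./consume`.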

Option 1: Network Share

Expose the consume directory as a Samba or NFS share. You can then scan documents on your phone or computer and save them directly to the share:

services:
  samba:
    image: dperson/samba
    container_name: paperless-samba
    restart: unless-stopped
    ports:
      - "445:445"
    volumes:
      - ./consume:/mount/consume
    command: >
      -u "scanner;scannerpass"
      -s "consume;/mount/consume;yes;no;no;scanner"

Option 2: Email Consumption

Paperless-ngx can monitor an email inbox and import attachments automatically:

environment:
  PAPERLESS_EMAIL_TASK_CRON: "*/10 * * * *"

Then configure an email account in the admin panel under Mail → Mail Accounts, and add at least one mail rule so Paperless-ngx knows which messages to process. Forward documents to a dedicated email address and their attachments will be imported automatically.

Option 3: Mobile Scanning Apps

Use a scanning app on your phone that saves to a network share or cloud folder synced to the consume directory. Popular options:

Organizing Documents

Tags

Tags are the primary organization method. Create tags for categories that matter to you:

Correspondents

Correspondents represent who sent or created the document. Paperless-ngx can auto-detect these:

Document Types

Document types are broader categories than tags, such as Letter, Receipt, Invoice, Contract, Statement, and Manual.

Automatic Matching

The real power comes from matching rules. Paperless-ngx can automatically assign tags, correspondents, and document types based on content:

  1. Exact match: If the document contains "Blue Cross Blue Shield", tag it insurance and set correspondent to "BCBS".
  2. Regular expression: Match invoice numbers, account numbers, or specific formats.
  3. Fuzzy match: Catches slight variations in how names or terms appear.
  4. Machine learning: After you manually tag enough documents (roughly 20 to 30), Paperless-ngx learns your patterns and starts suggesting matches automatically.
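To make the difference between the first three rule types concrete, here is a minimal Python sketch of each kind of check. It is not Paperless-ngx's own implementation: the function names, the 0.85 similarity threshold, and the use of `difflib` for fuzzy comparison are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def exact_match(text: str, phrase: str) -> bool:
    """True if the whole phrase appears in the text (case-insensitive)."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def regex_match(text: str, pattern: str) -> bool:
    """True if the regular expression matches anywhere in the text."""
    return re.search(pattern, text, re.IGNORECASE) is not None


def fuzzy_match(text: str, phrase: str, threshold: float = 0.85) -> bool:
    """True if any run of words in the text is close enough to the phrase.

    Slides a window the same number of words as the phrase across the text
    and compares each window with difflib's similarity ratio.
    """
    words = text.lower().split()
    target = phrase.lower()
    width = len(target.split())
    for start in range(len(words) - width + 1):
        window = " ".join(words[start:start + width])
        if SequenceMatcher(None, window, target).ratio() >= threshold:
            return True
    return False


# OCR output with a typo ("Sheild") and a made-up account number format.
ocr_text = "Statement from Blue Cross Blue Sheild for account AC-104233"

print(exact_match(ocr_text, "Blue Cross Blue Shield"))  # typo defeats exact match
print(fuzzy_match(ocr_text, "Blue Cross Blue Shield"))  # fuzzy match tolerates it
print(regex_match(ocr_text, r"AC-\d{6}"))               # pattern finds the number
```

The sample text shows why fuzzy rules are useful for scanned documents: a single OCR slip is enough to defeat an exact rule, while a similarity check still recognizes the correspondent.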

Storage and Backup

How Much Space Do You Need?

A typical scanned document (letter-sized, 300 DPI) is about 500KB-2MB as a PDF. At that rate:
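As a rough worked example, the per-document range above can be multiplied out for a few collection sizes. The document counts below are illustrative assumptions, and the function name `estimate_storage_mb` is hypothetical. The estimate covers the scanned PDFs only, not any archived copies or thumbnails stored alongside them.

```python
def estimate_storage_mb(
    doc_count: int, min_mb: float = 0.5, max_mb: float = 2.0
) -> tuple[float, float]:
    """Return the (low, high) storage estimate in MB for doc_count documents.

    The defaults are the 500 KB to 2 MB per-document range quoted above.
    """
    return doc_count * min_mb, doc_count * max_mb


# Illustrative collection sizes, not figures from any real archive.
for count in (1_000, 5_000, 10_000):
    low, high = estimate_storage_mb(count)
    print(f"{count:>6} documents: {low / 1000:.1f} to {high / 1000:.1f} GB")
```

By this estimate, a 10,000-document archive lands somewhere between 5 and 20 GB, which is small by home lab standards.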

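The media directory holds the actual files, and the data directory holds the search index (plus the database, if you use SQLite). With the PostgreSQL setup above, the database lives in ./pgdata instead. Back up all of them.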

Backup Strategy

Paperless-ngx has a built-in export function:

docker exec paperless document_exporter /usr/src/paperless/export

This exports all documents and metadata to the export directory. Combine this with your regular backup solution (BorgBackup, restic, etc.) for a solid backup strategy.

Tips for Getting Started

  1. Start with recent documents. Don't try to scan your entire filing cabinet on day one. Start with new documents as they arrive and backfill when you have time.

  2. Establish tag conventions early. It's harder to rename and reorganize later. Decide on your tag structure before importing hundreds of documents.

  3. Use a mobile scanning app. Scan documents the moment you receive them. The longer paper sits around, the less likely you are to digitize it.

  4. Set up automatic matching rules as you go. Every time you manually tag a document, ask yourself: "Could Paperless-ngx detect this automatically?" If yes, create a matching rule.

  5. Keep original papers for important documents. Paperless-ngx is excellent for searching and organization, but some documents (birth certificates, titles, certain legal documents) should be kept in physical form too.

Performance Considerations

OCR is CPU-intensive. When you first set up Paperless-ngx and bulk-import documents, expect high CPU usage as the OCR engine processes everything. After the initial import, ongoing consumption is light — processing a single document takes 10-30 seconds depending on length and your hardware.

For bulk imports of hundreds of documents, consider running Paperless-ngx on a machine with at least 4 cores. For ongoing use with a few documents per week, even a Raspberry Pi 4 handles it fine.

Paperless-ngx is one of those home lab services that feels unnecessary until you set it up, and then you wonder how you ever lived without it. Being able to search your entire document history in seconds changes how you think about paper.