Developer Documentation

PaintCo Invoice Classifier — system architecture and API reference

Architecture

A single Python server built with Flask. No external AI APIs are used; all classification is done locally with rule-based keyword matching.

File Structure

paintco-mia-wizard/
├── server.py — Flask web server, routes, email polling, SharePoint sync
├── classifier.py — PDF parsing + paint/materials classification engine
├── excel_writer.py — Invoice_Register.xlsx generation
├── email_monitor.py — IMAP + Graph API email polling
├── sharepoint_client.py — Microsoft Graph API SharePoint operations
├── config.py — Settings persistence (JSON file)
├── help_pages.py — Help + DevDoc HTML pages
├── start.sh — Startup script
└── TAX_INVOICE_92006512.pdf — Sample test invoice
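config.py is described only as "settings persistence (JSON file)", and GET /api/config returns settings with secrets masked. Below is a minimal sketch of that pattern, assuming settings are stored as one JSON object per section ("email", "sharepoint"). The function names and the SECRET_KEYS set are hypothetical, not taken from config.py.

```python
import json
from pathlib import Path

# Hypothetical secret field names; the real config.py may use different keys.
SECRET_KEYS = {"password", "client_secret"}


def load_settings(path: Path) -> dict:
    """Return saved settings, or an empty dict if nothing has been saved yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_settings(path: Path, section: str, values: dict) -> None:
    """Merge one section (e.g. "email" or "sharepoint") into the JSON file."""
    settings = load_settings(path)
    settings.setdefault(section, {}).update(values)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")


def masked(settings: dict) -> dict:
    """Return a copy with non-empty secret values replaced, for API responses."""
    return {
        section: {k: ("****" if k in SECRET_KEYS and v else v) for k, v in values.items()}
        for section, values in settings.items()
    }
```

Because masking builds a new dict, the saved file keeps the real secrets while the API only ever sees the masked copy.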

Classification Engine

classifier.py uses pdfplumber for PDF text extraction and keyword matching for classification. No LLM, no external API calls.

How it works

  1. extract_pdf_text() — extracts text from PDF using pdfplumber
  2. parse_invoice_text() — identifies invoice metadata (supplier, number, date, totals) and table rows
  3. _classify_item() — checks description against PAINT_KEYWORDS and MATERIALS_KEYWORDS
  4. _confidence_check() — flags low-confidence items with Review status

Currently handles the Wattyl/Hempel invoice format. To support a new supplier, adjust _parse_table_rows() in classifier.py.

API Endpoints

Method  Path                             Description
GET     /                                Web UI: invoice upload and classification
GET     /history                         Web UI: classification history
GET     /settings                        Web UI: email and SharePoint configuration
GET     /help                            Web UI: user help guide
GET     /devdoc                          Web UI: developer documentation
POST    /api/classify                    Upload and classify a PDF invoice (multipart form, field: file)
GET     /api/health                      Service health check
GET     /api/stats                       Aggregate classification statistics
GET     /api/register/download           Download Invoice_Register.xlsx
GET     /api/config                      Get current settings (secrets masked)
POST    /api/config/email                Save email configuration
POST    /api/config/sharepoint           Save SharePoint configuration
POST    /api/config/test/email           Test email connection
POST    /api/config/test/sharepoint      Test SharePoint connection
GET     /api/email/status                Email polling status
POST    /api/email/start                 Start email polling thread
POST    /api/email/stop                  Stop email polling thread
POST    /api/watch/process               Process all PDFs in the inbox watch directory
GET     /download/<entry_id>             Download JSON result for a specific classification
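POST /api/classify takes a multipart form with the PDF in the field named file. A minimal standard-library client sketch follows; build_classify_request is a hypothetical helper name, and the base URL assumes the server runs locally on PORT 5000.

```python
import urllib.request
import uuid


def build_classify_request(base_url: str, filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a multipart POST to /api/classify with the PDF in the "file" field."""
    boundary = uuid.uuid4().hex  # random boundary that will not occur in the PDF
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode() + pdf_bytes + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/classify",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending it with urllib.request.urlopen(req) returns the server's response; the JSON structure of that response is not documented in this section, so inspect it before relying on specific fields.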

Email Monitoring

email_monitor.py supports two modes: IMAP and Microsoft Graph API.

Polling runs in a background daemon thread at a configurable interval (default 60 seconds). Emails it finds are processed and then marked as read.
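The start/stop/status behaviour behind POST /api/email/start, POST /api/email/stop and GET /api/email/status can be sketched as a daemon thread driven by a threading.Event. This is a sketch, not the code in email_monitor.py: EmailPoller is a hypothetical class, and check_inbox is a stand-in for the IMAP or Graph API fetch.

```python
import logging
import threading
from typing import Callable, Optional

log = logging.getLogger(__name__)


class EmailPoller:
    """Call check_inbox() every `interval` seconds on a background daemon thread.

    check_inbox is a hypothetical stand-in for the IMAP or Graph API fetch in
    email_monitor.py; it is expected to process new invoice emails and mark
    them as read.
    """

    def __init__(self, check_inbox: Callable[[], None], interval: float = 60.0):
        self.check_inbox = check_inbox
        self.interval = interval
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    @property
    def running(self) -> bool:  # what a status endpoint could report
        return self._thread is not None and self._thread.is_alive()

    def start(self) -> None:
        if self.running:
            return  # already polling
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self.check_inbox()
            except Exception:
                # Log and keep polling so one failed cycle does not kill the thread.
                log.exception("email poll failed")
            # wait() returns early as soon as stop() sets the event.
            self._stop.wait(self.interval)
```

Waiting on the event instead of calling time.sleep() lets a stop request take effect immediately rather than after up to one full interval.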

SharePoint Integration

sharepoint_client.py uses Microsoft Graph API with MSAL client credentials flow.

Required Graph permissions:

The Excel register is downloaded, updated with openpyxl, and uploaded back. Updates are written with PUT /drives/{drive-id}/items/{item-id}/content.
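A sketch of the final upload step, using only the standard library. It assumes an access token has already been obtained through MSAL's client credentials flow (for example ConfidentialClientApplication.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])); the drive and item IDs are placeholders, the openpyxl update step is omitted, and build_register_upload is a hypothetical name.

```python
import urllib.request

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"
XLSX_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_register_upload(token: str, drive_id: str, item_id: str,
                          xlsx_bytes: bytes) -> urllib.request.Request:
    """Build the PUT that replaces the register file's content in SharePoint."""
    return urllib.request.Request(
        f"{GRAPH_ROOT}/drives/{drive_id}/items/{item_id}/content",
        data=xlsx_bytes,  # the full updated workbook, e.g. saved from openpyxl
        method="PUT",
        headers={"Authorization": f"Bearer {token}", "Content-Type": XLSX_TYPE},
    )
```

Replacing the whole file keeps the flow simple (download, edit locally, upload), at the cost of overwriting any concurrent edits made to the register in SharePoint between the download and the upload.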

Running Locally

pip install flask pdfplumber openpyxl msal
export PORT=5000
python3 server.py