PaintCo Invoice Classifier — system architecture and API reference
Single Python server using Flask. No external AI APIs — all classification is done locally with rule-based keyword matching.
classifier.py uses pdfplumber for PDF text extraction and keyword matching for classification. No LLM, no external API calls.
- extract_pdf_text(): extracts text from a PDF using pdfplumber
- parse_invoice_text(): identifies invoice metadata (supplier, number, date, totals) and table rows
- _classify_item(): checks the description against PAINT_KEYWORDS and MATERIALS_KEYWORDS
- _confidence_check(): flags items with Review status if confidence is low

Currently handles the Wattyl/Hempel invoice format. To add support for new suppliers, adjust _parse_table_rows() in classifier.py.
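The keyword classification and the Review flag can be illustrated with a minimal sketch. It is not the code in classifier.py: the keyword lists, the "OK" and "Unknown" labels, the Classification type, and the tie-breaking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical keyword lists. The real PAINT_KEYWORDS and MATERIALS_KEYWORDS
# are defined in classifier.py and are not reproduced in this document.
PAINT_KEYWORDS = ("paint", "primer", "enamel", "thinner")
MATERIALS_KEYWORDS = ("brush", "roller", "tape", "sandpaper")


@dataclass
class Classification:
    category: str  # "Paint", "Materials", or "Unknown" (labels are assumed)
    status: str    # "OK" or "Review" ("OK" is an assumed label)


def count_hits(description: str, keywords: Sequence[str]) -> int:
    """Count how many keywords occur in the description, ignoring case."""
    text = description.lower()
    return sum(1 for keyword in keywords if keyword in text)


def classify_item(description: str) -> Classification:
    """Pick the category with more keyword hits; flag unclear cases for review."""
    paint_hits = count_hits(description, PAINT_KEYWORDS)
    material_hits = count_hits(description, MATERIALS_KEYWORDS)

    # Equal counts cover both "no keyword matched" and a tie between the
    # two categories. Either way the result is low confidence.
    if paint_hits == material_hits:
        return Classification("Unknown", "Review")

    category = "Paint" if paint_hits > material_hits else "Materials"
    return Classification(category, "OK")


if __name__ == "__main__":
    for line in ("Hempel primer 5L", "Masking tape 48mm", "Freight charge"):
        print(line, "->", classify_item(line))
```

Substring matching keeps the sketch short, but it will also match keywords inside longer words, so a real keyword list needs to be chosen with that in mind.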
email_monitor.py supports two modes:
Mail.Read application permission.

Polling runs in a background daemon thread at a configurable interval (default 60s). Found emails are processed, then marked as read.
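The polling behaviour could be structured roughly as follows. This is a sketch under assumptions, not the contents of email_monitor.py: the Poller class and the check_mailbox callback are hypothetical names, and the callback stands in for whatever fetches, processes, and marks messages as read.

```python
import threading
from typing import Callable


class Poller:
    """Run a mailbox check repeatedly in a background daemon thread."""

    def __init__(self, check_mailbox: Callable[[], None], interval: float = 60.0):
        self._check_mailbox = check_mailbox  # hypothetical callback
        self._interval = interval            # seconds between checks
        self._stop = threading.Event()
        # daemon=True means the thread will not keep the process alive
        # once the main (Flask) thread exits.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        """Ask the loop to finish and wait for the thread to exit."""
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                self._check_mailbox()
            except Exception as exc:
                # One failed poll should not kill the monitor thread.
                print(f"mailbox check failed: {exc}")
            # Event.wait sleeps for the interval but returns early on stop().
            self._stop.wait(self._interval)


if __name__ == "__main__":
    poller = Poller(lambda: print("checking mailbox"), interval=1.0)
    poller.start()
    threading.Event().wait(3.5)  # let a few polls run
    poller.stop()
```

Waiting on an Event rather than calling time.sleep lets the thread shut down promptly instead of finishing a full interval first.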
sharepoint_client.py uses Microsoft Graph API with MSAL client credentials flow.
Required Graph permissions:
- Sites.ReadWrite.All: read/write files in SharePoint
- Mail.Read: (for email monitoring) read mailbox messages

The Excel register is downloaded, updated with openpyxl, and uploaded back. Updates are uploaded with PUT /drives/{id}/items/{id}/content, where the first {id} is the drive ID and the second is the item ID.
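As an illustration of the upload step only, the sketch below builds the PUT request with the standard library. It assumes an access token has already been obtained (for example through the MSAL client credentials flow) and that the workbook bytes are already in memory. The function names and the use of urllib instead of whatever HTTP client sharepoint_client.py actually uses are assumptions.

```python
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def build_upload_request(
    drive_id: str, item_id: str, token: str, content: bytes
) -> urllib.request.Request:
    """Build a PUT request that replaces the content of an existing drive item."""
    url = f"{GRAPH_BASE}/drives/{drive_id}/items/{item_id}/content"
    return urllib.request.Request(
        url,
        data=content,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": XLSX_MIME,
        },
    )


def upload(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values; nothing is sent until upload() is called.
    req = build_upload_request("DRIVE_ID", "ITEM_ID", "ACCESS_TOKEN", b"...")
    print(req.get_method(), req.full_url)
```

Separating request construction from sending makes the URL and headers easy to check without a network call.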
pip install flask pdfplumber openpyxl msal
export PORT=5000
python3 server.py