
Business Analysis Table Auto-Generation AI Pipeline

Global Consulting Firm

Overview

Auto-generates 'Business Analysis Tables' with AI to speed up consultants' analysis of client business operations. Designed and developed a pipeline that analyzes business manuals in several formats (PDF, Word, PowerPoint, Excel) and outputs a single unified CSV file. Built reproducible experiment-management infrastructure to speed up prompt iteration.

Architecture

Adopted an architecture with clear separation of Config, Orchestrator, and Core Logic roles. By simply editing the central configuration file (config.py), multiple combinations of prompts and parameters can be executed at once, with results output and managed individually.
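A minimal sketch of how a central configuration file could expand into individual experiment runs, assuming prompt versions and parameters are declared as plain lists. The names PROMPT_VERSIONS, TEMPERATURES, Experiment, and build_experiments are hypothetical; the project's actual config.py is not shown here.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical config values; the real config.py is not shown in the source.
PROMPT_VERSIONS = ["v1", "v2"]
TEMPERATURES = [0.0, 0.7]


@dataclass(frozen=True)
class Experiment:
    """One prompt/parameter combination to run."""

    prompt_version: str
    temperature: float

    @property
    def name(self) -> str:
        # Stable identifier, usable for example as an output directory name.
        return f"prompt-{self.prompt_version}_temp-{self.temperature}"


def build_experiments(prompt_versions, temperatures):
    """Expand the config lists into every combination (Cartesian product)."""
    return [Experiment(p, t) for p, t in product(prompt_versions, temperatures)]


experiments = build_experiments(PROMPT_VERSIONS, TEMPERATURES)
```

With two prompt versions and two temperatures this yields four experiments, each of which an orchestrator could run and store separately.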

Pipeline Orchestration with LangGraph: Manages the four-stage processing flow (Converter → Extractor → Grouper → Classifier) in one place with LangGraph, enabling stateful workflow control.

  1. Converter: Convert the various file formats to AI-friendly Markdown
  2. Extractor: Extract business procedures from the Markdown and build structured tables
  3. Grouper: Group the extracted procedures into action groups from the user's perspective
  4. Classifier: Assign a detailed technical classification
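The four stages above can be sketched as functions that each read a shared state and return the fields they add. This is a plain-Python stand-in for illustration only: it does not use the LangGraph API, and the state fields and stub logic are hypothetical placeholders for the real model calls.

```python
from typing import Callable, TypedDict


class PipelineState(TypedDict, total=False):
    """Shared state passed between stages (field names are hypothetical)."""

    source_path: str
    markdown: str
    procedures: list[str]
    groups: dict[str, str]
    classes: dict[str, str]


def converter(state: PipelineState) -> PipelineState:
    # Stub: a real converter would turn the source file into Markdown.
    return {"markdown": "- Receive order\n- Issue invoice"}


def extractor(state: PipelineState) -> PipelineState:
    # Stub: treat each Markdown bullet as one business procedure.
    lines = state["markdown"].splitlines()
    return {"procedures": [ln[2:] for ln in lines if ln.startswith("- ")]}


def grouper(state: PipelineState) -> PipelineState:
    # Stub: a real grouper would ask the model for a user-facing action group.
    return {"groups": {p: "Order handling" for p in state["procedures"]}}


def classifier(state: PipelineState) -> PipelineState:
    # Stub: a real classifier would assign a detailed technical class.
    return {"classes": {p: "Manual entry" for p in state["procedures"]}}


STAGES: list[Callable[[PipelineState], PipelineState]] = [
    converter,
    extractor,
    grouper,
    classifier,
]


def run_pipeline(source_path: str) -> PipelineState:
    """Run the stages in order, merging each stage's output into the state."""
    state: PipelineState = {"source_path": source_path}
    for stage in STAGES:
        state.update(stage(state))
    return state


result = run_pipeline("manual.docx")
```

Keeping every intermediate field in the state means the output of any stage can be inspected after the run, which is the property that stateful orchestration provides.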

Key Features

Experiment Management: Enables batch execution and result comparison of multiple experiments through prompt version control and parameter combination definitions. Manages intermediate files and final outputs in separate directories for each experiment.
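A minimal sketch of the per-experiment directory separation, assuming one folder per experiment with "intermediate" and "output" subfolders. The layout and the function name prepare_experiment_dirs are assumptions, not the project's actual structure.

```python
import tempfile
from pathlib import Path


def prepare_experiment_dirs(root: Path, experiment_name: str) -> dict[str, Path]:
    """Create separate folders for intermediate files and final outputs."""
    base = root / experiment_name
    dirs = {
        "intermediate": base / "intermediate",
        "output": base / "output",
    }
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Usage with a throwaway root directory and a hypothetical experiment name.
with tempfile.TemporaryDirectory() as tmp:
    dirs = prepare_experiment_dirs(Path(tmp), "prompt-v1_temp-0.0")
    created = sorted(p.name for p in (Path(tmp) / "prompt-v1_temp-0.0").iterdir())
```

Because each experiment writes only under its own folder, results from different prompt versions can be compared side by side without overwriting one another.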

Two-Stage Classification: Split task classification into 'Grouping' and 'Detailed Classification' stages, improving AI judgment accuracy at each step and facilitating intermediate result verification.

Data Quality Assurance: Implemented quality gates using Pydantic models for input/output validation, ensuring AI outputs conform to expected JSON formats.
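A minimal sketch of such a quality gate, assuming the model returns a JSON array of rows. The ProcedureRow fields and the validate_rows helper are hypothetical; only BaseModel and ValidationError come from Pydantic.

```python
import json
from typing import Literal

from pydantic import BaseModel, ValidationError


class ProcedureRow(BaseModel):
    """One row of the analysis table (field names are hypothetical)."""

    step: int
    procedure: str
    actor: Literal["staff", "system"]


def validate_rows(raw_json: str) -> tuple[list[ProcedureRow], list[str]]:
    """Split model output into valid rows and human-readable error messages."""
    try:
        items = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [], [f"not valid JSON: {exc}"]
    if not isinstance(items, list):
        return [], ["expected a JSON array of rows"]

    valid: list[ProcedureRow] = []
    errors: list[str] = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"row {index}: expected a JSON object")
            continue
        try:
            valid.append(ProcedureRow(**item))
        except ValidationError as exc:
            errors.append(f"row {index}: {len(exc.errors())} validation error(s)")
    return valid, errors


raw = (
    '[{"step": 1, "procedure": "Receive order", "actor": "staff"},'
    ' {"step": "x", "procedure": "Issue invoice", "actor": "robot"}]'
)
rows, problems = validate_rows(raw)
```

Rows that fail validation are reported rather than silently dropped, so a caller could retry the model call or flag the row for manual review.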

Azure Integration: Prepared file integration scripts with Azure Blob Storage, enabling seamless data input/output in cloud environments.

Web Application: Developed frontend with Next.js/TypeScript. Provides file upload functionality and analysis execution UI, enabling users to intuitively operate the pipeline.

Development & Quality

Adopted Python 3.12+ with uv for package management. Applied Google-style docstrings to all files and enforced code quality with Ruff static analysis and formatting. Implemented Office-to-PDF conversion using LibreOffice, with Japanese font support to prevent character corruption.
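A minimal sketch of the Office-to-PDF step, assuming LibreOffice is installed and its soffice binary is on PATH. The function names are hypothetical. Font handling is not shown: avoiding character corruption depends on Japanese fonts being installed where LibreOffice runs, which this sketch takes as given.

```python
import subprocess
from pathlib import Path


def build_convert_command(source: Path, out_dir: Path) -> list[str]:
    """Build a headless LibreOffice command that converts `source` to PDF."""
    return [
        "soffice",  # assumes the LibreOffice binary is on PATH under this name
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_pdf(source: Path, out_dir: Path, timeout: int = 120) -> Path:
    """Run the conversion and return the path where the PDF is expected."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, out_dir), check=True, timeout=timeout)
    return out_dir / f"{source.stem}.pdf"


# Building the command has no side effects, so it is easy to inspect or log.
command = build_convert_command(Path("manual.docx"), Path("out"))
```

Separating command construction from execution keeps the conversion testable on machines without LibreOffice installed.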

Technologies

Python, Google Gemini, LangGraph, Next.js, TypeScript, Azure Blob Storage, Pandas, Pydantic, LibreOffice, uv, Ruff, Prompt Engineering