Education · Last updated April 16, 2026

How to Automate Master Service Agreement Review in Python with LegalGuard AI

Build an automated MSA review pipeline in Python. Extract key clauses, flag risky terms, and score vendor contracts using the LegalGuard AI API.


Master Service Agreements (MSAs) are the backbone of B2B commercial relationships — and they're also one of the most consistently under-reviewed categories of vendor contracts. A typical enterprise carries hundreds of active MSAs, each negotiated under different circumstances, each containing varying terms around liability, indemnification, data processing, and auto-renewal. Most of those contracts live in a shared drive somewhere, last reviewed at signing.

This tutorial shows you how to build an automated MSA review pipeline in Python using the LegalGuard AI API. By the end, you'll have a system that extracts key clauses from any vendor MSA, flags high-risk terms, scores overall contract risk, and generates a structured review report — in seconds rather than hours.

Why MSA Automation Matters

The numbers from the legal tech market make the case: contract lifecycle management (CLM) software is the single hottest subcategory in legal tech investment in 2026, according to Artificial Lawyer's Q1 2026 funding analysis. Investors are betting billions on the premise that unstructured contract data is one of the largest untapped sources of business intelligence in the enterprise.

The operational reality driving that investment:

  • A typical mid-sized company has 500-2,000 active vendor MSAs
  • The average cost of a legal review per contract ranges from $500-$2,000 for external counsel
  • 60-70% of enterprise MSAs contain terms that differ materially from standard templates — but only a fraction get flagged during procurement
  • Auto-renewal clauses alone cost enterprises hundreds of thousands of dollars annually in forgotten SaaS subscriptions

Automated clause extraction doesn't replace legal review for high-stakes contracts — but it does change what your legal team focuses on. Instead of reading 50 pages to find the liability cap, they review a structured extraction and flag the three clauses that need negotiation.

What We'll Build

A Python script that:

  1. Accepts an MSA document (PDF or plain text)
  2. Extracts 10+ standard clause types with their full text
  3. Scores each clause for risk (low/medium/high)
  4. Generates an overall contract risk score
  5. Outputs a structured JSON report and a human-readable summary

Prerequisites

pip install requests pypdf2 python-dotenv rich

You'll need:

  • A LegalGuard AI API key (available on RapidAPI)
  • Python 3.10+
  • An MSA document in PDF or text format for testing

Step 1: Document Ingestion

Start with a module to handle PDF extraction and text preparation:

# ingestion.py
import PyPDF2  # note: the PyPDF2 project's successor is "pypdf"; PyPDF2 still installs and works
import re
from pathlib import Path
 
def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract raw text from a PDF document."""
    text_chunks = []
    
    with open(pdf_path, "rb") as f:
        reader = PyPDF2.PdfReader(f)
        for page_num, page in enumerate(reader.pages):
            page_text = page.extract_text()
            if page_text.strip():
                text_chunks.append(f"[Page {page_num + 1}]\n{page_text}")
    
    return "\n\n".join(text_chunks)
 
def prepare_document(source: str) -> tuple[str, str]:
    """
    Prepare a document for analysis.
    Returns (document_text, source_type).
    """
    path = Path(source)
    
    if path.suffix.lower() == ".pdf":
        text = extract_text_from_pdf(source)
        source_type = "pdf"
    elif path.suffix.lower() in (".txt", ".md"):
        text = path.read_text(encoding="utf-8")
        source_type = "text"
    else:
        # Assume raw string input
        text = source
        source_type = "raw"
    
    # Basic cleanup
    text = re.sub(r'\n{3,}', '\n\n', text)  # Normalize excessive line breaks
    text = re.sub(r' {2,}', ' ', text)      # Normalize multiple spaces
    
    return text.strip(), source_type
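Very long MSAs can exceed request-size limits. A small guard that truncates at a paragraph boundary keeps requests predictable — `MAX_CHARS` here is an assumed cap, not a documented LegalGuard limit, so check your provider's actual maximum:

```python
# A sketch: cap document size before sending it to the API.
MAX_CHARS = 200_000  # assumed limit -- verify against the provider's docs

def truncate_at_boundary(text: str, limit: int = MAX_CHARS) -> str:
    """Truncate text at the last paragraph break before `limit`."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n\n", 0, limit)
    if cut == -1:  # no paragraph break found; fall back to a hard cut
        cut = limit
    return text[:cut].rstrip()
```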

Step 2: The MSA Review Engine

The core analysis module that calls LegalGuard AI:

# msa_reviewer.py
import os
import requests
from dataclasses import dataclass
from typing import Any
 
API_BASE = "https://apivult.com/api/legalguard/v1"
API_KEY = os.environ.get("LEGALGUARD_API_KEY", "YOUR_API_KEY")
 
# Standard MSA clauses to extract
MSA_CLAUSES = [
    "liability_cap",           # Maximum liability ceiling
    "mutual_indemnification",  # Indemnification obligations
    "ip_ownership",            # Intellectual property rights
    "data_processing",         # Data handling, privacy, breach notification
    "termination_for_cause",   # Conditions allowing termination
    "termination_for_convenience", # Notice periods for convenience termination
    "auto_renewal",            # Automatic renewal terms
    "price_escalation",        # Price increase provisions
    "governing_law",           # Jurisdiction and governing law
    "dispute_resolution",      # Arbitration vs. litigation
    "confidentiality",         # NDA terms embedded in MSA
    "force_majeure",           # Force majeure scope and duration
]
 
@dataclass
class ClauseResult:
    clause_type: str
    extracted_text: str
    risk_level: str          # "low", "medium", "high", "not_found"
    risk_rationale: str
    page_reference: str | None
 
@dataclass  
class MSAReviewResult:
    document_id: str
    overall_risk_score: int  # 0-100
    risk_level: str          # "low", "medium", "high", "critical"
    clauses: list[ClauseResult]
    high_risk_items: list[str]
    missing_standard_clauses: list[str]
    recommendations: list[str]
    review_summary: str
 
def analyze_msa(document_text: str, context: dict | None = None) -> MSAReviewResult:
    """
    Send MSA to LegalGuard AI for clause extraction and risk scoring.
    
    Args:
        document_text: Full text of the MSA document
        context: Optional dict with metadata (vendor_name, contract_value, etc.)
    
    Returns:
        MSAReviewResult with extracted clauses and risk assessment
    """
    payload = {
        "document": document_text,
        "document_type": "master_service_agreement",
        "extract_clauses": MSA_CLAUSES,
        "risk_scoring": True,
        "jurisdiction": context.get("jurisdiction", "US") if context else "US",
        "analysis_options": {
            "flag_missing_clauses": True,
            "identify_unusual_terms": True,
            "compare_to_standard_template": True,
            "auto_renewal_alert": True
        },
        "context": context or {}
    }
    
    response = requests.post(
        f"{API_BASE}/analyze",
        headers={
            "X-RapidAPI-Key": API_KEY,
            "Content-Type": "application/json"
        },
        json=payload,
        timeout=60  # MSA analysis may take up to 60s for long documents
    )
    
    response.raise_for_status()
    data = response.json()
    
    # Parse clause results
    clauses = []
    for clause_data in data.get("clauses", []):
        clauses.append(ClauseResult(
            clause_type=clause_data["type"],
            extracted_text=clause_data.get("text", ""),
            risk_level=clause_data.get("risk_level", "not_found"),
            risk_rationale=clause_data.get("rationale", ""),
            page_reference=clause_data.get("page_reference")
        ))
    
    return MSAReviewResult(
        document_id=data["document_id"],
        overall_risk_score=data["overall_risk_score"],
        risk_level=data["risk_level"],
        clauses=clauses,
        high_risk_items=data.get("high_risk_items", []),
        missing_standard_clauses=data.get("missing_clauses", []),
        recommendations=data.get("recommendations", []),
        review_summary=data.get("summary", "")
    )
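Transient failures (timeouts, 429 rate limits, 5xx errors) are common with long-document analysis. A minimal retry wrapper for the `/analyze` call follows — the retry count and backoff base are assumptions, not documented LegalGuard limits:

```python
# retry.py
import time
import requests

def post_with_retry(url: str, *, headers: dict, json_payload: dict,
                    retries: int = 3, backoff: float = 2.0,
                    timeout: int = 60) -> requests.Response:
    """POST with exponential backoff on timeouts and 429/5xx responses."""
    last_error: Exception | None = None
    for attempt in range(retries):
        try:
            response = requests.post(url, headers=headers,
                                     json=json_payload, timeout=timeout)
            if response.status_code not in (429, 500, 502, 503, 504):
                return response  # caller still runs raise_for_status()
            last_error = requests.HTTPError(f"HTTP {response.status_code}")
        except requests.Timeout as exc:
            last_error = exc
        time.sleep(backoff ** attempt)  # waits 1s, then 2s, then 4s, ...
    raise last_error
```

Swapping `requests.post` for `post_with_retry` inside `analyze_msa` makes the pipeline resilient to the occasional dropped request without changing its interface.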

Step 3: Report Generation

Format the results for human review:

# reporter.py
import json
from datetime import datetime
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich import print as rprint
 
from msa_reviewer import MSAReviewResult
 
console = Console()
 
RISK_COLORS = {
    "low": "green",
    "medium": "yellow",
    "high": "red",
    "critical": "bold red",
    "not_found": "dim"
}
 
def print_msa_review(result: MSAReviewResult, vendor_name: str = "Vendor"):
    """Print a formatted MSA review report to the console."""
    
    # Header
    risk_color = RISK_COLORS.get(result.risk_level, "white")
    console.print(Panel(
        f"[bold]MSA Review: {vendor_name}[/bold]\n"
        f"Document ID: {result.document_id}\n"
        f"Overall Risk Score: [{risk_color}]{result.overall_risk_score}/100 ({result.risk_level.upper()})[/{risk_color}]\n"
        f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M UTC')}",
        title="Contract Analysis Report",
        border_style="blue"
    ))
    
    # Clause extraction table
    table = Table(title="Extracted Clauses")
    table.add_column("Clause", style="cyan", width=25)
    table.add_column("Risk", justify="center", width=10)
    table.add_column("Key Finding", width=55)
    
    for clause in result.clauses:
        risk_color = RISK_COLORS.get(clause.risk_level, "white")
        risk_display = f"[{risk_color}]{clause.risk_level.upper()}[/{risk_color}]"
        
        # Truncate long extracted text for display
        finding = clause.risk_rationale[:120] + "..." if len(clause.risk_rationale) > 120 else clause.risk_rationale
        
        table.add_row(
            clause.clause_type.replace("_", " ").title(),
            risk_display,
            finding or "[dim]Not found in document[/dim]"
        )
    
    console.print(table)
    
    # High-risk items
    if result.high_risk_items:
        console.print("\n[bold red]⚠ HIGH-RISK ITEMS REQUIRING LEGAL REVIEW:[/bold red]")
        for item in result.high_risk_items:
            console.print(f"  • {item}")
    
    # Missing clauses
    if result.missing_standard_clauses:
        console.print("\n[bold yellow]⚡ MISSING STANDARD CLAUSES:[/bold yellow]")
        for clause in result.missing_standard_clauses:
            console.print(f"  • {clause.replace('_', ' ').title()}")
    
    # Recommendations
    if result.recommendations:
        console.print("\n[bold green]✓ RECOMMENDATIONS:[/bold green]")
        for i, rec in enumerate(result.recommendations, 1):
            console.print(f"  {i}. {rec}")
    
    # Summary
    console.print(f"\n[bold]Summary:[/bold] {result.review_summary}")
 
def save_json_report(result: MSAReviewResult, output_path: str):
    """Save structured review data as JSON for downstream processing."""
    report = {
        "document_id": result.document_id,
        "overall_risk_score": result.overall_risk_score,
        "risk_level": result.risk_level,
        "clauses": [
            {
                "type": c.clause_type,
                "extracted_text": c.extracted_text,
                "risk_level": c.risk_level,
                "risk_rationale": c.risk_rationale,
                "page_reference": c.page_reference
            }
            for c in result.clauses
        ],
        "high_risk_items": result.high_risk_items,
        "missing_clauses": result.missing_standard_clauses,
        "recommendations": result.recommendations,
        "summary": result.review_summary,
        "generated_at": datetime.utcnow().isoformat() + "Z"
    }
    
    with open(output_path, "w") as f:
        json.dump(report, f, indent=2)
    
    print(f"JSON report saved to: {output_path}")

Step 4: The Main Pipeline

Wire everything together:

# main.py
import sys
import argparse
from pathlib import Path
 
from ingestion import prepare_document
from msa_reviewer import analyze_msa
from reporter import print_msa_review, save_json_report
 
def review_msa(
    document_path: str,
    vendor_name: str = "Unknown Vendor",
    contract_value: str | None = None,
    output_json: str | None = None
) -> int:
    """
    Main MSA review pipeline.
    Returns exit code (0 = low/medium risk, 1 = high/critical risk).
    """
    print(f"Preparing document: {document_path}")
    document_text, source_type = prepare_document(document_path)
    print(f"Document loaded ({source_type}, {len(document_text):,} characters)")
    
    # Build context
    context = {
        "vendor_name": vendor_name,
        "jurisdiction": "US",
    }
    if contract_value:
        context["contract_value"] = contract_value
    
    print("Analyzing MSA with LegalGuard AI...")
    result = analyze_msa(document_text, context=context)
    
    # Display results
    print_msa_review(result, vendor_name=vendor_name)
    
    # Optionally save JSON report
    if output_json:
        save_json_report(result, output_json)
    
    # Return exit code based on risk level
    return 1 if result.risk_level in ("high", "critical") else 0
 
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Automated MSA Review Pipeline")
    parser.add_argument("document", help="Path to MSA document (PDF or TXT)")
    parser.add_argument("--vendor", default="Unknown Vendor", help="Vendor name")
    parser.add_argument("--value", help="Estimated contract value (e.g. '$500K')")
    parser.add_argument("--json-output", help="Path to save JSON report")
    
    args = parser.parse_args()
    exit_code = review_msa(
        document_path=args.document,
        vendor_name=args.vendor,
        contract_value=args.value,
        output_json=args.json_output
    )
    sys.exit(exit_code)

Step 5: Running the Pipeline

# Set your API key
export LEGALGUARD_API_KEY="YOUR_API_KEY"
 
# Basic review
python main.py vendor_msa.pdf --vendor "Acme Corp"
 
# Full review with JSON output
python main.py vendor_msa.pdf \
  --vendor "Acme Corp" \
  --value "$250K" \
  --json-output review_output.json

Example output:

Preparing document: vendor_msa.pdf
Document loaded (pdf, 48,291 characters)
Analyzing MSA with LegalGuard AI...

╭─── Contract Analysis Report ──────────────────────────────────────╮
│ MSA Review: Acme Corp                                              │
│ Document ID: doc_msa_2026-04-16_abc123                            │
│ Overall Risk Score: 72/100 (HIGH)                                  │
│ Generated: 2026-04-16 14:23 UTC                                    │
╰────────────────────────────────────────────────────────────────────╯

┌─────────────────────────────────────────────────────────────────────┐
│ Extracted Clauses                                                   │
├──────────────────────────┬──────────┬──────────────────────────────┤
│ Clause                   │ Risk     │ Key Finding                  │
├──────────────────────────┼──────────┼──────────────────────────────┤
│ Liability Cap            │ HIGH     │ Cap set at 1-month fees —    │
│                          │          │ well below industry standard  │
│ Mutual Indemnification   │ MEDIUM   │ Indemnification is one-sided │
│ Ip Ownership             │ HIGH     │ Vendor claims ownership of    │
│                          │          │ all work product derivatives  │
│ Data Processing          │ LOW      │ Standard GDPR-compliant DPA  │
│ Auto Renewal             │ HIGH     │ 180-day cancellation window  │
│                          │          │ — unusually long notice req   │
...

⚠ HIGH-RISK ITEMS REQUIRING LEGAL REVIEW:
  • Liability cap of 1 month's fees is well below the 6-month industry standard
  • IP ownership clause grants vendor rights to work product derivatives
  • 180-day auto-renewal cancellation window creates significant lock-in risk

✓ RECOMMENDATIONS:
  1. Negotiate liability cap to minimum 6 months aggregate fees
  2. Add explicit carve-out for customer IP derivatives in Section 8.2
  3. Reduce auto-renewal cancellation window to 30-60 days

Batch Processing: Review Multiple Contracts

For teams reviewing a portfolio of vendor MSAs:

# batch_review.py
import csv
from pathlib import Path
 
from ingestion import prepare_document
from msa_reviewer import analyze_msa
 
def batch_review_contracts(contract_dir: str, output_csv: str):
    """Review all MSA documents in a directory and output summary CSV."""
    contracts = list(Path(contract_dir).glob("*.pdf"))
    results = []
    
    for contract_path in contracts:
        vendor_name = contract_path.stem.replace("_", " ").title()
        print(f"\nProcessing: {vendor_name}")
        
        try:
            text, _ = prepare_document(str(contract_path))
            result = analyze_msa(text, context={"vendor_name": vendor_name})
            
            results.append({
                "vendor": vendor_name,
                "file": contract_path.name,
                "risk_score": result.overall_risk_score,
                "risk_level": result.risk_level,
                "high_risk_count": len(result.high_risk_items),
                "missing_clauses": ", ".join(result.missing_standard_clauses)
            })
        except Exception as e:
            results.append({
                "vendor": vendor_name,
                "file": contract_path.name,
                "risk_score": -1,
                "risk_level": "error",
                "high_risk_count": 0,
                "missing_clauses": str(e)
            })
    
    # Sort by risk score (highest first)
    results.sort(key=lambda x: x["risk_score"], reverse=True)
    
    if not results:
        print("No PDF contracts found.")
        return
    
    with open(output_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=results[0].keys())
        writer.writeheader()
        writer.writerows(results)
    
    print(f"\nBatch review complete. Results saved to {output_csv}")
    high_risk = [r for r in results if r["risk_level"] in ("high", "critical")]
    print(f"High-risk contracts requiring legal review: {len(high_risk)}/{len(results)}")
 
# Usage
batch_review_contracts("./vendor_contracts/", "msa_risk_summary.csv")
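Sequential API calls make large portfolios slow. If your plan's rate limits allow it (the `max_workers` value of 4 is an assumption — check before raising it), a thread pool parallelizes the per-contract analysis while preserving input order:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(items, analyze_fn, max_workers: int = 4):
    """Apply analyze_fn to each item concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_fn, items))
```

For example, `run_parallel(texts, analyze_msa)` fans the analysis out across threads; because the work is network-bound, threads (rather than processes) are the right fit here.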

What This System Enables

The automated MSA review pipeline transforms how procurement and legal teams handle vendor contract onboarding:

Before automation: Each new vendor MSA requires 2-4 hours of legal review time. With hundreds of new vendor relationships per year, this creates a review queue that delays procurement and routes all contracts through expensive legal resources.

After automation: Initial screening identifies the 20-30% of contracts with genuine high-risk terms in under a minute. Legal review focuses exclusively on contracts that scored high/critical — reducing review time by 60-70% while improving consistency.

Measurable outcomes typical in this workflow:

  • Auto-renewal clause capture: organizations using automated MSA review typically recover $50K-$500K annually in forgotten SaaS renewals
  • Liability cap normalization: automated flagging of below-standard liability caps prevents material financial exposure
  • IP ownership standardization: systematic identification of non-standard IP clauses before execution

The full code for this tutorial is available to adapt for your specific MSA templates and risk criteria. LegalGuard AI supports custom clause definitions and jurisdiction-specific risk frameworks — contact APIVult for enterprise configuration options.

Sources