7. System Architecture & Technical Considerations
System Architecture Analysis
1. Data Capability Mapping
| UI Element | Azure AI Field | Structure | Hover-to-Discover Support |
|---|---|---|---|
| Proof Quality Score | denseCaptionsResult (blur detection), metadata (image dimensions) | Caption text + confidence | ⚠️ Indirect - requires derived heuristics |
| Drop Safety Score | denseCaptionsResult + tagsResult | Combined analysis | ✅ Dense captions have boundingBox |
| Risk Badges (Weather/Theft) | denseCaptionsResult.values[].text | {text, confidence, boundingBox: {x,y,w,h}} | ✅ Yes - bounding box per caption |
| "Why" Engine Text | denseCaptionsResult.values[].text | Natural language descriptions | ✅ Direct |
| Label Visibility Check | readResult.blocks[].lines[] | {text, boundingPolygon: [{x,y},...]} | ✅ OCR polygons available |
| Visual Tags Toggle | tagsResult.values[] | {name, confidence} | ❌ No bounding boxes on tags |
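All three result blocks in the table come back from a single Image Analysis 4.0 call. A minimal sketch of assembling that request (the endpoint, key, and image URL are placeholders; the api-version and parameter names follow the 4.0 REST API, but confirm against the current Azure AI Vision reference):

```python
def build_analyze_request(endpoint: str, api_key: str, image_url: str) -> dict:
    """Assemble the Image Analysis 4.0 request that feeds the capability map.

    Assumes the 2023-10-01 GA api-version; verify against current docs.
    """
    return {
        "url": f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze",
        "params": {
            "api-version": "2023-10-01",
            # One call returns all three result blocks used in the UI mapping.
            "features": "denseCaptions,read,tags",
        },
        "headers": {
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        "json": {"url": image_url},
    }

# e.g. requests.post(**build_analyze_request(...)) inside the Vision Processor
```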
Hover-to-Discover Image Analysis Implementation
The critical linkage for "Hover-to-Discover" works as follows:

UI Risk Text ("Weather Hazard")
→ detected_risks[].risk_id
→ detected_risks[].underlying_evidence[].bounding_box
→ Frontend draws overlay on Left Pane image

Key insight: denseCaptionsResult provides bounding boxes ({x, y, w, h}) for each caption. The Decision Engine must:

1. Parse caption text for risk keywords (e.g., "puddle", "wet pavement")
2. Store the associated boundingBox as underlying_evidence
3. Expose each risk's evidence by risk_id so the frontend can retrieve the bounding box on hover
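A minimal sketch of the keyword-to-evidence step, assuming the denseCaptionsResult shape documented above (the keyword set is illustrative, not the final list):

```python
RISK_KEYWORDS = {"puddle", "wet", "rain", "water"}  # illustrative subset

def extract_weather_evidence(azure_result: dict) -> list[dict]:
    """Turn dense captions that mention risk keywords into evidence records
    carrying the caption's boundingBox for hover-to-discover."""
    evidence = []
    for caption in azure_result.get("denseCaptionsResult", {}).get("values", []):
        text = caption["text"].lower()
        if any(kw in text for kw in RISK_KEYWORDS):
            evidence.append({
                "source_type": "dense_caption",
                "source_text": caption["text"],
                "confidence": caption["confidence"],
                "bounding_box": caption.get("boundingBox"),  # {x, y, w, h}
            })
    return evidence
```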
2. Data Model Design (Schema)
Prisma/SQL Schema
// Core Delivery Analysis Record
model AnalyzedDelivery {
  id           String @id @default(uuid())
  shipment_id  String @unique
  image_url    String
  image_width  Int
  image_height Int

  // === DUAL SCORE SYSTEM ===
  proof_quality_score Int // 0-100: Is the photo good?
  drop_safety_score   Int // 0-100: Was the behavior safe?

  // === STATUS FLAGS ===
  review_status ReviewStatus @default(PENDING)
  audit_outcome AuditOutcome?

  // === RELATIONS ===
  proof_quality  ProofQualityCheck?
  drop_safety    DropSafetyCheck?
  detected_risks DetectedRisk[]
  human_feedback HumanFeedback[]

  // === RAW DATA (for debugging/reprocessing) ===
  raw_azure_response Json

  created_at  DateTime  @default(now())
  updated_at  DateTime  @updatedAt
  reviewed_at DateTime?
  reviewed_by String?
}

enum ReviewStatus {
  PENDING          // Awaiting dispatcher review
  FLAGGED          // AI flagged - needs human review
  APPROVED         // Human approved
  COACHING_FLAGGED // Marked for driver coaching
}

enum AuditOutcome {
  PASS
  FAIL
  OVERRIDE_APPROVED // AI was wrong (False Positive)
}

// === PROOF QUALITY LAYER ===
model ProofQualityCheck {
  id          String @id @default(uuid())
  delivery_id String @unique
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  // Checklist items (spec requirement)
  is_image_sharp        Boolean
  is_label_visible      Boolean
  is_package_visible    Boolean
  has_adequate_lighting Boolean

  blur_confidence    Float?   // From dense caption "blurry image of..."
  ocr_text_detected  String[] // From readResult
  label_bounding_box Json?    // {x, y, w, h} for label location
}

// === DROP SAFETY LAYER ===
model DropSafetyCheck {
  id          String @id @default(uuid())
  delivery_id String @unique
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  // Context indicators
  is_residential       Boolean // porch, door, mat detected
  is_sheltered         Boolean // garage, overhang detected
  is_near_road         Boolean // street, curb, vehicle detected
  has_weather_exposure Boolean // wet, rain, puddle detected
  has_theft_risk       Boolean // public area, no concealment

  // Package material vulnerability
  package_type    PackageType?
  material_porous Boolean @default(true) // cardboard = porous
}

enum PackageType {
  CARDBOARD
  PLASTIC_BAG
  POLY_MAILER
  UNKNOWN
}

// === DETECTED RISKS (with hover-to-discover support) ===
model DetectedRisk {
  id          String @id @default(uuid())
  delivery_id String
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  risk_type      RiskType
  severity       Severity
  severity_score Int // 0-100 contribution to drop_safety deduction

  // UI Display
  display_label    String // "Weather Hazard", "Theft Risk"
  explanation_text String // "Puddles detected adjacent to porous packaging"

  // === HOVER-TO-DISCOVER LINKAGE ===
  underlying_evidence Json // Array of evidence sources
  // Structure: [{
  //   source_type: "dense_caption" | "tag" | "ocr",
  //   source_text: "a puddle on the ground",
  //   confidence: 0.85,
  //   bounding_box: {x: 100, y: 200, w: 150, h: 80} // nullable for tags
  // }]

  // === RL FEEDBACK LOOP ===
  human_feedback_status FeedbackStatus @default(UNREVIEWED)
}

enum RiskType {
  WEATHER_HAZARD
  THEFT_RISK
  ROAD_PROXIMITY
  POOR_PLACEMENT
  PACKAGE_DAMAGE
}

enum Severity {
  LOW
  MEDIUM
  HIGH
  CRITICAL
}

enum FeedbackStatus {
  UNREVIEWED
  CONFIRMED  // Human agrees with AI
  OVERRIDDEN // Human disagrees (False Positive)
}

// === REINFORCEMENT LEARNING LOOP ===
model HumanFeedback {
  id          String @id @default(uuid())
  delivery_id String
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  action_taken  FeedbackAction
  dispatcher_id String

  // What was the original AI assessment?
  original_drop_score Int
  original_risks      Json // Snapshot of detected_risks at review time

  // Correction details
  correction_reason String? // Why did dispatcher override?

  // Training data export
  exported_for_training Boolean @default(false)

  created_at DateTime @default(now())
}

enum FeedbackAction {
  APPROVE_DELIVERY  // Scenario B: AI was wrong
  FLAG_FOR_COACHING // Scenario A: Driver messed up
  CUSTOMER_ALERT    // Scenario C: Notification sent
}

JSON Representation (API Response)
{
  "delivery_id": "uuid-123",
  "shipment_id": "SHP-456789",
  "image_url": "https://storage.blob/customer_123.jpg",
  "image_dimensions": { "width": 810, "height": 1080 },
  "scores": {
    "proof_quality": { "value": 78, "status": "PASS" },
    "drop_safety": { "value": 45, "status": "WARNING" }
  },
  "proof_quality_details": {
    "checklist": [
      { "item": "Image sharp", "passed": true },
      { "item": "Label visible", "passed": true },
      { "item": "Package visible", "passed": true },
      { "item": "Adequate lighting", "passed": false }
    ]
  },
  "detected_risks": [
    {
      "risk_id": "risk-001",
      "risk_type": "WEATHER_HAZARD",
      "severity": "HIGH",
      "severity_score": 35,
      "display_label": "Weather Hazard",
      "explanation": "Puddles and wet pavement detected adjacent to porous packaging.",
      "underlying_evidence": [
        {
          "source_type": "dense_caption",
          "source_text": "a puddle on the ground near boxes",
          "confidence": 0.82,
          "bounding_box": { "x": 150, "y": 600, "w": 200, "h": 100 }
        }
      ],
      "human_feedback_status": "UNREVIEWED"
    },
    {
      "risk_id": "risk-002",
      "risk_type": "THEFT_RISK",
      "severity": "MEDIUM",
      "severity_score": 20,
      "display_label": "Theft Risk",
      "explanation": "Package left in publicly visible location near roadway.",
      "underlying_evidence": [
        {
          "source_type": "tag",
          "source_text": "street",
          "confidence": 0.60,
          "bounding_box": null
        },
        {
          "source_type": "dense_caption",
          "source_text": "a white car on the road",
          "confidence": 0.77,
          "bounding_box": { "x": 626, "y": 97, "w": 181, "h": 104 }
        }
      ],
      "human_feedback_status": "UNREVIEWED"
    }
  ],
  "review_status": "FLAGGED",
  "audit_outcome": null
}

3. Decision Engine (Heuristics Construction)
Drop Safety Score Calculation (Pseudo-code)
def calculate_drop_safety_score(azure_result: dict) -> tuple[int, list[DetectedRisk]]:
    """
    Calculate Drop Safety Score (0-100) where 100 = perfectly safe.
    Returns (score, list of detected risks with evidence).
    """
    score = 100  # Start with perfect score, deduct for risks
    detected_risks = []

    # Extract data sources
    captions = azure_result.get('denseCaptionsResult', {}).get('values', [])
    tags = azure_result.get('tagsResult', {}).get('values', [])

    # Combine all text for keyword analysis
    caption_texts = [(c['text'].lower(), c['confidence'], c.get('boundingBox'))
                     for c in captions]
    tag_names = [(t['name'].lower(), t['confidence']) for t in tags]

    # === RULE 1: WEATHER HAZARDS ===
    weather_keywords = ['wet', 'rain', 'puddle', 'snow', 'water', 'flooded']
    weather_evidence = []
    for text, conf, bbox in caption_texts:
        if any(kw in text for kw in weather_keywords) and conf >= 0.60:
            weather_evidence.append({
                'source_type': 'dense_caption',
                'source_text': text,
                'confidence': conf,
                'bounding_box': bbox
            })

    if weather_evidence:
        # Check package material vulnerability
        is_porous = any('cardboard' in t[0] or 'carton' in t[0]
                        for t in tag_names if t[1] >= 0.70)

        # Calculate proximity (are weather hazard and package overlapping?)
        package_boxes = [c.get('boundingBox') for c in captions
                         if 'box' in c['text'].lower() and c.get('boundingBox')]
        weather_boxes = [e['bounding_box'] for e in weather_evidence if e['bounding_box']]
        proximity = calculate_proximity(package_boxes, weather_boxes)

        if proximity == 'TOUCHING':
            severity = 'CRITICAL'
            deduction = 50 if is_porous else 35
        elif proximity == 'ADJACENT':
            severity = 'HIGH'
            deduction = 35 if is_porous else 20
        else:
            severity = 'MEDIUM'
            deduction = 15

        score -= deduction
        detected_risks.append(DetectedRisk(
            risk_type='WEATHER_HAZARD',
            severity=severity,
            severity_score=deduction,
            display_label='Weather Hazard',
            explanation=f"{'Puddles/wet conditions' if 'puddle' in str(weather_evidence) else 'Water exposure'} "
                        f"detected {'in contact with' if proximity == 'TOUCHING' else 'near'} "
                        f"{'porous ' if is_porous else ''}packaging.",
            underlying_evidence=weather_evidence
        ))

    # === RULE 2: ROAD/THEFT RISK ===
    road_keywords = ['road', 'highway', 'street', 'curb', 'sidewalk', 'parking']
    safe_keywords = ['porch', 'door', 'mat', 'garage', 'shelter', 'entrance', 'building']
    road_evidence = []
    safe_evidence = []

    for text, conf, bbox in caption_texts:
        if any(kw in text for kw in road_keywords) and conf >= 0.60:
            road_evidence.append({
                'source_type': 'dense_caption', 'source_text': text,
                'confidence': conf, 'bounding_box': bbox
            })
        if any(kw in text for kw in safe_keywords) and conf >= 0.60:
            safe_evidence.append({
                'source_type': 'dense_caption', 'source_text': text,
                'confidence': conf, 'bounding_box': bbox
            })

    # Add tag-based evidence (no bounding boxes)
    for name, conf in tag_names:
        if name in road_keywords and conf >= 0.70:
            road_evidence.append({
                'source_type': 'tag', 'source_text': name,
                'confidence': conf, 'bounding_box': None
            })

    if road_evidence and not safe_evidence:
        severity = 'HIGH' if len(road_evidence) >= 2 else 'MEDIUM'
        deduction = 25 if severity == 'HIGH' else 15
        score -= deduction
        detected_risks.append(DetectedRisk(
            risk_type='THEFT_RISK',
            severity=severity,
            severity_score=deduction,
            display_label='Theft Risk',
            explanation="Package left in publicly visible location near roadway.",
            underlying_evidence=road_evidence
        ))
    elif safe_evidence:
        score += 10  # Bonus for safe location (final score capped at 100)

    # === RULE 3: VEHICLE PROXIMITY (enhanced theft/damage risk) ===
    vehicle_keywords = ['car', 'vehicle', 'truck', 'van']
    vehicle_evidence = [
        {'source_type': 'dense_caption', 'source_text': text,
         'confidence': conf, 'bounding_box': bbox}
        for text, conf, bbox in caption_texts
        if any(kw in text for kw in vehicle_keywords) and conf >= 0.70
    ]

    if vehicle_evidence:
        deduction = 10
        score -= deduction
        # Append to existing THEFT_RISK or create new
        existing = next((r for r in detected_risks if r.risk_type == 'THEFT_RISK'), None)
        if existing:
            existing.underlying_evidence.extend(vehicle_evidence)
            existing.severity_score += deduction
        else:
            detected_risks.append(DetectedRisk(
                risk_type='ROAD_PROXIMITY',
                severity='LOW',
                severity_score=deduction,
                display_label='Road Proximity',
                explanation="Vehicle detected near drop location.",
                underlying_evidence=vehicle_evidence
            ))

    return (max(0, min(100, score)), detected_risks)
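calculate_proximity relies on boxes_overlap and boxes_adjacent, which the pseudo-code leaves undefined. A minimal sketch, assuming the {x, y, w, h} box shape used throughout this section:

```python
def boxes_overlap(a: dict, b: dict) -> bool:
    """True when two {x, y, w, h} boxes intersect (axis-aligned test)."""
    return (a['x'] < b['x'] + b['w'] and b['x'] < a['x'] + a['w'] and
            a['y'] < b['y'] + b['h'] and b['y'] < a['y'] + a['h'])

def boxes_adjacent(a: dict, b: dict, threshold: int = 50) -> bool:
    """True when the boxes do not overlap but the edge-to-edge gap is within
    `threshold` pixels on both axes."""
    if boxes_overlap(a, b):
        return False
    gap_x = max(0, max(a['x'], b['x']) - min(a['x'] + a['w'], b['x'] + b['w']))
    gap_y = max(0, max(a['y'], b['y']) - min(a['y'] + a['h'], b['y'] + b['h']))
    return gap_x <= threshold and gap_y <= threshold
```

The 50-pixel default matches the threshold used by calculate_proximity below; in practice it should scale with image resolution.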
def calculate_proximity(box_list_a: list, box_list_b: list) -> str:
    """Calculate spatial relationship between two sets of bounding boxes."""
    # Check every pair for overlap first, so an ADJACENT pair found early
    # cannot shadow a TOUCHING pair found later.
    pairs = [(a, b) for a in box_list_a for b in box_list_b]
    if any(boxes_overlap(a, b) for a, b in pairs):
        return 'TOUCHING'
    if any(boxes_adjacent(a, b, threshold=50) for a, b in pairs):
        return 'ADJACENT'
    return 'DISTANT'

Heuristics Table
| Signal | Source | Confidence Threshold | Score Impact | Severity | Badge |
|---|---|---|---|---|---|
| puddle + cardboard box + TOUCHING | dense_caption | ≥0.60 | -50 | CRITICAL | 🔴 Weather Hazard |
| puddle + cardboard box + ADJACENT | dense_caption | ≥0.60 | -35 | HIGH | 🔴 Weather Hazard |
| wet/rain + any package | dense_caption | ≥0.60 | -15 | MEDIUM | 🟡 Weather Hazard |
| street/road + NO porch/door | tag + caption | ≥0.60 (caption), ≥0.70 (tag) | -25 | HIGH | 🔴 Theft Risk |
| car/vehicle near package | dense_caption | ≥0.70 | -10 | LOW | 🟡 Road Proximity |
| porch/door/mat detected | dense_caption | ≥0.60 | +10 | — | ✅ Safe Location |
| blurry image in caption | dense_caption | ≥0.65 | -20 (Proof Quality) | — | ⚠️ Poor Image |
| NO OCR text detected | readResult | — | -15 (Proof Quality) | — | ⚠️ Label Not Visible |
Warning vs. Failure Thresholds
| Score Range | Status | UI Treatment |
|---|---|---|
| 70-100 | ✅ PASS | Green badge, no modal trigger |
| 40-69 | ⚠️ WARNING | Amber badge, "Review Needed" status |
| 0-39 | ❌ FAIL | Red badge, immediate modal, high priority |
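The thresholds above map directly to a small status function shared by both scores; a minimal sketch:

```python
def score_status(score: int) -> str:
    """Map a 0-100 score to the UI status band defined in the thresholds table."""
    if score >= 70:
        return 'PASS'     # Green badge, no modal trigger
    if score >= 40:
        return 'WARNING'  # Amber badge, "Review Needed"
    return 'FAIL'         # Red badge, immediate modal, high priority
```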
4. Architecture & Data Flow
Mermaid Diagram
View in the Mermaid Live Editor:
flowchart TB
    subgraph "1. Image Ingestion"
        A[Driver Mobile App] -->|Upload VPOD Image| B[Azure Blob Storage]
        B -->|Event Trigger| C[Azure Event Grid]
    end
    subgraph "2. AI Processing Pipeline"
        C -->|Trigger| D[Azure Function: Vision Processor]
        D -->|API Call| E[Azure AI Vision 4.0]
        E -->|JSON Response| D
        D -->|Enrich & Score| F[Decision Engine<br/>Serverless Function]
    end
    subgraph "3. Decision Engine"
        F -->|Calculate| G[Proof Quality Score]
        F -->|Calculate| H[Drop Safety Score]
        F -->|Detect| I[Risk Evidence + Bounding Boxes]
        G & H & I -->|Structure| J[AnalyzedDelivery Record]
    end
    subgraph "4. Data Layer"
        J -->|Write| K[(PostgreSQL / Cosmos DB)]
        K -->|Read| L[GraphQL API / REST API]
    end
    subgraph "5. Frontend Experience"
        L -->|Fetch| M[Ops Vision UI]
        M -->|Split Pane| N[Left: Raw Image]
        M -->|Split Pane| O[Right: Risk Sidebar]
        O -->|Hover Event| P[Highlight Bounding Box on N]
    end
    subgraph "6. RL Feedback Loop"
        M -->|Dispatcher Action| Q{User Decision}
        Q -->|Approve| R[HumanFeedback: OVERRIDE]
        Q -->|Flag for Coaching| S[HumanFeedback: CONFIRMED]
        Q -->|Customer Alert| T[Trigger Email + Log]
        R & S -->|Write| K
        R -->|Export| U[Training Data Pipeline]
        U -->|Retrain| V[Custom Vision Model<br/>or Fine-tuned Prompt]
    end

Component Responsibilities
| Component | Technology | Responsibility |
|---|---|---|
| Vision Processor | Azure Function (Python) | Receive blob trigger, call Azure AI Vision API, parse response |
| Decision Engine | Azure Function (Python) | Apply heuristics, calculate scores, structure detected_risks with evidence |
| Database | PostgreSQL (Azure) or Cosmos DB | Store AnalyzedDelivery, DetectedRisk, HumanFeedback |
| API Layer | FastAPI / GraphQL (Azure App Service) | Expose delivery data to frontend, handle feedback writes |
| Frontend | React + TailwindCSS | Split-pane UI, hover-to-discover, action buttons |
| RL Training Pipeline | Azure ML / Custom Python | Export overridden records, retrain classification thresholds |
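One frontend detail worth pinning down: bounding boxes arrive in original-image pixels (see image_dimensions in the API response), so the hover overlay must scale them to the rendered pane before drawing. A sketch of that scaling, written in Python for illustration; the real code would live in the React component:

```python
def scale_box(box: dict, natural: tuple[int, int], rendered: tuple[int, int]) -> dict:
    """Scale an {x, y, w, h} box from the image's natural (pixel) size to its
    on-screen rendered size. Assumes the pane preserves aspect ratio, so the
    per-axis factors are normally equal."""
    sx = rendered[0] / natural[0]
    sy = rendered[1] / natural[1]
    return {
        'x': round(box['x'] * sx),
        'y': round(box['y'] * sy),
        'w': round(box['w'] * sx),
        'h': round(box['h'] * sy),
    }
```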
Reinforcement Learning Loop (Technical Flow)
1. User clicks "Approve Delivery" on a FLAGGED delivery
↓
2. Frontend POSTs to /api/feedback:
{
delivery_id: "uuid-123",
action: "APPROVE_DELIVERY",
correction_reason: "Shadow, not puddle"
}
↓
3. API writes HumanFeedback record:
- Snapshot original_risks and original_drop_score
- Set human_feedback_status = OVERRIDDEN on all DetectedRisk records
↓
4. Nightly batch job exports OVERRIDDEN records:
- Image URL + original Azure response + human correction
↓
5. Two retraining paths:
   a) **Threshold Adjustment**: Raise the confidence requirement for "puddle"
      if 30%+ of puddle detections are overridden (too many false positives)
   b) **Custom Vision Model**: Fine-tune on corrected examples for
      domain-specific hazards (e.g., "loading dock" vs "street")
   ↓
6. Deploy updated Decision Engine weights/thresholds

5. Gap Analysis
❌ Unsupported by Current Notebook
| Spec Requirement | Gap | Remediation |
|---|---|---|
| Proximity detection ("puddle ADJACENT to box") | Azure dense captions don't provide semantic relationships between objects | Implement calculate_proximity() using bounding box overlap/distance in Decision Engine |
| Wetness vs. Shadow distinction | Azure cannot distinguish reflective surfaces (wet pavement vs. shadow) | Requires Custom Vision model trained on logistics images, or LLM re-analysis of image |
| Tags with bounding boxes | tagsResult has NO bounding boxes (only name + confidence) | Use denseCaptionsResult as primary for Hover-to-Discover; tags only for scoring |
| Object Detection API | Notebook uses features=denseCaptions,read,tags but NOT objects | Add objects to API call for dedicated object detection with bounding boxes |
| Blur/sharpness quantification | No explicit blur metric, only "blurry image of..." in captions | Implement client-side image analysis (Laplacian variance) OR use Azure imageQuality feature |
| Material detection (porous vs. plastic) | Tags include "cardboard" but not material properties | Build keyword mapping: {cardboard, carton} → porous, {poly, plastic} → non-porous |
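The Laplacian-variance remediation can be sketched without an imaging library (a production version would use OpenCV's cv2.Laplacian on the grayscale frame); higher variance means more edge detail, i.e. a sharper photo:

```python
def laplacian_variance(gray: list[list[int]]) -> float:
    """Variance of the 4-neighbour Laplacian over a 2D grayscale image.
    Values near 0 indicate a blurry or featureless photo; the pass/fail
    cutoff must be tuned on real VPOD images."""
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete 4-neighbour Laplacian at (x, y)
            lap = (gray[y - 1][x] + gray[y + 1][x] +
                   gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```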
⚠️ Partially Supported
| Spec Requirement | Current State | Enhancement Needed |
|---|---|---|
| OCR for label visibility | readResult provides text + polygons | Add confidence threshold (≥0.80) and minimum text length check |
| Risk explanation copy | Notebook has basic findings text | Enhance Decision Engine to generate spec-compliant copy: "Puddles and wet pavement detected adjacent to porous packaging." |
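The OCR enhancement row can be sketched as follows. It assumes the Image Analysis 4.0 readResult shape, where confidence is reported per word rather than per line, so the check aggregates confidently-read words; the ≥0.80 threshold is the one proposed in the table, and the minimum character count is an illustrative default:

```python
def label_visible(read_result: dict, min_conf: float = 0.80, min_chars: int = 4) -> bool:
    """Label-visibility check: enough confidently-read characters in the OCR
    output. Assumes per-word confidence as in Image Analysis 4.0 readResult."""
    confident_text = ""
    for block in read_result.get("blocks", []):
        for line in block.get("lines", []):
            for word in line.get("words", []):
                if word.get("confidence", 0.0) >= min_conf:
                    confident_text += word["text"]
    return len(confident_text) >= min_chars
```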
✅ Fully Supported
| Spec Requirement | Azure AI Field |
|---|---|
| Dense scene descriptions | denseCaptionsResult.values[].text |
| Bounding box coordinates | denseCaptionsResult.values[].boundingBox |
| OCR text extraction | readResult.blocks[].lines[].text |
| Semantic tags | tagsResult.values[].name |
| Confidence filtering | All results include confidence |
Summary
The Azure AI Vision API provides a sufficient foundation for Ops Vision, but the Decision Engine must bridge significant gaps:

- Proximity logic must be computed geometrically from bounding boxes
- Tags lack spatial data; rely on dense captions for Hover-to-Discover
- Consider adding the objects feature to the API call for better object detection
- The RL loop is well-defined but requires disciplined feedback storage and batch export
- Material/wetness nuance is the hardest gap; it may require a custom model or LLM verification
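Closing the RL point above: the Threshold Adjustment retraining path reduces to a simple override-rate rule, where repeated dispatcher overrides make the engine more conservative by raising the keyword's confidence requirement. The 30% trigger, 0.05 step, and 0.95 ceiling are illustrative assumptions:

```python
def adjust_keyword_threshold(current: float, detections: int, overrides: int,
                             trigger_rate: float = 0.30, step: float = 0.05,
                             ceiling: float = 0.95) -> float:
    """Raise a keyword's confidence threshold when too many of its detections
    were overridden by dispatchers (i.e. judged false positives)."""
    if detections == 0:
        return current  # no evidence either way
    if overrides / detections >= trigger_rate:
        return min(ceiling, current + step)
    return current
```

A nightly job would run this per keyword over the exported OVERRIDDEN records, then redeploy the updated thresholds to the Decision Engine.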