7. System Architecture & Technical Considerations
System Architecture Analysis
1. Data Capability Mapping
| UI Element | Azure AI Field | Structure | Hover-to-Discover Support |
|---|---|---|---|
| Proof Quality Score | denseCaptionsResult (blur detection), metadata (image dimensions) | Caption text + confidence | ⚠️ Indirect - requires derived heuristics |
| Drop Safety Score | denseCaptionsResult + tagsResult | Combined analysis | ✅ Dense captions have boundingBox |
| Risk Badges (Weather/Theft) | denseCaptionsResult.values[].text | {text, confidence, boundingBox: {x,y,w,h}} | ✅ Yes - bounding box per caption |
| "Why" Engine Text | denseCaptionsResult.values[].text | Natural language descriptions | ✅ Direct |
| Label Visibility Check | readResult.blocks[].lines[] | {text, boundingPolygon: [{x,y},...]} | ✅ OCR polygons available |
| Visual Tags Toggle | tagsResult.values[] | {name, confidence} | ❌ No bounding boxes on tags |
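All three result blocks in the table come back from a single Image Analysis 4.0 call. A minimal sketch of assembling that request (the endpoint, key, and image URL are placeholders; the api-version and parameter names follow the 4.0 REST API, but confirm against the current Azure AI Vision reference):

```python
def build_analyze_request(endpoint: str, api_key: str, image_url: str) -> dict:
    """Assemble the Image Analysis 4.0 request that feeds the capability map.

    Assumes the 2023-10-01 GA api-version; verify against current docs.
    """
    return {
        "url": f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze",
        "params": {
            "api-version": "2023-10-01",
            # One call returns all three result blocks used in the UI mapping.
            "features": "denseCaptions,read,tags",
        },
        "headers": {
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        "json": {"url": image_url},
    }

# e.g. requests.post(**build_analyze_request(...)) inside the Vision Processor
```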
Hover-to-Discover Image Analysis Implementation
The critical linkage for "Hover-to-Discover" works as follows:

UI Risk Text ("Weather Hazard")
→ detected_risks[].risk_id
→ detected_risks[].underlying_evidence[].bounding_box
→ Frontend draws overlay on Left Pane image

Key insight: denseCaptionsResult provides bounding boxes ({x, y, w, h}) for each caption. The Decision Engine must:

1. Parse caption text for risk keywords (e.g., "puddle", "wet pavement")
2. Store the associated boundingBox as underlying_evidence
3. Expose each risk's evidence by risk_id so the frontend can retrieve the bounding box on hover
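A minimal sketch of the keyword-to-evidence step, assuming the denseCaptionsResult shape documented above (the keyword set is illustrative, not the final list):

```python
RISK_KEYWORDS = {"puddle", "wet", "rain", "water"}  # illustrative subset

def extract_weather_evidence(azure_result: dict) -> list[dict]:
    """Turn dense captions that mention risk keywords into evidence records
    carrying the caption's boundingBox for hover-to-discover."""
    evidence = []
    for caption in azure_result.get("denseCaptionsResult", {}).get("values", []):
        text = caption["text"].lower()
        if any(kw in text for kw in RISK_KEYWORDS):
            evidence.append({
                "source_type": "dense_caption",
                "source_text": caption["text"],
                "confidence": caption["confidence"],
                "bounding_box": caption.get("boundingBox"),  # {x, y, w, h}
            })
    return evidence
```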
2. Data Model Design (Schema)
Prisma/SQL Schema
// Core Delivery Analysis Record
model AnalyzedDelivery {
  id           String @id @default(uuid())
  shipment_id  String @unique
  image_url    String
  image_width  Int
  image_height Int

  // === DUAL SCORE SYSTEM ===
  proof_quality_score Int // 0-100: Is the photo good?
  drop_safety_score   Int // 0-100: Was the behavior safe?

  // === STATUS FLAGS ===
  review_status ReviewStatus @default(PENDING)
  audit_outcome AuditOutcome?

  // === RELATIONS ===
  proof_quality  ProofQualityCheck?
  drop_safety    DropSafetyCheck?
  detected_risks DetectedRisk[]
  human_feedback HumanFeedback[]

  // === RAW DATA (for debugging/reprocessing) ===
  raw_azure_response Json

  created_at  DateTime  @default(now())
  updated_at  DateTime  @updatedAt
  reviewed_at DateTime?
  reviewed_by String?
}

enum ReviewStatus {
  PENDING          // Awaiting dispatcher review
  FLAGGED          // AI flagged - needs human review
  APPROVED         // Human approved
  COACHING_FLAGGED // Marked for driver coaching
}

enum AuditOutcome {
  PASS
  FAIL
  OVERRIDE_APPROVED // AI was wrong (False Positive)
}

// === PROOF QUALITY LAYER ===
model ProofQualityCheck {
  id          String @id @default(uuid())
  delivery_id String @unique
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  // Checklist items (spec requirement)
  is_image_sharp        Boolean
  is_label_visible      Boolean
  is_package_visible    Boolean
  has_adequate_lighting Boolean

  blur_confidence    Float?   // From dense caption "blurry image of..."
  ocr_text_detected  String[] // From readResult
  label_bounding_box Json?    // {x, y, w, h} for label location
}

// === DROP SAFETY LAYER ===
model DropSafetyCheck {
  id          String @id @default(uuid())
  delivery_id String @unique
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  // Context indicators
  is_residential       Boolean // porch, door, mat detected
  is_sheltered         Boolean // garage, overhang detected
  is_near_road         Boolean // street, curb, vehicle detected
  has_weather_exposure Boolean // wet, rain, puddle detected
  has_theft_risk       Boolean // public area, no concealment

  // Package material vulnerability
  package_type    PackageType?
  material_porous Boolean @default(true) // cardboard = porous
}

enum PackageType {
  CARDBOARD
  PLASTIC_BAG
  POLY_MAILER
  UNKNOWN
}

// === DETECTED RISKS (with hover-to-discover support) ===
model DetectedRisk {
  id          String @id @default(uuid())
  delivery_id String
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  risk_type      RiskType
  severity       Severity
  severity_score Int // 0-100 contribution to drop_safety deduction

  // UI Display
  display_label    String // "Weather Hazard", "Theft Risk"
  explanation_text String // "Puddles detected adjacent to porous packaging"

  // === HOVER-TO-DISCOVER LINKAGE ===
  underlying_evidence Json // Array of evidence sources
  // Structure: [{
  //   source_type: "dense_caption" | "tag" | "ocr",
  //   source_text: "a puddle on the ground",
  //   confidence: 0.85,
  //   bounding_box: {x: 100, y: 200, w: 150, h: 80} // nullable for tags
  // }]

  // === RL FEEDBACK LOOP ===
  human_feedback_status FeedbackStatus @default(UNREVIEWED)
}

enum RiskType {
  WEATHER_HAZARD
  THEFT_RISK
  ROAD_PROXIMITY
  POOR_PLACEMENT
  PACKAGE_DAMAGE
}

enum Severity {
  LOW
  MEDIUM
  HIGH
  CRITICAL
}

enum FeedbackStatus {
  UNREVIEWED
  CONFIRMED  // Human agrees with AI
  OVERRIDDEN // Human disagrees (False Positive)
}

// === REINFORCEMENT LEARNING LOOP ===
model HumanFeedback {
  id          String @id @default(uuid())
  delivery_id String
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  action_taken  FeedbackAction
  dispatcher_id String

  // What was the original AI assessment?
  original_drop_score Int
  original_risks      Json // Snapshot of detected_risks at review time

  // Correction details
  correction_reason String? // Why did dispatcher override?

  // Training data export
  exported_for_training Boolean @default(false)

  created_at DateTime @default(now())
}

enum FeedbackAction {
  APPROVE_DELIVERY  // Scenario B: AI was wrong
  FLAG_FOR_COACHING // Scenario A: Driver messed up
  CUSTOMER_ALERT    // Scenario C: Notification sent
}

JSON Representation (API Response)
{
  "delivery_id": "uuid-123",
  "shipment_id": "SHP-456789",
  "image_url": "https://storage.blob/customer_123.jpg",
  "image_dimensions": { "width": 810, "height": 1080 },
  "scores": {
    "proof_quality": { "value": 78, "status": "PASS" },
    "drop_safety": { "value": 45, "status": "WARNING" }
  },
  "proof_quality_details": {
    "checklist": [
      { "item": "Image sharp", "passed": true },
      { "item": "Label visible", "passed": true },
      { "item": "Package visible", "passed": true },
      { "item": "Adequate lighting", "passed": false }
    ]
  },
  "detected_risks": [
    {
      "risk_id": "risk-001",
      "risk_type": "WEATHER_HAZARD",
      "severity": "HIGH",
      "severity_score": 35,
      "display_label": "Weather Hazard",
      "explanation": "Puddles and wet pavement detected adjacent to porous packaging.",
      "underlying_evidence": [
        {
          "source_type": "dense_caption",
          "source_text": "a puddle on the ground near boxes",
          "confidence": 0.82,
          "bounding_box": { "x": 150, "y": 600, "w": 200, "h": 100 }
        }
      ],
      "human_feedback_status": "UNREVIEWED"
    },
    {
      "risk_id": "risk-002",
      "risk_type": "THEFT_RISK",
      "severity": "MEDIUM",
      "severity_score": 20,
      "display_label": "Theft Risk",
      "explanation": "Package left in publicly visible location near roadway.",
      "underlying_evidence": [
        {
          "source_type": "tag",
          "source_text": "street",
          "confidence": 0.60,
          "bounding_box": null
        },
        {
          "source_type": "dense_caption",
          "source_text": "a white car on the road",
          "confidence": 0.77,
          "bounding_box": { "x": 626, "y": 97, "w": 181, "h": 104 }
        }
      ],
      "human_feedback_status": "UNREVIEWED"
    }
  ],
  "review_status": "FLAGGED",
  "audit_outcome": null
}

3. Decision Engine (Heuristics Construction)
Drop Safety Score Calculation (Pseudo-code)
def calculate_drop_safety_score(azure_result: dict) -> tuple[int, list[DetectedRisk]]:
    """
    Calculate Drop Safety Score (0-100) where 100 = perfectly safe.
    Returns (score, list of detected risks with evidence).
    """
    score = 100  # Start with perfect score, deduct for risks
    detected_risks = []

    # Extract data sources
    captions = azure_result.get('denseCaptionsResult', {}).get('values', [])
    tags = azure_result.get('tagsResult', {}).get('values', [])

    # Combine all text for keyword analysis
    caption_texts = [(c['text'].lower(), c['confidence'], c.get('boundingBox'))
                     for c in captions]
    tag_names = [(t['name'].lower(), t['confidence']) for t in tags]

    # === RULE 1: WEATHER HAZARDS ===
    weather_keywords = ['wet', 'rain', 'puddle', 'snow', 'water', 'flooded']
    weather_evidence = []
    for text, conf, bbox in caption_texts:
        if any(kw in text for kw in weather_keywords) and conf >= 0.60:
            weather_evidence.append({
                'source_type': 'dense_caption',
                'source_text': text,
                'confidence': conf,
                'bounding_box': bbox
            })

    if weather_evidence:
        # Check package material vulnerability
        is_porous = any('cardboard' in t[0] or 'carton' in t[0]
                        for t in tag_names if t[1] >= 0.70)

        # Calculate proximity (are weather hazard and package overlapping?)
        package_boxes = [c.get('boundingBox') for c in captions
                         if 'box' in c['text'].lower() and c.get('boundingBox')]
        weather_boxes = [e['bounding_box'] for e in weather_evidence if e['bounding_box']]
        proximity = calculate_proximity(package_boxes, weather_boxes)

        if proximity == 'TOUCHING':
            severity = 'CRITICAL'
            deduction = 50 if is_porous else 35
        elif proximity == 'ADJACENT':
            severity = 'HIGH'
            deduction = 35 if is_porous else 20
        else:
            severity = 'MEDIUM'
            deduction = 15

        score -= deduction
        detected_risks.append(DetectedRisk(
            risk_type='WEATHER_HAZARD',
            severity=severity,
            severity_score=deduction,
            display_label='Weather Hazard',
            explanation=f"{'Puddles/wet conditions' if 'puddle' in str(weather_evidence) else 'Water exposure'} "
                        f"detected {'in contact with' if proximity == 'TOUCHING' else 'near'} "
                        f"{'porous ' if is_porous else ''}packaging.",
            underlying_evidence=weather_evidence
        ))

    # === RULE 2: ROAD/THEFT RISK ===
    road_keywords = ['road', 'highway', 'street', 'curb', 'sidewalk', 'parking']
    safe_keywords = ['porch', 'door', 'mat', 'garage', 'shelter', 'entrance', 'building']
    road_evidence = []
    safe_evidence = []

    for text, conf, bbox in caption_texts:
        if any(kw in text for kw in road_keywords) and conf >= 0.60:
            road_evidence.append({
                'source_type': 'dense_caption', 'source_text': text,
                'confidence': conf, 'bounding_box': bbox
            })
        if any(kw in text for kw in safe_keywords) and conf >= 0.60:
            safe_evidence.append({
                'source_type': 'dense_caption', 'source_text': text,
                'confidence': conf, 'bounding_box': bbox
            })

    # Add tag-based evidence (no bounding boxes)
    for name, conf in tag_names:
        if name in road_keywords and conf >= 0.70:
            road_evidence.append({
                'source_type': 'tag', 'source_text': name,
                'confidence': conf, 'bounding_box': None
            })

    if road_evidence and not safe_evidence:
        severity = 'HIGH' if len(road_evidence) >= 2 else 'MEDIUM'
        deduction = 25 if severity == 'HIGH' else 15
        score -= deduction
        detected_risks.append(DetectedRisk(
            risk_type='THEFT_RISK',
            severity=severity,
            severity_score=deduction,
            display_label='Theft Risk',
            explanation="Package left in publicly visible location near roadway.",
            underlying_evidence=road_evidence
        ))
    elif safe_evidence:
        score += 10  # Bonus for safe location (final score capped at 100)

    # === RULE 3: VEHICLE PROXIMITY (enhanced theft/damage risk) ===
    vehicle_keywords = ['car', 'vehicle', 'truck', 'van']
    vehicle_evidence = [
        {'source_type': 'dense_caption', 'source_text': text,
         'confidence': conf, 'bounding_box': bbox}
        for text, conf, bbox in caption_texts
        if any(kw in text for kw in vehicle_keywords) and conf >= 0.70
    ]

    if vehicle_evidence:
        deduction = 10
        score -= deduction
        # Append to existing THEFT_RISK or create new
        existing = next((r for r in detected_risks if r.risk_type == 'THEFT_RISK'), None)
        if existing:
            existing.underlying_evidence.extend(vehicle_evidence)
            existing.severity_score += deduction
        else:
            detected_risks.append(DetectedRisk(
                risk_type='ROAD_PROXIMITY',
                severity='LOW',
                severity_score=deduction,
                display_label='Road Proximity',
                explanation="Vehicle detected near drop location.",
                underlying_evidence=vehicle_evidence
            ))

    return (max(0, min(100, score)), detected_risks)
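calculate_proximity relies on boxes_overlap and boxes_adjacent, which the pseudo-code leaves undefined. A minimal sketch, assuming the {x, y, w, h} box shape used throughout this section:

```python
def boxes_overlap(a: dict, b: dict) -> bool:
    """True when two {x, y, w, h} boxes intersect (axis-aligned test)."""
    return (a['x'] < b['x'] + b['w'] and b['x'] < a['x'] + a['w'] and
            a['y'] < b['y'] + b['h'] and b['y'] < a['y'] + a['h'])

def boxes_adjacent(a: dict, b: dict, threshold: int = 50) -> bool:
    """True when the boxes do not overlap but the edge-to-edge gap is within
    `threshold` pixels on both axes."""
    if boxes_overlap(a, b):
        return False
    gap_x = max(0, max(a['x'], b['x']) - min(a['x'] + a['w'], b['x'] + b['w']))
    gap_y = max(0, max(a['y'], b['y']) - min(a['y'] + a['h'], b['y'] + b['h']))
    return gap_x <= threshold and gap_y <= threshold
```

The 50-pixel default matches the threshold used by calculate_proximity below; in practice it should scale with image resolution.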
def calculate_proximity(box_list_a: list, box_list_b: list) -> str:
    """Calculate spatial relationship between two sets of bounding boxes."""
    # Check every pair for overlap first, so an ADJACENT pair found early
    # cannot shadow a TOUCHING pair found later.
    pairs = [(a, b) for a in box_list_a for b in box_list_b]
    if any(boxes_overlap(a, b) for a, b in pairs):
        return 'TOUCHING'
    if any(boxes_adjacent(a, b, threshold=50) for a, b in pairs):
        return 'ADJACENT'
    return 'DISTANT'

Heuristics Table
| Signal | Source | Confidence Threshold | Score Impact | Severity | Badge |
|---|---|---|---|---|---|
| puddle + cardboard box + TOUCHING | dense_caption | ≥0.60 | -50 | CRITICAL | 🔴 Weather Hazard |
| puddle + cardboard box + ADJACENT | dense_caption | ≥0.60 | -35 | HIGH | 🔴 Weather Hazard |
| wet/rain + any package | dense_caption | ≥0.60 | -15 | MEDIUM | 🟡 Weather Hazard |
| street/road + NO porch/door | tag + caption | ≥0.60 (caption), ≥0.70 (tag) | -25 | HIGH | 🔴 Theft Risk |
| car/vehicle near package | dense_caption | ≥0.70 | -10 | LOW | 🟡 Road Proximity |
| porch/door/mat detected | dense_caption | ≥0.60 | +10 | — | ✅ Safe Location |
| blurry image in caption | dense_caption | ≥0.65 | -20 (Proof Quality) | — | ⚠️ Poor Image |
| NO OCR text detected | readResult | — | -15 (Proof Quality) | — | ⚠️ Label Not Visible |
Warning vs. Failure Thresholds
| Score Range | Status | UI Treatment |
|---|---|---|
| 70-100 | ✅ PASS | Green badge, no modal trigger |
| 40-69 | ⚠️ WARNING | Amber badge, "Review Needed" status |
| 0-39 | ❌ FAIL | Red badge, immediate modal, high priority |
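The thresholds above map directly to a small status function shared by both scores; a minimal sketch:

```python
def score_status(score: int) -> str:
    """Map a 0-100 score to the UI status band defined in the thresholds table."""
    if score >= 70:
        return 'PASS'     # Green badge, no modal trigger
    if score >= 40:
        return 'WARNING'  # Amber badge, "Review Needed"
    return 'FAIL'         # Red badge, immediate modal, high priority
```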
4. Architecture & Data Flow
Mermaid Diagram
View in the Mermaid Live Editor:
flowchart TB
    subgraph "1. Image Ingestion"
        A[Driver Mobile App] -->|Upload VPOD Image| B[Azure Blob Storage]
        B -->|Event Trigger| C[Azure Event Grid]
    end
    subgraph "2. AI Processing Pipeline"
        C -->|Trigger| D[Azure Function: Vision Processor]
        D -->|API Call| E[Azure AI Vision 4.0]
        E -->|JSON Response| D
        D -->|Enrich & Score| F[Decision Engine<br/>Serverless Function]
    end
    subgraph "3. Decision Engine"
        F -->|Calculate| G[Proof Quality Score]
        F -->|Calculate| H[Drop Safety Score]
        F -->|Detect| I[Risk Evidence + Bounding Boxes]
        G & H & I -->|Structure| J[AnalyzedDelivery Record]
    end
    subgraph "4. Data Layer"
        J -->|Write| K[(PostgreSQL / Cosmos DB)]
        K -->|Read| L[GraphQL API / REST API]
    end
    subgraph "5. Frontend Experience"
        L -->|Fetch| M[Ops Vision UI]
        M -->|Split Pane| N[Left: Raw Image]
        M -->|Split Pane| O[Right: Risk Sidebar]
        O -->|Hover Event| P[Highlight Bounding Box on N]
    end
    subgraph "6. RL Feedback Loop"
        M -->|Dispatcher Action| Q{User Decision}
        Q -->|Approve| R[HumanFeedback: OVERRIDE]
        Q -->|Flag for Coaching| S[HumanFeedback: CONFIRMED]
        Q -->|Customer Alert| T[Trigger Email + Log]
        R & S -->|Write| K
        R -->|Export| U[Training Data Pipeline]
        U -->|Retrain| V[Custom Vision Model<br/>or Fine-tuned Prompt]
    end

Component Responsibilities
| Component | Technology | Responsibility |
|---|---|---|
| Vision Processor | Azure Function (Python) | Receive blob trigger, call Azure AI Vision API, parse response |
| Decision Engine | Azure Function (Python) | Apply heuristics, calculate scores, structure detected_risks with evidence |
| Database | PostgreSQL (Azure) or Cosmos DB | Store AnalyzedDelivery, DetectedRisk, HumanFeedback |
| API Layer | FastAPI / GraphQL (Azure App Service) | Expose delivery data to frontend, handle feedback writes |
| Frontend | React + TailwindCSS | Split-pane UI, hover-to-discover, action buttons |
| RL Training Pipeline | Azure ML / Custom Python | Export overridden records, retrain classification thresholds |
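One frontend detail worth pinning down: bounding boxes arrive in original-image pixels (see image_dimensions in the API response), so the hover overlay must scale them to the rendered pane before drawing. A sketch of that scaling, written in Python for illustration; the real code would live in the React component:

```python
def scale_box(box: dict, natural: tuple[int, int], rendered: tuple[int, int]) -> dict:
    """Scale an {x, y, w, h} box from the image's natural (pixel) size to its
    on-screen rendered size. Assumes the pane preserves aspect ratio, so the
    per-axis factors are normally equal."""
    sx = rendered[0] / natural[0]
    sy = rendered[1] / natural[1]
    return {
        'x': round(box['x'] * sx),
        'y': round(box['y'] * sy),
        'w': round(box['w'] * sx),
        'h': round(box['h'] * sy),
    }
```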
Reinforcement Learning Loop (Technical Flow)
1. User clicks "Approve Delivery" on a FLAGGED delivery
↓
2. Frontend POSTs to /api/feedback:
{
delivery_id: "uuid-123",
action: "APPROVE_DELIVERY",
correction_reason: "Shadow, not puddle"
}
↓
3. API writes HumanFeedback record:
- Snapshot original_risks and original_drop_score
- Set human_feedback_status = OVERRIDDEN on all DetectedRisk records
↓
4. Nightly batch job exports OVERRIDDEN records:
- Image URL + original Azure response + human correction
↓
5. Two retraining paths:
   a) **Threshold Adjustment**: Raise the confidence requirement for "puddle"
      if 30%+ of puddle detections are overridden (too many false positives)
   b) **Custom Vision Model**: Fine-tune on corrected examples for
      domain-specific hazards (e.g., "loading dock" vs "street")
   ↓
6. Deploy updated Decision Engine weights/thresholds

5. Gap Analysis
❌ Unsupported by Current Notebook
| Spec Requirement | Gap | Remediation |
|---|---|---|
| Proximity detection ("puddle ADJACENT to box") | Azure dense captions don't provide semantic relationships between objects | Implement calculate_proximity() using bounding box overlap/distance in Decision Engine |
| Wetness vs. Shadow distinction | Azure cannot distinguish reflective surfaces (wet pavement vs. shadow) | Requires Custom Vision model trained on logistics images, or LLM re-analysis of image |
| Tags with bounding boxes | tagsResult has NO bounding boxes (only name + confidence) | Use denseCaptionsResult as primary for Hover-to-Discover; tags only for scoring |
| Object Detection API | Notebook uses features=denseCaptions,read,tags but NOT objects | Add objects to API call for dedicated object detection with bounding boxes |
| Blur/sharpness quantification | No explicit blur metric, only "blurry image of..." in captions | Implement client-side image analysis (Laplacian variance) OR use Azure imageQuality feature |
| Material detection (porous vs. plastic) | Tags include "cardboard" but not material properties | Build keyword mapping: {cardboard, carton} → porous, {poly, plastic} → non-porous |
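The Laplacian-variance remediation can be sketched without an imaging library (a production version would use OpenCV's cv2.Laplacian on the grayscale frame); higher variance means more edge detail, i.e. a sharper photo:

```python
def laplacian_variance(gray: list[list[int]]) -> float:
    """Variance of the 4-neighbour Laplacian over a 2D grayscale image.
    Values near 0 indicate a blurry or featureless photo; the pass/fail
    cutoff must be tuned on real VPOD images."""
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete 4-neighbour Laplacian at (x, y)
            lap = (gray[y - 1][x] + gray[y + 1][x] +
                   gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```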
⚠️ Partially Supported
| Spec Requirement | Current State | Enhancement Needed |
|---|---|---|
| OCR for label visibility | readResult provides text + polygons | Add confidence threshold (≥0.80) and minimum text length check |
| Risk explanation copy | Notebook has basic findings text | Enhance Decision Engine to generate spec-compliant copy: "Puddles and wet pavement detected adjacent to porous packaging." |
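The OCR enhancement row can be sketched as follows. It assumes the Image Analysis 4.0 readResult shape, where confidence is reported per word rather than per line, so the check aggregates confidently-read words; the ≥0.80 threshold is the one proposed in the table, and the minimum character count is an illustrative default:

```python
def label_visible(read_result: dict, min_conf: float = 0.80, min_chars: int = 4) -> bool:
    """Label-visibility check: enough confidently-read characters in the OCR
    output. Assumes per-word confidence as in Image Analysis 4.0 readResult."""
    confident_text = ""
    for block in read_result.get("blocks", []):
        for line in block.get("lines", []):
            for word in line.get("words", []):
                if word.get("confidence", 0.0) >= min_conf:
                    confident_text += word["text"]
    return len(confident_text) >= min_chars
```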
✅ Fully Supported
| Spec Requirement | Azure AI Field |
|---|---|
| Dense scene descriptions | denseCaptionsResult.values[].text |
| Bounding box coordinates | denseCaptionsResult.values[].boundingBox |
| OCR text extraction | readResult.blocks[].lines[].text |
| Semantic tags | tagsResult.values[].name |
| Confidence filtering | All results include confidence |
Summary
The Azure AI Vision API provides a sufficient foundation for Ops Vision, but the Decision Engine must bridge significant gaps:

- Proximity logic must be computed geometrically from bounding boxes
- Tags lack spatial data; rely on dense captions for Hover-to-Discover
- Consider adding the objects feature to the API call for better object detection
- The RL loop is well-defined but requires disciplined feedback storage and batch export
- Material/wetness nuance is the hardest gap; it may require a custom model or LLM verification
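Closing the RL point above: the Threshold Adjustment retraining path reduces to a simple override-rate rule, where repeated dispatcher overrides make the engine more conservative by raising the keyword's confidence requirement. The 30% trigger, 0.05 step, and 0.95 ceiling are illustrative assumptions:

```python
def adjust_keyword_threshold(current: float, detections: int, overrides: int,
                             trigger_rate: float = 0.30, step: float = 0.05,
                             ceiling: float = 0.95) -> float:
    """Raise a keyword's confidence threshold when too many of its detections
    were overridden by dispatchers (i.e. judged false positives)."""
    if detections == 0:
        return current  # no evidence either way
    if overrides / detections >= trigger_rate:
        return min(ceiling, current + step)
    return current
```

A nightly job would run this per keyword over the exported OVERRIDDEN records, then redeploy the updated thresholds to the Decision Engine.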