
7. System Architecture & Technical Considerations

System Architecture Analysis

1. Data Capability Mapping

| UI Element | Azure AI Field | Structure | Hover-to-Discover Support |
| --- | --- | --- | --- |
| Proof Quality Score | denseCaptionsResult (blur detection), metadata (image dimensions) | Caption text + confidence | ⚠️ Indirect - requires derived heuristics |
| Drop Safety Score | denseCaptionsResult + tagsResult | Combined analysis | ✅ Dense captions have boundingBox |
| Risk Badges (Weather/Theft) | denseCaptionsResult.values[].text | {text, confidence, boundingBox: {x,y,w,h}} | ✅ Yes - bounding box per caption |
| "Why" Engine Text | denseCaptionsResult.values[].text | Natural language descriptions | ✅ Direct |
| Label Visibility Check | readResult.blocks[].lines[] | {text, boundingPolygon: [{x,y},...]} | ✅ OCR polygons available |
| Visual Tags Toggle | tagsResult.values[] | {name, confidence} | ❌ No bounding boxes on tags |

Hover-to-Discover Image Analysis Implementation

The critical linkage for "Hover-to-Discover" works as follows:

UI Risk Text ("Weather Hazard")
→ detected_risks[].risk_id
→ detected_risks[].underlying_evidence[].bounding_box
→ Frontend draws overlay on Left Pane image

Key insight: denseCaptionsResult provides bounding boxes ({x, y, w, h}) for each caption. The Decision Engine must:

  1. Parse caption text for risk keywords (e.g., "puddle", "wet pavement")

  2. Store the associated boundingBox as underlying_evidence

  3. Frontend retrieves bounding box by risk_id on hover
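Because Azure returns bounding boxes in original-image pixel coordinates, step 3 also requires scaling each box to the rendered Left Pane before drawing the overlay. A minimal sketch of that scaling (the function name and display dimensions are illustrative, not part of the spec):

```python
def scale_box(box: dict, image_w: int, image_h: int,
              display_w: int, display_h: int) -> dict:
    """Map a bounding box from original-image pixels to display pixels.

    `box` uses Azure's {x, y, w, h} shape; the scale factors come from
    the stored image dimensions vs. the rendered pane size.
    """
    sx = display_w / image_w
    sy = display_h / image_h
    return {
        "x": round(box["x"] * sx),
        "y": round(box["y"] * sy),
        "w": round(box["w"] * sx),
        "h": round(box["h"] * sy),
    }

# Example: the 810x1080 source image rendered in a 405x540 pane
overlay = scale_box({"x": 150, "y": 600, "w": 200, "h": 100}, 810, 1080, 405, 540)
# overlay == {"x": 75, "y": 300, "w": 100, "h": 50}
```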


2. Data Model Design (Schema)

Prisma/SQL Schema

// Core Delivery Analysis Record
model AnalyzedDelivery {
  id           String @id @default(uuid())
  shipment_id  String @unique
  image_url    String
  image_width  Int
  image_height Int

  // === DUAL SCORE SYSTEM ===
  proof_quality_score Int // 0-100: Is the photo good?
  drop_safety_score   Int // 0-100: Was the behavior safe?

  // === STATUS FLAGS ===
  review_status ReviewStatus @default(PENDING)
  audit_outcome AuditOutcome?

  // === RELATIONS ===
  proof_quality  ProofQualityCheck?
  drop_safety    DropSafetyCheck?
  detected_risks DetectedRisk[]
  human_feedback HumanFeedback[]

  // === RAW DATA (for debugging/reprocessing) ===
  raw_azure_response Json

  created_at  DateTime @default(now())
  updated_at  DateTime @updatedAt
  reviewed_at DateTime?
  reviewed_by String?
}

enum ReviewStatus {
  PENDING          // Awaiting dispatcher review
  FLAGGED          // AI flagged - needs human review
  APPROVED         // Human approved
  COACHING_FLAGGED // Marked for driver coaching
}

enum AuditOutcome {
  PASS
  FAIL
  OVERRIDE_APPROVED // AI was wrong (False Positive)
}

// === PROOF QUALITY LAYER ===
model ProofQualityCheck {
  id          String @id @default(uuid())
  delivery_id String @unique
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  // Checklist items (spec requirement)
  is_image_sharp        Boolean
  is_label_visible      Boolean
  is_package_visible    Boolean
  has_adequate_lighting Boolean

  blur_confidence    Float?   // From dense caption "blurry image of..."
  ocr_text_detected  String[] // From readResult
  label_bounding_box Json?    // {x, y, w, h} for label location
}

// === DROP SAFETY LAYER ===
model DropSafetyCheck {
  id          String @id @default(uuid())
  delivery_id String @unique
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  // Context indicators
  is_residential       Boolean // porch, door, mat detected
  is_sheltered         Boolean // garage, overhang detected
  is_near_road         Boolean // street, curb, vehicle detected
  has_weather_exposure Boolean // wet, rain, puddle detected
  has_theft_risk       Boolean // public area, no concealment

  // Package material vulnerability
  package_type    PackageType?
  material_porous Boolean @default(true) // cardboard = porous
}

enum PackageType {
  CARDBOARD
  PLASTIC_BAG
  POLY_MAILER
  UNKNOWN
}

// === DETECTED RISKS (with hover-to-discover support) ===
model DetectedRisk {
  id          String @id @default(uuid())
  delivery_id String
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  risk_type      RiskType
  severity       Severity
  severity_score Int // 0-100 contribution to drop_safety deduction

  // UI Display
  display_label    String // "Weather Hazard", "Theft Risk"
  explanation_text String // "Puddles detected adjacent to porous packaging"

  // === HOVER-TO-DISCOVER LINKAGE ===
  underlying_evidence Json // Array of evidence sources
  // Structure: [{
  //   source_type: "dense_caption" | "tag" | "ocr",
  //   source_text: "a puddle on the ground",
  //   confidence: 0.85,
  //   bounding_box: {x: 100, y: 200, w: 150, h: 80} // nullable for tags
  // }]

  // === RL FEEDBACK LOOP ===
  human_feedback_status FeedbackStatus @default(UNREVIEWED)
}

enum RiskType {
  WEATHER_HAZARD
  THEFT_RISK
  ROAD_PROXIMITY
  POOR_PLACEMENT
  PACKAGE_DAMAGE
}

enum Severity {
  LOW
  MEDIUM
  HIGH
  CRITICAL
}

enum FeedbackStatus {
  UNREVIEWED
  CONFIRMED  // Human agrees with AI
  OVERRIDDEN // Human disagrees (False Positive)
}

// === REINFORCEMENT LEARNING LOOP ===
model HumanFeedback {
  id          String @id @default(uuid())
  delivery_id String
  delivery    AnalyzedDelivery @relation(fields: [delivery_id], references: [id])

  action_taken  FeedbackAction
  dispatcher_id String

  // What was the original AI assessment?
  original_drop_score Int
  original_risks      Json // Snapshot of detected_risks at review time

  // Correction details
  correction_reason String? // Why did dispatcher override?

  // Training data export
  exported_for_training Boolean @default(false)

  created_at DateTime @default(now())
}

enum FeedbackAction {
  APPROVE_DELIVERY  // Scenario B: AI was wrong
  FLAG_FOR_COACHING // Scenario A: Driver messed up
  CUSTOMER_ALERT    // Scenario C: Notification sent
}
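The package_type and material_porous fields can be populated from tagsResult names with a small keyword map. A sketch of that derivation (the tag vocabulary below is an assumption; extend it from real Azure tag output):

```python
def classify_material(tag_names: list[str]) -> tuple[str, bool]:
    """Return (package_type, material_porous) for the DropSafetyCheck record.

    Keyword sets are assumed starting points, not an exhaustive mapping.
    """
    names = {n.lower() for n in tag_names}
    if names & {"cardboard", "carton"}:
        return ("CARDBOARD", True)
    if names & {"plastic bag", "plastic"}:
        return ("PLASTIC_BAG", False)
    if names & {"poly", "poly mailer"}:
        return ("POLY_MAILER", False)
    # Default matches the schema: unknown packages are treated as porous
    return ("UNKNOWN", True)
```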

JSON Representation (API Response)

{
  "delivery_id": "uuid-123",
  "shipment_id": "SHP-456789",
  "image_url": "https://storage.blob/customer_123.jpg",
  "image_dimensions": { "width": 810, "height": 1080 },

  "scores": {
    "proof_quality": { "value": 78, "status": "PASS" },
    "drop_safety": { "value": 45, "status": "FAIL" }
  },

  "proof_quality_details": {
    "checklist": [
      { "item": "Image sharp", "passed": true },
      { "item": "Label visible", "passed": true },
      { "item": "Package visible", "passed": true },
      { "item": "Adequate lighting", "passed": false }
    ]
  },

  "detected_risks": [
    {
      "risk_id": "risk-001",
      "risk_type": "WEATHER_HAZARD",
      "severity": "HIGH",
      "severity_score": 35,
      "display_label": "Weather Hazard",
      "explanation": "Puddles and wet pavement detected adjacent to porous packaging.",
      "underlying_evidence": [
        {
          "source_type": "dense_caption",
          "source_text": "a puddle on the ground near boxes",
          "confidence": 0.82,
          "bounding_box": { "x": 150, "y": 600, "w": 200, "h": 100 }
        }
      ],
      "human_feedback_status": "UNREVIEWED"
    },
    {
      "risk_id": "risk-002",
      "risk_type": "THEFT_RISK",
      "severity": "MEDIUM",
      "severity_score": 20,
      "display_label": "Theft Risk",
      "explanation": "Package left in publicly visible location near roadway.",
      "underlying_evidence": [
        {
          "source_type": "tag",
          "source_text": "street",
          "confidence": 0.60,
          "bounding_box": null
        },
        {
          "source_type": "dense_caption",
          "source_text": "a white car on the road",
          "confidence": 0.77,
          "bounding_box": { "x": 626, "y": 97, "w": 181, "h": 104 }
        }
      ],
      "human_feedback_status": "UNREVIEWED"
    }
  ],

  "review_status": "FLAGGED",
  "audit_outcome": null
}
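Given this response shape, the frontend's hover lookup reduces to indexing evidence by risk_id. A minimal sketch (the dict below follows the shape above; the function name is illustrative):

```python
def index_evidence_by_risk(response: dict) -> dict:
    """Map risk_id -> list of drawable bounding boxes.

    Tag-based evidence has bounding_box = null, so it is filtered out;
    only evidence with coordinates can be highlighted on hover.
    """
    return {
        risk["risk_id"]: [
            ev["bounding_box"]
            for ev in risk["underlying_evidence"]
            if ev["bounding_box"] is not None
        ]
        for risk in response["detected_risks"]
    }

response = {
    "detected_risks": [
        {"risk_id": "risk-001", "underlying_evidence": [
            {"source_type": "dense_caption",
             "bounding_box": {"x": 150, "y": 600, "w": 200, "h": 100}},
        ]},
        {"risk_id": "risk-002", "underlying_evidence": [
            {"source_type": "tag", "bounding_box": None},
        ]},
    ]
}
boxes = index_evidence_by_risk(response)
# boxes["risk-001"] holds one drawable box; boxes["risk-002"] is empty
```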

3. Decision Engine (Heuristics Construction)

Drop Safety Score Calculation (Pseudo-code)

def calculate_drop_safety_score(azure_result: dict) -> tuple[int, list[DetectedRisk]]:
    """
    Calculate Drop Safety Score (0-100) where 100 = perfectly safe.
    Returns (score, list of detected risks with evidence).
    """
    score = 100  # Start with perfect score, deduct for risks
    detected_risks = []

    # Extract data sources
    captions = azure_result.get('denseCaptionsResult', {}).get('values', [])
    tags = azure_result.get('tagsResult', {}).get('values', [])

    # Combine all text for keyword analysis
    caption_texts = [(c['text'].lower(), c['confidence'], c.get('boundingBox'))
                     for c in captions]
    tag_names = [(t['name'].lower(), t['confidence']) for t in tags]

    # === RULE 1: WEATHER HAZARDS ===
    weather_keywords = ['wet', 'rain', 'puddle', 'snow', 'water', 'flooded']
    weather_evidence = []

    for text, conf, bbox in caption_texts:
        if any(kw in text for kw in weather_keywords) and conf >= 0.60:
            weather_evidence.append({
                'source_type': 'dense_caption',
                'source_text': text,
                'confidence': conf,
                'bounding_box': bbox
            })

    if weather_evidence:
        # Check package material vulnerability
        is_porous = any('cardboard' in t[0] or 'carton' in t[0]
                        for t in tag_names if t[1] >= 0.70)

        # Calculate proximity (are weather hazard and package overlapping?)
        package_boxes = [c.get('boundingBox') for c in captions
                         if 'box' in c['text'].lower() and c.get('boundingBox')]
        weather_boxes = [e['bounding_box'] for e in weather_evidence if e['bounding_box']]

        proximity = calculate_proximity(package_boxes, weather_boxes)

        if proximity == 'TOUCHING':
            severity = 'CRITICAL'
            deduction = 50 if is_porous else 35
        elif proximity == 'ADJACENT':
            severity = 'HIGH'
            deduction = 35 if is_porous else 20
        else:
            severity = 'MEDIUM'
            deduction = 15

        score -= deduction
        detected_risks.append(DetectedRisk(
            risk_type='WEATHER_HAZARD',
            severity=severity,
            severity_score=deduction,
            display_label='Weather Hazard',
            explanation=f"{'Puddles/wet conditions' if 'puddle' in str(weather_evidence) else 'Water exposure'} "
                        f"detected {'in contact with' if proximity == 'TOUCHING' else 'near'} "
                        f"{'porous ' if is_porous else ''}packaging.",
            underlying_evidence=weather_evidence
        ))

    # === RULE 2: ROAD/THEFT RISK ===
    road_keywords = ['road', 'highway', 'street', 'curb', 'sidewalk', 'parking']
    safe_keywords = ['porch', 'door', 'mat', 'garage', 'shelter', 'entrance', 'building']

    road_evidence = []
    safe_evidence = []

    for text, conf, bbox in caption_texts:
        if any(kw in text for kw in road_keywords) and conf >= 0.60:
            road_evidence.append({
                'source_type': 'dense_caption', 'source_text': text,
                'confidence': conf, 'bounding_box': bbox
            })
        if any(kw in text for kw in safe_keywords) and conf >= 0.60:
            safe_evidence.append({
                'source_type': 'dense_caption', 'source_text': text,
                'confidence': conf, 'bounding_box': bbox
            })

    # Add tag-based evidence (no bounding boxes)
    for name, conf in tag_names:
        if name in road_keywords and conf >= 0.70:
            road_evidence.append({
                'source_type': 'tag', 'source_text': name,
                'confidence': conf, 'bounding_box': None
            })

    if road_evidence and not safe_evidence:
        severity = 'HIGH' if len(road_evidence) >= 2 else 'MEDIUM'
        deduction = 25 if severity == 'HIGH' else 15
        score -= deduction
        detected_risks.append(DetectedRisk(
            risk_type='THEFT_RISK',
            severity=severity,
            severity_score=deduction,
            display_label='Theft Risk',
            explanation="Package left in publicly visible location near roadway.",
            underlying_evidence=road_evidence
        ))
    elif safe_evidence:
        score += 10  # Bonus for safe location (final clamp caps the score at 100)

    # === RULE 3: VEHICLE PROXIMITY (enhanced theft/damage risk) ===
    vehicle_keywords = ['car', 'vehicle', 'truck', 'van']
    vehicle_evidence = [
        {'source_type': 'dense_caption', 'source_text': text,
         'confidence': conf, 'bounding_box': bbox}
        for text, conf, bbox in caption_texts
        if any(kw in text for kw in vehicle_keywords) and conf >= 0.70
    ]

    if vehicle_evidence:
        deduction = 10
        score -= deduction
        # Append to existing THEFT_RISK or create new
        existing = next((r for r in detected_risks if r.risk_type == 'THEFT_RISK'), None)
        if existing:
            existing.underlying_evidence.extend(vehicle_evidence)
            existing.severity_score += deduction
        else:
            detected_risks.append(DetectedRisk(
                risk_type='ROAD_PROXIMITY',
                severity='LOW',
                severity_score=deduction,
                display_label='Road Proximity',
                explanation="Vehicle detected near drop location.",
                underlying_evidence=vehicle_evidence
            ))

    return (max(0, min(100, score)), detected_risks)


def calculate_proximity(box_list_a: list, box_list_b: list) -> str:
    """Calculate spatial relationship between two sets of bounding boxes."""
    found_adjacent = False
    for box_a in box_list_a:
        for box_b in box_list_b:
            if boxes_overlap(box_a, box_b):
                return 'TOUCHING'  # Overlap outranks adjacency, so keep scanning pairs
            if boxes_adjacent(box_a, box_b, threshold=50):
                found_adjacent = True
    return 'ADJACENT' if found_adjacent else 'DISTANT'
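boxes_overlap and boxes_adjacent are referenced above but not defined. One possible implementation over Azure's {x, y, w, h} boxes (the 50 px threshold is an assumption to tune against real images):

```python
def boxes_overlap(a: dict, b: dict) -> bool:
    """True if the two {x, y, w, h} boxes intersect."""
    return (a['x'] < b['x'] + b['w'] and b['x'] < a['x'] + a['w'] and
            a['y'] < b['y'] + b['h'] and b['y'] < a['y'] + a['h'])

def boxes_adjacent(a: dict, b: dict, threshold: int = 50) -> bool:
    """True if the boxes are within `threshold` pixels of each other.

    Measures the gap between the boxes along each axis; a zero gap on
    an axis means they already line up on that axis.
    """
    gap_x = max(0, max(a['x'], b['x']) - min(a['x'] + a['w'], b['x'] + b['w']))
    gap_y = max(0, max(a['y'], b['y']) - min(a['y'] + a['h'], b['y'] + b['h']))
    return max(gap_x, gap_y) <= threshold

# Example: a puddle box 30px below a package box is ADJACENT, not TOUCHING
package = {'x': 100, 'y': 500, 'w': 200, 'h': 150}
puddle = {'x': 120, 'y': 680, 'w': 150, 'h': 60}
```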

 

Heuristics Table

| Signal | Source | Confidence Threshold | Score Impact | Severity | Badge |
| --- | --- | --- | --- | --- | --- |
| puddle + cardboard box + TOUCHING | dense_caption | ≥0.60 | -50 | CRITICAL | 🔴 Weather Hazard |
| puddle + cardboard box + ADJACENT | dense_caption | ≥0.60 | -35 | HIGH | 🔴 Weather Hazard |
| wet/rain + any package | dense_caption | ≥0.60 | -15 | MEDIUM | 🟡 Weather Hazard |
| street/road + NO porch/door | tag + caption | ≥0.60 (caption), ≥0.70 (tag) | -25 | HIGH | 🔴 Theft Risk |
| car/vehicle near package | dense_caption | ≥0.70 | -10 | LOW | 🟡 Road Proximity |
| porch/door/mat detected | dense_caption | ≥0.60 | +10 | n/a | ✅ Safe Location |
| blurry image in caption | dense_caption | ≥0.65 | -20 (Proof Quality) | n/a | ⚠️ Poor Image |
| NO OCR text detected | readResult | n/a | -15 (Proof Quality) | n/a | ⚠️ Label Not Visible |

Warning vs. Failure Thresholds

| Score Range | Status | UI Treatment |
| --- | --- | --- |
| 70-100 | ✅ PASS | Green badge, no modal trigger |
| 40-69 | ⚠️ WARNING | Amber badge, "Review Needed" status |
| 0-39 | ❌ FAIL | Red badge, immediate modal, high priority |
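The band boundaries above map directly to a small helper (status names follow the table; the function name is illustrative):

```python
def status_for_score(score: int) -> str:
    """Map a 0-100 score to the PASS / WARNING / FAIL bands."""
    if score >= 70:
        return "PASS"
    if score >= 40:
        return "WARNING"
    return "FAIL"

# status_for_score(78) -> "PASS"; status_for_score(35) -> "FAIL"
```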

 

4. Architecture & Data Flow

Mermaid Diagram

To preview the diagram, paste the source below into the Mermaid Live Editor.

flowchart TB
    subgraph "1. Image Ingestion"
        A[Driver Mobile App] -->|Upload VPOD Image| B[Azure Blob Storage]
        B -->|Event Trigger| C[Azure Event Grid]
    end

    subgraph "2. AI Processing Pipeline"
        C -->|Trigger| D[Azure Function: Vision Processor]
        D -->|API Call| E[Azure AI Vision 4.0]
        E -->|JSON Response| D
        D -->|Enrich & Score| F[Decision Engine<br/>Serverless Function]
    end

    subgraph "3. Decision Engine"
        F -->|Calculate| G[Proof Quality Score]
        F -->|Calculate| H[Drop Safety Score]
        F -->|Detect| I[Risk Evidence + Bounding Boxes]
        G & H & I -->|Structure| J[AnalyzedDelivery Record]
    end

    subgraph "4. Data Layer"
        J -->|Write| K[(PostgreSQL / Cosmos DB)]
        K -->|Read| L[GraphQL API / REST API]
    end

    subgraph "5. Frontend Experience"
        L -->|Fetch| M[Ops Vision UI]
        M -->|Split Pane| N[Left: Raw Image]
        M -->|Split Pane| O[Right: Risk Sidebar]
        O -->|Hover Event| P[Highlight Bounding Box on N]
    end

    subgraph "6. RL Feedback Loop"
        M -->|Dispatcher Action| Q{User Decision}
        Q -->|Approve| R[HumanFeedback: OVERRIDE]
        Q -->|Flag for Coaching| S[HumanFeedback: CONFIRMED]
        Q -->|Customer Alert| T[Trigger Email + Log]
        R & S -->|Write| K
        R -->|Export| U[Training Data Pipeline]
        U -->|Retrain| V[Custom Vision Model<br/>or Fine-tuned Prompt]
    end

 

Component Responsibilities

| Component | Technology | Responsibility |
| --- | --- | --- |
| Vision Processor | Azure Function (Python) | Receive blob trigger, call Azure AI Vision API, parse response |
| Decision Engine | Azure Function (Python) | Apply heuristics, calculate scores, structure detected_risks with evidence |
| Database | PostgreSQL (Azure) or Cosmos DB | Store AnalyzedDelivery, DetectedRisk, HumanFeedback |
| API Layer | FastAPI / GraphQL (Azure App Service) | Expose delivery data to frontend, handle feedback writes |
| Frontend | React + TailwindCSS | Split-pane UI, hover-to-discover, action buttons |
| RL Training Pipeline | Azure ML / Custom Python | Export overridden records, retrain classification thresholds |

Reinforcement Learning Loop (Technical Flow)

1. User clicks "Approve Delivery" on a FLAGGED delivery.

2. Frontend POSTs to /api/feedback:
   {
     delivery_id: "uuid-123",
     action: "APPROVE_DELIVERY",
     correction_reason: "Shadow, not puddle"
   }

3. API writes a HumanFeedback record:
   - Snapshot original_risks and original_drop_score
   - Set human_feedback_status = OVERRIDDEN on all DetectedRisk records

4. Nightly batch job exports OVERRIDDEN records:
   - Image URL + original Azure response + human correction

5. Two retraining paths:
   a) **Threshold Adjustment**: Raise the confidence requirement for "puddle"
      if 30%+ of puddle detections are overridden
   b) **Custom Vision Model**: Fine-tune on corrected examples for
      domain-specific hazards (e.g., "loading dock" vs "street")

6. Deploy updated Decision Engine weights/thresholds.
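Path 5a can be sketched as a per-keyword override-rate computation (the 30% trigger follows the flow above; the 0.05 step and 0.95 ceiling are assumed tuning knobs):

```python
def adjust_threshold(current: float, detections: list[dict],
                     override_rate_trigger: float = 0.30,
                     step: float = 0.05, ceiling: float = 0.95) -> float:
    """Raise a keyword's confidence threshold when too many of its
    detections were overridden by dispatchers (false positives).

    `detections` is a list of {"overridden": bool} records for one
    keyword (e.g. "puddle") from the nightly export.
    """
    if not detections:
        return current
    override_rate = sum(d["overridden"] for d in detections) / len(detections)
    if override_rate >= override_rate_trigger:
        return min(ceiling, current + step)
    return current

# 4 of 10 puddle detections overridden (40% >= 30%) -> 0.60 rises to 0.65
history = [{"overridden": True}] * 4 + [{"overridden": False}] * 6
new_threshold = adjust_threshold(0.60, history)
```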

5. Gap Analysis

❌ Unsupported by Current Notebook

| Spec Requirement | Gap | Remediation |
| --- | --- | --- |
| Proximity detection ("puddle ADJACENT to box") | Azure dense captions don't provide semantic relationships between objects | Implement calculate_proximity() using bounding box overlap/distance in the Decision Engine |
| Wetness vs. shadow distinction | Azure cannot distinguish reflective surfaces (wet pavement vs. shadow) | Requires a Custom Vision model trained on logistics images, or LLM re-analysis of the image |
| Tags with bounding boxes | tagsResult has NO bounding boxes (only name + confidence) | Use denseCaptionsResult as primary for Hover-to-Discover; tags only for scoring |
| Object Detection API | Notebook uses features=denseCaptions,read,tags but NOT objects | Add objects to the API call for dedicated object detection with bounding boxes |
| Blur/sharpness quantification | No explicit blur metric; only "blurry image of..." in captions | Implement client-side image analysis (Laplacian variance) OR use the Azure imageQuality feature |
| Material detection (porous vs. plastic) | Tags include "cardboard" but not material properties | Build keyword mapping: {cardboard, carton} → porous, {poly, plastic} → non-porous |
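The Laplacian-variance remediation can be prototyped without OpenCV in a few lines of NumPy (the 100.0 sharpness cutoff is an assumed starting point to calibrate on labeled VPOD images):

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the discrete Laplacian of a grayscale image.

    Sharp images have strong edges and score high; blurry images score
    low. Equivalent in spirit to cv2.Laplacian(img, cv2.CV_64F).var().
    """
    img = gray.astype(np.float64)
    lap = (img[:-2, 1:-1] + img[2:, 1:-1] +
           img[1:-1, :-2] + img[1:-1, 2:] -
           4.0 * img[1:-1, 1:-1])
    return float(lap.var())

def is_image_sharp(gray: np.ndarray, cutoff: float = 100.0) -> bool:
    # Assumed cutoff; calibrate against labeled sharp/blurry photos
    return laplacian_variance(gray) >= cutoff

# A hard-edged checkerboard scores far higher than a flat gradient
checker = np.indices((64, 64)).sum(axis=0) % 2 * 255
gradient = np.tile(np.linspace(0, 255, 64), (64, 1))
```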

 

⚠️ Partially Supported

| Spec Requirement | Current State | Enhancement Needed |
| --- | --- | --- |
| OCR for label visibility | readResult provides text + polygons | Add a confidence threshold (≥0.80) and a minimum text length check |
| Risk explanation copy | Notebook has basic findings text | Enhance the Decision Engine to generate spec-compliant copy: "Puddles and wet pavement detected adjacent to porous packaging." |

✅ Fully Supported

| Spec Requirement | Azure AI Field |
| --- | --- |
| Dense scene descriptions | denseCaptionsResult.values[].text |
| Bounding box coordinates | denseCaptionsResult.values[].boundingBox |
| OCR text extraction | readResult.blocks[].lines[].text |
| Semantic tags | tagsResult.values[].name |
| Confidence filtering | All results include confidence |

Summary

The Azure AI Vision API provides sufficient foundation for Ops Vision, but the Decision Engine must bridge significant gaps:

  1. Proximity logic must be computed geometrically from bounding boxes

  2. Tags lack spatial data—rely on dense captions for Hover-to-Discover

  3. Consider adding objects feature to API call for better object detection

  4. RL loop is well-defined but requires disciplined feedback storage and batch export

  5. Material/wetness nuance is the hardest gap—may require custom model or LLM verification
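Recommendation 3 amounts to one change in the analyze request. A hedged sketch of building that request URL for the Azure AI Vision Image Analysis 4.0 REST endpoint (the resource host is a placeholder; verify the api-version against current Azure documentation):

```python
import urllib.parse

def build_analyze_url(endpoint: str, features: list[str],
                      api_version: str = "2023-10-01") -> str:
    """Build the Image Analysis 4.0 analyze URL with the given features."""
    query = urllib.parse.urlencode({
        "api-version": api_version,
        "features": ",".join(features),
    })
    return f"{endpoint}/computervision/imageanalysis:analyze?{query}"

# Add "objects" alongside the features the notebook already requests
url = build_analyze_url(
    "https://YOUR-RESOURCE.cognitiveservices.azure.com",
    ["denseCaptions", "read", "tags", "objects"],
)
# POST the image bytes to `url` with an Ocp-Apim-Subscription-Key header
```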