
Shared Responsibility & Accountability Model

1. Introduction: Trust Through Clarity

In an enterprise platform like XOPS, trust is built on clear expectations. While we strive for perfection in our technology, we operate within a complex ecosystem of customer data, third-party integrations, and external services. It is critical for our Customer Success, Support, and Engineering teams to understand where our accountability ends and the customer's (or their vendor's) begins.

This document defines our Shared Responsibility Model. It guides how we triage issues, how we communicate with customers, and how we defend the integrity of our platform without accepting undue blame for external failures.

Core Principle: We are accountable for the functioning of the platform. The customer is accountable for the quality of the inputs and the health of their external dependencies.


2. The Accountability Matrix

For each domain below, the first item is the XOPS Responsibility (Our Accountability) and the second is the Customer Responsibility (Your Accountability).

  • Knowledge Graph
    • XOPS Responsibility: Ensuring the ingestion pipeline functions, schemas are enforced, and the graph database is available and performant.
    • Customer Responsibility: Ensuring source data accuracy, consistency, and cleanliness ("Garbage In, Garbage Out").
  • Integrations
    • XOPS Responsibility: Maintaining the integration code, handling API errors gracefully, and keeping up with vendor API changes.
    • Customer Responsibility: Providing valid API credentials/keys and maintaining valid contracts with third-party vendors (e.g., Twilio, Blue Dart).
  • Autonomous Engine
    • XOPS Responsibility: Executing logic correctly based on received telemetry.
    • Customer Responsibility: Ensuring the telemetry sent by their vendors is accurate and timely.
  • Security
    • XOPS Responsibility: Securing the platform infrastructure, code, and data at rest/transit.
    • Customer Responsibility: Managing their user access, role assignments, and API token secrecy.

3. Scenarios and Playbooks

Scenario A: Data Quality Corruption in Knowledge Graph

  • The Symptom: A customer reports that the "Knowledge Graph is broken" because query results show incorrect entity relationships (e.g., a "Parcel" is linked to the wrong "Customer").
  • The Triage:
    1. Check Ingestion Health: Did our ingestion pipeline throw errors? If yes, it's our defect.
    2. Inspect Source Payload: Look at the raw data received from the source system (Bronze Layer). Did the source send the wrong link? (A minimal triage sketch follows this scenario.)
  • The Response (If Source is Bad):
    • Do NOT: Create a P1 defect for Engineering. Do not accept blame.
    • DO: Show the customer the raw log entry proving the bad data came from their source. "Our platform correctly processed the data we received. Please check your source system configuration."
    • Outcome: Ticket closed as "External Data Issue".
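
A minimal sketch of this triage order, in Python, is below: check our ingestion errors first, then compare the raw Bronze Layer payload with what the customer expected. The function name, payload fields, and classification labels are hypothetical illustrations, not actual XOPS APIs.

```python
# Hypothetical triage sketch for Scenario A; not actual XOPS code.
from dataclasses import dataclass

@dataclass
class TriageResult:
    classification: str  # "PLATFORM_DEFECT", "EXTERNAL_DATA_ISSUE", or "NEEDS_INVESTIGATION"
    evidence: dict       # log entries or raw payload we would share with the customer

def triage_graph_link(entity_id: str,
                      ingestion_errors: list[dict],
                      bronze_payload: dict,
                      expected_customer_id: str) -> TriageResult:
    # Step 1: if our ingestion pipeline raised errors for this entity, it is our defect.
    our_errors = [e for e in ingestion_errors if e.get("entity_id") == entity_id]
    if our_errors:
        return TriageResult("PLATFORM_DEFECT", {"ingestion_errors": our_errors})

    # Step 2: compare the link in the raw source payload (Bronze Layer) with what the
    # customer expected. If the source already sent the wrong link, the platform
    # processed it faithfully and the issue is external.
    received = bronze_payload.get("customer_id")
    if received != expected_customer_id:
        return TriageResult("EXTERNAL_DATA_ISSUE",
                            {"raw_payload": bronze_payload,
                             "received_customer_id": received,
                             "expected_customer_id": expected_customer_id})

    # Neither condition held: keep investigating before assigning responsibility.
    return TriageResult("NEEDS_INVESTIGATION", {"raw_payload": bronze_payload})
```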

Scenario B: Integration Failure due to Invalid Credentials

  • The Symptom: An integration (e.g., ServiceNow) stops working. Dashboards show Red.
  • The Triage:
    1. Check Error Logs: Look for 401 Unauthorized or 403 Forbidden errors from the vendor API. (A classification sketch follows this scenario.)
  • The Response:
    • Do NOT: Treat this as a platform outage.
    • DO: Proactively notify the customer (potentially via Sparky). "We are receiving authentication errors from ServiceNow. It appears your API key has expired or changed. Please update it in the Control Center."
    • Outcome: Ticket waits on Customer Action.
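
The status-code check behind this triage can be sketched as below. The function name, ticket states, and message wording are assumptions for illustration, not an actual XOPS API.

```python
# Hypothetical classification sketch for Scenario B; not actual XOPS code.
def classify_integration_failure(vendor: str, status_code: int) -> dict:
    if status_code in (401, 403):
        # The vendor rejected our credentials: the customer owns the key and must rotate it.
        return {
            "ticket_state": "WAITING_ON_CUSTOMER",
            "customer_message": (
                f"We are receiving authentication errors from {vendor}. "
                "It appears your API key has expired or changed. "
                "Please update it in the Control Center."
            ),
        }
    if 500 <= status_code < 600:
        # The vendor's API itself is failing; this is not an XOPS platform outage either.
        return {
            "ticket_state": "EXTERNAL_VENDOR_ISSUE",
            "customer_message": f"{vendor} is currently returning server errors ({status_code}).",
        }
    # Anything else needs a closer look before responsibility is assigned.
    return {
        "ticket_state": "NEEDS_INVESTIGATION",
        "customer_message": f"Unexpected response ({status_code}) from {vendor}; we are investigating.",
    }
```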

Scenario C: Bad Telemetry from Third-Party (e.g., Twilio, Blue Dart)

  • The Symptom: The Autonomous Engine failed to trigger a "Delivery Delay" workflow because Blue Dart sent the "Delivered" event prematurely or with a wrong timestamp.
  • The Triage:
    1. Verify Logic: Did the Engine execute the rule correctly based on the timestamp it received? (A verification sketch follows this scenario.)
    2. Isolate Vendor: Confirm the incoming payload has the bad data.
  • The Response:
    • Do NOT: Attempt to debug Blue Dart's API. Do not contact Blue Dart on the customer's behalf (we have no contract with them).
    • DO: Provide the customer with the trace ID and payload. "The engine acted correctly based on the 'Delivered' event received at 10:00 AM. Please contact your Blue Dart account manager to investigate why this event was sent prematurely."
    • Outcome: Defend the platform logic. Redirect the customer to their vendor.
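
The logic check in this triage can be replayed mechanically, as in the sketch below: re-run the disputed rule against the exact telemetry we received, then hand that telemetry back to the customer as evidence. The field names, rule shape, and evidence format are hypothetical illustrations, not actual XOPS structures.

```python
# Hypothetical verification sketch for Scenario C; not actual XOPS code.
from datetime import datetime, timezone

def delay_workflow_should_fire(event: dict, promised_delivery: datetime) -> bool:
    # The disputed rule: fire "Delivery Delay" only if no on-time "Delivered"
    # event was received before the promised delivery time.
    event_time = datetime.fromisoformat(event["timestamp"])
    delivered_on_time = event["status"] == "Delivered" and event_time <= promised_delivery
    return not delivered_on_time

def build_customer_evidence(event: dict) -> dict:
    # Package the trace ID and raw payload so the customer can raise the issue
    # with their carrier; we do not debug the carrier's systems ourselves.
    return {"trace_id": event.get("trace_id"), "raw_payload": event}

# Example: the carrier reported "Delivered" at 10:00 UTC against an 18:00 UTC promise,
# so skipping the delay workflow was the correct behaviour for the data we received.
event = {"status": "Delivered", "timestamp": "2024-05-01T10:00:00+00:00", "trace_id": "abc-123"}
should_fire = delay_workflow_should_fire(
    event, promised_delivery=datetime(2024, 5, 1, 18, 0, tzinfo=timezone.utc)
)
assert should_fire is False
```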

4. Support Team Guidelines: "Defend & Guide"

Our support team is trained to Defend the platform's integrity while Guiding the customer to a solution.

  1. Don't Rush to "Defect": Never label a ticket as a "Platform Bug" until you have ruled out configuration and data issues.
  2. Evidence is Key: Always provide logs or traces (from New Relic/Sentry) that show the input causing the issue.
  3. Boundary Management: Politely but firmly decline requests to debug external systems we do not own. "We cannot see inside your Twilio account, but here is the error Twilio sent us."

5. Engineering Implications

  • Defensive Coding: Our systems must be resilient to bad inputs. We must validate data at the door.
  • Clear Error Messages: Error messages must clearly distinguish between "Internal Platform Error" (500) and "Bad Request/Upstream Error" (400/502).
  • Observability: We must log raw payloads at the Bronze layer to prove "what we received" in case of disputes. (A combined sketch of these three practices follows below.)
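
A single ingestion handler can illustrate all three implications: log the raw payload first so we can always prove what we received, validate at the door, and raise a distinct "bad input" error that maps to a 400/502-style response rather than a 500. The handler, field names, and logger below are hypothetical sketches, not actual XOPS code.

```python
# Hypothetical ingestion sketch combining the three implications above; not actual XOPS code.
import json
import logging

bronze_log = logging.getLogger("bronze_layer")
REQUIRED_FIELDS = ("parcel_id", "customer_id", "event_type", "timestamp")

class UpstreamDataError(ValueError):
    """Bad or malformed input from the source; surfaced as 400/502, never as a 500."""

def ingest_event(raw_body: bytes, source: str) -> dict:
    # 1. Observability: persist exactly what we received before touching it.
    bronze_log.info("received payload from %s: %s",
                    source, raw_body.decode("utf-8", errors="replace"))

    # 2. Defensive coding: validate at the door and reject bad inputs explicitly.
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise UpstreamDataError(f"Payload from {source} is not valid JSON: {exc}") from exc

    missing = [field for field in REQUIRED_FIELDS if field not in event]
    if missing:
        raise UpstreamDataError(f"Payload from {source} is missing fields: {missing}")

    # 3. Anything that fails past this point is an internal platform error (500)
    #    and should never be blamed on the source.
    return event
```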