
[!FROZEN] MPLP Protocol v1.0.0 Frozen Specification. Freeze Date: 2025-12-03. Status: FROZEN (no breaking changes permitted). Governance: MPLP Protocol Governance Committee (MPGC). License: Apache-2.0. Note: Any normative change requires a new protocol version.

Vendor-Neutral LLM Integration: Complete Guide

Related Golden Flows: flow-04-single-agent-llm-enrichment

1. Design Philosophy

MPLP achieves vendor neutrality by defining an abstract LlmClient interface that isolates the protocol logic from specific LLM provider APIs. This design ensures:

  • Portability: The same agent code runs with OpenAI, Anthropic, local Llama models, or any OpenAI-compatible API (see the sketch after this list).
  • Testability: Mock LLM clients can be injected for testing without external dependencies.
  • Cost Control: Providers can be switched easily based on cost, latency, or feature requirements.
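
In practice, any reasoning step can be written against the LlmClient interface alone (defined in Section 2) and handed whichever client the deployment provides. A minimal sketch; the summarize helper and the model name are illustrative, not part of the protocol:

import { LlmClient } from '@mplp/integration-llm-http';

// Works unchanged whether the injected client targets OpenAI, Anthropic,
// a local Llama server, or a mock used in tests.
async function summarize(llm: LlmClient, text: string): Promise<string> {
  const result = await llm.generate({
    model: 'gpt-4', // any model identifier the configured provider accepts
    input: `Summarize the following in one sentence:\n${text}`,
    temperature: 0.2,
    maxTokens: 100
  });
  return result.output;
}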

2. The LlmClient Interface

Package: @mplp/integration-llm-http

The interface abstracts all LLM interactions to a single method:

interface LlmClient {
  generate(request: LlmGenerationRequest): Promise<LlmGenerationResult>;
}

2.1. Request Structure

interface LlmGenerationRequest {
  model: string;                          // e.g., "gpt-4", "claude-3-opus", "llama-3-70b"
  input: string;                          // The prompt text
  temperature?: number;                   // Sampling temperature (0.0 - 1.0)
  maxTokens?: number;                     // Maximum tokens to generate
  stopSequences?: string[];               // Stop generation at these sequences
  additionalParams?: Record<string, any>; // Provider-specific parameters
}

2.2. Result Structure

interface LlmGenerationResult {
  output: string;                         // The generated text
  finishReason: 'stop' | 'length' | 'error';
  usage?: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };
  raw?: any;                              // Provider-specific raw response (for debugging)
}
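
Before the output is parsed or forwarded, it is worth checking finishReason for truncation; a truncated completion usually means maxTokens was too low for the prompt. A minimal sketch (handleResult is an illustrative helper):

function handleResult(result: LlmGenerationResult): string {
  if (result.finishReason === 'length') {
    // The completion was cut off at maxTokens; callers may retry with a higher limit.
    console.warn('LLM output truncated', result.usage);
  }
  return result.output;
}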

3. The HttpLlmClient Implementation

The HttpLlmClient is a reference implementation that works with any OpenAI-compatible HTTP endpoint.

3.1. Configuration

import { HttpLlmClient } from '@mplp/integration-llm-http';

const client = new HttpLlmClient({
  baseUrl: "https://api.openai.com/v1/chat/completions",
  defaultHeaders: {
    "Authorization": "Bearer sk-...",
    "Content-Type": "application/json"
  },
  timeout: 30000 // 30 seconds
}, fetch);
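
The "Bearer sk-..." value above is a placeholder; as recommended in Section 8, the key should come from the environment rather than source code. A sketch assuming the key is exposed as OPENAI_API_KEY:

const client = new HttpLlmClient({
  baseUrl: "https://api.openai.com/v1/chat/completions",
  defaultHeaders: {
    "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json"
  },
  timeout: 30000
}, fetch);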

3.2. Usage Example

const request: LlmGenerationRequest = {
  model: "gpt-4",
  input: "Explain quantum computing in simple terms.",
  temperature: 0.7,
  maxTokens: 200
};

const result = await client.generate(request);
console.log(result.output);

3.3. Provider-Specific Configurations

OpenAI

const openai = new HttpLlmClient({
  baseUrl: "https://api.openai.com/v1/chat/completions",
  defaultHeaders: { "Authorization": "Bearer YOUR_API_KEY" }
}, fetch);

Anthropic Claude

const anthropic = new HttpLlmClient({
  baseUrl: "https://api.anthropic.com/v1/messages",
  defaultHeaders: {
    "x-api-key": "YOUR_API_KEY",
    "anthropic-version": "2023-06-01"
  }
}, fetch);

Local Llama (via llama.cpp server)

const llama = new HttpLlmClient({
  baseUrl: "http://localhost:8080/v1/chat/completions",
  defaultHeaders: {}
}, fetch);

Azure OpenAI

const azure = new HttpLlmClient({
  baseUrl: "https://YOUR_RESOURCE.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT/chat/completions?api-version=2023-05-15",
  defaultHeaders: { "api-key": "YOUR_AZURE_KEY" }
}, fetch);

4. Integration with MPLP Runtime

4.1. Injecting into Module Handlers

The LLM client is typically injected into the Plan Module Handler or other reasoning modules:

import { PlanModuleHandler } from '@mplp/coordination';
import { HttpLlmClient } from '@mplp/integration-llm-http';
import { v4 as uuidv4 } from 'uuid';

const llmClient = new HttpLlmClient({ /* config */ }, fetch);

const planHandler: PlanModuleHandler = async (input) => {
  const context = input.coordination.context;

  // Construct the prompt from the Context Protocol Object
  const prompt = `
Create a detailed plan to accomplish the following objective:
Title: ${context.title}
Domain: ${context.root.domain}

Respond with a JSON array of steps.
`;

  const result = await llmClient.generate({
    model: "gpt-4",
    input: prompt,
    temperature: 0.3
  });

  const stepsJson = JSON.parse(result.output);

  return {
    meta: { protocol_version: "1.0.0", schema_version: "1.0.0", created_at: new Date().toISOString() },
    plan_id: uuidv4(),
    context_id: context.context_id,
    title: `Plan for ${context.title}`,
    objective: context.title,
    status: "proposed",
    steps: stepsJson.map((s: any) => ({
      step_id: uuidv4(),
      description: s.description,
      status: "pending"
    }))
  };
};
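
JSON.parse alone will throw on malformed output and accepts any shape, so per the best practices in Section 8 the parsed value should be validated before it is mapped into a Plan object. A minimal sketch; parsePlanSteps is an illustrative helper, not part of the MPLP packages:

function parsePlanSteps(output: string): { description: string }[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(output);
  } catch {
    throw new Error('LLM did not return valid JSON');
  }
  if (!Array.isArray(parsed)) {
    throw new Error('LLM output is not a JSON array');
  }
  if (!parsed.every((s) => typeof s?.description === 'string')) {
    throw new Error('Each step must have a string description');
  }
  return parsed;
}

With this helper in place, const stepsJson = parsePlanSteps(result.output); replaces the bare JSON.parse call in the handler above.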

4.2. Using with Action Execution Layer (AEL)

The AEL can also use the LLM client for tool-use planning or code generation:

class LlmPoweredAEL implements ActionExecutionLayer {
  constructor(private llm: LlmClient) {}

  async execute(action: Action): Promise<ActionResult> {
    if (action.type === 'generate_code') {
      const result = await this.llm.generate({
        model: "gpt-4",
        input: `Generate Python code to: ${action.description}`
      });

      return { success: true, output: result.output };
    }
    // ... other action types
    throw new Error(`Unsupported action type: ${action.type}`);
  }
}

5. Error Handling & Retries

5.1. Built-in Error Types

The HttpLlmClient throws typed errors:

try {
  const result = await client.generate(request);
} catch (error) {
  if (error instanceof LlmRateLimitError) {
    // Wait and retry
    await sleep(60000);
  } else if (error instanceof LlmAuthenticationError) {
    // Invalid API key
    console.error("Authentication failed");
  } else if (error instanceof LlmTimeoutError) {
    // Request took too long
  }
}
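
These error classes are assumed to be exported by the same package as HttpLlmClient; if that holds for your version of the package, the import looks like:

import {
  LlmRateLimitError,
  LlmAuthenticationError,
  LlmTimeoutError
} from '@mplp/integration-llm-http';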

5.2. Retry Strategy

Implement exponential backoff for rate limits:

async function generateWithRetry(client: LlmClient, request: LlmGenerationRequest, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.generate(request);
    } catch (error) {
      if (error instanceof LlmRateLimitError && attempt < maxRetries - 1) {
        const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        await sleep(delay);
      } else {
        throw error;
      }
    }
  }
}
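
The sleep helper used in these snippets is not provided by the package; a one-line implementation:

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));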

6. Testing

6.1. Mock Client for Unit Tests

class MockLlmClient implements LlmClient {
  async generate(request: LlmGenerationRequest): Promise<LlmGenerationResult> {
    return {
      output: "Mock response",
      finishReason: 'stop',
      usage: { promptTokens: 10, completionTokens: 5, totalTokens: 15 }
    };
  }
}

// In tests
const mockClient = new MockLlmClient();
const handler = createPlanHandler(mockClient);
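
For handlers that parse the model output, such as the plan handler in Section 4.1, the canned response has to match the shape the handler expects. A sketch of a configurable mock and a Jest/Vitest-style test; CannedLlmClient, sampleInput, and the exact behavior of createPlanHandler are illustrative assumptions:

class CannedLlmClient implements LlmClient {
  constructor(private canned: string) {}
  async generate(): Promise<LlmGenerationResult> {
    return { output: this.canned, finishReason: 'stop' };
  }
}

it('maps LLM steps into a proposed plan', async () => {
  const handler = createPlanHandler(
    new CannedLlmClient('[{"description": "Collect requirements"}]')
  );
  const plan = await handler(sampleInput);
  expect(plan.status).toBe('proposed');
  expect(plan.steps).toHaveLength(1);
});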

7. Cost Tracking

Track token usage across all LLM calls:

class TokenTracker {
  private totalTokens = 0;

  wrap(client: LlmClient): LlmClient {
    return {
      generate: async (req) => {
        const result = await client.generate(req);
        this.totalTokens += result.usage?.totalTokens || 0;
        return result;
      }
    };
  }

  getTotal() { return this.totalTokens; }
}
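
Using the tracker amounts to wrapping the real client once and passing the wrapped instance wherever an LlmClient is expected:

const tracker = new TokenTracker();
const trackedClient = tracker.wrap(llmClient); // llmClient: any configured HttpLlmClient

const result = await trackedClient.generate({ model: "gpt-4", input: "Hello" });
console.log(`Total tokens so far: ${tracker.getTotal()}`);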

8. Best Practices

  1. Use Environment Variables: Store API keys in .env files, never hardcode.
  2. Set Timeouts: Always configure reasonable timeouts to prevent hanging.
  3. Log Requests: Log prompts and responses for debugging and audit trails.
  4. Validate Outputs: Always validate LLM JSON outputs before using them as Protocol Objects.
  5. Fallback Models: If the primary model fails, fall back to a cheaper or smaller model (see the sketch below).
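
A minimal fallback sketch; it assumes any thrown error justifies retrying on the secondary model, whereas production code would typically restrict this to specific error types:

async function generateWithFallback(
  client: LlmClient,
  request: LlmGenerationRequest,
  fallbackModel: string
): Promise<LlmGenerationResult> {
  try {
    return await client.generate(request);
  } catch (error) {
    console.warn(`Primary model ${request.model} failed, falling back to ${fallbackModel}`, error);
    return client.generate({ ...request, model: fallbackModel });
  }
}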

9. Current Implementation Status

As of P6:

  • LlmClient interface defined
  • HttpLlmClient implemented
  • Basic error handling
  • Advanced retry logic (manual implementation required)
  • Cost tracking (manual implementation required)
  • WebSocket streaming support (planned for P7)

© 2025 Bangshi Beijing Network Technology Limited Company. Licensed under the Apache License, Version 2.0.