mirror of https://github.com/amithkoujalgi/ollama4j.git synced 2025-10-14 01:18:58 +02:00

Update documentation and refactor code to replace OllamaAPI with Ollama

- Replaced all instances of `OllamaAPI` with `Ollama` in documentation and code examples for consistency.
- Enhanced the configuration for handling broken markdown links in Docusaurus.
- Updated integration tests and example code snippets to reflect the new class structure.

2025-09-29 09:31:32 +05:30


Prometheus Metrics Integration

Ollama4j now includes Prometheus metrics collection for monitoring your Ollama API usage. With it you can track request counts, response times, model usage, and other operational metrics.

Features

The metrics integration provides the following metrics:

Request Metrics: Total requests, duration histograms, and response time summaries by endpoint
Model Usage: Model-specific usage statistics and response times
Token Generation: Token count tracking per model
Error Tracking: Error counts by type and endpoint
Active Connections: Current number of active API connections

Quick Start

1. Enable Metrics Collection

import io.github.ollama4j.Ollama;

// Create an Ollama instance and enable metrics collection
Ollama ollama = new Ollama();
ollama.setMetricsEnabled(true);

2. Start Metrics Server

import io.prometheus.client.exporter.HTTPServer;

// Start the Prometheus metrics HTTP server on port 8080
// (the constructor throws IOException if the port cannot be bound)
HTTPServer metricsServer = new HTTPServer(8080);
System.out.println("Metrics available at: http://localhost:8080/metrics");

3. Use the API (Metrics are automatically collected)

// All API calls are automatically instrumented
boolean isReachable = ollama.ping();

Map<String, Object> format = new HashMap<>();
format.put("type", "json");
OllamaResult result = ollama.generateWithFormat(
    "llama2",
    "Generate a JSON object",
    format
);

Available Metrics

Request Metrics

ollama_api_requests_total - Total number of API requests by endpoint, method, and status
ollama_api_request_duration_seconds - Request duration histogram by endpoint and method
ollama_api_response_time_seconds - Response time summary with percentiles

Model Metrics

ollama_model_usage_total - Model usage count by model name and operation
ollama_model_response_time_seconds - Model response time histogram
ollama_tokens_generated_total - Total tokens generated by model

System Metrics

ollama_api_active_connections - Current number of active connections
ollama_api_errors_total - Error count by endpoint and error type

Example Metrics Output

# HELP ollama_api_requests_total Total number of Ollama API requests
# TYPE ollama_api_requests_total counter
ollama_api_requests_total{endpoint="/api/generate",method="POST",status="success"} 5.0
ollama_api_requests_total{endpoint="/api/embed",method="POST",status="success"} 3.0

# HELP ollama_api_request_duration_seconds Duration of Ollama API requests in seconds
# TYPE ollama_api_request_duration_seconds histogram
ollama_api_request_duration_seconds_bucket{endpoint="/api/generate",method="POST",le="0.1"} 0.0
ollama_api_request_duration_seconds_bucket{endpoint="/api/generate",method="POST",le="0.5"} 2.0
ollama_api_request_duration_seconds_bucket{endpoint="/api/generate",method="POST",le="1.0"} 4.0
ollama_api_request_duration_seconds_bucket{endpoint="/api/generate",method="POST",le="+Inf"} 5.0
ollama_api_request_duration_seconds_sum{endpoint="/api/generate",method="POST"} 2.5
ollama_api_request_duration_seconds_count{endpoint="/api/generate",method="POST"} 5.0

# HELP ollama_model_usage_total Total number of model usage requests
# TYPE ollama_model_usage_total counter
ollama_model_usage_total{model_name="llama2",operation="generate_with_format"} 5.0
ollama_model_usage_total{model_name="llama2",operation="embed"} 3.0

# HELP ollama_tokens_generated_total Total number of tokens generated
# TYPE ollama_tokens_generated_total counter
ollama_tokens_generated_total{model_name="llama2"} 150.0
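Histogram buckets in the output above are cumulative: each `le` bucket counts every request that finished within that bound, including those already counted in smaller buckets. The following is a minimal sketch of how to read them, using only the JDK. The class `HistogramReader` and its methods are hypothetical helpers written for this illustration and are not part of ollama4j or the Prometheus client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper (not part of ollama4j) for reading histogram samples. */
final class HistogramReader {

    /** Mean observation = _sum / _count; returns NaN when nothing was observed. */
    static double mean(double sum, double count) {
        return count == 0 ? Double.NaN : sum / count;
    }

    /**
     * Converts cumulative bucket counts (keyed by their "le" upper bound, in
     * ascending order) into the number of observations that fell in each bucket.
     */
    static Map<String, Double> perBucket(Map<String, Double> cumulative) {
        Map<String, Double> result = new LinkedHashMap<>();
        double previous = 0.0;
        for (Map.Entry<String, Double> entry : cumulative.entrySet()) {
            result.put(entry.getKey(), entry.getValue() - previous);
            previous = entry.getValue();
        }
        return result;
    }

    public static void main(String[] args) {
        // Values copied from the /api/generate histogram in the sample output.
        Map<String, Double> cumulative = new LinkedHashMap<>();
        cumulative.put("0.1", 0.0);
        cumulative.put("0.5", 2.0);
        cumulative.put("1.0", 4.0);
        cumulative.put("+Inf", 5.0);

        System.out.println("mean seconds = " + mean(2.5, 5.0));
        System.out.println("per bucket   = " + perBucket(cumulative));
    }
}
```

For the sample values this gives a mean of 0.5 seconds, with two requests between 0.1 s and 0.5 s, two between 0.5 s and 1.0 s, and one slower than 1.0 s.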

Configuration

Enable/Disable Metrics

Ollama ollama = new Ollama();

// Enable metrics collection
ollama.setMetricsEnabled(true);

// Disable metrics collection (default)
ollama.setMetricsEnabled(false);

Custom Metrics Server

import io.prometheus.client.exporter.HTTPServer;

// Start on a custom port
HTTPServer metricsServer = new HTTPServer(9090);

// Or bind to a specific host and port
// (use this instead of the line above; both would try to bind port 9090)
HTTPServer hostBoundMetricsServer = new HTTPServer("0.0.0.0", 9090);

Integration with Prometheus

Prometheus Configuration

Add this to your prometheus.yml:

scrape_configs:
  - job_name: 'ollama4j'
    static_configs:
      - targets: ['localhost:8080']
    scrape_interval: 15s

Grafana Dashboards

You can create Grafana dashboards using the metrics. Some useful queries:

Request Rate: rate(ollama_api_requests_total[5m])
Average Response Time: rate(ollama_api_request_duration_seconds_sum[5m]) / rate(ollama_api_request_duration_seconds_count[5m])
Error Rate: sum(rate(ollama_api_requests_total{status="error"}[5m])) / sum(rate(ollama_api_requests_total[5m]))
Model Usage: rate(ollama_model_usage_total[5m])
Token Generation Rate: rate(ollama_tokens_generated_total[5m])
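To make the arithmetic behind these queries concrete, here is a simplified sketch of what a per-second rate and an error ratio amount to. It is only an approximation for illustration: Prometheus's own `rate()` also handles counter resets and extrapolates to the window edges, which this sketch does not. The class `RateMath` and the sample numbers are hypothetical.

```java
/** Illustrative arithmetic behind the queries; not Prometheus's exact algorithm. */
final class RateMath {

    /** Per-second increase of a counter between two samples taken at t1 < t2 (seconds). */
    static double simpleRate(double v1, double t1, double v2, double t2) {
        if (t2 <= t1) {
            throw new IllegalArgumentException("t2 must be after t1");
        }
        return (v2 - v1) / (t2 - t1);
    }

    /** Share of failed requests, given per-second rates of failed and of all requests. */
    static double errorRatio(double errorRate, double totalRate) {
        return totalRate == 0 ? 0.0 : errorRate / totalRate;
    }

    public static void main(String[] args) {
        // Hypothetical samples taken 300 seconds (5 minutes) apart.
        double totalRate = simpleRate(100, 0, 130, 300); // all requests
        double errorRate = simpleRate(4, 0, 7, 300);     // failed requests only

        System.out.println("requests per second = " + totalRate);
        System.out.println("error ratio         = " + errorRatio(errorRate, totalRate));
    }
}
```

With these made-up samples, 30 requests in 5 minutes is 0.1 requests per second, and 3 failures out of 30 requests is an error ratio of roughly 10%.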

Performance Considerations

Metrics collection adds minimal overhead (~1-2% in most cases)
Metrics are collected asynchronously and don't block API calls
You can disable metrics in production if needed: ollama.setMetricsEnabled(false)
The metrics server uses minimal resources

Troubleshooting

Metrics Not Appearing

Ensure metrics are enabled: ollama.setMetricsEnabled(true)
Check that the metrics server is running: http://localhost:8080/metrics
Verify API calls are being made (metrics only appear after API usage)
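The checks above can also be scripted. The sketch below, which assumes Java 11 or later and the endpoint URL used in this guide, fetches the metrics page and reports whether any `ollama_` samples are present. The class `MetricsCheck` is a hypothetical helper for illustration, not part of ollama4j.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative troubleshooting helper (not part of ollama4j). */
final class MetricsCheck {

    /** Downloads the exposition text from a metrics endpoint. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    /** True if at least one sample line (not a '#' comment) starts with the prefix. */
    static boolean hasSamples(String body, String prefix) {
        return body.lines()
                .map(String::strip)
                .anyMatch(line -> !line.startsWith("#") && line.startsWith(prefix));
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            String body = fetch("http://localhost:8080/metrics");
            System.out.println(hasSamples(body, "ollama_")
                    ? "ollama4j metrics found"
                    : "endpoint is up, but no ollama_ samples yet");
        } catch (IOException e) {
            System.out.println("metrics server not reachable: " + e);
        }
    }
}
```

If the endpoint responds but no `ollama_` samples appear, the likely causes are that metrics were never enabled or that no API call has been made yet.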

High Memory Usage

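Metric values are cumulative, and the client keeps one series per distinct label combination, so memory grows with the number of label values (models, endpoints, error types) rather than with request volume. Restarting the application resets all metrics
Scraping with Prometheus does not free memory in the application. If usage keeps growing, look for labels with many distinct values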
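When investigating memory, it can help to see how many series each metric exposes, since the client holds every distinct label combination as a separate series. The sketch below counts sample lines per metric name in text copied from the `/metrics` endpoint. `SeriesCounter` is a hypothetical helper for illustration, and it does only light parsing of the text format (it assumes one sample per line and `\n` line endings).

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative helper (not part of ollama4j) for spotting metrics with many series. */
final class SeriesCounter {

    /** Counts sample lines per metric name in Prometheus text exposition output. */
    static Map<String, Integer> countSeries(String exposition) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : exposition.split("\n")) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            // The metric name ends at the first '{' (labels) or space (value).
            int end = 0;
            while (end < line.length()
                    && line.charAt(end) != '{'
                    && line.charAt(end) != ' ') {
                end++;
            }
            counts.merge(line.substring(0, end), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "# TYPE ollama_api_requests_total counter",
                "ollama_api_requests_total{endpoint=\"/api/generate\",method=\"POST\",status=\"success\"} 5.0",
                "ollama_api_requests_total{endpoint=\"/api/embed\",method=\"POST\",status=\"success\"} 3.0",
                "ollama_tokens_generated_total{model_name=\"llama2\"} 150.0");
        System.out.println(countSeries(sample));
    }
}
```

A metric whose count keeps rising between checks points to a label that takes many distinct values.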

Custom Metrics

You can extend the metrics by accessing the Prometheus registry directly:

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Counter;

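// Create a custom metric and register it with the default registry,
// which is the registry that HTTPServer(port) exposes
Counter customCounter = Counter.build()
    .name("my_custom_metric_total")
    .help("My custom metric")
    .register(CollectorRegistry.defaultRegistry);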

// Use the metric
customCounter.inc();
