The landscape of generative AI development is evolving rapidly but comes with significant challenges. API usage costs can quickly add up, especially during development. Privacy concerns arise when sensitive data must be sent to external services. And relying on external APIs can introduce connectivity issues and latency.
Enter Gemma 3 and Docker Model Runner, a powerful combination that brings state-of-the-art language models to your local environment, addressing these challenges head-on.
In this blog post, we’ll explore how to run Gemma 3 locally using Docker Model Runner. We’ll also walk through a practical case study: a Comment Processing System that analyzes user feedback about a fictional AI assistant named Jarvis.
The power of local GenAI development
Before diving into the implementation, let’s look at why local GenAI development is becoming increasingly important:
- Cost efficiency: With no per-token or per-request charges, you can experiment freely without worrying about usage fees.
- Data privacy: Sensitive data stays within your environment, with no third-party exposure.
- Reduced network latency: Eliminates reliance on external APIs and enables offline use.
- Full control: Run the model on your terms, with no intermediaries and full transparency.
Setting up Docker Model Runner with Gemma 3
Docker Model Runner provides an OpenAI-compatible API interface to run models locally.
It is included in Docker Desktop for macOS, starting with version 4.40.0.
Here’s how to set it up with Gemma 3:
docker desktop enable model-runner --tcp 12434
docker model pull ai/gemma3
Once setup is complete, the OpenAI-compatible API provided by the Model Runner is available at: http://localhost:12434/engines/v1
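Before writing any application code, you can confirm the endpoint is reachable. Here's a quick check with Node's built-in fetch; it assumes the default TCP port enabled above and that the runner exposes the standard OpenAI models route:

// verify.mjs — sanity check that the local endpoint responds (Node 18+)
const res = await fetch("http://localhost:12434/engines/v1/models");
if (!res.ok) throw new Error(`Model Runner not reachable: HTTP ${res.status}`);
console.log(await res.json()); // should list pulled models such as ai/gemma3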
Case study: Comment processing system
To demonstrate the power of local GenAI development, we’ve built a Comment Processing System that leverages Gemma 3 for multiple NLP tasks. This system:
- Generates synthetic user comments about a fictional AI assistant
- Categorizes comments as positive, negative, or neutral
- Clusters similar comments together using embeddings
- Identifies potential product features from the comments
- Generates contextually appropriate responses
All tasks are performed locally with no external API calls.
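At a high level, these steps chain into a single pipeline. The sketch below is purely illustrative; the function names are placeholders rather than the repository's actual helpers:

// Illustrative pipeline flow (function names are placeholders, not the repo's actual API)
const comments = await generateComments();                        // synthetic user feedback
const categorized = await categorizeComments(comments);           // positive / negative / neutral
const clusters = await clusterComments(categorized);              // embedding-based grouping
const features = await extractFeaturesFromComments(
  categorized.map(c => c.text).join('\n')                         // feature extraction works on plain text
);
const responses = await generateResponses(categorized, features); // contextual support replies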
Implementation details
Configuring the OpenAI SDK to use local models
To make this work, we configure the OpenAI SDK to point to the Docker Model Runner:
// config.js
export default {
  // Model configuration
  openai: {
    baseURL: "http://localhost:12434/engines/v1", // Base URL for Docker Model Runner
    apiKey: 'ignored', // Required by the SDK, but not used by the local endpoint
    model: "ai/gemma3",
    // Each task has its own configuration (temperature, token limits, etc.)
    commentGeneration: {
      temperature: 0.3,
      max_tokens: 250,
      n: 1,
    },
    embedding: {
      model: "ai/mxbai-embed-large", // Model for generating embeddings
    },
  },
  // ... other configuration options
};
import OpenAI from 'openai';
import config from './config.js';

// Initialize OpenAI client with local endpoint
const client = new OpenAI({
  baseURL: config.openai.baseURL,
  apiKey: config.openai.apiKey,
});
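With the client in place, a single round trip confirms the wiring works end to end. This is just a smoke-test sketch; the prompt is a placeholder:

// Smoke test: one round trip through the local model
const hello = await client.chat.completions.create({
  model: config.openai.model,
  messages: [{ role: "user", content: "Say hello in one short sentence." }],
  max_tokens: 20,
});
console.log(hello.choices[0].message.content);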
Task-specific configuration
One key benefit of running models locally is the ability to experiment freely with different configurations for each task without worrying about API costs or rate limits.
In our case:
- Synthetic comment generation uses a higher temperature for creativity.
- Categorization uses a lower temperature and a 10-token limit for consistency.
- Clustering allows up to 20 tokens to improve semantic richness in embeddings.
This flexibility lets us iterate quickly, tune for performance, and tailor the model’s behavior to each use case.
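The categorization and clustering settings aren't shown in the config excerpt above, so the values below are illustrative assumptions that simply mirror the intent described in the list. The embeddings call reuses the client and the embedding model declared in config.js:

// Hypothetical per-task settings (assumed values, not the project's actual numbers)
const taskSettings = {
  categorization: { temperature: 0.1, max_tokens: 10 }, // short, stable labels
  clustering: { max_tokens: 20 },                       // longer snippets before embedding
};

// Embeddings for clustering go through the same local endpoint
const texts = ["Jarvis is fast", "Responses feel slow at times"];
const emb = await client.embeddings.create({
  model: config.openai.embedding.model, // "ai/mxbai-embed-large"
  input: texts,
});
console.log(emb.data[0].embedding.length); // vector dimension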
Generating synthetic comments
To simulate user feedback, we use Gemma 3’s ability to follow detailed, context-aware prompts.
/**
 * Create a prompt for comment generation
 * @param {string} type - Type of comment (positive, negative, neutral)
 * @param {string} topic - Topic of the comment
 * @returns {string} - Prompt for OpenAI
 */
function createPromptForCommentGeneration(type, topic) {
  let sentiment = '';

  switch (type) {
    case 'positive':
      sentiment = 'positive and appreciative';
      break;
    case 'negative':
      sentiment = 'negative and critical';
      break;
    case 'neutral':
      sentiment = 'neutral and balanced';
      break;
    default:
      sentiment = 'general';
  }

  return `Generate a realistic ${sentiment} user comment about an AI assistant called Jarvis, focusing on its ${topic}.
The comment should sound natural, as if written by a real user who has been using Jarvis.
Keep the comment concise (1-3 sentences) and focused on the specific topic.
Do not include ratings (like "5/5 stars") or formatting.
Just return the comment text without any additional context or explanation.`;
}
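The prompt is then sent to the local model using the commentGeneration settings from config.js. The helper below is a sketch of that call rather than the repository's exact code:

// Sketch: generate one synthetic comment (helper name is illustrative)
async function generateComment(type, topic) {
  const response = await client.chat.completions.create({
    model: config.openai.model,
    messages: [
      { role: "user", content: createPromptForCommentGeneration(type, topic) },
    ],
    ...config.openai.commentGeneration, // temperature, max_tokens, n
  });
  return response.choices[0].message.content.trim();
}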
Examples:
"Honestly, Jarvis is just a lot of empty promises. It keeps suggesting irrelevant articles and failing to actually understand my requests for help with my work – it’s not helpful at all."
"Jarvis is seriously impressive – the speed at which it responds is incredible! I’ve never used an AI assistant that’s so quick and efficient, it’s a game changer.
The ability to produce realistic feedback on demand is incredibly useful for simulating user data with zero API cost.
Generating contextual responses
We also use Gemma 3 to simulate polite, on-brand support responses to user comments. Here’s the prompt logic:
const response = await client.chat.completions.create({
  model: config.openai.model,
  messages: [
    {
      role: "system",
      content: `You are a customer support representative for an AI assistant called Jarvis. Your task is to generate polite, helpful responses to user comments.
Guidelines for responses:
1. Be empathetic and acknowledge the user's feedback
2. Thank the user for their input
3. If the comment is positive, express appreciation
4. If the comment is negative, apologize for the inconvenience and assure them you're working on improvements
5. If the comment is neutral, acknowledge their observation
6. If relevant, mention that their feedback will be considered for future updates
7. Keep responses concise (2-4 sentences) and professional
8. Do not make specific promises about feature implementation or timelines
9. Sign the response as "The Jarvis Team"`
    },
    {
      role: "user",
      content: `User comment: "${comment.text}"
Comment category: ${comment.category || 'unknown'}
${featuresContext}
Generate a polite, helpful response to this user comment.`
    }
  ],
  temperature: 0.7,
  max_tokens: 200
});
Examples:
For a positive comment:
Thank you so much for your positive feedback regarding Jarvis’s interface! We’re thrilled to hear you find it clean and intuitive – that’s exactly what we’re aiming for. We appreciate you pointing out your desire for more visual customization options, and your feedback will definitely be considered as we continue to develop Jarvis.
The Jarvis Team
For a negative comment:
Thank you for your feedback – we appreciate you taking the time to share your observations about Jarvis. We sincerely apologize for the glitches and freezes you’ve experienced; we understand how frustrating that can be. Your input is valuable, and we’re actively working on improvements to enhance Jarvis’s reliability and accuracy.
The Jarvis Team
This approach ensures a consistent, human-like support experience generated entirely locally.
Extracting product features from user feedback
Beyond generating and responding to comments, we also use Gemma 3 to analyze user feedback and identify actionable insights. This helps simulate the role of a product analyst, surfacing recurring themes, user pain points, and opportunities for improvement.
Here, we provide a prompt instructing the model to identify up to three potential features or improvements based on a set of user comments.
/**
 * Extract features from comments
 * @param {string} commentsText - Text of comments
 * @returns {Promise<Array>} - Array of identified features
 */
async function extractFeaturesFromComments(commentsText) {
  const response = await client.chat.completions.create({
    model: config.openai.model,
    messages: [
      {
        role: "system",
        content: `You are a product analyst for an AI assistant called Jarvis. Your task is to identify potential product features or improvements based on user comments.
For each set of comments, identify up to 3 potential features or improvements that could address the user feedback.
For each feature, provide:
1. A short name (2-5 words)
2. A brief description (1-2 sentences)
3. The type of feature (New Feature, Improvement, Bug Fix)
4. Priority (High, Medium, Low)
Format your response as a JSON array of features, with each feature having the fields: name, description, type, and priority.`
      },
      {
        role: "user",
        content: `Here are some user comments about Jarvis. Identify potential features or improvements based on these comments:
${commentsText}`
      }
    ],
    response_format: { type: "json_object" },
    temperature: 0.5
  });

  try {
    const result = JSON.parse(response.choices[0].message.content);
    return result.features || [];
  } catch (error) {
    console.error('Error parsing feature identification response:', error);
    return [];
  }
}
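Calling it is straightforward. The snippet below is a small usage sketch, assuming the comments array comes from the earlier generation step:

// Usage sketch: feed the generated comments into the analyst prompt
const commentsText = comments.map(c => `- ${c.text}`).join('\n');
const features = await extractFeaturesFromComments(commentsText);
console.log(`Identified ${features.length} potential features`);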
Here’s an example of what the model might return:
"features": [
{
"name": "Enhanced Visual Customization",
"description": "Allows users to personalize the Jarvis interface with more themes, icon styles, and display options to improve visual appeal and user preference.",
"type": "Improvement",
"priority": "Medium",
"clusters": [
"1"
]
},
And just like everything else in this project, it’s generated locally with no external services.
Conclusion
By combining Gemma 3 with Docker Model Runner, we’ve unlocked a local GenAI workflow that’s fast, private, cost-effective, and fully under our control. In building our Comment Processing System, we experienced firsthand the benefits of this approach:
- Rapid iteration without worrying about API costs or rate limits
- Flexibility to test different configurations for each task
- Offline development with no dependency on external services
- Significant cost savings during development
And this is just one example of what’s possible. Whether you’re prototyping a new AI product, building internal tools, or exploring advanced NLP use cases, running models locally puts you in the driver’s seat.
As open-source models and local tooling continue to evolve, the barrier to entry for building powerful AI systems keeps getting lower.
Don’t just consume AI; develop, shape, and own the process.
Try it yourself: clone the repository and start experimenting today.