Designing RAG with Ollama: Standard Practices for Embedding Model Selection, Reranking, and Source Citation

November 9, 2025


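Ollama makes it easy to run LLMs locally, but building a serious RAG (Retrieval-Augmented Generation) system involves several further decisions: which embedding model to use, how to rerank results, and how to show sources.

This article focuses on three elements of RAG design with Ollama: choosing an embedding model, improving accuracy with reranking, and implementing source citations. It is meant as practical guidance for anyone building a RAG system or trying to improve the accuracy of an existing one.

Background

What is a RAG system?

RAG (Retrieval-Augmented Generation) improves the accuracy of LLM answers by retrieving external knowledge. An LLM on its own cannot answer from information newer than its training data or from company-specific knowledge. With RAG, the system retrieves relevant passages from documents or databases and generates the answer from them.

The following diagram shows the basic workflow of a RAG system.

mermaid
flowchart TB
  user["User"] -->|Question| query["Query processing"]
  query -->|Embed| embed["Embedding vector"]
  embed -->|Similarity search| vectordb[("Vector DB")]
  vectordb -->|Candidate documents| rerank["Reranking"]
  rerank -->|Top documents| llm["LLM<br/>(Ollama)"]
  llm -->|Answer + sources| user

Key points of the diagram

The user's question is converted to an embedding vector, which is used for a similarity search in the vector DB
The search results are refined by reranking before being passed to the LLM
The LLM generates an answer while referring to the documents and also returns the source information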
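
To make the similarity-search step concrete, here is a minimal sketch of what the vector DB computes conceptually: the cosine similarity between the query vector and each stored vector, followed by a top-k selection. The function names are illustrative and not part of any library, and a real vector DB uses an approximate index (such as HNSW) instead of this brute-force scan.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force search: score every stored vector and keep the k most similar.
function topKBySimilarity(
  query: number[],
  vectors: number[][],
  k: number
): Array<{ index: number; similarity: number }> {
  return vectors
    .map((v, index) => ({ index, similarity: cosineSimilarity(query, v) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

The rest of the article delegates this step to ChromaDB; the sketch only shows what "similar" means for embedding vectors.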

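Ollama features

Ollama is a tool for running LLMs locally with little setup. It runs as an API server, and its Docker-like commands download and start models.

Main advantages of Ollama

#	Item	Description
1	Privacy	Data is not sent to external services, so confidential information can be handled
2	Cost reduction	No API fees; once you have the hardware, there is no per-request cost
3	Customizability	Models and parameters can be adjusted freely
4	Offline operation	Works without an internet connection once the models are downloaded

In a RAG system, Ollama lets you run both embedding generation and answer generation locally.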

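Challenges

Three hurdles when building a RAG system

When implementing RAG with Ollama, you are likely to run into the following challenges.

1. Choosing an embedding model is hard

Ollama offers several embedding models, but it is not obvious which one to pick. The models differ in dimensionality, accuracy, and speed, so the choice has to match the use case.

Main considerations

Need for multilingual support (quality of Japanese processing)
Balance between vector dimensionality and storage size
Trade-off between retrieval accuracy and processing speed

2. The initial search alone is not accurate enough

Vector search is fast, but it does not always return the documents that are actually most relevant. In particular, documents that merely share keywords with the query, or that cover the topic in a different context, can rank near the top.

The following diagram illustrates the problem when there is no reranking.

mermaid
flowchart LR
  query["Question:<br/>SSR in Next.js"] -->|Vector search| results["Search results"]
  results --> doc1["★★★ Next.js SSR explained"]
  results --> doc2["★★☆ React SSR in general"]
  results --> doc3["★☆☆ Next.js basics"]
  results --> doc4["☆☆☆ SSR glossary"]

  doc1 -.->|To LLM| llm["LLM"]
  doc2 -.->|To LLM| llm
  doc3 -.->|To LLM| llm
  doc4 -.->|To LLM| llm

Key points of the problem

Documents with low relevance (fewer ★) can also appear among the top results
Passing every candidate to the LLM as-is introduces noise
Answer quality ends up depending heavily on retrieval accuracy

3. Implementing source citations is tedious

Source information is essential for RAG answers, but implementing it raises the following issues.

#	Challenge	Details
1	Identifying referenced documents	Need to track which documents were actually used
2	Metadata management	Managing file names, page numbers, URLs, and so on
3	Consistent display format	Presenting sources in a form users can easily read
4	Marking cited passages	Showing which part of the answer comes from which document

Left unresolved, these issues make a RAG system hard to trust.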

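Solutions

1. Criteria for choosing an embedding model

This section lists the main embedding models available in Ollama and the criteria for choosing among them.

Recommended models

#	Model	Dimensions	Japanese support	Use case
1	nomic-embed-text	768	★★☆	General-purpose English documents
2	mxbai-embed-large	1024	★★★	High accuracy (trained mainly on English)
3	all-minilm	384	★☆☆	Lightweight, fast processing
4	bge-m3	1024	★★★	Multilingual including Japanese, high accuracy

The star ratings are this article's rough assessment. bge-m3 is a multilingual model rather than a Japanese-only one, and mxbai-embed-large is trained mainly on English text, so check retrieval quality on your own Japanese documents before committing to a model.

Model selection flowchart

The following diagram shows a decision flow for choosing a model by use case.

mermaid
flowchart TD
  start["Embedding model selection"] --> lang{Mainly Japanese?}
  lang -->|Yes| perf{Accuracy first?}
  lang -->|No| eng_perf{Accuracy first?}

  perf -->|Yes| bge["bge-m3<br/>(1024 dims)"]
  perf -->|No| mxbai["mxbai-embed-large<br/>(1024 dims)"]

  eng_perf -->|Yes| nomic["nomic-embed-text<br/>(768 dims)"]
  eng_perf -->|No| minilm["all-minilm<br/>(384 dims)"]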
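
The decision flow can also be written down as a small function, which is handy when the model name is chosen from configuration. This is only a direct transcription of the flowchart; the function and option names are made up for illustration.

```typescript
type EmbeddingModel =
  | 'bge-m3'
  | 'mxbai-embed-large'
  | 'nomic-embed-text'
  | 'all-minilm';

// Mirrors the flowchart: language first, then accuracy versus speed.
function chooseEmbeddingModel(opts: {
  japaneseMain: boolean;
  accuracyFirst: boolean;
}): EmbeddingModel {
  if (opts.japaneseMain) {
    return opts.accuracyFirst ? 'bge-m3' : 'mxbai-embed-large';
  }
  return opts.accuracyFirst ? 'nomic-embed-text' : 'all-minilm';
}
```

Treat the result as a starting point and confirm it with a small retrieval test on your own documents.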

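Selection criteria

If your documents are mainly Japanese: choose bge-m3 or mxbai-embed-large
If processing speed is the priority: all-minilm, which has the fewest dimensions, is a good fit
If you want to save storage: the fewer the dimensions, the smaller the stored vectors
If accuracy is the top priority: start with the 1024-dimension models and confirm the gain on your own data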
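
The storage side of this trade-off is simple arithmetic. The sketch below assumes each vector component is stored as a 4-byte float32 and ignores index and metadata overhead, so the real footprint in a vector DB will be larger. The function names are illustrative.

```typescript
// Raw size of the embedding vectors alone (no index or metadata overhead).
function estimateVectorStorageBytes(
  numChunks: number,
  dimensions: number,
  bytesPerValue: number = 4 // assumption: float32 components
): number {
  return numChunks * dimensions * bytesPerValue;
}

function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}

// 100,000 chunks with all-minilm (384 dims): 153,600,000 bytes, about 146.5 MiB
const small = estimateVectorStorageBytes(100_000, 384);
// 100,000 chunks with a 1024-dim model: 409,600,000 bytes, about 390.6 MiB
const large = estimateVectorStorageBytes(100_000, 1024);
```

Under these assumptions, moving from 384 to 1024 dimensions multiplies raw vector storage by about 2.7.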

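Downloading embedding models with Ollama

First, download the embedding models you need with Ollama.

bash
# High-accuracy model (1024 dimensions)
ollama pull mxbai-embed-large

bash
# Lightweight, fast model
ollama pull all-minilm

Models only need to be downloaded once; after that they are loaded from local storage.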

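2. Improving accuracy with reranking

This section explains how to implement reranking, which re-scores the candidate documents returned by vector search more precisely.

How reranking works

Reranking computes the relevance between the query and each candidate document in more detail and narrows the list down to the top results.

The following diagram shows how reranking improves accuracy.

mermaid
flowchart LR
  query["Question"] -->|Vector search| search["Initial search<br/>(Top 20)"]
  search --> rerank["Reranking<br/>model"]
  rerank --> filtered["Selected results<br/>(Top 5)"]
  filtered --> llm["LLM"]

  style rerank fill:#e1f5ff
  style filtered fill:#c8e6c9

Effects of reranking

Retrieve about 20 candidates in the initial search, then narrow them down to 5 by reranking
Low-relevance noise documents can be excluded
The quality of the context passed to the LLM improves

Implementing the cross-encoder approach

A cross-encoder is an effective choice for reranking. It takes the question and a document together as one input pair and computes a relevance score for the pair directly.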
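
Cross-encoders of the MS MARCO family typically output a single unbounded logit per pair rather than a value between 0 and 1. If you want a score that is easier to threshold or display, one common option is to pass the logit through a sigmoid. The sketch below assumes such a single-logit model; the function names and the `rerankScore` field are illustrative, and the result is a convenience score, not a calibrated probability.

```typescript
// Squash an unbounded logit into the range (0, 1).
function sigmoid(logit: number): number {
  return 1 / (1 + Math.exp(-logit));
}

// Drop candidates whose squashed score falls below a minimum.
function filterByRelevance<T extends { rerankScore: number }>(
  docs: T[],
  minScore: number
): T[] {
  return docs.filter((doc) => sigmoid(doc.rerankScore) >= minScore);
}
```

For sorting alone the raw logits are sufficient, because the sigmoid preserves their order.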

Installing the required packages

bash
yarn add @xenova/transformers chromadb

Once the packages are installed, we can implement the reranking function.

Type definitions for reranking

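typescript
// Document type
interface Document {
  id: string;
  content: string;
  metadata: {
    source: string;
    page?: number;
    section?: string; // used later when building the source list
  };
  score: number;
}

// Reranking result type
interface RerankedDocument extends Document {
  rerankScore: number;
}

These type definitions make the document structure and the score information explicit.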

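Implementing reranking

typescript
import {
  AutoModelForSequenceClassification,
  AutoTokenizer,
} from '@xenova/transformers';

// Load the cross-encoder. transformers.js needs ONNX weights, so this
// assumes the ONNX conversion published as 'Xenova/ms-marco-MiniLM-L-6-v2'.
const MODEL_ID = 'Xenova/ms-marco-MiniLM-L-6-v2';
const tokenizer = await AutoTokenizer.from_pretrained(MODEL_ID);
const reranker =
  await AutoModelForSequenceClassification.from_pretrained(MODEL_ID);

The code above loads a lightweight, fast MiniLM-based cross-encoder. The model is trained on the English MS MARCO dataset, so for Japanese documents treat it as a starting point and evaluate it before relying on it.

typescript
/**
 * Rerank documents with the cross-encoder
 * @param query The user's question
 * @param documents Documents returned by the initial search
 * @param topK Number of top documents to return
 */
async function rerankDocuments(
  query: string,
  documents: Document[],
  topK: number = 5
): Promise<RerankedDocument[]> {
  if (documents.length === 0) return [];

  // Tokenize each (query, document) pair as a single input
  const inputs = tokenizer(
    documents.map(() => query),
    {
      text_pair: documents.map((doc) => doc.content),
      padding: true,
      truncation: true,
    }
  );

  // One relevance logit per pair (higher means more relevant)
  const { logits } = await reranker(inputs);
  const scores = Array.from(logits.data as Float32Array);

  // Attach the scores to the documents
  const rerankedDocs = documents.map((doc, idx) => ({
    ...doc,
    rerankScore: scores[idx],
  }));

  // Sort by score, highest first
  rerankedDocs.sort(
    (a, b) => b.rerankScore - a.rerankScore
  );

  // Return the top K
  return rerankedDocs.slice(0, topK);
}

This function pairs the question with each document, computes a relevance score for every pair, sorts the documents by score, and returns only the requested number. The LLM then receives a smaller and more relevant set of documents.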

Reranking with Ollama

Another option is to rerank with the Ollama LLM itself. The advantage is that it needs no packages beyond the Ollama client.

typescript
import { Ollama } from 'ollama';

const ollama = new Ollama({
  host: 'http://localhost:11434',
});

/**
 * Reranking with an Ollama LLM
 */
async function rerankWithOllama(
  query: string,
  documents: Document[],
  topK: number = 5
): Promise<RerankedDocument[]> {
  const scoredDocs: RerankedDocument[] = [];

  // Ask the LLM to rate each document (one call per document)
  for (const doc of documents) {
    const prompt = `
Question: ${query}

Document: ${doc.content}

How relevant is this document to the question?
Answer with a score from 0 to 10. Return only the number.
    `.trim();

    const response = await ollama.generate({
      model: 'llama3.2',
      prompt: prompt,
      stream: false,
    });

    // Extract the score (output that cannot be parsed counts as 0)
    const score = parseFloat(response.response.trim());

    scoredDocs.push({
      ...doc,
      rerankScore: isNaN(score) ? 0 : score,
    });
  }

  // Sort by score, highest first
  scoredDocs.sort((a, b) => b.rerankScore - a.rerankScore);

  // Return the top K
  return scoredDocs.slice(0, topK);
}

This approach passes the question and each document to the LLM, has it rate the relevance on a scale of 0 to 10, then sorts by score and returns only the top results. The LLM can judge relevance in fine detail, but the method makes one LLM call per candidate, so processing time grows with the number of candidates.
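
LLMs do not always follow a "return only the number" instruction and may answer with something like "Score: 8" or "8/10", which a plain `parseFloat` cannot always read. A slightly more forgiving parser can pick out the first number and clamp it to the expected range. The helper name below is hypothetical.

```typescript
// Extract the first number in the model output and clamp it to [min, max].
// Returns min when no number can be found.
function parseRelevanceScore(
  raw: string,
  min: number = 0,
  max: number = 10
): number {
  const match = raw.match(/-?\d+(?:\.\d+)?/);
  if (!match) return min;
  const value = parseFloat(match[0]);
  return Math.min(max, Math.max(min, value));
}
```

With this helper, the scoring step could call `parseRelevanceScore(response.response)` in place of the `parseFloat` and `isNaN` pair.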

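3. Implementation patterns for source citation

To make a RAG system trustworthy, it is important to state which documents were referenced.

Designing the source information

Source information should include the following elements.

#	Element	Description	Example
1	Source name	File name or URL	`user-guide.pdf`
2	Page number	Page in a PDF or document	`p.15`
3	Section	Chapter or heading	`Chapter 3: Authentication`
4	Score	Relevance score	`0.87`
5	Excerpt	Part of the referenced text	`Authentication uses JWT...`

Storing metadata in the vector DB

When storing documents, save their metadata along with them. ChromaDB is used as the example here.

Installing ChromaDB

bash
yarn add chromadb

ChromaDB handles vector search and metadata management together. The client code below assumes a Chroma server is already running at http://localhost:8000.

Initializing the client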

typescript
import { ChromaClient } from 'chromadb';

// Create the ChromaDB client
const client = new ChromaClient({
  path: 'http://localhost:8000',
});

// Get or create the collection (cosine distance)
const collection = await client.getOrCreateCollection({
  name: 'documents',
  metadata: { 'hnsw:space': 'cosine' },
});

This code prepares a collection for vector search.

Saving documents and metadata

typescript
/**
 * Generate embedding vectors with Ollama
 */
async function generateEmbeddings(
  documents: Array<{
    content: string;
    source: string;
    page?: number;
    section?: string;
  }>
) {
  // One embedding request per document
  const embeddings = await Promise.all(
    documents.map(async (doc) => {
      const response = await ollama.embeddings({
        model: 'mxbai-embed-large',
        prompt: doc.content,
      });
      return response.embedding;
    })
  );

  return { embeddings, documents };
}

An embedding vector is generated with Ollama for each document.

typescript
/**
 * Save documents to the vector DB
 */
async function addDocuments(
  documents: Array<{
    content: string;
    source: string;
    page?: number;
    section?: string;
  }>
) {
  const { embeddings } = await generateEmbeddings(
    documents
  );

  // Save to ChromaDB
  await collection.add({
    ids: documents.map((_, idx) => `doc_${idx}`),
    embeddings: embeddings,
    documents: documents.map((d) => d.content),
    metadatas: documents.map((d) => ({
      source: d.source,
      page: d.page?.toString() || '',
      section: d.section || '',
    })),
  });
}

Saving the document text together with its metadata makes the source information available at search time.
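
One thing to watch in the snippet above: IDs of the form `doc_${idx}` restart from `doc_0` on every call, so a second call to `addDocuments` would reuse IDs from the first. A minimal sketch of one way around this is to derive the ID from the document itself. The helper names are hypothetical, and the FNV-1a hash here is a simple non-cryptographic hash applied to UTF-16 code units, chosen only to keep the example dependency-free.

```typescript
// FNV-1a style 32-bit hash (non-cryptographic), returned as 8 hex digits.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Build an ID that stays the same for the same source, page, and content.
function makeChunkId(doc: {
  source: string;
  page?: number;
  content: string;
}): string {
  const page = doc.page ?? 0;
  return `${doc.source}#p${page}#${fnv1a(doc.content)}`;
}
```

Because the ID is deterministic, re-ingesting an unchanged document yields the same ID, which makes duplicates easy to detect. For large collections a stronger hash would be a safer choice.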

Retrieving source information at search time

When running the vector search, retrieve the metadata as well.

typescript
/**
 * Search for documents related to the question
 */
async function searchDocuments(
  query: string,
  topK: number = 10
) {
  // Generate the embedding vector for the query
  const queryEmbedding = await ollama.embeddings({
    model: 'mxbai-embed-large',
    prompt: query,
  });

  // Run the vector search
  const results = await collection.query({
    queryEmbeddings: [queryEmbedding.embedding],
    nResults: topK,
    include: ['documents', 'metadatas', 'distances'],
  });

  // Build a document list with source information
  const documentsWithSources = results.documents[0].map(
    (doc, idx) => ({
      content: doc,
      metadata: results.metadatas[0][idx],
      score: 1 - results.distances[0][idx], // cosine distance to similarity
    })
  );

  return documentsWithSources;
}

The include parameter retrieves the document contents, metadata, and distances in a single query. The results are then reshaped into a list that carries the source information with each document. Because the collection uses cosine distance, 1 - distance gives the similarity.
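
The `1 - distance` conversion is tied to the cosine setting of the collection. The sketch below makes that dependency explicit. It assumes Chroma's documented distance definitions (cosine distance is 1 minus cosine similarity, `ip` is 1 minus the inner product, `l2` is the squared Euclidean distance); please verify these against the Chroma version you use. The `l2` mapping is only a heuristic for display.

```typescript
type DistanceSpace = 'cosine' | 'l2' | 'ip';

// Convert a distance returned by the vector DB into a "higher is better" score.
function distanceToSimilarity(
  distance: number,
  space: DistanceSpace
): number {
  switch (space) {
    case 'cosine':
      return 1 - distance; // cosine similarity
    case 'ip':
      return 1 - distance; // inner product
    case 'l2':
      return 1 / (1 + distance); // heuristic: 1 at distance 0, decreasing after
  }
}
```

If the collection were ever switched to `l2`, keeping the plain `1 - distance` formula could produce negative scores, so it helps to keep the conversion in one place.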

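Prompting the LLM to cite sources

When asking the LLM to generate an answer, instruct it in the prompt to cite its sources.

typescript
/**
 * Build the context block, with source information, that is passed to the LLM
 */
function buildContext(documents: RerankedDocument[]): string {
  return documents
    .map(
      (doc, idx) => `
[Document ${idx + 1}]
Source: ${doc.metadata.source}
${doc.metadata.page ? `Page: ${doc.metadata.page}` : ''}
Content: ${doc.content}
  `
    )
    .join('\n\n');
}

Each document is numbered, and its source information is written into the context.

typescript
/**
 * Generate a RAG answer (with sources)
 */
async function generateAnswerWithSources(
  query: string,
  documents: RerankedDocument[]
): Promise<string> {
  const context = buildContext(documents);

  const prompt = `
Answer the question by referring to the documents below.

${context}

Question: ${query}

When answering, always cite the document numbers you referred to, like [Document 1].
If you referred to more than one document, list all of their numbers.

Answer:
  `.trim();

  const response = await ollama.generate({
    model: 'llama3.2',
    prompt: prompt,
    stream: false,
  });

  return response.response;
}

Instructing the model in the prompt to cite document numbers makes it likely that the answer includes its sources. This is only a prompt-level instruction, so the model does not always comply.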
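
Once the answer contains markers such as [Document 1], they can be parsed to find out which documents the model actually cited, which addresses the "identifying referenced documents" challenge. The following is a minimal sketch; the function name is hypothetical, and the pattern accepts both the English marker and the Japanese form [文書1] in case the prompt is written in Japanese.

```typescript
// Collect the document numbers cited in an answer, ignoring numbers
// outside 1..maxId (the model may invent a document that was never given).
function extractCitedIds(answer: string, maxId: number): number[] {
  const pattern = /\[(?:Document|文書)\s*(\d+)\]/g;
  const cited = new Set<number>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const id = parseInt(match[1], 10);
    if (id >= 1 && id <= maxId) cited.add(id);
  }
  return Array.from(cited).sort((a, b) => a - b);
}
```

The returned numbers can be used to show only the cited sources, or to flag answers that cite nothing at all.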

Structuring the source information

Returning the list of referenced documents as structured data, separately from the answer text, makes the result easier for users to follow.

typescript
/**
 * Answer and sources returned as a set
 */
interface AnswerWithSources {
  answer: string;
  sources: Array<{
    id: number;
    source: string;
    page?: number;
    section?: string;
    excerpt: string;
    score: number;
  }>;
}

Keeping the answer and the source information clearly separated makes them easier to display on the front end.

typescript
async function generateStructuredAnswer(
  query: string,
  documents: RerankedDocument[]
): Promise<AnswerWithSources> {
  // Generate the answer
  const answer = await generateAnswerWithSources(
    query,
    documents
  );

  // Build the source list (IDs match the [Document N] numbers in the prompt)
  const sources = documents.map((doc, idx) => ({
    id: idx + 1,
    source: doc.metadata.source,
    page: doc.metadata.page,
    section: doc.metadata.section,
    excerpt: doc.content.substring(0, 150) + '...', // first 150 characters
    score: doc.rerankScore,
  }));

  return {
    answer,
    sources,
  };
}

With this structure, users can easily check the documents on which an answer is based.
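
A small refinement for the excerpt: `substring(0, 150) + '...'` appends an ellipsis even when the text is shorter than 150 characters, and it can cut a character in half when the text contains surrogate pairs such as emoji. A sketch of a helper that avoids both, with a hypothetical name:

```typescript
// Truncate by code points and add "..." only when something was cut off.
function makeExcerpt(text: string, maxChars: number = 150): string {
  const chars = Array.from(text); // splits by code point, not UTF-16 unit
  if (chars.length <= maxChars) return text;
  return chars.slice(0, maxChars).join('') + '...';
}
```

It can replace the `substring` expression directly, for example `excerpt: makeExcerpt(doc.content)`.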

Worked example

Here is a complete example of building a RAG system with Ollama.

Project setup

First, install the required packages.

bash
# Initialize the project
yarn init -y

bash
# Install the required packages
yarn add ollama chromadb @xenova/transformers
yarn add -D typescript @types/node tsx

This gives you a development environment with TypeScript, Ollama, and a vector DB.

TypeScript configuration

json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true,
    "outDir": "./dist"
  },
  "include": ["src/**/*"]
}

Save the settings above as tsconfig.json.

Implementing the RAG system

Next, we implement a class that integrates all of the features.

Class type definitions and constructor

typescript
import { Ollama } from 'ollama';
import { ChromaClient, Collection } from 'chromadb';

/**
 * Ollama RAG system
 */
class OllamaRAG {
  private ollama: Ollama;
  private client: ChromaClient;
  private collection: Collection | null = null;
  private embeddingModel: string;
  private llmModel: string;

  constructor(
    config: {
      ollamaHost?: string;
      chromaHost?: string;
      embeddingModel?: string;
      llmModel?: string;
    } = {}
  ) {
    this.ollama = new Ollama({
      host: config.ollamaHost || 'http://localhost:11434',
    });

    this.client = new ChromaClient({
      path: config.chromaHost || 'http://localhost:8000',
    });

    this.embeddingModel =
      config.embeddingModel || 'mxbai-embed-large';
    this.llmModel = config.llmModel || 'llama3.2';
  }
}

The constructor sets up the connections to Ollama and ChromaDB and the models to use.

Initializing the collection

typescript
class OllamaRAG {
  // ... (constructor shown above)

  /**
   * Initialize the vector DB collection
   */
  async initialize(collectionName: string = 'documents') {
    this.collection =
      await this.client.getOrCreateCollection({
        name: collectionName,
        metadata: { 'hnsw:space': 'cosine' },
      });

    console.log(
      `Initialized collection "${collectionName}"`
    );
  }
}

This initialization method creates the collection in which documents are stored.

Method for adding documents

typescript
class OllamaRAG {
  // ... (methods shown earlier)

  /**
   * Add documents to the vector DB
   */
  async addDocuments(
    documents: Array<{
      id: string;
      content: string;
      metadata: {
        source: string;
        page?: number;
        section?: string;
      };
    }>
  ) {
    if (!this.collection) {
      throw new Error('Collection has not been initialized');
    }

    console.log(`Processing ${documents.length} documents...`);

    // Generate an embedding vector for each document
    const embeddings = await Promise.all(
      documents.map(async (doc) => {
        const response = await this.ollama.embeddings({
          model: this.embeddingModel,
          prompt: doc.content,
        });
        return response.embedding;
      })
    );

    // Save to ChromaDB (metadata values are stored as strings)
    await this.collection.add({
      ids: documents.map((d) => d.id),
      embeddings: embeddings,
      documents: documents.map((d) => d.content),
      metadatas: documents.map((d) => ({
        source: d.metadata.source,
        page: d.metadata.page?.toString() || '',
        section: d.metadata.section || '',
      })),
    });

    console.log(`Added ${documents.length} documents`);
  }
}

The method generates an embedding vector for each document and saves it together with the metadata. Logging the progress to the console makes it easier to follow what is happening.
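
`Promise.all` over every document sends all embedding requests at once, which may be more than a local Ollama instance handles comfortably when there are many documents. One simple alternative is to process the documents in fixed-size batches. The sketch below is library-agnostic: `embedOne` is an injected function (an assumption of this example), and the default batch size of 8 is arbitrary.

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error('size must be at least 1');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed texts batch by batch; results keep the input order.
async function embedInBatches(
  texts: string[],
  embedOne: (text: string) => Promise<number[]>,
  batchSize: number = 8
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    // Requests within a batch run concurrently; batches run one after another.
    const batchVectors = await Promise.all(
      batch.map((text) => embedOne(text))
    );
    vectors.push(...batchVectors);
  }
  return vectors;
}
```

Inside the class, `embedOne` could wrap the existing `this.ollama.embeddings` call and return `response.embedding`.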

Search and reranking

typescript
class OllamaRAG {
  // ... (methods shown earlier)

  /**
   * Search for documents related to the question
   */
  async search(query: string, topK: number = 10) {
    if (!this.collection) {
      throw new Error('Collection has not been initialized');
    }

    // Generate the embedding vector for the query
    const queryEmbedding = await this.ollama.embeddings({
      model: this.embeddingModel,
      prompt: query,
    });

    // Run the vector search
    const results = await this.collection.query({
      queryEmbeddings: [queryEmbedding.embedding],
      nResults: topK,
      include: ['documents', 'metadatas', 'distances'],
    });

    // Reshape the search results
    const documents = results.documents[0].map((doc, idx) => {
      const meta = results.metadatas[0][idx];
      return {
        id: results.ids[0][idx],
        content: doc,
        metadata: {
          source: meta.source,
          // page was stored as a string, so convert it back to a number
          page: meta.page
            ? parseInt(meta.page as string, 10)
            : undefined,
          section: meta.section || undefined,
        },
        score: 1 - results.distances[0][idx], // cosine distance to similarity
      };
    });

    return documents;
  }
}

The method retrieves related documents by vector search and converts the results into a form that is easy to work with.

Reranking method

typescript
class OllamaRAG {
  // ... (methods shown earlier)

  /**
   * Reranking with the LLM
   */
  async rerank(
    query: string,
    documents: any[],
    topK: number = 5
  ) {
    console.log(
      `Reranking ${documents.length} documents...`
    );

    const scoredDocs: any[] = [];

    // Ask the LLM to rate each document (one call per document)
    for (const doc of documents) {
      const prompt = `
Question: ${query}

Document: ${doc.content}

How relevant is this document to the question?
Answer with a score from 0 to 10. Return only the number.
      `.trim();

      const response = await this.ollama.generate({
        model: this.llmModel,
        prompt: prompt,
        stream: false,
      });

      const score = parseFloat(response.response.trim());

      scoredDocs.push({
        ...doc,
        rerankScore: isNaN(score) ? 0 : score,
      });
    }

    // Sort by score, highest first
    scoredDocs.sort(
      (a, b) => b.rerankScore - a.rerankScore
    );

    console.log(`Selected top ${topK}`);

    return scoredDocs.slice(0, topK);
  }
}

The LLM rates the relevance of each document, and only the K documents with the highest scores are returned.

Answer generation method

typescript
class OllamaRAG {
  // ... (methods shown earlier)

  /**
   * Generate an answer with sources
   */
  async generateAnswer(query: string, documents: any[]) {
    // Build the context, attaching source information to each document
    const context = documents
      .map((doc, idx) => {
        let citation = `[Document ${idx + 1}]\nSource: ${
          doc.metadata.source
        }`;

        if (doc.metadata.page) {
          citation += `\nPage: ${doc.metadata.page}`;
        }

        if (doc.metadata.section) {
          citation += `\nSection: ${doc.metadata.section}`;
        }

        citation += `\nContent: ${doc.content}`;

        return citation;
      })
      .join('\n\n');

    const prompt = `
Answer the question by referring to the documents below.

${context}

Question: ${query}

When answering, always cite the document numbers you referred to, like [Document 1].
If you referred to more than one document, list all of their numbers.

Answer:
    `.trim();

    console.log('Generating answer...');

    const response = await this.ollama.generate({
      model: this.llmModel,
      prompt: prompt,
      stream: false,
    });

    return {
      answer: response.response,
      sources: documents.map((doc, idx) => ({
        id: idx + 1,
        source: doc.metadata.source,
        page: doc.metadata.page,
        section: doc.metadata.section,
        excerpt: doc.content.substring(0, 150) + '...',
        // ?? keeps a legitimate rerank score of 0
        score: doc.rerankScore ?? doc.score,
      })),
    };
  }
}

The method builds a context in which every document carries its source information, then returns the answer and the source list as structured data.
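
The context built here grows with the number and length of the documents, while the model's context window is finite. A minimal sketch of one way to cap it is to take documents in rank order until a size budget is used up. The budget is counted in characters, which is only a crude stand-in for tokens, and the function name is hypothetical.

```typescript
// Keep documents in rank order until the next one would exceed the budget.
function fitToBudget<T extends { content: string }>(
  docs: T[],
  maxChars: number
): T[] {
  const selected: T[] = [];
  let used = 0;
  for (const doc of docs) {
    if (used + doc.content.length > maxChars) break;
    selected.push(doc);
    used += doc.content.length;
  }
  return selected;
}
```

Stopping at the first document that does not fit, rather than skipping it, keeps the selection a strict prefix of the ranking, so the document numbers in the prompt still follow rank order.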

Combined query method

Finally, we implement a convenience method that runs the whole pipeline in one call.

typescript
class OllamaRAG {
  // ... (methods shown earlier)

  /**
   * Answer a question (search, rerank, then generate)
   */
  async query(
    question: string,
    options: {
      searchTopK?: number;
      rerankTopK?: number;
    } = {}
  ) {
    const searchTopK = options.searchTopK || 20;
    const rerankTopK = options.rerankTopK || 5;

    console.log(`\nQuestion: ${question}\n`);

    // 1. Vector search
    console.log('Step 1: Vector search');
    const searchResults = await this.search(
      question,
      searchTopK
    );

    // 2. Reranking
    console.log('Step 2: Reranking');
    const rerankedDocs = await this.rerank(
      question,
      searchResults,
      rerankTopK
    );

    // 3. Answer generation
    console.log('Step 3: Answer generation');
    const result = await this.generateAnswer(
      question,
      rerankedDocs
    );

    return result;
  }
}

With this combined method, a single call runs the entire RAG pipeline.

Running the example

Let's try out the RAG system we implemented.

Usage example

typescript
/**
 * Example usage of the RAG system
 */
async function main() {
  // Initialize the RAG system
  const rag = new OllamaRAG({
    embeddingModel: 'mxbai-embed-large',
    llmModel: 'llama3.2',
  });

  await rag.initialize('tech-docs');

  console.log('RAG system initialized\n');
}

main().catch(console.error);

First, create an instance of the RAG system and initialize the collection.

Adding sample documents

typescript
async function main() {
  // ... (initialization shown above)

  // Add sample documents
  await rag.addDocuments([
    {
      id: 'doc1',
      content: 'SSR (Server-Side Rendering) in Next.js generates HTML on the server. With getServerSideProps, data can be fetched on every request.',
      metadata: {
        source: 'nextjs-guide.pdf',
        page: 12,
        section: 'Rendering methods'
      }
    },
    {
      id: 'doc2',
      content: 'SSG (Static Site Generation) in Next.js generates HTML at build time. It uses getStaticProps and enables fast page loads.',
      metadata: {
        source: 'nextjs-guide.pdf',
        page: 15,
        section: 'Rendering methods'
      }
    },
    {
      id: 'doc3',
      content: 'The useEffect hook in React is used to handle side effects. It can run logic when a component mounts, updates, or unmounts.',
      metadata: {
        source: 'react-hooks.pdf',
        page: 8,
        section: 'Hooks'
      }
    }
  ]);

  console.log('');
}

The sample data is modeled on real technical documentation.

Running a question

typescript
async function main() {
  // ... (initialization and document loading shown above)

  // Run the question
  const result = await rag.query(
    'How do I do server-side rendering in Next.js?',
    {
      searchTopK: 10,
      rerankTopK: 3,
    }
  );

  // Show the result
  console.log('\n=== Answer ===');
  console.log(result.answer);

  console.log('\n=== Sources ===');
  result.sources.forEach((source) => {
    console.log(`[Document ${source.id}]`);
    console.log(`  Source: ${source.source}`);
    if (source.page)
      console.log(`  Page: ${source.page}`);
    if (source.section)
      console.log(`  Section: ${source.section}`);
    console.log(`  Score: ${source.score.toFixed(2)}`);
    console.log(`  Excerpt: ${source.excerpt}`);
    console.log('');
  });
}

The code runs the question and prints the answer and the source information in a readable layout.

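Example output

Running the code above produces output like the following.

text
Question: How do I do server-side rendering in Next.js?

Step 1: Vector search
Step 2: Reranking
Reranking 3 documents...
Selected top 3
Step 3: Answer generation
Generating answer...

=== Answer ===
To implement server-side rendering (SSR) in Next.js, use getServerSideProps [Document 1]. With this function, data is fetched on the server for every request and the HTML is generated there. This makes it possible to always show the latest data.

=== Sources ===
[Document 1]
  Source: nextjs-guide.pdf
  Page: 12
  Section: Rendering methods
  Score: 9.50
  Excerpt: SSR (Server-Side Rendering) in Next.js generates HTML on the server. With getServerSideProps, data can be fetched on every request....

[Document 2]
  Source: nextjs-guide.pdf
  Page: 15
  Section: Rendering methods
  Score: 6.20
  Excerpt: SSG (Static Site Generation) in Next.js generates HTML at build time. It uses getStaticProps and enables fast page loads....

As shown, the answer is displayed together with the documents it is based on. The exact wording and scores depend on the model and vary between runs. Because the collection holds three documents and rerankTopK is 3, a real run also prints a third source entry (react-hooks.pdf), which is omitted above.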

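Processing flow diagram

The following diagram shows the overall flow of the RAG system we implemented.

mermaid
sequenceDiagram
  participant User as User
  participant RAG as OllamaRAG
  participant Ollama as Ollama
  participant DB as ChromaDB

  User->>RAG: query(question)

  Note over RAG: Step 1: Search
  RAG->>Ollama: embeddings(question)
  Ollama-->>RAG: Query vector
  RAG->>DB: query(vector, top 20)
  DB-->>RAG: 20 candidate documents

  Note over RAG: Step 2: Reranking
  loop Each document
    RAG->>Ollama: generate(scoring)
    Ollama-->>RAG: Score
  end
  RAG->>RAG: Sort & take top 5

  Note over RAG: Step 3: Answer generation
  RAG->>Ollama: generate(answer prompt)
  Ollama-->>RAG: Answer text

  RAG-->>User: Answer + source list

Flow of processing

The user's question is converted to an embedding vector and a vector search is run
The search results are re-evaluated by the LLM and only the top ones are kept
An answer with sources is generated from the selected documents
Finally, the answer and the source information are returned to the user as a set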

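Summary

This article covered three points that matter when designing a RAG system with Ollama.

Key points for choosing an embedding model

#	Perspective	Recommendation
1	Japanese documents	bge-m3, mxbai-embed-large
2	English documents	nomic-embed-text
3	Speed first	all-minilm (384 dimensions)
4	Accuracy first	1024-dimension models

Choosing a model that fits the use case lets you balance storage cost against retrieval accuracy.

Effects of reranking

Retrieving about 20 candidates in the initial search and narrowing them down to 5 by reranking brings the following benefits.

Better answer quality because noise documents are excluded
Less context passed to the LLM
Shorter answer generation, since the LLM reads less context (the reranking step itself adds latency)

Reranking can be implemented either with a cross-encoder or with the Ollama LLM.

Implementing source citations

Managing source information properly makes a RAG system considerably more trustworthy.

Store metadata in the vector DB
Retrieve the metadata at search time
Have the prompt require document numbers
Return the answer and the source list as structured data

With these in place, users can check the basis of each answer and use the information with confidence.

The design patterns in this article can be used to build an accurate and trustworthy RAG system that runs locally with Ollama. Try them out in a real project.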
