LlamaIndex Cheat Sheet: 50 Snippets for Running RAG with Minimal Code

October 27, 2025

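LlamaIndex is a powerful toolkit for implementing RAG (Retrieval-Augmented Generation), but its feature set is broad enough that it can be hard to know where to start.

This article collects 50 practical snippets for building RAG with LlamaIndex. They are organized from basic setup through advanced customization, and each one is kept to the minimum code you can copy, paste, and run.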

Snippet quick reference

The table below lists every snippet in this article so you can jump straight to the one you need.

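#	Category	Snippet	Purpose
1	Setup	Basic installation	Initial LlamaIndex setup
2	Setup	Environment variable setup	Setting the OpenAI API key
3	Setup	Local LLM configuration	Using a locally hosted model
4	Setup	Custom LLM configuration	Connecting to another LLM service
5	Setup	Logging configuration	Enabling debug logs
6	Data loading	Text file loading	Indexing .txt files
7	Data loading	PDF loading	Loading PDF documents
8	Data loading	Markdown loading	Processing Markdown files
9	Data loading	CSV loading	Loading CSV data
10	Data loading	JSON loading	Loading JSON data
11	Data loading	Web page loading	Fetching data from a URL
12	Data loading	Bulk directory loading	Loading every file in a folder
13	Data loading	GitHub repository loading	Fetching code from GitHub
14	Data loading	Google Docs loading	Loading Google documents
15	Data loading	Notion loading	Loading Notion pages
16	Index creation	Simple index	The most basic index
17	Index creation	Vector store index	Index for vector search
18	Index creation	Tree index	Hierarchical index
19	Index creation	Keyword index	Keyword-based search
20	Index creation	Knowledge graph index	Building a knowledge graph
21	Index persistence	Saving an index	Persisting an index to disk
22	Index persistence	Loading an index	Loading a saved index
23	Index persistence	Cloud storage	Persisting an index to S3
24	Querying	Basic query	The simplest question answering
25	Querying	Streaming query	Real-time response generation
26	Querying	Metadata filter	Conditional search
27	Querying	Top-K search	Retrieving the top K results
28	Querying	Similarity threshold	Filtering by similarity score
29	Querying	Hybrid search	Keyword plus vector search
30	Customization	Custom prompt	Changing the prompt template
31	Customization	Chunk size	Adjusting the text split size
32	Customization	Chunk overlap	Configuring overlap between chunks
33	Customization	Custom embeddings	Using your own embedding model
34	Customization	Custom retriever	Your own retrieval logic
35	Advanced features	Chat engine	Interactive chat
36	Advanced features	Context chat	Keeping conversation history
37	Advanced features	Agent creation	Agents that can use tools
38	Advanced features	Multi-document search	Combining multiple indexes
39	Advanced features	Sub-question generation	Decomposing complex questions
40	Evaluation	Response evaluation	Assessing answer quality
41	Evaluation	Faithfulness evaluation	Checking faithfulness to sources
42	Evaluation	Relevancy evaluation	Assessing retrieval relevance
43	Optimization	Caching	Caching responses
44	Optimization	Batch processing	Processing several queries at once
45	Optimization	Parallel processing	Parallel index construction
46	Vector DB	Pinecone integration	Using Pinecone
47	Vector DB	ChromaDB integration	Using ChromaDB
48	Vector DB	Weaviate integration	Using Weaviate
49	Debugging	Callback configuration	Visualizing the processing steps
50	Debugging	Cost tracking	Monitoring API cost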

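Setup

The first step in building a RAG system is setting up the LlamaIndex environment correctly. This section covers installation and the basic configuration snippets.

1. Basic installation

The first step for getting started with LlamaIndex.

pip install llama-index

This starter package installs the core library together with the default OpenAI integrations. Other integrations used later in this article (Ollama, Hugging Face, the additional readers, the vector stores) ship as separate packages, for example llama-index-llms-ollama.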

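2. Environment variable setup

The basic configuration for using the OpenAI API.

import os

# Set the OpenAI API key for this process (placeholder value shown)
os.environ["OPENAI_API_KEY"] = "your-api-key-here"

The OpenAI integration reads the key from the OPENAI_API_KEY environment variable. The assignment above is only for quick experiments; in a real project, export the variable in your shell or load it from a secrets manager so the key never appears in source code.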
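To make the "no hard-coded key" advice concrete, here is a minimal sketch of a fail-fast check at application startup. The helper name `require_api_key` is hypothetical (it is not part of LlamaIndex); it only reads the environment with the standard library.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of the environment variable `name`, or raise if unset.

    Hypothetical helper for illustration; not a LlamaIndex function.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return key
```

Calling `require_api_key()` once at startup surfaces a missing key immediately rather than on the first LLM request.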

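3. Local LLM configuration

Run a model locally instead of depending on OpenAI.

from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

# Use a local LLM served by Ollama (requires llama-index-llms-ollama)
Settings.llm = Ollama(model="llama2", request_timeout=120.0)

With this setting, LLM calls stay on your machine. Embeddings still default to OpenAI unless you also configure a local embedding model (see snippet 33), so set both if no data should leave your environment.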

4. Custom LLM configuration

Configuration for Azure OpenAI or another hosted LLM service.

from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.core import Settings

# Azure OpenAI configuration (all values are placeholders)
Settings.llm = AzureOpenAI(
    model="gpt-4",
    deployment_name="your-deployment-name",
    api_key="your-azure-api-key",
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_version="2024-02-15-preview",
)

Enterprise environments often standardize on Azure OpenAI, in which case this configuration replaces the default OpenAI client.

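5. Logging configuration

Log output settings for debugging and troubleshooting.

import logging
import sys

# Send detailed logs to stdout.
# basicConfig already attaches a stdout handler to the root logger,
# so adding a second StreamHandler would print every message twice.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

During development, detailed logs make it much quicker to see where a problem occurs.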

Data loading

A RAG system is only as good as the data it loads. LlamaIndex supports many data sources and provides a reader suited to each.

6. Text file loading

The simplest way to load a text file.

from llama_index.core import SimpleDirectoryReader

# Load a text file
documents = SimpleDirectoryReader(
    input_files=["./data/sample.txt"]
).load_data()

Pointing the reader at a .txt file is enough; it is converted into Document objects automatically.

7. PDF loading

Loading a PDF document.

from llama_index.core import SimpleDirectoryReader

# Load a PDF file (pypdf is used automatically)
documents = SimpleDirectoryReader(
    input_files=["./data/document.pdf"]
).load_data()

pypdf is used internally to extract the text. Image-based (scanned) PDFs need an additional OCR step.

8. Markdown loading

Loading and processing a Markdown file.

from llama_index.core import SimpleDirectoryReader

# Load a Markdown file
documents = SimpleDirectoryReader(
    input_files=["./data/README.md"]
).load_data()

The file is loaded with its Markdown structure taken into account, so information such as headings remains usable.

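9. CSV loading

Loading CSV data.

from pathlib import Path
from llama_index.readers.file import CSVReader

# Load a CSV file, producing one Document per row
reader = CSVReader(concat_rows=False)
documents = reader.load_data(file=Path("./data/data.csv"))

With concat_rows=False each row becomes its own Document. With the default (concat_rows=True) all rows are joined into a single Document.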

10. JSON loading

Loading JSON data.

# Provided by the llama-index-readers-json package
from llama_index.readers.json import JSONReader

# Load a JSON file
reader = JSONReader()
documents = reader.load_data(input_file="./data/data.json")

Nested JSON structures are flattened into text that can be indexed and searched.

11. Web page loading

Fetching content directly from a URL.

from llama_index.readers.web import SimpleWebPageReader

# Load a web page
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://example.com/article"]
)

With html_to_text=True the HTML tags are stripped and only the text content is kept. This option relies on the html2text package.

12. Bulk directory loading

Load every file in a folder in one call.

from llama_index.core import SimpleDirectoryReader

# Load all files in the directory
documents = SimpleDirectoryReader(
    "./data",
    recursive=True  # include subdirectories
).load_data()

This is convenient for large document sets: the reader picks a parser for each file based on its extension.

13. GitHub repository loading

Loading code and documentation directly from GitHub.

from llama_index.readers.github import GithubRepositoryReader, GithubClient

# Load a GitHub repository
github_client = GithubClient(github_token="your-token")
documents = GithubRepositoryReader(
    github_client=github_client,
    owner="owner-name",
    repo="repo-name",
    # Only these extensions: (list of extensions, filter type)
    filter_file_extensions=(
        [".py", ".md"],
        GithubRepositoryReader.FilterType.INCLUDE,
    ),
).load_data(branch="main")

Useful when building a question-answering system over an entire codebase.

14. Google Docs loading

Fetching data directly from Google Docs.

from llama_index.readers.google import GoogleDocsReader

# Load Google Docs documents
documents = GoogleDocsReader().load_data(
    document_ids=["your-document-id"]
)

Google credentials must be configured first, but after that cloud-hosted documents can be pulled straight into the RAG system.

15. Notion loading

Loading content from Notion pages.

from llama_index.readers.notion import NotionPageReader

# Load Notion pages
documents = NotionPageReader(
    integration_token="your-notion-token"
).load_data(page_ids=["page-id-1", "page-id-2"])

If your team keeps its knowledge base in Notion, it can serve as a data source for the RAG system as is.

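Index creation

Once the data is loaded, build an index so it can be searched efficiently. Choose the index type that fits your use case.

16. Simple index

The most basic and easiest index to use.

from llama_index.core import VectorStoreIndex

# Create a simple index
index = VectorStoreIndex.from_documents(documents)

This single line embeds the documents and makes them searchable. It keeps vectors in memory by default, which suits small to medium datasets.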

17. Vector store index

Creating a vector index with explicit settings.

from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Configure the node parser
node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)

# Create the vector store index
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[node_parser],
)

Chunk size and overlap are set explicitly here, so you can tune them to the characteristics of your data.

18. Tree index

An index suited to documents with a hierarchical structure.

from llama_index.core import TreeIndex

# Create a tree index
index = TreeIndex.from_documents(documents)

Summaries are built up hierarchically, which makes this index good at giving an overview of large documents.

19. Keyword index

An index for keyword-based search.

from llama_index.core import SimpleKeywordTableIndex

# Create a keyword index
index = SimpleKeywordTableIndex.from_documents(documents)

Building and querying this index needs no embedding calls, so it is useful when exact keyword matches matter.

20. Knowledge graph index

A graph-based index that keeps the relationships between entities.

from llama_index.core import KnowledgeGraphIndex

# Create a knowledge graph index
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
)

It is strongest on relationship-oriented questions such as "How are A and B related?".

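Index persistence

Saving an index and reusing it avoids re-embedding the same documents, which saves both time and cost.

21. Saving an index

Persist the index to local disk.

# Save the index
index.storage_context.persist(persist_dir="./storage")

Once an index has been saved, later runs can skip the build step entirely.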

22. Loading an index

Load a previously saved index.

from llama_index.core import StorageContext, load_index_from_storage

# Load the saved index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

Running this at application startup makes the index queryable immediately.
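Saving and loading are usually combined into a "load if present, otherwise build" step at startup. The sketch below shows that control flow with the standard library only; `load_or_build` and its `load` / `build` callables are hypothetical names, standing in for code like snippets 22 and 21.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    persist_dir: str,
    load: Callable[[str], T],
    build: Callable[[str], T],
) -> T:
    """Reuse a persisted index when the directory has content, else build one.

    `load` and `build` are caller-supplied functions (assumptions for this
    sketch); `build` is expected to persist its result into `persist_dir`.
    """
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    return build(persist_dir)
```

Checking for a non-empty directory rather than mere existence avoids trying to load from a folder that was created but never written to.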

23. Cloud storage

Persist the index to cloud storage such as S3.

import s3fs
from llama_index.core import VectorStoreIndex

# Assumes AWS credentials are available to s3fs
# (environment variables, a profile, or an instance role)
s3 = s3fs.S3FileSystem()

# Build the index, then persist it under a bucket path
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(
    persist_dir="your-bucket/llama-index-storage",
    fs=s3,
)

To load it back, pass the same persist_dir and fs to StorageContext.from_defaults and call load_index_from_storage as in snippet 22. In production, storage that several servers can reach is usually the better choice.

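Querying

With the index ready, you can ask questions and get answers. Knowing the different query options lets you pick the right one for each use case.

24. Basic query

The simplest question-answering implementation.

# Create the query engine
query_engine = index.as_query_engine()

# Run a question
response = query_engine.query("What is LlamaIndex?")
print(response)

These three lines are a complete RAG pipeline: retrieve, build the prompt, generate.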

25. Streaming query

Stream the answer as it is generated.

# Enable streaming mode
query_engine = index.as_query_engine(streaming=True)

# Streaming response
response = query_engine.query("Explain how RAG works")

# Print tokens as they arrive
for token in response.response_gen:
    print(token, end="", flush=True)

Because text appears as soon as it is generated, the perceived wait is shorter, which improves the user experience.

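26. Metadata filter

Restrict the search to documents that match specific conditions.

from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

# Configure the metadata filter
filters = MetadataFilters(
    filters=[
        ExactMatchFilter(key="category", value="technology")
    ]
)

# Query with the filter applied
query_engine = index.as_query_engine(filters=filters)
response = query_engine.query("What are the latest technology trends?")

Useful when you want to search only one category within a large document set. The documents must carry the corresponding metadata key.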
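Conceptually, an exact-match filter keeps a node only when every condition matches its metadata. The sketch below illustrates that rule on plain dictionaries; it is a simplified model with hypothetical names (`matches_all`, `filter_nodes`), not the LlamaIndex implementation, and real vector stores apply the filter inside the store.

```python
from typing import Any


def matches_all(metadata: dict[str, Any], required: dict[str, Any]) -> bool:
    """True when every required key exists with exactly the required value."""
    return all(k in metadata and metadata[k] == v for k, v in required.items())


def filter_nodes(nodes: list[dict], required: dict[str, Any]) -> list[dict]:
    """Keep only nodes whose "metadata" dict satisfies every condition."""
    return [n for n in nodes if matches_all(n["metadata"], required)]
```

A node without the key is excluded, which is why documents lacking a `category` value never appear in a filtered search.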

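27. Top-K search

Control how many relevant chunks are retrieved.

# Set Top-K (retrieve only the top 3 results)
query_engine = index.as_query_engine(
    similarity_top_k=3
)

response = query_engine.query("What are the benefits of RAG?")

A smaller K gives a shorter, faster prompt; a larger K gives the LLM more context. Tune it to balance answer quality against latency.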
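To show what "top K by similarity" means, here is a small pure-Python sketch that ranks stored vectors by cosine similarity to a query vector. The function names and the toy two-dimensional vectors are assumptions for illustration; a real index uses embedding vectors with hundreds of dimensions and an optimized search.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    doc_vecs: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs with the highest similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Everything below rank K is dropped regardless of its score, which is why Top-K is often paired with a similarity threshold.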

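28. Similarity threshold

Use only chunks whose similarity score is above a threshold.

from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever

# Retrieve 10 candidates first
retriever = VectorIndexRetriever(index=index, similarity_top_k=10)

# The cutoff is applied by a node postprocessor, not by the retriever
query_engine = RetrieverQueryEngine.from_args(
    retriever,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
response = query_engine.query("How should the data be preprocessed?")

Dropping low-relevance chunks keeps unrelated text out of the prompt, which tends to produce more accurate answers.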
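The effect of a cutoff is easy to see on a list of already scored results. The following sketch, with the hypothetical helper `apply_cutoff`, keeps only results at or above the threshold; whether a library treats the boundary as inclusive is an assumption here.

```python
def apply_cutoff(
    scored: list[tuple[str, float]],
    cutoff: float = 0.7,
) -> list[tuple[str, float]]:
    """Keep (doc_id, score) pairs whose score is at least `cutoff`.

    The result may be empty; callers should handle the "nothing relevant
    was found" case explicitly instead of sending an empty context.
    """
    return [(doc_id, score) for doc_id, score in scored if score >= cutoff]
```

Unlike Top-K, a cutoff can return zero results, so a strict threshold should come with a fallback message for the user.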

29. Hybrid search

Combine keyword search with vector search.

from llama_index.core import QueryBundle
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import (
    BaseRetriever,
    KeywordTableSimpleRetriever,
    VectorIndexRetriever,
)

# Assumes vector_index (snippet 16) and keyword_index (snippet 19) were
# built from the same nodes, so that node IDs match across both indexes
vector_retriever = VectorIndexRetriever(index=vector_index)
keyword_retriever = KeywordTableSimpleRetriever(index=keyword_index)

# Hybrid search implemented as a custom retriever
class HybridRetriever(BaseRetriever):
    def _retrieve(self, query_bundle: QueryBundle):
        vector_nodes = vector_retriever.retrieve(query_bundle)
        keyword_nodes = keyword_retriever.retrieve(query_bundle)

        # Merge by node ID to remove duplicates; vector results are
        # written last so their similarity scores are the ones kept
        all_nodes = {n.node.node_id: n for n in keyword_nodes}
        all_nodes.update({n.node.node_id: n for n in vector_nodes})

        return list(all_nodes.values())

query_engine = RetrieverQueryEngine(retriever=HybridRetriever())

This returns the union of both result sets, so a chunk is found if either the keyword match or the embedding similarity picks it up.
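A plain union does not say how to order the merged results. One common option, which is a different technique from the union used in the snippet, is reciprocal rank fusion (RRF): each document earns 1 / (k + rank) from every ranking it appears in. The sketch below works on lists of document IDs; the function name and the constant k = 60 are conventional choices, not values taken from this article.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists into one list ordered by RRF score.

    A document's score is the sum of 1 / (k + rank) over every ranking
    that contains it, with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

Documents found by both retrievers accumulate two contributions and rise to the top, without needing the two score scales to be comparable.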

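Customization

The defaults work well enough to get started, but tuning them to your data and requirements can improve results considerably.

30. Custom prompt

Change the prompt template used when generating the answer.

from llama_index.core import PromptTemplate

# Custom prompt template
qa_prompt_tmpl = (
    "Answer the question using the information below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
    "Answer: "
)

qa_prompt = PromptTemplate(qa_prompt_tmpl)

# Apply the prompt
query_engine = index.as_query_engine(
    text_qa_template=qa_prompt
)

Customizing the prompt is the most direct way to control the tone and style of the answers. The {context_str} and {query_str} placeholders are filled with the retrieved chunks and the user's question.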
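To see what the LLM actually receives, it helps to render the template by hand. This sketch fills the two placeholders with plain `str.format`; `render_prompt` is a hypothetical helper, and joining chunks with blank lines is an assumption about layout rather than a description of LlamaIndex internals.

```python
QA_TEMPLATE = (
    "Answer the question using the information below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
    "Answer: "
)


def render_prompt(template: str, chunks: list[str], question: str) -> str:
    """Fill {context_str} with the joined chunks and {query_str} with the question."""
    context = "\n\n".join(chunks)
    return template.format(context_str=context, query_str=question)
```

Printing the rendered prompt for a sample question is a quick way to check wording and length before spending tokens on real queries.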

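31. Chunk size

Adjust the size used when splitting text.

from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Set the chunk size globally
Settings.node_parser = SentenceSplitter(
    chunk_size=1024,  # tokens per chunk
    chunk_overlap=200,  # tokens shared with the neighboring chunk
)

# Applied when the index is created
index = VectorStoreIndex.from_documents(documents)

Chunk size has a direct effect on retrieval quality: small chunks match precisely but carry little context, while large chunks carry more context but dilute the match.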

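32. Chunk overlap

Control the overlap between chunks in detail.

from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Detailed overlap settings
node_parser = SentenceSplitter(
    chunk_size=512,
    chunk_overlap=128,  # 25% overlap
    paragraph_separator="\n\n",  # prefer splitting at paragraph breaks
)

index = VectorStoreIndex.from_documents(
    documents,
    transformations=[node_parser],
)

An appropriate overlap keeps a sentence that straddles a chunk boundary from losing its context.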
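The relationship between chunk size and overlap can be illustrated with a simple sliding window: each new chunk starts `chunk_size - chunk_overlap` tokens after the previous one. The sketch below operates on a pre-tokenized list and is a simplified model; `SentenceSplitter` additionally respects sentence and paragraph boundaries, which this code does not attempt.

```python
def split_with_overlap(
    tokens: list[str],
    chunk_size: int,
    chunk_overlap: int,
) -> list[list[str]]:
    """Split tokens into windows of `chunk_size` sharing `chunk_overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks
```

With chunk_size=512 and chunk_overlap=128 the window advances 384 tokens at a time, so a larger overlap means more chunks and more embedding calls for the same text.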

33. Custom embeddings

Use your own embedding model.

from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use a multilingual model that also covers Japanese
Settings.embed_model = HuggingFaceEmbedding(
    model_name="intfloat/multilingual-e5-large"
)

# Create the index
index = VectorStoreIndex.from_documents(documents)

For Japanese text or domain-specific data, an embedding model trained on that kind of content is often more effective than the default.

34. Custom retriever

Implement your own retrieval logic.

from llama_index.core import QueryBundle
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore

class CustomRetriever(BaseRetriever):
    """Retriever that rescores the results of a standard vector retriever."""

    def __init__(self, index, top_k: int = 5):
        super().__init__()  # sets up state that BaseRetriever.retrieve relies on
        self._base = index.as_retriever(similarity_top_k=top_k)

    def _retrieve(self, query_bundle: QueryBundle):
        # Fetch candidates with the standard retriever
        candidates = self._base.retrieve(query_bundle)

        # Recompute each score with custom logic
        rescored = [
            NodeWithScore(
                node=c.node,
                score=self._custom_scoring(c, query_bundle),
            )
            for c in candidates
        ]
        return sorted(rescored, key=lambda x: x.score, reverse=True)

    def _custom_scoring(self, candidate, query_bundle):
        # Put your own scoring logic here; this placeholder keeps the score
        return (candidate.score or 0.0) * 1.0

# Use the custom retriever
retriever = CustomRetriever(index)
query_engine = RetrieverQueryEngine(retriever=retriever)

This gives you a place to plug in ranking rules that reflect your business logic.

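Advanced features

This section goes beyond basic RAG and covers features for building more sophisticated applications.

35. Chat engine

Build an interactive chat interface.

# Create the chat engine
chat_engine = index.as_chat_engine()

# Run a conversation
response = chat_engine.chat("Hello")
print(response)

response = chat_engine.chat("Tell me about RAG")
print(response)

Unlike one-off question answering, the chat engine carries context from one turn to the next.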

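36. Context chat

A chat engine whose conversation history is managed explicitly.

from llama_index.core.memory import ChatMemoryBuffer

# Chat engine with a memory buffer
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are an assistant for technical documentation. "
        "Always give polite and accurate answers."
    ),
)

# Conversation
response1 = chat_engine.chat("What is LlamaIndex?")
response2 = chat_engine.chat("What are its main features?")  # resolved from context

History up to token_limit is kept, so follow-up questions such as "its" are understood; older turns are dropped once the limit is reached.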

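37. Agent creation

Build an autonomous agent that can use tools.

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Wrap the query engine as a tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="technical_docs",
            description="Tool for searching the technical documentation",
        ),
    ),
]

# Create the agent
agent = ReActAgent.from_tools(query_engine_tools, verbose=True)

# Run the agent
response = agent.chat("Look up how RAG works in the technical docs and explain it")

The agent chooses which tool to call and can chain several steps to complete a task. The agent API has changed between LlamaIndex releases, so check that from_tools and chat are available in the version you have installed.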

38. Multi-document search

Search across several indexes at once.

from llama_index.core import VectorStoreIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

# Create one index per document set
# (tech_docs and business_docs are document lists loaded beforehand)
tech_index = VectorStoreIndex.from_documents(tech_docs)
business_index = VectorStoreIndex.from_documents(business_docs)

# Wrap each index as a tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=tech_index.as_query_engine(),
        metadata=ToolMetadata(
            name="technical",
            description="Technical documents",
        ),
    ),
    QueryEngineTool(
        query_engine=business_index.as_query_engine(),
        metadata=ToolMetadata(
            name="business",
            description="Business documents",
        ),
    ),
]

# Combine them with a sub-question engine
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools
)

response = query_engine.query(
    "Explain RAG from both the technical implementation and the business impact"
)

Different kinds of documents are searched together and the partial answers are combined into one comprehensive response.

39. Sub-question generation

Decompose a complex question automatically.

from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Configure the sub-question engine
query_engine_tools = [
    QueryEngineTool(
        query_engine=index.as_query_engine(),
        metadata=ToolMetadata(
            name="knowledge_base",
            description="Technical knowledge base",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    verbose=True,  # print the generated sub-questions
)

# A complex question
response = query_engine.query(
    "Compare the advantages and disadvantages of RAG "
    "and explain when it should be used"
)

The question is split into sub-questions, each is answered separately, and the answers are then merged.

Evaluation

Measuring the quality of a RAG system quantitatively is essential for improving it over time.

40. Response evaluation

Evaluate the quality of a generated answer.

from llama_index.core.evaluation import FaithfulnessEvaluator
from llama_index.llms.openai import OpenAI

# LLM used as the judge
llm = OpenAI(model="gpt-4", temperature=0)

# Create the faithfulness evaluator
evaluator = FaithfulnessEvaluator(llm=llm)

# Run a query
query_engine = index.as_query_engine()
response = query_engine.query("What is RAG?")

# Run the evaluation
eval_result = evaluator.evaluate_response(response=response)
print(f"Score: {eval_result.score}")
print(f"Feedback: {eval_result.feedback}")

Automated evaluation gives you an objective signal about where the system needs improvement.

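41. Faithfulness evaluation

Check whether answers are faithful to the retrieved sources.

from llama_index.core.evaluation import FaithfulnessEvaluator

# Faithfulness evaluator (uses Settings.llm as the judge)
evaluator = FaithfulnessEvaluator()

# Evaluate several questions
# (query_engine is the engine created in snippet 24)
queries = [
    "What are the main features of LlamaIndex?",
    "What are the benefits of RAG?",
    "What index types are available?",
]

for query in queries:
    response = query_engine.query(query)
    eval_result = evaluator.evaluate_response(response=response)

    print(f"Question: {query}")
    print(f"Faithful: {eval_result.passing}")
    print("---")

This is effective for detecting hallucinations, that is, statements not supported by the retrieved context.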

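42. Relevancy evaluation

Evaluate whether the answer and the retrieved chunks are relevant to the question.

from llama_index.core.evaluation import RelevancyEvaluator

# Create the relevancy evaluator
evaluator = RelevancyEvaluator()

# Run the evaluation
query = "How do I build a RAG system?"
response = query_engine.query(query)

eval_result = evaluator.evaluate_response(
    query=query,
    response=response
)

print(f"Relevancy score: {eval_result.score}")

Use it as a metric when tuning retrieval settings such as Top-K and chunk size.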

Optimization

Techniques for improving performance and cost in production.

43. Caching

Cache responses to speed up repeated questions.

from functools import lru_cache

# In-process cache keyed on the exact query string
@lru_cache(maxsize=100)
def cached_query(query_text):
    query_engine = index.as_query_engine()
    return query_engine.query(query_text)

# The first call hits the LLM; repeated identical calls are served from memory
response = cached_query("What is RAG?")

A repeated identical question is answered without another LLM call, which cuts both latency and cost. The cache matches only on the exact string and is lost when the process exits.
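Because an exact-string cache misses on trivial differences such as extra spaces or capitalization, it can help to normalize the question before looking it up. The sketch below uses a plain dictionary rather than lru_cache so the key can differ from the text passed on; `make_cached_answerer` and `answer_fn` are hypothetical names, with `answer_fn` standing in for a call like `str(query_engine.query(q))`.

```python
from typing import Callable


def make_cached_answerer(answer_fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap `answer_fn` with a cache keyed on a normalized form of the question.

    Note: this simple cache is unbounded; add an eviction policy for
    long-running services.
    """
    cache: dict[str, str] = {}

    def answer(question: str) -> str:
        # Collapse whitespace and ignore case when building the cache key
        key = " ".join(question.split()).casefold()
        if key not in cache:
            cache[key] = answer_fn(question)  # original text goes to the LLM
        return cache[key]

    return answer
```

Normalization trades a small risk of conflating questions that differ only in case for a noticeably higher hit rate on user-typed input.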

44. Batch processing

Process several queries efficiently.

import asyncio

# Asynchronous batch query
async def batch_query(queries):
    query_engine = index.as_query_engine()

    # Start all queries concurrently
    tasks = [
        query_engine.aquery(query)
        for query in queries
    ]

    responses = await asyncio.gather(*tasks)
    return responses

# Run the batch
queries = [
    "What is RAG?",
    "What are the features of LlamaIndex?",
    "What index types are available?",
]

responses = asyncio.run(batch_query(queries))

for query, response in zip(queries, responses):
    print(f"Q: {query}")
    print(f"A: {response}\n")

The queries wait on the LLM concurrently instead of one after another, so total time for a large batch drops substantially.
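Launching every query at once can run into provider rate limits. A common refinement is to cap the number of in-flight requests with a semaphore. The sketch below is library-independent: `run_limited` is a hypothetical helper, and `fake_ask` is a stand-in for an async call such as `query_engine.aquery`.

```python
import asyncio
from typing import Awaitable, Callable


async def run_limited(
    queries: list[str],
    ask: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> list[str]:
    """Run `ask` for every query with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(query: str) -> str:
        async with semaphore:  # wait here while the limit is reached
            return await ask(query)

    # gather returns results in the same order as the input queries
    return list(await asyncio.gather(*(guarded(q) for q in queries)))


async def fake_ask(query: str) -> str:
    """Stand-in for a real async query call (assumption for this sketch)."""
    await asyncio.sleep(0.01)
    return f"answer: {query}"
```

For example, `asyncio.run(run_limited(queries, fake_ask, max_concurrency=2))` processes the list two at a time while preserving the input order.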

45. Parallel processing

Parallelize index construction.

from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

# The guard is needed because worker processes re-import this module
if __name__ == "__main__":
    # Parse documents into nodes across several worker processes
    pipeline = IngestionPipeline(
        transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=20)]
    )
    nodes = pipeline.run(documents=documents, num_workers=4)

    # Build a single index from the parsed nodes
    index = VectorStoreIndex(nodes)
    print(f"Indexed {len(nodes)} nodes")

Parsing runs in worker processes and all nodes end up in one index, so large document sets can be indexed efficiently without a separate merge step.

Vector DB

An external vector database improves scalability and search performance once the data outgrows the in-memory default.

46. Pinecone integration

Use Pinecone as the backend.

from pinecone import Pinecone
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core import VectorStoreIndex, StorageContext

# Initialize the Pinecone client
pc = Pinecone(api_key="your-api-key")

# Connect to an existing Pinecone index
pinecone_index = pc.Index("your-index-name")

# Configure the vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create the LlamaIndex index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

Pinecone is a managed service designed for low-latency search over very large vector collections.

47. ChromaDB integration

A lightweight vector database that runs locally.

import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import VectorStoreIndex, StorageContext

# Create the Chroma client
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Get or create the collection
chroma_collection = chroma_client.get_or_create_collection("my_collection")

# Configure the vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create the index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

Setup is simple, which makes it a good fit for development environments and small projects.

48. Weaviate integration

A vector database with GraphQL support.

import weaviate
from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.core import VectorStoreIndex, StorageContext

# Connect to a local Weaviate instance (weaviate-client v4 API,
# defaults to localhost:8080)
client = weaviate.connect_to_local()

# Configure the vector store
vector_store = WeaviateVectorStore(
    weaviate_client=client,
    index_name="Documents",
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create the index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

# Call client.close() once you are finished with the index

Weaviate is strong at complex filtering and hybrid search.

Debugging

Tools that help during development and troubleshooting.

49. Callback configuration

Visualize the processing steps in detail.

from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.core import Settings

# Configure the debug handler
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

Settings.callback_manager = callback_manager

# Run a query (a detailed trace is printed)
query_engine = index.as_query_engine()
response = query_engine.query("How does RAG work?")

# Inspect the recorded events
print("\n--- Events ---")
for event in llama_debug.get_events():
    print(f"{event.event_type}: {event.payload}")

The trace shows which components ran and in what order, which makes it easier to locate a problem.

50. Cost tracking

Monitor API usage and cost.

from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core import Settings
import tiktoken

# Configure the token counting handler
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)

callback_manager = CallbackManager([token_counter])
Settings.callback_manager = callback_manager

# Run a query
query_engine = index.as_query_engine()
response = query_engine.query("What is LlamaIndex?")

# Show token counts
print(f"Prompt tokens: {token_counter.prompt_llm_token_count}")
print(f"Completion tokens: {token_counter.completion_llm_token_count}")
print(f"Total tokens: {token_counter.total_llm_token_count}")

# Cost estimate (example per-1K-token rates for gpt-3.5-turbo;
# replace them with your provider's current pricing)
prompt_cost = token_counter.prompt_llm_token_count * 0.0015 / 1000
completion_cost = token_counter.completion_llm_token_count * 0.002 / 1000
total_cost = prompt_cost + completion_cost

print(f"Estimated cost: ${total_cost:.4f}")

Estimating cost before going to production lets you adjust settings such as Top-K and chunk size to stay within budget.
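The per-query arithmetic above extends naturally to a monthly projection. The sketch below keeps the rates as parameters instead of constants, since prices change; the function names and the traffic figures in the usage note are illustrative assumptions.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_rate_per_1k: float,
    completion_rate_per_1k: float,
) -> float:
    """Cost of one request given token counts and per-1K-token rates."""
    return (
        prompt_tokens / 1000 * prompt_rate_per_1k
        + completion_tokens / 1000 * completion_rate_per_1k
    )


def project_monthly(cost_per_query: float, queries_per_day: int, days: int = 30) -> float:
    """Scale a per-query cost to a monthly figure."""
    return cost_per_query * queries_per_day * days
```

With the example rates from the snippet, a query using 1,000 prompt tokens and 500 completion tokens costs about $0.0025, which at 1,000 queries per day comes to roughly $75 over 30 days.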

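Summary

This article presented 50 snippets for building a RAG system with LlamaIndex. Together they cover a wide range of use cases:

Basic features

Setup and configuration
Loading from a variety of data sources
Creating and persisting the different index types
Flexible querying

Advanced features

Interactive chat engines
Multi-document search
Agent systems

Operations and optimization

Performance optimization
Vector database integration
Evaluation and debugging

By combining these snippets, you can start a RAG system with just a few lines of code. Begin with the basic snippets and add the advanced features step by step.

Each snippet is written to be used on its own given the objects it refers to (such as documents or index from earlier snippets), so pick only what you need. Adapt them to your project's requirements to build the RAG system that fits it best.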