Mastering Claude Code
Chapter 11: Hybrid Search and Semantic Storage: Building Long-Term Memory
Kim Kyung-jin
Introduction
I handed a 68-page vacuum cleaner manual PDF to Claude Code and asked, "How do I clean the filter?" The agent performed a search, then showed me step-by-step instructions in text. Below that appeared a parts diagram image from the manual. The agent made its own judgment that for physical devices, pictures could be far clearer than words.
The 68-page document could not fit entirely within the agent's context window, the amount of text it can read at once. So how did it find the exact information on the exact page? The answer lies in Retrieval-Augmented Generation.
What is Retrieval-Augmented Generation?
AI agents face a fundamental constraint: they cannot know information absent from their training data. A company's internal manuals, meeting notes from last week, field photos taken this morning: none of this data was part of the model's training. You could try to load every document into a single context window, but past a few dozen pages you hit the limit on how many tokens the model can process.
Retrieval-Augmented Generation is an architecture that sidesteps this problem. It works in three stages.
Retrieval: The agent searches an external data repository for information relevant to the user's question. Rather than reading the entire document, it extracts only the pieces that match the query.
Augmentation: The retrieved fragments are attached to the agent's prompt. The agent can now draw on information it originally did not possess, as if it had just read it.
Generation: The agent composes an answer based on the enriched context. The agent's reasoning fuses with external data to produce a response neither could deliver alone.
[Figure 11-1: Three stages of Retrieval-Augmented Generation: Retrieval → Augmentation → Generation flowchart]
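The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Claude Code implements them: retrieval here is plain word overlap rather than embeddings, the manual fragments are invented sample text, and `echo_model` is a hypothetical stand-in for a real chat model.

```python
import re
from typing import Callable, List


def tokens(text: str) -> set:
    """Lowercase the text and keep only alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, fragments: List[str], top_k: int = 2) -> List[str]:
    """Retrieval: rank fragments by how many words they share with the question."""
    q_words = tokens(question)
    scored = [(len(q_words & tokens(f)), f) for f in fragments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for score, f in scored[:top_k] if score > 0]


def augment(question: str, retrieved: List[str]) -> str:
    """Augmentation: attach the retrieved fragments to the prompt."""
    context = "\n".join(f"- {f}" for f in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


def generate(prompt: str, model: Callable[[str], str]) -> str:
    """Generation: pass the enriched prompt to whichever model is plugged in."""
    return model(prompt)


def echo_model(prompt: str) -> str:
    # Hypothetical stand-in for a real chat model call.
    used = sum(line.startswith("- ") for line in prompt.splitlines())
    return f"(answer based on {used} fragment(s))"


# Invented sample fragments, standing in for pieces of a manual.
fragments = [
    "To clean the filter, twist open the dust bin and rinse the filter under cold water.",
    "The battery charges fully in four hours.",
    "Replace the filter every six months.",
]
question = "How do I clean the filter?"
retrieved = retrieve(question, fragments)
prompt = augment(question, retrieved)
answer = generate(prompt, echo_model)
print(prompt)
print(answer)
```

Keeping the stages as separate functions makes it possible to swap the word-overlap retriever for a vector search later without touching augmentation or generation.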
Think of it as an open-book exam. The student (the agent) need not memorize every textbook. During the test, the student can flip to the relevant page. Yet the student must master two things: the ability to quickly decide which page to open, and the ability to combine that page's information with existing knowledge to construct an answer. These skills belong to the student.
In Retrieval-Augmented Generation, the technology that decides "which page to open" is embedding.
The Concept of Embedding
Embedding is the process of converting text into numerical vectors. The word "vector" may sound mathematical, but the core idea is simple: representing a sentence's meaning as a list of numbers.
The sentence "I want to drink a cup of coffee" and the sentence "I want to order a cappuccino" use different words, yet their meanings are similar, so an embedding model converts both into vectors that lie close together. By contrast, "The stock market fell today" becomes a very different vector, because its meaning differs.
When these vectors are arranged in multidimensional space, semantically similar sentences cluster close together, while semantically different ones drift far apart. This is how semantic similarity-based search works. When a user asks "filter cleaning method," the system converts the question into a vector, then finds the closest vectors among pre-stored document fragment vectors.
Keywords need not match exactly. If the meaning aligns, the search returns it. A document labeled "dust cover washing procedure" can surface as an answer to "filter cleaning method."
[Figure 11-2: Visualization of similar sentences forming clusters in embedding space]
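One common way to measure how "close" two vectors are is cosine similarity; the text does not say which metric the system uses, so treat that choice as an assumption. The three-dimensional vectors below are hand-made toy values, not the output of any real embedding model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration only.
coffee = [0.9, 0.1, 0.0]       # "I want to drink a cup of coffee"
cappuccino = [0.8, 0.2, 0.1]   # "I want to order a cappuccino"
stocks = [0.0, 0.1, 0.9]       # "The stock market fell today"

similar = cosine_similarity(coffee, cappuccino)
different = cosine_similarity(coffee, stocks)
print(f"coffee vs cappuccino: {similar:.3f}")
print(f"coffee vs stocks:     {different:.3f}")
```

The two drink sentences score close to 1, while the stock-market sentence scores close to 0, which is the numerical counterpart of "clustering close together" and "drifting far apart."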
Summarizing embedding's role in Retrieval-Augmented Generation: Divide documents into small fragments, called chunks. Pass each chunk through an embedding model to convert it into a vector. Store the resulting vectors in a database. When a question arrives, convert it to a vector as well. Find the stored vectors closest to the question vector and deliver them to the agent.
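The flow just described can be sketched end to end with the standard library. Everything here is an assumption made for illustration: `toy_embed` hashes words into a small vector and captures word overlap only, not meaning, so it merely stands in for a real embedding model, and the "database" is a Python list.

```python
import math
import re
import zlib
from typing import List, Tuple

DIM = 64  # toy vector size chosen for illustration


def chunk_text(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text: str) -> List[float]:
    """Stand-in embedding: hash each word into a bucket, then normalize to length 1."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(text: str) -> List[Tuple[str, List[float]]]:
    """Chunk the document and store each chunk next to its vector."""
    return [(chunk, toy_embed(chunk)) for chunk in chunk_text(text)]


def search(index: List[Tuple[str, List[float]]], question: str,
           top_k: int = 1) -> List[Tuple[float, str]]:
    """Embed the question and return the top_k (score, chunk) pairs."""
    q = toy_embed(question)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Invented sample text standing in for a page of a manual.
manual_text = ("Twist the dust bin to release it. Rinse the filter under cold water. "
               "Let it dry for a full day before use.")
index = build_index(manual_text)
for score, chunk in search(index, "rinse the filter", top_k=2):
    print(f"{score:.2f}  {chunk}")
```

The overlap between chunks is a design choice: it reduces the chance that a sentence split across a chunk boundary becomes unfindable.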
Using Google Gemini Multimodal Embedding
So far, we have discussed text. But real-world data is not text alone. Manuals contain assembly diagrams. Field reports include photographs. Educational materials embed video.
Google's Gemini Embedding 2 is a multimodal embedding model that places text, images, video, and audio into the same vector space.
Let us examine the changes this model brings through concrete examples.
Vacuum cleaner manual example: Embed the entire 68-page PDF. Not only text fragments but diagram images are converted to vectors. When asked "filter cleaning method," the system returns the text explanation alongside the corresponding diagram image. You can now see component locations in the image that text alone would struggle to convey.
Roof repair company example: Embed thirteen past project photographs. Each photo carries metadata: cost, duration, and workforce size. Upload a new roof photo, and the system returns five similar past projects with similarity scores. These serve as references for preparing estimates.
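The roof example amounts to a top-k similarity search in which each stored vector carries its metadata along with it. The sketch below shows that shape only; the record type, the two-dimensional vectors, and the metadata values are all invented for illustration.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class ProjectRecord:
    photo_id: str
    vector: List[float]
    metadata: Dict[str, object] = field(default_factory=dict)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def most_similar(query: Sequence[float], records: List[ProjectRecord],
                 top_k: int = 5) -> List[Tuple[float, ProjectRecord]]:
    """Return up to top_k (score, record) pairs, best match first."""
    scored = ((cosine(query, r.vector), r) for r in records)
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


# Invented records; real vectors would come from the embedding model.
records = [
    ProjectRecord("roof-01", [1.0, 0.0], {"duration_days": 3, "crew": 4}),
    ProjectRecord("roof-02", [0.9, 0.1], {"duration_days": 5, "crew": 6}),
    ProjectRecord("roof-03", [0.0, 1.0], {"duration_days": 2, "crew": 2}),
]
matches = most_similar([1.0, 0.0], records, top_k=2)
for score, record in matches:
    print(f"{record.photo_id}  score={score:.3f}  {record.metadata}")
```

Because the metadata travels with each match, the caller can read cost or duration straight off the results when drafting an estimate.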
[Figure 11-3: Multimodal embedding space: 2D visualization of text, images, and video positioned by meaning]
The power of multimodal embedding lies in different data types coexisting in the same space. A photograph of smiley-face fries lands in the "food" category, a video of a dog playing guitar in the "entertainment" category, a Claude Code tutorial in the "technology" category. Though the data types differ, the model grasps the meaning and places each item in the right location. If all the data were roof photographs, the system would instead group them into subcategories such as flood damage, age deterioration, and structural defects.
At the time of writing, video input supports MP4 or MOV files up to 120 seconds long, and image input accepts up to 6 PNG or JPEG files per request. Audio is also supported; providing accurate descriptive metadata alongside audio improves search accuracy.
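Limits like these are easy to check before sending anything to the model. The following sketch encodes only the numbers stated above; the function names are hypothetical, and the video duration is assumed to be known already, since measuring it requires a media tool outside the standard library.

```python
from pathlib import Path
from typing import Iterable, List

MAX_VIDEO_SECONDS = 120          # limit stated in the text
MAX_IMAGES_PER_REQUEST = 6       # limit stated in the text
VIDEO_EXTENSIONS = {".mp4", ".mov"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def video_is_acceptable(path: str, duration_seconds: float) -> bool:
    """True if the file type and length fall within the stated video limits."""
    return (Path(path).suffix.lower() in VIDEO_EXTENSIONS
            and duration_seconds <= MAX_VIDEO_SECONDS)


def batch_images(paths: Iterable[str]) -> List[List[str]]:
    """Keep PNG/JPEG files and group them into requests of at most six images."""
    images = [p for p in paths if Path(p).suffix.lower() in IMAGE_EXTENSIONS]
    return [images[i:i + MAX_IMAGES_PER_REQUEST]
            for i in range(0, len(images), MAX_IMAGES_PER_REQUEST)]


# Thirteen hypothetical photo filenames, as in the roof repair example.
photos = [f"roof_{n:02d}.jpg" for n in range(1, 14)]
batches = batch_images(photos)
print([len(batch) for batch in batches])
```

With thirteen photos, the images go out in three requests rather than one.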
Hands-On Practice: Connecting Pinecone Semantic Data Storage
Embedded vectors must be stored and searchable somewhere. That repository is a vector database, and Pinecone ranks among the most widely used semantic data storage services.
The overall flow of this exercise is as follows.
Step 1: Environment Setup
In VS Code, create a new folder and open Claude Code. Switch to Plan Mode and give the agent this instruction.
The agent designs the project structure, catalogs dependencies, and presents a step-by-step plan.
Three API keys are needed. The Pinecone key grants access to the vector store; the Gemini key is used to call the embedding model; the OpenRouter key gives access to the chat model that generates answers. Pinecone offers a free starter plan, Gemini API keys come from Google AI Studio, and OpenRouter provides access to multiple models through a single API endpoint.
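Keys like these are usually kept in a .env file, and a missing one tends to surface only as a confusing error later. The sketch below parses .env-style text and reports which keys are absent. The three variable names are assumptions; use whatever names the agent's generated code actually reads.

```python
from typing import Dict, List

# Assumed variable names, not confirmed by the text.
REQUIRED_KEYS = ["PINECONE_API_KEY", "GEMINI_API_KEY", "OPENROUTER_API_KEY"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Placeholder values only; never commit real keys.
sample = 'PINECONE_API_KEY=abc\n# embedding model\nGEMINI_API_KEY="xyz"\n'
env = parse_env(sample)
print("Missing:", missing_keys(env))
```

In a real project, the text would come from reading the .env file itself, for example with `Path(".env").read_text()`.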
Step 2: Data Embedding
Enter the three keys into your .env file and save. Place the files you want to embed into a data folder. You can mix text files, images, and video without any problem. Tell the agent, "The data is ready, put it in Pinecone." The agent creates a Pinecone index and embeds each file for storage.
During this process, the agent recognizes each file's type and applies the appropriate embedding method. Text is split into chunks and embedded. Images are converted to vectors capturing visual meaning. Videos are analyzed for frames and audio, then converted to vectors.
[Figure 11-4: Exercise pipeline: Original files → Embedding model → Pinecone index]
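The type-recognition step can be pictured as a routing table from file extension to embedding method. This is only a sketch of the idea: the extension list and route names are assumptions, the code the agent generates may detect types differently, and PDF handling is left out because the text does not describe it.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed mapping from extension to embedding route.
ROUTES = {
    ".txt": "text", ".md": "text",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".mp4": "video", ".mov": "video",
    ".mp3": "audio", ".wav": "audio",
}


def plan_embedding(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group files by the embedding route their extension maps to."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        route = ROUTES.get(Path(path).suffix.lower(), "unsupported")
        plan[route].append(path)
    return dict(plan)


# Hypothetical contents of the data folder.
plan = plan_embedding(["notes.txt", "diagram.PNG", "demo.mp4", "archive.zip"])
for route, files in plan.items():
    print(f"{route}: {files}")
```

Collecting unsupported files in their own group, rather than silently dropping them, makes it obvious when something in the data folder was never embedded.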
Step 3: Building a Chat Interface
Ask the agent, "Build me a chat web app I can test on my local machine." The agent constructs the web application and runs it on localhost. Type a question in the browser; Pinecone searches for relevant vectors, and the agent generates an answer based on what it finds.
In testing, when asked 'How should we procure a workflow client?', the app finds the relevant content in a text file and answers. When asked 'Show me a video of a golden retriever playing guitar', it locates that video's metadata and plays the video inline.
Step 4: Iterative Improvement
The first result may not be perfect. Images might not be returned, or video descriptions could be incomplete. When you describe the problem to the agent, it enriches the metadata or fixes the app. Request 'Add better descriptions to the videos and re-embed them', and it deletes the existing vectors, then saves them anew with improved metadata.
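The delete-then-save cycle looks roughly like the sketch below. `InMemoryVectorStore` is a hypothetical stand-in written for this example, not the Pinecone client, and `fake_embed` is a placeholder for the real embedding call.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[List[float], Dict[str, str]]


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database, keyed by record ID."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def upsert(self, record_id: str, vector: List[float], metadata: Dict[str, str]) -> None:
        self._records[record_id] = (list(vector), dict(metadata))

    def delete(self, record_id: str) -> None:
        self._records.pop(record_id, None)  # deleting a missing ID is a no-op

    def get(self, record_id: str) -> Optional[Record]:
        return self._records.get(record_id)


def reembed(store: InMemoryVectorStore, record_id: str, description: str,
            embed: Callable[[str], List[float]]) -> None:
    """Remove the old vector, then store a new one built from the better description."""
    store.delete(record_id)
    store.upsert(record_id, embed(description), {"description": description})


def fake_embed(text: str) -> List[float]:
    # Placeholder: a real embedding model would return a semantic vector.
    return [float(len(text))]


store = InMemoryVectorStore()
store.upsert("video-1", fake_embed("video"), {"description": "video"})
reembed(store, "video-1", "A golden retriever playing guitar", fake_embed)
print(store.get("video-1"))
```

Reusing the same record ID keeps the index free of stale duplicates, so a search never returns both the old and the improved description.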
This entire process fits within 30 minutes. Building the same multimodal vector repository in a no-code tool like n8n would take hours to days, because there you must manually configure the chunking strategy, the image capture and storage methods, and the search result formatting. Claude Code handles all of this with natural language instructions alone.
How Hybrid Search and Semantic Storage Transform Agent Workflows
An agent without hybrid search and semantic storage answers only within its training scope. It references only information that fits in its reading range at any moment. The world beyond that might as well not exist.
An agent equipped with hybrid search and semantic storage is different. It can read a company's 68-page manual. It can search hundreds of construction photographs. It can find the context of a specific decision in last quarter's meeting minutes. The agent's knowledge expands beyond its training data to include all data the organization holds.
Hybrid search and semantic storage serve multiple scenarios in agent workflows.
Customer support automation: Embed product manuals and FAQs, then generate answers that reference exact pages and images for customer questions. You can also search past ticket records to answer 'Have we received similar inquiries before?'
Internal knowledge management: Embed the team's project documents, decision logs, and brand guidelines. When a new team member asks 'What are our company's logo usage rules?', the agent finds the relevant section in the brand guidelines and answers.
Research assistance: Embed academic papers, reports, and market research materials. When asked 'What are recent trends in this field?', the agent searches the relevant materials and generates a summary. It provides original sources and confidence scores, making fact-checking possible.
[Figure 11-5: Comparison of agent knowledge scope before and after implementing hybrid search and semantic storage]
What matters here is subject matter expertise. Result quality depends less on technical skill in building a hybrid search and semantic storage pipeline than on how well you describe the data. As the roof repair example suggests, sparse metadata on a photo means sparse search results.
A photo described as 'This shows hail damage on a 10-year-old asphalt shingle roof; repair cost was 4.5 million won, took 3 days' and a photo tagged only as 'roof photo' have vastly different search utility.
The value of technical implementation skill is diminishing. We watched Claude Code build in 30 minutes what took days in n8n. Agents handle technical details such as composing JSON and configuring HTTP requests. But clearly describing a process, precisely articulating what data means, and spotting gaps and naming them remain human work.
Giving agents long-term memory meant connecting a database. Now let's look at how to give agents skilled techniques they can perform repeatedly: reusable patterns of behavior that, once taught, run at consistent quality every time.
AI Specialist Attorney Kim Kyung-jin
Specialist in AI law policy. Former member of the National Assembly. Author of multiple books.
© 2026 Kim Kyung-jin. All rights reserved.