Implementing Pinecone Vector Search for Blog Content

A detailed guide on how we implemented semantic search using Pinecone vector database and OpenAI embeddings

Introduction

To enhance our blog's search capabilities, we implemented semantic search using Pinecone's vector database and embeddings. This allows us to find content based on meaning rather than just keywords.

Architecture Overview

Our search architecture combines several key components:

  • Pinecone for vector storage and similarity search
  • Pinecone's integrated embedding service
  • Pinecone's reranking model
  • Lambda@Edge for secure API access
  • CloudFront for edge distribution
  • API Gateway for the search endpoint
  • GitHub Actions for automated deployment including index creation and embedding generation

Create Index

Before we can use the search API, we need to create the Pinecone index. This can be done by running the following script:

import { Pinecone } from "@pinecone-database/pinecone";
import { config } from "dotenv";
 
config();
 
export async function createPineconeIndex() {
  try {
    const pinecone = new Pinecone({
      apiKey: process.env.PINECONE_API_KEY!,
    });
 
    const indexName = process.env.PINECONE_INDEX_NAME!;
 
    // Check if index exists
    const hasIndex = await pinecone.describeIndex(indexName).catch(() => false);
    if (hasIndex) {
      console.log(`Index ${indexName} already exists`);
      return;
    }
 
    await pinecone.createIndex({
      name: indexName,
      dimension: 1024,
      metric: "cosine",
      spec: {
        serverless: {
          cloud: "aws",
          region: "us-east-1",
        },
      },
    });
 
    console.log(`Created index: ${indexName}`);
  } catch (error) {
    console.error("Error creating Pinecone index:", error);
    throw error;
  }
}
 
createPineconeIndex();

We will run this script as part of the GitHub Actions workflow to ensure that the index exists before embeddings are generated and upserted.
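
The workflow step itself isn't reproduced in this post; a minimal sketch of what it could look like (the step name, script invocation, and secret names here are assumptions, not the actual workflow):

```yaml
# Hypothetical GitHub Actions step; actual names and paths may differ.
- name: Create Pinecone index
  run: npx ts-node apps/infra/bin/app.ts create-index
  env:
    PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
    PINECONE_INDEX_NAME: ${{ vars.PINECONE_INDEX_NAME }}
```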

Generating embeddings

During deployment of the blog, we generate embeddings for all of the blog posts by running the following command:

npx ts-node apps/infra/bin/app.ts generate-embeddings

The script uses the multilingual-e5-large model to generate embeddings for all of the blog posts produced by Velite, then upserts the vectors into the Pinecone index.

import { Pinecone } from "@pinecone-database/pinecone";
import type { RecordMetadata } from "@pinecone-database/pinecone";
import { readFileSync, existsSync } from "fs";
import { join } from "path";
import { config } from "dotenv";
 
config();
 
// Initialize Pinecone client
const pc = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY!,
});
 
// Define Velite blog post structure based on the schema
interface VeliteBlogPost {
  slug: string;
  title: string;
  description: string;
  date: string;
  published: boolean;
  image: string;
  author: string;
  body: string;
  tags?: string[];
  categories?: string[];
}
 
// Define metadata that matches Pinecone's requirements
interface BlogMetadata extends RecordMetadata {
  slug: string;
  title: string;
  description: string;
  date: string;
  published: boolean;
  image: string;
  author: string;
  tags: string;
  categories: string;
}
 
async function getBlogPosts(): Promise<VeliteBlogPost[]> {
  try {
    const blogsPath = join(process.cwd(), "../web/.velite/blogs.json");
 
    if (!existsSync(blogsPath)) {
      console.error(`Blog data not found at ${blogsPath}`);
      console.log("Make sure you've built the web app first with 'yarn build'");
      return [];
    }
 
    const blogsJson = readFileSync(blogsPath, "utf-8");
    const blogs = JSON.parse(blogsJson);
    console.log(`Found ${blogs.length} blog posts`);
    return blogs;
  } catch (error) {
    console.error("Error reading blog posts:", error);
    if (error instanceof Error) {
      console.error("Details:", error.message);
    }
    return [];
  }
}
 
function prepareBlogForPinecone(post: VeliteBlogPost): BlogMetadata {
  return {
    slug: post.slug,
    title: post.title,
    description: post.description,
    date: post.date,
    published: post.published ?? true,
    image: post.image,
    author: post.author,
    tags: (post.tags || []).join(","),
    categories: (post.categories || []).join(","),
  };
}
 
export async function generateBlogEmbeddings() {
  try {
    const posts = await getBlogPosts();
    console.log(`Processing ${posts.length} blog posts...`);
 
    // Process posts in batches to avoid rate limits
    const batchSize = 10;
    for (let i = 0; i < posts.length; i += batchSize) {
      const batch = posts.slice(i, i + batchSize);
 
      // Prepare text for embedding (still use full content for embedding)
      const textsToEmbed = batch.map(
        (post) => `${post.title} ${post.description} ${post.body}`,
      );
 
      // Generate embeddings
      const embeddings = await pc.inference.embed(
        "multilingual-e5-large",
        textsToEmbed,
        { inputType: "passage", truncate: "END" },
      );
 
      // Target the index
      const index = pc.index(process.env.PINECONE_INDEX_NAME!);
 
      // Prepare records for upsert, filtering out any undefined embeddings
      const records = batch
        .map((post, idx) => {
          const values = embeddings[idx]?.values;
          if (!values) {
            console.warn(`No embedding generated for post: ${post.slug}`);
            return null;
          }
          return {
            id: post.slug,
            values,
            metadata: prepareBlogForPinecone(post),
          };
        })
        .filter(
          (
            record,
          ): record is {
            id: string;
            values: number[];
            metadata: BlogMetadata;
          } => record !== null,
        );
 
      if (records.length > 0) {
        // Upsert the vectors
        await index.upsert(records);
 
        console.log(
          `Processed batch ${Math.floor(i / batchSize) + 1} of ${Math.ceil(
            posts.length / batchSize,
          )}...`,
        );
      }
    }
 
    console.log("Successfully generated embeddings for all blog posts");
  } catch (error) {
    console.error("Error generating embeddings:", error);
    throw error;
  }
}
 
generateBlogEmbeddings();

Search Construct

Once we have the index created and embeddings generated, we need a way to interact with the index. To do this, we will create a dedicated construct for search infrastructure in lib/constructs/search-construct.ts. This construct will create an API Gateway and Lambda function that will be used to query the Pinecone index.

import {
  RestApi,
  LambdaIntegration,
  AuthorizationType,
} from "aws-cdk-lib/aws-apigateway";
import { Runtime } from "aws-cdk-lib/aws-lambda";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import { Construct } from "constructs";
import { join } from "path";
import { ISecret, Secret } from "aws-cdk-lib/aws-secretsmanager";
import { Duration } from "aws-cdk-lib";
 
interface SearchConstructProps {
  domainName: string;
  pineconeIndexName: string;
  pineconeApiKey: ISecret;
}
 
export class SearchConstruct extends Construct {
  public readonly api: RestApi;
 
  constructor(scope: Construct, id: string, props: SearchConstructProps) {
    super(scope, id);
 
    // Use the Pinecone API key secret passed in via props
    const pineconeSecret = props.pineconeApiKey;
 
    // Create search function
    const searchFunction = new NodejsFunction(this, "SearchFunction", {
      runtime: Runtime.NODEJS_18_X,
      handler: "handler",
      timeout: Duration.seconds(30),
      entry: join(__dirname, "../lambda/search/index.ts"),
      environment: {
        PINECONE_INDEX_NAME: props.pineconeIndexName,
        PINECONE_SECRET_NAME: pineconeSecret.secretName,
      },
    });
 
    // Grant the Lambda function permission to read the secret
    pineconeSecret.grantRead(searchFunction);
 
    // Create API Gateway
    this.api = new RestApi(this, "SearchApi", {
      defaultCorsPreflightOptions: {
        allowOrigins: [`https://${props.domainName}`],
        allowMethods: ["GET", "POST"],
        allowHeaders: ["Content-Type", "x-api-key", "Authorization"],
      },
    });
 
    // Add search endpoint
    const search = this.api.root.addResource("search");
    search.addMethod("GET", new LambdaIntegration(searchFunction), {
      authorizationType: AuthorizationType.NONE,
      apiKeyRequired: false,
    });
  }
}

Search Lambda Function

Implement the search functionality in lib/lambda/search/index.ts:

import { APIGatewayProxyHandler } from "aws-lambda";
import { Pinecone } from "@pinecone-database/pinecone";
import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from "@aws-sdk/client-secrets-manager";
 
// Initialize clients outside handler for connection reuse
const secretsManager = new SecretsManagerClient({ region: "us-east-1" });
let pineconeClient: Pinecone | null = null;
 
async function getPineconeClient(): Promise<Pinecone> {
  // Reuse existing client if available
  if (pineconeClient) {
    return pineconeClient;
  }
 
  try {
    const command = new GetSecretValueCommand({
      SecretId: process.env.PINECONE_SECRET_NAME,
    });
 
    const response = await secretsManager.send(command);
    const secretJson = JSON.parse(response.SecretString || "{}");
    const apiKey = secretJson.apiKey;
 
    if (!apiKey) {
      throw new Error("API key not found in secret");
    }
 
    // Create and cache the client
    pineconeClient = new Pinecone({
      apiKey,
    });
 
    return pineconeClient;
  } catch (error) {
    console.error("Error getting Pinecone client:", error);
    throw error;
  }
}
 
export const handler: APIGatewayProxyHandler = async (event) => {
  try {
    if (!event.queryStringParameters?.q) {
      return {
        statusCode: 400,
        body: JSON.stringify({ error: "No search query provided" }),
      };
    }
 
    const pinecone = await getPineconeClient();
 
    // Target the index (index() is synchronous) and embed the query
    const index = pinecone.index(
      process.env.PINECONE_INDEX_NAME!,
      process.env.PINECONE_INDEX_HOST!,
    );
    const queryEmbedding = await pinecone.inference.embed(
      "multilingual-e5-large",
      [event.queryStringParameters.q],
      { inputType: "query" },
    );
 
    const values = queryEmbedding[0]?.values;
    if (!values) {
      return {
        statusCode: 500,
        body: JSON.stringify({ error: "Failed to generate embedding" }),
      };
    }
 
    // Search the index with metadata - get more results for reranking
    const vectorResults = await index.query({
      vector: values,
      topK: 20, // Increased to get more candidates for reranking
      includeMetadata: true,
      filter: {
        published: { $eq: true },
      },
    });
 
    // Prepare documents for reranking
    const documents =
      vectorResults.matches?.map((match) => ({
        id: match.id,
        text: `${match.metadata?.title} ${match.metadata?.description}`,
      })) || [];
 
    // Rerank the results with proper options
    const rerankedResults = await pinecone.inference.rerank(
      "bge-reranker-v2-m3",
      event.queryStringParameters.q,
      documents,
      {
        topN: 5,
        returnDocuments: true,
        parameters: {
          truncate: "END",
        },
      },
    );
 
    // Sort by score and return results
    const finalResults = rerankedResults.data
      .sort((a, b) => b.score - a.score)
      .map((result) => ({
        ...result.document,
        score: result.score,
      }));
 
    return {
      statusCode: 200,
      headers: {
        "Content-Type": "application/json",
        "Access-Control-Allow-Origin": "*",
      },
      body: JSON.stringify({ matches: finalResults }),
    };
  } catch (error) {
    console.error("Search error:", error);
    if (error instanceof Error) {
      console.error("Error details:", {
        name: error.name,
        message: error.message,
        stack: error.stack,
      });
    }
    return {
      statusCode: 500,
      body: JSON.stringify({ error: "Internal server error" }),
    };
  }
};

Now, when our front end makes a request to the search endpoint, the Lambda function is invoked, the Pinecone index is queried, and the results are returned to the front end as JSON.
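
On the front end we consume this as a typed `SearchResponse` (imported from `@/types/search` in the component shown later). The actual definitions live in the web app; a plausible sketch, with field names assumed from the Lambda's output, might be:

```typescript
// Hypothetical sketch of the shared search types; the real definitions
// live in @/types/search and may differ.
export interface SearchResult {
  id: string;    // blog slug used as the vector id
  text: string;  // document text returned by the reranker
  score: number; // rerank relevance score
}

export interface SearchResponse {
  matches: SearchResult[];
}

// Defensively narrow an unknown JSON payload to a SearchResponse.
export function parseSearchResponse(json: unknown): SearchResponse {
  const data = json as Partial<SearchResponse>;
  return { matches: Array.isArray(data?.matches) ? data.matches : [] };
}
```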

Advanced Search with Reranking

Our search implementation uses a two-stage approach:

  1. Retrieval: we use vector embeddings to find semantically similar content
  2. Reranking: we use a specialized reranking model to improve result relevance

// First stage: Vector search
const vectorResults = await index.query({
  vector: values,
  topK: 20, // Get more candidates for reranking
  includeMetadata: true,
  filter: {
    published: { $eq: true },
  },
});
 
// Second stage: Reranking
const rerankedResults = await pinecone.inference.rerank(
  "bge-reranker-v2-m3",
  event.queryStringParameters.q,
  vectorResults.matches?.map((match) => ({
    text: `${match.metadata?.title} ${match.metadata?.description}`,
    id: match.metadata?.slug,
  })),
);

This approach provides several benefits:

  • Better Relevance: Reranking fine-tunes the initial vector search results
  • Context Awareness: The reranker considers the full text context
  • Flexible Scoring: Combines semantic similarity with contextual relevance

Front End Integration

Once we have the search API deployed, we need to update the front end to use it. This involves updating the search component to make requests to the new endpoint and displaying the results.

"use client";
 
import { useEffect, useState, useCallback } from "react";
import { Search as SearchIcon, X } from "lucide-react";
import { useDebounce } from "@/hooks/useDebounce";
import { useSearch } from "@/store/search";
import { cn } from "@/lib/utils";
import type { SearchResponse } from "@/types/search";
 
export function Search() {
  const [query, setQuery] = useState("");
  const [isLoading, setIsLoading] = useState(false);
  const debouncedQuery = useDebounce(query, 300);
  const { setSearchResults } = useSearch();
 
  const searchPosts = useCallback(
    async (searchQuery: string) => {
      if (!searchQuery.trim()) {
        setSearchResults([]);
        return;
      }
 
      setIsLoading(true);
      try {
        const response = await fetch(
          `/search?q=${encodeURIComponent(searchQuery)}`,
        );
        const data = (await response.json()) as SearchResponse;
        setSearchResults(data.matches || []);
      } catch (error) {
        console.error("Search error:", error);
        setSearchResults([]);
      } finally {
        setIsLoading(false);
      }
    },
    [setSearchResults],
  );
 
  useEffect(() => {
    searchPosts(debouncedQuery);
  }, [debouncedQuery, searchPosts]);
 
  return (
    <div className="relative">
      <SearchIcon className="absolute left-3 top-1/2 size-4 -translate-y-1/2 text-muted-foreground" />
      <input
        className={cn(
          "w-full rounded-md border bg-background px-10 py-2",
          "placeholder:text-muted-foreground focus:outline-none focus:ring-2 focus:ring-primary",
          query && "pr-10",
        )}
        placeholder="Search posts..."
        type="search"
        value={query}
        onChange={(e) => setQuery(e.target.value)}
      />
      {query && (
        <button
          onClick={() => setQuery("")}
          className="absolute right-3 top-1/2 -translate-y-1/2 text-muted-foreground hover:text-foreground"
          aria-label="Clear search"
        >
          <X className="size-4" />
        </button>
      )}
      {isLoading && (
        <div className="absolute right-3 top-1/2 -translate-y-1/2">
          <div className="animate-spin h-4 w-4 border-2 border-primary border-t-transparent rounded-full" />
        </div>
      )}
    </div>
  );
}

As a user types in the search input, the debounced query triggers searchPosts with the current value. That function makes a request to the search endpoint and updates the search results, which are then displayed in the search results component.
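
The `useDebounce` hook itself isn't shown here; under the hood it is a standard trailing-edge debounce of a state value. A minimal, framework-free sketch of the core logic (the hook would wrap something like this with useState and useEffect):

```typescript
// Trailing-edge debounce: of a burst of calls, only the last one fires,
// after `delayMs` of quiet. This mirrors what useDebounce does for the
// query value before searchPosts is invoked.
export function debounce<T extends (...args: any[]) => void>(
  fn: T,
  delayMs: number,
): (...args: Parameters<T>) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Parameters<T>) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}
```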

Front End Result Quality Handling

We've implemented a sophisticated approach to handling search results based on their relevance scores:

const SCORE_THRESHOLD = 0.5; // High quality threshold
const LOW_SCORE_THRESHOLD = 0.01; // Minimum acceptable score
 
// Separate results by score and sort them
const highQualityResults = searchResults
  .filter((result) => result.score > SCORE_THRESHOLD)
  .sort((a, b) => b.score - a.score) // Sort by score descending
  .map(searchResultToBlogLike);
 
const lowQualityResults = searchResults
  .filter(
    (result) =>
      result.score <= SCORE_THRESHOLD && result.score > LOW_SCORE_THRESHOLD,
  )
  .sort((a, b) => b.score - a.score) // Sort by score descending
  .map(searchResultToBlogLike);
 
// Show all search results if we have any, otherwise show all blogs
const postsToDisplay =
  searchResults.length > 0
    ? [...highQualityResults, ...lowQualityResults] // High quality results appear first
    : blogs;

This approach provides several benefits:

  1. Results are sorted by relevance score
  2. High-quality matches (score > 0.5) are shown first
  3. Lower-quality matches are still shown, with messaging that flags them as less relevant
  4. Results below a minimum threshold are filtered out
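
The `searchResultToBlogLike` helper referenced above isn't reproduced in this post. Assuming each result carries the metadata we upserted earlier (with comma-joined tags and categories), a hypothetical sketch:

```typescript
// Hypothetical shapes; the real types live in the web app and may differ.
interface SearchResult {
  id: string; // blog slug used as the vector id
  score: number;
  metadata?: {
    slug: string;
    title: string;
    description: string;
    date: string;
    image: string;
    author: string;
    tags: string;       // comma-joined at upsert time
    categories: string; // comma-joined at upsert time
  };
}

interface BlogLike {
  slug: string;
  title: string;
  description: string;
  date: string;
  image: string;
  author: string;
  tags: string[];
  categories: string[];
}

// Rebuild a blog-like object from a search result, splitting the
// comma-joined tags/categories back into arrays.
function searchResultToBlogLike(result: SearchResult): BlogLike {
  const m = result.metadata;
  return {
    slug: m?.slug ?? result.id,
    title: m?.title ?? "",
    description: m?.description ?? "",
    date: m?.date ?? "",
    image: m?.image ?? "",
    author: m?.author ?? "",
    tags: m?.tags ? m.tags.split(",") : [],
    categories: m?.categories ? m.categories.split(",") : [],
  };
}
```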

Loading States and User Feedback

We use Zustand to manage search state and loading indicators:

interface SearchStore {
  searchResults: SearchResult[];
  query: string;
  isLoading: boolean;
  setSearchResults: (results: SearchResult[]) => void;
  setQuery: (query: string) => void;
  setIsLoading: (isLoading: boolean) => void;
  clearSearch: () => void;
}
 
// Store implementation
export const useSearch = create<SearchStore>()(
  persist(
    (set) => ({
      searchResults: [],
      query: "",
      isLoading: false,
      setSearchResults: (results) => set({ searchResults: results }),
      setQuery: (query) => set({ query }),
      setIsLoading: (isLoading) => set({ isLoading }),
      clearSearch: () =>
        set({ searchResults: [], query: "", isLoading: false }),
    }),
    {
      name: "search-storage",
    },
  ),
);

The UI provides clear feedback about result quality:

{query && !isLoading && (
  <>
    {searchResults.length > 0 ? (
      <div className="mb-8">
        {highQualityResults.length > 0 ? (
          <div>
            <p className="text-sm text-muted-foreground">
              Found {highQualityResults.length} relevant results
            </p>
            {lowQualityResults.length > 0 && (
              <p className="text-sm text-yellow-800 dark:text-yellow-200 mt-1">
                Also showing {lowQualityResults.length} additional posts that
                might be less relevant
              </p>
            )}
          </div>
        ) : (
          <div className="rounded-md bg-yellow-50 dark:bg-yellow-900/10 p-4 mb-4">
            <p className="text-sm text-yellow-800 dark:text-yellow-200">
              Found {lowQualityResults.length} posts that might be related to
              your search, but they may not be exactly what you're looking for
            </p>
          </div>
        )}
      </div>
    ) : (
      <div className="rounded-md bg-yellow-50 dark:bg-yellow-900/10 p-4 mb-4">
        <p className="text-sm text-yellow-800 dark:text-yellow-200">
          No results found for your search. Try different keywords or browse
          all posts below.
        </p>
      </div>
    )}
  </>
)}

This provides a better user experience by:

  • Showing a loading indicator during searches
  • Being transparent about result quality
  • Sorting results by relevance
  • Providing helpful messaging for different result scenarios
  • Maintaining context with visual indicators for different result types

Conclusion

Implementing Pinecone vector search has significantly improved our blog's search capabilities. The combination of:

  • Semantic understanding through vector embeddings
  • Fast similarity search with Pinecone
  • Secure API access with Lambda@Edge
  • Efficient distribution through CloudFront

Together, these create a robust and scalable search solution that understands the meaning behind queries rather than just matching keywords.