# Adding llms.txt Files to Enhance AI Model Compatibility
Implementing a generator for llms.txt files to help AI models better understand and reference blog content

This blog now supports llms.txt files, making content more accessible to AI language models. Just as RSS feeds help readers stay updated, llms.txt files enable AI models to better understand and reference technical blog content. This article explains how the feature was implemented for a Next.js static blog built with Velite. You can see the result for this post at https://subaud.io/blog/adding-llms-txt-for-ai-models.
## The Problem with AI Models and Technical Content
AI language models often struggle with accurately representing technical content, especially code examples. When models like Claude or ChatGPT reference blog posts, they typically access them through web crawling, which can result in code formatting issues, missing context, or incomplete information.
The llms.txt convention addresses this by providing a plain text version of articles at a predictable location (`/[slug]/llms.txt`), allowing models to access the complete, unprocessed content directly.
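As a sketch of the result, a generated file begins with plain-text metadata followed by the full article body (the field layout matches the generator script shown below; bracketed values are placeholders):

```text
Title: [post title]
Date: [publication date]
Author: [author]
Categories: [comma-separated categories]
Tags: [comma-separated tags]

[post description]

[full MDX body, with code blocks and image references intact]
```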
## Creating the Generator Script
The implementation requires a generator script similar to the RSS feed generator previously described. The key difference is that instead of using Velite's processed content, the script reads the original MDX files to preserve code formatting:
```typescript
import fs from "fs";
import path from "path";
import { blogs } from "./.velite";

async function generateLlmsFiles() {
  const publicDir = path.join(process.cwd(), "public");
  const contentDir = path.join(process.cwd(), "src/content");

  // Process each published blog post
  for (const post of blogs.filter((post) => post.published)) {
    console.log(`Processing: ${post.title}`);

    // Create the directory structure /[slug]/ if it doesn't exist
    const slugPath = path.join(publicDir, post.slugAsParams);
    if (!fs.existsSync(slugPath)) {
      fs.mkdirSync(slugPath, { recursive: true });
    }

    // Locate the original MDX file for this post
    const slugParts = post.slug.split("/");
    const filename = slugParts[slugParts.length - 1] + ".mdx";
    const mdxPath = path.join(contentDir, "blog", filename);

    // Read the original MDX file
    let mdxContent = "";
    try {
      mdxContent = fs.readFileSync(mdxPath, "utf-8");
      console.log(`Read original MDX file: ${mdxPath}`);
    } catch (error) {
      console.error(`Error reading MDX file ${mdxPath}:`, error);
      mdxContent = `Unable to read original MDX content. Using metadata only.`;
    }

    // Strip the frontmatter block (the file is expected to start with "---")
    const frontmatterEnd = mdxContent.indexOf("---", 3);
    if (frontmatterEnd > 0) {
      mdxContent = mdxContent.slice(frontmatterEnd + 3);
    }

    // Create the base content with metadata
    let content = `Title: ${post.title}\n`;
    content += `Date: ${post.date}\n`;
    content += `Author: ${post.author}\n`;
    content += `Categories: ${post.categories?.join(", ") || ""}\n`;
    content += `Tags: ${post.tags?.join(", ") || ""}\n\n`;
    content += `${post.description}\n\n`;

    // Add the MDX content, keeping code blocks intact
    content += mdxContent.trim();

    // Handle image references by appending image descriptions
    content = addImageDescriptions(content, post.slug);

    // Write the llms.txt file in the format /[slug]/llms.txt
    // (check existence before writing so the log message is accurate)
    const llmsFilePath = path.join(slugPath, "llms.txt");
    const existed = fs.existsSync(llmsFilePath);
    fs.writeFileSync(llmsFilePath, content);
    console.log(`${existed ? "Updated" : "Created"}: ${post.slugAsParams}/llms.txt`);
  }

  console.log("LLMs files generated successfully!");
}

// Run the generator when this script is executed
generateLlmsFiles().catch((error) => {
  console.error("Failed to generate llms.txt files:", error);
  process.exit(1);
});
```
The script performs several important tasks:
- Reading the list of published blog posts from Velite
- Reading the original MDX file from the content directory for each post
- Extracting the content after the frontmatter (illustrated below)
- Creating a structured text file with metadata followed by the original content
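To make the frontmatter step concrete: each MDX file opens with a `---`-delimited frontmatter block, so `indexOf("---", 3)` finds the closing delimiter and everything up to and including it is dropped. A hypothetical source file (the field names here are illustrative):

```mdx
---
title: Adding llms.txt Files to Enhance AI Model Compatibility
date: 2025-03-15
published: true
---

This blog now supports llms.txt files, making content more accessible...
```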
A crucial part of the implementation is handling images properly. Since AI models cannot directly view images but benefit from knowing they exist, the script includes image references:
```typescript
function addImageDescriptions(content: string, slug: string): string {
  // Find the image directory for this post
  const slugParts = slug.split("/");
  const slugName = slugParts[slugParts.length - 1];
  const imageDir = path.join(process.cwd(), "public", "images", "blog", slugName);

  // If the image directory doesn't exist, note that and return
  if (!fs.existsSync(imageDir)) {
    return content + "\n\nNo images included in this post.";
  }

  // List all images in the directory
  const images = fs
    .readdirSync(imageDir)
    .filter(
      (file) =>
        file.endsWith(".jpg") || file.endsWith(".png") || file.endsWith(".gif"),
    );

  if (images.length === 0) {
    return content + "\n\nNo images included in this post.";
  }

  // Append a section listing each image with its public path
  let imageSection = "\n\n## Images included in this post:\n\n";
  images.forEach((image) => {
    imageSection += `- ${image}: /images/blog/${slugName}/${image}\n`;
  });

  return content + imageSection;
}
```
This function scans the blog post's image directory and appends a section listing all available images with their paths. This allows AI models to reference images even if they cannot see them directly.
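For a hypothetical post with two images in its directory, the appended section would read (filenames and slug are illustrative):

```text
## Images included in this post:

- architecture.png: /images/blog/my-post-slug/architecture.png
- screenshot.jpg: /images/blog/my-post-slug/screenshot.jpg
```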
## Integrating with the Build Process
To ensure llms.txt files are generated automatically during builds, the `package.json` needs to include the generator in the build process:
```json
{
  "scripts": {
    "build": "yarn velite build && next build && yarn rss:build && yarn llms:build",
    "llms:build": "node -r esbuild-register llms-generator.ts"
  }
}
```
With this configuration, whenever the blog is built, the llms.txt files are automatically generated and included in the static export. This ensures they remain up to date with the latest content.
## Testing the Implementation
Testing the implementation involves running the script:
```bash
yarn llms:build
```
When executed, the script processes all blog posts and creates llms.txt files in the correct locations. Verification confirms that code blocks are preserved correctly and image references are included.
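Based on the script's logging, a run produces output along these lines (titles and paths are illustrative):

```text
Processing: Adding llms.txt Files to Enhance AI Model Compatibility
Read original MDX file: /home/user/blog/src/content/blog/adding-llms-txt-for-ai-models.mdx
Created: adding-llms-txt-for-ai-models/llms.txt
LLMs files generated successfully!
```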
The generated files contain the complete text of each post, including all code examples and image references. These files are accessible at URLs following the pattern `https://subaud.io/blog-post-slug/llms.txt`.
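A quick way to spot-check a deployed file is to fetch it directly, substituting a real post slug:

```bash
curl https://subaud.io/blog-post-slug/llms.txt
```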
## Benefits for Technical Content
This implementation is particularly valuable for technical blogs where code examples are critical. By preserving code formatting and structure, it ensures that AI models can accurately reference and explain code examples when users ask about them.
When a user asks an AI about specific CDK constructs or TypeScript examples, the model can access the complete code with proper formatting, rather than relying on web-crawled content where code blocks might be incorrectly parsed or formatted.
The llms.txt files complement the existing RSS implementation:
- RSS feeds enable human readers to stay updated with new content
- llms.txt files enable AI models to access and understand content accurately
## Conclusion
Implementing llms.txt files enhances the accessibility of blog content for AI language models. This addition to the build process ensures that models can accurately reference technical content, including code examples and image references.
As AI models become more integrated into technical discovery and learning processes, providing well-structured content through conventions like llms.txt helps ensure users receive accurate information when asking about topics covered in technical blogs.