How to dynamically build a search index for a Next.js blog based on MDX files

2023/11/07

The guide overview on my website

While creating my blog I ran into some issues implementing the search functionality. After fighting these issues I finally landed on a solution that I will be presenting in this article. Let's dig in 🙂.

The issues

First of all, here are some facts:

  • The blog is built with Next.js and is hosted at andersmadsen.dev.
  • The posts are stored in an MDX format. MDX allows markdown to be rendered as HTML with support for plugins and custom components.
  • The posts are therefore served and handled by the Next.js app and not some backend somewhere.
  • The posts are rendered from their MDX files at compile time, meaning the actual files won't be in the compiled Next.js app.
  • The posts' MDX files contain data such as title, teaser text and date. I need this data for searching, presenting and sorting the posts.

Now that you understand the facts, let me explain the issue. When developing the blog with npm run dev (next dev), the MDX files are reachable by Next.js. This made it easy for me to just look in the directory of the files, list them, and then extract their data for indexing. What I didn't realise was that these files disappear when the app is compiled (as explained above). This isn't a problem for blogs that do not include search, as their list of posts is statically generated at compile time. I couldn't find an intuitive way to keep the files, so I decided to try an alternative solution.

The MDX files in my solution

The solution

The solution I ended up with was building a JSON file containing all the needed post data during development, whenever a change was made in the post directory. This meant I had to code the actual file watcher and find some way to run it concurrently with next dev. My blog is coded in TypeScript, but because I couldn't make the file watcher parse MDX files, I had to use JavaScript.

Once the search endpoint is hit, it uses the JSON file to build a search index. The search is then run using fuse.js. The search index is of course cached, so subsequent requests reuse it.

The file watcher

I ended up with this code for the file watching itself:


    import fs from "fs";
    import {blogDir, guidesDir} from "./getSrcDirectoryPath.js";
    import {generatePostsData} from "./generate-posts-data.js";

    const directoriesToWatch = [blogDir, guidesDir];

    console.log("Post file watcher running!");

    // Shared timer, so a burst of saves triggers a single regeneration.
    let timeout = null;

    directoriesToWatch.forEach(directory => {
        fs.watch(directory, (event) => {
            // Only content changes are handled; fs.watch reports added or removed files as 'rename'.
            if (event === 'change') {
                // Restart the 5 second countdown on every change.
                clearTimeout(timeout);
                timeout = setTimeout(async () => {
                    console.log("Generating post data...");
                    await generatePostsData();
                    console.log("Post data generated");
                }, 5000);
            }
        });
    });

It's pretty straightforward. The script watches the directories for changes using the Node file system module. When a change is registered, we call generatePostsData(), which generates the data. I added a 5 second delay that restarts on every change (a debounce), as regenerating on every single change wouldn't work well with editors that save all the time (WebStorm, for example).
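The delay logic can also be pulled out into a small reusable helper. This is only a minimal sketch: the debounce function and the 50 ms delay are my own illustration and not part of the watcher script.

```javascript
// Hypothetical helper: returns a wrapper that runs `fn` once,
// `delayMs` milliseconds after the most recent call.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer); // cancel the pending run, if any
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Usage: three rapid "saves" end up as a single regeneration.
let runs = 0;
const regenerate = debounce(() => {
  runs += 1;
  console.log(`Regenerated post data (${runs} run so far)`);
}, 50);

regenerate();
regenerate();
regenerate();
```

In the watcher, the wrapped function would be the call to generatePostsData() and the delay would be 5000.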

The generatePostsData() function just iterates over each post and creates a JSON file containing all the data needed.
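To give an idea of what that involves, here is a rough sketch of the data-building part. It assumes each MDX file starts with a simple front matter block of key: value lines between --- markers, which may not match how your posts store their data. The names parseFrontMatter and buildPostsJson are made up for this example, and reading and writing the files is left out.

```javascript
// Assumption: posts begin with a front matter block like
// ---
// title: Hello
// date: 2023-10-01
// ---
function parseFrontMatter(source) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  if (!match) return {};
  const data = {};
  for (const line of match[1].split(/\r?\n/)) {
    const separator = line.indexOf(":");
    if (separator === -1) continue; // skip lines without a key
    const key = line.slice(0, separator).trim();
    // Strip one pair of surrounding quotes, if present.
    const value = line.slice(separator + 1).trim().replace(/^["']|["']$/g, "");
    if (key) data[key] = value;
  }
  return data;
}

// Hypothetical pure core: takes [{ slug, source }] and returns JSON text.
function buildPostsJson(files) {
  const posts = files.map(({ slug, source }) => {
    const { title = slug, teaser = "", date = "" } = parseFrontMatter(source);
    return { slug, title, teaser, date };
  });
  // Newest first; ISO dates (YYYY-MM-DD) sort correctly as strings.
  posts.sort((a, b) => b.date.localeCompare(a.date));
  return JSON.stringify(posts, null, 2);
}

const example = buildPostsJson([
  { slug: "hello", source: "---\ntitle: Hello\nteaser: First post\ndate: 2023-10-01\n---\n# Hello" },
  { slug: "search", source: "---\ntitle: \"Search\"\nteaser: Building an index\ndate: 2023-11-07\n---\n# Search" },
]);
console.log(example);
```

In a real script the files array would come from reading each post directory with the fs module, and the returned string would be written to the JSON file that the search endpoint reads.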

Using concurrently

To run the script while using next dev I use the concurrently library. The library allows you to run multiple commands at the same time. After installation I modified the scripts.dev command in my package.json file like this:


    "dev": "concurrently --kill-others \"next dev\" \"node dev/posts-watcher.js\""

This makes sure that both commands run side by side. The --kill-others flag makes sure that when one process exits, the other one is shut down too. This is useful if Next crashes because of some issue during development.

Building and using the search index

The code is too long to paste, so you can find it here in my template project. The endpoint handler is straightforward and just extracts and passes on the postType and query parameters. The postType parameter exists because my content is separated into blog and guide posts.
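As a rough idea of the extraction step, the sketch below pulls the two parameters out of a request URL using the standard URL class. The function name parseSearchRequest, the /api/search path, the "blog" and "guide" values and the fallback to "blog" are assumptions for illustration; the real handler lives in the template project.

```javascript
// Hypothetical helper: extracts postType and query from a request URL.
function parseSearchRequest(requestUrl) {
  // The base is only needed so that relative URLs such as "/api/search?..." parse.
  const { searchParams } = new URL(requestUrl, "http://localhost");
  // Assumption: anything other than "guide" falls back to "blog".
  const postType = searchParams.get("postType") === "guide" ? "guide" : "blog";
  const query = (searchParams.get("query") ?? "").trim();
  return { postType, query };
}

console.log(parseSearchRequest("/api/search?postType=guide&query=%20mdx%20"));
console.log(parseSearchRequest("/api/search"));
```

Normalising the values this early means the searcher only ever sees a known post type and a trimmed query string.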

The PostSearcher, as you might have guessed, contains everything to do with searching posts. When the search method is called and the caches are undefined, we cache the articles. This is done by reading the JSON files into objects matching the Article interface. After that, we simply put them in fuse.js search indexes.

The posts are stored both as fuse.js search indexes and as standard arrays because fuse.js doesn't support empty searches: if you call fuse.search(query) with an empty string, you won't get any results back.
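The caching and the empty-query fallback can be sketched roughly like this. It is not the actual PostSearcher from the template project: loadPosts and createIndex are hypothetical hooks, and a plain substring matcher stands in for fuse.js so the example runs without any dependencies.

```javascript
// Sketch of a searcher with lazy caching and an empty-query fallback.
class PostSearcher {
  constructor(loadPosts, createIndex) {
    this.loadPosts = loadPosts;     // (postType) => array of post objects
    this.createIndex = createIndex; // (posts) => object with search(query) returning [{ item }]
    this.cache = new Map();         // postType -> { posts, index }
  }

  search(postType, query) {
    if (!this.cache.has(postType)) {
      // First request for this type: load once and keep both forms.
      const posts = this.loadPosts(postType);
      this.cache.set(postType, { posts, index: this.createIndex(posts) });
    }
    const { posts, index } = this.cache.get(postType);
    // An empty query returns the full list instead of asking the index.
    if (query.trim() === "") return posts;
    return index.search(query).map(result => result.item);
  }
}

// Stand-in index that mimics the { item } result shape of fuse.js.
const substringIndex = posts => ({
  search: query => posts
    .filter(post => post.title.toLowerCase().includes(query.toLowerCase()))
    .map(item => ({ item })),
});

let loads = 0;
const searcher = new PostSearcher(
  () => { loads += 1; return [{ title: "Search with MDX" }, { title: "Hello world" }]; },
  substringIndex,
);

console.log(searcher.search("blog", ""));    // empty query: every post
console.log(searcher.search("blog", "mdx")); // matched through the index
```

With fuse.js, createIndex could be something like posts => new Fuse(posts, { keys: ["title", "teaser"] }), where the keys are whichever fields you want to be searchable.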

As you might have noticed, pagination isn't implemented yet, simply because I don't have enough posts for it to make sense. I might have implemented search a bit prematurely too 😛.

The running post file watcher generating the data

Conclusion

That is how I implemented search on my blog at andersmadsen.dev. I wasn't able to find anyone else with this issue when trying to solve it, so I hope it helps someone out there. Feel free to suggest alternative solutions! When trying to solve something like this, it is easy to feel like something must have gone wrong, for you to even have the problem to begin with 🙂.

I write mainly about web development, so follow me for more articles in this vein. Also, if you have any questions, feel free to reach out in the comments 🙂.