How to dynamically build a search index for a Next.js blog based on MDX files

While creating my blog I ran into some issues implementing the search functionality. After fighting these issues I finally landed on a solution that I will be presenting in this article. Let's dig in 🙂.

The issues

Now that you understand the facts, let me explain the issue. First of all, when developing the blog with npm run dev (next dev), the post files are reachable by Next.js. This made it easy for me to look in the directory of the files, list them, and extract their data for indexing. However, what I didn't realise was that these files disappear upon compilation of the app (as explained before). This isn't a problem for blogs that do not include search, as their list of posts is statically generated at compile time. I couldn't find an intuitive way to keep the files, so I decided to try an alternative solution.

The solution

The solution I ended up with was building a JSON file containing all the needed post data during development, whenever a change was made in the post directory. This meant I had to code the actual file watcher and find some way to run it concurrently with next dev. My blog is coded in TypeScript, but because I couldn't make the file watcher parse MDX files, I had to use JavaScript.

When the search endpoint is first hit, it uses the JSON file to build a search index, and the search is then run using fuse.js. The index is cached, so further requests reuse it instead of rebuilding it.

The file watcher


import fs from "fs";
import {blogDir, guidesDir} from "./getSrcDirectoryPath.js";
import {generatePostsData} from "./generate-posts-data.js";

const directoriesToWatch = [blogDir, guidesDir];

console.log("Post file watcher running!");

// Shared debounce timer: every new change cancels the pending run.
let timeout = null;

directoriesToWatch.forEach(directory => {
    fs.watch(directory, (event) => {
        if (event === 'change') {
            clearTimeout(timeout);
            // Regenerate only after 5 seconds without further changes.
            timeout = setTimeout(async () => {
                console.log("Generating post data...");
                await generatePostsData();
                console.log("Post data generated");
            }, 5000);
        }
    });
});

It's pretty straightforward. This script watches the directories for changes using the Node file system module. When a change is registered, we call generatePostsData(), which generates the data. I added a 5 second delay that restarts on every change, so the data is only regenerated once no change has arrived for 5 seconds. Running it on every single change wouldn't work well with editors that save all the time (WebStorm, for example).

The generatePostsData() function just iterates over each post and creates a JSON file containing all the data needed.
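As a rough idea of what that step can look like, here is a minimal sketch. It is not the actual implementation from the project: the helper names (parseFrontMatter, toPostRecord, buildPostsJson) and the field names (slug, title, description, content) are assumptions made for illustration, and it assumes each post starts with a flat `key: value` front matter block between `---` lines. The functions work on strings only, so the file reading and writing is left to the caller.

```javascript
// Hypothetical helper: split an MDX source string into front matter fields and body.
// Only flat `key: value` lines are handled; nested YAML would need a real parser.
function parseFrontMatter(source) {
    const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source);
    if (!match) {
        return { data: {}, body: source }; // no front matter found
    }
    const data = {};
    for (const line of match[1].split(/\r?\n/)) {
        const separator = line.indexOf(":");
        if (separator === -1) continue; // skip lines that are not key/value pairs
        const key = line.slice(0, separator).trim();
        // Strip one surrounding quote character on each side, if present.
        const value = line.slice(separator + 1).trim().replace(/^["']|["']$/g, "");
        if (key) data[key] = value;
    }
    return { data, body: source.slice(match[0].length) };
}

// Build the record for one post (assumed field names).
function toPostRecord(fileName, source) {
    const { data, body } = parseFrontMatter(source);
    return {
        slug: fileName.replace(/\.mdx?$/, ""),
        title: data.title ?? "",
        description: data.description ?? "",
        content: body.trim(),
    };
}

// Serialise all posts to a JSON string; the caller decides where to write it.
function buildPostsJson(files) {
    const records = files.map(({ fileName, source }) => toPostRecord(fileName, source));
    return JSON.stringify(records, null, 2);
}
```

A generatePostsData() built on this would list a post directory, read each file (for example with fs.promises.readFile), pass the results to buildPostsJson, and write the returned string with fs.promises.writeFile.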

Using concurrently

To run the script while using next dev I use the concurrently library. The library allows you to run multiple commands at the same time. After installation I modified the scripts.dev command in my package.json file like this:


"dev": "concurrently --kill-others \"next dev\" \"node dev/posts-watcher.js\""

This makes sure that both commands run at the same time. The --kill-others flag makes it so that if one process exits, the other one is shut down as well. This is useful if Next crashes because of some issue during development.

Building and using the search index

The code is too long to paste, so you can find it here in my template project. The endpoint handler is straightforward and just extracts and passes on the postType and query parameters. The postType parameter exists because my content is separated into blog and guide posts.
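For readers who don't want to open the project, a handler of that kind could look roughly like the sketch below. This is an illustration, not the code from the template project. It assumes a Pages Router style API route (req.query, res.status().json()), and the names createSearchHandler, searchPosts and the allowed values "blog" and "guide" are made up for the example.

```javascript
// Assumed post types, based on the two kinds of content on the blog.
const POST_TYPES = ["blog", "guide"];

// Query string values can arrive as a string or an array of strings.
function firstValue(value) {
    return Array.isArray(value) ? value[0] : value;
}

// `searchPosts(postType, query)` is a hypothetical function that does the actual search.
function createSearchHandler(searchPosts) {
    return function handler(req, res) {
        const postType = firstValue(req.query.postType);
        const query = firstValue(req.query.query) ?? "";

        if (!POST_TYPES.includes(postType)) {
            res.status(400).json({ error: "Unknown postType" });
            return;
        }
        // Pass both parameters straight on and return whatever the search yields.
        res.status(200).json(searchPosts(postType, query));
    };
}
```

In a Next.js project, the function returned by createSearchHandler would be the default export of a file under pages/api. Keeping the search function as a parameter makes the handler easy to try out with a fake request and response.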

The PostSearcher, as you might have guessed, contains everything to do with searching posts. When the search method is called and the caches are undefined, we cache the articles. This is done by reading the JSON files through the Article interface. After that, we simply put them in fuse.js search indexes.

The reason for storing the posts as both fuse.js search indexes and standard arrays is that fuse.js doesn't support empty searches: if you call fuse.search(query) with an empty string, you won't get any results back.
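The caching and the empty-query fallback described above can be sketched like this. Again, this is a simplified illustration rather than the real PostSearcher: createPostSearcher and loadPosts are hypothetical names, the keys title, description and content are assumed fields, and the Fuse class is passed in as an argument so the sketch stays self-contained.

```javascript
// `FuseClass` is expected to be the default export of fuse.js.
// `loadPosts(postType)` is a hypothetical function returning the parsed JSON array.
function createPostSearcher(FuseClass, loadPosts) {
    const caches = new Map(); // postType -> { posts, index }

    return function search(postType, query) {
        let cache = caches.get(postType);
        if (cache === undefined) {
            // First request for this post type: load once and build the index.
            const posts = loadPosts(postType);
            const index = new FuseClass(posts, { keys: ["title", "description", "content"] });
            cache = { posts, index };
            caches.set(postType, cache);
        }

        const trimmed = (query ?? "").trim();
        if (trimmed === "") {
            // fuse.js returns no results for an empty query, so use the plain array.
            return cache.posts;
        }
        // fuse.search() wraps each hit in an object with an `item` property.
        return cache.index.search(trimmed).map((result) => result.item);
    };
}
```

With fuse.js installed, it would be wired up as createPostSearcher(Fuse, loadPosts), where Fuse comes from import Fuse from "fuse.js".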

As you might have noticed, pagination isn't implemented yet, simply because I don't have enough posts for it to make sense. I might have implemented search a bit prematurely too 😛.

Conclusion

That is how I implemented search on my blog at andersmadsen.dev. I wasn't able to find anyone else with this issue when trying to solve it, so I hope it helps someone out there. Feel free to suggest alternative solutions! When trying to solve something like this, it is easy to feel like something must have gone wrong for you to even have the problem to begin with 🙂.

I write mainly about web development, so follow me for more articles in this vein. Also, if you have any questions, feel free to reach out in the comments 🙂.