I don’t tend to use recipes most of the time when I’m cooking, but when I do, one of the things I’ve always found tedious is finding recipes that fit the macro-nutrient targets I’m going for. I’ve been tracking what I eat for years, as it is all too easy for me to not eat enough (I’ve never had a big appetite).
A lot of recipe sites actually do have the nutrition info on their recipes; there is just no way to search or filter by it. So I’ve built this search engine that does just that.
Nutrition-focused search engine
How I built it
I knew from the outset that this was going to be a set-it-and-forget-it type of app. Once it was built, I didn’t want to have to spend any maintenance time on it, so the technology choices I made were focused on keeping it simple and fast.
The tech stack
It was built in Go using the Gin web framework. It just works and it is fast.
I used Pongo2 as the template engine. Over the years I’ve worked with Django a lot, so being able to use the same syntax was quite nice. The stock html/template is fine, but I like the inheritance approach that Django-style templates have.
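To give a flavour of what that looks like in practice, here is a minimal sketch of rendering a Django-style child template with Pongo2. It uses a plain net/http handler to keep it short, and the template names, fields, and routes are made up rather than taken from the actual app:

```go
package main

import (
	"net/http"

	"github.com/flosch/pongo2/v6"
)

// Hypothetical templates (not the project's real ones):
//
// templates/base.html:
//   <html><body>{% block content %}{% endblock %}</body></html>
//
// templates/recipes.html:
//   {% extends "base.html" %}
//   {% block content %}
//     {% for r in recipes %}<li>{{ r.Name }} – {{ r.Protein }}g protein</li>{% endfor %}
//   {% endblock %}

type Recipe struct {
	Name    string
	Protein int
}

var recipesTpl = pongo2.Must(pongo2.FromFile("templates/recipes.html"))

func main() {
	http.HandleFunc("/recipes", func(w http.ResponseWriter, r *http.Request) {
		// pongo2.Context is just a map of names to values made available to the template.
		err := recipesTpl.ExecuteWriter(pongo2.Context{
			"recipes": []Recipe{{Name: "Lentil soup", Protein: 18}},
		}, w)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	})
	http.ListenAndServe(":8080", nil)
}
```

The child template only fills in the blocks it cares about, and the base template carries everything else, which is the inheritance style I was after.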
For the database I used SQLite. For a simple use case like this, it is perfect.
For the web scraping I used Colly, and it was pretty solid. In the past I’ve used Scrapy (Python) and found it to be largely overkill, so Colly was pleasant to work with.
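As a rough illustration, a per-site scraper with Colly ends up looking something like this. The domain and CSS selectors here are invented, since every real site needs its own:

```go
package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Hypothetical domain; each supported site gets its own collector and selectors.
	c := colly.NewCollector(
		colly.AllowedDomains("www.example-recipes.com"),
	)

	// Pull the nutrition facts block out of each recipe page.
	c.OnHTML("div.nutrition", func(e *colly.HTMLElement) {
		fmt.Printf("%s: %s kcal, %s protein\n",
			e.Request.URL,
			e.ChildText(".calories"),
			e.ChildText(".protein"),
		)
	})

	// Follow links to other pages on the allowed domain.
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// A failing page gets logged rather than killing the run.
	c.OnError(func(r *colly.Response, err error) {
		log.Printf("request to %s failed: %v", r.Request.URL, err)
	})

	if err := c.Visit("https://www.example-recipes.com/recipes/"); err != nil {
		log.Fatal(err)
	}
}
```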
For interactivity I chose HTMX and plain JavaScript rather than a framework. Because of the Go + SQLite setup, responses were super fast, so HTMX worked quite nicely.
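Put together, a search endpoint looks roughly like the sketch below: HTMX issues a GET as you type, Gin queries SQLite, and the handler returns an HTML fragment that gets swapped into the results container. The schema, query parameters, and inline HTML are all illustrative; in the real app the fragment would presumably be rendered through a Pongo2 template rather than built by hand.

```go
package main

import (
	"database/sql"
	"fmt"
	"html"
	"log"
	"net/http"
	"strconv"
	"strings"

	"github.com/gin-gonic/gin"
	_ "github.com/mattn/go-sqlite3" // any SQLite driver works; this one is assumed here
)

func main() {
	db, err := sql.Open("sqlite3", "recipes.db")
	if err != nil {
		log.Fatal(err)
	}

	r := gin.Default()

	// The search box would use something like:
	//   <input name="q" hx-get="/search" hx-trigger="keyup changed delay:300ms"
	//          hx-target="#results">
	// so this handler only returns an HTML fragment, not a full page.
	r.GET("/search", func(c *gin.Context) {
		q := c.Query("q")
		minProtein, _ := strconv.Atoi(c.DefaultQuery("min_protein", "0"))

		rows, err := db.Query(
			`SELECT name, calories, protein FROM recipes
			 WHERE name LIKE ? AND protein >= ? LIMIT 20`,
			"%"+q+"%", minProtein,
		)
		if err != nil {
			c.String(http.StatusInternalServerError, "query failed")
			return
		}
		defer rows.Close()

		var b strings.Builder
		for rows.Next() {
			var name string
			var calories, protein int
			if err := rows.Scan(&name, &calories, &protein); err != nil {
				continue
			}
			fmt.Fprintf(&b, "<li>%s – %d kcal, %dg protein</li>",
				html.EscapeString(name), calories, protein)
		}
		c.Data(http.StatusOK, "text/html; charset=utf-8", []byte(b.String()))
	})

	r.Run(":8080")
}
```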
Testing and parsing
One aspect of any web scraping project that is always troublesome is the fact that websites change. When someone redesigns their site, there is a good chance that your scraper will break.
So how do you design a robust system that can scrape hundreds of sites, knowing that any and all of them will change at some point?
It is hardly a novel idea, but I took the approach of simply expecting things to fail regularly and building around that. On any given crawling run, some of the sites just aren’t going to work, so the crawler keeps going and ignores the failures.
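Concretely, that just means each site’s scraper runs independently and its failures are logged rather than propagated. Something along these lines, where the Scraper interface and Recipe struct are invented for the sketch:

```go
package crawler

import "log"

// Recipe is whatever gets stored per parsed recipe (simplified here).
type Recipe struct {
	Name     string
	Calories int
	Protein  int
}

// Scraper is a hypothetical per-site interface; each supported site
// gets its own implementation.
type Scraper interface {
	Name() string
	Scrape() ([]Recipe, error)
}

// RunCrawl runs every scraper, keeps whatever succeeded, and only logs
// the sites that failed instead of aborting the whole run.
func RunCrawl(scrapers []Scraper) []Recipe {
	var all []Recipe
	for _, s := range scrapers {
		recipes, err := s.Scrape()
		if err != nil {
			log.Printf("site %s failed, skipping: %v", s.Name(), err)
			continue
		}
		all = append(all, recipes...)
	}
	return all
}
```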
But at the same time, as part of my testing strategy, I built a “snapshot” system so I can test not only whether I can successfully scrape and parse each site, but also whether I’m getting the same parsing results as before. I’m pretty proud of the setup I built around this: a small interactive CLI tool compares the results for each site, and when there is a difference it shows me the diff; if the new data looks good, I can approve it and it is stored as the new snapshot state.
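The actual tool isn’t reproduced here, but the core approve-or-reject loop is roughly the following. The file layout, plain-JSON snapshots, and prompt wording are all assumptions made for the sketch:

```go
package snapshots

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ReviewSnapshot compares a freshly parsed result against the stored snapshot
// for a site. If they differ, it shows both versions and asks whether the new
// output should become the snapshot going forward.
func ReviewSnapshot(site string, newData []byte) error {
	path := filepath.Join("testdata", "snapshots", site+".json")

	old, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		// First run for this site: store the result as the initial snapshot.
		return os.WriteFile(path, newData, 0o644)
	} else if err != nil {
		return err
	}

	if bytes.Equal(old, newData) {
		fmt.Printf("%s: unchanged\n", site)
		return nil
	}

	// A proper line-by-line diff would be nicer; this just prints both versions.
	fmt.Printf("%s changed:\n--- old ---\n%s\n--- new ---\n%s\n", site, old, newData)
	fmt.Print("accept new snapshot? [y/N] ")

	answer, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	if strings.TrimSpace(strings.ToLower(answer)) == "y" {
		return os.WriteFile(path, newData, 0o644)
	}
	return fmt.Errorf("%s: snapshot rejected", site)
}
```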
Critique
This was a project I built to scratch my own itch, so once it met the good-enough threshold I stopped working on it. It is left in a place where it does solve the problem I set out to solve, yet it is a bit grating every time I use it, for a few reasons.
1. The design sucks.
Both the UI and UX design are rather poor. It was left in “prototype+” mode, where I essentially took what I had in the initial prototype and made some minor style changes to make it look a bit better. But it needs some serious work.
I keep telling myself I’ll come back and redesign it, but whenever I think about doing it, I decide the time is better spent elsewhere.
2. Limited functionality
In most situations you don’t want to find recipes purely based on their nutrition profiles and search terms. Ideally you’d want to be able to categorize them based on other qualities like which meals they are intended for, the cuisine, ingredients, cooking methods, required tools, etc.
Unfortunately, it turned out that while some of this information is available on different sites, it is very inconsistent. I explored several paths to building the appropriate data, most of which involved AI categorization techniques, but I came to the conclusion that it just wasn’t worth the time and effort.