Reading Time: 4 minutes
Problem statement
The geteduroam Android app fetches a list of 4380 different organizations from a public endpoint.
We need to make this very long list searchable for the user. Seems like a simple issue, but now it gets complicated:
- The user needs to search for abbreviations, so searching for “ARU” returns African Rural University
- Diacritics should not matter, so searching for “muszaki” should return Budapest Műszaki és Gazdaságtudományi Egyetem
- Partial word search should also be possible, so if I type “cali” I should see California State University in the results
Our initial implementation generates an in-memory index on a background thread, by creating an abbreviation list, and then also a normalized string, without the diacritics, and list of words.
Then, when the user searches for something, we iterate over this big list of words, and filter based on matches. This is all done in Kotlin using Regex and string operations.
On a Pixel 8 Pro, fully building up this index takes about 2.1s in average.
However, on an older device, like a Samsung Tab S 10.5 (back from 2014) this becomes 28 seconds.
SQLite with FTS
That’s why I went did some investigation, how we could improve on this. A colleague has pointed out, that SQLite has an extension, which enables the app to search in columns for substrings, when you have lots of textual data. It can even ignore diacritics, which is what we are looking for. This extension is called FTS (full-text search), and it has different versions:
- FTS4 (and so FTS3) is available from API level 11 (Android 3.0), so basically on every device
- FTS5 is available from SQLite version 3.9, so from API level 24 (Android 7.0). That means that if you are targeting relatively older devices, you might have to stick to FTS4
FTS4 should be able to fulfill your basic needs, although FTS5 has some very important additions, such as a custom tokenizer.
Building a lookup table in SQLite
You can find multiple tutorials and articles online, how you can build a simple FTS lookup table in SQLite from your Android app, this one won’t be different. When the database is initialized, we add a new table:
Here I have added a rowIndex, so that I can easily reference back to my in-memory array keeping all the data objects, the full name and abbreviation (because those I cannot search for, so I had to generate them myself, an add it as an extra column), and I set the tokenizer to unicode61, which will remove all the diacritics.
When I have my list of datamodels, I upload the values in the database, by first removing the previous dataset, and then inserting all the 4380 organizations, row by row, all in one transaction:
What’s left, is the actual query, where we search for a user entered keyword, and get all the matching rows in a cursor. Here, I only care about the rowIndex, so I just retrieve that from the database by searching with the filter:
This solution works fine, and matches all of our requirements. So the question is, does it perform better?
On a Pixel 8 Pro, the database was filled, and ready for queries in an average of 1.2s (old version: 2.1s).
On a Samsung Galaxy Tab S 10.5, the database was ready for use in an average of 2s (old version: 28s). A 14x speed increase, which is exactly what we were looking for!
But how about query times? If it does not search quickly, then it will degrade the user experience, making the entire transition useless. Luckily, that is not the case. Search is almost instant, with a delay of about 5–20ms, just like it is when searching in memory.
What about FTS5?
I did not see a noticeable performance change when using FTS5. This is mostly due to the fact that in the database version the majority of the time is not spent loading the rows into SQLite, but generating the abbreviations, which is still done on the Kotlin side (I tried to find a query which could search in FTS for abbreviations, but couldn’t find one, let me know if you think it is possible!)
So for now, I stick to FTS4 for the compatibility on older Android versions.
Conclusion
Altogether, I think this experiment was a success, because it did result in a significant performance increase, even on the newest devices, but especially on legacy, more sluggish phones and tablets.
It does come with added code complexity though, and we do know that SQLite can sometimes throw quirky errors, and migrations are not easy to write, so this is a serious point that you should consider first.
Also, this might not work on smaller or significantly larger datasets, so do some preliminary testing before switching over your search engine implementation.
And if you want to see how we did this on eduroam, check out this commit.