C#Dot Net Perls

 
SQLite FTS3 Virtual Table

by Sam Allen

Problem

Enable full-text searching using SQLite's FTS3 indexing mechanism. FTS3 allows you to enable full text search and create a search feature similar to Google. We want to index all the words in a TEXT column in your SQLite3 database. You need this function to be blazingly fast. You don't want to write 10,000 lines of code to accomplish this.

Solution

First I want to show you how to make a virtual table, which is what a FTS3 table is. I hope to walk you through some other points of how to get this working. The end result is very neat--don't give up if some detail throws you off.

CREATE the FTS3 table

Let's look at how you can create a virtual FTS3 table using SQL in SQLite. Note that you can store other kinds of columns in SQLite virtual tables, but only the textual ones will be searched with FTS3 and the MATCH syntax. You can create a virtual FTS3 table exactly the same way as a regular table in SQLite.

-- Create a fts3 table that contains two columns.
-- The first is a text column and the second is a datetime column.
-- The column names, table name, etc. must be customized.
CREATE VIRTUAL TABLE TableName
    USING FTS3(ColOne TEXT, ColTwo DATETIME)

UPDATE the FTS3 table

Before you are able to update data into your brand new virtual table, you will need to insert rows into it. I have researched this, and with SQL there is really no way around checking whether you need to insert or update a row.

-- Update our virtual table.
UPDATE TableName
    SET ColOne=@a
    WHERE ColTwo=@b

Use MATCH

You must use the MATCH syntax to search through full-text tables in SQLite. I am going to show you a somewhat longer query. If you are reading about FTS3, you may know more SQL than I do. I will show some MATCH syntax here. Make sure you use MATCH instead of equals or LIKE to scan a TEXT column in a virtual table.

-- Look at the line starting with MATCH.
-- That's where the fts3 stuff kicks in.
-- FullText is a fts3 virtual table.
SELECT jt.EntryTitle, jt.DateTimeNew, jt.WordCount
    FROM JournalTable jt, FullText ft
    WHERE ft.FullContent
    MATCH @query
    AND jt.DateTimeNew = ft.FullDate
    ORDER BY ft.FullDate DESC
    LIMIT 100

Should I use rowid?

If possible, yes. In SQLite and other databases, each row has a unique identifier called a rowid. You don't need to tell the database to create this, and you can blissfully ignore it. However, the rowid gives us some more power and it is very cheap to use. Here's an example of how you can use it.

-- Get some matches from the fts3 table, then select the rowid.
SELECT rowid
    FROM VirtualTable
    WHERE ContentCol
    MATCH @query
    LIMIT @limit
    OFFSET @offset;

SELECT with rowid

You have a main content table, which contains the content and data that is what the user interacts with, and the SQLite FTS3 virtual table. You can run text searches using MATCH on the virtual table, and then get the corresponding row in the other table, as I show in this example.

-- Fetch the relevant row from the other table.
SELECT ColA, ColB
    FROM DataTable
    WHERE rowid=@id;

JOIN on rowid

To optimize performance and improve code clarity, you will want a way to minimize the trips to the SQLite database. Thus, let's use an implicit join on the two tables, using rowid. Because rowid is an int, this will be extremely fast (just a seek).

-- Join virtual table with regular table on rowid.
SELECT dt.ColA, dt.ColB
    FROM DataTable dt, VirtualTable vt
    WHERE vt.ContentCol
    MATCH @query AND dt.rowid = vt.rowid
    LIMIT @limit
    OFFSET @offset;

Optimize database size with FTS3

My database grew to 3 times its previous size when I enabled FTS3. To improve things, I used code to remove stopwords. Stopwords are short, common words that Google and Yahoo don't index. These make searching slower and bloat the indexes. I added code to removes stopword using a Dictionary. Here are the results.

Version SQLite database size in MB
No full-text index 7
Unreduced full-text index 19
Reduced full-text index 9
Database size comparison graph.

Conclusion

Use FTS3 in SQLite 3 for lightning-fast searching similar to Google (only faster). SQLite does this well and your decision to use it is very wise. Richard Hipp is a genius and his work has a great effect on improving computing for us all. The results of FTS3 implementations in my programs were very promising and impressive.

Dot Net Perls is dedicated to sharing code and knowledge. It has
© 2007-2008 Sam Allen. All rights reserved.

Ads by The Lounge