How to ignore accent in SQLite query (Android)-ThrowExceptions

Exception or error:

I am new in Android and I’m working on a query in SQLite.
My problem is that when I use accent in strings e.g.

  • ÁÁÁ
  • ááá
  • ÀÀÀ
  • ààà
  • aaa
  • AAA

If I do:

SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%a%' ORDER BY MOVIE_NAME;

It’s return:

  • AAA
  • aaa (It’s ignoring the others)

But if I do:

SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%à%' ORDER BY MOVIE_NAME;

It’s return:

  • ààà (ignoring the title “ÀÀÀ”)

I want to select strings in a SQLite DB without caring for the accents and the case. Please help.

How to solve:

Generally, string comparisons in SQL are controlled by column or expression COLLATE rules. In Android, only three collation sequences are pre-defined: BINARY (default), LOCALIZED and UNICODE. None of them is ideal for your use case, and the C API for installing new collation functions is unfortunately not exposed in the Java API.

To work around this:

  1. Add another column to your table, for example MOVIE_NAME_ASCII
  2. Store values into this column with the accent marks removed. You can remove accents by normalizing your strings to Unicode Normal Form D (NFD) and removing non-ASCII code points since NFD represents accented characters roughly as plain ASCII + combining accent markers:

    String asciiName = Normalizer.normalize(unicodeName, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");
    
  3. Do your text searches on this ASCII-normalized column but display data from the original unicode column.

###

In Android sqlite, LIKE and GLOB ignore both COLLATE LOCALIZED and COLLATE UNICODE (they only work for ORDER BY). However, there is a solution without having to add extra columns to your table. As @asat explains in this answer, you can use GLOB with a pattern that will replace each letter with all the available alternatives of that letter. In Java:

public static String addTildeOptions(String searchText) {
    return searchText.toLowerCase()
                     .replaceAll("[aáàäâã]", "\\[aáàäâã\\]")
                     .replaceAll("[eéèëê]", "\\[eéèëê\\]")
                     .replaceAll("[iíìî]", "\\[iíìî\\]")
                     .replaceAll("[oóòöôõ]", "\\[oóòöôõ\\]")
                     .replaceAll("[uúùüû]", "\\[uúùüû\\]")
                     .replace("*", "[*]")
                     .replace("?", "[?]");
}

And then (not literally like this, of course):

SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"

This way, for example in Spanish, a user searching for either mas or más will get the search converted into m[aáàäâã]s, returning both results.

It is important to notice that GLOB ignores COLLATE NOCASE, that’s why I converted everything to lower case both in the function and in the query. Notice also that the lower() function in sqlite doesn’t work on non-ASCII characters – but again those are probably the ones that you are already replacing!

The function also replaces both GLOB wildcards, * and ?, with “escaped” versions.

###

You can use Android NDK to recompile the SQLite source including the desired ICU (International Components for Unicode).
Explained in russian here:
http://habrahabr.ru/post/122408/

The process of compiling the SQLilte with source with ICU explained here:

How to compile sqlite with ICU?

Unfortunately you will end up with different APKs for different CPUs.

###

You need to look at these, not as accented characters, but as entirely different characters. You might as well be looking for a, b, or c. That being said, I would try using a regex for it. It would look something like:

SELECT * from TB_MOVIE WHERE MOVIE_NAME REGEXP '.*[aAàÀ].*' ORDER BY MOVIE_NAME;

Leave a Reply

Your email address will not be published. Required fields are marked *