Match maker, match maker…
Of the 95 string functions available in PHP, not one gives you the convenience of loosely matching two strings. What do I mean by that? Well, you and I could easily discern that the phrase “funding and charters” is the same as “Funding & Charters“, however a computer would not consider these the same - certainly not using PHP as the processing tool anyway.
Recently I finished the build of a Training Package Issues Register for a government organisation. This Issues Register is setup so that the client can put an entire Training Package online (equivalent to 4000 pages in a *.doc file) and have registered members give feedback about each of the Qualifications, Units, Skill sets and various Discussion Papers.
Previous to this online system, the client had been gathering feedback manually and collating it into spreadsheets, which ultimately they wanted to import into the online system. The feedback table - in our database - was recording the “type” of feedback using a single character; “u” for feedback that has been made against a Unit, “q” for qualifications, etc, etc… The clients’ spreadsheet had a method of identifying the feedback type as well, however the values were not uniform and some examples included “Core Units”, “Training package unit”, “Qual”, etc…
This meant that we had to come up with a method of more loosely comparing strings, so that we could get “Core units” and “Training package unit” to be recognised as a “Unit” and ultimately store a “u” in our feedback table.
After some brainstorming and discussion with the team, we concluded that this was a job for SQL’s full-text search capabilities. In a nutshell - using full-text search - we asked the system to return the most likely matches for a given string and order them by “relevance” (how similar a result is to the original string).
We decided that when doing an import we would create a temporary (in memory) table of the different feedback types and then performing a full-text search against it to find the most relevant type we have on record. While this form of matching isn’t an exact science, the likelihood of a mismatch is quite low and we already had the functionality to allow the administrator to repair these edge-cases manually.
It could be argued that we are making a redundant call to the database, but considering that these imports are rarely performed and only ever by a single user, we decided that the convenience outweighed any possible performance issues.
In anycase, the point remains that while PHP isn’t very intelligent when it comes to matching natural language strings, SQL is and it’s definitely something that I recommend every developer make a place for in your mental toolkit for when you next encounter a problem similar to ours.