Group by similar strings in MySQL

1

I am normalizing a database in MySQL and I have to convert a field that was entered manually to an FK field that is related to a new table that should have that manual income but without repeating it.

The problem I have is that, being manual income, there are many variations. For example, I have "Auto Rojo" , "auto Rojo" , "AUTO rojo " , etc. and it should only have a single record.

I give an example with data that I am working on:

INSERT INTO corredor (nombre) VALUES ("MARCO ANTONIO ASTUDILLO GHO");
INSERT INTO corredor (nombre) VALUES ("MARCO ANTONIO ASTUDILLO GHO CORREDO");

INSERT INTO corredor (nombre) VALUES ("LILIENFELD CORREDORES DE  SEGUROS");
INSERT INTO corredor (nombre) VALUES ("LILIENFELD CORREDORES DE SEGUROS");

INSERT INTO corredor (nombre) VALUES ("FELIPE PAUL");
INSERT INTO corredor (nombre) VALUES ("FELIPE PAUL Y CIA LTDA");
INSERT INTO corredor (nombre) VALUES ("FELIPE PAUL Y CIA. LTDA");
INSERT INTO corredor (nombre) VALUES ("FELIPE PAUL Y CIA. LTDA.");
  • I previously grouped and exported as inserts with workbench, of 7000 records were grouped into 400 approx. The idea is that in the final application the user does not write and select from the list (or add if not in it).

What would be a way to clean these fields and reference them from my original table?

    
asked by Camilo Fernández 08.06.2017 в 00:48
source

1 answer

0

If you have 400 records, what I would do is: first, convert everything to uppercase (if it's acceptable), and do basic cleaning of strings: remove unwanted spaces, and check "rare" (non-alphabetic) characters. Then, add a temporary column to mark in some ways the "approved" values, which will not be modified and merged with others (and which will go to the list). Then, for each of the "unapproved" values, list:  amount of correctness and value (including approved and not) which, if not equal, has a minimum distance from Levenshtein ). Manually, decide what to do (modify them to match others, promote them to "approved", etc).

I'm not at all sure that it suits you to do it within Mysql ...

    
answered by 08.06.2017 в 02:16