Find matches between Java Strings

-2

I have a project in which I have to find runs of at least 4 consecutive characters that appear in all 3 of the given Strings, for example:

String cadena1 = "aaattkggl";
String cadena2 = "rrattkmmt";
String cadena3 = "wwattkkllg";

As you can see, the characters "attk" appear in all 3 strings, so the program should print "attk".

If the 3 strings share a run of more than 4 consecutive characters, the whole run should be printed, for example:

String cadena1 = "cadenaxyuiop";
String cadena2 = "qwertcadenaxqwert";
String cadena3 = "asdfghcadenax";

should print: "cadenax"

Thanks to another user, I was able to move forward with this code:

    // Ask the user for the input and validate it
    System.out.print("Ingrese primer cadena: ");
    cadena1 = leer.nextLine();
    if (cadena1.contains("b")||cadena1.contains("j")||cadena1.contains("ñ")||cadena1.contains("x")||cadena1.contains("z"))
    {
        System.out.println("La cadena contiene caracteres invalidos");
    }
    else
    {
        System.out.println("\n");
        System.out.print("Ingrese segunda cadena: ");
        cadena2 = leer.nextLine();
        if (cadena2.contains("b")||cadena2.contains("j")||cadena2.contains("ñ")||cadena2.contains("x")||cadena2.contains("z"))
        {
            System.out.println("La cadena contiene caracteres invalidos");
        }
        else
        {
            System.out.println("\n");
            System.out.print("Ingrese tercer cadena: ");
            cadena3 = leer.nextLine();
            if (cadena3.contains("b")||cadena3.contains("j")||cadena3.contains("ñ")||cadena3.contains("x")||cadena3.contains("z"))
            {
                System.out.println("La cadena contiene caracteres invalidos");
            }

            // split the string into parts of 4 characters and compare them with the other strings
            else 
            {
                System.out.println("\n***************************************************");
                for (int i=4; i<cadena1.length();i++)
                {
                    String[] partes = cadena1.split("(?<=\\G.{" + i + "})");
                    for (String parte : partes)
                    {
                        if (parte.length() < 4) break;
                        if(cadena2.contains(parte)&&cadena3.contains(parte))
                        cadenator.add(parte);
                    }
                }
            }

            System.out.println(cadenator);
        }
    }
}

but I cannot get the expected result printed for any input, such as the strings in the first example.

Likewise, the program should print every match when there is more than one, for example:

String cadena1 = "holaxxxxxxamigo";
String cadena2 = "amigottttttttthola";
String cadena3 = "ggggggggholagamigo";

this should print "hola" - "amigo"

If someone could help me, I would appreciate it very much.

    
asked by ErickLugoJ 31.03.2017 at 04:45

4 answers

1

I think this can help you:

package cm;

/**
 *
 * @author Paolo Rios Garaundo
 */
public class Temp {

    public static void main(String[] args) {
        final String cadena1 = "cadenaxyuiop";
        final String cadena2 = "qwertcadenaxqwert";
        final String cadena3 = "asdfghcadenax";
        final String cadena4 = "cadenaxPorPaolo";

        final String[] cadenas = {cadena1, cadena2, cadena3, cadena4};
        String coincidencia = "";
        String coincidenciaFinal = "";

        for (int i = 1; i < cadenas.length; i++) { // the first string does not need to be checked

            if (!(coincidencia.length() > 0)) {
                final String cadena = cadenas[i];
                final String cadenaAnterior = cadenas[i - 1];

                for (int z = 0; z < cadenaAnterior.length(); z++) { // walk through the previous string
                    final String caracterEvaluar = String.valueOf(cadenaAnterior.charAt(z));

                    // check for matches
                    if (coincidencia.equals(cadena)) { 
                        if (cadena.contains(coincidencia)) {
                            coincidencia += caracterEvaluar;
                        }
                    } else {
                        if (cadena.contains(caracterEvaluar)) {
                            coincidenciaFinal += caracterEvaluar;
                            coincidencia = caracterEvaluar;
                        }
                    }
                }
            }
        }

        System.out.println("Coincidencia: " + coincidenciaFinal);
    }
}

    
answered by 31.03.2017 at 05:44
1

Basically, this code takes parts of 4 or more characters from the first string, using the regular expression "(?<=\\G.{" + numerodecaracteres + "})" to cut them out. Each part is then checked with the contains method of the String class: if the part is also present in the other two strings, it is added to the list of matches.

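// Requires: import java.util.ArrayList; import java.util.List; import java.util.Scanner;
Scanner leer = new Scanner(System.in);
String One, Two, Three;
System.out.print("Ingrese cadena de aminoacidos numero 1: ");
One = leer.nextLine();

System.out.print("Ingrese cadena de aminoacidos numero 2: ");
Two = leer.nextLine();

System.out.print("Ingrese cadena de aminoacidos numero 3: ");
Three = leer.nextLine();

List<String> repeats = new ArrayList<>();
// try every starting position i in the first string
for (int i = 0; i < One.length(); i++) {
    // try every part length j, starting at the minimum of 4
    for (int j = 4; j <= One.length(); j++) {
        // cut the rest of the string into chunks of j characters and keep the first chunk
        String[] parts = One.substring(i).split("(?<=\\G.{" + j + "})");
        String part = parts[0];
        // fewer than 4 characters are left from position i, so stop
        if (part.length() < 4) break;
        // record the part once if both other strings contain it
        if (Two.contains(part) && Three.contains(part)) {
            if (!repeats.contains(part)) repeats.add(part);
        }
    }
}
System.out.println(repeats);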
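
To see what the regular expression does on its own, here is a minimal sketch. The class ChunkDemo and the method chunks are names made up for this illustration, not part of the code above. Splitting from the start of the string only yields chunks aligned to multiples of the chunk size, which is why a loop over the start offset i is needed before "attk" shows up as a part:

```java
import java.util.Arrays;

class ChunkDemo {
    // Splits s into consecutive chunks of n characters (the last chunk may be shorter).
    static String[] chunks(String s, int n) {
        return s.split("(?<=\\G.{" + n + "})");
    }

    public static void main(String[] args) {
        String cadena1 = "aaattkggl";
        // Chunks taken from the start are aligned to offsets 0, 4, 8
        System.out.println(Arrays.toString(chunks(cadena1, 4)));              // [aaat, tkgg, l]
        // Starting at offset 2 makes the first chunk "attk"
        System.out.println(Arrays.toString(chunks(cadena1.substring(2), 4))); // [attk, ggl]
    }
}
```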
    
answered by 28.03.2017 at 05:48
1

An alternative that guarantees maximum-length matches for n Strings:

import java.util.ArrayList;
import java.util.Iterator;


public class StringFind {

    private String[] strings;

    public StringFind(String... strings){
        this.strings=strings;
    }

    private String match(String string){
        // look for string in strings[1] .. strings[length-1]
        int found = 1;
        for (int i = 1; i<strings.length; i++){
            if (strings[i].contains(string)) found++;
        }
        return (found == strings.length) ? string : null;
    }

    public ArrayList<String> getMatches(){
        ArrayList<String> list = new ArrayList<String>();
        String lastMatch="";
        // walk through the substrings of strings[0], starting at index 0
        for (int i = 0; i<=strings[0].length()-4; i++){
            String resto = strings[0].substring(i);
            // try the longest candidate first, down to the minimum size of 4
            for (int j = resto.length(); j>3; j--){
                // look for it in the other strings
                String match = match(resto.substring(0, j));
                // ignore it if it is part of the last result added
                if (match!=null && !lastMatch.contains(match) ){
                    // remember it and add it
                    lastMatch=match;
                    list.add(match);
                }
            }
        }
        return list;
    }

    public static void main(String[] args) {
        StringFind sf = new StringFind("holaxxxxxxamigo", "amigottttttttthola", "ggggggggholagamigo");
        ArrayList<String> list = sf.getMatches();
        Iterator<String> i = list.iterator();
        while(i.hasNext()){
            System.out.println(i.next());
        }

    }

}

Output:

hola
amigo
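
For comparison, here is a more compact sketch of the same idea that grows each candidate from the minimum length instead of shrinking it from the maximum. It relies on the fact that every prefix of a common part is itself common. The names CommonParts, matches and inAll are made up for this example, and the minimum length is passed as a parameter rather than fixed at 4:

```java
import java.util.ArrayList;
import java.util.List;

class CommonParts {
    // Hypothetical helper: true if every string in others contains part.
    static boolean inAll(String part, String... others) {
        for (String other : others) {
            if (!other.contains(part)) return false;
        }
        return true;
    }

    // Returns the maximal parts of first (at least minLen long) found in all others.
    static List<String> matches(int minLen, String first, String... others) {
        List<String> found = new ArrayList<>();
        int prevEnd = 0; // end index of the furthest-reaching match so far
        for (int i = 0; i + minLen <= first.length(); i++) {
            int end = i + minLen;
            if (!inAll(first.substring(i, end), others)) continue;
            // extend to the right while the longer part is still common
            while (end < first.length() && inAll(first.substring(i, end + 1), others)) {
                end++;
            }
            // a match that ends no later than a previous one lies inside it
            if (end > prevEnd) {
                String part = first.substring(i, end);
                if (!found.contains(part)) found.add(part);
                prevEnd = end;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches(4, "holaxxxxxxamigo", "amigottttttttthola", "ggggggggholagamigo")); // [hola, amigo]
    }
}
```

With the other two examples from the question, this sketch returns [attk] and [cadenax].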
    
answered by 31.03.2017 at 06:43
1

The problem you want to solve is called Longest Common Substring (LCS); do not confuse it with Longest Common Subsequence. An algorithm to do this looks like the following:

(I modified some parts to show all matches)

import java.util.ArrayList;

public class Main {

    public static void main(String[] args) {
        String s1 = "holaxperrxquetlssaxxxxamigos";
        String s2 = "amigottttquetlssatttttperrholas";

        System.out.println(allLongestCommonSubstring(s1, s2));
    }

    private static ArrayList<String> allLongestCommonSubstring(String s1, String s2) {
        ArrayList<String> palabras = new ArrayList<>();

        ciclo:
        // walk through the string "s1" character by character
        for (int i = 0, ilargo = s1.length(); i < ilargo; i++) {
            // for each character of "s1", walk through all the characters of "s2"
            for (int j = 0, jlargo = s2.length(); j < jlargo; j++) {

                // length of the current match
                int max = 0;

                // compare "s1" against "s2" to find the word
                while (s1.charAt(i + max) == s2.charAt(j + max)) {

                    // increase the length for every matching character
                    max++;

                    // but stop if the length would run past the end of either string
                    if ((i + max >= ilargo) || (j + max >= jlargo)) {
                        break;
                    }
                }

                // if more than 3 characters matched
                if (max > 3) {
                    // take the substring from i up to where the match ends (i + max)
                    palabras.add(s1.substring(i, (i + max)));

                    if (i + max >= ilargo) {
                        // if the match reaches the end, skip the remaining characters and leave the loop
                        break ciclo;
                    } else {
                        // otherwise, skip the word that was found
                        i += (i + max < ilargo) ? max : 0;
                    }
                }
            }
        }

        return palabras;
    }
}

You can see more implementations on the Wiki.
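
A common alternative formulation uses a dynamic programming table. Below is a minimal sketch, assuming only two strings and that a single longest match is enough; the class LcsTable and the method longest are names invented for this illustration. Each cell holds the length of the common run that ends at that pair of positions:

```java
class LcsTable {
    // Returns one longest common substring of a and b ("" if they share nothing).
    static String longest(String a, String b) {
        // len[i][j] = length of the common run ending at a[i-1] and b[j-1]
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0;    // length of the longest run seen so far
        int endInA = 0;  // index in a just past the end of that run
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // a matching pair extends the run that ended one step earlier
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        endInA = i;
                    }
                }
            }
        }
        return a.substring(endInA - best, endInA);
    }

    public static void main(String[] args) {
        String s1 = "holaxperrxquetlssaxxxxamigos";
        String s2 = "amigottttquetlssatttttperrholas";
        System.out.println(longest(s1, s2)); // quetlssa
    }
}
```

The table costs time and memory proportional to the product of the two lengths. To report every match of 4 or more characters instead of the single longest one, the cells with a value of at least 4 would have to be collected.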

    
answered by 31.03.2017 at 07:01