Delete duplicate values in a Map / HashMap and get the most frequent value


I have tried to delete keys and values from a map but have not managed it yet; in the meantime, I have created a second map.

  • Find the words that appear most often in a sentence (I do not know in advance which words the sentence may contain, nor do I have a list to compare against).
  • Get the most frequent word, ignoring uppercase, lowercase and accents.
  • Find the second and third most frequent words in the sentence.
  • Print the top 3 words with their counts.
  • Make the code as efficient as possible.
  • Point 5 is the most important. Here is what I have so far:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Map.Entry;
    
    public class StringTest {
    
        public static void main(String[] args) {
    
            String stringTest = "En esta cadena tenemos mas cadenas que la cadena principal la primera vez que intente esta solucion no pude mas que intentar una y otra vez vez vez vez";
            new StringTest(stringTest);
    
        }
    
        public StringTest(String string) {
    
            // Split the sentence into words; the first word starts with a count of 1
            String [] splitString = string.split(" ");
            Map<String, Integer> mapString = new HashMap<String, Integer>();
            mapString.put(splitString[0], 1);
    
            // Count the remaining words: add 1 if already present, otherwise start at 1
            for (int i=1; i <= splitString.length-1; i++){
                if (mapString.containsKey(splitString[i])){
                    mapString.put(splitString[i], mapString.get(splitString[i])+1);
                } else{
                    mapString.put(splitString[i], 1);
                }
            }
    
            // Copy only the words that appear more than once into a second map
            Map<String, Integer> newMap = new HashMap<String, Integer>();
            for (Entry<String, Integer> entry : mapString.entrySet()){
                if (entry.getValue()!=1){
                    newMap.put(entry.getKey(), entry.getValue());
                }
            }
    
            System.out.println(newMap);
        }
    }
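For comparison, a minimal sketch of the same counting step, assuming Java 8 or later: Map.merge replaces the containsKey/get/put sequence, and the words seen only once are deleted from the map in place through values().removeIf(...) instead of being copied into a second map. The names WordCountMerge, countWords and removeSingles are hypothetical, and splitting on "\\s+" instead of " " is an assumption that avoids empty words when the sentence contains double spaces.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountMerge {

    // Counts every word; merge() stores 1 for a new word or adds 1 to the existing count.
    static Map<String, Integer> countWords(String sentence) {
        Map<String, Integer> counts = new HashMap<>();
        // "\\s+" splits on runs of whitespace, so double spaces do not produce empty words.
        for (String word : sentence.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Deletes, in place, every entry whose count is 1 (the words that are not repeated).
    static void removeSingles(Map<String, Integer> counts) {
        counts.values().removeIf(count -> count == 1);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("una y otra vez vez vez");
        removeSingles(counts);
        System.out.println(counts); // {vez=3}
    }
}
```

Removing through the values() view works because it is backed by the map, so no second map is needed.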
    
        
    asked by Dr3ko 30.06.2017 at 17:53

    2 answers


    I propose a solution that uses Streams to get the words and the number of times each one is repeated. Everything is first stored in a Map.

    String stringTest = "En esta cadena tenemos mas cadenas que la cadena "
               + "principal la primera vez que intente esta solución no pude mas que "
               + "intentar una y otra vez vez vez vez vez";
    String[] valores = stringTest.split(" "); /* Get the words */
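Requirement 2 of the question (ignore uppercase, lowercase and accents) is not covered by split(" ") alone. A minimal sketch, assuming that stripping every combining mark is acceptable: java.text.Normalizer decomposes accented letters in NFD form, the \p{M} regex removes the marks, and toLowerCase(Locale.ROOT) folds case. The names WordNormalizer, normalize and words are hypothetical. Note that this also turns "ñ" into "n", which may not be wanted for Spanish text.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;

public class WordNormalizer {

    // Lower-cases a word and strips accents, so "Solución", "solucion" and "SOLUCION" compare equal.
    static String normalize(String word) {
        // NFD splits "ó" into "o" followed by a combining acute accent (a \p{M} mark)...
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        // ...so removing all combining marks leaves only the base letters.
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Splits on runs of whitespace and normalizes every resulting word.
    static String[] words(String sentence) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                .map(WordNormalizer::normalize)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("En esta  Solución la SOLUCION")));
        // [en, esta, solucion, la, solucion]
    }
}
```

The normalized array can then be fed to the same counting code in place of valores.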
    

    After obtaining the words of the String, we use Streams: we convert the array to a List and then call the collect method to perform a reduction over it. Basically, it reduces the words to a grouping of key/value pairs (Collectors.groupingBy, https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#groupingBy-java.util.function.Function-) that is assigned to the Map, where the key is the word and the value is the total number of repetitions. The repetitions are counted with the counting method of the Collectors class.

    Map<String, Long> repeticiones = Arrays.asList(valores).stream().
                                collect(Collectors.groupingBy(w -> w, Collectors.counting()));
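The classifier passed to groupingBy decides what the map key is, so case-insensitive counting only needs a different classifier. A minimal sketch, assuming lower-casing with Locale.ROOT is enough (accents would still need a separate normalization step); GroupingCount and countIgnoringCase are hypothetical names, and Arrays.stream(valores) is used as an equivalent of Arrays.asList(valores).stream().

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingCount {

    // groupingBy(classifier, counting()): the classifier's result becomes the key,
    // and counting() reduces each group to the number of elements in it.
    static Map<String, Long> countIgnoringCase(String[] words) {
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(
                        w -> w.toLowerCase(Locale.ROOT), // "En" and "en" share one key
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] valores = "En esta cadena en la cadena".split(" ");
        Map<String, Long> repeticiones = countIgnoringCase(valores);
        System.out.println(repeticiones.get("en"));     // 2
        System.out.println(repeticiones.get("cadena")); // 2
    }
}
```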
    

    This map is unordered, but since we want the words with the highest number of repetitions, we stream its entries again and sort them with Collections.reverseOrder(Map.Entry.comparingByValue()), which orders them from the highest count to the lowest.

    Stream<Map.Entry<String, Long>> ordenados = repeticiones.entrySet().stream()
    .sorted(Collections.reverseOrder(Map.Entry.comparingByValue()));
    

    Then we take the first three entries by applying limit(3):

    ordenados.limit(3).forEach(i -> System.out.println("Word " + i.getKey()
                                    + ", repeated " + i.getValue() + " times"));
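In the example sentence several words are tied at 2 repetitions (esta, cadena, mas, la), so which one comes third depends on the HashMap's iteration order. A sketch that adds an alphabetical tie-breaker, assuming alphabetical order is an acceptable way to break ties; TopWords and topN are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopWords {

    // Highest count first; equal counts are ordered alphabetically, so the result
    // no longer depends on the HashMap's iteration order.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> counts, int n) {
        Comparator<Map.Entry<String, Long>> byCountDesc =
                Map.Entry.<String, Long>comparingByValue().reversed();
        return counts.entrySet().stream()
                .sorted(byCountDesc.thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String stringTest = "En esta cadena tenemos mas cadenas que la cadena "
                + "principal la primera vez que intente esta solución no pude mas que "
                + "intentar una y otra vez vez vez vez vez";
        Map<String, Long> repeticiones = Arrays.stream(stringTest.split(" "))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
        topN(repeticiones, 3).forEach(e -> System.out.println(
                "Word " + e.getKey() + ", repeated " + e.getValue() + " times"));
        // Word vez, repeated 6 times
        // Word que, repeated 3 times
        // Word cadena, repeated 2 times
    }
}
```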
    

    To see all the words with their total number of repetitions, iterate over the first map:

    for (Map.Entry<String, Long> entry : repeticiones.entrySet())
        System.out.println("Word: " + entry.getKey()
                            + ", repeated: " + entry.getValue() + " times");
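If the full listing should come out in alphabetical order, one option (an assumption, not part of the answer) is to copy the counts into a TreeMap and print them with Map.forEach. PrintAll and listAll are hypothetical names; the method returns a String only so that the output is easy to check.

```java
import java.util.Map;
import java.util.TreeMap;

public class PrintAll {

    // Copies the counts into a TreeMap (sorted by word), then walks it with
    // Map.forEach instead of an explicit entrySet() loop.
    static String listAll(Map<String, Long> repeticiones) {
        Map<String, Long> sorted = new TreeMap<>(repeticiones);
        StringBuilder out = new StringBuilder();
        sorted.forEach((word, count) ->
                out.append("Word: ").append(word)
                   .append(", repeated: ").append(count).append(" times\n"));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> repeticiones = new TreeMap<>();
        repeticiones.put("vez", 6L);
        repeticiones.put("que", 3L);
        System.out.print(listAll(repeticiones));
        // Word: que, repeated: 3 times
        // Word: vez, repeated: 6 times
    }
}
```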
    
        
    answered by 30.06.2017 / 19:18

    You can use a TreeSet to make it more efficient: a TreeSet does not allow duplicate elements and keeps its elements sorted, so you would sort the words by their number of repetitions.
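A sketch of the TreeSet idea, under two assumptions: the comparator must also compare the words, because a TreeSet treats compare(...) == 0 as a duplicate and would otherwise keep only one word per count; and the set is capped at n entries by evicting the smallest entry with pollFirst(). TopWordsTreeSet and topN are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TopWordsTreeSet {

    // Ascending order: lowest count first; among equal counts, the alphabetically
    // LAST word first. Comparing the words is required: a TreeSet treats
    // compare(...) == 0 as a duplicate and would drop words with the same count.
    private static final Comparator<Map.Entry<String, Integer>> ORDER =
            Map.Entry.<String, Integer>comparingByValue()
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey().reversed());

    static List<Map.Entry<String, Integer>> topN(String sentence, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : sentence.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        // Keep at most n entries: after each add, evict the current smallest one.
        TreeSet<Map.Entry<String, Integer>> best = new TreeSet<>(ORDER);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            best.add(entry);
            if (best.size() > n) {
                best.pollFirst();
            }
        }
        // descendingSet(): highest count first, ties in alphabetical order.
        return new ArrayList<>(best.descendingSet());
    }

    public static void main(String[] args) {
        String stringTest = "En esta cadena tenemos mas cadenas que la cadena "
                + "principal la primera vez que intente esta solución no pude mas que "
                + "intentar una y otra vez vez vez vez vez";
        System.out.println(topN(stringTest, 3)); // [vez=6, que=3, cadena=2]
    }
}
```

With m distinct words, the set never holds more than n + 1 entries, so the selection costs about O(m log n) instead of the O(m log m) of sorting every entry; for short sentences the difference is unlikely to be measurable.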

        
    answered by 30.06.2017 at 21:41