Compare several patterns with Pattern and Matcher in Java

I have to read data from a text file line by line and, depending on what is on each line, apply a different pattern to recognize the information.

This is a part of the file:

[018-0001]
type= category
name= Distritos
id= http://datos.madrid.es/egob/kos/actividades/1ciudad21distritos
[018-0002]
type= category
name= Actividades calle, arte urbano
id= http://datos.madrid.es/egob/kos/actividades/ActividadesCalleArteUrbano
[018-0003-0006]
type= category
name= Carreras y maratones
id=http://datos.madrid.es/egob/kos/actividades/ActividadesDeportivas/CarrerasMaratones

I have created these 4 patterns, which work perfectly:

Pattern pCode = Pattern.compile("\\d{3}\\W(\\d{4}|\\d{4}\\W\\d{4})");
Pattern pTypeCategory = Pattern.compile(".*type= category.*");
Pattern pName = Pattern.compile(".*name=\\s(.*)");
Pattern pId = Pattern.compile(".*id=\\s(.*)");

Matcher mCode = null;
Matcher mName = null;
Matcher mTypeCategory = null;
Matcher mId = null;

I read the file as text:

File archivo = new File (sFile);
FileReader fr = null;
try {
    fr = new FileReader (archivo);
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

BufferedReader br = new BufferedReader(fr);

What gives me the most trouble is that the data may arrive out of order; the keys do not have to appear in that sequence. I had thought about building a List of Pattern objects and traversing it with a for-each loop to find out which pattern matches and, from that, which kind of line I am on.

My question is: each time I read a new line, do I have to test all 4 patterns against it to see what kind of data I am reading?

    
asked by Christian on 16.02.2018 at 23:58

1 answer

INI files

You are reading an INI file.

[seccion1]
clave1=valor1
clave2=valor2

[seccion2]
clave1=valor3
clave2=valor4

The simplest thing would be to use a library specifically for that.

For example, with ini4j:

// load the file as a Wini
File archivo = new File(sFile); // sFile is the path
Wini ini = new Wini(archivo);

System.out.format("%-15s%-12s%-25s%s%n", "Codigo", "Categoria", "Nombre", "id");

// read all the sections
Collection<Profile.Section> sectionList = ini.values();
for(Profile.Section section : sectionList){
    String code = section.getName();
    String name = section.get("name");
    String type = section.get("type");
    String id   = section.get("id");

    System.out.format("%-15s%-12s%-25s%s%n", code, type, name, id);
}


Read it manually

If you still want to parse it line by line, then instead of writing a different pattern for each key it is more convenient to use one generic pattern for section headers and another for keys, and to populate a data structure with each value read, whatever the name of each INI key is.

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.FileReader;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import java.util.Map;
import java.util.List;
import java.util.ArrayList;
import java.util.HashMap;

We define a class to hold the values of each section:

// class for one section
class Seccion {
    public String code;
    public Map<String, String> claves = new HashMap<String, String>();

    public Seccion(String newCode) {
        code = newCode;
    }
}

We use code, which holds the name of the section (passed in through the constructor). In addition, we use a Map to store any key-value pair.

This way, in the code that reads the file, we keep a List of Seccion (the class declared above), and each element of the list holds the name of the section and all its keys with their respective values:

// list with all the sections
List<Seccion> ini = new ArrayList<Seccion>();
Seccion nuevaSeccion = new Seccion("");

And we populate it as we read the file, checking whether each line matches a regex for [seccion] or one for clave=valor:

// list with all the sections
List<Seccion> ini = new ArrayList<Seccion>();
Seccion nuevaSeccion = new Seccion("");

// regexes for a section header or a key=value line
final Pattern reSeccion = Pattern.compile("^\\h*\\[(.*)\\]");
final Pattern reClave   = Pattern.compile("^\\h*([^=\\s]+)\\h*=\\h*(\\V*)");

// file
final String sFile = "ejemplo.ini";
String linea;


// READ THE FILE (try-with-resources closes the reader even if reading fails)
try (BufferedReader bufferedReader = new BufferedReader(new FileReader(sFile))) {
    while ((linea = bufferedReader.readLine()) != null) {
        // POPULATE THE LIST (ini)
        // if it is a section (the first regex matches)
        Matcher matcher = reSeccion.matcher(linea);
        if (matcher.find()) {
            String code = matcher.group(1);   // the section name captured by group 1
            nuevaSeccion = new Seccion(code); // create a new object for this section
            ini.add(nuevaSeccion);            // add it to the list
        } else {
            // if it is a key (the second regex matches)
            matcher = reClave.matcher(linea);
            if (matcher.find()) {
                String clave = matcher.group(1);      // the key name is group 1
                String valor = matcher.group(2);      // the value is group 2
                nuevaSeccion.claves.put(clave,valor); // add it to the Map of keys of the current section
            }
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}


And then we can print all the values:

// PRINT ALL THE VALUES IN THE LIST
System.out.format("%-14s%-10s%-31s%s%n", "Codigo", "Categoria", "Nombre", "Id");
for (Seccion seccion : ini) {
    System.out.format(
        "%-14s%-10s%-31s%s%n", 
        seccion.code, 
        seccion.claves.get("type"), 
        seccion.claves.get("name"), 
        seccion.claves.get("id")
    );
}


Result:

Codigo        Categoria Nombre                         Id
018-0001      category  Distritos                      http://datos.madrid.es/egob/kos/actividades/1ciudad21distritos
018-0002      category  Actividades calle, arte urbano http://datos.madrid.es/egob/kos/actividades/ActividadesCalleArteUrbano
018-0003-0006 category  Carreras y maratones           http://datos.madrid.es/egob/kos/actividades/ActividadesDeportivas/CarrerasMaratones
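
Because every key goes into a Map, the order of the keys inside a section does not matter, which covers the out-of-order concern from the question. The following is only a minimal sketch of that point, assuming the text is already in memory: the class name IniOrderDemo, the method parse and the use of the extra key "code" for the section name are made up for illustration, and the lines are split with "\\R" instead of being read with a BufferedReader.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: same two-regex approach, fed from an in-memory string.
final class IniOrderDemo {
    private static final Pattern RE_SECCION = Pattern.compile("^\\h*\\[(.*)\\]");
    private static final Pattern RE_CLAVE = Pattern.compile("^\\h*([^=\\s]+)\\h*=\\h*(\\V*)");

    // Returns one map per section; the section name is stored under the
    // made-up key "code" (an assumption: no real key is called "code").
    static List<Map<String, String>> parse(String text) {
        List<Map<String, String>> secciones = new ArrayList<>();
        Map<String, String> actual = null; // no section seen yet
        for (String linea : text.split("\\R")) {
            Matcher m = RE_SECCION.matcher(linea);
            if (m.find()) {
                actual = new LinkedHashMap<>();
                actual.put("code", m.group(1));
                secciones.add(actual);
            } else if (actual != null) {
                m = RE_CLAVE.matcher(linea);
                if (m.find()) {
                    actual.put(m.group(1), m.group(2)); // key order is irrelevant
                }
            }
        }
        return secciones;
    }

    public static void main(String[] args) {
        // Keys deliberately listed as id, name, type instead of type, name, id.
        String text = "[018-0001]\n"
                + "id= http://datos.madrid.es/egob/kos/actividades/1ciudad21distritos\n"
                + "name= Distritos\n"
                + "type= category\n";
        for (Map<String, String> s : parse(text)) {
            System.out.println(s.get("code") + " | " + s.get("type")
                    + " | " + s.get("name") + " | " + s.get("id"));
        }
    }
}
```

Lines that appear before the first section header are ignored in this sketch.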




Answer to your question

  

Each time I read a new line, do I have to test all 4 patterns against it to see what kind of data I am reading?

Although it is not the most convenient option for your case, you can join several expressions into one with alternation, for example:

(regex1)|(regex2)|(regex3)|(etc)

And then check whether it matched the first expression:

if (matcher.group(1) != null) {

or with the second:

} else if (matcher.group(2) != null) {

... etc.
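
As a minimal sketch of that idea with two toy alternatives (the class AlternationDemo, the method classify and both sub-expressions are made up for illustration): only the group belonging to the alternative that matched is non-null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: classifies a token with a single alternation regex.
final class AlternationDemo {
    // Group 1 is set only when the first alternative matched,
    // group 2 only when the second one did.
    private static final Pattern TOKEN = Pattern.compile("^(?:(\\d+)|([a-z]+))$");

    static String classify(String token) {
        Matcher matcher = TOKEN.matcher(token);
        if (!matcher.find()) {
            return "unknown"; // neither alternative matched
        }
        if (matcher.group(1) != null) {
            return "number";
        } else if (matcher.group(2) != null) {
            return "word";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify("2018"));   // prints "number"
        System.out.println(classify("madrid")); // prints "word"
        System.out.println(classify("a-1"));    // prints "unknown"
    }
}
```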


For example, for your particular case:

final String regex    = "^\\h*(?:\\[(\\d{3}(?:-\\d{4}){1,2})\\]|(?:(type)|(name)|(id))\\h*=\\h*(\\V*))$";
final Pattern pattern = Pattern.compile(regex);

String linea = "name= Actividades calle, arte urbano"; // <-- the line read from the file
final Matcher matcher = pattern.matcher(linea);

String code, type, name, id;
code = type = name = id = "";

if (matcher.find()) {
    // groups 1-4 tell which alternative matched; group 5 holds the value of a key
    if (matcher.group(1) != null) {
        code = matcher.group(1);
    } else if (matcher.group(2) != null) {
        type = matcher.group(5);
    } else if (matcher.group(3) != null) {
        name = matcher.group(5);
    } else if (matcher.group(4) != null) {
        id = matcher.group(5);
    }

    System.out.format("Code: %s - Type: %s - Name: %s - Id: %s%n", code, type, name, id);
}

But I would recommend using any of the options above before this one.
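
For comparison, the loop-over-patterns idea from the question can be sketched as follows. This is only an illustration under assumptions: the class PatternLoopDemo, the method classify and the four regexes are hypothetical, not the asker's exact patterns. They use \\s* after the equals sign so that a value written directly after "=" also matches. Every line is tested against each pattern in turn until one matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the loop-over-patterns idea; names and regexes are illustrative.
final class PatternLoopDemo {
    // Insertion-ordered, so patterns are tried in the order they are registered.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("code", Pattern.compile("^\\[(\\d{3}(?:-\\d{4}){1,2})\\]$"));
        PATTERNS.put("type", Pattern.compile("^type=\\s*(.*)$"));
        PATTERNS.put("name", Pattern.compile("^name=\\s*(.*)$"));
        PATTERNS.put("id", Pattern.compile("^id=\\s*(.*)$"));
    }

    // Returns {kind, value} for the first pattern that matches, or null if none does.
    static String[] classify(String linea) {
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher matcher = entry.getValue().matcher(linea);
            if (matcher.find()) {
                return new String[] { entry.getKey(), matcher.group(1) };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] lineas = { "[018-0003-0006]", "name= Carreras y maratones", "type= category" };
        for (String linea : lineas) {
            String[] r = classify(linea);
            System.out.println(r == null ? "no match" : r[0] + " -> " + r[1]);
        }
    }
}
```

This works regardless of the order of the lines, but it runs up to four matches per line, whereas the generic section/key approach above needs at most two.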

    
answered on 06.03.2018 at 06:42