Read Excel with Apache POI in Java by column name

3

Good afternoon, experts. I have a project that reads an Excel file and validates its data column by column. Using the Apache POI library I can read the file without problems and validate the values with regular expressions. The catch is that I can only identify columns by index (index 0 = column 1, index 1 = column 2, etc.) through the method getColumnIndex(). I need to read them by column name instead, because the position of the columns in the file tends to change. Is there a method that solves this? I have tried:

columnIndex = cell.getSheet().getRow(rowIndex).getCell(0).getRichStringCellValue().toString();

But that way I can only read one row, not all the columns and rows.

I attach the code I use to read the files:

 Workbook workbook = WorkbookFactory.create(new FileInputStream("C:\\archivo.xlsx"));

    Sheet sheet = workbook.getSheetAt(0);
    totalRows = sheet.getPhysicalNumberOfRows();
    System.out.println("Número total de filas: " + totalRows);
    Iterator<Row> rowIterator = sheet.iterator();

    while (rowIterator.hasNext()) {
        Row row = rowIterator.next();
        rowIndex = row.getRowNum();

        int rowIndex2 = rowIndex +1;

        if (rowIndex2 < 8) {
            continue;
        }
        Iterator<Cell> cellIterator = row.cellIterator();

        while (cellIterator.hasNext()) {

            cell = cellIterator.next();
            columnIndex = cell.getColumnIndex();


            switch (columnIndex) {
                case 0:
                    columnName = "columna 1";
                    //System.out.println(columnName+" -> " + rowIndex);
                    break;
                case 1:
                    columnName = "Columna 2";
                    break;
                case 2:
                    columnName = "Columna 3";
                    break;

            }


            value = this.getValue(cell);
            valid = this.isValid(columnIndex, value);

            // compare string content with isEmpty(); != only compares references
            if (valid && !value.isEmpty()) {
                continue;
            }


            if (!value.isEmpty())
            {
            System.out.print("Valor no válido: " + columnName + " - " + rowIndex2);
            System.out.println(" -> valor no válido: " + value);
            }

        }

    }

    return procesarBTR();
}

private String getValue(Cell cell) {
    switch (cell.getCellType()) {
        case Cell.CELL_TYPE_BLANK:
            // return columnName;
            return "";

        case Cell.CELL_TYPE_BOOLEAN:
             return "CELL_TYPE_BOOLEAN";
           // return cell.getStringCellValue();

        case Cell.CELL_TYPE_ERROR:
            return "CELL_TYPE_ERROR";

        case Cell.CELL_TYPE_FORMULA:

            /*switch (cell.getCachedFormulaResultType()) {
                case Cell.CELL_TYPE_STRING:
                    return cell.getStringCellValue();
                case Cell.CELL_TYPE_NUMERIC:
                    return cell.getNumericCellValue() + "";
                default:
                    return "";}*/
            return cell.getStringCellValue();


        case Cell.CELL_TYPE_NUMERIC:
            return fmt.formatCellValue(cell);

        case Cell.CELL_TYPE_STRING:
            return cell.getStringCellValue();

        default:
            return "valor desconocido";

    }

}

Code that performs validations . . .

    
asked by ricardo 10.03.2016 at 00:39

1 answer

1

As I mentioned in the comments (adapted):

Excel has no concept of a named column; a "column name" is just text in an ordinary cell. So no tool can read columns by their "name" directly. POI reads rows through the class Row and the cells of each row through the class Cell. From what you describe, your best option would be:

  1. Declare a map that serves as metadata, relating each column name to its column position (number).
  2. In the Excel file and sheet to analyze, go to the row that holds the names of the columns.
  3. For each cell in that row:

    3.1. If the cell holds text and is not empty, store a new entry in the map, with the text value of the cell as the key and the column number as the value.

  4. Once the metadata of your file has been collected, parse the rest of the data in the Excel file, reading only the columns you want and locating them through the metadata map. This assumes you have a collection or structure holding the "column names" to process.
  5. An example of the algorithm described above follows (IMPORTANT: it assumes that there are no merged cells and that the column names occupy a single row):

    //step 0. Define a collection with the names of the columns to process
    //note that you could read this from a configuration file,
    //user input or any other source
    List<String> columnas = Arrays.asList("Apellido", "Prima");
    //step 1.
    Map<String, Integer> mapNombresColumnas = new HashMap<>();
    //step 2.
    //number of the row that holds the column names
    //remember that POI uses 0-based indexes
    //if your names are in row 1, then you should initialize this
    //variable with 0.
    final int filaNombresColumnas = ...;
    //location of the Excel file to process
    File archivoExcel = new File("/ruta/de/archivo/excel.xlsx");
    //open the file with POI
    Workbook workbook = WorkbookFactory.create(archivoExcel);
    //go to the sheet you are going to process
    //for the first sheet, pass 0
    Sheet sheet = workbook.getSheetAt(...);
    //get the row with the column names
    Row filaNombresColumna = sheet.getRow(filaNombresColumnas);
    //step 3.
    //using the power of Java 8
    //iterate over the header row obtained above
    filaNombresColumna.cellIterator().forEachRemaining(cell -> {
        //step 3.1.
        String valorCelda = cell.getStringCellValue().trim();
        if (!valorCelda.isEmpty()) {
            mapNombresColumnas.put(valorCelda, cell.getColumnIndex());
        }
    });
    //step 4.
    //it is assumed that the values to process start in the row
    //right after the row that holds the column names
    int indiceDatos = filaNombresColumnas + 1;
    Row filaDatos = null;
    //walk through all the rows that contain data
    while ((filaDatos = sheet.getRow(indiceDatos++)) != null) {
        //only the cells under the requested column "names" are processed
        for (String col : columnas) {
           //the result of mapNombresColumnas.get(col) is
           //the number of the column to read
           //in this case, the result is just printed
           //replace this with whatever processing you need
           System.out.print(filaDatos.getCell(mapNombresColumnas.get(col)) + " ");
        }
        System.out.println();
    }
    

    For an Excel file where the first sheet has the following structure:

            A           B         C      D
    1    Nombre   | Apellido  | Edad | Prima
    2    Luis     | Hernández |   28 | 150.43
    3    José     | Zavala    |   35 | 170.21
    4    Mercedes | Velásquez |   17 | 112.86
    

    And using:

    final int filaNombresColumnas = 0;
    //...
    Sheet sheet = workbook.getSheetAt(0);
    

    The following result is obtained:

    Hernández 150.43 
    Zavala 170.21 
    Velásquez 112.86 
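
The same header map can also drive the regular-expression validation from the question, so that rules are keyed by column name rather than by index. The following is only a minimal sketch under stated assumptions: it uses plain lists of strings as a stand-in for the text of each POI cell (for example, what DataFormatter.formatCellValue(cell) would return), and the class name, the method names, the regex rules and the invalid value "abc" are all hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: rows are lists of cell text.
public class ColumnNameValidation {

    // Maps each non-blank header text to its zero-based column index.
    public static Map<String, Integer> buildHeaderIndex(List<String> headerRow) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < headerRow.size(); i++) {
            String name = headerRow.get(i) == null ? "" : headerRow.get(i).trim();
            if (!name.isEmpty()) {
                index.put(name, i);
            }
        }
        return index;
    }

    // rows.get(0) is the header row; every other row is data.
    // Returns one message per missing column or non-empty invalid cell.
    public static List<String> validate(List<List<String>> rows, Map<String, Pattern> rules) {
        List<String> errors = new ArrayList<>();
        Map<String, Integer> header = buildHeaderIndex(rows.get(0));
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Integer col = header.get(rule.getKey());
            if (col == null) {
                // Avoids unboxing a null index when the header is absent.
                errors.add("Missing column: " + rule.getKey());
                continue;
            }
            for (int r = 1; r < rows.size(); r++) {
                List<String> row = rows.get(r);
                String value = (col < row.size() && row.get(col) != null) ? row.get(col) : "";
                if (!value.isEmpty() && !rule.getValue().matcher(value).matches()) {
                    // r + 1 is the 1-based row number, as shown in Excel.
                    errors.add("Row " + (r + 1) + ", " + rule.getKey() + ": " + value);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("Nombre", "Apellido", "Edad", "Prima"),
            Arrays.asList("Luis", "Hernández", "28", "150.43"),
            Arrays.asList("José", "Zavala", "abc", "170.21")); // "abc" is made-up bad data
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("Edad", Pattern.compile("\\d+"));
        rules.put("Prima", Pattern.compile("\\d+\\.\\d{2}"));
        validate(rows, rules).forEach(System.out::println); // prints "Row 3, Edad: abc"
    }
}
```

Because the rules are looked up through the header map, moving the "Edad" or "Prima" column to another position in the sheet would not require changing the validation code. A rule whose column name is missing from the header row is reported rather than failing with a NullPointerException.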
    
        
    answered by 03.04.2016 at 08:08