Great efficiency problem. Tips to optimize hash table in C?

2

Good morning, I am implementing the Greedy algorithm to color graphs. Graphs can have up to 1 000 000 of vertices. All integers are uint_32 , and to save them I use a hash table that uses dynamic arrays to solve collisions (and not linked lists because I read that in this way is optimized).

In theory, the algorithm should run in minutes, but only loading the graph takes me an hour .. Any recommendations? In the code I use calloc instead of malloc because I read that for hash tables it is recommended.

Do you recommend me to use malloc ? Or maybe the algorithm is very inefficient because my operating system is slow (I have ubuntu 32 bits with 1 Gb of RAM).

Here I show the code of the structures and how they are created.

struct _tabla_type {
     uint32_t tamanho;
     elemento_t *tabla; //array dinamico de elemenot_t
};

//inicializar (n = numero de vertices)
tabla_type nuevatabla(uint32_t n) {
    assert(n > 0);
    size_t h = sizeStructelem(); // el tamaño de la structura correspondiente (esta en otro .c por lo que no la puedo acceder directamente)
    tabla_type table = NULL;
    table = calloc(1, sizeof( struct _tabla_type)) ;
    table->tamanho = n;
    table->tabla = calloc(n,  sizeof(h));
    for (uint32_t i = 0 ; i < table->tamanho; i++)
    {
     table->tabla[i] = NULL;
    }

    return table;
}

// Cada indice de la tabla anterior tiene un elemento de la sig
// estructura, que es un array de dinamicos de vertices que colisionaron
//por la funcion hash
struct _elemento_t{
    VerticeSt *array_vertex;
    uint32_t used;
    uint32_t size;
};

// agregar elemento 
elemento_t newElement(uint32_t vertice, uint32_t vecino, elemento_t b) {
    VerticeSt Vertice = NuevoVertice(vertice,vecino);
    if (b == NULL){
        b = calloc(1, sizeof(struct _elemento_t));
        b->array_vertex = calloc(1, sizeof(struct _VerticeSt));
        b->array_vertex[0] = Vertice;
        b->used = 1;
        b->size = 1;
    }
    else{
        if (b->used == b->size) {
            b->size *= 2;
            b->array_vertex = (VerticeSt *)realloc(b->array_vertex, b->size * sizeof(struct _VerticeSt));
      }
      b->array_vertex[b->used++] = Vertice;
  }
 return b;
}


// en los correspondientes .h esta typedef struct _tabla_type *tabla_type

// y typedef struct _elemnto_t *elemento_t
    
asked by Igna94 29.04.2016 в 19:37
source

2 answers

1

I think the problem is that you make millions of dynamic memory requests (it seems like at least 2 million)

You have a dynamic structure with very small but extremely numerous elements.

Depending on the hardware that takes an hour, it seems to me until a short time.

The solution is to reduce the dynamic memory requests, I can not be more specific because I would lack the code of the implementation but I would choose not to make memory requests of each of the vertices. Either use static memory or look for another solution, but that's crushed your performance.

Apart from that, if you really want speed, you should review the use of SSE assembler . It's the way to get maximum speed and I think that for you, you could do well.

The only problem is that you have to consider the memory structures so they can make the most of the speed, but that will be another question: D

    
answered by 12.06.2016 в 09:06
1

Mathematically, a graph and a binary relation is the same (isomorphism).

A binary relationship can be "represented" with NxN Booleans.

If the number of booleans to true (it can be said that the number of links in the graph) is small compared to the number NxN , one of the possible data structures to represent it is a "Sparse Matrix of Booleans ".

If the nodes are not nodes of numbers ordered from (0 or 1) to N, but Strings or any number, a hash table is needed for each String to associate an index for the Sparse Matrix.

An example of Sparse Matrix: link

    
answered by 05.04.2017 в 12:02