Best search algorithm in C #


I was wondering if anyone knows what kind of methods are the most efficient to implement in very intensive searches (arrays) (I speak of arrays of millions of elements) from C #, for example it is better to use IndexOf or BinarySearch to obtain the index of the item? Should I use arrays or HashSet? and how could I use the HashSet to find the matches in the array?

A logical example of the algorithm I need:

Block: 071827490593720123213023498230402000813

Value to find: 40200

Objective: Return the index where said group of numbers is

What you should return: 30

Code that I currently implement:

int offSet = 0;
while ((offSet = Array.IndexOf(bloque, encontrar[0], offSet)) != -1)
    if (encontrar.Length > 1)
        for (int i = 1; i < encontrar.Length; i++)
            if (bloque.Length <= offSet + encontrar.Length) break;
            else if (encontrar[i] != bloque[offSet + i])
                if (bloque[offSet + i] == encontrar[0])
                { offSet += (i - 1); break; }
                else if (i == encontrar.Length - 1)
                { offSet += i; break; }
            else if (i == encontrar.Length - 1)
                addresses.Add(new IntPtr((int)baseAddress + offSet));
    else addresses.Add(new IntPtr((int)baseAddress + offSet));

This algorithm is not slow, but it is not as fast looking as the program with which I am comparing it. The program that I am developing opens the processes and looks for values in its memory regions (Yes, I am comparing it with Cheat Engine). As you can see it is more or less similar to the Boyer Moore algorithm, but I need to know if I can replace functions to increase performance or if I should remove or change something in the logic of the algorithm to increase performance.


asked by Mr.Noone 01.09.2016 в 03:44

2 answers


First of all, I would do profiling of the executions to see if they tell me anything about the behavior of the code.

Things that I would try, because these things I understand are more trial and error than anything else. You may have done this at this time:

  • Separate completely the case where the length of encontrar is 1 (we get a if of loop.
  • Use a for instead of a while with what I understand you could eliminate the offSet += (i - 1) staying only with the break
  • It would seek to delegate or delay the creation of objects (that is, the addresses.Add(new IntPtr((int)baseAddress + offSet)) .) The instantiation and aggregation to the collection could be expensive with respect to the execution of the loop.


Rational: You are casting, creating an IntPtr object and adding it to address. It may be that the second particular operation is expensive compared to the rest of the algorithm, after all creating objects is not free (it is a hypothesis that you should try).

What could be done about it? Well, you could try delegating that task to another process or thread.

That is, put together a producer / consumer scheme, where the producer (your current process) passes the consumer the baseAddress + offset and the consumer creates the IntPtr and adds it to its collection.

Then at the end of the execution, you can ask the consumer to create the IntPtr that returns the generated IntPtr collection.

The idea is that in a multiprocessor environment, the search process and the creation process fall into different processors and with this you could gain better performance.

Here an article about it.

Of course, I can be wrong in everything. It may be much more interesting to read your results in the form of an answer.

answered by 08.09.2016 / 05:33

You did not evaluate using Regular Expression to find the block

with this you could search for all matches

string bloque = "071827490593720123213023498230402000813";

Regex rx = new Regex("40200");
foreach (Match match in rx.Matches(bloque ))
   int i = match.Index;

Source: Is there a function that returns index where RegEx match starts?

answered by 02.09.2016 в 18:58