np.empty is said to create a "junk" array, useful only for fixing the type of data the array will hold, while np.array returns an array. In which cases is it more convenient to use each of them?
First of all, keep in mind that both functions return a NumPy array, specifically a numpy.ndarray object.
numpy.empty creates an array of the specified shape and data type (float64 by default if no type is given), reserving the necessary memory for it. The most important thing to keep in mind is that it does not initialize the array: the value of its elements is indeterminate, that is, the array contains "junk values". Those values are simply whatever happened to be stored in that region of memory beforehand, typically left over from earlier allocations.
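As a small sketch of the point above (the variable names are only illustrative), the shape and dtype of the result are well defined even though its contents are not:

```python
import numpy as np

# Shape and dtype are fixed at creation time; the element values are not.
a = np.empty(3)                   # dtype defaults to float64
b = np.empty((2, 3), dtype=int)   # 2x3 array of the platform's default integer type

print(type(a).__name__, a.dtype, a.shape)
print(b.shape, np.issubdtype(b.dtype, np.integer))
# The contents of a and b are arbitrary and may differ from run to run,
# so nothing should be read from them before they are assigned.
```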
Not initializing the array has the advantage of avoiding the small overhead of setting each and every element to a given value, such as 0 in an array of integers. That saving rarely justifies the practice, though, because it comes with a serious drawback: if we do not go through the array afterwards and assign an appropriate value to every element, we can get unexpected results when we operate on the unassigned positions by mistake.
An example:
>>> import numpy as np
>>> arr = np.empty(4, dtype=int)
>>> arr[0] = 2
>>> arr[1] = 4
>>> arr[2] = 1
>>> sum(arr)
504
2 + 4 + 1 = 504? In this trivial example the problem is easy to spot, but in complex code it can lead to bugs that are hard to track down or, worse, to wrong results that go unnoticed at first. The cause is simple: we "forgot" to assign a value to the last element of the array, and since the array is not initialized when we create it, who knows what that position contains:
>>> arr = np.empty(4, dtype=int)
>>> arr
array([120313928, 497, 54957592, 497])
As a general rule, it is good practice to initialize the array. If you do not, you have to handle it very carefully and be absolutely sure that every element will be filled with an appropriate value. We can use numpy.zeros / numpy.zeros_like (initialize all values to 0), numpy.ones / numpy.ones_like (initialize all values to 1) or numpy.full / numpy.full_like (initialize all values to a given value).
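A minimal sketch of these initializing functions, reusing the earlier sum example with an initialized array (the variable names are only illustrative):

```python
import numpy as np

zeros = np.zeros(4, dtype=int)   # array([0, 0, 0, 0])
ones = np.ones((2, 2))           # 2x2 array of 1.0 (float64 by default)
sevens = np.full(3, 7)           # array([7, 7, 7])
like = np.zeros_like(sevens)     # same shape and dtype as sevens, all 0

# The same assignments as in the example above, on an initialized array:
arr = np.zeros(4, dtype=int)
arr[0] = 2
arr[1] = 4
arr[2] = 1
print(arr.sum())  # 7, because the forgotten element is 0 rather than junk
```

The forgotten element is still a bug, but it now produces the same result on every run, which makes it much easier to find.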
The functions above are useful when we want to create an array to work with but do not yet have the values to fill it with.
numpy.array, on the other hand, builds an array from existing data held in any array-like object, such as a list, a tuple or another array. It is therefore the usual choice when we already know the initial values of the array.
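For illustration, a short sketch of numpy.array with a few array-like inputs (the data and variable names are made up for the example):

```python
import numpy as np

from_list = np.array([2, 4, 1, 3])         # from a list
from_tuple = np.array((1.5, 2.5))          # from a tuple
from_nested = np.array([[1, 2], [3, 4]])   # nested lists give a 2-D array
copy_of = np.array(from_list)              # from another array (copied by default)

print(from_list.sum())     # 10
print(from_tuple.dtype)    # float64
print(from_nested.shape)   # (2, 2)

# Because np.array copies by default, changing the copy leaves the source intact.
copy_of[0] = 99
print(from_list[0])        # 2
```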