set.seed() is a function used to set the initial state of the random number generator (RNG). In your example, you are establishing the "seed": an integer from which the subsequent random numbers (strictly speaking, pseudo-random numbers) are calculated. This guarantees that, starting from the same seed, we always obtain the same sequence of random values.
In R it is very important to be able to share reproducible code. That is why, when using functions that rely on the RNG, such as sample(), runif() and rnorm(), to name only a few, setting an initial "seed" makes the random values repeatable; that is, whoever reproduces the code will get the same values the author originally obtained.
For example, if I share this code:
> rnorm(10)
[1] -0.3315776 1.1207127 0.2987237 0.7796219 1.4557851 -0.6443284 -1.5531374
[8] -1.5977095 1.8050975 -0.4816474
If you try to run it, you will get a different set of numbers. However, by doing this:
> set.seed(12345)
> rnorm(10)
[1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875 -1.8179560 0.6300986
[8] -0.2761841 -0.2841597 -0.9193220
You should get the same series of numbers.
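A quick way to check this yourself is to set the same seed twice and compare the draws. This is only a minimal sketch; the variable names first, second and third are just for illustration:

```r
# Draw twice, resetting the same seed before each draw:
# the two sequences should match exactly.
set.seed(12345)
first <- rnorm(10)

set.seed(12345)
second <- rnorm(10)

identical(first, second)  # TRUE

# Without resetting the seed, the generator simply keeps advancing,
# so the next draw is a different sequence.
third <- rnorm(10)
identical(first, third)   # FALSE
```

The seed only fixes the starting point: every call that consumes random numbers moves the generator forward, so the order of the calls matters as well.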
Now, returning to your question, set.seed() does not seem to have any effect on your code. data(iris) simply loads a data.frame included with R; in fact, it is not even necessary, since you can access iris directly without problems, and with datos <- iris we simply make a copy. Random number generation could come into play if, for example, you wanted to keep a smaller sample of iris:
set.seed(12345)
datos <- iris[sample(1:nrow(iris), 5), ]
datos
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
109          6.7         2.5          5.8         1.8  virginica
131          7.4         2.8          6.1         1.9  virginica
113          6.8         3.0          5.5         2.1  virginica
149          6.2         3.4          5.4         2.3  virginica
67           5.6         3.0          4.5         1.5 versicolor
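In this case we use sample() to generate 5 random row numbers and subset the data.frame to those rows. Note that since R 3.6.0 the default algorithm behind sample() has changed, so with a recent version of R this seed may select different rows than the ones shown above; the result is still reproducible from one run to the next.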
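If you need this kind of repeatable subset in several places, the two steps can be wrapped in a small helper. This is only a sketch: sample_rows is a hypothetical name, not a base R function, and the seed value is arbitrary.

```r
# Hypothetical helper: return n randomly chosen rows of a data.frame,
# fixing the seed first so the selection is repeatable.
sample_rows <- function(df, n, seed) {
  set.seed(seed)
  idx <- sample(seq_len(nrow(df)), n)  # n distinct row indices
  df[idx, , drop = FALSE]
}

datos1 <- sample_rows(iris, 5, seed = 12345)
datos2 <- sample_rows(iris, 5, seed = 12345)

identical(datos1, datos2)  # TRUE: same seed, same rows
nrow(datos1)               # 5
```

Keep in mind that calling set.seed() inside a function resets the global RNG state, so any random numbers generated afterwards in the same session are affected too.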