Greetings to all. I am just getting started with the R language and there are some instructions whose purpose I do not understand, like the following:


I found that the 1st line has to do with generating sequences of random numbers, but I do not understand how it affects the lines that follow, which other instructions would make its effect easier to see, or how it could be used in other cases.

When searching, I found that the 2nd line is used to load an object into memory, in this case R's own iris object. But when I execute this instruction I do not see any significant change in any tab of the console, or any change in how the rest of the code behaves, even when the value of iris is then assigned to a data frame called datos.

set.seed() is a function used to set the initial state of the random number generator (RNG). In your example, what you are doing is establishing the "seed": an integer from which all subsequent random numbers are calculated (strictly speaking, they are pseudo-random). Doing this guarantees that, starting from the same seed, we will always obtain the same sequence of random values.

In R it is very important to be able to share reproducible code. That is why, when using functions that rely on the RNG (for example sample(), runif() and rnorm(), to name only a few), setting an initial "seed" makes the random values repeatable, i.e. whoever runs the code will get the same values the author originally obtained.

For example, if I share this code:

> rnorm(10)
 [1] -0.3315776  1.1207127  0.2987237  0.7796219  1.4557851 -0.6443284 -1.5531374
 [8] -1.5977095  1.8050975 -0.4816474

If you try to run it, you will get a different set of numbers. However, by doing this:

> set.seed(12345)
> rnorm(10)
 [1]  0.5855288  0.7094660 -0.1093033 -0.4534972  0.6058875 -1.8179560  0.6300986
 [8] -0.2761841 -0.2841597 -0.9193220

You should get the same series of numbers.
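
One way to check this yourself, without comparing numbers by eye, is sketched below. The variable names first, second and third are only illustrative, and the seed 12345 is the same arbitrary value used above.

```r
# Reset the seed before each call so both calls start from the same RNG state
set.seed(12345)
first <- rnorm(10)

set.seed(12345)
second <- rnorm(10)

identical(first, second)   # TRUE: same seed, same sequence

# Without resetting the seed, the generator simply continues its sequence
third <- rnorm(10)
identical(first, third)    # FALSE: these are the next 10 values
```

The seed only needs to be set once at the top of a script; every later call that uses the RNG then becomes repeatable as long as the calls run in the same order.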

Now, returning to your question: in the code you show, set.seed() does not appear to have any effect, because nothing after it draws random numbers. data(iris) does load a data.frame included with R, but the call is not actually necessary, since you can access iris directly without problems, and with datos <- iris we simply make a copy. The random number generator would come into play if, for example, you wanted to keep a smaller random sample of iris:

datos <- iris[sample(1:nrow(iris), 5), ]

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
109          6.7         2.5          5.8         1.8  virginica
131          7.4         2.8          6.1         1.9  virginica
113          6.8         3.0          5.5         2.1  virginica
149          6.2         3.4          5.4         2.3  virginica
67           5.6         3.0          4.5         1.5  versicolor 

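In this case we use sample() to generate 5 random row numbers and subset the data.frame to those rows. Because the rows are chosen at random, you will normally get different rows each time you run it.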
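
This is where set.seed() becomes useful in your code. A minimal sketch, assuming an arbitrary seed of 12345 and the illustrative names rows and datos2 (neither comes from your original code):

```r
# iris ships with R (datasets package), so no data(iris) call is needed here
set.seed(12345)                    # arbitrary seed, chosen only for illustration
rows <- sample(nrow(iris), 5)      # 5 random row numbers between 1 and nrow(iris)
datos <- iris[rows, ]

# Re-seeding before sampling again reproduces exactly the same subset
set.seed(12345)
datos2 <- iris[sample(nrow(iris), 5), ]

identical(datos, datos2)           # TRUE
```

Anyone who runs this with the same seed and the same version of R should end up with the same 5 rows in datos.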

