First, let's set up a simpler example than yours to make things easier to follow: a data.frame
with only 3 groups of random data, each with a different mean. The values (y) and the groups (x) correspond to data1
and data2
in your example, respectively:
library(ggplot2)
set.seed(10)
df <- data.frame(x = rep(c("Grupo1", "Grupo2", "Grupo3"), each = 100),
                 y = c(rnorm(100, mean = 3, sd = 1),
                       rnorm(100, mean = 10, sd = 1),
                       rnorm(100, mean = 1, sd = 2)))
head(df,10)
x y
1 Grupo1 3.018746
2 Grupo1 2.815747
3 Grupo1 1.628669
4 Grupo1 2.400832
5 Grupo1 3.294545
6 Grupo1 3.389794
7 Grupo1 1.791924
8 Grupo1 2.636324
9 Grupo1 1.373327
10 Grupo1 2.743522
Let's start with a simple violin chart
ggplot(df, aes(x=x, y=y)) +
geom_violin() +
stat_summary(fun=mean, geom="point", shape=23, size=2, color = "black")
This gives three violins, one per group, arranged alphabetically, each with a point marking the group mean. Now for your questions:
(1) I do not know what to do to order the violins from left to right, from a lower average to a higher average.
Factors are very useful for this: internally they have an order determined by their levels.
When a factor is created automatically, for example from a character column, the levels are sorted alphabetically, so we simply have to reorder the levels:
orden <- aggregate(y ~ x, df, mean)
orden <- orden[order(orden$y),]
df$x <- factor(df$x, levels = orden$x)
- With aggregate(y ~ x, df, mean) we group by x and calculate the mean of each group.
- With orden <- orden[order(orden$y),] we sort the groups by their mean in ascending order. For descending order, use orden[order(orden$y, decreasing = TRUE),]
- Finally, we rebuild the factor with the desired level order: df$x <- factor(df$x, levels = orden$x)
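The three steps above can also be sketched in a single call with base R's reorder(), which sorts the levels of a factor by a summary of another variable. This is only an alternative to the aggregate() approach; the snippet recreates the example df so that it runs on its own, and x_desc is just an illustrative name.

```r
# Recreate the example data (same groups and means as above)
set.seed(10)
df <- data.frame(x = rep(c("Grupo1", "Grupo2", "Grupo3"), each = 100),
                 y = c(rnorm(100, mean = 3, sd = 1),
                       rnorm(100, mean = 10, sd = 1),
                       rnorm(100, mean = 1, sd = 2)))

# Order the levels of x by the mean of y within each group (ascending)
df$x <- reorder(df$x, df$y, FUN = mean)
levels(df$x)

# For descending order, sort by the mean of the negated values instead
x_desc <- reorder(df$x, -df$y, FUN = mean)
levels(x_desc)
```

Note that this route does not leave you with the orden object, so if you also want the table of means you still need the aggregate() call.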
Now you just have to plot again; the plotting code stays the same.
The groups now appear ordered by their mean.
(2) draw a horizontal line that represents the mean
For this we can reuse the orden
object created above, which simply holds the mean of each group in ascending order:
x y
3 Grupo3 1.057504
2 Grupo1 2.863451
1 Grupo2 9.905037
To draw them as lines, we add a geom_hline()
as follows:
ggplot(df, aes(x=x, y=y)) +
geom_violin() +
stat_summary(fun=mean, geom="point", shape=23, size=2, color = "black") +
geom_hline(data = orden, aes(group = x, yintercept = y), color = "red")
We get:
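As a variation on the lines above, if you would rather have a short mean line inside each violin instead of lines that cross the whole panel, one possible sketch is to let stat_summary() draw a "crossbar" whose lower and upper limits are both the mean. This assumes ggplot2 3.3.0 or later (for the fun, fun.min and fun.max arguments); the width of 0.5 is an arbitrary choice.

```r
library(ggplot2)

# Recreate the example data (same groups and means as above)
set.seed(10)
df <- data.frame(x = rep(c("Grupo1", "Grupo2", "Grupo3"), each = 100),
                 y = c(rnorm(100, mean = 3, sd = 1),
                       rnorm(100, mean = 10, sd = 1),
                       rnorm(100, mean = 1, sd = 2)))

# A crossbar with ymin = y = ymax collapses to one horizontal segment
# per group, drawn at that group's mean
p <- ggplot(df, aes(x = x, y = y)) +
  geom_violin() +
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.5, color = "red")
print(p)
```

Here the means are computed by ggplot2 itself, so the orden object is not needed for this layer.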
(3) How could I choose another range of colors that would differentiate the violins more from each other?
You have two options: either define the colors manually or use a function that returns a palette of the size needed for the groups being plotted. In either case you must set the fill
aesthetic of geom_violin
and map it to the group: geom_violin(aes(fill=x))
. This says that the fill color of each violin is determined by the variable x
(in your case data2
). Then you just have to set the colors:
Manually:
scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
From a palette already defined in R:
scale_fill_manual(values=terrain.colors(3))
Note: several palette functions are already available in base R: rainbow()
, heat.colors()
, terrain.colors()
, topo.colors()
, and cm.colors()
. They all work similarly: you ask for a number of colors and they return a vector with that many. There is also colorRampPalette()
, which lets you "generate" palettes by interpolating between the colors you pass: colfunc <- colorRampPalette(c("red", "yellow", "green"))
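A minimal sketch of how the function returned by colorRampPalette() might be used: you call it with the number of colors you need and get back a character vector of hex codes. The name colores is only illustrative.

```r
# Build a palette function that interpolates between the given colors
colfunc <- colorRampPalette(c("red", "yellow", "green"))

# Ask for as many colors as there are groups (3 in the example)
colores <- colfunc(3)
colores

# The resulting vector can be passed to ggplot2 like any manual palette:
# scale_fill_manual(values = colores)
```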
There are also many palettes to choose from that users share as packages, reflecting their own tastes and needs. One example is RColorBrewer
, whose palettes can be used directly in ggplot
through scale_fill_brewer():
library("RColorBrewer")
scale_fill_brewer(palette="Blues")
Let's see a final example:
library("RColorBrewer")
ggplot(df, aes(x=x, y=y)) +
geom_violin(aes(fill=x)) +
stat_summary(fun=mean, geom="point", shape=23, size=2, color = "black") +
geom_hline(data = orden, aes(group = x, yintercept = y), color = "red") +
scale_fill_brewer(palette="Blues")
(4) Is there a repository where you can choose different color palettes to try?
There is no specific repository for palettes; you can search for them the same way you would look for any R package.
There is, however, a very complete collection of palettes on this site.
More info: