Subsets
DataArrays
The DataArray type is meant to behave like a standard Julia Array and tries to implement identical indexing rules:
One-dimensional DataArray:
julia> using DataArrays
julia> dv = data([1, 2, 3])
3-element DataArray{Int64,1}:
1
2
3
julia> dv[1]
1
julia> dv[2] = NA
NA
julia> dv[2]
NA
Two-dimensional DataArray:
julia> using DataArrays
julia> dm = data([1 2; 3 4])
2x2 DataArray{Int64,2}:
1 2
3 4
julia> dm[1, 1]
1
julia> dm[2, 1] = NA
NA
julia> dm[2, 1]
NA
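For comparison, the indexing forms that DataArray is meant to mirror can be tried on a plain Julia Array. The following is a minimal sketch that uses only Base Julia, so it has no `NA` values. The variable names are illustrative, and whether every form shown here carries over to DataArray unchanged is an assumption based on the "identical indexing rules" statement above.

```julia
# Plain Julia arrays, used here only as a stand-in for DataArray.
v = [1, 2, 3]
m = [1 2; 3 4]

first_elem = v[1]    # scalar indexing
tail = v[2:3]        # range indexing returns a new vector
v[2] = 20            # assignment by index

corner = m[1, 1]     # row index first, then column index
col1 = m[:, 1]       # every row of column 1 (a copy)
m[2, 1] = 30         # assignment by row and column

println(first_elem, " ", tail, " ", v)
println(corner, " ", col1, " ", m)
```

Because `tail` and `col1` are copies taken before the assignments, they keep the original values even after `v` and `m` are modified.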
DataFrames
In contrast, a DataFrame offers substantially more forms of indexing because columns can be referred to by name:
julia> using DataFrames
julia> df = DataFrame(A = 1:10, B = 2:2:20)
10x2 DataFrame
| Row | A | B |
|:----|:---|:---|
| 1 | 1 | 2 |
| 2 | 2 | 4 |
| 3 | 3 | 6 |
| 4 | 4 | 8 |
| 5 | 5 | 10 |
| 6 | 6 | 12 |
| 7 | 7 | 14 |
| 8 | 8 | 16 |
| 9 | 9 | 18 |
| 10 | 10 | 20 |
Referring to the first column by index or name:
julia> df[1]
10-element DataArray{Int64,1}:
1
2
3
4
5
6
7
8
9
10
julia> df[:A]
10-element DataArray{Int64,1}:
1
2
3
4
5
6
7
8
9
10
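The single-argument forms `df[1]` and `df[:A]` shown above belong to the DataArrays-era API. As a hedged sketch, assuming a DataFrames.jl 1.x release is installed (an assumption, not something this section states), the same column lookups are written with an explicit row selector or with property access:

```julia
using DataFrames  # assumption: DataFrames.jl 1.x

df = DataFrame(A = 1:10, B = 2:2:20)

col_by_index = df[!, 1]    # first column by position, without copying
col_by_name  = df[!, :A]   # the same column, looked up by name
col_by_prop  = df.A        # property-style access to the same column
col_copy     = df[:, :A]   # `:` as the row selector returns a copy

println(col_by_index == col_by_name)
println(col_copy == col_by_prop)
```

In that API the columns are ordinary vectors rather than DataArrays, and `!` versus `:` controls whether the stored column or a copy of it is returned.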
Referring to the first element of the first column:
julia> df[1, 1]
1
julia> df[1, :A]
1
Selecting a subset of rows by index and an (ordered) subset of columns by name:
julia> df[1:3, [:A, :B]]
3x2 DataFrame
| Row | A | B |
|:----|:--|:--|
| 1 | 1 | 2 |
| 2 | 2 | 4 |
| 3 | 3 | 6 |
julia> df[1:3, [:B, :A]]
3x2 DataFrame
| Row | B | A |
|:----|:--|:--|
| 1 | 2 | 1 |
| 2 | 4 | 2 |
| 3 | 6 | 3 |
Selecting a subset of rows by using a condition (every value in B is even, so the second condition keeps all ten rows):
julia> df[df[:A] % 2 .== 0, :]
5x2 DataFrame
| Row | A | B |
|:----|:---|:---|
| 1 | 2 | 4 |
| 2 | 4 | 8 |
| 3 | 6 | 12 |
| 4 | 8 | 16 |
| 5 | 10 | 20 |
julia> df[df[:B] % 2 .== 0, :]
10x2 DataFrame
| Row | A | B |
|:----|:---|:---|
| 1 | 1 | 2 |
| 2 | 2 | 4 |
| 3 | 3 | 6 |
| 4 | 4 | 8 |
| 5 | 5 | 10 |
| 6 | 6 | 12 |
| 7 | 7 | 14 |
| 8 | 8 | 16 |
| 9 | 9 | 18 |
| 10 | 10 | 20 |
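The condition in these examples works by building a Boolean mask with one entry per row. The following minimal sketch reproduces that step with plain vectors standing in for columns A and B. It assumes a Julia version with dot-broadcast operators (`.%`, `.==`), and the variable names are illustrative.

```julia
# Plain vectors standing in for columns A and B of the DataFrame above.
A = collect(1:10)
B = collect(2:2:20)

# Elementwise remainder and comparison yield one Bool per row.
mask_A = A .% 2 .== 0
mask_B = B .% 2 .== 0

# Indexing with a Bool vector keeps the positions where the mask is true.
even_A = A[mask_A]
rows_kept_by_B = count(mask_B)

println(even_A)
println(rows_kept_by_B)
```

Here `even_A` holds 2, 4, 6, 8 and 10, matching the five rows selected by the first condition, while `rows_kept_by_B` is 10 because every value in B is even.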