Subsets
DataArrays
The DataArray type is meant to behave like a standard Julia Array and tries to implement identical indexing rules:
One dimensional DataArray:
julia> using DataArrays
julia> dv = data([1, 2, 3])
3-element DataArray{Int64,1}:
1
2
3
julia> dv[1]
1
julia> dv[2] = NA
NA
julia> dv[2]
NA
Two dimensional DataArray:
julia> using DataArrays
julia> dm = data([1 2; 3 4])
2x2 DataArray{Int64,2}:
1 2
3 4
julia> dm[1, 1]
1
julia> dm[2, 1] = NA
NA
julia> dm[2, 1]
NA
DataFrames
In contrast, a DataFrame offers substantially more forms of indexing because columns can be referred to by name:
julia> using DataFrames julia> df = DataFrame(A = 1:10, B = 2:2:20) 10x2 DataFrame | Row | A | B | |-----|----|----| | 1 | 1 | 2 | | 2 | 2 | 4 | | 3 | 3 | 6 | | 4 | 4 | 8 | | 5 | 5 | 10 | | 6 | 6 | 12 | | 7 | 7 | 14 | | 8 | 8 | 16 | | 9 | 9 | 18 | | 10 | 10 | 20 |
Refering to the first column by index or name:
julia> df[1]
10-element DataArray{Int64,1}:
1
2
3
4
5
6
7
8
9
10
julia> df[:A]
10-element DataArray{Int64,1}:
1
2
3
4
5
6
7
8
9
10
Refering to the first element of the first column:
julia> df[1, 1] 1 julia> df[1, :A] 1
Selecting a subset of rows by index and an (ordered) subset of columns by name:
julia> df[1:3, [:A, :B]] 3x2 DataFrame | Row | A | B | |-----|---|---| | 1 | 1 | 2 | | 2 | 2 | 4 | | 3 | 3 | 6 | julia> df[1:3, [:B, :A]] 3x2 DataFrame | Row | B | A | |-----|---|---| | 1 | 2 | 1 | | 2 | 4 | 2 | | 3 | 6 | 3 |
Selecting a subset of rows by using a condition:
julia> df[df[:A] % 2 .== 0, :] 5x2 DataFrame | Row | A | B | |-----|----|----| | 1 | 2 | 4 | | 2 | 4 | 8 | | 3 | 6 | 12 | | 4 | 8 | 16 | | 5 | 10 | 20 | julia> df[df[:B] % 2 .== 0, :] 10x2 DataFrame | Row | A | B | |-----|----|----| | 1 | 1 | 2 | | 2 | 2 | 4 | | 3 | 3 | 6 | | 4 | 4 | 8 | | 5 | 5 | 10 | | 6 | 6 | 12 | | 7 | 7 | 14 | | 8 | 8 | 16 | | 9 | 9 | 18 | | 10 | 10 | 20 |