Finding data
Frequently, we need to find data (i.e. rows of the table) that matches certain criteria, and there are multiple mechanisms for achieving this in Julia. Here we will briefly review map, findall and filter as options.
map(predicate, table)
Following the previous section, we can identify the rows satisfying an arbitrary predicate using the map function. Note that "predicate" is just a name for a function that takes an input and returns either true or false.
julia> t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37])
Table with 2 columns and 3 rows:
name age
┌─────────────
1 │ Alice 25
2 │ Bob 42
3 │ Charlie 37
julia> is_old = map(row -> row.age > 40, t)
3-element Array{Bool,1}:
false
true
false
Finally, we can use "logical" (i.e. Boolean) indexing to extract the rows where the predicate is true.
julia> t[is_old]
Table with 2 columns and 1 row:
name age
┌──────────
1 │ Bob 42
The map(predicate, table) approach will allocate one Bool for each row in the input table, for a total of length(table) bytes. SplitApplyCombine defines a mapview function to do this lazily.
findall(predicate, table)
If we wish to locate the indices of the rows where the predicate returns true, we can use Julia's findall function.
julia> inds = findall(row -> row.age > 40, t)
1-element Array{Int64,1}:
2
julia> t[inds]
Table with 2 columns and 1 row:
name age
┌──────────
1 │ Bob 42
This method may be less resource intensive (that is, allocate less memory) if you expect a small number of matching rows, since it returns one Int per result.
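To make the memory trade-off between map and findall concrete, here is a minimal sketch. It assumes TypedTables is installed; the table big and its id column are hypothetical and exist only for illustration.

```julia
using TypedTables

# Hypothetical larger table: 1000 rows, of which only 10 match the predicate.
big = Table(id = collect(1:1000), age = collect(1:1000))

mask = map(row -> row.age > 990, big)      # one Bool per row of the table
inds = findall(row -> row.age > 990, big)  # one Int per matching row

# sizeof on an Array reports the number of bytes used by its elements.
println("mask: ", sizeof(mask), " bytes for ", length(mask), " entries")
println("inds: ", sizeof(inds), " bytes for ", length(inds), " entries")

# Both approaches select the same rows.
@assert big[mask] == big[inds]
```

When most rows match, the comparison reverses, because each index is larger than a single Bool.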
filter(predicate, table)
Finally, if we wish to directly filter the table and obtain the rows of interest, we can do that as well.
julia> filter(row -> row.age > 40, t)
Table with 2 columns and 1 row:
name age
┌──────────
1 │ Bob 42
Internally, the filter method may rely on one of the implementations above.
Generators
Julia's "generator" syntax also allows for filtering operations using if.
julia> Table(row for row in t if row.age > 40)
Table with 2 columns and 1 row:
name age
┌──────────
1 │ Bob 42
This can be combined with mapping at the same time, as in Table(f(row) for row in table if predicate(row)). In Joining Data we discuss how to use generator syntax to combine multiple datasets.
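As a small sketch of combining mapping and filtering in one generator, the example below keeps the rows with age above 30 and upper-cases each name in the same pass. The threshold and the choice of uppercase are illustrative assumptions, not something prescribed by the package.

```julia
using TypedTables

t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37])

# The `if` clause filters the rows; the expression before `for` maps each
# surviving row to a new named tuple, which becomes a row of the result.
older = Table((name = uppercase(row.name), age = row.age) for row in t if row.age > 30)

@assert older.name == ["BOB", "CHARLIE"]
@assert older.age == [42, 37]
```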
Preselection
As mentioned in other sections, it is frequently worthwhile to preselect the columns relating to your search predicate, so that values in columns you don't care about are never fetched from memory.
One simple example of such a transformation is to first project to the column(s) of interest, then use map or findall to identify the indices of the rows where the predicate is true, and finally use getindex or view to obtain the matching rows of the full table.
julia> inds = findall(age -> age > 40, t.age)
1-element Array{Int64,1}:
2
julia> t[inds]
Table with 2 columns and 1 row:
name age
┌──────────
1 │ Bob 42
Easy, peasy!
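The final step can use either getindex or view. The following sketch shows both on the same indices, assuming TypedTables is installed; the variable names copied and viewed are illustrative.

```julia
using TypedTables

t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37])

# Search only the column that the predicate needs.
inds = findall(age -> age > 40, t.age)

# getindex copies the matching rows into a new table...
copied = t[inds]

# ...while view refers back to the columns of `t` instead of copying them.
viewed = view(t, inds)

@assert copied.name == ["Bob"]
@assert viewed.name == ["Bob"]
```

A view avoids the copy, which can matter for large selections, at the cost of keeping the original table alive for as long as the view is in use.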