polars_lazy

Module dsl

Domain specific language for the Lazy API.

This DSL revolves around the Expr type, which represents an abstract operation on a DataFrame, such as mapping over a column, filtering, group_by, or aggregation. In general, functions on LazyFrames consume the LazyFrame and produce a new LazyFrame representing the result of applying the function and passed expressions to the consumed LazyFrame. At runtime, when LazyFrame::collect is called, the expressions that comprise the LazyFrame’s logical plan are materialized on the actual underlying Series. For instance, let expr = col("x").pow(lit(2)).alias("x2"); would produce an expression representing the abstract operation of squaring the column "x" and naming the resulting column "x2", and to apply this operation to a LazyFrame, you’d use let lazy_df = lazy_df.with_column(expr);. (Of course, a column named "x" must either exist in the original DataFrame or be produced by one of the preceding operations on the LazyFrame.)

There are many, many free functions that this module exports that produce an Expr from scratch; col and lit are two examples. Expressions also have several methods, such as pow and alias, that consume them and produce a new expression.

Several expressions are only available when the necessary feature is enabled. Examples of features that unlock specialized expressions include string, temporal, and dtype-categorical. These specialized expressions provide implementations of functions that you’d otherwise have to implement by hand.

Because of how abstract and flexible the Expr type is, care must be taken to ensure you only attempt to perform sensible operations with it. For instance, as mentioned above, you have to make sure any columns you reference already exist in the LazyFrame. Furthermore, there is nothing stopping you from calling, for example, any with an expression that will yield an f64 column (instead of bool), or col("string") - col("f64"), which would attempt to subtract an f64 Series from a string Series. These kinds of invalid operations will only yield an error at runtime, when collect is called on the LazyFrame.

Re-exports§

pub use functions::*;

Modules§

binary
cat (feature: dtype-categorical)
dt (feature: temporal)
function_expr
functions
Functions
string (feature: strings)
udf

Structs§

ArrayNameSpace
Specialized expressions for Series of DataType::Array.
CategoricalNameSpace
Specialized expressions for Categorical dtypes.
ChainedThen
Utility struct for the when-then-otherwise expression.
ChainedWhen
Utility struct for the when-then-otherwise expression.
DatetimeArgs
Arguments used by datetime in order to produce an Expr of Datetime
DurationArgs
Arguments used by duration in order to produce an Expr of Duration
ExprNameNameSpace
Specialized expressions for modifying the name of existing expressions.
FieldsMapper
JoinOptions
ListNameSpace
Specialized expressions for Series of DataType::List.
MetaNameSpace
Specialized expressions for inspecting and modifying the metadata of expressions.
RollingCovOptions
SpecialEq
Wrapper type that has special equality properties depending on the inner type specialization
StrptimeOptions
StructNameSpace
Specialized expressions for Struct dtypes.
Then
Utility struct for the when-then-otherwise expression.
UnpivotArgsDSL
UserDefinedFunction
Represents a user-defined function
When
Utility struct for the when-then-otherwise expression.

Enums§

AggExpr
BooleanFunction
CategoricalFunction
Excluded
Expr
Expressions that can be used in various contexts.
FunctionExpr
JoinTypeOptionsIR
LazySerde
NestedType
Operator
PowFunction
Selector
StringFunction
StructFunction
TemporalFunction
WindowMapping
WindowType

Traits§

BinaryUdfOutputField
ColumnBinaryUdf
A wrapper trait for any binary closure Fn(Column, Column) -> PolarsResult<Column>
ColumnsUdf
A wrapper trait for any closure Fn(Vec<Series>) -> PolarsResult<Series>
ExprEvalExtension (feature: cumulative_eval or list_eval)
FunctionOutputField
IntoListNameSpace (feature: list_eval)
ListNameSpaceExtension (feature: list_eval)
RenameAliasFn
UdfSchema

Functions§

all
Selects all columns. Shorthand for col("*").
all_horizontal
Create a new column with the bitwise-and of the elements in each row.
any_horizontal
Create a new column with the bitwise-or of the elements in each row.
apply_binary
Like map_binary, but used in a group_by-aggregation context.
apply_multiple
Apply a function/closure over the groups of multiple columns. This should only be used in a group_by aggregation.
arange
Generate a range of integers.
arg_sort_by (feature: range)
Find the indexes that would sort these series in order of appearance.
arg_where (feature: arg_where)
Get the indices where condition evaluates true.
as_struct
Take several expressions and collect them into a StructChunked.
avg
Find the mean of all the values in the column named name. Alias for mean.
binary_expr
Compute op(l, r) (or equivalently l op r). l and r must have types compatible with the Operator.
cast
Casts the column given by Expr to a different type.
coalesce
Folds the expressions from left to right keeping the first non-null values.
col
Create a Column Expression based on a column name.
cols
Select multiple columns by name.
concat_arr
Horizontally concatenate columns into a single array-type column.
concat_expr
concat_list
Concat lists entries.
concat_str (features: concat_str and strings)
Horizontally concat string columns in linear time
cov
Compute the covariance between two columns.
cum_fold_exprs (feature: dtype-struct)
Accumulate over multiple columns horizontally / row wise.
cum_reduce_exprs (feature: dtype-struct)
Accumulate over multiple columns horizontally / row wise.
date_ranges (feature: temporal)
Create a column of date ranges from a start and stop expression.
datetime
Construct a column of Datetime from the provided DatetimeArgs.
datetime_range (feature: dtype-datetime)
Create a datetime range from a start and stop expression.
datetime_ranges (feature: dtype-datetime)
Create a column of datetime ranges from a start and stop expression.
dtype_col
Select multiple columns by dtype.
dtype_cols
Select multiple columns by dtype.
duration
Construct a column of Duration from the provided DurationArgs
first
First column in a DataFrame.
fold_exprs
Accumulate over multiple columns horizontally / row wise.
format_str (features: concat_str and strings)
Format the results of an array of expressions using a format string
index_cols
Select multiple columns by index.
int_range
Generate a range of integers.
int_ranges
Generate a range of integers for each row of the input columns.
is_not_null
A column which is false wherever expr is null, true elsewhere.
is_null
A column which is true wherever expr is null, false elsewhere.
last
Last column in a DataFrame.
len
Return the number of rows in the context.
linear_space
Generate a series of equally-spaced points.
lit
Create a Literal Expression from L. A literal expression behaves like a column that contains a single distinct value.
map_binary
Apply a closure on the two columns that are evaluated from Expr a and Expr b.
map_list_multiple
Apply a function/closure over multiple columns once the logical plan gets executed.
map_multiple
Apply a function/closure over multiple columns once the logical plan gets executed.
max
Find the maximum of all the values in the column named name. Shorthand for col(name).max().
max_horizontal
Create a new column with the maximum value per row.
mean
Find the mean of all the values in the column named name. Shorthand for col(name).mean().
mean_horizontal
Compute the mean of all values horizontally across columns.
median
Find the median of all the values in the column named name. Shorthand for col(name).median().
min
Find the minimum of all the values in the column named name. Shorthand for col(name).min().
min_horizontal
Create a new column with the minimum value per row.
not
Negates a boolean column.
nth
Nth column in a DataFrame.
pearson_corr
Compute the pearson correlation between two columns.
quantile
Find a specific quantile of all the values in the column named name.
reduce_exprs
Analogous to Iterator::reduce.
repeat
Create a column of length n containing n copies of the literal value.
rolling_corr (features: rolling_window and cov)
rolling_cov (features: rolling_window and cov)
spearman_rank_corr (features: rank and propagate_nans)
Compute the spearman rank correlation between two columns. Missing data will be excluded from the computation.
sum
Sum all the values in the column named name. Shorthand for col(name).sum().
sum_horizontal
Sum all values horizontally across columns.
ternary_expr
time_ranges (feature: dtype-time)
Create a column of time ranges from a start and stop expression.
when
Start a when-then-otherwise expression.

Type Aliases§

FieldsNameMapper (feature: dtype-struct)
GetOutput
OpaqueColumnUdf