Struct polars::prelude::LazyFrame
pub struct LazyFrame { /* fields omitted */ }
Lazy abstraction over an eager DataFrame.
It is really an abstraction over a logical plan. The methods of this struct incrementally
modify the logical plan until output is requested (via collect).
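The idea of building a plan first and executing it only on collect can be sketched with a toy model. This is a minimal illustration using only the standard library; `Plan`, `ToyLazy`, `filter_ge` and `execute` are hypothetical names and do not reflect the actual polars types or internals.

```rust
// Toy model of a lazy frame. These types are hypothetical and only
// illustrate the "build a plan, run it on collect" idea.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(Vec<i64>),
    Filter { input: Box<Plan>, min: i64 },
    Reverse { input: Box<Plan> },
}

struct ToyLazy {
    plan: Plan,
}

impl ToyLazy {
    fn scan(data: Vec<i64>) -> Self {
        ToyLazy { plan: Plan::Scan(data) }
    }

    // Each method consumes `self` and wraps the previous plan in a new node.
    // No values are processed here.
    fn filter_ge(self, min: i64) -> Self {
        ToyLazy { plan: Plan::Filter { input: Box::new(self.plan), min } }
    }

    fn reverse(self) -> Self {
        ToyLazy { plan: Plan::Reverse { input: Box::new(self.plan) } }
    }

    // Only `collect` walks the plan and produces output.
    fn collect(self) -> Vec<i64> {
        execute(self.plan)
    }
}

// Recursively evaluate a plan node, starting from its input.
fn execute(plan: Plan) -> Vec<i64> {
    match plan {
        Plan::Scan(data) => data,
        Plan::Filter { input, min } => {
            execute(*input).into_iter().filter(|v| *v >= min).collect()
        }
        Plan::Reverse { input } => {
            let mut out = execute(*input);
            out.reverse();
            out
        }
    }
}
```

In this sketch, `ToyLazy::scan(vec![1, 5, 3, 8]).filter_ge(3).reverse()` only nests plan nodes; the filtering and reversing happen when `collect()` is called.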
Implementations
Get a hold on the schema of the current LazyFrame computation.
Create a LazyFrame directly from a parquet scan.
Get a dot language representation of the LogicalPlan.
Toggle projection pushdown optimization.
Toggle predicate pushdown optimization.
Toggle type coercion optimization.
Toggle expression simplification optimization.
Toggle aggregate pushdown.
Toggle global string cache.
Toggle join pruning optimization.
Describe the logical plan.
Describe the optimized logical plan.
Add a sort operation to the logical plan.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
/// Sort DataFrame by 'sepal.width' column
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .sort("sepal.width", false)
}
Add a sort operation to the logical plan.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
/// Sort DataFrame by 'sepal.width' column
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .sort_by_exprs(vec![col("sepal.width")], vec![false])
}
Reverse the DataFrame
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .reverse()
}
Rename a column in the DataFrame
pub fn rename<I, J, T, S>(self, existing: I, new: J) -> LazyFrame where
    I: IntoIterator<Item = T> + Clone,
    J: IntoIterator<Item = S>,
    T: AsRef<str>,
    S: AsRef<str>,
Rename columns in the DataFrame. This does not preserve ordering.
pub fn drop_columns<I, T>(self, columns: I) -> LazyFrame where
    I: IntoIterator<Item = T>,
    T: AsRef<str>,
Removes columns from the DataFrame. Note that it's better to select only the columns you need and let the projection pushdown optimize away the unneeded columns.
Shift the values by a given period and fill the parts that will be empty due to this operation with Nones.
See the method on Series for more info on the shift operation.
Shift the values by a given period and fill the parts that will be empty due to this operation with the result of the fill_value expression.
See the method on Series for more info on the shift operation.
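As a rough illustration of the two shift variants, here is a toy version over plain slices. `shift` and `shift_and_fill` below are hypothetical helpers written with the standard library only, not the polars methods; the sketch assumes that a positive period moves values towards the end and a negative period moves them towards the start.

```rust
// Toy shift over a slice (hypothetical helper, not the polars API).
// Slots that become empty because of the shift receive `fill`.
fn shift_and_fill<T: Clone>(values: &[T], periods: i64, fill: T) -> Vec<T> {
    let n = values.len();
    // A shift larger than the input simply empties every slot.
    let k = (periods.unsigned_abs() as usize).min(n);
    let mut out = Vec::with_capacity(n);
    if periods >= 0 {
        out.extend(std::iter::repeat(fill).take(k));
        out.extend_from_slice(&values[..n - k]);
    } else {
        out.extend_from_slice(&values[k..]);
        out.extend(std::iter::repeat(fill).take(k));
    }
    out
}

// Shifting "with Nones" is the same operation with `None` as the fill value.
fn shift<T: Clone>(values: &[T], periods: i64) -> Vec<Option<T>> {
    let wrapped: Vec<Option<T>> = values.iter().cloned().map(Some).collect();
    shift_and_fill(&wrapped, periods, None)
}
```

Under these assumptions, shifting `[1, 2, 3, 4]` by `1` leaves the first slot empty, while shifting by `-2` leaves the last two slots empty.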
Caches the result into a new LazyFrame. Use this to prevent computations from running multiple times.
Fetch is like a collect operation, but it overrides the number of rows read by every scan operation. It is a utility for debugging a query on a smaller number of rows.
Note that fetch does not guarantee the final number of rows in the DataFrame. Filter and join operations, as well as a scanned file with fewer rows available, influence the final number of rows.
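Why the final row count is not guaranteed can be seen in a small toy model, where the row cap applies at the scan and a filter runs afterwards. `toy_fetch` is a hypothetical helper over a plain slice, not the polars implementation.

```rust
// Toy model of fetch (hypothetical helper): `n_rows` caps what the "scan"
// reads, and any later operation works only on those rows.
fn toy_fetch(scan: &[i64], n_rows: usize, keep: impl Fn(i64) -> bool) -> Vec<i64> {
    scan.iter()
        .copied()
        // The row cap is applied at the scan...
        .take(n_rows)
        // ...so a later filter can shrink the result further.
        .filter(|v| keep(*v))
        .collect()
}
```

With the input `[1, 10, 2, 20, 3, 30]`, a cap of 4 rows and a filter keeping values of at least 10, only two rows come out, even though 4 rows were requested and three rows in the full input match the filter.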
pub fn optimize(
    self,
    lp_arena: &mut Arena<ALogicalPlan>,
    expr_arena: &mut Arena<AExpr>
) -> Result<Node, PolarsError>
Execute all the lazy operations and collect them into a DataFrame. The query is optimized before execution.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn example(df: DataFrame) -> Result<DataFrame> {
    df.lazy()
        .groupby([col("foo")])
        .agg([col("bar").sum(), col("ham").mean().alias("avg_ham")])
        .collect()
}
Filter by some predicate expression.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .filter(col("sepal.width").is_not_null())
        .select(&[col("sepal.width"), col("sepal.length")])
}
Select (and rename) columns from the query.
Columns can be selected with col. To select all columns, use col("*").
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
/// This function selects column "foo" and column "bar".
/// Column "bar" is renamed to "ham".
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select(&[col("foo"),
                  col("bar").alias("ham")])
}
/// This function selects all columns except "foo"
fn exclude_a_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select(&[col("*").exclude("foo")])
}
Group by and aggregate.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .groupby([col("date")])
        .agg([
            col("rain").min(),
            col("rain").sum(),
            col("rain").quantile(0.5).alias("median_rain"),
        ])
        .sort("date", false)
}
Similar to groupby, but order of the DataFrame is maintained.
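What "order is maintained" means for a group-by can be sketched with a toy aggregation in which groups appear in the order their key is first seen. `groupby_stable_sum` is a hypothetical standard-library helper; it is an assumption about the intended semantics, not the polars implementation.

```rust
use std::collections::HashMap;

// Toy order-preserving group-by with a sum aggregation (hypothetical helper).
// Groups are emitted in the order in which each key first appears.
fn groupby_stable_sum(rows: &[(&str, i64)]) -> Vec<(String, i64)> {
    // Maps each key to the position of its group in `out`.
    let mut slot: HashMap<&str, usize> = HashMap::new();
    let mut out: Vec<(String, i64)> = Vec::new();
    for &(key, value) in rows {
        let found = slot.get(key).copied();
        match found {
            // Key already seen: add to its running sum.
            Some(i) => out[i].1 += value,
            // New key: append a group at the end, preserving input order.
            None => {
                slot.insert(key, out.len());
                out.push((key.to_string(), value));
            }
        }
    }
    out
}
```

A plain hash-based group-by would typically return the groups in an arbitrary order; the extra `Vec` here is what keeps the first-seen order.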
Join query with other lazy query.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .left_join(other, col("foo"), col("bar"))
}
Join query with other lazy query.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .outer_join(other, col("foo"), col("bar"))
}
Join query with other lazy query.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .inner_join(other, col("foo"), col("bar").cast(DataType::Utf8))
}
Creates the cartesian product from both frames, preserves the order of the left keys.
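A cartesian product that keeps the left order can be sketched as a nested loop with the left rows in the outer loop. `cross_join` below is a hypothetical helper over plain slices, shown only to illustrate the row ordering.

```rust
// Toy cartesian product of two row lists (hypothetical helper).
// The outer loop runs over the left rows, so their order is preserved.
fn cross_join<L: Clone, R: Clone>(left: &[L], right: &[R]) -> Vec<(L, R)> {
    let mut out = Vec::with_capacity(left.len() * right.len());
    for l in left {
        for r in right {
            out.push((l.clone(), r.clone()));
        }
    }
    out
}
```

The output has `left.len() * right.len()` rows, which is why cross joins grow quickly with input size.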
Generic join function that can join on multiple columns.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn example(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .join(other, vec![col("foo"), col("bar")], vec![col("foo"), col("bar")], JoinType::Inner)
}
Control more join options with the join builder.
Add a column to a DataFrame
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_column(
            when(col("sepal.length").lt(lit(5.0)))
                .then(lit(10))
                .otherwise(lit(1))
                .alias("new_column_name"),
        )
}
Add multiple columns to a DataFrame.
Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_columns(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_columns(
            vec![lit(10).alias("foo"), lit(100).alias("bar")]
        )
}
Aggregate all the columns as their quantile values.
Apply explode operation. See eager explode.
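For readers unfamiliar with explode, a toy version over plain rows may help: each element of a list column becomes its own row, and the other column is repeated. `explode` here is a hypothetical helper; in this sketch empty lists simply produce no rows, which may differ from how polars treats them.

```rust
// Toy explode (hypothetical helper): one output row per list element,
// with the identifier repeated for each element.
fn explode(rows: &[(&str, Vec<i64>)]) -> Vec<(String, i64)> {
    rows.iter()
        .flat_map(|(id, list)| list.iter().map(move |v| (id.to_string(), *v)))
        .collect()
}
```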
Drop duplicate rows. See eager.
Drop null rows.
Equal to LazyFrame::filter(col("*").is_not_null())
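The effect of filtering on "all columns are not null" can be sketched on plain rows, where `None` stands for a null. `drop_nulls` below is a hypothetical helper that keeps a row only if every value in it is present; it is a toy model of the behaviour described above, not the polars code.

```rust
// Toy drop-nulls (hypothetical helper): keep a row only if none of its
// values is `None`.
fn drop_nulls(rows: Vec<Vec<Option<i64>>>) -> Vec<Vec<Option<i64>>> {
    rows.into_iter()
        .filter(|row| row.iter().all(|v| v.is_some()))
        .collect()
}
```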
Melt the DataFrame from wide to long format
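The wide-to-long reshaping can be illustrated with a toy function that turns one identifier column plus several value columns into (id, variable, value) rows. `melt` below is a hypothetical standard-library helper; the column layout and the output order (one value column after another) are assumptions made for this sketch.

```rust
// Toy melt (hypothetical helper): `ids` is the identifier column and each
// entry of `value_columns` is a (column name, column values) pair.
// Every value becomes one (id, variable, value) row in the long format.
fn melt(ids: &[&str], value_columns: &[(&str, Vec<f64>)]) -> Vec<(String, String, f64)> {
    let mut long = Vec::new();
    for (name, values) in value_columns {
        for (id, value) in ids.iter().zip(values) {
            long.push((id.to_string(), name.to_string(), *value));
        }
    }
    long
}
```

A wide table with 2 rows and 2 value columns therefore becomes a long table with 4 rows.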
Limit the DataFrame to the first n rows. Note that if you don't want the rows to be scanned, use fetch.
Apply a function/closure once the logical plan gets executed.
Warning
This can blow up in your face if the schema is changed due to the operation. The optimizer relies on a correct schema.
You can toggle certain optimizations off.
Add a new column at index 0 that counts the rows.
Trait Implementations
Performs the conversion.
Auto Trait Implementations
impl !RefUnwindSafe for LazyFrame
impl !UnwindSafe for LazyFrame
Blanket Implementations
Mutably borrows from an owned value.