Struct polars::prelude::LazyFrame

pub struct LazyFrame { /* fields omitted */ }

A lazy abstraction over an eager DataFrame. More precisely, it is an abstraction over a logical plan: the methods of this struct incrementally modify the logical plan until output is requested (via collect).
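
The "record now, execute on collect" behaviour can be pictured with a toy model. The sketch below uses only the standard library; `ToyLazy`, `Step`, and their methods are hypothetical names chosen for illustration and do not reflect how Polars implements LazyFrame or its logical plan.

```rust
/// One deferred operation in the toy plan (hypothetical, not a Polars type).
#[derive(Debug, Clone)]
enum Step {
    Filter(i64),  // keep values strictly greater than the threshold
    Reverse,      // reverse the row order
    Limit(usize), // keep at most the first n rows
}

/// A toy lazy frame over a single integer column.
#[derive(Debug, Clone)]
struct ToyLazy {
    data: Vec<i64>,
    plan: Vec<Step>,
}

impl ToyLazy {
    fn new(data: Vec<i64>) -> Self {
        ToyLazy { data, plan: Vec::new() }
    }

    /// Builder methods only record a step; the data is not touched yet.
    fn filter_gt(mut self, threshold: i64) -> Self {
        self.plan.push(Step::Filter(threshold));
        self
    }

    fn reverse(mut self) -> Self {
        self.plan.push(Step::Reverse);
        self
    }

    fn limit(mut self, n: usize) -> Self {
        self.plan.push(Step::Limit(n));
        self
    }

    /// Text description of the recorded steps, in order.
    fn describe_plan(&self) -> String {
        format!("{:?}", self.plan)
    }

    /// Only here are the recorded steps actually executed.
    fn collect(self) -> Vec<i64> {
        let mut rows = self.data;
        for step in self.plan {
            match step {
                Step::Filter(t) => rows.retain(|v| *v > t),
                Step::Reverse => rows.reverse(),
                Step::Limit(n) => rows.truncate(n),
            }
        }
        rows
    }
}
```

Calling `ToyLazy::new(vec![1, 5, 3, 8, 2]).filter_gt(2).reverse().limit(2)` only builds a three-step plan; the rows are computed when `collect()` is called on it.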

Implementations

Get the schema of the current LazyFrame computation.

Create a LazyFrame directly from a parquet scan.

Get a dot language representation of the LogicalPlan.

Toggle projection pushdown optimization.

Toggle predicate pushdown optimization.

Toggle type coercion optimization.

Toggle expression simplification optimization.

Toggle aggregate pushdown.

Toggle global string cache.

Toggle join pruning optimization.

Describe the logical plan.

Describe the optimized logical plan.

Add a sort operation to the logical plan.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

/// Sort DataFrame by 'sepal.width' column
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .sort("sepal.width", false)
}

Add a sort operation to the logical plan, sorting by one or more expressions.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

/// Sort DataFrame by 'sepal.width' column
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .sort_by_exprs(vec![col("sepal.width")], vec![false])
}

Reverse the DataFrame.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .reverse()
}

Rename a column in the DataFrame.

Rename columns in the DataFrame. This does not preserve column ordering.

Removes columns from the DataFrame. Note that it is better to select only the columns you need and let the projection pushdown optimize away the unneeded columns.

Shift the values by a given period and fill the parts that will be empty due to this operation with None values.

See the method on Series for more info on the shift operation.

Shift the values by a given period and fill the parts that will be empty due to this operation with the result of the fill_value expression.

See the method on Series for more info on the shift operation.
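
The two shift variants differ only in what goes into the vacated slots. The following standard-library sketch models that on a single column; `shift_and_fill` here is a hypothetical free function, not the Polars method, and it assumes that a positive period moves values toward the end of the column.

```rust
/// Shift `values` by `periods` positions and put `fill` into the vacated slots.
/// A positive `periods` moves values toward the end, a negative one toward the start.
/// Passing `fill = None` models the plain shift that fills with None.
fn shift_and_fill(values: &[Option<i64>], periods: i64, fill: Option<i64>) -> Vec<Option<i64>> {
    let len = values.len();
    // Clamp the shift so that shifting by more than `len` yields only fill values.
    let k = (periods.unsigned_abs() as usize).min(len);
    let mut out = Vec::with_capacity(len);
    if periods >= 0 {
        out.extend(std::iter::repeat(fill).take(k));
        out.extend_from_slice(&values[..len - k]);
    } else {
        out.extend_from_slice(&values[k..]);
        out.extend(std::iter::repeat(fill).take(k));
    }
    out
}
```

The output always has the same length as the input; values shifted past either end are dropped.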

Fill None values in the DataFrame.

Fill NaN values in the DataFrame.

Caches the result into a new LazyFrame. This should be used to prevent computations from running multiple times.

Fetch is like a collect operation, but it overrides the number of rows read by every scan operation. This is a utility that helps debug a query on a smaller number of rows.

Note that the fetch does not guarantee the final number of rows in the DataFrame. Filter, join operations and a lower number of rows available in the scanned file influence the final number of rows.
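
Why the final row count is not guaranteed can be seen in a small standard-library model. Both functions below are hypothetical illustrations, not Polars APIs: `fetch_like` caps the rows read by the "scan" before the filter runs, while `collect_then_limit` runs the filter on everything and trims the output afterwards.

```rust
/// Toy model of fetch: cap the rows read by the scan, then run the rest of the query.
fn fetch_like(source: &[i64], n_rows: usize, keep: impl Fn(&i64) -> bool) -> Vec<i64> {
    source
        .iter()
        .take(n_rows) // the scan reads at most n_rows rows
        .copied()
        .filter(|v| keep(v)) // the filter may then remove some of them
        .collect()
}

/// Toy model of running the full query and limiting the output afterwards.
fn collect_then_limit(source: &[i64], n_rows: usize, keep: impl Fn(&i64) -> bool) -> Vec<i64> {
    source
        .iter()
        .copied()
        .filter(|v| keep(v)) // the filter sees every row
        .take(n_rows) // only the output is capped
        .collect()
}
```

With the source rows 1, 10, 2, 20, 3, 30, a row cap of 3, and a filter keeping values of at least 10, the fetch-style query returns a single row, because two of the three scanned rows are filtered out.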

Execute all the lazy operations and collect them into a DataFrame. The query is optimized before execution.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> Result<DataFrame> {
    df.lazy()
      .groupby([col("foo")])
      .agg([col("bar").sum(), col("ham").mean().alias("avg_ham")])
      .collect()
}

Filter by some predicate expression.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .filter(col("sepal.width").is_not_null())
        .select(&[col("sepal.width"), col("sepal.length")])
}

Select (and rename) columns from the query.

Columns can be selected with col; to select all columns, use col("*").

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

/// This function selects column "foo" and column "bar".
/// Column "bar" is renamed to "ham".
fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select(&[col("foo"), col("bar").alias("ham")])
}

/// This function selects all columns except "foo"
fn exclude_a_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .select(&[col("*").exclude("foo")])
}

Group by and aggregate.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(df: DataFrame) -> LazyFrame {
    df.lazy()
        .groupby([col("date")])
        .agg([
            col("rain").min(),
            col("rain").sum(),
            col("rain").quantile(0.5).alias("median_rain"),
        ])
        .sort("date", false)
}

Similar to groupby, but order of the DataFrame is maintained.
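
One way to read "order is maintained" is that groups come out in the order in which their keys first appear. The standard-library sketch below aggregates a sum under that assumption; `groupby_sum_stable` is a hypothetical helper, not the Polars method or its implementation.

```rust
use std::collections::HashMap;

/// Sum `values` per key, emitting groups in order of first appearance.
/// `keys` and `values` are expected to have the same length.
fn groupby_sum_stable(keys: &[&str], values: &[i64]) -> Vec<(String, i64)> {
    // Maps each key to the position of its group in `out`.
    let mut index: HashMap<&str, usize> = HashMap::new();
    let mut out: Vec<(String, i64)> = Vec::new();
    for (key, value) in keys.iter().zip(values) {
        let found = index.get(*key).copied();
        match found {
            // Known key: add to the existing group.
            Some(i) => out[i].1 += *value,
            // New key: append a group, which fixes its output position.
            None => {
                index.insert(*key, out.len());
                out.push(((*key).to_string(), *value));
            }
        }
    }
    out
}
```

A plain hash-based grouping would produce the same sums, but iterating over the map afterwards would yield the groups in an unspecified order.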

Left join this query with another lazy query.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .left_join(other, col("foo"), col("bar"))
}

Outer join this query with another lazy query.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .outer_join(other, col("foo"), col("bar"))
}

Inner join this query with another lazy query.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn join_dataframes(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .inner_join(other, col("foo"), col("bar").cast(DataType::Utf8))
}

Creates the Cartesian product of both frames, preserving the order of the left keys.
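
As a rough illustration of what a Cartesian product with preserved left order looks like, here is a standard-library sketch over two row lists. `cross_join` below is a hypothetical generic function, not the Polars method.

```rust
/// Cartesian product of two row lists.
/// Left rows vary slowest, so the output keeps the order of the left input.
fn cross_join<L: Clone, R: Clone>(left: &[L], right: &[R]) -> Vec<(L, R)> {
    let mut out = Vec::with_capacity(left.len() * right.len());
    for l in left {
        for r in right {
            out.push((l.clone(), r.clone()));
        }
    }
    out
}
```

The output has `left.len() * right.len()` rows, which is why cross joins grow quickly with input size.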

Generic join function that can join on multiple columns.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;

fn example(ldf: LazyFrame, other: LazyFrame) -> LazyFrame {
    ldf
        .join(other, vec![col("foo"), col("bar")], vec![col("foo"), col("bar")], JoinType::Inner)
}

Control more join options with the join builder.

Add a column to a DataFrame.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_column(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_column(
            when(col("sepal.length").lt(lit(5.0)))
                .then(lit(10))
                .otherwise(lit(1))
                .alias("new_column_name"),
        )
}

Add multiple columns to a DataFrame.

Example
use polars_core::prelude::*;
use polars_lazy::prelude::*;
fn add_columns(df: DataFrame) -> LazyFrame {
    df.lazy()
        .with_columns(vec![lit(10).alias("foo"), lit(100).alias("bar")])
}

Aggregate all the columns as their maximum values.

Aggregate all the columns as their minimum values.

Aggregate all the columns as their sum values.

Aggregate all the columns as their mean values.

Aggregate all the columns as their median values.

Aggregate all the columns as their quantile values.

Aggregate all the columns as their standard deviation values.

Aggregate all the columns as their variance values.

Apply explode operation. See eager explode.

Drop duplicate rows. See the eager method.

Drop null rows.

Equivalent to LazyFrame::filter(col("*").is_not_null()).
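
Read row-wise, filtering on every column being non-null keeps only rows without any missing value. The standard-library sketch below models that on rows of optional integers; `drop_null_rows` is a hypothetical helper, not a Polars function.

```rust
/// Keep only the rows in which every column holds a value.
fn drop_null_rows(rows: &[Vec<Option<i64>>]) -> Vec<Vec<Option<i64>>> {
    rows.iter()
        .filter(|row| row.iter().all(|cell| cell.is_some()))
        .cloned()
        .collect()
}
```

A single None in any column is enough for the whole row to be dropped.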

Slice the DataFrame.

Get the first row.

Get the last row.

Get the n last rows.
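
Slice, first, last, and tail are all row-range selections. The sketch below models them on a plain slice using only the standard library; `slice_rows` and `tail_rows` are hypothetical helpers, and the sketch assumes a non-negative offset for simplicity.

```rust
/// Take up to `len` rows starting at `offset`; out-of-range requests are truncated.
fn slice_rows<T: Clone>(rows: &[T], offset: usize, len: usize) -> Vec<T> {
    rows.iter().skip(offset).take(len).cloned().collect()
}

/// Take the last `n` rows, or all rows if fewer than `n` exist.
fn tail_rows<T: Clone>(rows: &[T], n: usize) -> Vec<T> {
    let start = rows.len().saturating_sub(n);
    rows[start..].to_vec()
}
```

In this model the first row is `slice_rows(rows, 0, 1)` and the last row is `tail_rows(rows, 1)`.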

Melt the DataFrame from wide to long format.
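
To make "wide to long" concrete, the standard-library sketch below turns one identifier column plus several value columns into (id, variable, value) rows. `melt` here is a hypothetical free function, and the output order (all rows of the first value column, then the next) is an assumption of this sketch rather than a documented guarantee.

```rust
/// Melt wide rows into long (id, variable, value) rows.
/// `values[i]` must hold one value per entry in `value_names` for the row with id `ids[i]`.
fn melt(ids: &[&str], value_names: &[&str], values: &[Vec<i64>]) -> Vec<(String, String, i64)> {
    let mut out = Vec::new();
    // Emit one block per value column, so all rows of the first variable come first.
    for (col_idx, name) in value_names.iter().enumerate() {
        for (id, row) in ids.iter().zip(values) {
            out.push(((*id).to_string(), (*name).to_string(), row[col_idx]));
        }
    }
    out
}
```

Two wide rows with two value columns become four long rows, one per (row, value column) pair.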

Limit the DataFrame to the first n rows. Note: if you don’t want the rows to be scanned, use fetch.

Apply a function/closure once the logical plan gets executed.

Warning

This can fail badly if the operation changes the schema, because the optimizer relies on a correct schema.

You can toggle certain optimizations off.

Add a new column at index 0 that counts the rows.

Trait Implementations

Returns a copy of the value.

Performs copy-assignment from source.

Returns the “default value” for a type.

Performs the conversion.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.

Immutably borrows from an owned value.

Mutably borrows from an owned value.

Performs the conversion.

Performs the conversion.

The alignment of pointer.

The type for initializers.

Initializes a with the given initializer.

Dereferences the given pointer.

Mutably dereferences the given pointer.

Drops the object pointed to by the given pointer.

The resulting type after obtaining ownership.

Creates owned data from borrowed data, usually by cloning.

This is a nightly-only experimental API (toowned_clone_into): recently added.

Uses borrowed data to replace owned data, usually by cloning.

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.