polars_core::frame::group_by

Struct GroupBy

Source
pub struct GroupBy<'df> {
    pub df: &'df DataFrame,
    /* private fields */
}
Available on crate feature algorithm_group_by only.
Expand description

Returned by a group_by operation on a DataFrame. This struct supports several aggregations.

Until described otherwise, the examples in this struct are performed on the following DataFrame:

use polars_core::prelude::*;

let dates = &[
"2020-08-21",
"2020-08-21",
"2020-08-22",
"2020-08-23",
"2020-08-22",
];
// date format
let fmt = "%Y-%m-%d";
// create date series
let s0 = DateChunked::parse_from_str_slice("date", dates, fmt)
        .into_series();
// create temperature series
let s1 = Series::new("temp".into(), [20, 10, 7, 9, 1]);
// create rain series
let s2 = Series::new("rain".into(), [0.2, 0.1, 0.3, 0.1, 0.01]);
// create a new DataFrame
let df = DataFrame::new(vec![s0, s1, s2]).unwrap();
println!("{:?}", df);

Outputs:

+------------+------+------+
| date       | temp | rain |
| ---        | ---  | ---  |
| Date       | i32  | f64  |
+============+======+======+
| 2020-08-21 | 20   | 0.2  |
+------------+------+------+
| 2020-08-21 | 10   | 0.1  |
+------------+------+------+
| 2020-08-22 | 7    | 0.3  |
+------------+------+------+
| 2020-08-23 | 9    | 0.1  |
+------------+------+------+
| 2020-08-22 | 1    | 0.01 |
+------------+------+------+

Fields§

§df: &'df DataFrame

Implementations§

Source§

impl<'df> GroupBy<'df>

Source

pub fn new( df: &'df DataFrame, by: Vec<Column>, groups: GroupsProxy, selected_agg: Option<Vec<PlSmallStr>>, ) -> Self

Source

pub fn select<I: IntoIterator<Item = S>, S: Into<PlSmallStr>>( self, selection: I, ) -> Self

Select the column(s) that should be aggregated. You can select a single column or a slice of columns.

Note that making a selection with this method is not required. If you skip it all columns (except for the keys) will be selected for aggregation.

Source

pub fn get_groups(&self) -> &GroupsProxy

Get the internal representation of the GroupBy operation. The Vec returned contains: (first_idx, Vec<indexes>) Where second value in the tuple is a vector with all matching indexes.

Source

pub unsafe fn get_groups_mut(&mut self) -> &mut GroupsProxy

Get the internal representation of the GroupBy operation. The Vec returned contains: (first_idx, Vec<indexes>) Where second value in the tuple is a vector with all matching indexes.

§Safety

Groups should always be in bounds of the DataFrame hold by this [GroupBy]. If you mutate it, you must hold that invariant.

Source

pub fn take_groups(self) -> GroupsProxy

Source

pub fn take_groups_mut(&mut self) -> GroupsProxy

Source

pub fn keys_sliced(&self, slice: Option<(i64, usize)>) -> Vec<Column>

Source

pub fn keys(&self) -> Vec<Column>

Source

pub fn mean(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped series and compute the mean per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp", "rain"]).mean()
}

Returns:

+------------+-----------+-----------+
| date       | temp_mean | rain_mean |
| ---        | ---       | ---       |
| Date       | f64       | f64       |
+============+===========+===========+
| 2020-08-23 | 9         | 0.1       |
+------------+-----------+-----------+
| 2020-08-22 | 4         | 0.155     |
+------------+-----------+-----------+
| 2020-08-21 | 15        | 0.15      |
+------------+-----------+-----------+
Source

pub fn sum(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped series and compute the sum per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).sum()
}

Returns:

+------------+----------+
| date       | temp_sum |
| ---        | ---      |
| Date       | i32      |
+============+==========+
| 2020-08-23 | 9        |
+------------+----------+
| 2020-08-22 | 8        |
+------------+----------+
| 2020-08-21 | 30       |
+------------+----------+
Source

pub fn min(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped series and compute the minimal value per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).min()
}

Returns:

+------------+----------+
| date       | temp_min |
| ---        | ---      |
| Date       | i32      |
+============+==========+
| 2020-08-23 | 9        |
+------------+----------+
| 2020-08-22 | 1        |
+------------+----------+
| 2020-08-21 | 10       |
+------------+----------+
Source

pub fn max(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped series and compute the maximum value per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).max()
}

Returns:

+------------+----------+
| date       | temp_max |
| ---        | ---      |
| Date       | i32      |
+============+==========+
| 2020-08-23 | 9        |
+------------+----------+
| 2020-08-22 | 7        |
+------------+----------+
| 2020-08-21 | 20       |
+------------+----------+
Source

pub fn first(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and find the first value per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).first()
}

Returns:

+------------+------------+
| date       | temp_first |
| ---        | ---        |
| Date       | i32        |
+============+============+
| 2020-08-23 | 9          |
+------------+------------+
| 2020-08-22 | 7          |
+------------+------------+
| 2020-08-21 | 20         |
+------------+------------+
Source

pub fn last(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and return the last value per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).last()
}

Returns:

+------------+------------+
| date       | temp_last |
| ---        | ---        |
| Date       | i32        |
+============+============+
| 2020-08-23 | 9          |
+------------+------------+
| 2020-08-22 | 1          |
+------------+------------+
| 2020-08-21 | 10         |
+------------+------------+
Source

pub fn n_unique(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series by counting the number of unique values.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).n_unique()
}

Returns:

+------------+---------------+
| date       | temp_n_unique |
| ---        | ---           |
| Date       | u32           |
+============+===============+
| 2020-08-23 | 1             |
+------------+---------------+
| 2020-08-22 | 2             |
+------------+---------------+
| 2020-08-21 | 2             |
+------------+---------------+
Source

pub fn quantile( &self, quantile: f64, method: QuantileMethod, ) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and determine the quantile per group.

§Example

fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).quantile(0.2, QuantileMethod::default())
}
Source

pub fn median(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and determine the median per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).median()
}
Source

pub fn var(&self, ddof: u8) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and determine the variance per group.

Source

pub fn std(&self, ddof: u8) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and determine the standard deviation per group.

Source

pub fn count(&self) -> PolarsResult<DataFrame>

Aggregate grouped series and compute the number of values per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).count()
}

Returns:

+------------+------------+
| date       | temp_count |
| ---        | ---        |
| Date       | u32        |
+============+============+
| 2020-08-23 | 1          |
+------------+------------+
| 2020-08-22 | 2          |
+------------+------------+
| 2020-08-21 | 2          |
+------------+------------+
Source

pub fn groups(&self) -> PolarsResult<DataFrame>

Get the group_by group indexes.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.groups()
}

Returns:

+--------------+------------+
| date         | groups     |
| ---          | ---        |
| Date(days)   | list [u32] |
+==============+============+
| 2020-08-23   | "[3]"      |
+--------------+------------+
| 2020-08-22   | "[2, 4]"   |
+--------------+------------+
| 2020-08-21   | "[0, 1]"   |
+--------------+------------+
Source

pub fn agg_list(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate the groups of the group_by operation into lists.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    // GroupBy and aggregate to Lists
    df.group_by(["date"])?.select(["temp"]).agg_list()
}

Returns:

+------------+------------------------+
| date       | temp_agg_list          |
| ---        | ---                    |
| Date       | list [i32]             |
+============+========================+
| 2020-08-23 | "[Some(9)]"            |
+------------+------------------------+
| 2020-08-22 | "[Some(7), Some(1)]"   |
+------------+------------------------+
| 2020-08-21 | "[Some(20), Some(10)]" |
+------------+------------------------+
Source

pub fn par_apply<F>(&self, f: F) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Apply a closure over the groups as a new DataFrame in parallel.

Source

pub fn apply<F>(&self, f: F) -> PolarsResult<DataFrame>

Apply a closure over the groups as a new DataFrame.

Trait Implementations§

Source§

impl<'df> Clone for GroupBy<'df>

Source§

fn clone(&self) -> GroupBy<'df>

Returns a copy of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<'df> Debug for GroupBy<'df>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

§

impl<'df> Freeze for GroupBy<'df>

§

impl<'df> !RefUnwindSafe for GroupBy<'df>

§

impl<'df> Send for GroupBy<'df>

§

impl<'df> Sync for GroupBy<'df>

§

impl<'df> Unpin for GroupBy<'df>

§

impl<'df> !UnwindSafe for GroupBy<'df>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
Source§

impl<T> DynClone for T
where T: Clone,

Source§

fn __clone_box(&self, _: Private) -> *mut ()

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize = _

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V