polars_core::frame::group_by

Struct GroupBy

pub struct GroupBy<'df> {
    pub df: &'df DataFrame,
    /* private fields */
}
Available on crate feature algorithm_group_by only.

Returned by a group_by operation on a DataFrame. This struct supports several aggregations.

Unless stated otherwise, the examples for this struct use the following DataFrame:

use polars_core::prelude::*;

let dates = &[
    "2020-08-21",
    "2020-08-21",
    "2020-08-22",
    "2020-08-23",
    "2020-08-22",
];
// date format
let fmt = "%Y-%m-%d";
// create date series (the name is passed with `.into()`, as for the other series)
let s0 = DateChunked::parse_from_str_slice("date".into(), dates, fmt)
    .into_series();
// create temperature series
let s1 = Series::new("temp".into(), [20, 10, 7, 9, 1]);
// create rain series
let s2 = Series::new("rain".into(), [0.2, 0.1, 0.3, 0.1, 0.01]);
// create a new DataFrame; `.into()` converts each Series to the column type
// the constructor expects
let df = DataFrame::new(vec![s0.into(), s1.into(), s2.into()]).unwrap();
println!("{:?}", df);

Outputs:

+------------+------+------+
| date       | temp | rain |
| ---        | ---  | ---  |
| Date       | i32  | f64  |
+============+======+======+
| 2020-08-21 | 20   | 0.2  |
+------------+------+------+
| 2020-08-21 | 10   | 0.1  |
+------------+------+------+
| 2020-08-22 | 7    | 0.3  |
+------------+------+------+
| 2020-08-23 | 9    | 0.1  |
+------------+------+------+
| 2020-08-22 | 1    | 0.01 |
+------------+------+------+

Fields§

§df: &'df DataFrame

Implementations§

impl<'df> GroupBy<'df>

pub fn new( df: &'df DataFrame, by: Vec<Column>, groups: GroupsProxy, selected_agg: Option<Vec<PlSmallStr>>, ) -> Self

pub fn select<I: IntoIterator<Item = S>, S: Into<PlSmallStr>>( self, selection: I, ) -> Self

Select the column(s) that should be aggregated. You can select a single column or a slice of columns.

Note that making a selection with this method is not required. If you skip it, all columns (except for the keys) will be selected for aggregation.
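
The default described above can be modelled in plain Rust. This is a std-only sketch of the selection rule, not Polars code; `columns_to_aggregate` is a hypothetical helper, and `selection = None` stands for a GroupBy on which `select` was never called.

```rust
/// Hypothetical helper: names of the columns that would be aggregated.
/// `selection = None` models skipping `select`.
fn columns_to_aggregate<'a>(
    all_columns: &[&'a str],
    keys: &[&str],
    selection: Option<&[&'a str]>,
) -> Vec<&'a str> {
    match selection {
        // An explicit selection is used as given.
        Some(selected) => selected.to_vec(),
        // Without a selection, every non-key column is aggregated.
        None => all_columns
            .iter()
            .copied()
            .filter(|name| !keys.contains(name))
            .collect(),
    }
}
```

For the example DataFrame grouped by "date", the `None` case yields "temp" and "rain".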

pub fn get_groups(&self) -> &GroupsProxy

Get the internal representation of the GroupBy operation as a GroupsProxy. In its index-based form, each group is stored as a (first_idx, indexes) pair, where first_idx is the index of the first row in the group and indexes is a vector with all matching row indexes.
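
To make the (first_idx, indexes) layout concrete, here is a std-only sketch that builds it for a slice of string keys. `group_indexes` is a hypothetical helper, not part of Polars, and it returns groups in order of first appearance, which the real operation does not necessarily do.

```rust
use std::collections::HashMap;

/// Hypothetical helper modelling the index-based group layout:
/// one `(first_idx, indexes)` pair per distinct key.
fn group_indexes(keys: &[&str]) -> Vec<(u32, Vec<u32>)> {
    // Maps each key to the position of its group in `groups`.
    let mut slot_of_key: HashMap<&str, usize> = HashMap::new();
    let mut groups: Vec<(u32, Vec<u32>)> = Vec::new();

    for (row, key) in keys.iter().enumerate() {
        let row = row as u32;
        match slot_of_key.get(key).copied() {
            // Known key: append the row to its group.
            Some(slot) => groups[slot].1.push(row),
            // New key: this row is the group's first index.
            None => {
                slot_of_key.insert(*key, groups.len());
                groups.push((row, vec![row]));
            }
        }
    }
    groups
}
```

Applied to the "date" column of the example DataFrame, this gives (0, [0, 1]), (2, [2, 4]) and (3, [3]).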

pub unsafe fn get_groups_mut(&mut self) -> &mut GroupsProxy

Get a mutable reference to the internal representation of the GroupBy operation. In its index-based form, each group is stored as a (first_idx, indexes) pair, where first_idx is the index of the first row in the group and indexes is a vector with all matching row indexes.

§Safety

Groups must always be in bounds of the DataFrame held by this GroupBy. If you mutate them, you must uphold that invariant.
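
The invariant can be stated as a simple check. The sketch below is std-only and works on plain (first_idx, indexes) pairs rather than on GroupsProxy; `groups_in_bounds` and the `height` parameter (the number of rows in the DataFrame) are hypothetical names used for illustration.

```rust
/// Hypothetical check: every index in every group is a valid row number
/// for a frame with `height` rows.
fn groups_in_bounds(groups: &[(u32, Vec<u32>)], height: usize) -> bool {
    groups.iter().all(|(first, indexes)| {
        (*first as usize) < height
            && indexes.iter().all(|&idx| (idx as usize) < height)
    })
}
```

Any mutation made through get_groups_mut should leave a check of this kind true.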

pub fn take_groups(self) -> GroupsProxy

pub fn take_groups_mut(&mut self) -> GroupsProxy

pub fn keys_sliced(&self, slice: Option<(i64, usize)>) -> Vec<Column>

pub fn keys(&self) -> Vec<Column>

pub fn mean(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped series and compute the mean per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp", "rain"]).mean()
}

Returns:

+------------+-----------+-----------+
| date       | temp_mean | rain_mean |
| ---        | ---       | ---       |
| Date       | f64       | f64       |
+============+===========+===========+
| 2020-08-23 | 9         | 0.1       |
+------------+-----------+-----------+
| 2020-08-22 | 4         | 0.155     |
+------------+-----------+-----------+
| 2020-08-21 | 15        | 0.15      |
+------------+-----------+-----------+
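
The values in this table can be checked by hand. The following std-only sketch computes a per-group mean over parallel key and value slices; it models what the aggregation computes and is not how Polars implements it. `mean_per_group` is a hypothetical helper, and it returns groups sorted by key.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: mean of `values` for each distinct key.
fn mean_per_group(keys: &[&str], values: &[f64]) -> BTreeMap<String, f64> {
    // Accumulate (sum, count) per key.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for (key, value) in keys.iter().zip(values) {
        let entry = acc.entry(key.to_string()).or_insert((0.0, 0));
        entry.0 += value;
        entry.1 += 1;
    }
    // Turn each (sum, count) into a mean.
    acc.into_iter()
        .map(|(key, (sum, count))| (key, sum / count as f64))
        .collect()
}
```

For 2020-08-22 the temperatures 7 and 1 average to 4, and the rain values 0.3 and 0.01 average to 0.155, matching the table.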

pub fn sum(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped series and compute the sum per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).sum()
}

Returns:

+------------+----------+
| date       | temp_sum |
| ---        | ---      |
| Date       | i32      |
+============+==========+
| 2020-08-23 | 9        |
+------------+----------+
| 2020-08-22 | 8        |
+------------+----------+
| 2020-08-21 | 30       |
+------------+----------+

pub fn min(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped series and compute the minimum value per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).min()
}

Returns:

+------------+----------+
| date       | temp_min |
| ---        | ---      |
| Date       | i32      |
+============+==========+
| 2020-08-23 | 9        |
+------------+----------+
| 2020-08-22 | 1        |
+------------+----------+
| 2020-08-21 | 10       |
+------------+----------+

pub fn max(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped series and compute the maximum value per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).max()
}

Returns:

+------------+----------+
| date       | temp_max |
| ---        | ---      |
| Date       | i32      |
+============+==========+
| 2020-08-23 | 9        |
+------------+----------+
| 2020-08-22 | 7        |
+------------+----------+
| 2020-08-21 | 20       |
+------------+----------+

pub fn first(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and find the first value per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).first()
}

Returns:

+------------+------------+
| date       | temp_first |
| ---        | ---        |
| Date       | i32        |
+============+============+
| 2020-08-23 | 9          |
+------------+------------+
| 2020-08-22 | 7          |
+------------+------------+
| 2020-08-21 | 20         |
+------------+------------+

pub fn last(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and return the last value per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).last()
}

Returns:

+------------+------------+
| date       | temp_last  |
| ---        | ---        |
| Date       | i32        |
+============+============+
| 2020-08-23 | 9          |
+------------+------------+
| 2020-08-22 | 1          |
+------------+------------+
| 2020-08-21 | 10         |
+------------+------------+

pub fn n_unique(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series by counting the number of unique values.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).n_unique()
}

Returns:

+------------+---------------+
| date       | temp_n_unique |
| ---        | ---           |
| Date       | u32           |
+============+===============+
| 2020-08-23 | 1             |
+------------+---------------+
| 2020-08-22 | 2             |
+------------+---------------+
| 2020-08-21 | 2             |
+------------+---------------+
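
As a cross-check of the counts above, a per-group distinct count can be modelled with a set per key. This is a std-only sketch, not the Polars implementation; `n_unique_per_group` is a hypothetical helper that returns `usize` counts (the table shows u32) and sorts groups by key.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical helper: number of distinct values for each key.
fn n_unique_per_group(keys: &[&str], values: &[i32]) -> BTreeMap<String, usize> {
    // Collect the distinct values seen for every key.
    let mut seen: BTreeMap<String, HashSet<i32>> = BTreeMap::new();
    for (key, value) in keys.iter().zip(values) {
        seen.entry(key.to_string()).or_default().insert(*value);
    }
    // The size of each set is the distinct count.
    seen.into_iter().map(|(key, set)| (key, set.len())).collect()
}
```

Repeated values inside a group are counted once, so a group holding 1, 1 and 2 has a distinct count of 2.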

pub fn quantile( &self, quantile: f64, method: QuantileMethod, ) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and determine the quantile per group.

§Example

fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).quantile(0.2, QuantileMethod::default())
}
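
Several definitions of a quantile exist, which is why the method takes a QuantileMethod. As an illustration only, the std-only sketch below uses linear interpolation between the two nearest sorted values, one common definition; which definition a given QuantileMethod value selects is not covered here. `quantile_linear` is a hypothetical helper, not a Polars function.

```rust
/// Hypothetical helper: quantile of one group using linear interpolation.
/// Returns `None` for an empty group or a `q` outside `[0, 1]`.
fn quantile_linear(values: &[f64], q: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Fractional position of the quantile within the sorted values.
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;

    // Interpolate between the two neighbouring values.
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}
```

Under this definition the 2020-08-21 group (temperatures 20 and 10) has a 0.2 quantile of 12, and `q = 0.5` gives the median, 15.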

pub fn median(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and determine the median per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).median()
}

pub fn var(&self, ddof: u8) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and determine the variance per group.

pub fn std(&self, ddof: u8) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate grouped Series and determine the standard deviation per group.
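
Neither var nor std explains the `ddof` argument. Assuming it is the usual "delta degrees of freedom", so that the sum of squared deviations is divided by `n - ddof`, the computation for a single group can be sketched in plain Rust. `variance` and `std_dev` are hypothetical helpers, and returning `None` for groups that are too small is a choice made for this sketch.

```rust
/// Hypothetical helper: variance of one group with `ddof` delta degrees
/// of freedom. Returns `None` when the group has `ddof` or fewer values.
fn variance(values: &[f64], ddof: u8) -> Option<f64> {
    let n = values.len();
    let ddof = ddof as usize;
    if n <= ddof {
        return None;
    }
    let mean = values.iter().sum::<f64>() / n as f64;
    // Sum of squared deviations from the mean.
    let sum_sq: f64 = values.iter().map(|v| (v - mean).powi(2)).sum();
    Some(sum_sq / (n - ddof) as f64)
}

/// Hypothetical helper: standard deviation as the square root of `variance`.
fn std_dev(values: &[f64], ddof: u8) -> Option<f64> {
    variance(values, ddof).map(f64::sqrt)
}
```

For the 2020-08-21 group (temperatures 20 and 10), `ddof = 1` gives a variance of 50 and `ddof = 0` gives 25. The single-row 2020-08-23 group has no defined variance with `ddof = 1` under this sketch.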

pub fn count(&self) -> PolarsResult<DataFrame>

Aggregate grouped series and compute the number of values per group.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.select(["temp"]).count()
}

Returns:

+------------+------------+
| date       | temp_count |
| ---        | ---        |
| Date       | u32        |
+============+============+
| 2020-08-23 | 1          |
+------------+------------+
| 2020-08-22 | 2          |
+------------+------------+
| 2020-08-21 | 2          |
+------------+------------+

pub fn groups(&self) -> PolarsResult<DataFrame>

Get the group_by group indexes.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["date"])?.groups()
}

Returns:

+--------------+------------+
| date         | groups     |
| ---          | ---        |
| Date(days)   | list [u32] |
+==============+============+
| 2020-08-23   | "[3]"      |
+--------------+------------+
| 2020-08-22   | "[2, 4]"   |
+--------------+------------+
| 2020-08-21   | "[0, 1]"   |
+--------------+------------+

pub fn agg_list(&self) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Aggregate the groups of the group_by operation into lists.

§Example
fn example(df: DataFrame) -> PolarsResult<DataFrame> {
    // GroupBy and aggregate to Lists
    df.group_by(["date"])?.select(["temp"]).agg_list()
}

Returns:

+------------+------------------------+
| date       | temp_agg_list          |
| ---        | ---                    |
| Date       | list [i32]             |
+============+========================+
| 2020-08-23 | "[Some(9)]"            |
+------------+------------------------+
| 2020-08-22 | "[Some(7), Some(1)]"   |
+------------+------------------------+
| 2020-08-21 | "[Some(20), Some(10)]" |
+------------+------------------------+

pub fn par_apply<F>(&self, f: F) -> PolarsResult<DataFrame>

👎Deprecated since 0.24.1: use polars.lazy aggregations

Apply a closure to each group in parallel. Each group is passed to the closure as a new DataFrame.

pub fn apply<F>(&self, f: F) -> PolarsResult<DataFrame>

Apply a closure to each group. Each group is passed to the closure as a new DataFrame.
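
The split-apply-combine pattern behind this method can be modelled without Polars. The sketch below uses a toy two-column table and assumes that the per-group results are concatenated into one output and that the first error aborts the operation; both are assumptions of the sketch, and `Row` and `apply_per_group` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// One row of a toy two-column table: (date, temp).
type Row = (String, i32);

/// Hypothetical model of `apply`: split `rows` by key, pass each group to
/// `f` as its own table, and concatenate what `f` returns.
fn apply_per_group<F>(rows: &[Row], mut f: F) -> Result<Vec<Row>, String>
where
    F: FnMut(Vec<Row>) -> Result<Vec<Row>, String>,
{
    // Split: collect the rows of each key (groups end up sorted by key).
    let mut groups: BTreeMap<&str, Vec<Row>> = BTreeMap::new();
    for row in rows {
        groups.entry(row.0.as_str()).or_default().push(row.clone());
    }
    // Apply and combine: the first error aborts the whole operation.
    let mut out = Vec::new();
    for (_, group) in groups {
        out.extend(f(group)?);
    }
    Ok(out)
}
```

A closure that keeps only the warmest row of each group, for example, turns the five example rows into three, one per date.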

pub fn sliced(self, slice: Option<(i64, usize)>) -> Self

Trait Implementations§

impl<'df> Clone for GroupBy<'df>

fn clone(&self) -> GroupBy<'df>

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<'df> Debug for GroupBy<'df>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl<'df> Freeze for GroupBy<'df>

impl<'df> !RefUnwindSafe for GroupBy<'df>

impl<'df> Send for GroupBy<'df>

impl<'df> Sync for GroupBy<'df>

impl<'df> Unpin for GroupBy<'df>

impl<'df> !UnwindSafe for GroupBy<'df>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst.

impl<T> DynClone for T
where T: Clone,

fn __clone_box(&self, _: Private) -> *mut ()

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V