In cases where there are many lists of items to group (think column-major
data), consider using group_indices() and apply_grouping()
instead.
Parameters:
item_list (NDArray) – The input array of items to group.
Extended typing NDArray[Any,VT]
groupid_list (NDArray) – Each item is an id corresponding to the item at the same position
in item_list. For the fastest runtime, the input array must be
numeric (ideally with integer types). This list must be
1-dimensional.
Extended typing NDArray[Any,KT]
assume_sorted (bool) – If the input array is sorted, then setting this to True will avoid
an unnecessary sorting operation and improve efficiency. Defaults
to False.
axis (int | None) – Group along a particular axis of item_list if it is n-dimensional.
Returns:
mapping from groupids to corresponding items.
Extended typing Dict[KT,NDArray[Any,VT]].
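The grouping described above can be illustrated with a minimal NumPy-only sketch. The name group_items_sketch is hypothetical and the sort-then-split approach is an assumption made for illustration; it is not taken from the library's source, and it handles only the 1-dimensional case (no axis argument).

```python
import numpy as np

def group_items_sketch(item_list, groupid_list, assume_sorted=False):
    """Hypothetical stand-in for group_items(); 1-dimensional inputs only."""
    item_list = np.asarray(item_list)
    groupid_list = np.asarray(groupid_list)
    if groupid_list.size == 0:
        return {}
    if not assume_sorted:
        # A stable sort keeps items within a group in their input order.
        order = np.argsort(groupid_list, kind="stable")
        groupid_list = groupid_list[order]
        item_list = item_list[order]
    # Positions where the sorted group id changes mark group boundaries.
    boundaries = np.flatnonzero(groupid_list[1:] != groupid_list[:-1]) + 1
    # The first element of each run is that run's group id.
    keys = groupid_list[np.concatenate([[0], boundaries])]
    groups = np.split(item_list, boundaries)
    return dict(zip(keys.tolist(), groups))

items = np.array([10, 20, 30, 40, 50])
groupids = np.array([2, 1, 2, 1, 3])
grouped = group_items_sketch(items, groupids)
```

Here grouped maps each group id to an array of its items. Passing assume_sorted=True skips the argsort, which is only valid when groupid_list is already in sorted order.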
Find unique items and the indices at which they appear in an array.
A common use case of this function is when you have a list of objects
(often numeric but sometimes not) and an array of “group-ids” corresponding
to that list of objects.
Using this function will return a list of indices that can be used in
conjunction with apply_grouping() to group the elements. This is
most useful when you have many lists (think column-major data)
corresponding to the group-ids.
In cases where there is only one list of objects, or where knowing the
indices doesn’t matter, consider using group_items() instead.
Parameters:
idx_to_groupid (NDArray) – The input array, where each item is interpreted as a group id.
For the fastest runtime, the input array must be numeric (ideally
with integer types). If the type is non-numeric then the less
efficient ubelt.group_items() is used.
assume_sorted (bool) – If the input array is sorted, then setting this to True will avoid
an unnecessary sorting operation and improve efficiency.
Defaults to False.
Returns:
(keys, groupxs) -
keys (NDArray):
The unique elements of the input array in order
groupxs (List[NDArray]):
Corresponding list of indexes. The i-th item is an array
indicating the indices where the item keys[i] appeared in
the input array.
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> import kwarray
>>> import ubelt as ub
>>> # 2d arrays must be flattened before coming into this function so
>>> # information is on the last axis
>>> idx_to_groupid = np.array([[24], [129], [659], [659], [24],
...                            [659], [659], [822], [659], [659], [24]]).T[0]
>>> (keys, groupxs) = kwarray.group_indices(idx_to_groupid)
>>> # Different versions of numpy may produce different orderings
>>> # so normalize these to make test output consistent
>>> #[gxs.sort() for gxs in groupxs]
>>> print('keys = ' + ub.urepr(keys, with_dtype=False))
>>> print('groupxs = ' + ub.urepr(groupxs, with_dtype=False))
keys = np.array([ 24, 129, 659, 822])
groupxs = [
    np.array([ 0,  4, 10]),
    np.array([1]),
    np.array([2, 3, 5, 6, 8, 9]),
    np.array([7]),
]
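To show why the indices are useful for column-major data, the following NumPy-only sketch computes the group indices once and reuses them across several parallel columns. The helpers group_indices_sketch and apply_grouping_sketch are hypothetical stand-ins written for this illustration, and the scores and labels columns are made-up data; the real functions may be implemented differently.

```python
import numpy as np

def group_indices_sketch(idx_to_groupid):
    """Hypothetical stand-in for group_indices(); numeric 1-d input assumed."""
    idx_to_groupid = np.asarray(idx_to_groupid)
    # A stable argsort places equal ids next to each other while keeping
    # the original positions within each group in ascending order.
    order = np.argsort(idx_to_groupid, kind="stable")
    sorted_ids = idx_to_groupid[order]
    if sorted_ids.size == 0:
        return sorted_ids, []
    boundaries = np.flatnonzero(sorted_ids[1:] != sorted_ids[:-1]) + 1
    keys = sorted_ids[np.concatenate([[0], boundaries])]
    groupxs = np.split(order, boundaries)
    return keys, groupxs

def apply_grouping_sketch(items, groupxs):
    """Hypothetical stand-in for apply_grouping(): index items per group."""
    items = np.asarray(items)
    return [items[idxs] for idxs in groupxs]

idx_to_groupid = np.array([24, 129, 659, 659, 24, 659, 659, 822, 659, 659, 24])
keys, groupxs = group_indices_sketch(idx_to_groupid)

# Two illustrative columns aligned with idx_to_groupid.
scores = np.arange(11) * 0.5
labels = np.array(list("abcdefghijk"))
grouped_scores = apply_grouping_sketch(scores, groupxs)
grouped_labels = apply_grouping_sketch(labels, groupxs)
```

The sort happens once in group_indices_sketch; each additional column costs only one fancy-indexing pass per group.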
Returns lists of consecutive values. Implementation inspired by [3].
Parameters:
arr (NDArray) – array of ordered values
offset (float, default=1) – any two adjacent values separated by this offset are
grouped. In the default case, when offset=1, this groups increasing
values like: 0, 1, 2. When offset is 0 it groups consecutive values
that are the same, e.g.: 4, 4, 4.
Returns:
a list of arrays that are the groups from the input
Return type:
List[NDArray]
Note
This is equivalent to (and faster than) using:
apply_grouping(data, group_consecutive_indices(data))
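A minimal sketch of the idea, assuming NumPy only: split the array wherever the step between neighbouring values differs from offset. The function names below are hypothetical stand-ins, not the library's implementation. The indices variant is included to illustrate the equivalence stated in the note.

```python
import numpy as np

def _split_points(arr, offset):
    # Index i + 1 starts a new group when arr[i + 1] - arr[i] != offset.
    return np.flatnonzero(np.diff(arr) != offset) + 1

def group_consecutive_sketch(arr, offset=1):
    """Hypothetical stand-in: return runs of values that step by offset."""
    arr = np.asarray(arr)
    return np.split(arr, _split_points(arr, offset))

def group_consecutive_indices_sketch(arr, offset=1):
    """Hypothetical stand-in: same runs, expressed as index arrays."""
    arr = np.asarray(arr)
    return np.split(np.arange(len(arr)), _split_points(arr, offset))

data = np.array([1, 2, 3, 5, 6, 9])
increasing = group_consecutive_sketch(data)            # offset=1
repeated = group_consecutive_sketch(np.array([4, 4, 4, 7, 7, 2]), offset=0)

# Grouping via indices gives the same result with one extra indexing step.
via_indices = [data[idxs] for idxs in group_consecutive_indices_sketch(data)]
```

Splitting the values directly avoids building and applying the intermediate index arrays, which is consistent with the note's claim that the direct form is faster.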