Grouping in Java 8

  • Last Updated: May 7, 2024
  • By: javahandson

In this article, we will look at what grouping means in Java 8 and walk through the different variants of the grouping methods with examples.

What is Grouping

Grouping organizes similar data into groups based on one or more properties. Grouping is particularly powerful in combination with aggregate functions like SUM, AVG, COUNT, MAX, and MIN which perform calculations on grouped data, returning a single result per group.

The two examples below illustrate the idea.

Ex1. Count the number of employees in each department

Sample Data for Employees Table

EmployeeID   EmployeeName   DepartmentID
1            Shweta         D1
2            Reshma         D3
3            Shrunalini     D2
4            Poornima       D1
5            Shubhangi      D3
6            Suraj          D1
7            Iqbal          D2
8            Kartik         D3
9            Amit           D3
10           Suchit         D2

Now we will group the Employees by DepartmentID and count the number of employees in each department. The result will be like below.

DepartmentID   NumberOfEmployees
D1             3
D2             3
D3             4
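
To make the idea concrete, here is a minimal sketch of Ex1 done by hand with a plain loop, before any stream API is involved. The class name ManualCountSketch and the method countByDepartment are made up for this illustration, and a TreeMap is used only so that the keys print in sorted order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ManualCountSketch {

    // Counts how often each department ID occurs, using a plain loop.
    static Map<String, Integer> countByDepartment(List<String> departmentIds) {
        // TreeMap is an illustrative choice: it keeps the keys sorted for printing.
        Map<String, Integer> counts = new TreeMap<>();
        for (String id : departmentIds) {
            // Start at 1 for a new key, otherwise add 1 to the existing count.
            counts.merge(id, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // DepartmentID column of the sample Employees table, in row order.
        List<String> departmentIds = Arrays.asList(
                "D1", "D3", "D2", "D1", "D3", "D1", "D2", "D3", "D3", "D2");
        System.out.println(countByDepartment(departmentIds)); // {D1=3, D2=3, D3=4}
    }
}
```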

Ex2. Group the Students by Subject and get the maximum mark for each Subject.

Sample Data for Student Table

StudentID   Name         Subject       Marks
1           Shweta       Mathematics   95
2           Reshma       English       90
3           Shrunalini   Physics       78
4           Poornima     Mathematics   85
5           Shubhangi    Chemistry     91
6           Suraj        Mathematics   84
7           Iqbal        English       75
8           Kartik       Physics       82
9           Amit         Mathematics   65
10          Suchit       Chemistry     90

Now we will group the Students by subject and find the maximum marks for each subject. The result will be like below.

Subject       HighestMarks
Mathematics   95
Physics       82
Chemistry     91
English       90
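
Ex2 can be sketched by hand in the same way: keep one running maximum per subject. The names ManualMaxSketch and highestMarks are hypothetical, and the two arrays simply mirror the Subject and Marks columns of the sample table.

```java
import java.util.Map;
import java.util.TreeMap;

public class ManualMaxSketch {

    // Keeps the highest mark seen so far for every subject.
    static Map<String, Integer> highestMarks(String[] subjects, int[] marks) {
        // TreeMap is an illustrative choice so the subjects print alphabetically.
        Map<String, Integer> highest = new TreeMap<>();
        for (int i = 0; i < subjects.length; i++) {
            // Store the mark for a new subject, otherwise keep the larger value.
            highest.merge(subjects[i], marks[i], Integer::max);
        }
        return highest;
    }

    public static void main(String[] args) {
        // Subject and Marks columns of the sample Student table, in row order.
        String[] subjects = {"Mathematics", "English", "Physics", "Mathematics", "Chemistry",
                "Mathematics", "English", "Physics", "Mathematics", "Chemistry"};
        int[] marks = {95, 90, 78, 85, 91, 84, 75, 82, 65, 90};
        // {Chemistry=91, English=90, Mathematics=95, Physics=82}
        System.out.println(highestMarks(subjects, marks));
    }
}
```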

Java 8 provides a set of groupingBy methods for grouping data. They are static methods of the Collectors utility class and are typically passed to the collect method of a stream.

Variants of groupingBy methods

There are 3 variants of groupingBy methods.

  1. Using only classifier
  2. Using a classifier and downstream collector
  3. Using classifier, mapFactory, and downstream collector

We will learn about these methods with examples in the sections below.

1. Using only classifier

Syntax:

static<T, K> Collector<T,?,Map<K, List<T>>> groupingBy(Function<? super T, ? extends K> classifier)

Method Signature Explained:

a. static<T, K>: This indicates that the groupingBy method is static. <T, K> are generic type parameters, where T is the type of the stream elements and K is the type of the keys in the resulting Map.

b. Collector<T, ?, Map<K, List<T>>>: This describes the return type of the method. The method returns a Collector that collects elements of type T into a Map<K, List<T>>.

  • Map<K, List<T>> is the resulting type, where each key (K) in the map is associated with a list of items (List<T>) from the stream.
  • The second type parameter of Collector, written as ?, is the intermediate accumulation type used during collection. It is an implementation detail, so the signature hides it behind a wildcard.

c. Function<? super T, ? extends K>: This is the only parameter of the method. The classifier function decides which group each element belongs to. It takes an element of the stream (of type T or any of its supertypes) and returns a key (of type K or any of its subtypes).
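
The "? super T" part of the signature is easier to see in code. In the sketch below, the classifier is declared on CharSequence, a supertype of String, and is still accepted for a stream of String elements. The class and method names are invented for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ClassifierWildcardSketch {

    static Map<Integer, List<String>> groupByLength(List<String> words) {
        // The classifier is typed on CharSequence, not String.
        // "? super T" in the groupingBy signature is what allows this.
        Function<CharSequence, Integer> byLength = CharSequence::length;
        return words.stream().collect(Collectors.groupingBy(byLength));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> groups =
                groupByLength(Arrays.asList("map", "list", "set", "deque", "tree"));
        System.out.println(groups.get(3)); // [map, set]
        System.out.println(groups.get(4)); // [list, tree]
        System.out.println(groups.get(5)); // [deque]
    }
}
```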

Write a program to group the Students by subject.

package com.javahands.collectors.grouping;
public class Student {

    String name;
    String subject;
    int marks;

    public Student(String name, String subject, int marks) {
        super();
        this.name = name;
        this.subject = subject;
        this.marks = marks;
    }

    public String getSubject() {
        return subject;
    }

    // Used by the maximum-marks example later in the article.
    public int getMarks() {
        return marks;
    }

    @Override
    public String toString() {
        return "Student [name=" + name + ", subject=" + subject + ", marks=" + marks + "]";
    }
}
package com.javahands.collectors.grouping;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingEx {
    public static void main(String[] args) {

        List<Student> studentList = Arrays.asList(
                new Student("Shweta", "Mathematics", 95),
                new Student("Reshma", "English", 90),
                new Student("Shrunalini", "Physics", 78),
                new Student("Suraj", "Mathematics", 84),
                new Student("Iqbal", "English", 75),
                new Student("Kartik", "Physics", 82),
                new Student("Poornima", "Mathematics", 85),
                new Student("Amit", "Mathematics", 65));

        Map<String, List<Student>> groupBySubjects = studentList.stream()
                .collect(Collectors.groupingBy(Student::getSubject));

        System.out.println(groupBySubjects);
    }
}
Output: 
{English=[Student [name=Reshma, subject=English, marks=90], Student [name=Iqbal, subject=English, marks=75]], 
Mathematics=[Student [name=Shweta, subject=Mathematics, marks=95], Student [name=Suraj, subject=Mathematics, marks=84], Student [name=Poornima, subject=Mathematics, marks=85], Student [name=Amit, subject=Mathematics, marks=65]], 
Physics=[Student [name=Shrunalini, subject=Physics, marks=78], Student [name=Kartik, subject=Physics, marks=82]]}

In the above example, Student::getSubject is the classifier. The groupingBy collector receives each Student from the stream and places it in the group for its subject. The output is a map with the subject as the key and the list of students for that subject as the value.
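
The classifier does not have to be a getter. Any function that turns an element into a key will do. The sketch below groups marks into two bands with a lambda. The banding rule (80 and above versus below 80) and the names DerivedKeySketch, band, and groupByBand are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DerivedKeySketch {

    // Hypothetical banding rule, chosen only for this example.
    static String band(int marks) {
        return marks >= 80 ? "80 and above" : "below 80";
    }

    static Map<String, List<Integer>> groupByBand(List<Integer> marks) {
        // The lambda is the classifier: it computes the key for each element.
        return marks.stream().collect(Collectors.groupingBy(m -> band(m)));
    }

    public static void main(String[] args) {
        // Marks of the eight students used in the program above.
        List<Integer> marks = Arrays.asList(95, 90, 78, 84, 75, 82, 85, 65);
        Map<String, List<Integer>> bands = groupByBand(marks);
        System.out.println(bands.get("80 and above")); // [95, 90, 84, 82, 85]
        System.out.println(bands.get("below 80"));     // [78, 75, 65]
    }
}
```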

2. Using a classifier and downstream collector

Syntax:

static <T, K, A, D> Collector<T, ?, Map<K,D>> groupingBy(Function<? super T, ? extends K> classifier, Collector<? super T, A, D> downstream)

This version of the groupingBy function is a more general and flexible form of the groupingBy collector that not only groups elements by a classifier but also allows for a further reduction of the grouped elements using another collector.
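
One way to see the relationship between the first two variants: the classifier-only form behaves like the two-argument form with Collectors.toList() as the downstream collector. The sketch below checks this on a small word list and also swaps in Collectors.toSet() to show a different downstream. All class and method names here are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class DownstreamEquivalenceSketch {

    // Classifier only: values are lists.
    static Map<String, List<String>> implicitList(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.substring(0, 1)));
    }

    // Same grouping with the downstream collector spelled out.
    static Map<String, List<String>> explicitList(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.substring(0, 1), Collectors.toList()));
    }

    // A different downstream collector changes the value type to Set.
    static Map<String, Set<String>> asSets(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(w -> w.substring(0, 1), Collectors.toSet()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "list", "multimap", "set", "linkedlist", "map");
        System.out.println(implicitList(words).equals(explicitList(words))); // true
        System.out.println(implicitList(words).get("m"));                    // [map, multimap, map]
        System.out.println(asSets(words).get("m").size());                   // 2
    }
}
```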

Method Signature Explained:

a. static <T, K, A, D>: This indicates that the method is static. <T, K, A, D> are generic type parameters:

  • T is the type of the stream element.
  • K is the type of the key in the resulting Map.
  • A is the accumulation type of the downstream collector.
  • D is the result type of the downstream collector.

b. Collector<T, ?, Map<K,D>>: This describes the return type of the method. The method returns a Collector that processes elements of type T and aggregates them into a Map<K,D>.

  • K represents the type of the keys in the resulting map.
  • D is the type of the values in the resulting map, determined by the operation of the downstream collector.

c. Function<? super T, ? extends K> classifier: This is a function that is used to determine the key for each element in the stream. It maps each element of the stream to a key of type K.

d. Collector<? super T, A, D> downstream: This parameter represents a downstream collector that is applied to the elements grouped under the same key.

It specifies how the classification results (grouping) are further reduced or transformed. The downstream collector operates on elements of type T (or its super types), accumulates them into a structure of type A, and finally produces a result of type D.
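
As a rough illustration of how the downstream collector shapes the map values, the sketch below uses two standard collectors, Collectors.mapping and Collectors.averagingInt. The nested Score class is a hypothetical stand-in for the Student class, added only so the example is self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DownstreamCollectorsSketch {

    // Minimal stand-in for the article's Student class (hypothetical name).
    static final class Score {
        private final String name;
        private final String subject;
        private final int marks;

        Score(String name, String subject, int marks) {
            this.name = name;
            this.subject = subject;
            this.marks = marks;
        }

        String getName() { return name; }
        String getSubject() { return subject; }
        int getMarks() { return marks; }
    }

    static List<Score> sample() {
        return Arrays.asList(
                new Score("Shweta", "Mathematics", 95),
                new Score("Reshma", "English", 90),
                new Score("Shrunalini", "Physics", 78),
                new Score("Suraj", "Mathematics", 84),
                new Score("Iqbal", "English", 75),
                new Score("Kartik", "Physics", 82),
                new Score("Poornima", "Mathematics", 85),
                new Score("Amit", "Mathematics", 65));
    }

    // Downstream mapping(...): D is List<String>, the names in each group.
    static Map<String, List<String>> namesBySubject(List<Score> scores) {
        return scores.stream().collect(Collectors.groupingBy(
                Score::getSubject,
                Collectors.mapping(Score::getName, Collectors.toList())));
    }

    // Downstream averagingInt(...): D is Double, the average marks per group.
    static Map<String, Double> averageBySubject(List<Score> scores) {
        return scores.stream().collect(Collectors.groupingBy(
                Score::getSubject,
                Collectors.averagingInt(Score::getMarks)));
    }

    public static void main(String[] args) {
        List<Score> scores = sample();
        System.out.println(namesBySubject(scores).get("Mathematics"));   // [Shweta, Suraj, Poornima, Amit]
        System.out.println(averageBySubject(scores).get("Mathematics")); // 82.25
    }
}
```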

Write a program to count the number of employees in each department.

package com.javahands.collectors.grouping;

public class Employee {

    int id;
    String name;
    String department;

    public Employee(int id, String name, String department) {
        super();
        this.id = id;
        this.name = name;
        this.department = department;
    }

    public String getDepartment() {
        return department;
    }

    @Override
    public String toString() {
        return "Employee [id=" + id + ", name=" + name + ", department=" + department + "]";
    }
}
package com.javahands.collectors.grouping;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingDemo {
    public static void main(String[] args) {

        List<Employee> listOfEmployees = Arrays.asList(
                new Employee(1, "Shweta", "D1"),
                new Employee(2, "Reshma", "D3"),
                new Employee(3, "Poornima", "D3"),
                new Employee(4, "Shubhangi", "D1"),
                new Employee(5, "Suraj", "D1"),
                new Employee(6, "Iqbal", "D2"),
                new Employee(7, "Amit", "D1"),
                new Employee(8, "Suchit", "D2"));

        Map<String, Long> countByDepartment = listOfEmployees.stream()
                .collect(Collectors.groupingBy(Employee::getDepartment, Collectors.counting()));

        System.out.println(countByDepartment);
    }
}
Output: {D1=4, D2=2, D3=2}
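
Note that Collectors.counting() produces Long values. If Integer counts are preferred, two possible approaches are sketched below: converting the Long with collectingAndThen, or adding 1 per element with summingInt. The class and method names are hypothetical, and the department IDs mirror the employee list above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class IntegerCountSketch {

    // Option 1: count as Long, then convert each result to Integer.
    static Map<String, Integer> countAsInteger(List<String> departments) {
        return departments.stream().collect(Collectors.groupingBy(
                Function.identity(),
                Collectors.collectingAndThen(Collectors.counting(), Long::intValue)));
    }

    // Option 2: sum the constant 1 for every element of a group.
    static Map<String, Integer> countWithSummingInt(List<String> departments) {
        return departments.stream().collect(Collectors.groupingBy(
                Function.identity(),
                Collectors.summingInt(d -> 1)));
    }

    public static void main(String[] args) {
        // Department IDs of the eight employees from the program above.
        List<String> departments = Arrays.asList("D1", "D3", "D3", "D1", "D1", "D2", "D1", "D2");
        System.out.println(countAsInteger(departments).get("D1"));      // 4
        System.out.println(countWithSummingInt(departments).get("D2")); // 2
    }
}
```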

Write a program to group the Students by Subject and get the maximum mark for each Subject.

package com.javahands.collectors.grouping;

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class GroupingEx {
    public static void main(String[] args) {

        List<Student> studentList = Arrays.asList(
                new Student("Shweta", "Mathematics", 95),
                new Student("Reshma", "English", 90),
                new Student("Shrunalini", "Physics", 78),
                new Student("Suraj", "Mathematics", 84),
                new Student("Iqbal", "English", 75),
                new Student("Kartik", "Physics", 82),
                new Student("Poornima", "Mathematics", 85),
                new Student("Amit", "Mathematics", 65));

        // Student::getMarks assumes that Student exposes an int getMarks() getter.
        Map<String, Optional<Student>> groupBySubjects = studentList.stream()
                .collect(Collectors.groupingBy(Student::getSubject,
                        Collectors.maxBy(Comparator.comparingInt(Student::getMarks))));

        System.out.println(groupBySubjects);
    }
}
Output: 
{English=Optional[Student [name=Reshma, subject=English, marks=90]], 
Mathematics=Optional[Student [name=Shweta, subject=Mathematics, marks=95]], 
Physics=Optional[Student [name=Kartik, subject=Physics, marks=82]]}
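
The values above are wrapped in Optional because maxBy has to cope with an empty input. Inside groupingBy, a group only exists once it has at least one element, so one common way to unwrap the values is collectingAndThen with Optional::get. The sketch below shows this idea. The nested Score class is a hypothetical stand-in for Student so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class MaxWithoutOptionalSketch {

    // Minimal stand-in for the article's Student class (hypothetical name).
    static final class Score {
        private final String name;
        private final String subject;
        private final int marks;

        Score(String name, String subject, int marks) {
            this.name = name;
            this.subject = subject;
            this.marks = marks;
        }

        String getName() { return name; }
        String getSubject() { return subject; }
        int getMarks() { return marks; }
    }

    static List<Score> sample() {
        return Arrays.asList(
                new Score("Shweta", "Mathematics", 95),
                new Score("Reshma", "English", 90),
                new Score("Shrunalini", "Physics", 78),
                new Score("Suraj", "Mathematics", 84),
                new Score("Iqbal", "English", 75),
                new Score("Kartik", "Physics", 82));
    }

    static Map<String, Score> topScorerBySubject(List<Score> scores) {
        return scores.stream().collect(Collectors.groupingBy(
                Score::getSubject,
                // maxBy yields Optional<Score>; every group is non-empty,
                // so Optional::get is safe as the finishing step here.
                Collectors.collectingAndThen(
                        Collectors.maxBy(Comparator.comparingInt(Score::getMarks)),
                        Optional::get)));
    }

    public static void main(String[] args) {
        Map<String, Score> top = topScorerBySubject(sample());
        System.out.println(top.get("Mathematics").getName()); // Shweta
        System.out.println(top.get("Physics").getMarks());    // 82
    }
}
```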

3. Using classifier, mapFactory, and downstream collector

Syntax:

static <T, K, D, A, M extends Map<K, D>> Collector<T, ?, M> groupingBy(Function<? super T, ? extends K> classifier, Supplier<M> mapFactory, Collector<? super T, A, D> downstream)

This method allows us to group the elements of a stream by a specified classifier, process them with a specified downstream collector, and store the results in a map provided by a custom map factory.

Method Signature Explained:

a. <T, K, D, A, M extends Map<K, D>>: These are the generic type parameters used by this method:

  • T: The type of elements being processed by the stream.
  • K: The type of the keys in the resulting map.
  • D: The result type of the downstream collector.
  • A: The accumulation type used by the downstream collector.
  • M: The type of map that will be used to collect the results. It extends Map, meaning it could be any map implementation that can hold keys of type K and values of type D.

b. Function<? super T, ? extends K> classifier: This is the classifier function used to assign a key to each element in the stream. It maps each element T to a key K.

c. Supplier<M> mapFactory: mapFactory provides a custom map implementation that is used for storing the results. The supplier is a factory method that produces a new empty map instance that will be populated by the collector.

d. Collector<? super T, A, D> downstream: This is the downstream collector that accumulates the elements of the stream once they are grouped by the classifier. The downstream collector works on elements of type T, accumulates them into a structure of type A, and produces a result of type D.
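
As a small illustration of the mapFactory parameter, the sketch below passes TreeMap::new so that the resulting keys are sorted. The names SortedGroupingSketch and countSorted are invented for the example, and TreeMap is just one possible choice of map.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SortedGroupingSketch {

    // M is inferred as TreeMap<String, Long> from the return type.
    static TreeMap<String, Long> countSorted(List<String> subjects) {
        return subjects.stream().collect(Collectors.groupingBy(
                Function.identity(),     // classifier: the subject itself is the key
                TreeMap::new,            // mapFactory: creates the empty result map
                Collectors.counting())); // downstream: number of elements per key
    }

    public static void main(String[] args) {
        List<String> subjects = Arrays.asList(
                "Physics", "English", "Mathematics", "English", "Mathematics", "Mathematics");
        System.out.println(countSorted(subjects)); // {English=2, Mathematics=3, Physics=1}
    }
}
```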

Write a program to count the number of employees in each department and store the results in a LinkedHashMap, so that the departments keep the order in which they first appear in the input.

package com.javahands.collectors.grouping;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class GroupingDemo {
    public static void main(String[] args) {

        List<Employee> listOfEmployees = Arrays.asList(
                new Employee(2, "Reshma", "D3"),
                new Employee(1, "Shweta", "D1"),
                new Employee(3, "Poornima", "D3"),
                new Employee(4, "Shubhangi", "D1"),
                new Employee(5, "Suraj", "D1"),
                new Employee(6, "Iqbal", "D2"),
                new Employee(7, "Amit", "D1"),
                new Employee(8, "Suchit", "D2"));

        Collector<Employee, ?, LinkedHashMap<String, Long>> collector = Collectors.groupingBy(
                Employee::getDepartment,
                LinkedHashMap::new,
                Collectors.counting()
        );

        LinkedHashMap<String, Long> employeesByDepartment = listOfEmployees.stream()
                .collect(collector);

        System.out.println(employeesByDepartment);
    }
}
Output: {D3=2, D1=4, D2=2}

In the above section, we have learned about the 3 types of groupingBy functions. Other than the above 3 types of groupingBy functions we also have a set of groupingByConcurrent functions, which are as follows.

Concurrent variants of grouping

1. Using only classifier

Syntax:

static<T, K> Collector<T,?,ConcurrentMap<K, List<T>>> groupingByConcurrent(Function<? super T, ? extends K> classifier)

2. Using a classifier and downstream collector

Syntax:

static <T, K, A, D> Collector<T, ?, ConcurrentMap<K,D>> groupingByConcurrent(Function<? super T, ? extends K> classifier, Collector<? super T, A, D> downstream)

3. Using classifier, mapFactory, and downstream collector

Syntax:

static <T, K, D, A, M extends ConcurrentMap<K, D>> Collector<T, ?, M> groupingByConcurrent(Function<? super T, ? extends K> classifier, Supplier<M> mapFactory, Collector<? super T, A, D> downstream)

The groupingByConcurrent methods are almost the same as the groupingBy methods, with a few differences.
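
Here is a minimal sketch of the second concurrent variant on a parallel stream. It counts the even and odd numbers in a range. The class name, the method name, and the "even"/"odd" keys are assumptions made for the example.

```java
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentCountSketch {

    // Counts even and odd numbers in 1..upperBound on a parallel stream.
    static ConcurrentMap<String, Long> countByParity(int upperBound) {
        return IntStream.rangeClosed(1, upperBound)
                .parallel()
                .boxed()
                .collect(Collectors.groupingByConcurrent(
                        n -> n % 2 == 0 ? "even" : "odd", // classifier
                        Collectors.counting()));          // downstream
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Long> counts = countByParity(1000);
        System.out.println(counts.get("even")); // 500
        System.out.println(counts.get("odd"));  // 500
    }
}
```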

Differences between groupingBy and groupingByConcurrent functions

Usage Scenarios

groupingBy : This method works with both sequential and parallel streams and is the usual choice when the order of elements within each group matters. On a parallel stream it still produces a correct result, but it may be slower than groupingByConcurrent because partial results have to be merged.

groupingByConcurrent : It is best suited to parallel streams where throughput matters and the order of elements within each group does not.

Concurrency Handling

groupingBy: This collector does not share a map between threads and does no locking of its own. On a sequential stream, a single map is filled. On a parallel stream, each worker thread accumulates into its own map, and the partial maps are then merged key by key. The result is correct and keeps the encounter order within each group, but the merge step can be expensive when there are many keys.

groupingByConcurrent: As the name suggests, this method is designed for concurrent execution in a parallel stream. It is a concurrent and unordered collector: all threads accumulate into one shared ConcurrentMap, such as a ConcurrentHashMap, so no merge step is needed. The trade-off is that the order of elements within each group is not guaranteed. This can lead to performance improvements when processing large datasets in parallel.
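
The ordering difference can be sketched with a small experiment. Both methods below run on a parallel stream. The groupingBy result keeps the encounter order inside each group, while for groupingByConcurrent only the contents of each group are compared, because their order may vary between runs. The class and method names are made up for this illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class EncounterOrderSketch {

    // Builds the list 0, 1, ..., count - 1.
    static List<Integer> numbers(int count) {
        return IntStream.range(0, count).boxed().collect(Collectors.toList());
    }

    // Ordered collector: each group lists its elements in encounter order.
    static Map<Integer, List<Integer>> groupParallel(List<Integer> numbers) {
        return numbers.parallelStream()
                .collect(Collectors.groupingBy(n -> n % 3));
    }

    // Concurrent, unordered collector: same elements, order not guaranteed.
    static Map<Integer, List<Integer>> groupConcurrent(List<Integer> numbers) {
        return numbers.parallelStream()
                .collect(Collectors.groupingByConcurrent(n -> n % 3));
    }

    public static void main(String[] args) {
        List<Integer> input = numbers(12);
        Map<Integer, List<Integer>> ordered = groupParallel(input);
        Map<Integer, List<Integer>> concurrent = groupConcurrent(input);

        System.out.println(ordered.get(0)); // [0, 3, 6, 9]
        // Compare as sets, since the concurrent list order can differ per run.
        System.out.println(new HashSet<>(concurrent.get(0))
                .equals(new HashSet<>(ordered.get(0)))); // true
    }
}
```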

Resulting Map Type

groupingBy: The API makes no guarantee about the type of Map returned. In the standard JDK implementation it is a HashMap, as the output of the program below shows. To control the map type, we can supply a map factory such as TreeMap::new or LinkedHashMap::new.

groupingByConcurrent: It always returns a ConcurrentMap. In the standard JDK implementation this is a ConcurrentHashMap, again without a guarantee from the API. We can also provide a custom map factory if we want to use a different type of concurrent map.
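
For example, a sorted concurrent map can be requested through the map factory. The sketch below uses ConcurrentSkipListMap, which is one possible choice. The class and method names are hypothetical, and the department IDs mirror the employee list used earlier.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ConcurrentMapFactorySketch {

    // M is inferred as ConcurrentSkipListMap<String, Long> from the return type.
    static ConcurrentSkipListMap<String, Long> countSorted(List<String> departments) {
        return departments.parallelStream().collect(Collectors.groupingByConcurrent(
                Function.identity(),        // classifier: the department ID is the key
                ConcurrentSkipListMap::new, // mapFactory: a sorted concurrent map
                Collectors.counting()));    // downstream: employees per department
    }

    public static void main(String[] args) {
        List<String> departments = Arrays.asList("D3", "D1", "D3", "D1", "D1", "D2", "D1", "D2");
        System.out.println(countSorted(departments)); // {D1=4, D2=2, D3=2}
    }
}
```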

Write a program to group the Students by subjects using groupingBy and groupingByConcurrent.

package com.javahands.collectors.grouping;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingEx {
    public static void main(String[] args) {

        List<Student> studentList = Arrays.asList(
                new Student("Shweta", "Mathematics", 95),
                new Student("Reshma", "English", 90),
                new Student("Shrunalini", "Physics", 78),
                new Student("Suraj", "Mathematics", 84),
                new Student("Iqbal", "English", 75),
                new Student("Kartik", "Physics", 82),
                new Student("Poornima", "Mathematics", 85),
                new Student("Amit", "Mathematics", 65));

        Map<String, List<Student>> groupingBySubjects = studentList.stream()
                .collect(Collectors.groupingBy(Student::getSubject));

        System.out.println("Grouping By : " + groupingBySubjects);
        System.out.println("Resultant map type for groupingBy : "+ groupingBySubjects.getClass());

        Map<String, List<Student>> groupingByConcurrentSubjects = studentList.parallelStream()
                .collect(Collectors.groupingByConcurrent(Student::getSubject));

        System.out.println("Grouping By Concurrent : " + groupingByConcurrentSubjects);
        System.out.println("Resultant map type for groupingByConcurrent : "+ groupingByConcurrentSubjects.getClass());
    }
}
Output (with groupingByConcurrent, the order of students inside each group can change from run to run):
 Grouping By : {English=[Student [name=Reshma, subject=English, marks=90], Student [name=Iqbal, subject=English, marks=75]], Mathematics=[Student [name=Shweta, subject=Mathematics, marks=95], Student [name=Suraj, subject=Mathematics, marks=84], Student [name=Poornima, subject=Mathematics, marks=85], Student [name=Amit, subject=Mathematics, marks=65]], Physics=[Student [name=Shrunalini, subject=Physics, marks=78], Student [name=Kartik, subject=Physics, marks=82]]}

Resultant map type for groupingBy : class java.util.HashMap

Grouping By Concurrent : {English=[Student [name=Iqbal, subject=English, marks=75], Student [name=Reshma, subject=English, marks=90]], Mathematics=[Student [name=Amit, subject=Mathematics, marks=65], Student [name=Poornima, subject=Mathematics, marks=85], Student [name=Suraj, subject=Mathematics, marks=84], Student [name=Shweta, subject=Mathematics, marks=95]], Physics=[Student [name=Kartik, subject=Physics, marks=82], Student [name=Shrunalini, subject=Physics, marks=78]]}

Resultant map type for groupingByConcurrent : class java.util.concurrent.ConcurrentHashMap

So this is all about Grouping in Java 8. If you have any questions on this topic, please raise them in the comments section. If you liked this article then please share this post with your friends and colleagues.
