HashSet vs LinkedHashSet: Ordering Without Sacrificing Speed
-
Last Updated: July 4, 2026
-
By: javahandson
-
Series
Learn Java in a easy way
HashSet vs LinkedHashSet compared — insertion order, memory overhead, and why LinkedHashSet often iterates faster thanks to cache locality.
In the previous article, HashSet in Java was introduced as the default, go-to Set implementation — fast, unordered, and backed by a HashMap. But what happens when you still want that same speed, and you also want the elements to come back out in the order you put them in? That is exactly the gap LinkedHashSet fills. This article puts HashSet and LinkedHashSet side by side, explains the small but clever data structure trick LinkedHashSet uses to remember order, and digs into a detail most beginner tutorials skip entirely: why iterating a HashSet is slower than iterating a LinkedHashSet, even though both claim O(1) performance.
Before comparing the two, it helps to remember what a Set promises in the first place: no duplicate elements, ever. Every Set implementation in Java honours this rule, and HashSet is the simplest way to get it. As covered in Set Interface in Java: Core Concepts and HashSet, a HashSet is internally just a HashMap in disguise — every element you add becomes a key in that map, and since map keys are unique, uniqueness comes for free.
The one thing HashSet does not promise is order. When you loop over a HashSet, the elements can appear in almost any sequence, and that sequence depends on how the elements hash into buckets, not on the order you inserted them. For plenty of use cases, that is perfectly fine. But sometimes it is not, and that is where LinkedHashSet steps in.
It also helps to remember that both equals() and hashCode() still matter just as much here as they did for plain HashSet. Neither HashSet nor LinkedHashSet changes how duplicates are detected — they only change what order you get the elements back in once uniqueness has already been decided. Keep that distinction in mind as we compare the two implementations, because it is easy to assume that adding an insertion-order guarantee somehow also changes the rules for equality, and it does not.
LinkedHashSet is a direct subclass of HashSet. It inherits everything HashSet can do, and then adds exactly one feature on top: it remembers the order in which elements were inserted, and returns them in that same order every time you iterate.
The trick behind LinkedHashSet is simple once you see it. Internally, LinkedHashSet is backed by a LinkedHashMap, which itself is a HashMap with one small addition: every entry in the map also holds two extra pointers, one to the entry inserted before it and one to the entry inserted after it. This turns the entries into a doubly linked list that runs alongside the normal hash table.
So think of it as two structures layered on top of each other. The hash table is still there, doing the same job it always did — deciding which bucket an element belongs in, so that add, remove, and contains stay fast. The linked list is a separate, thin layer that cares only about one thing: the order in which the elements were inserted. When you iterate over a LinkedHashSet, Java does not touch the hash table at all. It simply walks the linked list from the head node to the tail node, and hands you elements in exactly the order they were added.
// Simplified view of what a LinkedHashSet entry looks like internally
class Entry<K> {
K key;
int hash;
Entry<K> next; // next entry in the SAME hash bucket (collision chain)
Entry<K> before; // previous entry in INSERTION order
Entry<K> after; // next entry in INSERTION order
}Notice there are two completely different kinds of “next” pointers here. The next field is about hash collisions within a bucket — it has nothing to do with order. The before and after fields are the insertion-order chain, and they are what makes LinkedHashSet special.
Because of this linked list overlay, LinkedHashSet gives you a guarantee that plain HashSet cannot: elements come out of the iterator in the exact order they were inserted, every single time, on every JVM, on every run. This is not a coincidence or an implementation detail that might change — it is part of the documented contract of LinkedHashSet.
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
public class OrderingDemo {
public static void main(String[] args) {
Set<String> hash = new HashSet<>();
Set<String> linked = new LinkedHashSet<>();
for (String s : new String[]{"delta", "alpha", "charlie", "bravo"}) {
hash.add(s);
linked.add(s);
}
System.out.println(hash); // order NOT guaranteed, may vary
System.out.println(linked); // [delta, alpha, charlie, bravo]
}
}Run this on any machine, and the LinkedHashSet line will always print in insertion order. The HashSet line might happen to look ordered, too, in small examples, but that is a coincidence, not a guarantee — a point worth repeating because it trips up a lot of beginners.
This guarantee also holds up even after removals and re-insertions. If you remove an element from a LinkedHashSet and add it back later, it is treated as a brand new insertion and moves to the end of the iteration order, exactly where you would expect a freshly added item to sit. The linked list is always kept in sync with whatever the current insertion history actually is, not with some fixed snapshot taken at the very first insert.
Since LinkedHashSet inherits from HashSet, it inherits HashSet’s null handling too. You can insert exactly one null element. Java gives null a fixed hash code, so it lands in a predictable bucket, and a second attempt to insert null is simply treated as a duplicate and ignored, just like any other repeated value.
Adding the linked list does cost a small amount of extra memory — two pointers per element — but it does not change the algorithmic speed of add, remove, or contains. All three remain O(1) on average, exactly the same as HashSet, because the hash table underneath is doing the same work it always did. The linked list is purely additional bookkeeping for iteration order, not a replacement for the hash table.
With the internals out of the way, the comparison becomes straightforward. Both are hash-based, both give you uniqueness, and both are fast. The differences come down to three things: ordering, memory, and a subtler point on iteration speed that gets its own section below.
HashSet gives no ordering guarantee at all. LinkedHashSet guarantees insertion order, always. If your code depends on seeing elements in a specific sequence, HashSet is simply the wrong tool, no matter how consistent it looks in your testing.
Every element stored in a LinkedHashSet carries two extra references — one for the entry before it and one for the entry after it — on top of what a HashSet entry already needs. For most applications, this overhead is small and easy to ignore, but if you are storing tens of millions of elements and memory is tight, it is worth knowing that LinkedHashSet is not entirely free.
Both structures offer average O(1) time for add, remove, and contains, because both rely on the same underlying hashing mechanism. The gap between them is not in these operations — it shows up specifically during iteration, which is important enough to deserve its own dedicated section next.
| Feature | HashSet | LinkedHashSet |
|---|---|---|
| Ordering | None (unpredictable) | Insertion order (guaranteed) |
| Backing structure | HashMap | LinkedHashMap (HashMap + doubly linked list) |
| add/remove/contains | O(1) average | O(1) average |
| Allows null | Yes (one) | Yes (one) |
| Memory per element | Lower | Slightly higher (two extra pointers) |
| Iteration speed | Slower in practice | Faster in practice |
On paper, the two look almost identical. The row that most tutorials leave out — iteration speed — is where the real, practical difference lives, and that is the focus of the next section.
It helps to think of HashSet and LinkedHashSet as the same engine with a different dashboard bolted on top. The engine — the hash table that decides buckets, handles collisions, and resizes when needed — is identical in both. LinkedHashSet does not replace any of that machinery; it simply adds a thin, extra layer of bookkeeping that remembers the order in which elements arrived. This is why the two share the same average-case complexity for add, remove, and contains: you are asking the same engine the same question either way.
The practical takeaway is that choosing LinkedHashSet over HashSet is rarely a performance sacrifice in any way that shows up in typical application code. The extra memory per element is small and fixed, and it does not grow worse as your data grows. So the decision usually comes down entirely to whether your code cares about order, not whether you can afford LinkedHashSet.
Here is a fact that surprises a lot of developers: even though both HashSet and LinkedHashSet offer O(1) average time for individual add and contains calls, iterating over a HashSet is generally slower than iterating over a LinkedHashSet of the same size. Big-O notation does not capture this, because Big-O only counts the number of steps, not how expensive each step is on real hardware. To understand why, we need a short primer on how computer memory actually behaves.
Your computer’s CPU is extremely fast, but the main memory (RAM) it reads from is comparatively slow. To bridge that gap, CPUs keep a small, very fast memory area called the cache close to the processor. Whenever the CPU reads one piece of memory, it does not fetch just that one item — it pulls in a whole nearby chunk, called a cache line, betting that you will probably need the data right next to it very soon.
This bet pays off brilliantly when data is stored sequentially, one item right after another, like in an array. Reading item after item stays inside the same cache lines that are already loaded, so the CPU rarely has to go back to slow main memory. This is called good cache locality. When data is scattered randomly around memory, each read is likely to miss the cache and force a fresh, slow trip to RAM. This is poor cache locality, and it can make a loop many times slower than the raw instruction count would suggest.
A HashSet stores elements according to their hash code, spread across an internal array of buckets. The bucket an element lands in has nothing to do with when it was inserted — it depends purely on the hash value. When you iterate a HashSet, Java walks through the bucket array from the first slot to the last, returning whatever it finds in each one. The elements you get back are effectively in hash order, not insertion order, and not in any order related to how they sit in memory as objects.
The main, documented reason is simpler than cache behaviour: Oracle’s own HashSet documentation states that iterating a HashSet takes time proportional to its size (the number of elements) plus the capacity of its backing HashMap (the number of buckets). LinkedHashMap’s documentation states the opposite for LinkedHashSet’s backing structure: its iteration time is proportional to size alone, regardless of capacity. In plain terms, a HashSet’s iterator has to walk every slot in the bucket array from start to finish, including empty ones, to find the elements that are actually there. A LinkedHashSet’s iterator skips empty slots entirely and follows only the chain of actual elements.
This matters more than it sounds. If you create a HashSet expecting 1,000 elements but only ever add 100, the backing HashMap may still be sized with far more buckets than you are using, and every iteration pays the cost of stepping past all those unused buckets. A LinkedHashSet never pays this particular cost, because its linked list only ever contains real entries.
On top of the documented capacity cost, there is a secondary, less formal effect worth knowing for interviews: because bucket order has nothing to do with when an element was created, a HashSet’s iterator can end up jumping between unrelated, scattered memory locations as it walks the bucket array. Modern CPUs are considerably faster at reading memory in a predictable, sequential pattern than in a scattered one, because of the CPU cache behaviour introduced above. This tends to add a further, real-world slowdown on top of the documented capacity effect, though — unlike the capacity point — this part is a general hardware tendency rather than something spelled out in the Java documentation itself.
LinkedHashSet iteration ignores the hash buckets entirely and instead follows the doubly linked list described in section 2 — starting at the head node (the first element inserted) and following the after pointer node by node to the tail. Because this list only ever contains real elements, with no empty slots to step over, its iteration cost depends solely on how many elements are actually in the set, not on how many buckets the backing table has. This is the documented, size-only iteration cost mentioned above, and it is the main reason LinkedHashSet tends to iterate faster than a HashSet of the same size once the backing table has more buckets than elements in it.
On top of that documented advantage, walking a linked list node by node also tends to produce a more sequential, predictable traversal pattern than jumping between arbitrary hash buckets, which plays more nicely with CPU caching as described earlier. This part is a general tendency backed by how CPU caches behave rather than a documented guarantee, but combined with the size-only iteration cost, it is why real-world benchmarks usually show LinkedHashSet iterating at least as fast as, and often faster than, an equivalently sized HashSet.
Think of HashSet iteration like finding books on a library shelf that are organized by a code stamped on the spine rather than by when they arrived. To read every book in shelf order, you might have to walk from the far left of the building to the far right and back again, because a book that arrived first could easily have a code that places it near the end of the shelf. LinkedHashSet iteration is more like reading books that are stacked in a single pile in the order they arrived: you simply pick up the top book, then the next one directly beneath it, and so on, in a straight, predictable line. Both approaches eventually let you see every book, but one involves a lot more walking back and forth than the other.
For a set with a few dozen or a few hundred elements, the difference is too small to notice — both will feel instant. The gap becomes visible once you are iterating large sets repeatedly, for example, inside a hot loop that runs millions of times per second, or when processing large in-memory datasets. In those situations, if you do not specifically need sorted order or unordered speed on single lookups, and you are iterating frequently, LinkedHashSet’s more predictable iteration pattern can be a genuine, measurable win — on top of the insertion-order guarantee you get for free.
It is worth being honest about the limits of this claim, too. The exact size of the gap depends on JVM internals, object layout, garbage collector behaviour, and how full or sparse the hash table happens to be at the time, so it is not something you should promise a fixed percentage for without measuring it on your own workload. Treat it as a genuine, well-understood tendency backed by how CPU caches work, not as a guaranteed fixed speed-up you can quote in every situation.
Interview Insight: If asked why LinkedHashSet can iterate faster than HashSet, the strongest answer leads with the documented fact: Oracle’s own HashSet documentation states that iteration time is proportional to size plus capacity (bucket count), while LinkedHashMap’s documentation states its iteration time is proportional to size alone. You can mention CPU cache locality as a secondary, reinforcing factor, but the size-plus-capacity distinction is the one you can point to in the official Javadoc.
Knowing the internals is good, but the real question is: when should you actually reach for LinkedHashSet instead of plain HashSet? Three situations come up again and again in real code.
This is the single most common reason to choose LinkedHashSet. Imagine a form where users type comma-separated tags, and you want to remove duplicates but keep the tags in the order the user typed them, so the display still makes sense. A plain HashSet would deduplicate correctly, but the tags would come back out scrambled. A LinkedHashSet solves both problems in one line.
import java.util.LinkedHashSet;
import java.util.Set;
public class DedupPreservingOrder {
public static void main(String[] args) {
String input = "java, spring, java, hibernate, spring, jvm";
Set<String> uniqueTags = new LinkedHashSet<>();
for (String tag : input.split(",\s*")) {
uniqueTags.add(tag);
}
System.out.println(uniqueTags);
// [java, spring, hibernate, jvm] <-- duplicates gone, first-seen order kept
}
}When you write unit tests that print or compare the contents of a set, a plain HashSet can make the test output look different across runs or JVM versions, which is confusing and can even make tests flaky if you are not careful about how you compare results. Using a LinkedHashSet during development and debugging gives you the same uniqueness guarantee with output that stays consistent and easy to reason about, which makes tracking down bugs considerably less frustrating.
It is worth being precise here: LinkedHashSet itself does not offer an access-order mode — that feature belongs to its sibling class, LinkedHashMap. LinkedHashMap has a special constructor that, when enabled, reorders its internal linked list every time an entry is read or updated, not just when it is inserted. When combined with overriding a method, this is the classic textbook trick for building a simple Least Recently Used (LRU) cache in Java.
If you need Set-like unique membership with that same LRU behaviour, the usual approach is to wrap a LinkedHashMap in access-order mode and use its keys as your “set,” ignoring the values, since java.util.Collections and the Set interface itself do not directly expose an access-ordered Set implementation. It is a useful pattern to know, but it is a LinkedHashMap feature borrowed for a Set-like purpose, not something LinkedHashSet supports out of the box.
import java.util.LinkedHashMap;
import java.util.Map;
// LinkedHashMap's access-order constructor, the basis of a simple LRU cache.
// Note: this is a LinkedHashMap feature, not something LinkedHashSet exposes.
public class SimpleLruCache<K> extends LinkedHashMap<K, Boolean> {
private final int maxEntries;
public SimpleLruCache(int maxEntries) {
super(16, 0.75f, true); // true = access order, not insertion order
this.maxEntries = maxEntries;
}
@Override
protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
return size() > maxEntries; // evict the least recently used entry
}
}Some applications load a set of configuration keys, feature flags, or processing steps from a file or database and need to apply them in the exact order in which they were defined, while still ensuring that no key appears more than once. A LinkedHashSet is a natural fit here: it rejects accidental duplicates the same way any Set would, but when you later iterate it to apply each setting in turn, you get back the exact sequence the file specified. Using a plain HashSet for this would technically still correctly reject duplicates, but the order in which settings are applied could shift unpredictably, which is a dangerous property for anything involving ordered configuration or audit-style processing.
It is easy to run a small example, see a HashSet print in a tidy-looking order, and conclude it preserves insertion order. This is coincidental, depends on the specific hash codes and bucket layout involved, and can change between JVM versions or as the set grows and resizes. If order matters even a little, say so explicitly by using LinkedHashSet, rather than relying on an accident of the current implementation.
Because LinkedHashSet is only marginally slower than HashSet for add and contains, some developers default to it everywhere out of caution. This is not wrong, exactly, but it does throw away a small amount of memory and a small constant-time cost for a guarantee you may never actually need. A better habit is to reach for HashSet by default, and switch to LinkedHashSet the moment you notice you actually depend on iteration order.
Switching between HashSet and LinkedHashSet does not change how duplicates are detected. Both still rely entirely on equals() and hashCode() to decide whether two elements are the same. If those methods are implemented inconsistently on your objects, both implementations will silently store logical duplicates, regardless of which one you pick.
Because LinkedHashSet feels more “organized” than HashSet, beginners sometimes expect it to also sort elements automatically. It does not. Insertion order simply means the order you happened to call add() in, which has nothing to do with alphabetical order, numeric order, or any other sorted arrangement. If you need elements sorted by value rather than by when they arrived, TreeSet is the implementation designed for that job, not LinkedHashSet.
LinkedHashSet is backed by a LinkedHashMap, which is a HashMap with an additional doubly linked list threading through all its entries to record insertion order. HashSet is backed by a plain HashMap with no such list, so it has no ordering guarantee.
Not in terms of Big-O complexity. Both remain average O(1) for these operations, because both rely on the same hash table mechanism underneath. LinkedHashSet pays a small constant-time cost to maintain its linked list pointers, but this does not change the algorithmic class of the operation.
The documented reason is that HashSet iteration takes time proportional to size plus the backing table’s capacity (its bucket count), so empty buckets still get walked, whereas LinkedHashSet’s iteration is proportional to size alone, since it follows a linked list of only the real entries. A secondary factor is that this sequential linked-list walk also tends to have better CPU cache behaviour than jumping between scattered hash buckets, though both are still described as O(n) for a full pass over n elements.
Yes, exactly one, inherited directly from HashSet’s behaviour. A second attempt to add null is treated like any other duplicate and is simply ignored.
No. Access-order mode is a feature of LinkedHashMap’s constructor, not of LinkedHashSet. To build an LRU-style cache, you typically use a LinkedHashMap in access-order mode, with removeEldestEntry overridden, and use its keys if you only need set-like membership.
It is the right choice when the insertion order specifically is what you need. If you need sorted order instead of insertion order, TreeSet is the correct choice, not LinkedHashSet, since LinkedHashSet has no concept of sorting elements by value.
No. Both implementations still rely entirely on equals() and hashCode() to determine uniqueness. Changing the implementation affects only the ordering and iteration behavior, not the rules for what counts as a duplicate.
Usually not, since the overhead is just two references per element, which is small and fixed. It only becomes worth worrying about at very large scales, for example, tens of millions of elements held in memory at once, where the cumulative extra pointers start to add up to a meaningful amount of heap space.
Not really. If you genuinely do not care about order and are not iterating frequently enough for cache behaviour to matter, plain HashSet is the simpler, slightly leaner choice. LinkedHashSet earns its keep specifically when order, predictable iteration, or order-preserving deduplication is part of the requirement.
HashSet and LinkedHashSet solve the same core problem — storing unique elements with fast average-case operations — but they diverge the moment you care about order. HashSet stays unordered and slightly leaner on memory. LinkedHashSet adds a doubly linked list on top of the same hash table to guarantee insertion order, at a small memory cost and with no change to the O(1) average complexity of add, remove, and contains. The detail most articles skip is that this same linked list also tends to make iteration faster in practice, thanks to better cache locality than HashSet’s hash-bucket traversal.
None of this changes how uniqueness itself is decided — equals() and hashCode() still do that job in both implementations. What changes is purely the order you get the elements back in and how efficiently your CPU can walk through them. As a simple rule: default to HashSet, and reach for LinkedHashSet the moment insertion order, predictable iteration, or a de-duplication task that needs to preserve first-seen order enters the picture. The next article in this series moves on to TreeSet, where ordering is taken a step further — not just remembered, but actively enforced through sorting.
Oracle Java Docs – Set Interface
Oracle Java Docs – HashMap (iteration cost details)
Oracle Java Docs – LinkedHashSet
Oracle Java Docs – LinkedHashMap