第9章：集合与数据结构

作为 Java 开发者，你已经熟悉了 Java 的集合框架。Python 也有丰富的集合类型和数据结构，本章将详细对比两种语言的集合与数据结构。

9.1 集合类型对比

Java 集合框架

List：有序集合，允许重复元素（如 ArrayList, LinkedList）
Set：无序集合，不允许重复元素（如 HashSet, TreeSet）
Map：键值对集合（如 HashMap, TreeMap）
Queue：队列（如 LinkedList, PriorityQueue）
Deque：双端队列（如 ArrayDeque）

Python 集合类型

list：有序集合，允许重复元素
tuple：有序、不可变集合，允许重复元素
set：无序集合，不允许重复元素
dict：键值对集合
collections 模块：提供额外的数据结构

对比：

Java 集合	Python 集合	特性
`ArrayList`	`list`	动态数组，支持随机访问
`LinkedList`	`list`	Python 的 list 是基于动态数组的，不是链表
`HashSet`	`set`	无序，不重复
`TreeSet`	`set` + 排序	Python 的 set 是无序的，需要手动排序
`HashMap`	`dict`	键值对，无序（Python 3.7+ 保持插入顺序）
`TreeMap`	`dict` + 排序	Python 的 dict 可以按键排序
`PriorityQueue`	`heapq` 模块	Python 使用 heapq 实现优先队列

9.2 列表（List）

Java ArrayList

ArrayList<String> list = new ArrayList<>();
list.add("apple");
list.add("banana");
list.add("cherry");

// 访问元素
String first = list.get(0);

// 修改元素
list.set(0, "orange");

// 删除元素
list.remove(1);

// 遍历
for (String fruit : list) {
    System.out.println(fruit);
}

Python list

# 创建列表
fruits = ["apple", "banana", "cherry"]

# 访问元素
first = fruits[0]

# 修改元素
fruits[0] = "orange"

# 删除元素
fruits.remove("banana")  # 按值删除
del fruits[0]            # 按索引删除

# 遍历
for fruit in fruits:
    print(fruit)

# 列表方法
fruits.append("date")       # 添加元素到末尾
fruits.insert(1, "grape")    # 插入元素
fruits.pop()                 # 移除并返回最后一个元素
fruits.pop(0)                # 移除并返回指定索引元素
fruits.sort()                # 排序
fruits.reverse()             # 反转
print(len(fruits))           # 长度
print("apple" in fruits)     # 检查元素是否存在

对比：

操作	Java	Python
创建	`new ArrayList<>()`	`[]` 或 `list()`
添加	`add()`	`append()`
插入	`add(index, element)`	`insert(index, element)`
删除	`remove(index)`	`del list[index]` 或 `list.pop(index)`
访问	`get(index)`	`list[index]`
修改	`set(index, element)`	`list[index] = element`
长度	`size()`	`len(list)`
遍历	增强 for 循环	`for element in list`
排序	`Collections.sort()`	`list.sort()`

9.3 集合（Set）

Java HashSet

HashSet<String> set = new HashSet<>();
set.add("apple");
set.add("banana");
set.add("cherry");
set.add("apple");  // 重复元素，不会添加

// 遍历
for (String fruit : set) {
    System.out.println(fruit);
}

// 检查元素
boolean contains = set.contains("apple");

// 删除元素
set.remove("banana");

// 大小
int size = set.size();

Python set

# 创建集合
fruits = {"apple", "banana", "cherry"}
fruits.add("apple")  # 重复元素，不会添加

# 遍历
for fruit in fruits:
    print(fruit)

# 检查元素
contains = "apple" in fruits

# 删除元素
fruits.remove("banana")  # 元素不存在会抛出异常
fruits.discard("orange")  # 元素不存在不会抛出异常

# 大小
print(len(fruits))

# 集合运算
a = {1, 2, 3}
b = {3, 4, 5}
print(a | b)  # 并集 {1, 2, 3, 4, 5}
print(a & b)  # 交集 {3}
print(a - b)  # 差集 {1, 2}
print(a ^ b)  # 对称差集 {1, 2, 4, 5}

对比：

操作	Java	Python
创建	`new HashSet<>()`	`{}` 或 `set()`
添加	`add()`	`add()`
删除	`remove()`	`remove()` 或 `discard()`
检查	`contains()`	`in` 操作符
大小	`size()`	`len(set)`
遍历	增强 for 循环	`for element in set`
并集	`addAll()`	`
交集	`retainAll()`	`&` 操作符或 `intersection()`
差集	`removeAll()`	`-` 操作符或 `difference()`

9.4 字典（Map）

Java HashMap

HashMap<String, Integer> map = new HashMap<>();
map.put("apple", 10);
map.put("banana", 20);
map.put("cherry", 30);

// 访问元素
Integer value = map.get("apple");

// 修改元素
map.put("apple", 15);  // 覆盖值

// 删除元素
map.remove("banana");

// 遍历键
for (String key : map.keySet()) {
    System.out.println(key);
}

// 遍历值
for (Integer val : map.values()) {
    System.out.println(val);
}

// 遍历键值对
for (Map.Entry<String, Integer> entry : map.entrySet()) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}

Python dict

# 创建字典
fruits = {"apple": 10, "banana": 20, "cherry": 30}

# 访问元素
value = fruits["apple"]
value = fruits.get("apple")  # 安全访问
value = fruits.get("orange", 0)  # 带默认值

# 修改元素
fruits["apple"] = 15  # 覆盖值

# 删除元素
del fruits["banana"]
value = fruits.pop("cherry")  # 移除并返回值

# 遍历键
for key in fruits:
    print(key)

# 遍历值
for value in fruits.values():
    print(value)

# 遍历键值对
for key, value in fruits.items():
    print(f"{key}: {value}")

# 字典方法
print(fruits.keys())     # 所有键
print(fruits.values())   # 所有值
print(fruits.items())    # 所有键值对
print(len(fruits))       # 长度
print("apple" in fruits)  # 检查键是否存在

# 字典推导式
squares = {x: x**2 for x in range(5)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

对比：

操作	Java	Python
创建	`new HashMap<>()`	`{}` 或 `dict()`
添加/修改	`put(key, value)`	`dict[key] = value`
获取	`get(key)`	`dict[key]` 或 `dict.get(key)`
删除	`remove(key)`	`del dict[key]` 或 `dict.pop(key)`
遍历键	`keySet()`	`for key in dict`
遍历值	`values()`	`for value in dict.values()`
遍历键值对	`entrySet()`	`for key, value in dict.items()`
大小	`size()`	`len(dict)`
检查键	`containsKey()`	`key in dict`

9.5 元组（Tuple）

Java 不可变列表

// 使用 Arrays.asList 创建不可变列表
List<String> immutableList = Arrays.asList("apple", "banana", "cherry");
// immutableList.add("date");  // 会抛出 UnsupportedOperationException

// 使用 Collections.unmodifiableList
List<String> mutableList = new ArrayList<>();
mutableList.add("apple");
mutableList.add("banana");
List<String> unmodifiableList = Collections.unmodifiableList(mutableList);

Python tuple

# 创建元组
fruits = ("apple", "banana", "cherry")
# fruits[0] = "orange"  # 会抛出 TypeError，元组不可变

# 访问元素
print(fruits[0])  # apple

# 元组方法
print(fruits.count("apple"))  # 计算元素出现次数
print(fruits.index("banana"))  # 查找元素位置

# 元组解包
a, b, c = fruits
print(a, b, c)  # apple banana cherry

# 单元素元组
single = (42,)

# 元组与列表转换
lst = list(fruits)
lst.append("date")
tup = tuple(lst)

对比：

特性	Java	Python
不可变列表	`Collections.unmodifiableList()`	`tuple`
创建	`Arrays.asList()`	`()`
访问	`get(index)`	`tuple[index]`
长度	`size()`	`len(tuple)`
遍历	增强 for 循环	`for element in tuple`
方法	有限	`count()`, `index()`

9.6 高级数据结构

Java 高级数据结构

PriorityQueue：优先队列
LinkedHashMap：保持插入顺序的 HashMap
TreeMap：有序映射
LinkedHashSet：保持插入顺序的 HashSet
ArrayDeque：双端队列

Python 高级数据结构（collections 模块）

Counter：计数工具
defaultdict：默认值字典
OrderedDict：有序字典（Python 3.7+ 中普通 dict 已有序）
deque：双端队列
namedtuple：命名元组
ChainMap：链式映射

Python collections 模块示例：

from collections import Counter, defaultdict, deque, namedtuple, OrderedDict, ChainMap

# Counter：计数
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
count = Counter(words)
print(count)  # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
print(count.most_common(2))  # 最常见的2个元素

# defaultdict：默认值字典
d = defaultdict(int)  # 默认值为0
d["apple"] += 1
d["banana"] += 1
print(d)  # defaultdict(<class 'int'>, {'apple': 1, 'banana': 1})

# deque：双端队列
q = deque([1, 2, 3])
q.append(4)      # 右端添加
q.appendleft(0)  # 左端添加
q.pop()          # 右端移除
q.popleft()      # 左端移除
print(q)  # deque([1, 2, 3])

# namedtuple：命名元组
Person = namedtuple("Person", ["name", "age", "city"])
alice = Person("Alice", 25, "New York")
print(alice.name)  # Alice
print(alice.age)   # 25
print(alice)       # Person(name='Alice', age=25, city='New York')

# OrderedDict：有序字典
od = OrderedDict()
od["z"] = 1
od["a"] = 2
od["m"] = 3
print(list(od.keys()))  # ['z', 'a', 'm']

# ChainMap：链式映射
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}
cm = ChainMap(d1, d2)
print(cm["a"])  # 1 (from d1)
print(cm["b"])  # 2 (from d1, not d2)
print(cm["c"])  # 4 (from d2)

9.7 堆（Heap）

Java 堆

PriorityQueue<Integer> minHeap = new PriorityQueue<>();
minHeap.add(5);
minHeap.add(3);
minHeap.add(8);
minHeap.add(1);

while (!minHeap.isEmpty()) {
    System.out.println(minHeap.poll());  // 1, 3, 5, 8
}

// 最大堆
PriorityQueue<Integer> maxHeap = new PriorityQueue<>((a, b) -> b - a);
maxHeap.add(5);
maxHeap.add(3);
maxHeap.add(8);
maxHeap.add(1);

while (!maxHeap.isEmpty()) {
    System.out.println(maxHeap.poll());  // 8, 5, 3, 1
}

Python 堆

import heapq

# 最小堆
min_heap = []
heapq.heappush(min_heap, 5)
heapq.heappush(min_heap, 3)
heapq.heappush(min_heap, 8)
heapq.heappush(min_heap, 1)

while min_heap:
    print(heapq.heappop(min_heap))  # 1, 3, 5, 8

# 最大堆（通过取负值实现）
max_heap = []
heapq.heappush(max_heap, -5)
heapq.heappush(max_heap, -3)
heapq.heappush(max_heap, -8)
heapq.heappush(max_heap, -1)

while max_heap:
    print(-heapq.heappop(max_heap))  # 8, 5, 3, 1

# 堆化
nums = [5, 3, 8, 1, 2, 9]
heapq.heapify(nums)
print(nums)  # [1, 2, 8, 5, 3, 9]

# 前 k 大的元素
print(heapq.nlargest(3, nums))  # [9, 8, 5]

# 前 k 小的元素
print(heapq.nsmallest(3, nums))  # [1, 2, 3]

对比：

操作	Java	Python
最小堆	`PriorityQueue<Integer>()`	`heapq` 模块
最大堆	`PriorityQueue<Integer>((a, b) -> b - a)`	通过取负值实现
添加元素	`add()`	`heapq.heappush()`
弹出元素	`poll()`	`heapq.heappop()`
堆化	无直接方法	`heapq.heapify()`
前 k 大元素	需手动实现	`heapq.nlargest()`
前 k 小元素	需手动实现	`heapq.nsmallest()`

9.8 练习

列表练习：编写程序，将列表中的元素去重并排序
集合练习：编写程序，找出两个列表的交集、并集和差集
字典练习：编写程序，统计字符串中每个字符出现的次数
元组练习：使用 namedtuple 创建一个 Point 类，实现坐标运算
高级数据结构练习：使用 deque 实现一个简单的队列
堆练习：使用 heapq 模块找出列表中第 k 大的元素

9.9 小结

Python 的集合类型比 Java 更简洁，使用更方便
Python 的 list 相当于 Java 的 ArrayList，但语法更简洁
Python 的 set 相当于 Java 的 HashSet，支持集合运算
Python 的 dict 相当于 Java 的 HashMap，Python 3.7+ 保持插入顺序
Python 的 tuple 相当于 Java 的不可变列表
Python 的 collections 模块提供了丰富的高级数据结构
Python 的 heapq 模块提供了堆操作，比 Java 的 PriorityQueue 更灵活
Python 的集合类型支持推导式，使代码更简洁

通过本章的学习，你已经了解了 Java 和 Python 在集合与数据结构上的主要区别。接下来，我们将学习 Python 的模块与包。

9.1 集合类型对比​

Java 集合框架​

Python 集合类型​

9.2 列表（List）​

Java ArrayList​

Python list​

9.3 集合（Set）​

Java HashSet​

Python set​

9.4 字典（Map）​

Java HashMap​

Python dict​

9.5 元组（Tuple）​

Java 不可变列表​

Python tuple​

9.6 高级数据结构​

Java 高级数据结构​

Python 高级数据结构（collections 模块）​

9.7 堆（Heap）​

Java 堆​

Python 堆​

9.8 练习​

9.9 小结​

9.1 集合类型对比

Java 集合框架

Python 集合类型

9.2 列表（List）

Java ArrayList

Python list

9.3 集合（Set）

Java HashSet

Python set

9.4 字典（Map）

Java HashMap

Python dict

9.5 元组（Tuple）

Java 不可变列表

Python tuple

9.6 高级数据结构

Java 高级数据结构

Python 高级数据结构（collections 模块）

9.7 堆（Heap）

Java 堆

Python 堆

9.8 练习

9.9 小结