How to Assign Distinct Indices to Repeated Words and Pinpoint Their Positions in Python

碧海醫心 2025-12-30 00:00:00

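This article explains how to correctly identify every occurrence of a repeated word within a single string in Python. It addresses the problem that `list.index()` returns only the index of the first match, which makes it impossible to tell multiple instances of the same word apart, and presents an efficient solution based on `enumerate` and a dictionary mapping.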

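A common pitfall in string processing is using `list.index(value)` to locate repeated elements. This method always returns the index of the first occurrence of the target value, so calling `x.index("hello")` repeatedly on a list that contains several "hello" entries always yields 1 (assuming the first occurrence is at index 1). As a result, a check such as `x.index(i) != x.index(j)` is always False whenever `i` and `j` are the same word, and the loop prints nothing.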

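The root cause is that `index()` performs a search rather than binding to a position: it is unaware of the current iteration context and cannot tell you which occurrence you are looking at. To truly distinguish different instances of the same word, you must explicitly record each word's actual position in the original sequence.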
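To make the difference concrete, here is a small sketch that contrasts the two behaviors on the article's sample sentence. The variable names `first_hits` and `actual_hits` are chosen only for this illustration.

```python
# Illustration only: `words` reuses the article's sample sentence.
words = "The hello hello substring string of this pan is amazing hello".split()

# index() always reports the first match, no matter how often it is called.
first_hits = [words.index(w) for w in words if w == "hello"]
print(first_hits)  # [1, 1, 1]

# enumerate() pairs every element with its real position instead.
actual_hits = [i for i, w in enumerate(words) if w == "hello"]
print(actual_hits)  # [1, 2, 10]
```

The first list collapses all three occurrences onto index 1, while the second keeps them apart.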

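The recommended approach is to iterate over the split words with `enumerate()` and build a position-map dictionary that uses each word as a key and the list of all its indices as the value: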

s = "The hello hello substring string of this pan is amazing hello"
words = s.split()

# Build a {word: [index1, index2, ...]} dictionary
position_map = {}
for idx, word in enumerate(words):
    if word not in position_map:
        position_map[word] = []
    position_map[word].append(idx)

# Print every word that occurs at least twice, with all of its indices (in the format the question asks for)
for word, indices in position_map.items():
    if len(indices) > 1:
        # Formatted as "hello,1,hello,2,hello,10"
        parts = [f"{word},{i}" for i in indices]
        print(",".join(parts))
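
As an optional variation, the same mapping could be wrapped in a reusable function and built with `collections.defaultdict`, which removes the explicit "key not present" check. This is only a sketch: the function name `repeated_word_positions` is hypothetical and not part of the original solution.

```python
from collections import defaultdict


def repeated_word_positions(text):
    """Return {word: [indices]} for words that occur more than once."""
    positions = defaultdict(list)  # missing keys start as empty lists
    for idx, word in enumerate(text.split()):
        positions[word].append(idx)
    # Keep only words seen at least twice; dicts preserve insertion order.
    return {w: idxs for w, idxs in positions.items() if len(idxs) > 1}


s = "The hello hello substring string of this pan is amazing hello"
result = repeated_word_positions(s)
print(result)  # {'hello': [1, 2, 10]}
```

Note that `split()` is case-sensitive and leaves punctuation attached, so "Hello" and "hello," would be treated as different words unless the text is normalized first.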