别再死记硬背了!用Python手把手带你生成LR(0)和SLR(1)分析表(附完整代码)
2026/6/11 10:52:41 网站建设 项目流程

用Python实战解析LR(0)与SLR(1)分析表生成:从理论到代码的完整指南

编译原理中的语法分析阶段常让学习者望而生畏,尤其是LR分析表的构造过程。传统教材中抽象的数学描述和手工计算步骤,往往掩盖了算法本质的优雅性。本文将用Python代码还原LR(0)和SLR(1)分析表的构建全过程,通过可运行的代码实现和可视化输出,带您穿透理论迷雾。无论您是正在啃《龙书》的计算机系学生,还是希望深入理解语法分析原理的开发者,这篇实战指南都将提供一条"通过代码理解理论"的捷径。

1. 基础概念与文法表示

1.1 上下文无关文法的Python建模

在实现分析表前,我们需要用合适的数据结构表示文法。以下是一个包含拓广文法的类设计:

class Grammar: def __init__(self): self.productions = [] # 产生式列表 self.non_terminals = set() # 非终结符集合 self.terminals = set() # 终结符集合 self.start_symbol = None # 开始符号 def add_production(self, head, body): """添加产生式""" self.productions.append((head, body)) self.non_terminals.add(head) for symbol in body: if symbol.isupper() and symbol != "'": self.non_terminals.add(symbol) else: self.terminals.add(symbol)

示例文法初始化(以简单算术表达式为例):

grammar = Grammar() grammar.start_symbol = "E'" grammar.add_production("E'", ['E']) grammar.add_production('E', ['E', '+', 'T']) grammar.add_production('E', ['T']) grammar.add_production('T', ['T', '*', 'F']) grammar.add_production('T', ['F']) grammar.add_production('F', ['(', 'E', ')']) grammar.add_production('F', ['id'])

1.2 项目(Item)与项目集的表示

LR分析的核心是"项目"——在产生式右部某处加点的特殊标记。我们用Python类来表示:

class Item: def __init__(self, production, dot_pos=0): self.production = production # (head, body)元组 self.dot_pos = dot_pos # 点的位置 def next_symbol(self): """返回点后面的符号""" if self.dot_pos < len(self.production[1]): return self.production[1][self.dot_pos] return None def is_reduce_item(self): """判断是否为归约项目""" return self.dot_pos == len(self.production[1])

2. LR(0)分析表的完整实现

2.1 构建项目集规范族(Canonical Collection)

项目集规范族是通过闭包(closure)和转移(goto)操作构建的有限状态机:

def closure(items, grammar): """计算项目集的闭包""" closure_set = set(items) changed = True while changed: changed = False for item in list(closure_set): next_sym = item.next_symbol() if next_sym in grammar.non_terminals: for prod in grammar.productions: if prod[0] == next_sym: new_item = Item(prod, 0) if new_item not in closure_set: closure_set.add(new_item) changed = True return frozenset(closure_set) def goto(items, symbol, grammar): """计算转移函数""" new_items = set() for item in items: if item.next_symbol() == symbol: new_items.add(Item(item.production, item.dot_pos + 1)) return closure(new_items, grammar)

2.2 可视化项目集规范族

通过graphviz库可以直观展示状态转移图(需安装graphviz包):

from graphviz import Digraph def visualize_collection(cc, grammar, filename='lr0_collection'): dot = Digraph(comment='LR(0) Collection') for i, state in enumerate(cc): label = f"I{i}\n" + "\n".join( f"{p[0]} → {' '.join(p[1][:d])}•{' '.join(p[1][d:])}" for p, d in [(item.production, item.dot_pos) for item in state] ) dot.node(f'I{i}', label=label) transitions = set() for i, state in enumerate(cc): symbols = {item.next_symbol() for item in state if item.next_symbol()} for sym in symbols: j = cc.index(goto(state, sym, grammar)) transitions.add((i, sym, j)) for src, sym, dst in transitions: dot.edge(f'I{src}', f'I{dst}', label=str(sym)) dot.render(filename, view=True)

2.3 生成LR(0)分析表

将项目集规范族转换为分析表:

def build_lr0_table(cc, grammar): action_table = {} goto_table = {} for i, state in enumerate(cc): # 处理移进和转移 for item in state: sym = item.next_symbol() if sym: j = cc.index(goto(state, sym, grammar)) if sym in grammar.terminals: action_table[(i, sym)] = ('shift', j) else: goto_table[(i, sym)] = j # 处理归约 for item in state: if item.is_reduce_item(): if item.production[0] == grammar.start_symbol: action_table[(i, '$')] = ('accept',) else: for t in grammar.terminals | {'$'}: action_table[(i, t)] = ('reduce', item.production) return action_table, goto_table

3. SLR(1)分析表的进阶实现

3.1 FOLLOW集的计算

SLR(1)与LR(0)的关键区别在于使用FOLLOW集解决冲突:

def compute_follow_sets(grammar): follow = {nt: set() for nt in grammar.non_terminals} follow[grammar.start_symbol].add('$') changed = True while changed: changed = False for head, body in grammar.productions: for i, symbol in enumerate(body): if symbol in grammar.non_terminals: # 规则1:A → αBβ,将FIRST(β)-ε加入FOLLOW(B) first_beta = compute_first(body[i+1:], grammar) old_len = len(follow[symbol]) follow[symbol].update(first_beta - {None}) if None in first_beta: follow[symbol].update(follow[head]) if len(follow[symbol]) > old_len: changed = True return follow def compute_first(symbols, grammar): """计算符号串的FIRST集""" first = set() for symbol in symbols: if symbol in grammar.terminals: first.add(symbol) break else: # 非终结符 first_of_sym = set() for prod in grammar.productions: if prod[0] == symbol: if prod[1][0] in grammar.terminals: first_of_sym.add(prod[1][0]) elif prod[1][0] in grammar.non_terminals: first_of_sym.update(compute_first(prod[1], grammar) - {None}) first.update(first_of_sym - {None}) if None not in first_of_sym: break else: first.add(None) # 所有符号都能推导出ε return first

3.2 冲突检测与解决

SLR(1)通过检查FOLLOW集交集来解决冲突:

def is_slr1(cc, grammar, follow_sets): for state in cc: shift_reduce_conflict = False reduce_reduce_conflict = False # 检查移进-归约冲突 shift_terminals = set() reduce_terminals = set() for item in state: sym = item.next_symbol() if sym in grammar.terminals: shift_terminals.add(sym) elif item.is_reduce_item() and item.production[0] != grammar.start_symbol: reduce_terminals.update(follow_sets[item.production[0]]) if shift_terminals & reduce_terminals: print(f"状态I{cc.index(state)}存在移进-归约冲突") return False # 检查归约-归约冲突 reduce_items = [item for item in state if item.is_reduce_item() and item.production[0] != grammar.start_symbol] for i in range(len(reduce_items)): for j in range(i+1, len(reduce_items)): if (follow_sets[reduce_items[i].production[0]] & follow_sets[reduce_items[j].production[0]]): print(f"状态I{cc.index(state)}存在归约-归约冲突") return False return True

3.3 构建SLR(1)分析表

在LR(0)基础上加入FOLLOW集约束:

def build_slr1_table(cc, grammar, follow_sets): action_table = {} goto_table = {} for i, state in enumerate(cc): # 处理移进和转移 for item in state: sym = item.next_symbol() if sym: j = cc.index(goto(state, sym, grammar)) if sym in grammar.terminals: action_table[(i, sym)] = ('shift', j) else: goto_table[(i, sym)] = j # 处理归约(仅对FOLLOW集中的终结符) for item in state: if item.is_reduce_item(): if item.production[0] == grammar.start_symbol: action_table[(i, '$')] = ('accept',) else: for t in follow_sets[item.production[0]]: if (i, t) in action_table: print(f"冲突:状态{i}在符号{t}上已有动作") action_table[(i, t)] = ('reduce', item.production) return action_table, goto_table

4. 实际应用与调试技巧

4.1 测试案例分析

让我们用以下文法测试我们的实现:

test_grammar = Grammar() test_grammar.start_symbol = "S'" test_grammar.add_production("S'", ['S']) test_grammar.add_production('S', ['L', '=', 'R']) test_grammar.add_production('S', ['R']) test_grammar.add_production('L', ['*', 'R']) test_grammar.add_production('L', ['id']) test_grammar.add_production('R', ['L']) # 计算项目集规范族 initial_item = Item(("S'", ['S']), 0) cc = [closure({initial_item}, test_grammar)] changed = True while changed: changed = False for state in list(cc): symbols = {item.next_symbol() for item in state if item.next_symbol()} for sym in symbols: new_state = goto(state, sym, test_grammar) if new_state not in cc: cc.append(new_state) changed = True # 可视化 visualize_collection(cc, test_grammar, 'test_grammar_lr0') # 构建SLR(1)表 follow = compute_follow_sets(test_grammar) slr_action, slr_goto = build_slr1_table(cc, test_grammar, follow)

4.2 常见问题排查

当实现LR分析器时,可能会遇到以下典型问题:

  1. 无限循环的闭包计算

    • 检查文法是否包含左递归产生式
    • 确保在闭包计算时正确识别非终结符
  2. 分析表存在冲突

    • 使用is_slr1函数检测冲突类型
    • 对于移进-归约冲突,检查FOLLOW集是否确实不相交
    • 考虑升级到更强大的LR(1)或LALR(1)分析器
  3. 不正确的FIRST/FOLLOW集

    • 验证ε产生式的处理是否正确
    • 检查符号串FIRST集的计算是否考虑了所有情况
def debug_first_follow(grammar): print("FIRST集:") for nt in grammar.non_terminals: prods = [prod[1] for prod in grammar.productions if prod[0] == nt] first = set() for prod in prods: first.update(compute_first(prod, grammar)) print(f"{nt}: {first}") print("\nFOLLOW集:") follow = compute_follow_sets(grammar) for nt, fol in follow.items(): print(f"{nt}: {fol}")

4.3 性能优化建议

对于大型文法,可以考虑以下优化:

  1. 项目集哈希

    • 使用frozenset存储项目集
    • 实现高效的哈希比较,避免重复计算
  2. 增量式计算

    • 缓存FIRST和FOLLOW集计算结果
    • 只在必要时重新计算变更部分
  3. 表压缩技术

    • 使用稀疏矩阵存储分析表
    • 合并相同行/列减少存储空间

需要专业的网站建设服务?

联系我们获取免费的网站建设咨询和方案报价,让我们帮助您实现业务目标

立即咨询