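Linux irq_set_affinity: Interrupt Affinity Configuration and chip_irq_set
Interrupt affinity (IRQ affinity) is the Linux kernel mechanism that binds a given interrupt to specified CPU cores. The affinity mask attached to an irq_desc controls which CPUs the interrupt may be routed to, which enables load balancing and performance tuning. The core interface is irq_set_affinity, defined in kernel/irq/manage.c.
The irq_set_affinity function is implemented as follows: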
```c
int irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
{
	struct irq_desc *desc = irq_to_desc(irq);
	unsigned long flags;
	int ret;

	if (!desc)
		return -EINVAL;

	/* desc->lock serializes affinity changes for this interrupt. */
	raw_spin_lock_irqsave(&desc->lock, flags);
	ret = __irq_set_affinity_locked(irq_desc_get_irq_data(desc), cpumask, false);
	raw_spin_unlock_irqrestore(&desc->lock, flags);
	return ret;
}
```
__irq_set_affinity_locked is the core function that actually applies the setting:
```c
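int __irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask, bool force)
{
	struct irq_chip *chip = irq_data_get_irq_chip(data);
	struct irq_desc *desc = irq_data_to_desc(data);
	const struct cpumask *prog_mask;
	int ret = -EINVAL;

	/* The chip must implement the affinity callback. */
	if (!chip || !chip->irq_set_affinity)
		goto out;
	/* Unless forced, leave interrupts alone that must not be moved. */
	if (!force && !irq_can_set_affinity(data->irq))
		goto out;

	/* Nothing to do if the mask is unchanged and holds only online CPUs. */
	prog_mask = irq_data_get_affinity_mask(data);
	if (cpumask_equal(mask, prog_mask) &&
	    cpumask_subset(prog_mask, cpu_online_mask))
		return 0;

	ret = chip->irq_set_affinity(data, mask, force);
	if (!ret) {
		/*
		 * Record the requested mask. prog_mask is const, so write
		 * through the descriptor. The chip callback is responsible
		 * for updating the effective mask.
		 */
		cpumask_copy(desc->irq_common_data.affinity, mask);
	}
out:
	return ret;
}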
```
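The function first verifies that the chip provides an irq_set_affinity callback and that the interrupt is allowed to change affinity (the IRQ_NO_BALANCING flag is not set). If the new mask equals the current mask and the current mask contains only online CPUs, it returns 0 immediately to avoid a redundant hardware operation. The actual programming is done by chip->irq_set_affinity; on success the stored affinity mask is updated to the new mask.
For the x86 IO-APIC, the irq_set_affinity callback is ioapic_set_affinity: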
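The early-return test described above can be modelled in user space to make the two conditions explicit. This is a minimal sketch, not kernel code: it assumes a hypothetical `cpu_mask_t` with one bit per CPU (at most 64 CPUs), and the function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpu_mask_t;

/* True when every bit set in 'a' is also set in 'b'. */
static bool mask_subset(cpu_mask_t a, cpu_mask_t b)
{
	return (a & ~b) == 0;
}

/*
 * Mirrors the early-return test: the hardware must be reprogrammed unless
 * the requested mask equals the current one and the current one holds
 * only online CPUs.
 */
bool affinity_update_needed(cpu_mask_t requested, cpu_mask_t current,
			    cpu_mask_t online)
{
	if (requested == current && mask_subset(current, online))
		return false;
	return true;
}
```

With CPUs 0-3 online, requesting the mask that is already programmed (`0x3` against `0x3`) needs no update, while the same request does need one if CPU 1 has gone offline in the meantime.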
```c
static int ioapic_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force)
{
	struct irq_cfg *cfg = data->chip_data;
	unsigned int dest;
	unsigned int vector;
	unsigned int cpu;

	/* Pick the first CPU that is both requested and online. */
	cpu = cpumask_first_and(mask, cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		return -EINVAL;

	/* Translate the CPU number into an APIC destination ID. */
	dest = apic->calc_dest(cpu);
	vector = cfg->vector;

	/* Rewrite the redirection table entry under the IO-APIC lock. */
	raw_spin_lock(&ioapic_lock);
	__ioapic_write_entry(cfg->irq_2_pin, vector, dest, data->irq);
	raw_spin_unlock(&ioapic_lock);

	/* Only this one CPU actually receives the interrupt. */
	irq_data_update_effective_affinity(data, cpumask_of(cpu));
	return 0;
}
```
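This implementation first picks the first online CPU in mask, computes its APIC destination ID with calc_dest, and redirects the interrupt to that CPU's vector. The IOAPIC RTE (Redirection Table Entry) contains a dest field and a vector field; when an interrupt fires, the hardware uses dest to deliver it to the Local APIC of the chosen CPU.
User space sets affinity through the /proc/irq/{irq}/smp_affinity file: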
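The destination choice can be illustrated with plain bit operations. The sketch below is a user-space model under stated assumptions: masks are 64-bit integers, `MODEL_NR_CPUS` stands in for `nr_cpu_ids`, and both function names are hypothetical.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 64	/* hypothetical stand-in for nr_cpu_ids */

/*
 * Return the lowest CPU number set in both masks, or MODEL_NR_CPUS when
 * the intersection is empty.
 */
static unsigned int first_cpu_and(uint64_t mask, uint64_t online)
{
	uint64_t both = mask & online;
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	}
	return MODEL_NR_CPUS;
}

/* Pick the destination CPU, or return -1 when no online CPU qualifies. */
int pick_destination(uint64_t mask, uint64_t online)
{
	unsigned int cpu = first_cpu_and(mask, online);

	return cpu < MODEL_NR_CPUS ? (int)cpu : -1;
}
```

A request for CPUs 2 and 3 (`0xC`) with CPUs 0-3 online selects CPU 2; a request that names only offline CPUs yields -1, the counterpart of the -EINVAL return in the handler.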
```c
static ssize_t smp_affinity_write(struct file *file, const char __user *buf,
				  size_t count, loff_t *pos)
{
	struct irq_desc *desc = file_inode(file)->i_private;
	cpumask_var_t new_value;
	ssize_t err;

	if (!zalloc_cpumask_var(&new_value, GFP_KERNEL))
		return -ENOMEM;

	/* Parse the hexadecimal mask written by user space. */
	err = cpumask_parse_user(buf, count, new_value);
	if (err)
		goto out;

	/* On success, report the whole write as consumed. */
	err = irq_set_affinity(desc->irq_data.irq, new_value);
	if (!err)
		err = count;
out:
	free_cpumask_var(new_value);
	return err;
}
```
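From the user-space side, changing affinity amounts to writing a hexadecimal mask into that file. The following is a minimal sketch, assuming a system with at most 32 CPUs so that a single hex group suffices; `format_affinity_mask` and `write_irq_affinity` are illustrative names, not part of any library, and the write requires root privileges.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a CPU bitmask as a hexadecimal string. Returns the number of
 * characters written, or -1 if the buffer is too small.
 */
int format_affinity_mask(uint32_t mask, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%x", (unsigned int)mask);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Write the mask to /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success and -1 on any failure.
 */
int write_irq_affinity(unsigned int irq, uint32_t mask)
{
	char path[64], text[16];
	FILE *f;
	int ok;

	if (format_affinity_mask(mask, text, sizeof(text)) < 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(text, f) >= 0;
	/* fclose flushes the buffer, so a rejected mask surfaces here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, `write_irq_affinity(24, 0x3)` would ask the kernel to restrict IRQ 24 to CPUs 0 and 1; the IRQ number here is arbitrary.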
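The irqbalance daemon periodically scans /proc/interrupts to collect per-CPU interrupt counts and adjusts smp_affinity dynamically. It computes how unevenly each interrupt is distributed across CPUs and, when the deviation exceeds a threshold, writes a new mask to smp_affinity, which reaches irq_set_affinity in the kernel and moves the interrupt to a less loaded CPU. The default policy is to look for the CPU core with the lowest interrupt count.
The difference between effective_affinity and requested_affinity is worth noting. requested_affinity is the CPU mask requested by user space or the kernel, while effective_affinity is the CPU the hardware actually targets. For x86 MSI and MSI-X interrupts, an interrupt can be delivered to only one CPU, yet the requested mask may contain several CPU bits. In that case the kernel picks a single online CPU from the mask (the first one, in the simplified model used here) and programs it into the hardware; effective_affinity records that single CPU, and requested_affinity keeps the original multi-CPU mask.
irq_migrate_all_off_this_cpu performs interrupt migration during CPU hot-unplug: it moves every interrupt off the CPU that is about to go offline and onto the remaining online CPUs: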
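The "lowest interrupt count" policy can be sketched as a simple selection over per-CPU totals. This is an illustration of the idea only and does not reproduce irqbalance's actual algorithm; the function name, the 64-CPU limit, and the tie-breaking rule are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the allowed CPU with the fewest handled interrupts,
 * or -1 if no CPU in 'allowed' is within range. Ties go to the lowest
 * index. 'counts' holds per-CPU totals such as those summed from
 * /proc/interrupts.
 */
int pick_least_loaded_cpu(const unsigned long *counts, size_t ncpus,
			  uint64_t allowed)
{
	int best = -1;
	size_t cpu;

	for (cpu = 0; cpu < ncpus && cpu < 64; cpu++) {
		if (!(allowed & (UINT64_C(1) << cpu)))
			continue;
		if (best < 0 || counts[cpu] < counts[best])
			best = (int)cpu;
	}
	return best;
}
```

The `allowed` mask lets a caller exclude CPUs that should not take the interrupt, for instance isolated cores.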
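On kernels that expose it, the effective mask can be read from /proc/irq/{irq}/effective_affinity and compared with smp_affinity. The sketch below parses such a mask string and checks the relationship described above. It assumes the zero-padded, comma-separated 32-bit groups that the kernel prints, keeps only the low 64 bits, and uses hypothetical function names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Parse a mask as printed in procfs: hexadecimal digits, optionally split
 * into comma-separated groups ("00000000,0000000f"). Returns false on an
 * empty or malformed string.
 */
bool parse_proc_mask(const char *s, uint64_t *out)
{
	uint64_t value = 0;
	bool seen_digit = false;

	for (; *s && *s != '\n'; s++) {
		unsigned char c = (unsigned char)*s;
		unsigned int digit;

		if (c == ',')
			continue;
		if (!isxdigit(c))
			return false;
		digit = isdigit(c) ? (unsigned int)(c - '0')
				   : (unsigned int)(tolower(c) - 'a' + 10);
		value = (value << 4) | digit;
		seen_digit = true;
	}
	*out = value;
	return seen_digit;
}

/* True when exactly one bit is set, i.e. the mask names a single CPU. */
bool is_single_cpu(uint64_t mask)
{
	return mask != 0 && (mask & (mask - 1)) == 0;
}

/* Check that the effective mask is one CPU taken from the requested mask. */
bool effective_is_consistent(uint64_t requested, uint64_t effective)
{
	return is_single_cpu(effective) && (effective & ~requested) == 0;
}
```

A requested mask of `0xf` with an effective mask of `0x1` is consistent under this model, whereas an effective mask of `0x3` is not, because it names two CPUs.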
```c
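void irq_migrate_all_off_this_cpu(void)
{
	unsigned int irq;
	struct irq_desc *desc;

	for_each_irq_desc(irq, desc) {
		const struct cpumask *affinity;
		bool affinity_broken = false;

		raw_spin_lock(&desc->lock);
		affinity = irq_data_get_affinity_mask(&desc->irq_data);
		/* Skip interrupts that do not target the outgoing CPU. */
		if (!cpumask_test_cpu(smp_processor_id(), affinity)) {
			raw_spin_unlock(&desc->lock);
			continue;
		}
		/*
		 * Assumes this CPU is already cleared from cpu_online_mask.
		 * The stored mask is const, so it is not edited in place: if it
		 * holds no online CPU, fall back to every online CPU instead.
		 */
		if (!cpumask_intersects(affinity, cpu_online_mask)) {
			affinity = cpu_online_mask;
			affinity_broken = true;
		}
		__irq_set_affinity_locked(&desc->irq_data, affinity, true);
		raw_spin_unlock(&desc->lock);
		if (affinity_broken)
			pr_debug("IRQ %u: affinity broken\n", irq);
	}
}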
```
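The function runs on the outgoing CPU itself during the teardown phase of CPU offlining (the architecture's __cpu_disable path, that is, the dying stage rather than CPU_DOWN_PREPARE), which is why it can rely on smp_processor_id(). It ensures that every interrupt is reassigned, so that none is lost because the offline CPU can no longer handle it.
At the system level, interrupt load balancing is handled by the user-space irqbalance daemon or constrained by the irqaffinity kernel parameter. The boot parameter irqaffinity=0-3 specifies the default affinity mask at boot time; it is set up through init_irq_default_affinity and ends up in each descriptor's affinity mask. The numa_node field helps assign interrupts to CPUs on the same NUMA node as the device, which reduces cross-node memory access latency.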
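The migration rule can be modelled on plain bitmasks to show what happens to each interrupt. This is a user-space sketch with invented names; it assumes at most 64 CPUs and that `online` already excludes the outgoing CPU.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * For every IRQ whose mask includes 'cpu', drop that CPU from the mask.
 * If no online CPU remains, fall back to all online CPUs.
 * Returns the number of IRQs whose affinity had to be broken.
 */
size_t migrate_irqs_off_cpu(uint64_t *irq_masks, size_t nirqs,
			    unsigned int cpu, uint64_t online)
{
	uint64_t bit;
	size_t broken = 0;
	size_t i;

	if (cpu >= 64)
		return 0;
	bit = UINT64_C(1) << cpu;

	for (i = 0; i < nirqs; i++) {
		if (!(irq_masks[i] & bit))
			continue;	/* not targeted at the outgoing CPU */
		irq_masks[i] &= ~bit;
		if ((irq_masks[i] & online) == 0) {
			irq_masks[i] = online;	/* affinity broken */
			broken++;
		}
	}
	return broken;
}
```

An interrupt pinned only to the outgoing CPU ends up spread over all remaining online CPUs, while an interrupt that also listed another online CPU simply loses the one bit.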
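A value such as `0-3` is a CPU list, which has to be turned into a bitmask before it can serve as a default affinity. The sketch below handles only comma-separated numbers and simple ranges for at most 64 CPUs; the kernel's own parser accepts more forms, and `parse_cpu_list` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3" or "0,2-3" into a bitmask.
 * Returns false on malformed input, an empty list, or a CPU >= 64.
 */
bool parse_cpu_list(const char *s, uint64_t *out)
{
	uint64_t mask = 0;

	while (*s) {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return false;
		lo = strtoul(s, &end, 10);
		hi = lo;
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return false;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi || hi >= 64)
			return false;
		for (; lo <= hi; lo++)
			mask |= UINT64_C(1) << lo;
		if (*end == ',') {
			end++;
			if (*end == '\0')
				return false;	/* trailing comma */
		} else if (*end != '\0') {
			return false;
		}
		s = end;
	}
	*out = mask;
	return mask != 0;
}
```

Under this model `"0-3"` yields `0xf` and `"0,2-3"` yields `0xd`, matching the hexadecimal masks used by smp_affinity.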