LKML-Evolution Dataset

A Large-Scale Dataset for Linux Kernel Patch Refinement
and Security Maintenance

Luyao Bai1, [Co-author Name]2
1University of Illinois Chicago (UIC)    2[Institution]

Dataset Overview

Linux kernel patches are rarely accepted in their initial form. Instead, developers typically submit an initial patch (v1), which is then revised iteratively in response to maintainer and reviewer feedback before a final version is merged. This patch refinement process is driven by complex constraints, including global kernel invariants, subsystem-specific semantic rules, and non-local design considerations, that are often difficult to infer from local code context alone.

To help bridge this knowledge gap in kernel maintenance, we introduce the LKML-Evolution Dataset. We reconstructed the full lifecycle of 447,463 patch series from the Linux Kernel Mailing List (LKML). Unlike existing datasets that focus on single commits, our dataset links each initial submission to its subsequent revisions (v2 ... vN) and its final outcome.

Crucially, the dataset includes rich metadata classifying patches into Security-Critical (crash/reliability fixes) and Non-Security categories, enabling targeted research into the "cost" and trajectory of fixing high-priority kernel bugs.

Statistics

Patch Series: 447,463
Events (Threads): 593,114
Text Data: 23 GB
Categories: 2 (Security / General)

Data Structure

The dataset is split into two components: the Series Graph, which tracks how each patch series evolves across versions, and the Event Details, which hold the raw email content of each thread.
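Because the two components are stored separately, most analyses need to join them. The following is a minimal sketch of such a join, assuming that each version entry in a series record carries an `event_id` that matches the `event_id` of an event record, as the examples in this section suggest. The function name `events_for_series`, the in-memory dictionary `events_by_id`, and the sample identifiers are hypothetical and not part of the dataset.

```python
from typing import Dict, List, Tuple


def events_for_series(
    series: dict, events_by_id: Dict[str, dict]
) -> List[Tuple[str, dict]]:
    """Pair each version label of a series with its event record.

    Versions whose event is missing from `events_by_id` (or that carry
    no `event_id` at all) are skipped rather than raising an error.
    """
    pairs = []
    for label, variant in series.get("variants", {}).items():
        event = events_by_id.get(variant.get("event_id"))
        if event is not None:
            pairs.append((label, event))
    return pairs


# Hypothetical, made-up records used only to exercise the function.
sample_series = {
    "series_id": "series_a",
    "variants": {
        "v1": {"event_id": "event_a"},
        "v2": {"event_id": "event_b"},
    },
}
sample_events = {
    "event_a": {"event_id": "event_a", "messages": []},
}

for label, event in events_for_series(sample_series, sample_events):
    print(label, event["event_id"])
```

Only `v1` is printed here, since `event_b` is absent from the sample index. How the event records are loaded into `events_by_id` depends on the on-disk layout, which this sketch does not assume.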

Series Data (Evolution Graph)

{
  "series_id": "lkml_2021_12_9_302",
  "subject": "ADD DM9051 ETHERNET DRIVER",
  "topic_type": "PATCH",
  "variants": {
    "v2": {
      "event_id": "lkml_2021_12_9_302",
      "message_count": 5
    },
    "v3": {
      "event_id": "lkml_2021_12_10_152",
      "message_count": 7
    },
    "v4": { ... }
  },
  "connections": [
    { "from": "v2", "to": "v3" },
    { "from": "v3", "to": "v4" }
  ]
}
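
The `connections` list in a series record can be used to recover the order of versions. The sketch below follows the edges from the earliest version, under the assumption that each series forms a single linear chain (every version has at most one successor and exactly one version has no predecessor). The function name `version_chain` is hypothetical, and the `v4` entry is left empty because its contents are elided in the example above.

```python
from typing import Dict, List


def version_chain(series: dict) -> List[str]:
    """Return version labels in revision order by following `connections`.

    Assumes a linear chain; raises ValueError if no unique starting
    version exists or if the edges contain a cycle.
    """
    # Map each version to its successor (assumes at most one successor).
    successor: Dict[str, str] = {
        edge["from"]: edge["to"] for edge in series.get("connections", [])
    }
    has_predecessor = set(successor.values())

    # The starting version is the one no edge points to.
    roots = [v for v in series.get("variants", {}) if v not in has_predecessor]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one starting version, found {roots}")

    chain = [roots[0]]
    seen = {roots[0]}
    while chain[-1] in successor:
        nxt = successor[chain[-1]]
        if nxt in seen:
            raise ValueError("cycle detected in connections")
        chain.append(nxt)
        seen.add(nxt)
    return chain


series = {
    "series_id": "lkml_2021_12_9_302",
    "variants": {
        "v2": {"event_id": "lkml_2021_12_9_302", "message_count": 5},
        "v3": {"event_id": "lkml_2021_12_10_152", "message_count": 7},
        "v4": {},  # contents elided in the source example
    },
    "connections": [
        {"from": "v2", "to": "v3"},
        {"from": "v3", "to": "v4"},
    ],
}

print(version_chain(series))  # ['v2', 'v3', 'v4']
```

If some series branch, for example when a revision is split into two follow-up series, the successor map would need to hold lists rather than single labels.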

Event Data (Content & Threads)

{
  "event_id": "lkml_2010_3_5_305",
  "root_url": "https://lkml.org/lkml/2010/3/5/305",
  "message_count": 6,
  "messages": [
    {
      "subject": "Re: [PATCH 1/1] integer overflow issue...",
      "content": "On Tue, Jun 19, 2018... I'd rather roll the patch back...",
      "saved_time": "2025-09-26 00:41:05"
    }
  ],
  "connections": [
    {
      "from": "https://lkml.org/lkml/2010/8/8/159",
      "to": "https://lkml.org/lkml/2010/8/8/169"
    }
  ]
}
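
The `connections` list in an event record can similarly be turned into a reply structure. The sketch below assumes that `from` is the URL of a parent message and `to` is the URL of a reply to it; the record format shown above does not state the direction explicitly, so this should be checked against the data. The function names `build_reply_map` and `walk_thread` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def build_reply_map(event: dict) -> Dict[str, List[str]]:
    """Group reply URLs under their parent URL (assumes from=parent, to=reply)."""
    replies: Dict[str, List[str]] = defaultdict(list)
    for edge in event.get("connections", []):
        replies[edge["from"]].append(edge["to"])
    return dict(replies)


def walk_thread(replies: Dict[str, List[str]], start: str) -> Iterator[Tuple[int, str]]:
    """Yield (depth, url) pairs in depth-first order starting at `start`.

    A visited set guards against malformed data containing cycles.
    """
    stack = [(0, start)]
    seen = set()
    while stack:
        depth, url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        yield depth, url
        # Push children in reverse so they are visited in listed order.
        for child in reversed(replies.get(url, [])):
            stack.append((depth + 1, child))


event = {
    "connections": [
        {
            "from": "https://lkml.org/lkml/2010/8/8/159",
            "to": "https://lkml.org/lkml/2010/8/8/169",
        }
    ]
}

reply_map = build_reply_map(event)
for depth, url in walk_thread(reply_map, "https://lkml.org/lkml/2010/8/8/159"):
    print("  " * depth + url)
```

Indenting by depth gives a quick view of how deep a review discussion runs, which may serve as one rough proxy for the review effort a patch received.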