LKML-Evolution Dataset
A Large-Scale Dataset for Linux Kernel Patch Refinement and Security Maintenance
Dataset Overview
Linux kernel patches are rarely accepted in their initial form. Instead, a developer typically submits an initial patch (v1), which is revised iteratively in response to maintainer and reviewer feedback before a final version is merged. This patch refinement process is driven by complex constraints, including global kernel invariants, subsystem-specific semantic rules, and non-local design considerations, that are often difficult to infer from the local code context alone.
To bridge this knowledge gap in kernel maintenance, we introduce the LKML-Evolution Dataset. We reconstructed the full lifecycle of over 440,000 patch series from the Linux Kernel Mailing List (LKML). Unlike existing datasets that focus on single commits, our dataset links each initial submission to its subsequent revisions (v2 through vN) and to its final outcome.
Crucially, the dataset includes metadata that classifies each patch as either Security-Critical (crash or reliability fixes) or Non-Security, enabling targeted research into the "cost" and trajectory of fixing high-priority kernel bugs.
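One way to operationalize the "cost" of a fix is to count how many versions a series went through and how many messages were exchanged across them, then compare those numbers between categories. The Python sketch below does that using the `variants` and `message_count` fields from the Series Data schema. The label key `security_label` and its values are hypothetical: this overview does not name the field that stores the classification, so substitute the real key. The toy records are made up for illustration.

```python
from statistics import mean


def revision_cost(series: dict) -> tuple:
    """Return (number of versions, total messages across all versions)."""
    variants = series.get("variants", {})
    n_versions = len(variants)
    n_messages = sum(v.get("message_count", 0) for v in variants.values())
    return n_versions, n_messages


def mean_cost_by_label(series_list: list) -> dict:
    """Average (versions, messages) per classification label.

    HYPOTHETICAL: "security_label" is an illustrative key name, not
    necessarily the field used in the released metadata.
    """
    groups = {}
    for s in series_list:
        label = s.get("security_label", "unknown")
        groups.setdefault(label, []).append(revision_cost(s))
    return {
        label: (mean(c[0] for c in costs), mean(c[1] for c in costs))
        for label, costs in groups.items()
    }


# Toy records, made up for illustration (not taken from the dataset).
toy = [
    {"security_label": "security-critical",
     "variants": {"v1": {"message_count": 4}, "v2": {"message_count": 6}}},
    {"security_label": "non-security",
     "variants": {"v1": {"message_count": 2}}},
]
print(mean_cost_by_label(toy))
```

Message volume is only a rough proxy for review effort; other signals, such as elapsed time between versions, could be plugged into `revision_cost` in the same way.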
Statistics
Data Structure
The dataset is split into two primary components: Series Data, an evolution graph that tracks how a patch series changes from version to version, and Event Data, which holds the raw email content of each discussion thread.
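Because the two components are stored separately, any analysis that needs both the version history and the discussion text has to join them. Both record types carry an `event_id` field, so a minimal Python sketch of the join indexes Event records by that ID and attaches one to each version listed in a series's `variants`. It assumes the records are already loaded as Python dictionaries (the on-disk layout is not described here). The helper names `index_events` and `attach_events` are hypothetical, and the toy event record is made up apart from its ID.

```python
from typing import Dict, Optional


def index_events(events: list) -> Dict[str, dict]:
    """Index Event records by event_id for constant-time lookup."""
    return {e["event_id"]: e for e in events}


def attach_events(series: dict,
                  events_by_id: Dict[str, dict]) -> Dict[str, Optional[dict]]:
    """Map each version label of a series to its Event record (None if missing)."""
    return {
        version: events_by_id.get(variant.get("event_id"))
        for version, variant in series.get("variants", {}).items()
    }


series = {
    "series_id": "lkml_2021_12_9_302",
    "variants": {
        "v2": {"event_id": "lkml_2021_12_9_302", "message_count": 5},
        "v3": {"event_id": "lkml_2021_12_10_152", "message_count": 7},
    },
}
# Toy event store containing only the v2 thread.
events = index_events([
    {"event_id": "lkml_2021_12_9_302", "message_count": 5, "messages": []},
])
linked = attach_events(series, events)
print({v: (e["event_id"] if e else None) for v, e in linked.items()})
# {'v2': 'lkml_2021_12_9_302', 'v3': None}
```

Returning `None` for a missing event, rather than raising, keeps partially crawled series usable while making the gap explicit.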
Series Data (Evolution Graph)
{
  "series_id": "lkml_2021_12_9_302",
  "subject": "ADD DM9051 ETHERNET DRIVER",
  "topic_type": "PATCH",
  "variants": {
    "v2": {
      "event_id": "lkml_2021_12_9_302",
      "message_count": 5
    },
    "v3": {
      "event_id": "lkml_2021_12_10_152",
      "message_count": 7
    },
    "v4": { ... }
  },
  "connections": [
    { "from": "v2", "to": "v3" },
    { "from": "v3", "to": "v4" }
  ]
}
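In a series record, the `connections` edges encode the order in which versions superseded one another. Following those edges, rather than sorting the version labels as strings, avoids pitfalls such as "v10" sorting before "v9". The Python sketch below, with a hypothetical helper `version_chain`, assumes the edges form a single linear chain, as in the example above. The number of revision rounds is then the chain length minus one.

```python
def version_chain(series: dict) -> list:
    """Order version labels by following the "from" -> "to" edges.

    Assumes the connections form one linear chain: each version has at
    most one successor, and exactly one version has no predecessor.
    """
    successor = {c["from"]: c["to"] for c in series.get("connections", [])}
    has_predecessor = set(successor.values())
    roots = [v for v in series.get("variants", {}) if v not in has_predecessor]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one starting version, found {roots}")
    chain = [roots[0]]
    while chain[-1] in successor:
        nxt = successor[chain[-1]]
        if nxt in chain:
            raise ValueError("cycle detected in connections")
        chain.append(nxt)
    return chain


# Edges deliberately listed out of order; the chain is still recovered.
series = {
    "variants": {"v2": {}, "v3": {}, "v4": {}},
    "connections": [{"from": "v3", "to": "v4"}, {"from": "v2", "to": "v3"}],
}
chain = version_chain(series)
print(chain, "revision rounds:", len(chain) - 1)
# ['v2', 'v3', 'v4'] revision rounds: 2
```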
Event Data (Content & Threads)
{
  "event_id": "lkml_2010_3_5_305",
  "root_url": "https://lkml.org/lkml/2010/3/5/305",
  "message_count": 6,
  "messages": [
    {
      "subject": "Re: [PATCH 1/1] integer overflow issue...",
      "content": "On Tue, Jun 19, 2018... I'd rather roll the patch back...",
      "saved_time": "2025-09-26 00:41:05"
    }
  ],
  "connections": [
    {
      "from": "https://lkml.org/lkml/2010/8/8/159",
      "to": "https://lkml.org/lkml/2010/8/8/169"
    }
  ]
}
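Within an event record, `connections` links message URLs, which makes it possible to recover the shape of a review discussion, for example how deep a reply chain goes. The Python sketch below runs a breadth-first search over those edges and returns each URL's reply depth. It assumes that each edge points from a parent message ("from") to a reply ("to"); the schema example does not state the direction explicitly. The function name `thread_depths` is hypothetical.

```python
from collections import defaultdict, deque


def thread_depths(event: dict) -> dict:
    """Breadth-first reply depth for every URL in an event's thread graph.

    Assumes each edge runs from a parent message ("from") to a reply ("to").
    """
    children = defaultdict(list)
    replies = set()
    for edge in event.get("connections", []):
        children[edge["from"]].append(edge["to"])
        replies.add(edge["to"])
    # Thread starts: edge sources that are never replies, plus root_url.
    starts = [u for u in children if u not in replies]
    root = event.get("root_url")
    if root is not None and root not in replies and root not in starts:
        starts.append(root)
    depths = {}
    queue = deque((url, 0) for url in starts)
    while queue:
        url, depth = queue.popleft()
        if url in depths:  # BFS: the first visit already has the smallest depth
            continue
        depths[url] = depth
        for child in children.get(url, []):
            queue.append((child, depth + 1))
    return depths


# Toy thread (made up): a -> b -> d, and a -> c.
toy = {"connections": [{"from": "a", "to": "b"},
                       {"from": "a", "to": "c"},
                       {"from": "b", "to": "d"}]}
depths = thread_depths(toy)
print(depths, "max depth:", max(depths.values()))
```

Skipping already-visited URLs also keeps the traversal finite if a crawled thread accidentally contains a cycle.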