
SMART monitoring with nux

By Thomas Sileo · 11 min read

A quick demo of nux features.

I’ve been worrying about my NAS drives lately and wanted to start monitoring disk health using SMART (Self-Monitoring, Analysis and Reporting Technology).

nux is a new part of my self-hosted infrastructure that ingests events (and provides notifications). It’s a perfect fit for tracking disk health.

nux comes with a suite of tools from a basic CLI (nux-notify) to tools for monitoring cron jobs (nux-wrap), but for monitoring disks, I’m going to demo nux-analyze.

nux-analyze is a tool that uses Lua scripts to extract events from log files and other structured data.

Step 1: Getting the SMART Data

smartctl (from smartmontools) is a CLI tool that can read SMART data from the drive.

-H outputs the overall “health status”, i.e. the passed boolean flag:

sudo smartctl --json -H /dev/sda1
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      4
    ],
    "pre_release": false,
    "svn_revision": "5530",
    "platform_info": "x86_64-linux-6.14.0-37-generic",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "--json",
      "-H",
      "/dev/sda1"
    ],
    "drive_database_version": {
      "string": "7.3/5528"
    },
    "exit_status": 0
  },
  "local_time": {
    "time_t": 1771961410,
    "asctime": "Tue Feb 24 20:30:10 2026 CET"
  },
  "device": {
    "name": "/dev/sda1",
    "info_name": "/dev/sda1 [SAT]",
    "type": "sat",
    "protocol": "ATA"
  },
  "smart_status": {
    "passed": true
  }
}

-A outputs the vendor-specific attributes and their values, i.e. the metrics:

sudo smartctl --json -A /dev/sda1
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      4
    ],
    "pre_release": false,
    "svn_revision": "5530",
    "platform_info": "x86_64-linux-6.14.0-37-generic",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "--json",
      "-A",
      "/dev/sda1"
    ],
    "drive_database_version": {
      "string": "7.3/5528"
    },
    "exit_status": 0
  },
  "local_time": {
    "time_t": 1771961544,
    "asctime": "Tue Feb 24 20:32:24 2026 CET"
  },
  "device": {
    "name": "/dev/sda1",
    "info_name": "/dev/sda1 [SAT]",
    "type": "sat",
    "protocol": "ATA"
  },
  "ata_smart_attributes": {
    "revision": 10,
    "table": [
      {
        "id": 1,
        "name": "Raw_Read_Error_Rate",
        "value": 80,
        "worst": 64,
        "thresh": 44,
        "when_failed": "",
        "flags": {
          "value": 15,
          "string": "POSR-- ",
          "prefailure": true,
          "updated_online": true,
          "performance": true,
          "error_rate": true,
          "event_count": false,
          "auto_keep": false
        },
        "raw": {
          "value": 102659272,
          "string": "102659272"
        }
      },
      [...]
    ]
  },
  "power_on_time": {
    "hours": 5425
  },
  "power_cycle_count": 7,
  "temperature": {
    "current": 37
  }
}

Reading SMART data requires root, so I run a script as root that dumps the output to a log file in JSONL format (newline-delimited JSON, one object per line):

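#!/bin/bash
# Dump SMART health (-H) and attributes (-A) for every SATA and NVMe disk,
# one compact JSON object per line, tagged with the device path.
for disk in /dev/sd? /dev/nvme?n?; do
    [ -e "$disk" ] || continue   # skip globs that matched nothing
    sudo smartctl -A -H --json "$disk" 2>/dev/null | \
        jq -c --arg disk "$disk" '{disk: $disk} + .'   # add a "disk" key to the object
done > /var/log/smart.jsonl   # overwrite the log on every run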

Now that the data is in a JSONL log file, we can write a nux-analyze script to process it.

Step 2: Generating Events for nux

Lua scripts for nux have access to a send helper that sends events to nux. nux-analyze also expects each script to define 3 functions:

  • init, to set up variables and shared state
  • on_line, triggered for each input line (in our case, one JSON object per line with the SMART data for a single drive)
    • this function can aggregate data across lines
  • finish, used to compute the summary
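To make the call order concrete, here is a minimal skeleton. It is a sketch, not copied from the nux documentation: it assumes, as the SMART script does, that each JSON line reaches on_line as a Lua table and that nux-analyze provides send. The stub send and the small driver at the bottom are hypothetical stand-ins so the example runs with a plain Lua interpreter.

```lua
-- Minimal nux-analyze script skeleton (a sketch, not from the nux docs).
-- nux-analyze provides `send`; this hypothetical stub only kicks in when the
-- script runs outside nux, so the example can be executed with plain Lua.
local sent = {}
if send == nil then
    send = function(event) table.insert(sent, event) end
end

function init(ctx)
    -- Shared state lives on ctx for the whole run.
    ctx.disk_count = 0
end

function on_line(ctx, line, n)
    -- JSON lines arrive as tables; ignore anything else.
    if type(line) ~= "table" then return end
    ctx.disk_count = ctx.disk_count + 1
end

function finish(ctx)
    -- Emit a single summary event once every line has been seen.
    send({
        event_type = "disk.example_summary",
        level = "info",
        title = string.format("Saw %d disks", ctx.disk_count),
        message = "",
        source = ctx.source,
        priority = 1,
        fields = { disk_count = ctx.disk_count },
    })
end

-- Hypothetical driver mimicking the call order described above:
-- init once, on_line per input line, finish at the end.
local ctx = { source = "smart-demo" }
init(ctx)
on_line(ctx, { disk = "/dev/sda" }, 1)
on_line(ctx, { disk = "/dev/nvme0n1" }, 2)
on_line(ctx, "not a JSON object", 3)
finish(ctx)
```

Keeping all per-run state on ctx (rather than in globals) mirrors how the SMART script stores its per-disk results in ctx.disks.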

You can see the full documentation for nux-analyze in the repo.

One thing to know before diving into the script: ATA drives (traditional SATA HDDs and SSDs) and NVMe drives expose different health data. The script handles both.

For NVMe, the important bit is percentage_used, a field that represents the estimated consumed lifespan of the drive.

For ATA attributes, I relied on Backblaze’s analysis and picked the ones they found most predictive of failures: reallocated sectors, uncorrectable errors, pending sectors, and a few others.
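As an illustration of the ATA side, here is a hypothetical helper (not the author's implementation) that walks ata_smart_attributes.table from the -A output shown earlier and flags a few of the Backblaze attributes when their raw counter is non-zero. The WATCHED subset and the "any non-zero raw value" rule are assumptions made for this sketch.

```lua
-- Sketch: flag ATA attributes whose raw count is non-zero.
-- WATCHED is a hypothetical subset of the Backblaze attributes; the field
-- names (ata_smart_attributes.table, id, raw.value) follow smartctl's JSON.
local WATCHED = {
    [5]   = "Reallocated_Sector_Ct",
    [187] = "Reported_Uncorrect",
    [197] = "Current_Pending_Sector",
    [198] = "Offline_Uncorrectable",
}

local function ata_warnings(line)
    local warnings = {}
    local attrs = line.ata_smart_attributes
    if not attrs or not attrs.table then
        return warnings  -- NVMe drives, or no attribute table at all
    end
    for _, attr in ipairs(attrs.table) do
        local name = WATCHED[attr.id]
        local raw = attr.raw and attr.raw.value or 0
        -- For these counters, any non-zero raw value is worth a warning.
        if name and raw > 0 then
            table.insert(warnings, string.format("%s = %d", name, raw))
        end
    end
    return warnings
end

-- Example input shaped like `smartctl --json -A` output (illustrative values).
local sample = {
    ata_smart_attributes = {
        table = {
            { id = 1,   raw = { value = 102659272 } },  -- not watched: ignored
            { id = 5,   raw = { value = 8 } },
            { id = 197, raw = { value = 0 } },
        },
    },
}
local result = ata_warnings(sample)
```

Temperature (194) is deliberately left out here: its raw value is always non-zero, so it needs a threshold instead of a counter check.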

Also, one gotcha worth knowing: on some drives, the raw value for attribute 194 (temperature) packs the historical min/max into the higher bytes, so you need to mask it with raw % 256 (keeping only the lowest byte) to get the actual current temperature. In my initial testing I got crazy-high values before figuring this out.
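A tiny sketch of the masking. The packed value is made up, and the byte layout in the comment is only an example, since vendors differ in where (and whether) they store min/max.

```lua
-- Sketch: recover the current temperature from attribute 194's raw value.
-- Some drives store min/max history in the higher bytes, so keep only the
-- lowest byte (raw % 256).
local function current_temperature(raw_value)
    return raw_value % 256
end

-- Hypothetical packed value: current 37 C in byte 0, min 20 C in byte 2,
-- max 45 C in byte 4.
local packed = 37 + 20 * 65536 + 45 * 4294967296
local temp = current_temperature(packed)
```

Because 65536 and 4294967296 are multiples of 256, the min/max terms vanish under % 256 and only the current reading remains; a plain, unpacked value such as 41 passes through unchanged.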

Here is an excerpt of the script (full version):

-- analyzers/smart.lua
-- Analyzes smartctl --json output, tracks attribute changes, alerts on degradation.

local CRITICAL_ATTRS = {
    [5]   = "Reallocated_Sector_Ct",
    [10]  = "Spin_Retry_Count",
    [187] = "Reported_Uncorrect",
    [188] = "Command_Timeout",
    [194] = "Temperature_Celsius",
    [197] = "Current_Pending_Sector",
    [198] = "Offline_Uncorrectable",
    [199] = "UDMA_CRC_Error_Count",
}

function init(ctx)
    ctx.disks = {}
    ctx.lines_seen = 0
end

function on_line(ctx, line, n)
    if type(line) ~= "table" then return end
    ctx.lines_seen = ctx.lines_seen + 1

    local disk = line.disk or "unknown"
    local short = disk:gsub("/dev/", "")
    local info = { disk = disk, warnings = {}, fields = {} }

    -- Overall health
    local passed = true
    if line.smart_status and line.smart_status.passed == false then
        passed = false
        send({
            event_type = "disk.smart_failed",
            level = "error",
            title = string.format("SMART health FAILED: %s", disk),
            message = string.format("SMART overall health test failed for %s", disk),
            source = ctx.source,
            priority = 5,
            fields = { disk = disk },
        })
    end

    -- Check critical ATA attributes
    -- [...] (ATA attribute checks omitted from this excerpt)

    -- NVMe health
    if line.nvme_smart_health_information_log then
        local nvme = line.nvme_smart_health_information_log
        local pct = nvme.percentage_used or 0
        local spare = nvme.available_spare or 100
        local temp = nvme.temperature or 0
        local media_errors = nvme.media_errors or 0
        info.fields.percentage_used = pct
        info.fields.available_spare = spare
        info.fields.temperature = temp
        info.fields.media_errors = media_errors

        local prev_media_key = short .. "_nvme_media_errors"
        local prev_media = ctx.state[prev_media_key] or 0
        if media_errors > 0 then
            table.insert(info.warnings, string.format("NVMe media_errors = %d", media_errors))
        end
        if media_errors > prev_media and prev_media > 0 then
            send({
                event_type = "disk.smart_nvme_errors",
                level = "error",
                title = string.format("NVMe %s media errors increasing: %d -> %d", disk, prev_media, media_errors),
                message = string.format("NVMe %s media_errors went from %d to %d", disk, prev_media, media_errors),
                source = ctx.source,
                priority = 5,
                fields = { disk = disk, previous = prev_media, current = media_errors },
            })
        end
        ctx.state[prev_media_key] = media_errors

        if pct > 90 then
            table.insert(info.warnings, string.format("NVMe %d%% used", pct))
            send({
                event_type = "disk.smart_nvme_wear",
                level = "warn",
                title = string.format("NVMe %s at %d%% used", disk, pct),
                message = string.format("NVMe %s percentage_used = %d%%, available_spare = %d%%", disk, pct, spare),
                source = ctx.source,
                priority = 4,
                fields = { disk = disk, percentage_used = pct, available_spare = spare },
            })
        end
        if spare < 20 then
            table.insert(info.warnings, string.format("NVMe spare at %d%%", spare))
            send({
                event_type = "disk.smart_nvme_spare",
                level = "error",
                title = string.format("NVMe %s spare low: %d%%", disk, spare),
                message = string.format("NVMe %s available_spare = %d%%", disk, spare),
                source = ctx.source,
                priority = 5,
                fields = { disk = disk, percentage_used = pct, available_spare = spare },
            })
        end
        if temp > 70 then
            table.insert(info.warnings, string.format("NVMe temp %d C", temp))
            send({
                event_type = "disk.smart_nvme_temp",
                level = "warn",
                title = string.format("NVMe %s temperature high: %d C", disk, temp),
                message = string.format("NVMe %s temperature = %d C (threshold: 70 C)", disk, temp),
                source = ctx.source,
                priority = 4,
                fields = { disk = disk, temperature = temp },
            })
        end
    end

    info.passed = passed
    ctx.disks[disk] = info
end

function finish(ctx)
    if ctx.lines_seen == 0 then
        return
    end

    local all_ok = true
    local lines = {}
    local disk_count = 0
    local summary_fields = {}

    for disk, info in pairs(ctx.disks) do
        disk_count = disk_count + 1
        local short = disk:gsub("/dev/", "")

        if #info.warnings > 0 then
            all_ok = false
            table.insert(lines, string.format("%s: %s", disk, table.concat(info.warnings, ", ")))
        else
            table.insert(lines, string.format("%s: OK", disk))
        end

        summary_fields[short] = info.fields
    end

    if disk_count == 0 then
        return
    end

    summary_fields.disk_count = disk_count
    local message = table.concat(lines, "\n")

    if not all_ok then
        send({
            event_type = "disk.smart_summary",
            level = "warn",
            title = string.format("SMART: warnings detected across %d disks", disk_count),
            message = message,
            source = ctx.source,
            priority = 3,
            fields = summary_fields,
        })
    else
        send({
            event_type = "disk.smart_summary",
            level = "info",
            title = string.format("SMART: %d disks OK", disk_count),
            message = message,
            source = ctx.source,
            priority = 1,
            fields = summary_fields,
        })
    end
end

Now we just need to run it through nux-analyze:

nux-analyze --reset --script smart.lua /var/log/smart.jsonl

By default, nux-analyze processes log files incrementally, saving its cursor position between runs.

--reset reprocesses the whole file from the start. That’s what we want here: the collection script overwrites the log file on each run, and the file only contains one line per drive.
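The cursor idea can be sketched in a few lines of Lua. This illustrates offset-based incremental reading in general and is not nux-analyze's actual implementation; read_new_lines and write_file are hypothetical helpers. The last step shows why a saved offset is a problem for a file that is rewritten rather than appended to.

```lua
-- Sketch of cursor-based incremental reading (not nux-analyze's code).
-- Reads the lines after `offset` and returns them with the offset to
-- resume from on the next run.
local function read_new_lines(path, offset)
    local f = assert(io.open(path, "r"))
    f:seek("set", offset)
    local lines = {}
    for line in f:lines() do
        table.insert(lines, line)
    end
    local next_offset = f:seek()  -- current position, i.e. end of file after the loop
    f:close()
    return lines, next_offset
end

local function write_file(path, mode, text)
    local f = assert(io.open(path, mode))
    f:write(text)
    f:close()
end

local path = os.tmpname()
write_file(path, "w", '{"disk":"/dev/sda"}\n{"disk":"/dev/sdb"}\n')
local first, cursor = read_new_lines(path, 0)         -- both lines
write_file(path, "a", '{"disk":"/dev/sdc"}\n')
local second, cursor2 = read_new_lines(path, cursor)  -- only the appended line
-- Overwriting the file (as the SMART dump does) leaves the saved cursor
-- past the new end of file, so nothing is read without a reset.
write_file(path, "w", '{"disk":"/dev/sda"}\n')
local stale = read_new_lines(path, cursor2)
os.remove(path)
```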

And here is the result in the UI:

A screenshot of the nux UI with the SMART monitoring event expanded.

That’s it! All my drives are now monitored properly and I’ll sleep better at night.

If something goes wrong, I’ll get a desktop notification and a push notification on my phone thanks to nux.

SMART monitoring won’t predict every drive failure, but it’s way better than ignoring the metrics.

The full script with instructions is available in the nux repo. It also showcases my cron setup for running this daily.