Completeness Analysis
The completeness report answers one question: does your time series contain all the data it should? For sensor data, completeness isn't just about null values — it's about temporal continuity. A perfectly formatted file can still be massively incomplete if large stretches of time are simply absent from the record.
SENTINEL approaches completeness through the lens of time intervals (the gaps between consecutive readings) rather than counting null cells. This reflects how industrial sensor data actually fails: not with NaN placeholders, but with silence.
What the Report Shows
When you open the Completeness modal, you see:
- Overall coverage percentage — the headline number: what fraction of expected readings are present
- Coverage over time — a bar chart showing data density across the recording period
- What's happening — a plain-English description of the gap pattern
- Key numbers — sample rate, sampling consistency, total gaps, and longest gap
- Gap thresholds — an interactive slider to classify gaps by severity
- Technical details — the full statistical breakdown, interval distribution chart, gap list, and sparse periods (collapsed by default)
- What to do next — prioritised recommendations
How Overall Completeness is Calculated
For regularly sampled data
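When SENTINEL detects a consistent sampling rate (see Expected Interval Detection below), completeness is computed as:
\[ \text{completeness} = \frac{N_{\text{actual}}}{N_{\text{expected}}} \times 100\%, \qquad N_{\text{expected}} = \frac{t_n - t_0}{\text{expected\_interval}} + 1 \]
where:
- \(N_{\text{actual}}\) is the number of readings present in the file
- \(N_{\text{expected}}\) is the number of readings the sensor should have produced over the recording period, counting both endpoints
- \(t_0\) and \(t_n\) are the first and last timestamps in the file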
Example: A sensor recording at 1 Hz for exactly one hour should produce 3,601 readings (0 s through 3,600 s inclusive). If the file contains 3,312 readings, completeness = 3312 / 3601 = 92.0%.
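The worked example can be reproduced with a few lines of Python. This is a minimal sketch rather than SENTINEL's implementation; the function name `regular_completeness` and its parameters are hypothetical, and the expected count is assumed to include both endpoints, as in the example.

```python
def regular_completeness(n_actual: int, duration_s: float, expected_interval_s: float) -> float:
    """Percentage of expected readings that are present (illustrative sketch)."""
    # Both endpoints are counted, so a span of D seconds at interval I
    # holds D / I + 1 readings (3600 s at 1 Hz -> 3601 readings).
    n_expected = round(duration_s / expected_interval_s) + 1
    return 100.0 * n_actual / n_expected


pct = regular_completeness(n_actual=3312, duration_s=3600, expected_interval_s=1.0)
print(f"{pct:.1f}%")  # 92.0%
```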
For irregularly sampled data
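When no consistent sampling rate is detected, SENTINEL falls back to a gap-time estimate:
\[ \text{completeness} \approx \left(1 - \frac{\text{gap time}}{t_n - t_0}\right) \times 100\% \]
where gap time is the sum of durations of all detected gaps (see Gap Detection) and \(t_n - t_0\) is the total recording period. This is an approximation: it assumes that every gap represents missing data rather than an intentional pause, so it understates completeness when some gaps are deliberate. It still provides a useful lower-bound estimate.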
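As a rough illustration, the gap-time fallback can be read as one minus the share of the recording period occupied by gaps. The exact formula SENTINEL uses is an assumption here, and `gap_time_completeness` is a hypothetical name.

```python
def gap_time_completeness(gap_durations_s, total_duration_s):
    """Estimate completeness (%) from the time lost to gaps (assumed formula)."""
    gap_time = sum(gap_durations_s)
    # Every gap is treated as missing data, so this is a lower bound.
    return 100.0 * (1.0 - gap_time / total_duration_s)


# Three gaps totalling 480 s in a one-hour recording
estimate = gap_time_completeness([120.0, 300.0, 60.0], total_duration_s=3600.0)
print(f"{estimate:.1f}%")  # 86.7%
```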
Delta-T Analysis
The foundation of the completeness report is delta-T (Δt) analysis: the study of the time intervals between consecutive data points.
Given a time series with timestamps \(t_0, t_1, t_2, \ldots, t_n\), the Δt sequence is:
\[ \Delta t_k = t_k - t_{k-1}, \qquad k = 1, \ldots, n \]
Each \(\Delta t\) is expressed in seconds. Readings where the value is absent are excluded before computing intervals — this handles multi-channel files where one channel may be missing readings while others continue.
Analysing the distribution of these intervals reveals far more than any single summary statistic:
- A perfectly regular sensor produces a spike at a single Δt value
- Occasional dropouts produce a main spike plus a small tail at multiples of the expected interval
- Sensors with variable duty cycles produce multiple peaks
- Data with systematic outages produces a heavy right tail or a bimodal distribution
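The interval extraction described in this section can be sketched as follows. Timestamps are assumed to be in seconds and already sorted; treating both `None` and NaN as "absent" is an assumption, and `delta_t` is a hypothetical helper name.

```python
import math


def delta_t(timestamps_s, values):
    """Intervals between consecutive readings, skipping absent values."""
    kept = [
        t for t, v in zip(timestamps_s, values)
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    # Difference each remaining timestamp with its predecessor
    return [b - a for a, b in zip(kept, kept[1:])]


ts = [0, 1, 2, 3, 4, 5]
vals = [1.0, 1.1, None, 1.3, float("nan"), 1.5]
print(delta_t(ts, vals))  # [1, 2, 2]
```

Dropping the absent readings first means a missing value on one channel shows up as a longer interval for that channel, which is what the gap detection later relies on.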
Statistical Metrics
SENTINEL computes the following statistics over the Δt sequence:
Central tendency
| Metric | Formula | What it tells you |
|---|---|---|
| Mean | \(\bar{\Delta t} = \frac{1}{n}\sum_{k=1}^{n} \Delta t_k\) | Average interval; sensitive to large gaps |
| Median | Middle value of the sorted sequence | Typical interval; robust to outliers |
For gap-affected data, the median is a better estimate of the "true" sampling interval than the mean. A large gap inflates the mean but leaves the median essentially unchanged.
Spread
| Metric | Formula | What it tells you |
|---|---|---|
| Standard deviation | \(\sigma = \sqrt{\frac{1}{n}\sum_{k=1}^{n}(\Delta t_k - \bar{\Delta t})^2}\) | Overall spread |
| Q1 / Q3 | 25th / 75th percentile | Central range of intervals |
| IQR | \(Q_3 - Q_1\) | Interquartile range; robust spread measure |
Coefficient of Variation (CV)
CV measures relative variability: how large the spread is compared to the mean, \(\text{CV} = \sigma / \bar{\Delta t}\). It is the primary indicator of sampling regularity:
| CV range | Sampling pattern | Meaning |
|---|---|---|
| < 0.05 | Very consistent | Near-perfectly uniform sampling |
| 0.05 – 0.15 | Consistent | Small timing jitter, typical of digital sensors |
| 0.15 – 0.30 | Moderate | Noticeable variability; may affect some algorithms |
| ≥ 0.30 | Variable | Highly irregular; resampling likely required |
A regular clock-based sensor will have CV ≈ 0. A sensor with occasional late transmissions might have CV ≈ 0.05–0.10. A sensor whose polling interval drifts significantly might reach CV ≈ 0.3 or above.
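A short sketch of the CV calculation and the classification bands from the table above, using the standard definition of CV (standard deviation divided by mean). The population standard deviation is used to match the \(\frac{1}{n}\) formula in the Spread table; `cv_and_label` is a hypothetical name.

```python
import statistics


def cv_and_label(intervals):
    """Return (CV, sampling-pattern label) for a list of intervals in seconds."""
    mean = statistics.fmean(intervals)
    sigma = statistics.pstdev(intervals)  # population std, matching the 1/n formula
    cv = sigma / mean
    if cv < 0.05:
        label = "Very consistent"
    elif cv < 0.15:
        label = "Consistent"
    elif cv < 0.30:
        label = "Moderate"
    else:
        label = "Variable"
    return cv, label


regular = [1.0] * 4
with_gap = [1.0] * 9 + [30.0]  # one long gap among regular readings
print(cv_and_label(regular))
print(cv_and_label(with_gap))
print(statistics.median(with_gap))  # 1.0, unaffected by the single gap
```

The second series shows why a single long gap is enough to push CV into the "Variable" band while the median interval stays at one second.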
Distribution shape
| Metric | What it indicates |
|---|---|
| Skewness > 1.0 | Long-tailed distribution — most intervals are short but some are very long (typical of occasional gaps) |
| Multimodal | Multiple distinct sampling rates present in the same file (e.g., two sensors merged, or a sensor that switches between fast and slow modes) |
| Number of modes | Count of distinct peaks in the interval distribution |
Expected Interval Detection
SENTINEL attempts to detect whether the data has a consistent underlying sampling rate, even if individual intervals vary slightly due to clock jitter or transmission delays.
The method uses Kernel Density Estimation (KDE) on the Δt sequence to smooth the interval distribution and find its dominant peak. If a single clear peak is found, that peak's location is taken as the expected_interval. The detection is confirmed only when the CV is below the regularity threshold (< 0.05), ensuring we don't report a spurious "expected interval" for genuinely irregular data.
The expected_interval is used to:
- Compute the overall completeness percentage
- Set the window size for sparse period detection
- Drive the coverage timeline chart
If no expected interval is detected, the completeness report still runs — it uses the median interval as a proxy where needed.
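The detection idea can be sketched with a small pure-Python Gaussian KDE. This is an illustration under stated assumptions, not SENTINEL's implementation: the bandwidth rule (10% of the median interval), the grid resolution, and the function names are hypothetical, and the "single clear peak" check is omitted for brevity.

```python
import math
import statistics


def kde_peak(intervals, bandwidth, grid_points=400):
    """Grid location where a Gaussian KDE of the intervals is highest."""
    lo, hi = min(intervals), max(intervals)
    if lo == hi:
        return lo  # all intervals identical: the peak is that value
    step = (hi - lo) / (grid_points - 1)
    best_x, best_density = lo, -1.0
    for i in range(grid_points):
        x = lo + i * step
        # Unnormalised density: a sum of Gaussian bumps centred on each interval
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in intervals)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def detect_expected_interval(intervals, cv_threshold=0.05):
    """Return the dominant interval, or None when sampling is not regular."""
    cv = statistics.pstdev(intervals) / statistics.fmean(intervals)
    if cv >= cv_threshold:
        return None  # too irregular to report an expected interval
    bandwidth = 0.1 * statistics.median(intervals)  # assumed bandwidth rule
    return kde_peak(intervals, bandwidth)


jittery = [0.98, 1.0, 1.01, 1.0, 0.99, 1.02, 1.0, 1.0]
irregular = [1.0, 5.0, 2.0, 9.0, 1.0, 7.0]
print(detect_expected_interval(jittery))    # close to 1.0
print(detect_expected_interval(irregular))  # None
```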
Gap Detection
A gap is any Δt interval that significantly exceeds the typical sampling interval. SENTINEL uses Tukey's fence method (IQR-based) by default to set the gap threshold:
\[ \text{threshold} = Q_3 + 1.5 \times \text{IQR}, \qquad \text{IQR} = Q_3 - Q_1 \]
Any interval exceeding this threshold is classified as a gap. The threshold adapts automatically to the data — it is not a fixed constant.
Why IQR-based? The Tukey fence is robust to outliers: it uses percentile-based statistics (\(Q_1\), \(Q_3\)) rather than the mean or standard deviation, which can themselves be distorted by the very gaps we are trying to detect.
Two alternative detection methods are available:
| Method | Threshold formula | When to use |
|---|---|---|
| IQR (default) | \(Q_3 + 1.5 \times \text{IQR}\) | Most datasets; robust to skewed distributions |
| Standard deviation | \(\bar{\Delta t} + 3\sigma\) | Symmetric, approximately normal Δt distributions |
| Percentile | 95th percentile | When you want a fixed fraction of intervals flagged |
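The three threshold formulas in the table can be compared side by side with the standard library. This is a sketch: the quartile convention (`method="inclusive"`) is an assumption, since different percentile definitions give slightly different thresholds on small samples, and `gap_threshold` is a hypothetical name.

```python
import statistics


def gap_threshold(intervals, method="iqr", iqr_multiplier=1.5):
    """Gap threshold in seconds for one of the three detection methods."""
    if method == "iqr":
        q1, _, q3 = statistics.quantiles(intervals, n=4, method="inclusive")
        return q3 + iqr_multiplier * (q3 - q1)  # Tukey's upper fence
    if method == "std":
        return statistics.fmean(intervals) + 3 * statistics.pstdev(intervals)
    if method == "percentile":
        # 95th percentile is the 95th of the 99 cut points
        return statistics.quantiles(intervals, n=100, method="inclusive")[94]
    raise ValueError(f"unknown method: {method}")


intervals = [0.9, 1.0, 1.1, 1.0, 1.0, 0.9, 1.1, 1.0, 12.0]
threshold = gap_threshold(intervals)
gaps = [dt for dt in intervals if dt > threshold]
print(round(threshold, 3))  # 1.25
print(gaps)                 # [12.0]
```

On this sample the standard-deviation threshold is pulled upwards by the 12-second interval itself, which illustrates why the IQR method is the default.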
Gap properties
Each detected gap records:
- Start timestamp — the last data point before the gap
- End timestamp — the first data point after the gap
- Duration — end_timestamp − start_timestamp in seconds
- Classification — isolated or clustered (see below)
Gap classification: isolated vs. clustered
Once gaps are detected, SENTINEL classifies each one based on whether other gaps occur nearby:
- Isolated — the gap stands alone; there are no other gaps within a time window of 2 × median_interval before or after it
- Clustered — the gap is part of a group of gaps concentrated in a short period
Clustering indicates that something systematic was happening during that stretch — a sensor fault, a network outage, a scheduled maintenance window — rather than a random individual dropout.
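A possible reading of the isolated/clustered rule, sketched in Python. Measuring the distance edge to edge (end of one gap to start of the next) is an assumption, as is the `(start, end)` tuple representation; `classify_gaps` is a hypothetical name.

```python
def classify_gaps(gaps, median_interval, window_multiplier=2.0):
    """Label each (start_s, end_s) gap as 'isolated' or 'clustered'.

    Gaps must be sorted by start time.
    """
    window = window_multiplier * median_interval
    labels = []
    for i, (start, end) in enumerate(gaps):
        # Assumed distance measure: edge-to-edge time between neighbouring gaps
        near_prev = i > 0 and start - gaps[i - 1][1] <= window
        near_next = i + 1 < len(gaps) and gaps[i + 1][0] - end <= window
        labels.append("clustered" if near_prev or near_next else "isolated")
    return labels


gaps = [(10.0, 15.0), (16.0, 22.0), (100.0, 130.0)]
print(classify_gaps(gaps, median_interval=1.0))
# ['clustered', 'clustered', 'isolated']
```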
Missing Pattern Classification
SENTINEL classifies the overall character of the data loss into one of five categories:
No gaps detected
No gaps were detected above the threshold. The data is temporally complete.
Random dropouts
Gaps are isolated and scattered without a detectable timing pattern. There is no evidence that the probability of missing data depends on when it occurs or what the values are. (Statisticians refer to this as Missing Completely At Random, or MCAR.)
What this means in practice: Individual transmission failures, occasional sensor resets, or infrequent logging errors. The underlying process continued normally; only the recording was intermittently interrupted.
Recommended next step: Linear interpolation is appropriate for short isolated gaps. The missing data can be treated as a straightforward estimation problem.
Detection criterion: More than 50% of detected gaps are classified as isolated.
Irregular coverage
Gaps occur more frequently during certain periods, but the missingness is not directly linked to the recorded values. The cause is external (machine state, environmental conditions, operator activity) and is not directly observable in the data itself. (This loosely corresponds to what statisticians call Missing At Random, or MAR.)
What this means in practice: A sensor that drops packets during periods of high radio interference; a logger that slows down when connected to a slow network; gaps that correlate with shift changes.
Recommended next step: Treat isolated and clustered gaps differently. Understand the external conditions before attempting gap-filling.
Detection criterion: Neither the random dropout nor systematic outage criteria are met.
Systematic outages
The majority of gaps are clustered in time, indicating that missingness is caused by a recurring or persistent failure mode. The gaps are not random: they happen for a reason that is at least partly visible in the data. (This loosely corresponds to what statisticians call Missing Not At Random, or MNAR.)
What this means in practice: Sensor failures triggered by extreme readings (e.g., a temperature sensor that stops recording above 150°C); scheduled maintenance windows; systematic communication failures in a specific time-of-day pattern.
Recommended next step: Do not fill across clustered gaps without first understanding the cause. Each cluster should be investigated separately. Simple interpolation will produce physically implausible values across a long outage.
Detection criterion: More than 50% of detected gaps are classified as clustered.
Extended outage
This is a special case that applies when at least one gap exceeds 50 times the median sampling interval. For a 1 Hz sensor, this means a gap of more than 50 seconds. For a sensor that reports once per minute, it means a gap of more than 50 minutes.
Extended outage gaps are qualitatively different from ordinary missing data — they represent periods where the sensor was almost certainly off, disconnected, or physically inaccessible. Standard interpolation methods produce meaningless results across an extended outage.
Detection criterion: Any single gap duration \(> 50 \times\) median \(\Delta t\).
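The detection criteria for the five categories can be combined into one decision function. This is a sketch: checking the extended-outage criterion before the others is an assumption about precedence, and `classify_missing_pattern` is a hypothetical name.

```python
def classify_missing_pattern(gap_durations, labels, median_interval, outage_multiplier=50):
    """Map gap durations (s) and isolated/clustered labels to a pattern category."""
    if not gap_durations:
        return "No gaps detected"
    # Assumed precedence: a single very long gap overrides the other categories
    if any(d > outage_multiplier * median_interval for d in gap_durations):
        return "Extended outage"
    n = len(labels)
    if labels.count("isolated") > 0.5 * n:
        return "Random dropouts"
    if labels.count("clustered") > 0.5 * n:
        return "Systematic outages"
    return "Irregular coverage"


print(classify_missing_pattern([5, 6, 7], ["isolated", "isolated", "clustered"], 1.0))
# Random dropouts
print(classify_missing_pattern([5, 60], ["isolated", "isolated"], 1.0))
# Extended outage
```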
Sparse Period Detection
Gaps capture individual missing intervals. But a dataset can have no single gap above the threshold while still losing significant amounts of data — if many small gaps are spread across a specific time window, each one just below the detection threshold.
Sparse period detection addresses this with a sliding window scan:
- A window of size W seconds slides across the time range in steps of W/2 (50% overlap)
- Within each window, SENTINEL counts the actual number of data points and compares to the expected count (W / expected_interval)
- Windows where actual < 85% of expected are flagged as sparse
- Overlapping flagged windows are merged into contiguous sparse periods
The window size W defaults to max(300, expected_interval × 60) — at least 5 minutes, or 60 sample-periods, whichever is larger. This ensures the window is wide enough to contain a statistically meaningful number of expected points.
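The sliding-window scan can be sketched as below. Only full windows are scanned here, which is an assumption about edge handling; the function name `sparse_periods` is hypothetical, and timestamps are assumed to be sorted and in seconds.

```python
from bisect import bisect_left


def sparse_periods(timestamps, expected_interval, threshold=0.85):
    """Return merged (start_s, end_s) periods whose density is below threshold."""
    window = max(300.0, expected_interval * 60)  # default W from the text
    step = window / 2                            # 50% overlap
    expected = window / expected_interval
    flagged = []
    start = timestamps[0]
    while start + window <= timestamps[-1]:      # full windows only (assumption)
        # Count readings in [start, start + window) using binary search
        count = bisect_left(timestamps, start + window) - bisect_left(timestamps, start)
        if count < threshold * expected:
            flagged.append((start, start + window))
        start += step
    merged = []
    for s, e in flagged:
        if merged and s <= merged[-1][1]:        # overlaps the previous period
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# 1 Hz for 30 minutes, with every other reading lost between 600 s and 900 s
ts = [t for t in range(1800) if not (600 <= t < 900 and t % 2 == 1)]
print(sparse_periods(ts, expected_interval=1.0))  # [(450.0, 1050.0)]
```

In this example no single interval exceeds two seconds, yet the overlapping windows around the degraded stretch fall to 50% to 75% of the expected count and are merged into one sparse period.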
What sparse periods reveal
A sparse period is shown with a completeness percentage indicating how much data was present in that window. For example:
- 62% complete in 09:00–10:00 means only 62% of the expected hourly readings arrived in that window
- 34% complete in 14:30–15:00 indicates a severe localised outage
Sparse periods appear in the completeness panel as a warning badge and are listed in the Technical Details section of the report. They are also visible as low bars in the coverage timeline chart.
Coverage Over Time Chart
The coverage chart provides an immediate visual answer to: when was the data missing?
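The time range is divided into up to 60 equal-width buckets (fewer for short datasets, ensuring each bucket contains enough points to be meaningful). For each bucket:
\[ \text{coverage} = \frac{\text{readings present in the bucket}}{\text{readings expected in the bucket}} \times 100\% \]
where the expected count follows from the bucket width and the expected interval (or the median interval when no expected interval was detected).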
Bars are colour-coded:
| Colour | Coverage | Interpretation |
|---|---|---|
| Green | ≥ 85% | Good coverage — close to or at full density |
| Orange | 50 – 84% | Degraded coverage — significant data loss |
| Red | < 50% | Severe coverage — more than half the expected data is absent |
The dashed green reference line marks the 85% target — the same threshold used by sparse period detection.
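A sketch of how per-bucket coverage and the colour bands could be computed. The expected count per bucket (bucket width divided by the expected interval) and the cap at 100% are assumptions, the rule for choosing fewer buckets on short datasets is not modelled, and `coverage_buckets` is a hypothetical name.

```python
def coverage_buckets(timestamps, expected_interval, n_buckets=60):
    """Return a list of (coverage_pct, colour) tuples, one per bucket."""
    t0, t_end = timestamps[0], timestamps[-1]
    width = (t_end - t0) / n_buckets       # requires at least two distinct timestamps
    expected = width / expected_interval   # assumed expected count per bucket
    counts = [0] * n_buckets
    for t in timestamps:
        idx = min(int((t - t0) / width), n_buckets - 1)  # last point joins the last bucket
        counts[idx] += 1
    result = []
    for count in counts:
        pct = min(100.0, 100.0 * count / expected)       # cap is an assumption
        colour = "green" if pct >= 85 else "orange" if pct >= 50 else "red"
        result.append((pct, colour))
    return result


# 1 Hz for ten minutes with the readings from 300 s to 399 s missing
ts = [t for t in range(600) if not 300 <= t < 400]
print([colour for _, colour in coverage_buckets(ts, 1.0, n_buckets=6)])
# ['green', 'green', 'green', 'red', 'green', 'green']
```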
Reading the chart:
- A uniformly green chart indicates consistent data collection throughout the recording period
- A red cluster at a specific time indicates an outage during that window — investigate what was happening then
- A gradual orange-to-red fade at the edges can indicate that the sensor was powering up or down
- Alternating green/orange bars may indicate a sensor duty cycle (intentional sleep mode)
Quality Levels
SENTINEL combines all completeness metrics into an overall quality rating:
Excellent
- CV < 0.05 (very consistent sampling)
- Zero gaps detected
- Regular sampling pattern confirmed
- No sparse periods detected
The data is temporally complete and ready for analysis without any preprocessing.
Good
Either:
- The global statistics are excellent but at least one localised sparse window was detected (worst window ≥ 70% complete), or
- CV < 0.15, gap rate < 5%, and worst sparse window ≥ 50% complete
Minor issues exist but the data is suitable for most analytical methods.
Fair
- CV < 0.30
- Gap rate < 15%
- Worst sparse window ≥ 30% complete
Noticeable completeness issues. Some analyses will be unreliable without gap-filling or resampling.
Poor
Everything else — high CV, many gaps, or severe sparse periods. Significant preprocessing is required.
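The four quality levels can be expressed as a cascade of checks. This is a sketch of the rules as written above, not SENTINEL's code: the definition of "gap rate" is not given in this section, so it is passed in as a percentage, and the function and parameter names are hypothetical.

```python
from typing import Optional


def quality_level(cv: float, gap_count: int, gap_rate_pct: float,
                  regular: bool, worst_sparse_pct: Optional[float]) -> str:
    """worst_sparse_pct is None when no sparse periods were detected."""
    worst = 100.0 if worst_sparse_pct is None else worst_sparse_pct
    globally_excellent = cv < 0.05 and gap_count == 0 and regular
    if globally_excellent and worst_sparse_pct is None:
        return "Excellent"
    if globally_excellent and worst >= 70:
        return "Good"  # excellent globally, but with a localised sparse window
    if cv < 0.15 and gap_rate_pct < 5 and worst >= 50:
        return "Good"
    if cv < 0.30 and gap_rate_pct < 15 and worst >= 30:
        return "Fair"
    return "Poor"


print(quality_level(0.01, 0, 0.0, True, None))     # Excellent
print(quality_level(0.20, 10, 10.0, False, 40.0))  # Fair
```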
Gap Threshold Slider
The interactive slider in the report lets you adjust the classification boundary between "normal" and "warning" gaps, and between "warning" and "critical" gaps. This is useful for applying domain knowledge:
- T1 (Warning) — gaps longer than this are flagged as requiring attention
- T2 (Critical) — gaps longer than this are flagged as operationally significant failures
For example, for a 1 Hz vibration sensor:
- T1 = 5 s (more than 5 missing readings is worth investigating)
- T2 = 60 s (more than a minute offline is a serious problem)
For a 15-minute weather station:
- T1 = 20 min (one missed reading is borderline)
- T2 = 60 min (four consecutive missed readings is a failure)
The slider operates on a log scale so that fine-grained control is possible at short intervals and coarse control at long intervals. The counts update in real time using pre-computed histogram bins; exact counts are recalculated server-side after a brief debounce delay.
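The two ideas behind the slider, a log-scale mapping and a three-way classification by T1 and T2, can be sketched as follows. The slider range and both function names are hypothetical; the histogram-bin shortcut used for real-time updates is not modelled.

```python
def slider_to_seconds(position, min_s, max_s):
    """Map a slider position in [0, 1] to a threshold in seconds on a log scale."""
    return min_s * (max_s / min_s) ** position


def severity_counts(gap_durations_s, t1, t2):
    """Count gaps that are normal (<= T1), warning (> T1) or critical (> T2)."""
    counts = {"normal": 0, "warning": 0, "critical": 0}
    for duration in gap_durations_s:
        if duration > t2:
            counts["critical"] += 1
        elif duration > t1:
            counts["warning"] += 1
        else:
            counts["normal"] += 1
    return counts


# 1 Hz vibration sensor with T1 = 5 s and T2 = 60 s
print(severity_counts([2, 4, 8, 30, 90, 600], t1=5, t2=60))
# {'normal': 2, 'warning': 2, 'critical': 2}
```

With a log mapping, the midpoint of a slider spanning 1 s to 10,000 s lands near 100 s, so half of the travel is devoted to thresholds below about two minutes.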
Recommendations
The report generates a prioritised list of recommendations based on what was found. The logic follows this order of concern:
- Extended outage — flagged first because it requires manual investigation before any processing
- Systematic outages — cluster-pattern gaps that need separate treatment per cluster
- Sparse periods — localised density loss with specific time windows identified
- Excellent data — no further action needed
- Gap severity — specific guidance based on whether gaps are minor, moderate, or severe
- Sampling pattern — resampling recommendations for irregular or multimodal data
- High CV — general consistency warnings
Technical Reference
Default configuration
| Parameter | Default | Description |
|---|---|---|
| Gap detection method | IQR | Tukey's fence: \(Q_3 + 1.5 \times \text{IQR}\) |
| IQR multiplier | 1.5 | Standard Tukey's fence coefficient |
| Cluster window multiplier | \(2\times\) | A gap is clustered if another gap exists within \(2 \times\) median \(\Delta t\) |
| Skewness threshold | 1.0 | \(\Delta t\) skewness \(> 1.0\) → long-tailed distribution classification |
| Multimodal sensitivity | 0.1 | Peak prominence threshold for multimodal detection |
| Sparse window threshold | 85% | Completeness ratio below which a window is flagged as sparse |
| Extended outage multiplier | \(50\times\) | Any gap \(> 50 \times\) median \(\Delta t\) triggers an extended outage classification |
| Coverage chart buckets | 60 | Maximum time buckets in the coverage timeline chart |
How the analysis works
When you upload a time series or click Re-run Analysis, SENTINEL performs the following steps in sequence:
- Interval extraction — Computes the time gap between every pair of consecutive readings.
- Statistical profiling — Calculates central tendency, spread, and distribution shape across all intervals.
- Gap threshold determination — Sets the boundary between normal timing variation and a true gap, adapting automatically to your data.
- Gap detection and classification — Identifies intervals that exceed the threshold and labels each as isolated or clustered.
- Sparse period scan — Sweeps a sliding window across the timeline to catch stretches of reduced density in which each individual interval falls below the single-gap threshold.
- Missing pattern classification — Assigns an overall character to the data loss (random dropouts, irregular coverage, systematic outages, or extended outage).
- Overall completeness calculation — Produces the headline coverage percentage.
- Coverage chart generation — Divides the timeline into equal-width buckets and calculates per-bucket completeness for the bar chart.
Stored results and re-running analysis
Analysis results are saved automatically after each run. This means reloading the dashboard does not trigger a new analysis — your results are recalled instantly. Use Re-run Analysis to recalculate with the latest analysis engine whenever you want fresh results.
If you see "N/A" for the completeness percentage or a "not available" notice where the coverage chart should appear, the stored result was generated by an older version of SENTINEL. Click Re-run Analysis to update it. All other report sections — the gap list, interval distribution chart, sparse periods, and recommendations — remain fully functional in the meantime.