ArcSight & Security · Track 03

How ArcSight data volume metrics inflate a finding

Not every ArcSight license is metered on events per second. Several are metered on data volume, usually gigabytes per day, and volume metrics inflate a finding in three ways: double counting, confusion between raw and indexed bytes, and non production data that should never have been on the meter.

Where ArcSight Logger and certain ingestion tiers are licensed on GB per day, the audit compares a measured daily volume to the entitled volume. That sounds simple, and the simplicity is the problem. A single event can be counted at multiple points in the pipeline, raw bytes can be confused with stored bytes, and lab or test traffic can be folded into the production figure. Each of these moves the volume number upward, and each is contestable.

Where the volume number comes from

OpenText measures ArcSight data volume by sampling the pipeline and extrapolating to a daily figure. The compliance team prepares an entitlement and support review, then runs the measurement, and the resulting GB per day number anchors the finding. Because the remedy for a shortfall is the deemed acquisition of licenses at then current list price, plus back maintenance and the cost of the audit, every extra gigabyte on the meter carries a stacked cost. The leverage is sharper than it looks: only the volume above the entitlement is priced, so a modest overstatement of the total becomes a proportionally much larger overstatement of the shortfall, and the finding built on that shortfall is overstated several times over.
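The leverage on the shortfall can be sketched in a few lines of Python. This is a minimal illustration, not a pricing model: the list price per GB/day, the maintenance rate, the back maintenance period, and the audit cost are all hypothetical placeholders, and the sketch assumes the audit cost is charged only when a shortfall exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingAssumptions:
    list_price_per_gb_day: float    # hypothetical list price per GB/day of capacity
    annual_maintenance_rate: float  # hypothetical maintenance as a fraction of list
    back_maintenance_years: float   # hypothetical back maintenance period claimed
    audit_cost: float               # hypothetical audit cost passed to the buyer


def shortfall_gb_day(measured: float, entitled: float) -> float:
    """Only the volume above the entitlement is priced."""
    return max(0.0, measured - entitled)


def finding(measured: float, entitled: float, p: PricingAssumptions) -> float:
    gap = shortfall_gb_day(measured, entitled)
    if gap == 0.0:
        return 0.0  # assumption: no shortfall, no deemed acquisition, no audit charge
    licenses = gap * p.list_price_per_gb_day
    back_maintenance = licenses * p.annual_maintenance_rate * p.back_maintenance_years
    return licenses + back_maintenance + p.audit_cost


p = PricingAssumptions(10_000.0, 0.2, 2.0, 50_000.0)
entitled, true_volume, measured_volume = 100.0, 105.0, 115.0
print(shortfall_gb_day(true_volume, entitled), shortfall_gb_day(measured_volume, entitled))
print(finding(true_volume, entitled, p), finding(measured_volume, entitled, p))
```

With these placeholder inputs, a 10 GB/day measurement error, under 10 percent of the deployment's true volume, triples the priced gap from 5 to 15 GB/day before any price is applied.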

The three ways volume inflates

Double counting across the pipeline

A modern ArcSight deployment moves data through collectors, connectors, an event broker or transformation hub, and one or more storage destinations. If the measurement counts the same event as it passes more than one of these stages, the daily volume is inflated by the architecture itself. The defensible figure is the volume of distinct data ingested, not the sum of every byte that crossed every hop.

Raw versus indexed and compressed storage

Data volume can be expressed as raw input bytes, as parsed and normalized bytes, or as compressed stored bytes, and these differ substantially. A finding that prices raw input against an entitlement defined in stored terms, or the reverse, is comparing two different quantities. We establish which unit the contract describes and hold the measurement to that unit.

Non production and lab data on the meter

Test environments, proof of concept clusters, and short lived investigative pipelines often run on the same software, and their traffic can be swept into the production volume figure. Non production use is a recognized line of challenge, and volume generated by environments that are not in production service should not be priced as if it were.

The mechanic

One event traversing a collector, an event broker, and two storage tiers can be counted four times. Add raw bytes measured against a stored entitlement and a lab cluster on the same meter, and a compliant deployment can report a large overage.
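The compounding in that mechanic is plain multiplication. The sketch below assumes, for simplicity, that every counted stage records the same raw bytes and that raw data is five times the size of the stored data; the stage names, the ratio, and the volumes are hypothetical, and a real pipeline should be measured rather than modeled this way.

```python
# Hypothetical counting points, matching the four stages in the mechanic above.
STAGES_COUNTED = ["collector", "event_broker", "storage_tier_1", "storage_tier_2"]


def reported_gb_day(prod_stored: float, lab_stored: float, raw_per_stored: float) -> float:
    """Meter reading if every stage counts raw bytes and the lab shares the meter."""
    raw_per_stage = (prod_stored + lab_stored) * raw_per_stored
    return raw_per_stage * len(STAGES_COUNTED)


def defensible_gb_day(prod_stored: float) -> float:
    """Distinct production data, counted once, in the stored unit of the contract."""
    return prod_stored


entitled = 100.0                    # hypothetical stored GB/day entitlement
prod, lab, ratio = 60.0, 10.0, 5.0  # hypothetical volumes and raw-to-stored ratio
print(reported_gb_day(prod, lab, ratio), defensible_gb_day(prod), entitled)
```

Under those assumptions, 60 GB/day of production data against a 100 GB/day stored entitlement is reported as 1,400 GB/day. The specific number matters less than the shape: the three errors multiply rather than add.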

How we take the volume number apart

The defense mirrors our work on the EPS line. We reconstruct the data flow independently, map every counting point, and rebuild the daily volume from distinct ingested data measured in the contractual unit. We then strip non production traffic, correct any double counting, and reconcile the result against the entitlement. In practice the corrected figure is materially lower than the figure the vendor script produces, because the script is built to capture the largest defensible reading rather than the most accurate one.

This is the same discipline that took our banking ArcSight engagement, case file E-03, from a $6.0M finding to a $1.8M settlement. The headline there was EPS and connectors, but volume questions follow the identical pattern: identify the unit, isolate what should not be on the meter, and reprice to the defensible number.

Protecting the volume evidence

The evidence that wins a volume dispute is your own pipeline telemetry, and it is most useful when collected on your terms. During the seven day notice window, route all vendor requests and data through a single controlled channel, and do not let a measurement script run unsupervised across environments you have not scoped. If you are unsure whether a number reflects raw or stored data, treat it as unproven and ask the vendor to identify the unit before you concede anything.

Why volume metrics are harder to reconstruct than EPS

Events per second can be read from a single throughput counter, but data volume is distributed across the whole pipeline, which is what makes it easy to overstate and hard to defend without preparation. A daily volume figure is the sum of what every stage handled, and unless someone maps the flow, there is no way to tell whether a gigabyte was counted once or four times. That ambiguity favors the vendor, because the measurement script reports a single large number and leaves the buyer to prove it is wrong. The reconstruction we build flips that burden: it shows the flow explicitly, identifies every counting point, and demonstrates how much of the reported volume is duplication rather than distinct ingested data.

The compression question compounds the difficulty. ArcSight stores data in far less space than it occupies on arrival, sometimes by a large factor, so a figure expressed in stored terms and a figure expressed in raw terms can differ by an order of magnitude. A finding that quietly mixes the two, pricing raw input against a stored entitlement, can look enormous while describing a deployment that is fully compliant. We resolve this by fixing the unit against the contract first and refusing to let the measurement float between definitions.

Building the corrected daily volume

Our reconstruction proceeds in a defined order so the result is defensible rather than merely lower.

Map the pipeline. We document every stage data passes through, from collector to connector to event broker to each storage destination, and mark where the measurement sampled.
Identify the contractual unit. We read the entitlement to establish whether volume is raw, normalized, or stored, and convert all readings to that single unit.
Remove duplication. We reduce the figure to distinct ingested data, eliminating bytes counted more than once as a single event crossed multiple stages.
Strip non production. We separate lab, staging, and proof of concept traffic from production volume and remove it where the metric is scoped to production use.
Reconcile and reprice. We compare the corrected daily volume to the entitlement and reprice the finding to the gap that actually exists, if any remains.
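The five steps can be expressed as a short Python sketch. It is a hedged illustration under stated assumptions, not a description of any specific tool: the `Reading` record, the `batch_id` that links the same data across hops, the stage names in `PIPELINE`, the environment labels, and the raw to stored factor in `RAW_EQUIVALENT` are all hypothetical, and a real reconstruction draws them from the customer's own pipeline telemetry.

```python
from dataclasses import dataclass

# Hypothetical stage names, in the order data flows through the pipeline.
PIPELINE = ["collector", "connector", "event_broker", "storage_a", "storage_b"]

# Hypothetical raw-equivalent size of one GB in each unit (1 stored GB ~ 5 raw GB).
RAW_EQUIVALENT = {"raw": 1.0, "stored": 5.0}


@dataclass(frozen=True)
class Reading:
    batch_id: str     # hypothetical ID linking the same data as seen at each hop
    stage: str        # step 1: where in PIPELINE this reading was sampled
    environment: str  # "production", "lab", "staging", "poc" (hypothetical labels)
    volume_gb: float  # measured volume, expressed in `unit`
    unit: str         # "raw" or "stored"


def to_contract_unit(r: Reading, contract_unit: str) -> float:
    """Step 2: express a reading in the single unit the contract defines."""
    return r.volume_gb * RAW_EQUIVALENT[r.unit] / RAW_EQUIVALENT[contract_unit]


def corrected_daily_volume(readings: list[Reading], contract_unit: str) -> float:
    # Step 1: walk readings in pipeline order so each batch is first met at its entry point.
    ordered = sorted(readings, key=lambda r: PIPELINE.index(r.stage))
    # Step 2: convert every reading to the contractual unit.
    converted = [(r, to_contract_unit(r, contract_unit)) for r in ordered]
    # Step 3: keep one reading per batch (the earliest stage), dropping re-counts.
    distinct: dict[str, tuple[Reading, float]] = {}
    for r, gb in converted:
        distinct.setdefault(r.batch_id, (r, gb))
    # Step 4: strip lab, staging, and proof of concept traffic.
    return sum((gb for r, gb in distinct.values() if r.environment == "production"), 0.0)


def shortfall(corrected: float, entitled: float) -> float:
    """Step 5: only volume above the entitlement is a gap; zero means compliant."""
    return max(0.0, corrected - entitled)


readings = [
    Reading("b1", "collector", "production", 5.0, "raw"),
    Reading("b1", "event_broker", "production", 5.0, "raw"),
    Reading("b1", "storage_a", "production", 1.0, "stored"),
    Reading("b1", "storage_b", "production", 1.0, "stored"),
    Reading("b2", "collector", "lab", 2.0, "raw"),
    Reading("b2", "storage_a", "lab", 0.4, "stored"),
]
naive = sum(r.volume_gb for r in readings)  # every hop, mixed units, lab included
corrected = corrected_daily_volume(readings, "stored")
print(naive, corrected, shortfall(corrected, entitled=10.0))
```

In this toy data the naive total, about 14.4, mixes units, counts batch b1 four times, and includes the lab batch; the corrected figure keeps only b1, measured once, in stored terms. Keeping the earliest stage reading for each batch is one reasonable deduplication rule among several; what matters is that each batch is counted once, after every reading has been expressed in the contractual unit.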

By the end of that sequence the reported volume and the defensible volume are usually far apart, and the difference is not a negotiation concession. It is the correction of a measurement that counted more than the contract permits.

Have an ArcSight finding on the table?

Volume findings are reducible because the meter almost always counts more than the contract allows it to. We reconstruct the effective license position before any vendor script runs, then challenge the finding line by line. To put a defense team between you and the vendor, open a case or download the ArcSight EPS defense briefing.



If you have received an OpenText or Micro Focus audit notice, the first seven days shape every week that follows. OpenText Audit Defense is an independent, buyer side practice founded in 2020 by former vendor compliance leadership. We have defended more than 200 audits, cut the average finding by 68 percent, and mitigated more than $90M in claims against vendor positions. We do not resell OpenText software and we are not affiliated with OpenText Corporation. To open a case, use the contact form on this site.