Making the most of your metadata

The importance of metadata in an analytics system cannot be overstated. It gives context to the details of customer behavior and quality measurements, and it makes it possible to create relevant aggregates in the system, to understand the measurement data in different contexts.

One of the important metadata categories in the system is the asset name-related metadata for the consumed video assets. This data includes the live/VoD specification, channel name, VoD asset name, and service name to which the asset belongs to.

For example, live/CNN/free channels, or VoD/Citizen Kane/nPVR.

Having correct asset-related metadata makes it possible to understand service and content popularity per provider, as well as the quality of experience per service or channel, and therefore identify troublesome VoD-assets.

Make sure the metadata is correct

Agama integration certification process usually assures that the correct metadata is provided by the integration, but there are several reasons a system may be populated with incorrect information.

An old end-of-life device with a partial integration can’t be updated on backend change, or correct information is not available to the application when making the integration – to mention a few. For this reason, the Agama system provides an advanced mechanism for making the most out of the available metadata from the device integration and from external sources.

Asset name washing

The basic principle for the washing mechanism is the following:

Use URI to categorize a session
Optionally pass-through sessions that have the metadata specified, if information from the device is the preferred source of information
Extract asset name and service name specification from URI
Optionally override live/VoD flag
Map channel names to URIs from external inventory sources such as channel name inventory

The picture below illustrates the steps that the system takes when metadata washing is configured.

Hands-on configuration

To get an understanding of how configuration is done, here’s an example. The yaml file snippets are a part of the data-transformation.yaml file, which is a part of the RTN-service configuration and must be identical on all nodes.

In this system, we conclude that integrations report incorrect asset-names and live/VoD-specification. Therefore, we chose to discard the information set by the devices and create new metadata based on the session URI.

First, we specify that we want to create new metadata disregarding the integration.

prefer-asset-id-from-uri: true

After this, we proceed to specify the rules for creating the needed metadata.

The rules

The first rule in priority order aims to match all live sessions and override asset-name and live/VoD metadata coming from the client.

higher priority is executed first, while the enabled flag enables the rule
match part of the rule matches on session URI to decide if the rule should be applied
asset part extracts a part of the URI for later mapping with the external file as this rule is for live channels
the type part, if present, overrides live/VoD specification that comes from the client
the server section extracts server name/ip for server-based aggregates

The next rule in priority order aims to match all VOD sessions and based on the URI call them PLTV-<assetid> or TVOD-<assetid>.

most of the specification works in exactly the same way as in the previous rule
asset replace specification is new. It extracts groups from the URI and creates an asset name based on the substrings in the URI

The final rule, executed at the end, aims to catch any unexpected URIs that didn’t match any previous rule. We don’t expect to end up here, but it is good protection for not accidentally creating tens of thousands of aggregates by mistake.

Mapping the file

After applying regex rules, the system will optionally try to map results from regex output with a list of mappings provided in a csv-file.

Configuration for enabling this feature:

This configuration allows you to specify the mapping file, presence of a header in file and column names to use for matching as well as the result column. Being able to specify columns allows you to use csv files that have more columns than the required ones and sometimes spares the administrator the work of modifying a file that is already available from an internal inventory.

To sum up

In conclusion, there are several topics to be considered in order to clean up the asset-related metadata and get consistent and correct information in the system:

Have correct channel names available in the system
Maintain correct playlist type specification
Users can differentiate between PLTV and TVOD assets in the system
Backend changes will not risk creating excessive amounts of data in the system thanks to a “catch all”-rule.

This is all that is needed to have consistent aggregates in your system and easily identify problematic VoD assets.

About Aner Gusic

Aner is a Senior Engineer and Product Expert, focusing on customer success, as well as on product development projects, in order to deliver professional solutions for video analytics and monitoring. Aner has more than 15 years of experience in IT and telecom industry, working in both technical and leadership positions.