

This topic discusses some of the factors that can affect the performance of your indexing process. Because datasets have unique characteristics, you should experiment with different settings to find the optimal design for your application.

Configure your stream layer retention period to be greater than the "aggregation.window-seconds" value

Ensuring that your stream layer retention period is long enough is an important design consideration for your pipeline because it determines the level of fault tolerance you build into your workflow. Your archiving pipeline streams data continuously, batches data in memory based on the indexing attribute values and the aggregation.window-seconds value (the desired time slice, in seconds), and then archives the batched data periodically. The time needed to archive your data varies with the configuration of your pipeline.

The recommendation is to always set the stream layer retention higher than the aggregation.window-seconds value. For example, if your aggregation.window-seconds value is 1800 (30 minutes), you should configure your stream layer retention to at least 120 minutes. This configuration ensures fault tolerance if your pipeline experiences a brief failure. If your stream layer retention period is less than or equal to the aggregation.window-seconds value, data loss could occur.

The most important design consideration is the selection of indexing attributes when you create an index layer. Note that these indexing attributes cannot be modified once an index layer is created. One way to think about indexing attributes is to consider the characteristics by which you want to query your indexed data.

For example, consider the following use case. You plan to index vehicle sensor data and you are interested in understanding different events occurring in different geographic locations at different times. In this use case, you would query your indexed data on multiple characteristics such as event type, geolocation, and timestamp. Therefore, you would design the index layer with the following indexing attributes:

name: tileId, type: heretile (with desired zoom level).
name: eventTime, type: timewindow (with desired time slice duration).

Note that you must always include a timewindow attribute. The maximum number of additional attributes is three.

Limit the Number of Attributes and Attribute Values

The small files problem is well known in the big data domain: when data is broken down into a large number of small or very small files, processing them becomes very inefficient. In HERE platform indexing, the Data Archiving Library can cause this problem through excessive partitioning. In an index layer, the more attributes and attribute values there are, the more likely it is that the small files problem will manifest itself. The size of the files in an index layer is inversely proportional to the number of partitions produced by the Data Archiving Library.
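To make the inverse relationship between partition count and file size concrete, the following Python sketch gives a rough back-of-the-envelope estimate. It assumes, purely for illustration, that every combination of indexing attribute values within one time window produces one file and that data is spread evenly across those files. The function name and the sample numbers are hypothetical and are not part of the Data Archiving Library.

```python
from math import prod

def estimate_avg_file_size(bytes_per_window, distinct_values_per_attribute):
    """Rough average file size (in bytes) for a single time window.

    Assumption: one file per combination of attribute values, with data
    spread evenly. Real datasets are usually skewed, so treat the result
    as an order-of-magnitude estimate only.
    """
    partitions = prod(distinct_values_per_attribute)  # prod([]) == 1
    return bytes_per_window / partitions

GIB = 1024 ** 3

# Hypothetical numbers: 1 GiB of data per time window.
one_attr = estimate_avg_file_size(GIB, [256])       # e.g. 256 distinct tiles
two_attrs = estimate_avg_file_size(GIB, [256, 64])  # plus 64 values of a second attribute

print(f"{one_attr / 1024 ** 2:.2f} MiB")  # 4.00 MiB
print(f"{two_attrs / 1024:.2f} KiB")      # 64.00 KiB
```

Under these assumptions, adding one more attribute with 64 distinct values shrinks the average file from about 4 MiB to about 64 KiB, which is the kind of drop that leads to the small files problem.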

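The stream layer retention guideline can be checked before a pipeline is deployed. The sketch below is a minimal illustration that assumes the retention is expressed in minutes and the aggregation window in seconds. The function name is hypothetical, and the check only encodes the rule stated in this topic: retention must be strictly greater than the aggregation.window-seconds value.

```python
def retention_is_safe(retention_minutes, window_seconds):
    """Return True only if the stream layer retention is strictly greater
    than the aggregation window; equal or smaller values risk data loss."""
    return retention_minutes * 60 > window_seconds

# The example from this topic: a 1800-second window with 120 minutes of retention.
print(retention_is_safe(120, 1800))  # True
# Retention equal to the window (30 minutes) is not enough.
print(retention_is_safe(30, 1800))   # False
```

How much headroom to add beyond the strict minimum depends on how long your pipeline might be down, so this check is a lower bound rather than a sizing rule.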