Optimal Path Seriation

Optimal path is a seriation technique used in OptiPath and described in a paper by Brett Shepardson and Fred Shepardson which is being prepared for publication. The OptiPath software includes additional traditional seriation techniques such as occurrence seriation, nominal seriation, discrete seriation and frequency seriation. All of these take advantage of OptiPath's heuristic optimization algorithms. There is also an option to build your own custom seriation technique using OptiPath's algorithms. We actually recommend the custom approach (if you don't want to use the optimal path technique). The seriation technique can be set in the Seriations table. 

The object of seriation is to determine an ordering of a number of objects or "artifacts" that replicates as closely as possible their historical order of creation. As with all seriation techniques, Optimal path seriation relies on the assumption that the artifacts to be seriated share characteristics or features whose measures evolve gradually over time. The ordering of artifacts that produces the most gradual evolution of all features for all artifacts is considered the optimal seriation.

OptiPath minimizes the weighted sum of the squared rates of change from one item to the next in a seriation. The weights are the durations of the rates. The model is:
 

This problem is a variant of the Hamiltonian path problem with time windows. There are no known algorithms that will solve large problems optimally in a reasonable amount of time. OptiPath uses a heuristic technique that produces good answers with moderate effort.

If there are no earliest or latest dates assigned to individual artifacts, this model simplifies to one of minimizing the total distance of the path determined by the seriation.

This problem can be solved as a Hamiltonian path problem. Although this problem is easier than the problem with time windows, there are still no known algorithms that will solve large problems optimally in a reasonable amount of time. OptiPath uses a heuristic technique that produces good answers with moderate effort, less than that required for the problem with time windows.

For optimal path seriation, OptiPath uses the Optimal Path Seriation model with measured data and Euclidean distance. We choose Euclidean distance (and normalized data) because this metric penalizes large distances disproportionately more than small distances. Using Euclidean distance, the distance between two artifacts which differ by one unit in each of two features (the square root of two) is less than the distance between two artifacts which differ by two units in only one feature (two); whereas they would both be equal (two) using Manhattan distance or Hamming distance. The justification for this is that it will tend to discourage abrupt changes in any one feature from one artifact to another, preferring two features to change gradually than one feature to change abruptly. This is in keeping with the underlying assumption of seriation that change is smooth and gradual. If you prefer to use Manhattan distance you should set the seriation parameter Technique to Custom and reset the feature parameter Metric to Manhattan.

The following are the default feature parameter settings for optimal path seriation. To change them you must first select the Custom Seriation technique. With Custom seriation OptiPath allows you to simultaneously treat some features as occurrences/absences and others as classifications or as ranked or measured data - see Data in the Features table.

Data = Measured
Ranks = 0
Metric = Euclidean
Normalize = Yes
Transition = 5
Earlier = Unknown
Later = Unknown
Blanks = Unknown
Zeroes = Value

These settings result in any numerical entry in the Data being interpreted as indicating the presence of a feature (class or style). If you do not want zeroes to be interpreted as values, you should set the seriation parameter Technique to Custom and reset the feature parameter Zeroes. If you do not want blanks to be interpreted as unknown, you should set the seriation parameter Technique to Custom and reset the feature parameter Blanks.  To see the effect of your settings, look at processed data in the Values table.

In optimal path seriation, careful thought should be put into setting the feature parameters appropriately - see Setting the Earlier, Later, Blanks, Zeroes and Transition Parameters.