Features

Features are measurable attributes shared by artifacts or assemblages. Seriation assumes that a feature's measure changes gradually over time. If you have chosen a seriation Technique other than Custom, you will not be able to modify some of the feature parameters.

On the Features dialog you may add, edit and delete features.

Each feature has the following parameters:

Index - an integer value assigned to a feature that allows you to sort features in a list.

Feature - a name that uniquely identifies the feature. Feature names are restricted to 50 characters. The feature name is required for each feature. The following names are reserved for use by OptiPath and cannot be used as feature names: Index, Name, Assemblage, Type, Description, Earliest, Latest, Exclude, Order, Date, Distance.

Description - an optional description that can be entered for each feature. There is no limit to the length of a description.

Data - allows the user to specify the format of the data. The options are Measured, Ranked and Classed. If you have chosen a seriation Technique other than Custom, you will not be able to modify the Data parameter.

Ranks - indicates the limit on the number of ranks or classes allowed for ranked or classed data (see Data above). If Ranks is equal to zero, there is no limit on the number of ranks. For Measured data Ranks must be 0. For Classed data Ranks must be 0 or 1. For Ranked data Ranks may be any non-negative integer. Setting Ranks equal to 1 implies binary data - an artifact either possesses or does not possess a feature - all non-blank entries are considered to indicate the presence of the feature, while all blank entries indicate the absence. For more information see Ranks. If you have chosen a seriation Technique other than Custom, you will not be able to modify the Ranks parameter.
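The binary reading of Ranks equal to 1 can be illustrated with a short Python sketch. It is only an illustration and not OptiPath's own code: the function name to_presence is hypothetical, a blank entry is assumed to be stored as None or as an empty or whitespace-only string, and the Blanks and Zeroes parameters described below are ignored.

```python
def to_presence(entries):
    """Map raw entries to presence flags: non-blank means present.

    Assumption: a blank is None or an empty/whitespace-only string.
    """
    return [entry is not None and str(entry).strip() != "" for entry in entries]


# Hypothetical column of raw entries for one feature.
column = ["x", "", None, 0, "  ", "3.5"]
print(to_presence(column))  # [True, False, False, True, False, True]
```

Note that in this sketch a zero counts as a non-blank entry and is therefore treated as present.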

Metric - allows the user to specify the metric (or distance function) used to compute distances (in "feature space") between artifacts or assemblages. The options are Euclidean distance, Manhattan distance and Hamming distance. Euclidean distance is the ordinary straight-line distance of everyday life: the square root of the sum of the squared differences in each feature. Manhattan distance is the sum of the absolute differences computed for each feature separately. While Euclidean distance gives you the shortest distance between two points "as the crow flies", Manhattan distance is like walking along city blocks in New York: the distance walked is the sum of the distances walked along streets and avenues separately, with no option of cutting diagonally through city blocks. Hamming distance, like Manhattan distance, is a sum of distances computed feature by feature, but the distance for each feature is restricted to zero or one. The feature distance between two artifacts (or assemblages) is zero if the two have the same feature value and one otherwise, regardless of how different the values are. The effect of Hamming distance is simply to count the number of features in which two artifacts (or assemblages) differ. If you have chosen a seriation Technique other than Custom, you will not be able to modify the Metric parameter.
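The three metrics can be compared with a minimal Python sketch. The function names and sample values are illustrative assumptions, and the sketch applies a single metric to every feature; how OptiPath combines features that use different metrics is not described here.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of the absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of features in which the two artifacts differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Two hypothetical artifacts described by three features each.
artifact_a = [3.0, 4.0, 1.0]
artifact_b = [0.0, 0.0, 1.0]

print(euclidean(artifact_a, artifact_b))  # 5.0
print(manhattan(artifact_a, artifact_b))  # 7.0
print(hamming(artifact_a, artifact_b))    # 2
```

The two artifacts differ in two of three features, so the Hamming distance is 2 no matter how large the individual differences are.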

Normalize - allows the user to specify that the values (and hence the distances) for this feature should be normalized. This is done by expressing values in units of standard deviation. If all features are normalized, then each contributes equally to the total distance between artifacts (or assemblages). Otherwise, a feature whose values range from zero to a thousand would generally contribute much more to distance than a feature that ranges from zero to ten. Normalization is particularly useful if different features are expressed in very different units (for example, weight and length, or tons and ounces) or if different distance functions (Euclidean, Manhattan or Hamming) are used for different features, particularly for features using Euclidean or Manhattan distance when another feature uses Hamming distance. The intention of normalizing data is to make the features as comparable as possible in their contribution to the total distance between artifacts. Normalizing features that use Hamming distance (Differences) is probably not a good idea. If you have chosen a seriation Technique other than Custom, you will not be able to modify the Normalize parameter.
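The effect of normalization can be sketched in Python as follows. This assumes a conventional z-score (subtract the mean, divide by the population standard deviation); the exact formula OptiPath uses is not stated here, and the function name and sample values are hypothetical.

```python
import statistics


def normalize(values):
    """Express each value in standard deviations from the mean (z-score).

    Assumption: population standard deviation; a constant feature maps to zeros.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Two hypothetical features on very different scales.
weights_g = [0.0, 500.0, 1000.0]
lengths_cm = [0.0, 5.0, 10.0]

print(normalize(weights_g))
print(normalize(lengths_cm))
```

Both lists come out as roughly -1.22, 0 and 1.22, so after normalization the two features contribute on the same scale even though the raw values differ by a factor of one hundred.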

Weight - indicates a relative weight to be given to this feature. Only the ratios between weights matter: weights of 1, 2 and 3 on three features are equivalent to weights of 3, 6 and 9. For weights to be considered, the seriation parameter Weights must be selected (this allows you to turn weights on and off without having to edit the weight for each feature). If the seriation parameter Weights is not selected, a weight of 1 is used for each feature rather than the values set in the feature parameters.
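One way to see why weights are relative is to scale them so that they sum to one before use, as in the Python sketch below. This is an assumption made for illustration, not a description of OptiPath's internals; the function names, the use_weights flag (standing in for the seriation parameter Weights) and the choice of Manhattan distance are all hypothetical.

```python
def effective_weights(weights, use_weights=True):
    """Return weights scaled to sum to 1; all equal when weighting is off."""
    if not use_weights:
        weights = [1.0] * len(weights)
    total = sum(weights)
    return [w / total for w in weights]


def weighted_distance(a, b, weights, use_weights=True):
    """Weighted sum of absolute per-feature differences (Manhattan-style)."""
    shares = effective_weights(weights, use_weights)
    return sum(s * abs(x - y) for s, x, y in zip(shares, a, b))


# Two hypothetical artifacts with three features each.
a = [1.0, 0.0, 4.0]
b = [0.0, 2.0, 1.0]

print(weighted_distance(a, b, [1, 2, 3]))
print(weighted_distance(a, b, [3, 6, 9]))                     # same as above
print(weighted_distance(a, b, [1, 2, 3], use_weights=False))  # every weight is 1
```

The first two calls give the same distance because 1, 2, 3 and 3, 6, 9 describe the same proportions; the third ignores the feature weights entirely.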

Transition - a penalty applied to this feature each time the feature changes from absent to present, or vice versa, in a seriation. Many practitioners believe that a stylistic feature is likely to appear only once in the archaeological record: once it is extinguished it is unlikely to reappear. The implication is that a seriation in which a feature is present for a number of consecutive artifacts, then absent for a number of subsequent artifacts, and then present again for even later artifacts is less realistic than a seriation in which the artifacts having the feature are not interrupted by artifacts that lack it. The transition penalty is a means of enforcing this. The larger the penalty, the less likely OptiPath is to create a seriation with interrupted occurrences of a feature. If you have chosen a seriation Technique other than Custom, you will not be able to modify the Transition parameter.
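The idea behind the transition penalty can be sketched in Python by counting the changes between present and absent along a candidate ordering. The function name and the sample orderings are hypothetical, and the sketch does not cover how OptiPath adds the penalty to the distance or how the Earlier and Later parameters treat the ends of the sequence.

```python
def transition_cost(presence, penalty):
    """Penalty times the number of present/absent changes along an ordering."""
    transitions = sum(
        1 for prev, cur in zip(presence, presence[1:]) if prev != cur
    )
    return transitions * penalty


# Presence of one feature along two hypothetical orderings of five artifacts.
contiguous = [False, True, True, True, False]
interrupted = [False, True, False, True, False]

print(transition_cost(contiguous, 2.0))   # 4.0 (two transitions)
print(transition_cost(interrupted, 2.0))  # 8.0 (four transitions)
```

The interrupted ordering costs twice as much, so a larger penalty pushes the search toward orderings in which each feature occurs in a single unbroken run.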

Earlier - indicates how this feature should be considered for artifacts earlier than the earliest artifact in the data set. The options are Absent, Zero and Unknown. For earlier artifacts the feature could be considered absent, or its value could be assumed to be zero or unknown. Each assumption can lead to different results in seriation. For more information see Earlier and Setting the Earlier, Later, Blanks and Zeroes Parameters. If you have chosen a seriation Technique other than Custom, you will not be able to modify the Earlier parameter.

Later - indicates how this feature should be considered for artifacts later than the latest artifact in the data set. The options are Unknown, Absent and Zero. For later artifacts the feature could be considered absent, or its value could be assumed to be zero or unknown. Each assumption can lead to different results in seriation. For more information see Later and Setting the Earlier, Later, Blanks and Zeroes Parameters. If you have chosen a seriation Technique other than Custom, you will not be able to modify the Later parameter.

Blanks - indicates how blank values in the data set should be considered for this feature. The options are Unknown, Absent and Zero. Blanks could be considered to indicate the feature is absent or that it has a value of zero or unknown. Each assumption can lead to different results in seriation. For more information see Blanks and Setting the Earlier, Later, Blanks and Zeroes Parameters. If you have chosen a seriation Technique other than Custom, you will not be able to modify the Blanks parameter.

Zeroes - indicates how zero values in the data set should be considered for this feature. The options are Value, Absent, Value & Absent and Unknown. A zero could be considered to indicate the value of the measure of a feature, or it could indicate the feature is absent, or that it both has a value of zero and is absent, or it is unknown. Each assumption can lead to different results in seriation. For more information see Zeroes and Setting the Earlier, Later, Blanks and Zeroes Parameters. If you have chosen a seriation Technique other than Custom, you will not be able to modify the Zeroes parameter.

Exclude - allows you to exclude a feature temporarily from your seriation. The alternative is to delete the feature, in which case all information and data related to the feature will be permanently lost to this seriation. In contrast, Exclude allows you to restore the feature and its data by clearing the Exclude field.