HRAF API 1 Examples

3D PCA

Finds terms that are closely related to the query and reduces their complex numerical descriptions to the three most salient features. These features are plotted in 3D space to illustrate both the groupings of terms and the strength of the relationships between them. Using an indexer improves the speed of the computation, but lowers its accuracy.
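
The reduction step can be sketched in plain Python as follows. This is a minimal illustration, not the API's implementation: the 4-dimensional vectors are made up, the function name project_to_3d is hypothetical, and PCA is computed here by power iteration with deflation, which may differ from whatever routine the service actually uses. The search for related terms is assumed to have already happened.

```python
import math

# Made-up 4-dimensional vectors standing in for real word embeddings.
VECTORS = {
    "mother":  [2.0, 1.0, 0.2, 0.1],
    "father":  [2.0, -1.0, 0.2, 0.0],
    "sibling": [2.2, 0.0, -0.5, 0.1],
    "harvest": [-2.0, 1.0, 0.3, 0.0],
    "plough":  [-2.0, -1.0, 0.1, 0.1],
    "granary": [-2.2, 0.0, -0.3, 0.0],
}


def project_to_3d(vectors, iterations=200):
    """Return {term: [pc1, pc2, pc3]} (hypothetical helper, PCA by power iteration)."""
    terms = list(vectors)
    n = len(terms)
    dim = len(vectors[terms[0]])

    # Center the data: subtract the mean of every dimension.
    means = [sum(vectors[t][j] for t in terms) / n for j in range(dim)]
    centered = [[vectors[t][j] - means[j] for j in range(dim)] for t in terms]

    # Sample covariance matrix of the centered vectors.
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(dim)] for i in range(dim)]

    components = []
    for _ in range(3):
        # Power iteration converges to the dominant eigenvector of cov.
        v = [1.0 / math.sqrt(dim)] * dim
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(dim) for j in range(dim))
        components.append(v)
        # Deflate: remove the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]

    # Project every centered vector onto the three components.
    return {t: [sum(row[j] * c[j] for j in range(dim)) for c in components]
            for t, row in zip(terms, centered)}


coords = project_to_3d(VECTORS)
```

In this toy data the first coordinate separates the kinship terms from the agricultural terms, which is the kind of grouping a 3D plot would make visible.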

Average of Similar Words

Finds a list of terms most similar to the positive query, if provided, and most dissimilar to the negative query, if provided. Utilizing the results of this first search, a second search identifies terms similar not just to the initial query word(s), but to the entire set of terms in the first result, extending the exploration to include additional, abstract, or indirectly related terms. Using an indexer improves the speed of the computation, but lowers its accuracy.
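
The two-stage search might look roughly like the sketch below. Everything in it is an assumption made for illustration: the 2-dimensional vectors are invented, the function names are hypothetical, and the second search here uses the plain average (centroid) of the first result's vectors and excludes terms already returned, which may not match the API's exact behavior.

```python
import math

# Invented 2-dimensional vectors; real embeddings have many more dimensions.
VECTORS = {
    "marriage":    [1.0, 0.0],
    "wedding":     [0.98, 0.2],
    "bride":       [0.95, 0.3],
    "dowry":       [0.8, 0.6],
    "inheritance": [0.6, 0.8],
    "land":        [0.3, 0.95],
    "warfare":     [-1.0, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(target, exclude, topn):
    """Rank all terms not in `exclude` by cosine similarity to `target`."""
    scored = [(cosine(target, vec), term)
              for term, vec in VECTORS.items() if term not in exclude]
    return [term for _, term in sorted(scored, reverse=True)[:topn]]


def average_of_similar(positive=(), negative=(), topn=3):
    """First search on the query, second search on the average of the first result."""
    dim = len(next(iter(VECTORS.values())))
    target = [0.0] * dim
    for term in positive:   # positive terms pull the target toward them
        target = [t + v for t, v in zip(target, VECTORS[term])]
    for term in negative:   # negative terms push the target away
        target = [t - v for t, v in zip(target, VECTORS[term])]

    query_terms = set(positive) | set(negative)
    first = most_similar(target, query_terms, topn)

    # Average the vectors of the first result and search again,
    # skipping terms that were already returned.
    centroid = [sum(VECTORS[t][j] for t in first) / len(first) for j in range(dim)]
    second = most_similar(centroid, query_terms | set(first), topn)
    return first, second


first, second = average_of_similar(positive=["marriage"], topn=3)
```

With these toy vectors the first search returns terms close to "marriage", while the second search reaches terms such as "inheritance" that are nearer to the group as a whole than to the query alone.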

Hypergraph

Constructs a hypergraph by first generating an initial list of terms most similar to the query. Each term from this initial list is then used to produce additional lists of similar terms, with overlapping terms being used to connect the groups within the hypergraph. Raising the threshold ensures a more closely related set of terms, while lowering the threshold broadens the diversity within the groups. Using an indexer improves the speed of the computation, but lowers its accuracy.
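
A rough sketch of the construction, under stated assumptions: the terms are placed at invented angles on a unit circle so that similarities are easy to check by hand, the threshold is treated as a minimum cosine similarity, and each group (hyperedge) includes its own seed term. The function names are hypothetical and not part of the API.

```python
import math
from itertools import combinations

# Invented layout: each term sits at an angle (degrees) on the unit circle,
# so the cosine similarity of two terms is the cosine of their angular gap.
ANGLES = {"ritual": 0, "ceremony": 10, "offering": 20, "sacrifice": 30,
          "feast": 40, "harvest": 60, "planting": 75, "warfare": 170}
VECTORS = {t: (math.cos(math.radians(a)), math.sin(math.radians(a)))
           for t, a in ANGLES.items()}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(a[0], a[1]) * math.hypot(b[0], b[1]))


def similar(term, threshold):
    """All terms at least `threshold` similar to `term` (including itself)."""
    return {other for other, vec in VECTORS.items()
            if cosine(VECTORS[term], vec) >= threshold}


def build_hypergraph(query, threshold):
    """Return (hyperedges, links): one group per seed, plus overlaps between groups."""
    seeds = sorted(similar(query, threshold) - {query})
    hyperedges = {seed: similar(seed, threshold) for seed in seeds}
    links = {}
    for a, b in combinations(seeds, 2):
        shared = hyperedges[a] & hyperedges[b]
        if shared:  # overlapping terms connect the two groups
            links[(a, b)] = shared
    return hyperedges, links


loose_edges, loose_links = build_hypergraph("ritual", threshold=0.9)
tight_edges, tight_links = build_hypergraph("ritual", threshold=0.97)
```

Comparing the two calls shows the effect of the threshold: at 0.9 the query yields two overlapping groups, while at 0.97 only one small, tightly related group remains.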

Wordsnetwork

Builds a network (graph) of related terms in three iterations. The first iteration finds a list of terms similar to the query. The second uses each term from this initial list to produce additional lists of similar terms. The third generates lists of similar terms for each term in those secondary lists. Relationships are drawn as red lines for the first iteration, green lines for the second, and blue lines for the third. Raising the threshold ensures a more closely related set of terms, while lowering the threshold broadens the diversity of the depicted concepts. Using an indexer improves the speed of the computation, but lowers its accuracy.
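
The three-iteration expansion could be sketched like this. The similarity table is invented and stands in for a real nearest-neighbor lookup, build_network is a hypothetical name, and skipping an edge that has already been drawn is a design choice of this sketch rather than documented behavior.

```python
# Invented lookup table: term -> list of (similar term, similarity score).
SIMILAR = {
    "canoe":  [("paddle", 0.82), ("boat", 0.78), ("fishing", 0.55)],
    "paddle": [("canoe", 0.82), ("oar", 0.74)],
    "boat":   [("canoe", 0.78), ("sail", 0.71), ("harbor", 0.52)],
    "oar":    [("paddle", 0.74), ("rowing", 0.69)],
    "sail":   [("boat", 0.71), ("wind", 0.63)],
}

# One color per iteration, as described in the text.
COLORS = ["red", "green", "blue"]


def build_network(query, threshold):
    """Return a list of (source, target, color) edges from three expansion passes."""
    edges = []
    drawn = set()          # undirected edges already in the network
    frontier = [query]     # terms to expand in the current iteration
    for color in COLORS:
        next_frontier = []
        for source in frontier:
            for target, score in SIMILAR.get(source, []):
                key = frozenset((source, target))
                if score < threshold or key in drawn:
                    continue
                drawn.add(key)
                edges.append((source, target, color))
                next_frontier.append(target)
        frontier = next_frontier
    return edges


edges = build_network("canoe", threshold=0.6)
```

Lowering the threshold to 0.5 admits weaker links such as "fishing" and "harbor", while raising it to 0.75 leaves only the two strongest first-iteration edges.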

Ethnoword PCA

Finds relationships between a query and its most closely related terms, identifying terms that have the most and least influence on these relationships to produce tables of positive and negative correlation. Using an indexer improves the speed of the computation, but lowers its accuracy.

OCM Suggester

Suggests OCM subject identifiers for user-submitted text based on the similarity of the vocabulary to the eHRAF World Cultures corpus, ranking the proposed subjects by the probability that they are associated with the words of the user-submitted text.
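
To make the idea of ranking subjects by probability concrete, here is a small sketch that uses a naive Bayes model with add-one smoothing. The source does not say which model the service uses, so naive Bayes is swapped in purely for illustration; the word counts are invented, the three OCM codes are only sample keys, and suggest is a hypothetical function name.

```python
import math
from collections import Counter

# Invented word counts per OCM code; a real model would be trained on the corpus.
WORD_COUNTS = {
    "226": Counter({"fish": 40, "net": 25, "canoe": 20, "river": 15}),
    "241": Counter({"field": 35, "plough": 25, "seed": 25, "river": 5}),
    "585": Counter({"bride": 40, "groom": 30, "feast": 20, "gift": 10}),
}


def suggest(text, alpha=1.0):
    """Rank OCM codes by posterior probability given the words in `text`.

    Assumes a uniform prior over codes and ignores words outside the vocabulary.
    """
    vocab = set().union(*WORD_COUNTS.values())
    words = [w for w in text.lower().split() if w in vocab]

    log_scores = {}
    for code, counts in WORD_COUNTS.items():
        total = sum(counts.values()) + alpha * len(vocab)
        # Sum of log-probabilities of each word under this code (smoothed).
        log_scores[code] = sum(math.log((counts[w] + alpha) / total) for w in words)

    # Normalize in a numerically stable way so the probabilities sum to 1.
    peak = max(log_scores.values())
    weights = {code: math.exp(score - peak) for code, score in log_scores.items()}
    norm = sum(weights.values())
    return sorted(((code, w / norm) for code, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)


ranked = suggest("The bride gave a gift at the feast")
```

The result is a list of (code, probability) pairs in descending order, so the first entry is the strongest suggestion for the submitted text.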

Subtopics

Suggests subtopics for user-submitted text based on similarity to the vocabulary associated with a specific OCM identifier, providing two examples per subtopic identified. Examples include both a list of salient terms for the topic and matching paragraphs from the corpus. Where possible, specific names and numbers within lists of salient terms have been replaced with generic (named-entity recognition) labels.

Combination

Identifies dimensions of a query by first generating an initial list of terms most similar to the query. Each term from this initial list is then used to produce an additional list of similar terms: a dimension. Combinations of the dimensions are explored by subtracting the dimensions from one another to reveal contrasting features. Raising the threshold ensures a more closely related set of terms, while lowering the threshold broadens the diversity within the dimensions. The 'Top Number' parameter sets a maximum number of terms per dimension. Using an indexer improves the speed of the computation, but lowers its accuracy.
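
One possible reading of this procedure is sketched below. It interprets "subtracting" as a set difference between the term lists of two dimensions, which is an assumption; the service may instead subtract vectors. The similarity table is invented, and the function names and the top_number argument are hypothetical stand-ins for the parameters described above.

```python
from itertools import permutations

# Invented lookup table: term -> list of (similar term, similarity score).
SIMILAR = {
    "chief":  [("leader", 0.85), ("elder", 0.80), ("priest", 0.62)],
    "leader": [("chief", 0.85), ("authority", 0.81), ("council", 0.77), ("elder", 0.70)],
    "elder":  [("chief", 0.80), ("ancestor", 0.79), ("council", 0.75), ("kin", 0.66)],
}


def top_terms(term, threshold, top_number):
    """Terms at least `threshold` similar to `term`, best first, capped at `top_number`."""
    ranked = sorted(SIMILAR.get(term, []), key=lambda pair: pair[1], reverse=True)
    return [t for t, score in ranked if score >= threshold][:top_number]


def combine_dimensions(query, threshold, top_number):
    """Return (dimensions, contrasts) for a query.

    Each dimension is the set of terms similar to one seed term; each contrast
    (a, b) holds the terms found in dimension a but not in dimension b.
    """
    seeds = top_terms(query, threshold, top_number)
    dimensions = {seed: set(top_terms(seed, threshold, top_number)) for seed in seeds}
    contrasts = {(a, b): dimensions[a] - dimensions[b]
                 for a, b in permutations(seeds, 2)}
    return dimensions, contrasts


dimensions, contrasts = combine_dimensions("chief", threshold=0.7, top_number=3)
```

In this toy example the "leader" and "elder" dimensions share "chief" and "council", so the contrasts isolate "authority" on one side and "ancestor" on the other.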