Modules - words

Introduction

The words module provides access to the WordNet lexical database from Princeton University. The database contains huge amounts of information about English words, particularly their part of speech (noun, verb, adverb, or adjective). This information can be very useful when building Natural Language Processing systems in Plang.

Note: the words module is optional in Plang and is only available if WordNet was present on the system when Plang was built. Many GNU/Linux distributions already provide WordNet packages. Try installing wordnet and wordnet-devel (or wordnet-dev) to see whether your distribution has them. If not, download and build WordNet from the sources published by Princeton University. The "devel" package must be installed to build Plang with WordNet support.

The WordNet library and database are distributed under a BSD-style license. The Plang words module that wraps the WordNet library is distributed under the GNU Lesser General Public License, Version 3 (LGPLv3).

The only language supported by WordNet is English. The database is primarily American English, but it also contains British English spellings (e.g. "colour", "organise", etc). Other languages have different parts of speech and sentence structure, so they will require separate add-on modules to handle them in Plang.

Word testing

To use the Plang module, first import the words module into your application:

 :- import(words).

The names of the predicates in the module are prefixed with "words::" such as words::adjective/1 and words::noun/1. The part of speech testing predicates take a single argument and succeed or fail depending upon whether the word has a specific part of speech:

 :- import(words).
 :- import(stdout).

 dump_word(Word)
 {
     stdout::write(Word);
     stdout::write(":");
     if (words::adjective(Word))
         stdout::write(" adjective");
     if (words::adverb(Word))
         stdout::write(" adverb");
     if (words::noun(Word))
         stdout::write(" noun");
     if (words::verb(Word))
         stdout::write(" verb");
     stdout::writeln();
 }

The predicates in the words module can also be used in definite clause grammar (DCG) rules to help parse sentences in English:

 sentence --> noun_phrase, verb_phrase.
 noun_phrase --> det, words::noun.
 verb_phrase --> words::verb, noun_phrase.
 det --> ["the"].
 det --> ["a"].
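As a rough sketch of how the grammar above might be invoked: DCG rules are conventionally translated into predicates that take two extra arguments, the list of words to parse and the remainder left over once the rule has matched. Assuming that convention applies here, a whole-sentence check could be written as follows. The name is_sentence/1 is hypothetical and is not part of the words module.

 % Hypothetical helper: succeeds if Words parses as a complete
 % sentence, leaving no unparsed words behind.
 is_sentence(Words)
 {
     sentence(Words, []);
 }

Under these assumptions, a call such as is_sentence(["the", "dog", "chased", "a", "cat"]) would be expected to succeed, provided that WordNet lists "dog" and "cat" as nouns and resolves "chased" to a verb through its base form.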

Multi-word entries

The WordNet database contains a large number of multi-word entries such as "bobby_fischer", "hand_out", "short_and_sweet", etc. These are also recognized by the part of speech testing predicates. Either a space or an underscore can be used as the word separator. It is possible to modify a set of DCG rules to recognize multi-word forms in a regular sentence as shown in the following example:

 noun_phrase --> det, noun.

 noun([Word|Out], Out)
 {
     words::noun(Word);
 }
 noun([Word1,Word2|Out], Out)
 {
     Word is Word1 + "_" + Word2;
     words::noun(Word);
 }
 noun([Word1,Word2,Word3|Out], Out)
 {
     Word is Word1 + "_" + Word2 + "_" + Word3;
     words::noun(Word);
 }
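The same pattern can be applied to other parts of speech. The following sketch mirrors the noun clauses above for one-word and two-word verbs such as "hand_out". The verb/2 clauses are an illustrative assumption modelled on the example above and are not provided by the words module.

 verb_phrase --> verb, noun_phrase.

 % Single-word verb, e.g. "eat".
 verb([Word|Out], Out)
 {
     words::verb(Word);
 }
 % Two-word verb, e.g. "hand" followed by "out".
 verb([Word1,Word2|Out], Out)
 {
     Word is Word1 + "_" + Word2;
     words::verb(Word);
 }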

Advanced queries

WordNet has a large number of queries that can be performed on words. Most produce a list of related words that are in some relationship with the word being queried. In Plang, advanced queries on the WordNet database can be performed with words::search/5, words::overview/2, and words::description/5. For example, the following code queries for the parts of the noun "hand" according to all senses of the word:

 words::search("hand", noun, haspartptr, allsenses, Result);

The Result will be a list that includes words like "finger", "palm", etc. An overview of the word for a single part of speech, similar to a dictionary entry, can be obtained with words::description/5:

 words::description("hand", noun, overview, allsenses, Description);

A dictionary-like entry that lists all parts of speech and senses of a word can be retrieved with words::overview/2:

 words::overview("hand", Description);

The permitted query types are based on the names given to them by WordNet:

 antptr, hyperptr, hypoptr, entailptr, simptr, ismemberptr,
 isstuffptr, ispartptr, hasmemberptr, hasstuffptr, haspartptr,
 meronym, holonym, causeto, pplptr, seealsoptr, pertptr,
 attribute, verbgroup, derivation, classification, class, syns,
 freq, frames, coords, relatives, hmeronym, hholonym, wngrep,
 overview, classif_category, classif_usage, classif_regional,
 class_category, class_usage, class_regional, instance, instances

Some familiarity with WordNet's terminology will be required to correctly format an advanced query on the database.
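Frequently used queries can be wrapped in small helper predicates so that the query type and sense only have to be written once. The sketch below assumes nothing beyond words::search/5 and the query types listed above. The names parts_of/2 and antonyms_of/3 are hypothetical.

 :- import(words).

 % Parts of a noun across all of its senses (haspartptr query).
 parts_of(Noun, Parts)
 {
     words::search(Noun, noun, haspartptr, allsenses, Parts);
 }

 % Antonyms of a word for a given part of speech (antptr query).
 antonyms_of(Word, PartOfSpeech, Antonyms)
 {
     words::search(Word, PartOfSpeech, antptr, allsenses, Antonyms);
 }

Both helpers fail when the underlying search finds no matching words, so callers should be prepared for failure as well as for a result list.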

Base forms of words

Many words in the English language are variations on other words with the addition of a suffix. For example, "eating" is a suffixed form of "eat". The WordNet database stores information on the base forms but not the suffixed forms. The word testing predicates will check for base forms, but the advanced queries will not.

Use the words::base_forms/3 predicate to explicitly fetch a list of all the base forms of a word with respect to a specific part of speech:

 words::base_forms("eating", verb, BaseForms)

The words::base_forms/3 predicate will fail if the word does not have a base form. That is, it will succeed for "eating", but not for "eat". The words::base_form/3 predicate will return the first base form, or the word itself if the word does not have a base form:

 words::base_form("eating", verb, Base1)     Base1 will be set to "eat"
 words::base_form("eat", verb, Base2)        Base2 will be set to "eat"
 words::base_form("bobby", verb, Base3)      fails - "bobby" is not a verb
 words::base_form("bobby", noun, Base4)      Base4 will be set to "bobby"
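Because the advanced queries do not check base forms themselves, one possible approach is to normalize a word with words::base_form/3 before searching. The following is a minimal sketch of that idea. The name search_any_form/4 is hypothetical and is not part of the words module.

 :- import(words).

 % Reduce Word to its first base form (or keep it unchanged if it
 % is already a base form), then run the query over all senses.
 % Fails if Word is not valid for PartOfSpeech or if the search
 % finds nothing.
 search_any_form(Word, PartOfSpeech, Query, Result)
 {
     words::base_form(Word, PartOfSpeech, Base);
     words::search(Base, PartOfSpeech, Query, allsenses, Result);
 }

For example, search_any_form("eating", verb, hyperptr, Result) would run the hyperptr query against "eat" rather than "eating".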

words::adjective/1, words::adverb/1, words::base_form/3, words::base_forms/3, words::description/5, words::noun/1, words::overview/2, words::search/5, words::verb/1


words::adjective/1 - tests a word to determine if it is an adjective.

Usage
words::adjective(Word)
Description
If Word is an atom or string whose name or the name of one of its base forms is registered in the WordNet database as an adjective, then succeed. Fail otherwise. The Word will be converted to lower case, with spaces replaced with underscores, before testing.
There are also arity-2 and arity-3 versions of this predicate in the module that can be used in definite clause grammar rules to recognize adjectives:
 noun_phrase --> det, words::adjective, words::noun.
 noun_phrase(np(D,adj(A),n(N))) --> det(D), words::adjective(A), words::noun(N).
In the second example above, A will be unified with the adjective to assist with building a parse tree for the sentence.
Examples
 words::adjective(animated)       succeeds
 words::adjective("Animated")     succeeds
 words::adjective(harpsichord)    fails
 words::adjective(X)              fails
 words::adjective(15)             fails
See Also
words::adverb/1, words::base_form/3, words::noun/1, words::verb/1

words::adverb/1 - tests a word to determine if it is an adverb.

Usage
words::adverb(Word)
Description
If Word is an atom or string whose name or the name of one of its base forms is registered in the WordNet database as an adverb, then succeed. Fail otherwise. The Word will be converted to lower case, with spaces replaced with underscores, before testing.
There are also arity-2 and arity-3 versions of this predicate in the module that can be used in definite clause grammar rules to recognize adverbs:
 verb_phrase --> words::adverb, words::verb, noun_phrase.
 verb_phrase(vp(adv(A),v(V),NP)) --> words::adverb(A), words::verb(V), noun_phrase(NP).
In the second example above, A will be unified with the adverb to assist with building a parse tree for the sentence.
Examples
 words::adverb(fitfully)          succeeds
 words::adverb("Fitfully")        succeeds
 words::adverb(harpsichord)       fails
 words::adverb(X)                 fails
 words::adverb(15)                fails
See Also
words::adjective/1, words::base_form/3, words::noun/1, words::verb/1

words::base_form/3 - fetches the base form of a word or the word itself if there is no base form.

Usage
words::base_form(Word, PartOfSpeech, BaseForm)
Description
Word is an atom or string that is used to query the WordNet database for the base forms within the specified PartOfSpeech. For example, the verb base form of "eating" is "eat".
PartOfSpeech should be one of the atoms adjective, adverb, noun, or verb, indicating the part of speech to search for.
BaseForm is unified with a string representing the first base form of Word. If Word does not have any base forms, but it is a valid word according to PartOfSpeech, then BaseForm is unified with Word. Fails if Word is not a valid word according to PartOfSpeech.
Errors
Examples
 words::base_form("eating", verb, Result)     succeeds
 words::base_form("eating", verb, "eat")      succeeds
 words::base_form("eat", verb, "eat")         succeeds
 words::base_form(eat, verb, eat)             succeeds
 words::base_form(bobby, verb, Result)        fails
See Also
words::base_forms/3, words::description/5, words::overview/2, words::search/5

words::base_forms/3 - fetches the base forms of a word.

Usage
words::base_forms(Word, PartOfSpeech, Result)
Description
Word is an atom or string that is used to query the WordNet database for the base forms within the specified PartOfSpeech. For example, the verb base form of "eating" is "eat".
PartOfSpeech should be one of the atoms adjective, adverb, noun, or verb, indicating the part of speech to search for.
Result is unified with a list of base form strings. The predicate fails if Word does not have any base forms or the list does not unify with Result.
Errors
Examples
 words::base_forms("eating", verb, Result)
 words::base_forms("eating", verb, ["eat"])
See Also
words::base_form/3, words::description/5, words::overview/2, words::search/5

words::description/5 - fetches the description of a word.

Usage
words::description(Word, PartOfSpeech, Query, Sense, Result)
Description
Word is an atom or string that is used to query the WordNet database for a descriptive text about the word. The Word will be converted to lower case, with spaces replaced with underscores, before searching.
PartOfSpeech should be one of the atoms adjective, adverb, noun, or verb, indicating the part of speech to search for.
Query should be an atom representing a valid WordNet query type.
Sense should be an integer greater than or equal to 1 that indicates which sense of the word should be queried for. If Sense is the atom allsenses, then all senses will be queried.
Result is unified with the description, represented as a string. If a description that matches Word, PartOfSpeech, Query, and Sense cannot be found in the WordNet database, then words::description/5 fails.
Errors
Examples
 words::description("hand", noun, overview, allsenses, Description);
 stdout::writeln(Description);

 The noun hand has 14 senses (first 8 from tagged texts)

 1. (215) hand, manus, mitt, paw -- (the (prehensile) extremity of
 the superior limb; "he had the hands of a surgeon"; "he extended
 his mitt")
 2. (5) hired hand, hand, hired man -- (a hired laborer on a farm
 or ranch; "the hired hand fixed the railing"; "a ranch hand")
 ...
See Also
words::base_forms/3, words::overview/2, words::search/5

words::noun/1 - tests a word to determine if it is a noun.

Usage
words::noun(Word)
Description
If Word is an atom or string whose name or the name of one of its base forms is registered in the WordNet database as a noun, then succeed. Fail otherwise. The Word will be converted to lower case, with spaces replaced with underscores, before testing.
There are also arity-2 and arity-3 versions of this predicate in the module that can be used in definite clause grammar rules to recognize nouns:
 noun_phrase --> det, words::adjective, words::noun.
 noun_phrase(np(D,adj(A),n(N))) --> det(D), words::adjective(A), words::noun(N).
In the second example above, N will be unified with the noun to assist with building a parse tree for the sentence.
Examples
 words::noun(harpsichord)         succeeds
 words::noun("Harpsichord")       succeeds
 words::noun("Bobby Fischer")     succeeds
 words::noun(fitfully)            fails
 words::noun(X)                   fails
 words::noun(15)                  fails
See Also
words::adjective/1, words::adverb/1, words::base_form/3, words::verb/1

words::overview/2 - fetches an overview description of the word.

Usage
words::overview(Word, Result)
Description
Word is an atom or string that is used to query the WordNet database for a descriptive text about the word. The Word will be converted to lower case, with spaces replaced with underscores, before searching. Result is unified with the description, represented as a string.
This predicate is a wrapper around words::description/5 that produces a human-readable dictionary-like entry in Result for all parts of speech and senses that Word is a member of.
Errors
Examples
 words::overview("hand", Description);
 stdout::writeln(Description);

 The noun hand has 14 senses (first 8 from tagged texts)

 1. (215) hand, manus, mitt, paw -- (the (prehensile) extremity of
 the superior limb; "he had the hands of a surgeon"; "he extended
 his mitt")
 2. (5) hired hand, hand, hired man -- (a hired laborer on a farm
 or ranch; "the hired hand fixed the railing"; "a ranch hand")
 ...
 The verb hand has 2 senses (first 1 from tagged texts)

 1. (25) pass, hand, reach, pass on, turn over, give -- (place into
 the hands or custody of; "hand me the spoon, please"; "Turn the
 files over to me, please"; "He turned over the prisoner to
 his lawyers")
 2. hand -- (guide or conduct or usher somewhere; "hand the
 elderly lady into the taxi")
See Also
words::base_forms/3, words::description/5, words::search/5

words::search/5 - searches the database for other words related to a search word.

Usage
words::search(Word, PartOfSpeech, Query, Sense, Result)
Description
Word is an atom or string that is used to query the WordNet database for other words that are related to it. The Word will be converted to lower case, with spaces replaced with underscores, before searching.
PartOfSpeech should be one of the atoms adjective, adverb, noun, or verb, indicating the part of speech to search for.
Query should be an atom representing a valid WordNet query type. Query can also be the special atom synset which fetches the members of the WordNet synonym set for Sense that contains Word.
Sense should be an integer greater than or equal to 1 that indicates which sense of the word should be queried for. If Sense is the atom allsenses, then all senses will be queried.
Result is unified with a list of strings, corresponding to the other words that are related to Word due to Query. If there are no words that match the query, then words::search/5 fails.
The Word itself may appear in the Result list, and each member of Result will be unique (no duplicates).
Errors
Examples
 words::search("walk", verb, antptr, allsenses, List);
 stdout::writeln(List);

 words::search("hand", noun, synset, 1, List2);
 stdout::writeln(List2);
See Also
words::base_forms/3, words::description/5, words::overview/2

words::verb/1 - tests a word to determine if it is a verb.

Usage
words::verb(Word)
Description
If Word is an atom or string whose name or the name of one of its base forms is registered in the WordNet database as a verb, then succeed. Fail otherwise. The Word will be converted to lower case, with spaces replaced with underscores, before testing.
There are also arity-2 and arity-3 versions of this predicate in the module that can be used in definite clause grammar rules to recognize verbs:
 verb_phrase --> words::adverb, words::verb, noun_phrase.
 verb_phrase(vp(adv(A),v(V),NP)) --> words::adverb(A), words::verb(V), noun_phrase(NP).
In the second example above, V will be unified with the verb to assist with building a parse tree for the sentence.
Examples
 words::verb(annoy)               succeeds
 words::verb("Annoy")             succeeds
 words::verb("hand_out")          succeeds
 words::verb(harpsichord)         fails
 words::verb(X)                   fails
 words::verb(15)                  fails
See Also
words::adjective/1, words::adverb/1, words::base_form/3, words::noun/1

Generated on 26 May 2011 for plang by doxygen 1.6.1