Introduction to Nettlesome
Nettlesome is a Python library that lets you use readable English phrases as semantic tags for your data. Once you’ve created tags, Nettlesome can automate the process of comparing the tags to determine whether they have the same meaning, whether one tag implies another, whether one tag contradicts another, or whether one tag is consistent with another.
This tutorial will show you how to use some of Nettlesome’s basic
classes: Predicate,
Statement, and Comparison.
“Predicates” as descriptors
A Predicate should contain an
English-language phrase in the past tense. The use of the past
tense reflects Nettlesome’s original use case of legal
analysis, which is usually backward-looking, determining the legal
effect of past acts or past conditions.
To create a useful Predicate, you might
remember from grade school that
in a simple sentence the “subject” and any “objects” are typically
nouns, while the “predicate” is a verb phrase that describes an action
or relationship of those nouns. In Nettlesome you can consider marking the
subject and object nouns as “terms” that should be represented by placeholders
in the Predicate's template string. Generally,
concepts that are part of your data model are good choices to designate
as “terms”, while other nouns that aren’t relevant enough to be part
of your data model don’t need to be terms.
For instance, this example might be a good Predicate
for a data model about determining whether an individual is an owner or
employee of a company:
>>> from nettlesome import Predicate
>>> account_for_company = Predicate(content="$applicant opened a bank account for $company")
Because $applicant and $company are marked as placeholders for
Terms, you’ll be able to add more data about
those entities later. But
notice that the words “bank account” haven’t been replaced by a placeholder.
That means you won’t be able to link this Predicate
to a Term representing the bank account. That should be
okay as long as your data model
doesn’t need to include specific details about bank accounts.
Predicates also have a truth attribute that can be used to establish
that one Predicate contradicts another.
>>> no_account_for_company = Predicate(
>>> "$applicant opened a bank account for $company",
>>> truth=False)
>>> str(no_account_for_company)
'it was false that $applicant opened a bank account for $company'
The truth attribute will become more significant as we build up to
create more complex objects.
Formatting predicates
Placeholders for terms in a Predicate should
be marked up using
the placeholder syntax for a string.Template, which is a part of the
Python standard library.
The placeholders you choose always need to $start_with_a_dollar_sign
and they need to be valid Python identifiers, which means they can’t
contain spaces. If a placeholder needs to be adjacent to a
non-whitespace character, you also need to
${wrap_it_in_curly_braces}.
Don’t use capitalization or end punctuation to signal the beginning or
end of the phrase in a Predicate,
because the phrase may be used in a
context where it’s only part of a longer sentence.
The use of different placeholders doesn’t
cause Predicates to be considered to have
different meanings. The example below demonstrates this using
the means() method, which tests
whether two Nettlesome objects have the same meaning.
>>> account_for_partnership = Predicate(content="$applicant opened a bank account for $partnership")
>>> account_for_company.means(account_for_partnership)
True
If you need to mention the same term more than once
in a Predicate, use
the same placeholder for that term each time. If you later create a
Statement object using the
same Predicate, you will only include each
unique Term once, in the order they first appear.
In this example, a Predicate's template has
two placeholders referring to the identical Term.
Even though the rest of the text is the same, the
reuse of the same Term means that
the Predicate has a different meaning.
>>> account_for_self = Predicate(content="$applicant opened a bank account for $applicant")
>>> account_for_self.means(account_for_company)
False
Linking predicates to entities
Basically, a Statement is
a Predicate plus
the Terms that need to be included to
make the Statement a complete phrase.
>>> from nettlesome import Statement, Entity
>>> statement = Statement(
>>> predicate=account_for_company,
>>> terms=[Entity(name="Sarah"), Entity(name="Acme Corporation")])
>>> str(statement)
'the statement that <Sarah> opened a bank account for <Acme Corporation>'
An Entity is a Term
representing a person or thing. If you’re lucky
enough to be able to run effective Named Entity Recognition techniques
on your dataset, you may already have good candidates for the
Entity objects that should be included in
your Statements. The data
model of an Entity in Nettlesome includes
just a name attribute, an attribute indicating whether
the Entity should be considered
generic, and a plural attribute mainly used to determine whether
the word “was” after the Entity should be
replaced with “were”.
>>> not_at_school = Predicate(content="$group were at school", truth=False)
>>> plural_statement = Statement(not_at_school, terms=[Entity(name="the students", plural=True)])
>>> str(plural_statement)
'the statement it was false that <the students> were at school'
>>> singular_statement = Statement(not_at_school, terms=[Entity(name="Lee", plural=False)])
>>> str(singular_statement)
'the statement it was false that <Lee> was at school'
Generic Terms
The generic attribute is more subtle than the plural attribute.
An Entity should be marked as generic if
it’s really being used as a
stand-in for a broader category. For instance, in singular_statement
above, the fact that <Lee> is generic indicates that
the Statement
isn’t really about a specific incident when Lee was not at school.
Instead, it’s more about the concept of someone not being at school. In
Nettlesome, when angle brackets appear around the string representation
of an object, that’s an indication that the object is generic.
If two Statements have different
generic Entities but they’re otherwise the
same, they’re still considered to have the same meaning as one another.
That’s the case even if one of the Entities is
plural while the other is singular.
>>> plural_statement.means(singular_statement)
True
However, sometimes you need to label an Entity as being somehow sui
generis, so that Statements about that Entity aren’t really applicable
to other, generic Entities. In that case, you can set the Entity’s
generic attribute to False and it’ll no longer be found to have the
same meaning as generic Entities.
>>> harry_statement = Statement(not_at_school, terms=Entity(name="Harry Potter", generic=False))
>>> harry_statement.means(singular_statement)
False
By default, Entities are generic and Statements are not generic. Both of these defaults can be changed when you create instances of the respective classes.
Comparing quantitative statements
The Comparison class extends the concept
of a Predicate. A Comparison
still contains a truth value and a template string, but that template
should be used to identify a quantity that will be compared to an
expression using a sign such as an equal sign or a greater-than sign.
This expression must be a constant: either an integer, a floating point
number, or a physical Quantity expressed in units that can be parsed
using the pint library.
>>> from nettlesome import Comparison
>>> weight_in_pounds = Comparison(
>>> "the weight of ${driver}'s vehicle was",
>>> sign=">",
>>> expression="26000 pounds")
>>> pounds_statement = Statement(weight_in_pounds, terms=Entity(name="Alice"))
>>> str(pounds_statement)
"the statement that the weight of <Alice>'s vehicle was greater than 26000 pound"
Statements including Comparisons
will handle unit conversions when
applying operations like implies()
or contradicts().
>>> weight_in_kilos = Comparison(
>>> "the weight of ${driver}'s vehicle was",
>>> sign="<=",
>>> expression="3000 kilograms")
>>> kilos_statement = Statement(weight_in_kilos, terms=Entity(name="Alice"))
>>>> str(kilos_statement)
"the statement that the weight of <Alice>'s vehicle was no more than 3000 kilogram"
>>> pounds_statement.contradicts(kilos_statement)
True
Formatting comparisons
To encourage consistent phrasing, the template string in every
Comparison object must end with the word “was”.
If you phrase a Comparison with an inequality sign using
truth=False, Nettlesome will silently modify your statement so it
can have truth=True with a different sign. In this example, the
user’s input indicates that it’s false that the weight of marijuana
possessed by a defendant was an ounce or more. Nettlesome interprets
this to mean it’s true that the weight was less than one ounce.
>>> drug_comparison_with_upper_bound = Comparison(
>>> "the weight of marijuana that $defendant possessed was",
>>> sign=">=",
>>> expression="1 ounce",
>>> truth=False)
>>> str(drug_comparison_with_upper_bound)
'that the weight of marijuana that $defendant possessed was less than 1 ounce'
An expression can also be a Python datetime.date.
>>> license_date = Comparison(
>>> "the date $dentist became a licensed dentist was",
>>> sign="<",
>>> expression=date(1990, 1, 1))
>>> str(license_date)
'that the date $dentist became a licensed dentist was less than 1990-01-01'
When the number needed for a Comparison is neither
a date nor a physical quantity that
can be described with physical units like “pounds” or “meters”, you should
phrase the text in the template string to explain what the number
describes. The template string will still need to end with the word
“was”. The value of the expression parameter should be an integer or a
floating point number, not a string to be parsed.
>>> three_children = Comparison(
>>> "the number of children in ${taxpayer}'s household was",
>>> sign="=",
>>> expression=3)
>>> str(three_children)
"that the number of children in ${taxpayer}'s household was exactly equal to 3"
Comparing groups of statements
If you pass a list of Statements to
the FactorGroup constructor, you can then check to see whether
those Statements, taken as a group, implies another Statement or group of Statements.
Here, the use of placeholders that are identical except for a digit on
the end indicates to Nettlesome that the positions of the Entities in those places should
be considered interchangeable. (In this example, if site1 is a
certain distance away from site2, then site2 must also be the
same distance away from site1.)
>>> from nettlesome import FactorGroup
>>> more_than_100_yards = Comparison(
>>> "the distance between $site1 and $site2 was",
>>> sign=">",
>>> expression="100 yards")
>>> less_than_1_mile = Comparison(
>>> "the distance between $site1 and $site2 was",
>>> sign="<",
>>> expression="1 mile")
>>> protest_facts = FactorGroup(
>>> [Statement(
>>> more_than_100_yards,
>>> terms=[Entity(name="the political convention"), Entity(name="the police cordon")]),
>>> Statement(
>>> less_than_1_mile,
>>> terms=[Entity(name="the police cordon"), Entity(name="the political convention")])])
>>> str(protest_facts)
"FactorGroup(['the statement that the distance between <the political convention> and <the police cordon> was greater than 100 yard', 'the statement that the distance between <the police cordon> and <the political convention> was less than 1 mile'])"
>>> more_than_50_meters = Comparison(
>>> "the distance between $site1 and $site2 was",
>>> sign=">",
>>> expression="50 meters")
>>> less_than_2_km = Comparison(
>>> "the distance between $site1 and $site2 was",
>>> sign="<=",
>>> expression="2 km")
>>> speech_zone_facts = FactorGroup(
>>> [Statement(
>>> more_than_50_meters,
>>> terms=[Entity(name="the free speech zone"), Entity(name="the courthouse")]),
>>> Statement(
>>> less_than_2_km,
>>> terms=[Entity(name="the free speech zone"), Entity(name="the courthouse")])])
>>> protest_facts.implies(speech_zone_facts)
True