PSL 2.2.1 Release

We are happy to announce the release of PSL version 2.2.1! We have made great improvements to PSL in the areas of usability and performance. In this changelog, you will find a list of the major changes in 2.2.1 as well as information on migrating from 2.1.0.

For those of you who learn better by example, check out the PSL examples repository. The master branch is always compatible with the most recent stable release, while the develop branch stays up-to-date with our development work.

Infrastructure
PSL Interfaces
CLI Improvements
Performance
Miscellanea

Infrastructure

The 2.2.1 release comes with a few changes to the PSL development cycle and artifact deployment.

Artifacts Moved to Maven Central

Starting with this release, PSL releases and artifacts are hosted on Maven Central, the default remote repository for Maven. With PSL deployed there, the old Maven repository at http://maven.linqs.org/maven/repositories/psl-releases/ is no longer needed. Old builds will continue to be hosted at the old repository for the foreseeable future. To find the new builds, go to the org.linqs group on Maven Central. Development versions are labeled as CANARY releases.

Because PSL is now hosted on Maven Central, you can remove the maven.linqs.org repository from your Maven configuration. In most cases, this means removing the following section from your pom.xml files:

<repositories>
    <repository>
        <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </releases>
        <id>psl-releases</id>
        <name>PSL Releases</name>
        <url>http://maven.linqs.org/maven/repositories/psl-releases/</url>
        <layout>default</layout>
    </repository>
</repositories>
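
If you maintain many projects, the removal can be scripted. The following is a minimal sketch, not an official migration tool. It assumes a pom.xml without an XML namespace declaration (real pom.xml files usually declare one, which would need extra handling), and the second repository in the sample is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample input. The "other-repo" entry is a made-up repository that should survive.
POM_SNIPPET = """<project>
    <repositories>
        <repository>
            <id>psl-releases</id>
            <name>PSL Releases</name>
            <url>http://maven.linqs.org/maven/repositories/psl-releases/</url>
        </repository>
        <repository>
            <id>other-repo</id>
            <url>https://example.com/maven/</url>
        </repository>
    </repositories>
</project>"""


def remove_linqs_repository(pom_text):
    """Return pom_text with any repository pointing at maven.linqs.org removed."""
    root = ET.fromstring(pom_text)
    for repositories in root.findall('repositories'):
        for repository in list(repositories.findall('repository')):
            url = repository.findtext('url', default='')
            if 'maven.linqs.org' in url:
                repositories.remove(repository)
        # Drop the <repositories> element entirely if nothing is left in it.
        if len(repositories) == 0:
            root.remove(repositories)
    return ET.tostring(root, encoding='unicode')


cleaned = remove_linqs_repository(POM_SNIPPET)
print(cleaned)
```

Review the result before committing it, since comments and formatting in the original file may not be preserved exactly.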

Wiki Hosted on psl.linqs.org

The PSL Wiki (https://github.com/linqs/psl/wiki) and the PSL Development Wiki (https://github.com/eriq-augustine/psl/wiki) have been moved to psl.linqs.org/wiki. All stable and development releases going forward will have a version of the wiki available as either live webpages (for newer releases) or downloadable archives (for older releases).

API Reference Hosted on psl.linqs.org

Along with the Wiki, the API reference will now also be hosted at psl.linqs.org/api. All stable and development releases going forward will have a version of the API reference available as either live webpages (for newer releases) or downloadable archives (for older releases).

Development Repo Now linqs/psl

PSL development has been moved from eriq-augustine/psl to the canonical PSL repository: linqs/psl. Any new pull requests should be submitted there.

Issues Moved to linqs/psl

Along with pull requests, issues have been moved to the canonical PSL repository: linqs/psl/issues. All old issues (along with their comments and labels) have been migrated to this repository, and any new issues should be submitted there.

PSL Interfaces

The PSL 2.2.1 release comes with two new interfaces, and one deprecation.

New Python Interface

Commit: a38cffe5

PSL 2.2.1 comes with the first official release of the PSL Python interface. This package is called pslpython and is available on PyPI, so it can be installed via pip:

pip install pslpython

The source for the interface is available in the main PSL repository.

Fully implemented examples can be found in the psl-examples repository. Below is a simplified example of the Python interface:

import os

from pslpython.model import Model
from pslpython.partition import Partition
from pslpython.predicate import Predicate
from pslpython.rule import Rule

model = Model('sample-model')

# Add predicates.
predicate = Predicate('Foo', closed = True, size = 2)
model.add_predicate(predicate)

predicate = Predicate('Bar', closed = False, size = 2)
model.add_predicate(predicate)

# Add rules.
model.add_rule(Rule('0.20: Foo(A, B) -> Bar(A, B) ^2'))
model.add_rule(Rule('0.01: !Bar(A, B) ^2'))

# Load data.
path = os.path.join('data', 'foo_obs.txt')
model.get_predicate('Foo').add_data_file(Partition.OBSERVATIONS, path)

path = os.path.join('data', 'bar_targets.txt')
model.get_predicate('Bar').add_data_file(Partition.TARGETS, path)

# Run inference.
results = model.infer()

# Write out results.
out_dir = 'inferred-predicates'
os.makedirs(out_dir, exist_ok = True)

out_path = os.path.join(out_dir, "bar.txt")
results[model.get_predicate('Bar')].to_csv(out_path, sep = "\t", header = False, index = False)
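
The example above reads data/foo_obs.txt and data/bar_targets.txt. The following is a minimal sketch of how such files might be produced. It assumes tab-separated data files with one ground atom per line and one column per argument; the constants are made up for illustration, and a temporary directory is used so the sketch does not touch your project.

```python
import csv
import os
import tempfile

# Hypothetical constants for illustration only.
foo_observations = [('alice', 'bob'), ('bob', 'carol')]
bar_targets = [('alice', 'bob'), ('bob', 'carol')]


def write_atoms(path, rows):
    """Write one ground atom per line, with arguments separated by tabs."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
        writer.writerows(rows)


data_dir = os.path.join(tempfile.mkdtemp(), 'data')
foo_path = os.path.join(data_dir, 'foo_obs.txt')
bar_path = os.path.join(data_dir, 'bar_targets.txt')
write_atoms(foo_path, foo_observations)
write_atoms(bar_path, bar_targets)

with open(foo_path) as handle:
    print(handle.read())
```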

In addition to creating models in Python, you can use the PSL Python package to invoke the PSL CLI directly. Instead of invoking the PSL jar:

java -jar psl.jar --model test.psl --data test.data

You can use the pslpython package already installed via pip:

python -m pslpython.cli --model test.psl --data test.data

Additionally, any arguments supported by the CLI interface can be passed to pslpython.cli as well:

python -m pslpython.cli --model test.psl --data test.data --postgres myDB -D log4j.threshold=DEBUG
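
If you want to launch the CLI from a Python script, the same command line can be assembled programmatically. This is a minimal sketch: build_psl_cli_command is a hypothetical helper, not part of pslpython, and the actual launch is left commented out because it requires pslpython to be installed.

```python
import shlex
import sys


def build_psl_cli_command(model_path, data_path, extra_args=()):
    """Build the argument list for `python -m pslpython.cli`."""
    command = [sys.executable, '-m', 'pslpython.cli',
               '--model', model_path, '--data', data_path]
    # Any additional CLI arguments are appended unchanged.
    command.extend(extra_args)
    return command


command = build_psl_cli_command(
    'test.psl', 'test.data',
    extra_args=['--postgres', 'myDB', '-D', 'log4j.threshold=DEBUG'])

# Show the command as it would be typed in a shell.
print(' '.join(shlex.quote(part) for part in command))

# To actually run PSL (requires pslpython to be installed):
# import subprocess
# subprocess.run(command, check=True)
```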

New Java Interface

Commit: 7e305dfe

PSL 2.2.1 comes with a new Java interface. This interface works much like the Groovy interface with some slight differences. Fully implemented examples of the Java interface can be found in the psl-examples repository.

Instead of using org.linqs.psl.groovy.PSLModel, org.linqs.psl.java.PSLModel is used. The methods for the PSLModel class are now explicitly named, instead of being overloads of the same add() method. For example, instead of model.add predicate: "Foo", ..., you will use model.addPredicate("Foo", ...). The full API for the PSLModel class can be found here.

The Groovy interface allows rules to be specified as part of the Groovy syntax. However, rules in the Java interface must be specified as a String.

To access predicates in the Java interface, you can no longer just reference them by name with no context. Now, you can ask the model for a predicate by name. In Groovy:

Inserter inserter = dataStore.getInserter(Foo, obsPartition);

In Java:

Inserter inserter = dataStore.getInserter(model.getStandardPredicate("Foo"), obsPartition);

Groovy Interface Deprecated

Commit: 476dfd1e

With the addition of the new Java interface, the Groovy interface has officially been deprecated. It will be removed in the next release of PSL. Dropping support for Groovy will allow us to support a wider range of Java versions (instead of just 7 and 8).

CLI Improvements

Functional Predicates

Commit: ac2f9a30

Functional predicates are now supported in the CLI. To use these, a function key needs to be specified in the predicate definition, e.g.:

  Knows/2: open
  Likes/2: closed
  Lived/2: closed
  SimName:
    - function: org.foo.bar.SimNameExternalFunction

In this case, the functional predicate is SimName and it is implemented by the SimNameExternalFunction class. SimName can then be used in a rule like:

1.0: SimName(P1, P2) & Lived(P1, L) & Lived(P2, L) & (P1 != P2) -> Knows(P1, P2) ^2

Multiple Evaluators

Commit: d63a8f7e

The CLI can now use multiple evaluators in one run. This can be done by passing multiple evaluators to the --eval argument:

java -jar psl.jar --model test.psl --data test.data --eval DiscreteEvaluator ContinuousEvaluator

Output Ground Rules

Commits: 62fbd2cc, b901e937

The CLI accepts two new arguments that can be used to see the ground rules being processed.
-gr/--groundrules outputs the ground rules before inference is run, which shows them as early as possible. --satisfaction outputs the ground rules along with their satisfaction values after inference is run.

If you are concerned about an issue with your rules/data and want to see the ground rules created, then --groundrules is the option you should use. If you are curious about the value that different rules are taking, then --satisfaction is the option you should use.

Either option can be specified without arguments, and the results will be output to stdout. You can also specify an optional path with either argument and the results will be output there.

Skip Database Commit

Commit: 3ced4b20

If you do not need the results of inference saved into the database, then you can save time by skipping the writing of results to the database using the --skipAtomCommit argument.

Remove Extra Quoting

Commit: cbe7fd8a

Constants are no longer quoted in the inferred predicate output produced by the CLI. This may break existing scripts that parse this output, but now files output by the CLI will match the format consumed by the CLI (by default).
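
Scripts that need to read output from both older and newer releases can strip the quotes defensively. The sketch below rests on assumptions not stated in this changelog: that the output is tab-separated, that the last column is the truth value, and that older releases wrapped each constant in a pair of quote characters. The sample lines are made up.

```python
def strip_constant_quotes(field):
    """Remove one pair of matching single or double quotes around a field."""
    if len(field) >= 2 and field[0] == field[-1] and field[0] in ("'", '"'):
        return field[1:-1]
    return field


def parse_inferred_line(line):
    """Split a tab-separated line into (constants, truth value)."""
    fields = line.rstrip('\n').split('\t')
    constants = tuple(strip_constant_quotes(field) for field in fields[:-1])
    return constants, float(fields[-1])


# Hypothetical sample lines in the assumed old and new styles.
old_style = "'Alice'\t'Bob'\t0.75\n"
new_style = "Alice\tBob\t0.75\n"
print(parse_inferred_line(old_style))
print(parse_inferred_line(new_style))
```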

run.sh Takes Arguments

The run.sh scripts in the CLI implementations for psl-examples now take arguments that are passed directly to the CLI. Specifying these arguments is equivalent to adding them to the ADDITIONAL_PSL_OPTIONS constant. For example:

./run.sh -D log4j.threshold=DEBUG --postgres psl

Performance

Reduced Memory

A lot of effort was put into reducing the memory burden of PSL for the 2.2.1 release, both in terms of allocations and total persisted memory. We have observed total memory consumption in PSL drop by between 17.5% and 45.7% (depending on the exact model and data). Below you can see an example of the same model and data in PSL 2.1.0 (left) vs PSL 2.2.1 (right). The blue portion of the graph is the actual memory being used.

Smaller Types

Commit: 9a34ce23

Where possible, standard types have been replaced by their shorter siblings (int replaced with short, double replaced with float, etc.). This allows us to trade unused precision for memory and speed (depending on the system architecture).
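
To make the trade-off concrete, the sketch below uses Python's struct module to show the byte widths involved and the precision lost when a value is stored in 32 bits instead of 64. It is only an illustration of the general idea; it does not reproduce PSL's internal data structures.

```python
import struct

# With the '<' prefix, struct uses standard sizes that match Java's primitive widths.
sizes = {
    'short': struct.calcsize('<h'),
    'int': struct.calcsize('<i'),
    'float': struct.calcsize('<f'),
    'double': struct.calcsize('<d'),
}
print(sizes)


def round_trip_as_float(value):
    """Store a value in 32 bits and read it back as a Python float."""
    return struct.unpack('<f', struct.pack('<f', value))[0]


value = 0.1
as_float = round_trip_as_float(value)

# The difference is the precision given up by using the smaller type.
print(value, as_float, abs(value - as_float))
```

For truth values in [0, 1], an error of this size is usually negligible, which is the kind of unused precision the change trades away.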

Matrix Operations

Commit: 8f034fa8

We have added the FloatMatrix class to handle low-level matrix operations. This class uses the Netlib library to call into the low-level BLAS and LAPACK libraries. This allows us to easily perform efficient matrix operations.

Streaming Grounding Results

Commit: 8f4de846

Grounding results can now be streamed from the database using the QueryResultIterable class. This allows the user to iterate through the grounding results without needing to keep them all in memory at the same time.
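
The general pattern is the same as iterating over a database cursor instead of fetching every row up front. The sketch below illustrates that pattern with Python's sqlite3 module. It is an analogy only: it does not use PSL's QueryResultIterable, and the table and column names are hypothetical.

```python
import sqlite3


def stream_rows(connection, query, batch_size=2):
    """Yield rows one at a time, holding at most batch_size rows in memory."""
    cursor = connection.cursor()
    cursor.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# Build a small in-memory table standing in for grounding results.
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE groundings (a TEXT, b TEXT)')
connection.executemany('INSERT INTO groundings VALUES (?, ?)',
                       [('x%d' % i, 'y%d' % i) for i in range(5)])

# Process each row as it arrives instead of building a full list first.
count = 0
for row in stream_rows(connection, 'SELECT a, b FROM groundings ORDER BY a'):
    count += 1
print(count)
```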

Runtime Statistics

Commit: df58a390

A new class, RuntimeStats, has been introduced to keep track of JVM statistics throughout the lifetime of a PSL program. Setting the configuration option runtimestats.collect to true will enable the statistics collection. These collected stats are currently output to the INFO log level when the JVM terminates.

Currently, memory information is automatically collected. In addition, the user can call the static logDiskRead() and logDiskWrite() methods to keep track of I/O operations.

Using the statistics looks like:

linqs@comp:~/code/psl-examples/simple-acquaintances/cli$ ./run.sh -D runtimestats.collect=true
Running PSL Inference
  [main] INFO  org.linqs.psl.cli.Launcher  - Running PSL CLI Version 2.2.1-a573763
... < Omitted in the changelog for brevity > ...
[main] INFO  org.linqs.psl.application.inference.InferenceApplication  - Inference complete.
[main] INFO  org.linqs.psl.application.inference.InferenceApplication  - Writing results to Database.
[main] INFO  org.linqs.psl.application.inference.InferenceApplication  - Results committed to database.
[main] INFO  org.linqs.psl.cli.Launcher  - Inference Complete
[main] INFO  org.linqs.psl.cli.Launcher  - Starting evaluation with class: org.linqs.psl.evaluation.statistics.DiscreteEvaluator.
[main] INFO  org.linqs.psl.cli.Launcher  - Evaluation results for KNOWS -- Accuracy: 0.915254, F1: 0.933333, Positive Class Precision: 0.945946, Positive Class Recall: 0.921053, Negative Class Precision: 0.863636, Negative Class Recall: 0.904762
[main] INFO  org.linqs.psl.cli.Launcher  - Evaluation complete.
[Thread-1] INFO  org.linqs.psl.util.RuntimeStats  - Total Memory (bytes) -- Min:    504889344, Max:    504889344, Mean:    504889344, Count:            6
[Thread-1] INFO  org.linqs.psl.util.RuntimeStats  - Free Memory (bytes)  -- Min:    403775464, Max:    494319512, Mean:    437039418, Count:            6
[Thread-1] INFO  org.linqs.psl.util.RuntimeStats  - Used Memory (bytes)  -- Min:     10569832, Max:    101113880, Mean:     67849925, Count:            6
[Thread-1] INFO  org.linqs.psl.util.RuntimeStats  - Max Memory (bytes)   -- Min:   7475298304, Max:   7475298304, Mean:   7475298304, Count:            6
[Thread-1] INFO  org.linqs.psl.util.RuntimeStats  - IO Reads (bytes)     -- Min:            0, Max:            0, Mean:            0, Count:            0, Total:            0
[Thread-1] INFO  org.linqs.psl.util.RuntimeStats  - IO Writes (bytes)    -- Min:            0, Max:            0, Mean:            0, Count:            0, Total:            0
linqs@comp:~/code/psl-examples/simple-acquaintances/cli$
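
If you want to collect these numbers across runs, the RuntimeStats lines can be parsed from the log. The following is a minimal sketch that assumes the line layout shown in the sample output above; that layout is not a documented interface and may change between releases.

```python
import re

# Matches "... RuntimeStats  - <name>  -- <key>: <value>, <key>: <value>, ..."
STAT_PATTERN = re.compile(
    r'RuntimeStats\s+-\s+(?P<name>.+?)\s+--\s+(?P<stats>.+)$')


def parse_runtime_stats_line(line):
    """Return (stat name, {field: int}), or None if the line is not a stats line."""
    match = STAT_PATTERN.search(line)
    if match is None:
        return None
    stats = {}
    for part in match.group('stats').split(','):
        key, _, value = part.partition(':')
        stats[key.strip()] = int(value.strip())
    return match.group('name'), stats


# One line copied from the sample output above.
line = ('[Thread-1] INFO  org.linqs.psl.util.RuntimeStats  - Used Memory (bytes)'
        '  -- Min:     10569832, Max:    101113880, Mean:     67849925,'
        ' Count:            6')
name, stats = parse_runtime_stats_line(line)
print(name, stats)
```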

Miscellanea

Simple Class Names

Commit: 00b60321

In any case where a class name is used as a configuration option or argument, you can now specify the class's short name instead of its fully-qualified name. For example, instead of:

java -jar psl.jar --model test.psl --data test.data --eval org.linqs.psl.evaluation.statistics.DiscreteEvaluator

You can do:

java -jar psl.jar --model test.psl --data test.data --eval DiscreteEvaluator

New Weight Learning Method: GPP

With the release of PSL 2.2.1, we are adding a new weight learning method called Gaussian Process Prior (GPP), which is based on Bayesian optimization.

GPP is a search-based weight learning method that works by approximating a user-defined metric and evaluating the PSL model on a number of strategically chosen weights. Although GPP tends to work better with a smoother metric, it can work well with any metric. The best metric to use depends on the specific problem being modeled. A major benefit of GPP, and all search-based methods, is that the metric being optimized can be the same as the metric that results are evaluated on. In contrast, likelihood-based methods (e.g., Maximum Likelihood MPE) maximize the likelihood, which may not be strongly correlated with the desired evaluation metrics. For further details about GPP, please see the paper: BOWL: Bayesian Optimization for Weight Learning in Probabilistic Soft Logic.

To use GPP in PSL, you need to choose the right weight learning class and an evaluator that GPP will use to evaluate weight configurations. The class for GPP is org.linqs.psl.application.learning.weight.bayesian.GaussianProcessPrior. You can choose an evaluator by setting the weightlearning.evaluator configuration option to any Evaluator.

For example, you can use the following command to use GPP in the CLI:

./run.sh --learn GaussianProcessPrior -D weightlearning.evaluator=DiscreteEvaluator

GPP has four main configuration options that the user should be aware of:

weightlearning.evaluator
Domain: Any Evaluator
Default: ContinuousEvaluator
The user-defined metric function that GPP uses to evaluate and optimize weights. The best evaluator to use depends on your specific problem.
gpp.maxiterations
Domain: Integer in (0, ∞)
Default: 25
The maximum number of times that BOWL will conduct evaluations before choosing the best set of weights. 100 iterations is typically enough for even difficult domains. Keep in mind that the time taken to perform full learning in BOWL grows quadratically with the number of iterations. Therefore, if you choose a large number (such as 500), learning might take days to finish.
gpp.explore
Domain: Float in (0.0, ∞)
Default: 2.0
This determines how the weights will be chosen for evaluation. A lower value implies that the weights chosen for evaluation will be clustered around one region whereas a higher value will lead to exploration of weights that are as distinct as possible.
gppker.reldep
Domain: Float in (0.0, ∞)
Default: 1.0
The relative dependence value given in GPP. The exploration space increases as the number of rules in the model increases. A smaller value would imply that weights are very distinct and related, hence requiring fewer iterations.
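
The four options above can be combined on one command line. The sketch below assembles such a command and checks each value against the domain listed for it. build_gpp_arguments is a hypothetical helper, and passing several -D options in a single invocation is an assumption based on the -D key=value form used elsewhere in this changelog.

```python
import shlex


def build_gpp_arguments(evaluator='ContinuousEvaluator', max_iterations=25,
                        explore=2.0, relative_dependence=1.0):
    """Build run.sh arguments for GPP weight learning, checking option domains."""
    if max_iterations <= 0 or int(max_iterations) != max_iterations:
        raise ValueError('gpp.maxiterations must be a positive integer')
    if explore <= 0.0:
        raise ValueError('gpp.explore must be greater than 0')
    if relative_dependence <= 0.0:
        raise ValueError('gppker.reldep must be greater than 0')

    options = {
        'weightlearning.evaluator': evaluator,
        'gpp.maxiterations': int(max_iterations),
        'gpp.explore': explore,
        'gppker.reldep': relative_dependence,
    }
    arguments = ['./run.sh', '--learn', 'GaussianProcessPrior']
    # Each configuration option becomes its own "-D key=value" pair.
    for key, value in options.items():
        arguments.extend(['-D', '%s=%s' % (key, value)])
    return arguments


gpp_arguments = build_gpp_arguments(evaluator='DiscreteEvaluator', max_iterations=50)
print(' '.join(shlex.quote(part) for part in gpp_arguments))
```

The defaults in the helper mirror the defaults listed above, so calling it with no arguments should describe the same configuration as omitting the options entirely.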

Made Data Loading Errors More Clear

Commit: 1e889f49

File paths were added to several data loading error messages to make them clearer.

Removed Date ConstantType

Commit: df5e1b64

The ConstantType.Date type has been removed as a predicate argument type. Instead, users should use ConstantType.String with the date represented as a string (we suggest ISO 8601), or ConstantType.Integer with the date represented as a Unix/Epoch time.
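
When preparing data files, either representation can be produced with Python's standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes dates are interpreted at midnight UTC and that epoch time is measured in whole seconds.

```python
from datetime import date, datetime, timezone


def date_to_iso_string(value):
    """Represent a date as an ISO 8601 string, e.g. for ConstantType.String."""
    return value.isoformat()


def date_to_epoch_seconds(value):
    """Represent a date as whole seconds since the Unix epoch (midnight UTC)."""
    midnight = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


day = date(2019, 1, 1)
print(date_to_iso_string(day))     # 2019-01-01
print(date_to_epoch_seconds(day))  # 1546300800
```

ISO 8601 strings have the convenient property that lexicographic order matches chronological order, while epoch integers make date arithmetic straightforward.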