PSL 2.2.1 Release
We are happy to announce the release of PSL version 2.2.1.
We have made great improvements to PSL in the areas of usability and performance.
In this changelog, you will find a list of the major changes in
2.2.1 as well as information on migrating from
For those of you that learn better by example, check out the PSL examples repository.
The master branch is always compatible with the most recent stable release,
while the develop branch stays up-to-date with our development work.
- PSL Interfaces
- CLI Improvements
The 2.2.1 release comes with a few changes to the PSL development cycle and artifact deployment.
Artifacts Moved to Maven Central
Starting with this release, PSL releases and artifacts are hosted on Maven Central.
Maven Central is the default remote repository for Maven.
With PSL deployed there, there is no longer a need to use the old Maven repository at:
Old builds will continue to be hosted at the old repository for the foreseeable future.
To find the new builds you can go to the org.linqs group on Maven Central.
The development versions are labeled as
Because PSL is now hosted on Maven Central,
you can remove the maven.linqs.org repository from your Maven configuration.
In most cases, this means that you can remove the following section from your
<repositories>
    <repository>
        <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </releases>
        <id>psl-releases</id>
        <name>PSL Releases</name>
        <url>http://maven.linqs.org/maven/repositories/psl-releases/</url>
        <layout>default</layout>
    </repository>
</repositories>
Wiki Hosted on psl.linqs.org
The PSL Wiki https://github.com/linqs/psl/wiki and PSL Development Wiki https://github.com/eriq-augustine/psl/wiki have been moved to psl.linqs.org/wiki. All stable and development releases going forward will have a version of the wiki available as either live webpages (for newer releases) or downloadable archives (for older releases).
API Reference Hosted on psl.linqs.org
Along with the Wiki, the API reference will now also be hosted at psl.linqs.org/api. All stable and development releases going forward will have a version of the API reference available as either live webpages (for newer releases) or downloadable archives (for older releases).
Development Repo Now linqs/psl
PSL development has been moved from eriq-augustine/psl to the canonical PSL repository: linqs/psl. Any new pull requests should be submitted there.
Issues Moved to linqs/psl
Along with pull requests, issues have been moved to the canonical PSL repository: linqs/psl/issues. All old issues (along with their comments and labels) have been migrated to this repository, and any new issues should be submitted there.
The PSL 2.2.1 release comes with two new interfaces, and one deprecation.
New Python Interface
PSL 2.2.1 comes with the first official release of the PSL Python interface.
This package is called
pslpython and is available on PyPI.
Therefore, it can be installed via pip:
pip install pslpython
The source for the interface is available in the main PSL repository.
Fully implemented examples can be found in the psl-examples repository. Below is a simplified example of the Python interface:
import os

from pslpython.model import Model
from pslpython.partition import Partition
from pslpython.predicate import Predicate
from pslpython.rule import Rule

model = Model('sample-model')

# Add predicates.
predicate = Predicate('Foo', closed = True, size = 2)
model.add_predicate(predicate)

predicate = Predicate('Bar', closed = False, size = 2)
model.add_predicate(predicate)

# Add rules.
model.add_rule(Rule('0.20: Foo(A, B) -> Bar(A, B) ^2'))
model.add_rule(Rule('0.01: !Bar(A, B) ^2'))

# Load data.
path = os.path.join('data', 'foo_obs.txt')
model.get_predicate('Foo').add_data_file(Partition.OBSERVATIONS, path)

path = os.path.join('data', 'bar_targets.txt')
model.get_predicate('Bar').add_data_file(Partition.TARGETS, path)

# Run inference.
results = model.infer()

# Write out results.
out_dir = 'inferred-predicates'
os.makedirs(out_dir, exist_ok = True)

out_path = os.path.join(out_dir, "bar.txt")
results[model.get_predicate('Bar')].to_csv(out_path, sep = "\t", header = False, index = False)
In addition to creating models in Python, you can use the PSL Python package to invoke the PSL CLI directly. Instead of invoking the PSL jar:
java -jar psl.jar --model test.psl --data test.data
You can use the
pslpython package already installed via pip:
python -m pslpython.cli --model test.psl --data test.data
Additionally, any arguments supported by the CLI interface can be passed to
pslpython.cli as well:
python -m pslpython.cli --model test.psl --data test.data --postgres myDB -D log4j.threshold=DEBUG
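For scripted runs, the same invocation can be assembled from Python. The sketch below only builds the argument list shown above. The helper name build_psl_command is hypothetical (it is not part of pslpython), and actually running the command assumes that pslpython is installed.

```python
import sys

def build_psl_command(model_path, data_path, extra_args=()):
    """Build the argv list for `python -m pslpython.cli` (hypothetical helper)."""
    command = [sys.executable, '-m', 'pslpython.cli',
               '--model', model_path,
               '--data', data_path]
    # Any additional CLI arguments are appended unchanged.
    command.extend(extra_args)
    return command

command = build_psl_command('test.psl', 'test.data',
                            ['--postgres', 'myDB', '-D', 'log4j.threshold=DEBUG'])
print(' '.join(command[1:]))

# To run it (requires pslpython to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces.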
New Java Interface
PSL 2.2.1 comes with a new Java interface. This interface works much like the Groovy interface with some slight differences. Fully implemented examples of the Java interface can be found in the psl-examples repository.
Instead of using
org.linqs.psl.java.PSLModel is used.
The methods for the PSLModel class are now explicitly named, instead of being overloads of the same
For example, instead of
model.add predicate: "Foo", ..., you will use
The full API for the PSLModel class can be found here.
The Groovy interface allows rules to be specified as part of the Groovy syntax. However, rules in the Java interface must be specified as a String.
To access predicates in the Java interface, you can no longer just reference them by name with no context. Now, you can ask the model for a predicate by name. In Groovy:
Inserter inserter = dataStore.getInserter(Foo, obsPartition);
In Java:
Inserter inserter = dataStore.getInserter(model.getStandardPredicate("Foo"), obsPartition);
Groovy Interface Deprecated
With the addition of the new Java interface, the Groovy interface has officially been deprecated. It will be removed from the next release of PSL. Dropping support for Groovy will allow us to support a wider range of Java versions (instead of just 7 and 8).
Functional predicates are now supported in the CLI.
To use these, a
function key needs to be specified in the predicate definition, e.g:
Knows/2: open
Likes/2: closed
Lived/2: closed
SimName:
  - function: org.foo.bar.SimNameExternalFunction
In this case, the functional predicate is
SimName and it is implemented by the org.foo.bar.SimNameExternalFunction class.
SimName can then be used in a rule like:
1.0: SimName(P1, P2) & Lived(P1, L) & Lived(P2, L) & (P1 != P2) -> Knows(P1, P2) ^2
The CLI can now use multiple evaluators in one run.
This can be done by passing multiple evaluators to the --eval argument:
java -jar psl.jar --model test.psl --data test.data --eval DiscreteEvaluator ContinuousEvaluator
Output Ground Rules
The CLI accepts two new arguments that can be used to see the ground rules being processed.
--groundrules can be used to output the ground rules before inference is run.
This will show the ground rules as early as possible.
--satisfaction will output ground rules along with their satisfaction value after inference is run.
If you are concerned about an issue with your rules/data and want to see the ground rules created,
--groundrules is the option you should use.
If you are curious about the value that different rules are taking,
--satisfaction is the option you should use.
Either option can be specified without arguments, and the results will be output to stdout. You can also specify an optional path with either argument and the results will be output there.
Skip Database Commit
If you do not need the results of inference saved into the database,
then you can save time by skipping the writing of results to the database using the
Remove Extra Quoting
Constants are no longer quoted in the inferred predicate output produced by the CLI. This may break existing scripts that parse this output, but now files output by the CLI will match the format consumed by the CLI (by default).
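Scripts that need to read inferred predicate files from both older and newer releases can normalize the constants while parsing. The following is a minimal sketch, not part of PSL: it assumes tab-separated rows whose last column is the truth value, and it assumes that older output wrapped constants in a single pair of matching quotes. The function name parse_inferred_line is hypothetical.

```python
def parse_inferred_line(line):
    """Split one tab-separated row into (constants, truth value)."""
    fields = line.rstrip('\n').split('\t')
    constants = []
    for field in fields[:-1]:
        # Strip one pair of matching surrounding quotes, if present (assumed old format).
        if len(field) >= 2 and field[0] == field[-1] and field[0] in ('"', "'"):
            field = field[1:-1]
        constants.append(field)
    return constants, float(fields[-1])

old_style = parse_inferred_line("'Alice'\t'Bob'\t0.75\n")
new_style = parse_inferred_line("Alice\tBob\t0.75\n")
print(old_style == new_style)
```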
run.sh Takes Arguments
The run.sh scripts in the CLI implementations for
psl-examples now take arguments that are passed directly to the CLI.
Specifying these arguments is equivalent to adding these arguments to the
./run.sh -D log4j.threshold=DEBUG --postgres psl
A lot of effort was put into reducing the memory burden of PSL for the 2.2.1 release, both in terms of allocations and total persisted memory. We have observed the total memory consumption in PSL drop between 17.5% and 45.7% (depending on the exact model and data). Below you can see an example of the same model and data in PSL 2.1.0 (left) vs PSL 2.2.1 (right). The blue portion of the graph is the actual memory being used.
Where possible, standard types have been replaced by their shorter sibling
int replaced with
double replaced with
This allows us to trade unused precision for memory and speed (depending on the system architecture).
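To get a feel for the trade-off, the sketch below compares a 64-bit double with a 32-bit float using Python's struct module. It is only an illustration of the general idea (it assumes the shorter type is a standard IEEE 754 32-bit float) and does not touch PSL itself. The sample value is an arbitrary number in [0, 1].

```python
import struct

# Standard sizes (in bytes) of a 64-bit double and a 32-bit float.
double_size = struct.calcsize('<d')
float_size = struct.calcsize('<f')

# Round-trip a value through 32-bit storage to measure the precision lost.
value = 0.915254
as_float32 = struct.unpack('<f', struct.pack('<f', value))[0]
error = abs(as_float32 - value)

print(double_size, float_size)
print(error < 1e-7)
```

For values in [0, 1], the rounding error from 32-bit storage stays below 1e-7, while the storage per value is halved.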
We have added the
FloatMatrix class to handle low-level matrix operations.
This class uses the Netlib library to call into the low-level BLAS and LAPACK libraries.
This allows us to easily perform efficient matrix operations.
Streaming Grounding Results
Grounding results can now be streamed from the database using the
This allows the user to iterate through the grounding results without needing to keep them all in memory at the same time.
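The difference between streaming and materializing results can be sketched with a generic database cursor. The example below uses Python's built-in sqlite3 module purely as an analogy. The table and the stream_rows helper are hypothetical and are not PSL's API.

```python
import sqlite3

def stream_rows(connection, query):
    """Yield rows one at a time instead of loading them all with fetchall()."""
    cursor = connection.execute(query)
    for row in cursor:
        yield row

# Hypothetical table standing in for grounding results.
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE groundings (a TEXT, b TEXT)')
connection.executemany('INSERT INTO groundings VALUES (?, ?)',
                       [('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Dave')])

count = 0
for a, b in stream_rows(connection, 'SELECT a, b FROM groundings'):
    # Each row is processed as it arrives; no full list of rows is built here.
    count += 1

print(count)
```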
A new class, RuntimeStats,
has been introduced to keep track of JVM statistics throughout the lifetime of a PSL program.
Setting the configuration option runtimestats.collect to
true will enable the statistics collection.
These collected stats are currently output to the
INFO log level when the JVM terminates.
Currently, memory information is automatically collected.
In addition, the user can call the static
logDiskWrite() methods to keep track of I/O operations.
Using the statistics looks like:
linqs@comp:~/code/psl-examples/simple-acquaintances/cli$ ./run.sh -D runtimestats.collect=true
Running PSL Inference
0 [main] INFO org.linqs.psl.cli.Launcher - Running PSL CLI Version 2.2.1-a573763
...
< Omitted in the changelog for brevity >
...
308 [main] INFO org.linqs.psl.application.inference.InferenceApplication - Inference complete.
1308 [main] INFO org.linqs.psl.application.inference.InferenceApplication - Writing results to Database.
1340 [main] INFO org.linqs.psl.application.inference.InferenceApplication - Results committed to database.
1340 [main] INFO org.linqs.psl.cli.Launcher - Inference Complete
1349 [main] INFO org.linqs.psl.cli.Launcher - Starting evaluation with class: org.linqs.psl.evaluation.statistics.DiscreteEvaluator.
1368 [main] INFO org.linqs.psl.cli.Launcher - Evaluation results for KNOWS -- Accuracy: 0.915254, F1: 0.933333, Positive Class Precision: 0.945946, Positive Class Recall: 0.921053, Negative Class Precision: 0.863636, Negative Class Recall: 0.904762
1368 [main] INFO org.linqs.psl.cli.Launcher - Evaluation complete.
1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Total Memory (bytes) -- Min: 504889344, Max: 504889344, Mean: 504889344, Count: 6
1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Free Memory (bytes) -- Min: 403775464, Max: 494319512, Mean: 437039418, Count: 6
1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Used Memory (bytes) -- Min: 10569832, Max: 101113880, Mean: 67849925, Count: 6
1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Max Memory (bytes) -- Min: 7475298304, Max: 7475298304, Mean: 7475298304, Count: 6
1378 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - IO Reads (bytes) -- Min: 0, Max: 0, Mean: 0, Count: 0, Total: 0
1378 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - IO Writes (bytes) -- Min: 0, Max: 0, Mean: 0, Count: 0, Total: 0
linqs@comp:~/code/psl-examples/simple-acquaintances/cli$
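If you want to track these numbers across runs, the RuntimeStats lines can be pulled out of a captured log with a small script. The sketch below assumes the line format matches the sample output above exactly; the pattern and function name are hypothetical, and the format may differ in other PSL versions.

```python
import re

# Matches lines like:
#   ... RuntimeStats - Used Memory (bytes) -- Min: 1, Max: 2, Mean: 1, Count: 6
STAT_PATTERN = re.compile(
    r'RuntimeStats - (?P<name>.+?) \(bytes\) -- '
    r'Min: (?P<min>\d+), Max: (?P<max>\d+), Mean: (?P<mean>\d+), Count: (?P<count>\d+)')

def parse_runtime_stats(log_text):
    """Map each statistic name to its min/max/mean/count values."""
    stats = {}
    for match in STAT_PATTERN.finditer(log_text):
        stats[match.group('name')] = {
            key: int(match.group(key)) for key in ('min', 'max', 'mean', 'count')}
    return stats

# Two lines copied from the sample output.
log_text = (
    '1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Used Memory (bytes) -- '
    'Min: 10569832, Max: 101113880, Mean: 67849925, Count: 6\n'
    '1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Max Memory (bytes) -- '
    'Min: 7475298304, Max: 7475298304, Mean: 7475298304, Count: 6\n')

stats = parse_runtime_stats(log_text)
peak_mib = stats['Used Memory']['max'] / (1024 * 1024)
print(round(peak_mib, 1))
```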
Simple Class Names
In any case where a class name is used as a configuration option or argument, you can now specify the class's short name instead of its fully-qualified name. For example, instead of:
java -jar psl.jar --model test.psl --data test.data --eval org.linqs.psl.evaluation.statistics.DiscreteEvaluator
You can do:
java -jar psl.jar --model test.psl --data test.data --eval DiscreteEvaluator
New Weight Learning Method: GPP
With the release of PSL 2.2.1, we are adding a new weight learning method called Gaussian Process Prior (GPP), which is based on Bayesian optimization.
GPP is a search-based weight learning method that works by approximating a user-defined metric and evaluating the PSL model on a number of strategically chosen weights. Although GPP tends to work better with a smoother metric, it can work well with any metric. The best metric to use depends on the specific problem being modeled. A major benefit of GPP, and all search-based methods, is that the metric being optimized can be the same as the metric that results are evaluated on. In contrast, likelihood-based methods (e.g. Maximum Likelihood MPE) maximize the likelihood, which may not be strongly correlated with the desired evaluation metrics. For further details about GPP, please see the paper: BOWL: Bayesian Optimization for Weight Learning in Probabilistic Soft Logic.
To use GPP in PSL, you need to choose the right weight learning class and an evaluator that GPP will use to evaluate weight configurations.
The class for GPP is GaussianProcessPrior.
You can choose an evaluator by setting the
weightlearning.evaluator configuration option to any Evaluator.
For example, you can use the following command to use GPP in the CLI:
./run.sh --learn GaussianProcessPrior -D weightlearning.evaluator=DiscreteEvaluator
GPP has four main configuration options that the user should be aware of:
Domain: Any Evaluator
The user defined metric function that GPP uses to evaluate and optimize weights. The best evaluator to use depends on your specific problem.
Domain: Integer in (0, ∞)
The maximum number of times that BOWL will conduct evaluations before choosing the best set of weights. 100 iterations is typically enough for even difficult domains. Keep in mind that the time taken to perform full learning in BOWL grows quadratically with the number of iterations. Therefore, if you choose a large number (such as 500), learning might take days to finish.
Domain: Float in (0.0, ∞)
This determines how the weights will be chosen for evaluation. A lower value implies that the weights chosen for evaluation will be clustered around one region whereas a higher value will lead to exploration of weights that are as distinct as possible.
Domain: Float in (0.0, ∞)
The relative dependence value given in GPP. The exploration space increases as the number of rules in the model increases. A smaller value would imply that weights are very distinct and related, hence requiring fewer iterations.
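To put the iteration guidance above in perspective, the sketch below estimates how learning time scales with the iteration count. It assumes purely quadratic growth, ignoring constant factors and lower-order terms, so treat the result as a rough guide rather than a measured figure. The function name is hypothetical.

```python
def relative_learning_cost(iterations, baseline_iterations=100):
    """Cost relative to the baseline, assuming purely quadratic growth."""
    return (iterations / baseline_iterations) ** 2

# Going from 100 to 500 iterations under this assumption:
print(relative_learning_cost(500))
```

Under this assumption, 500 iterations cost about 25 times as much as 100 iterations.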
Made Data Loading Errors More Clear
File paths were added to several data loading errors to make them more clear.
Removed Date ConstantType
The ConstantType.Date type has been removed as a predicate argument type.
Instead, users should use
ConstantType.String with the date represented as a string (we suggest ISO 8601),
or users should use
ConstantType.Integer and represent the date as a Unix/Epoch time.
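When preparing data files, either representation can be produced with Python's standard library. The sketch below converts an arbitrary example date (treated as UTC, which is an assumption you should adapt to your data) into both forms.

```python
from datetime import datetime, timezone

# An arbitrary example date, interpreted as midnight UTC.
date = datetime(2019, 1, 15, tzinfo=timezone.utc)

# For ConstantType.String: an ISO 8601 date string.
as_string = date.date().isoformat()

# For ConstantType.Integer: seconds since the Unix epoch.
as_epoch = int(date.timestamp())

print(as_string, as_epoch)
```

ISO 8601 strings have the added benefit that they sort chronologically when compared as plain text.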