PSL 2.2.1 Release
We are happy to announce the release of PSL version 2.2.1!
We have made great improvements to PSL in the areas of usability and performance.
In this changelog, you will find a list of the major changes in 2.2.1 as well as information on migrating from 2.1.0.
For those of you who learn better by example, check out the PSL examples repository.
The master branch is always compatible with the most recent stable release, while the develop branch stays up-to-date with our development work.
Infrastructure
The 2.2.1 release comes with a few changes to the PSL development cycle and artifact deployment.
Artifacts Moved to Maven Central
Starting with this release, PSL releases and artifacts will now be hosted through Maven Central.
Maven Central is the default remote repository for Maven.
With PSL deployed there, there is no longer a need to use the old Maven repository at http://maven.linqs.org/maven/repositories/psl-releases/.
Old builds will continue to be hosted at the old repository for the foreseeable future.
To find the new builds you can go to the org.linqs group on Maven Central.
The development versions are labeled as CANARY releases.
Because PSL is now hosted on Maven Central, you can remove the maven.linqs.org repository from your Maven configuration.
In most cases, this means that you can remove the following section from your pom.xml files:
<repositories>
    <repository>
        <releases>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </releases>
        <id>psl-releases</id>
        <name>PSL Releases</name>
        <url>http://maven.linqs.org/maven/repositories/psl-releases/</url>
        <layout>default</layout>
    </repository>
</repositories>
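If you maintain many projects, the removal can be scripted. The following is a minimal sketch using only the Python standard library; the function name `remove_old_psl_repository` is hypothetical, and the sketch assumes the repository entry is identified by the `psl-releases` id shown above.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace; POMs without a namespace are handled too.
POM_NS = "http://maven.apache.org/POM/4.0.0"
OLD_REPO_ID = "psl-releases"


def remove_old_psl_repository(pom_text):
    """Return the POM text with the psl-releases repository entry removed."""
    # Serialize the POM namespace as the default namespace (no "ns0:" prefixes).
    ET.register_namespace("", POM_NS)
    root = ET.fromstring(pom_text)
    prefix = "{%s}" % POM_NS if root.tag.startswith("{") else ""

    for repositories in root.findall(prefix + "repositories"):
        # Copy the child list so that removal during iteration is safe.
        for repository in list(repositories):
            if repository.findtext(prefix + "id") == OLD_REPO_ID:
                repositories.remove(repository)
        # Drop the wrapper element if no repositories are left.
        if len(repositories) == 0:
            root.remove(repositories)

    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree discards comments and the XML declaration when parsing, so review the result before overwriting a hand-maintained pom.xml.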
Wiki Hosted on psl.linqs.org
The PSL Wiki (https://github.com/linqs/psl/wiki) and PSL Development Wiki (https://github.com/eriq-augustine/psl/wiki) have been moved to psl.linqs.org/wiki. All stable and development releases going forward will have a version of the wiki available as either live webpages (for newer releases) or downloadable archives (for older releases).
API Reference Hosted on psl.linqs.org
Along with the Wiki, the API reference will now also be hosted at psl.linqs.org/api. All stable and development releases going forward will have a version of the API reference available as either live webpages (for newer releases) or downloadable archives (for older releases).
Development Repo Now linqs/psl
PSL development has been moved from eriq-augustine/psl to the canonical PSL repository: linqs/psl. Any new pull requests should be submitted there.
Issues Moved to linqs/psl
Along with pull requests, issues have been moved to the canonical PSL repository: linqs/psl/issues. All old issues (along with their comments and labels) have been migrated to this repository, and any new issues should be submitted there.
PSL Interfaces
The PSL 2.2.1 release comes with two new interfaces, and one deprecation.
New Python Interface
Commit: a38cffe5
PSL 2.2.1 comes with the first official release of the PSL Python interface.
This package is called pslpython and is available on PyPI, so it can be installed via pip:
pip install pslpython
The source for the interface is available in the main PSL repository.
Fully implemented examples can be found in the psl-examples repository. Below is a simplified example of the Python interface:
import os
from pslpython.model import Model
from pslpython.partition import Partition
from pslpython.predicate import Predicate
from pslpython.rule import Rule
model = Model('sample-model')
# Add predicates.
predicate = Predicate('Foo', closed = True, size = 2)
model.add_predicate(predicate)
predicate = Predicate('Bar', closed = False, size = 2)
model.add_predicate(predicate)
# Add rules.
model.add_rule(Rule('0.20: Foo(A, B) -> Bar(A, B) ^2'))
model.add_rule(Rule('0.01: !Bar(A, B) ^2'))
# Load data.
path = os.path.join('data', 'foo_obs.txt')
model.get_predicate('Foo').add_data_file(Partition.OBSERVATIONS, path)
path = os.path.join('data', 'bar_targets.txt')
model.get_predicate('Bar').add_data_file(Partition.TARGETS, path)
# Run inference.
results = model.infer()
# Write out results.
out_dir = 'inferred-predicates'
os.makedirs(out_dir, exist_ok = True)
out_path = os.path.join(out_dir, "bar.txt")
results[model.get_predicate('Bar')].to_csv(out_path, sep = "\t", header = False, index = False)
In addition to creating models in Python, you can use the PSL Python package to invoke the PSL CLI directly. Instead of invoking the PSL jar:
java -jar psl.jar --model test.psl --data test.data
You can use the pslpython package already installed via pip:
python -m pslpython.cli --model test.psl --data test.data
Additionally, any arguments supported by the CLI can be passed to pslpython.cli as well:
python -m pslpython.cli --model test.psl --data test.data --postgres myDB -D log4j.threshold=DEBUG
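The same invocation can be driven from a Python script. The sketch below only assembles and runs the command line shown above with `subprocess`; the helper names `build_psl_command` and `run_psl` are hypothetical and are not part of pslpython.

```python
import subprocess
import sys


def build_psl_command(model_path, data_path, extra_args=()):
    """Build the argument list for `python -m pslpython.cli`."""
    command = [
        sys.executable, "-m", "pslpython.cli",
        "--model", model_path,
        "--data", data_path,
    ]
    # Any further CLI arguments are passed through unchanged.
    command.extend(extra_args)
    return command


def run_psl(model_path, data_path, extra_args=()):
    """Run the CLI and return its exit code (requires pslpython to be installed)."""
    return subprocess.run(build_psl_command(model_path, data_path, extra_args)).returncode
```

For example, `run_psl("test.psl", "test.data", ["-D", "log4j.threshold=DEBUG"])` mirrors the last command above without the `--postgres` option.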
New Java Interface
Commit: 7e305dfe
PSL 2.2.1 comes with a new Java interface. This interface works much like the Groovy interface with some slight differences. Fully implemented examples of the Java interface can be found in the psl-examples repository.
Instead of using org.linqs.psl.groovy.PSLModel, org.linqs.psl.java.PSLModel is used.
The methods for the PSLModel class are now explicitly named, instead of being overloads of the same add() method.
For example, instead of model.add predicate: "Foo", ..., you will use model.addPredicate("Foo", ...).
The full API for the PSLModel class can be found in the API reference.
The Groovy interface allows rules to be specified as part of the Groovy syntax. However, rules in the Java interface must be specified as a String.
To access predicates in the Java interface, you can no longer reference them by bare name. Instead, you ask the model for a predicate by name. In Groovy:
Inserter inserter = dataStore.getInserter(Foo, obsPartition);
In Java:
Inserter inserter = dataStore.getInserter(model.getStandardPredicate("Foo"), obsPartition);
Groovy Interface Deprecated
Commit: 476dfd1e
With the addition of the new Java interface, the Groovy interface has officially been deprecated. It will be removed from the next release of PSL. Dropping support for Groovy will allow us to support a wider range of Java versions (instead of just 7 and 8).
CLI Improvements
Functional Predicates
Commit: ac2f9a30
Functional predicates are now supported in the CLI.
To use these, a function key needs to be specified in the predicate definition, e.g.:
Knows/2: open
Likes/2: closed
Lived/2: closed
SimName:
- function: org.foo.bar.SimNameExternalFunction
In this case, the functional predicate is SimName and it is implemented by the SimNameExternalFunction class.
SimName can then be used in a rule like:
1.0: SimName(P1, P2) & Lived(P1, L) & Lived(P2, L) & (P1 != P2) -> Knows(P1, P2) ^2
Multiple Evaluators
Commit: d63a8f7e
The CLI can now use multiple evaluators in one run.
This can be done by passing multiple evaluators to the --eval argument:
java -jar psl.jar --model test.psl --data test.data --eval DiscreteEvaluator ContinuousEvaluator
Output Ground Rules
The CLI accepts two new arguments that can be used to see the ground rules being processed.
-gr/--groundrules can be used to output the ground rules before inference is run.
This will show the ground rules as early as possible.
--satisfaction, in contrast, will output ground rules along with their satisfaction value after inference is run.
If you are concerned about an issue with your rules/data and want to see the ground rules created, then --groundrules is the option you should use.
If you are curious about the value that different rules are taking, then --satisfaction is the option you should use.
Either option can be specified without arguments, and the results will be output to stdout. You can also specify an optional path with either argument and the results will be output there.
Skip Database Commit
Commit: 3ced4b20
If you do not need the results of inference saved into the database, then you can save time by skipping the writing of results to the database using the --skipAtomCommit argument.
Remove Extra Quoting
Commit: cbe7fd8a
Constants are no longer quoted in the inferred predicate output produced by the CLI. This may break existing scripts that parse this output, but now files output by the CLI will match the format consumed by the CLI (by default).
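Scripts that must read output from both older and newer releases can strip the quoting themselves. The following is a minimal sketch, assuming the output is tab-separated with the truth value in the last column and that older releases wrapped each constant in a single pair of quotes; the function names are hypothetical.

```python
import csv


def strip_quotes(constant):
    """Remove one pair of matching single or double quotes, if present."""
    if len(constant) >= 2 and constant[0] == constant[-1] and constant[0] in "'\"":
        return constant[1:-1]
    return constant


def read_inferred_atoms(lines):
    """Yield (arguments, truth_value) for each tab-separated row."""
    # QUOTE_NONE keeps quote characters literal so they can be stripped uniformly.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # Skip blank lines.
        *arguments, value = row
        yield tuple(strip_quotes(argument) for argument in arguments), float(value)
```

An open file object can be passed directly as `lines`, since the reader accepts any iterable of strings.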
run.sh Takes Arguments
The run.sh scripts in the CLI implementations for psl-examples now take arguments that are passed directly to the CLI.
Specifying these arguments is equivalent to adding them to the ADDITIONAL_PSL_OPTIONS constant.
For example:
./run.sh -D log4j.threshold=DEBUG --postgres psl
Performance
Reduced Memory
A lot of effort was put into reducing the memory burden of PSL for the 2.2.1 release, both in terms of allocations and total persisted memory. We have observed the total memory consumption in PSL drop between 17.5% and 45.7% (depending on the exact model and data). Below you can see an example of the same model and data in PSL 2.1.0 (left) vs PSL 2.2.1 (right). The blue portion of the graph is the actual memory being used.
Smaller Types
Commit: 9a34ce23
Where possible, standard types have been replaced by their shorter sibling (int replaced with short, double replaced with float, etc.).
This allows us to trade unused precision for memory and speed (depending on the system architecture).
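The trade-off can be illustrated outside of Java. The sketch below is an analogy using Python's `array` module, not PSL code: it stores the same values as 64-bit and 32-bit floats and compares the storage size and the rounding error introduced by the smaller type.

```python
from array import array

values = [0.1, 0.25, 1.0 / 3.0]

doubles = array("d", values)  # 64-bit floating point (like Java's double)
floats = array("f", values)   # 32-bit floating point (like Java's float)

# Storage needed for the raw values in each representation.
bytes_double = doubles.itemsize * len(doubles)
bytes_float = floats.itemsize * len(floats)

# Largest absolute difference introduced by storing in the smaller type.
max_error = max(abs(d - f) for d, f in zip(doubles, floats))

print(bytes_double, bytes_float, max_error)
```

The 32-bit array needs half the storage, at the cost of a small rounding error for values that are not exactly representable.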
Matrix Operations
Commit: 8f034fa8
We have added the FloatMatrix class to handle low-level matrix operations.
This class uses the Netlib library to call into the low-level BLAS and LAPACK libraries.
This allows us to easily perform efficient matrix operations.
Streaming Grounding Results
Commit: 8f4de846
Grounding results can now be streamed from the database using the QueryResultIterable class.
This allows the user to iterate through the grounding results without needing to keep them all in memory at the same time.
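The general pattern of streaming rows rather than materializing them can be sketched as follows. This is an analogy built on Python's `sqlite3` module, not the QueryResultIterable API; the table name `groundings` and the helper `stream_rows` are made up for illustration.

```python
import sqlite3


def stream_rows(connection, query):
    """Yield rows one at a time instead of loading the full result into a list."""
    cursor = connection.execute(query)
    try:
        for row in cursor:
            yield row
    finally:
        cursor.close()


# Build a small in-memory table to iterate over.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE groundings (a INTEGER, b INTEGER)")
connection.executemany(
    "INSERT INTO groundings VALUES (?, ?)",
    ((i, i + 1) for i in range(1000)),
)

# Each row is processed as it arrives and is not retained by this loop.
total = 0
for a, b in stream_rows(connection, "SELECT a, b FROM groundings"):
    total += b - a
```

The alternative, `cursor.fetchall()`, would hold all rows in memory at once, which is the cost that streaming avoids.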
Runtime Statistics
Commit: df58a390
A new class, RuntimeStats, has been introduced to keep track of JVM statistics throughout the lifetime of a PSL program.
Setting the configuration option runtimestats.collect to true will enable the statistics collection.
These collected stats are currently output to the INFO log level when the JVM terminates.
Currently, memory information is automatically collected.
In addition, the user can call the static logDiskRead() and logDiskWrite() methods to keep track of I/O operations.
A run with statistics collection enabled looks like:
linqs@comp:~/code/psl-examples/simple-acquaintances/cli$ ./run.sh -D runtimestats.collect=true
Running PSL Inference
0 [main] INFO org.linqs.psl.cli.Launcher - Running PSL CLI Version 2.2.1-a573763
... < Omitted in the changelog for brevity > ...
308 [main] INFO org.linqs.psl.application.inference.InferenceApplication - Inference complete.
1308 [main] INFO org.linqs.psl.application.inference.InferenceApplication - Writing results to Database.
1340 [main] INFO org.linqs.psl.application.inference.InferenceApplication - Results committed to database.
1340 [main] INFO org.linqs.psl.cli.Launcher - Inference Complete
1349 [main] INFO org.linqs.psl.cli.Launcher - Starting evaluation with class: org.linqs.psl.evaluation.statistics.DiscreteEvaluator.
1368 [main] INFO org.linqs.psl.cli.Launcher - Evaluation results for KNOWS -- Accuracy: 0.915254, F1: 0.933333, Positive Class Precision: 0.945946, Positive Class Recall: 0.921053, Negative Class Precision: 0.863636, Negative Class Recall: 0.904762
1368 [main] INFO org.linqs.psl.cli.Launcher - Evaluation complete.
1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Total Memory (bytes) -- Min: 504889344, Max: 504889344, Mean: 504889344, Count: 6
1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Free Memory (bytes) -- Min: 403775464, Max: 494319512, Mean: 437039418, Count: 6
1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Used Memory (bytes) -- Min: 10569832, Max: 101113880, Mean: 67849925, Count: 6
1377 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - Max Memory (bytes) -- Min: 7475298304, Max: 7475298304, Mean: 7475298304, Count: 6
1378 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - IO Reads (bytes) -- Min: 0, Max: 0, Mean: 0, Count: 0, Total: 0
1378 [Thread-1] INFO org.linqs.psl.util.RuntimeStats - IO Writes (bytes) -- Min: 0, Max: 0, Mean: 0, Count: 0, Total: 0
linqs@comp:~/code/psl-examples/simple-acquaintances/cli$
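To use these numbers in a script, the RuntimeStats lines can be pulled out of the log. The following is a minimal sketch that assumes the line layout shown in the output above (a name followed by "(bytes) --" and comma-separated integer fields); the function name `parse_runtime_stats` is hypothetical.

```python
import re

# Matches e.g. "... RuntimeStats - Used Memory (bytes) -- Min: 1, Max: 2, Mean: 1, Count: 6"
STAT_LINE = re.compile(r"RuntimeStats - (?P<name>.+?) \(bytes\) -- (?P<fields>.+)$")


def parse_runtime_stats(log_lines):
    """Map each statistic name to a dict of its integer fields."""
    stats = {}
    for line in log_lines:
        match = STAT_LINE.search(line.strip())
        if match is None:
            continue  # Not a RuntimeStats line.
        fields = {}
        for field in match.group("fields").split(", "):
            key, value = field.split(": ")
            fields[key] = int(value)  # Assumes integer values, as in the sample log.
        stats[match.group("name")] = fields
    return stats
```

With the log above saved to a file, `parse_runtime_stats(open(path))["Used Memory"]["Max"]` would give the peak used memory in bytes.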
Miscellanea
Simple Class Names
Commit: 00b60321
In any case where a class name is used as a configuration option or argument, you can now specify the class's short name instead of its fully-qualified name. For example, instead of:
java -jar psl.jar --model test.psl --data test.data --eval org.linqs.psl.evaluation.statistics.DiscreteEvaluator
You can do:
java -jar psl.jar --model test.psl --data test.data --eval DiscreteEvaluator
New Weight Learning Method: GPP
With the release of PSL 2.2.1, we are adding a new weight learning method called Gaussian Process Prior (GPP), which is based on Bayesian optimization.
GPP is a search-based weight learning method that works by approximating a user-defined metric and evaluating the PSL model on a number of strategically chosen weights. Although GPP tends to work better with a smoother metric, it can work well with any metric. The best metric to use depends on the specific problem being modeled. A major benefit of GPP, and all search-based methods, is that the metric being optimized can be the same as the metric that results are evaluated on. In contrast, likelihood-based methods (e.g., Maximum Likelihood MPE) maximize the likelihood, which may not be strongly correlated with the desired evaluation metrics. For further details about GPP, please see the paper: BOWL: Bayesian Optimization for Weight Learning in Probabilistic Soft Logic.
To use GPP in PSL, you need to choose the right weight learning class and an evaluator that GPP will use to evaluate weight configurations.
The class for GPP is org.linqs.psl.application.learning.weight.bayesian.GaussianProcessPrior.
You can choose an evaluator by setting the weightlearning.evaluator configuration option to any Evaluator.
For example, you can use the following command to use GPP in the CLI:
./run.sh --learn GaussianProcessPrior -D weightlearning.evaluator=DiscreteEvaluator
GPP has four main configuration options that the user should be aware of:
- weightlearning.evaluator
  Domain: Any Evaluator
  Default: ContinuousEvaluator
  The user-defined metric function that GPP uses to evaluate and optimize weights. The best evaluator to use depends on your specific problem.
- gpp.maxiterations
  Domain: Integer in (0, ∞)
  Default: 25
  The maximum number of times that BOWL will conduct evaluations before choosing the best set of weights. 100 iterations is typically enough for even difficult domains. Keep in mind that the time taken to perform full learning in BOWL grows quadratically with the number of iterations. Therefore, if you choose a large number (such as 500), learning might take days to finish.
- gpp.explore
  Domain: Float in (0.0, ∞)
  Default: 2.0
  This determines how the weights will be chosen for evaluation. A lower value implies that the weights chosen for evaluation will be clustered around one region, whereas a higher value will lead to exploration of weights that are as distinct as possible.
- gppker.reldep
  Domain: Float in (0.0, ∞)
  Default: 1.0
  The relative dependence value given in GPP. The exploration space increases as the number of rules in the model increases. A smaller value would imply that weights are very distinct and related, hence requiring fewer iterations.
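The effect of an exploration parameter can be illustrated with a generic upper-confidence-bound (UCB) rule, a common acquisition strategy in Bayesian optimization. The sketch below is an illustration only: it is an assumption that gpp.explore plays a role similar to the `explore` factor here, and the function names and candidate values are made up.

```python
def upper_confidence_bound(mean, std, explore=2.0):
    """Score a candidate by its predicted metric plus an uncertainty bonus."""
    return mean + explore * std


def pick_next(candidates, explore=2.0):
    """Pick the candidate weights with the highest UCB score.

    candidates: list of (weights, predicted_mean, predicted_std) tuples.
    """
    best = max(candidates, key=lambda c: upper_confidence_bound(c[1], c[2], explore))
    return best[0]


# Hypothetical predictions from a surrogate model.
candidates = [
    ("near current best", 0.80, 0.01),  # Well-explored region, low uncertainty.
    ("unexplored region", 0.60, 0.20),  # Little data nearby, high uncertainty.
]

print(pick_next(candidates, explore=0.5))
print(pick_next(candidates, explore=2.0))
```

With a small exploration factor the rule stays near the region already known to score well; with a larger one, the uncertainty bonus dominates and the search moves to weights that differ from those already evaluated.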
Made Data Loading Errors More Clear
Commit: 1e889f49
File paths were added to several data loading errors to make them more clear.
Removed Date ConstantType
Commit: df5e1b64
The ConstantType.Date type has been removed as a predicate argument type.
Instead, users should use ConstantType.String with the date represented as a string (we suggest ISO 8601), or ConstantType.Integer with the date represented as a Unix/Epoch time.
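When preparing data files, existing date values can be converted to either representation with the Python standard library. This is a minimal sketch; the helper names are hypothetical, and it assumes timezone-aware datetimes normalized to UTC.

```python
from datetime import datetime, timezone


def to_iso8601(moment):
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_epoch_seconds(moment):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(moment.timestamp())


moment = datetime(2019, 7, 4, 12, 30, 0, tzinfo=timezone.utc)
as_string = to_iso8601(moment)         # For a ConstantType.String argument.
as_integer = to_epoch_seconds(moment)  # For a ConstantType.Integer argument.
```

ISO 8601 strings in a single timezone sort lexicographically in chronological order, while epoch integers support direct arithmetic; which one fits better depends on how the model uses the value.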