We are happy to announce the release of PSL version 2.1.0! We have made great improvements to PSL in the areas of performance, expressiveness, and usability.

Here you will find a list of the major changes in 2.1.0 as well as information on migrating from 2.0.0.

For those of you who learn better by example, check out the PSL examples repository. The master branch is always compatible with the most recent stable release, while the develop branch stays up-to-date with our development work.


Database

The database is the center of the grounding strategy for PSL. Since grounding is typically the single most costly operation when running PSL, we have put a substantial amount of work into improving our database interactions.

PostgreSQL Database Backend

In the fall of 2017, we added support for a PostgreSQL database backend, but we had not yet publicized it. Users with a large number of ground rules (> 1 million) will most likely benefit from using Postgres over H2 (the default). All the details are on the PSL wiki page for Postgres.

Java/Groovy users just need to use org.linqs.psl.database.rdbms.driver.PostgreSQLDriver instead of org.linqs.psl.database.rdbms.driver.H2DatabaseDriver.
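For example, a Postgres-backed DataStore can be constructed like this rough sketch (the database name and the exact constructor arguments here are illustrative assumptions; see the Postgres wiki page for the precise signatures):

import org.linqs.psl.database.DataStore;
import org.linqs.psl.database.rdbms.RDBMSDataStore;
import org.linqs.psl.database.rdbms.driver.PostgreSQLDriver;

// Assumed constructor: a database name and a flag for clearing existing data.
DataStore dataStore = new RDBMSDataStore(new PostgreSQLDriver("psl_example", true));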

CLI users just need to pass the --postgres flag with an optional argument that is the name of the database to use.
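For example (where psl_example is an arbitrary database name):

java -jar psl.jar --infer --data example.data --model example.psl --postgres psl_example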

Removed MySQL support

Support for the MySQL database backend has been removed. Instead, users who need a faster database backend should use the Postgres database backend.

Removed the Queries Class

The Queries class has been removed and all of its functionality has been moved directly into the ReadableDatabase interface (implemented by the standard RDBMSDatabase). So the same set of methods is available directly from your database.

Removed InserterUtils

InserterUtils (from the psl-dataloading package) has been removed. Instead, you can now get the same functionality (loadDelimitedData() and loadDelimitedDataTruth()) from the Inserter class. This means that Groovy users will no longer need the psl-dataloading dependency in their pom file.

Instead of something like:

Inserter inserter = dataStore.getInserter(Foo, obsPartition);
InserterUtils.loadDelimitedData(inserter, Paths.get(DATA_PATH, "foo_obs.txt").toString());

You can do this:

Inserter inserter = dataStore.getInserter(Foo, obsPartition);
inserter.loadDelimitedData(Paths.get(DATA_PATH, "foo_obs.txt").toString());

The loadDelimitedDataAutomatic() method has also been added to the Inserter class. This method existed in some previous versions of PSL, but was never publicized or documented. It will peek at the first line of the file and decide between loadDelimitedData() and loadDelimitedDataTruth().
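For example, the snippet above could let PSL pick the loading method (the file name is just illustrative):

Inserter inserter = dataStore.getInserter(Foo, obsPartition);
// Peeks at the first line of the file and chooses between
// loadDelimitedData() and loadDelimitedDataTruth().
inserter.loadDelimitedDataAutomatic(Paths.get(DATA_PATH, "foo_obs.txt").toString());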

Bulk Dataloading in Postgres

Users of the Postgres database backend will now load data via the bulk-loading COPY command. This command is specialized for fast data loading from structured sources.

No user intervention is required. Calling one of the loadDelimitedData* methods will automatically use COPY if it is available, and fall back if it is not.

Deferred Indexing

In PSL, we now defer indexing the data tables until a database is actually requested via DataStore.getDatabase(). In typical use cases, people construct a DataStore, insert their data, and then request a database for weight learning/inference/evaluation. Because we now defer indexing, data loading is sped up and our index statistics are more up-to-date (resulting in generally better query plans).

No user intervention is required for this change. However, users should be aware of it so that they do not needlessly create databases before inserting their data. (If you do, everything will still work as it does now; you just will not see the improvement in data loading speed.) Pre-existing data tables are assumed to already be indexed.
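As a rough sketch, the recommended ordering looks like the following (the predicates, partitions, and getDatabase() arguments are illustrative):

// 1. Construct the DataStore.
DataStore dataStore = new RDBMSDataStore(new H2DatabaseDriver(Type.Disk, dbPath, true));

// 2. Insert all data first. The data tables are not indexed yet.
Inserter inserter = dataStore.getInserter(Foo, obsPartition);
inserter.loadDelimitedData(Paths.get(DATA_PATH, "foo_obs.txt").toString());

// 3. Only then request a database. Indexes are built at this point,
// so the index statistics reflect the fully loaded tables.
Database database = dataStore.getDatabase(targetPartition, closedPredicates, obsPartition);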

No More Predicate Deserialization

A rarely used feature in past versions of PSL was predicate deserialization. With this feature, users could use an existing database without specifying the program’s predicates; PSL would reconstruct the predicates from the structure of the database. Unfortunately, supporting this feature significantly increased the complexity of other new features. Therefore, we dropped predicate deserialization in favor of a simpler database structure.

Users who previously relied on predicate deserialization can instead just specify predicates explicitly.

UniqueID Storage Types

The raw UniqueID type has been replaced by UniqueIntID and UniqueStringID. This allows PSL to choose a better underlying storage type and make operations more efficient. Using UniqueStringID replicates the previous behavior and should work for any data. However, UniqueIntID will almost always be faster than UniqueStringID.

If you need to construct a unique id in Java/Groovy, you can just call new directly:

UniqueIntID myIntId = new UniqueIntID(4);
UniqueStringID myStringId = new UniqueStringID("foo");
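For example, Groovy users can pick the storage type when declaring a predicate (assuming ConstantType.UniqueIntID and ConstantType.UniqueStringID are the matching argument types):

// Groovy: integer-backed arguments will generally be faster.
model.add predicate: "Foo", types: [ConstantType.UniqueIntID, ConstantType.UniqueIntID]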

Long Predicate Names Allowed

In previous versions of PSL, predicate names were limited to a certain length (which was decided by the database backend). PSL now allows arbitrarily long predicate names. If the name is too long for the database backend, then a hash of the name will be used instead.


CLI

The PSL CLI has been improved to the point that it will be sufficient for most users. Inference, weight learning, and evaluation are all supported in the CLI. All examples in the PSL examples repository have implementations in both the CLI and Groovy interfaces. To get quick information about usage, you can invoke the CLI with the --help flag.

CLI Configuration Properties

You can now pass any PSL configuration options on the CLI command line. Just use the -D option and specify the key-value pair. For example, you can run PSL with debug logging like this:

java -jar psl.jar --infer --data example.data --model example.psl -D log4j.threshold=DEBUG

Complete CLI Predicate Typing

In the CLI, you can now specify precise typing information for each predicate. Previously, all predicate arguments in the CLI were typed as UniqueStringID. The CLI data file wiki page contains all the information.

Literals in String Rules

Literals in string rules have been expanded to accept essentially any character. All literals (even numeric ones) must be quoted with single or double quotes. PSL will convert the literal to its underlying storage type.

If a literal contains the same quote character used to delimit it, then it must be escaped with a backslash. Backslashes themselves must always be escaped.

The following escapes are understood:

  • \' - Literal Single Quote
  • \" - Literal Double Quote
  • \\ - Literal Backslash
  • \t - Tab
  • \n - Newline
  • \r - Carriage Return
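For example, these hypothetical rules show a quoted string literal, a quoted numeric literal, and an escaped single quote inside a literal:

1.0: Likes(P, 'dogs') & Similar(P, Q) -> Likes(Q, 'dogs') ^2
1.0: HasAge(P, '20') & Similar(P, Q) -> HasAge(Q, '20') ^2
1.0: AuthorOf(A, 'O\'Brien') -> Famous(A) ^2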

Utilities

Getting the PSL Version

You can now get the version of PSL you are using.

Java/Groovy:

org.linqs.psl.util.Version.get()

CLI:

java -jar psl.jar --version

Random in PSL

Random values can now be statically fetched from the org.linqs.psl.util.RandUtils class. This class is thread-safe. Seeds can be set using the RandUtils.seed method or by setting the random.seed property.
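Here is a minimal sketch, assuming RandUtils mirrors the standard java.util.Random method names (check the class for the exact methods):

import org.linqs.psl.util.RandUtils;

// Seed once for reproducible runs (or set the random.seed property instead).
RandUtils.seed(12345);

// Thread-safe random values.
int roll = RandUtils.nextInt(6);
double noise = RandUtils.nextDouble();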

Parallelization in PSL

PSL now leverages multithreading in grounding and term generation. By default, PSL uses all the available threads on the machine. To set the number of threads, use the parallel.numthreads property.
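For example, CLI users can combine this property with the -D option described above:

java -jar psl.jar --infer --data example.data --model example.psl -D parallel.numthreads=4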

Those writing code for PSL can take advantage of our parallelization infrastructure in the org.linqs.psl.util.Parallel class.


Infrastructure

Configuration Reworked

Both the ConfigManager and ConfigBundle have been removed and replaced with the static-only Config class. The same methods are all supported.

Instead of:

import org.linqs.psl.config.ConfigBundle;
import org.linqs.psl.config.ConfigManager;

public static void main(String[] args) {
   ConfigManager manager = ConfigManager.getManager();
   ConfigBundle config = manager.getBundle("mybundle");

   System.out.println(config.getString("key", "default"));
}

You can now do:

import org.linqs.psl.config.Config;

public static void main(String[] args) {
   System.out.println(Config.getString("key", "default"));
}

New Canary Naming System

The purpose of having a single build named CANARY is so that anyone can always grab just that build and be confident they have the latest published build. However, the lack of a version number can be confusing. So, from now on, we will always push two canary builds: one versioned and one unversioned. The unversioned one will continue to be called CANARY. The versioned one will have a suffix matching the next release and the number of canary builds pushed. For example, this first versioned canary is called CANARY-2.1.0, the next one will be CANARY-2.1.1, and so on.

This way, those who want development improvements, but in a more static environment, can use a versioned canary. To check what canary versions are available, just look at our maven repository. I have retroactively versioned the canary released on December 8th as CANARY-2.1.0.

For more details on the canary builds, check out the PSL canary wiki page.

Inference API Rework

All inference methods now inherit from org.linqs.psl.application.inference.InferenceApplication. The inference() method (previously mpeInference()) no longer returns anything. If you need information about how inference ended, you can query for the components of inference using the following methods (a short usage sketch follows the list):

  • getGroundRuleStore()
  • getTermStore()
  • getReasoner()
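As a rough sketch (assuming MPEInference as the concrete implementation and a (model, database) constructor; check the class for the exact signature):

import org.linqs.psl.application.inference.MPEInference;

MPEInference inference = new MPEInference(model, inferDB);

// inference() (previously mpeInference()) no longer returns anything.
inference.inference();

// Query the components of inference instead.
inference.getGroundRuleStore();
inference.getTermStore();
inference.getReasoner();

inference.close();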

Moved PSL Evaluation

To support evaluation in the CLI and more general scoring in search-based weight learning, the psl-evaluation package has been moved from the psl-utils repository into the psl-core package. For most users, the only change is to remove psl-evaluation from their pom.xml file.

Users of the more obscure evaluation methods from the old psl-evaluation package can still find them in the psl-evaluation-extended package (still in psl-utils). Over time, this package will be phased out as we standardize all the evaluation methods.
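For reference, the dependency to drop from your pom.xml looks like the following (assuming the standard org.linqs group id; your version number will differ):

<!-- No longer needed: evaluation now ships with psl-core. -->
<dependency>
   <groupId>org.linqs</groupId>
   <artifactId>psl-evaluation</artifactId>
   <version>2.0.0</version>
</dependency>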

Evaluation Infrastructure Changes

All of the PSL evaluation infrastructure has been reworked. All evaluation classes (previously called Comparator, now called Evaluator) now follow the same interface.

Java/Groovy users can follow the general pattern:

Evaluator eval = new ContinuousEvaluator();
eval.compute(resultsDB, truthDB, MyTargetPredicate);
System.out.println(eval.getAllStats());

CLI users can use the --eval option and specify the type of evaluator to use:

java -jar psl.jar --infer --data example.data --model example.psl --eval org.linqs.psl.evaluation.statistics.ContinuousEvaluator

The PSL examples show how to use the evaluation infrastructure (in both Groovy and CLI).

External Function Support

External functions have been reworked to not rely on the backend database. This means that all database backends now support external functions with no additional work.

In addition, the database abstraction passed to external functions has been changed from a ReadonlyDatabase to a ReadableDatabase. This allows external functions to access more database functionality.
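As a rough sketch, an external function now looks something like the following. The package names and method signatures here are assumptions based on the description above; check the ExternalFunction interface for the exact API.

import org.linqs.psl.database.ReadableDatabase;
import org.linqs.psl.model.function.ExternalFunction;
import org.linqs.psl.model.term.Constant;
import org.linqs.psl.model.term.ConstantType;

// A simple external function that checks whether its two arguments are equal.
public class SameFunction implements ExternalFunction {
   @Override
   public int getArity() {
      return 2;
   }

   @Override
   public ConstantType[] getArgumentTypes() {
      return new ConstantType[]{ConstantType.UniqueStringID, ConstantType.UniqueStringID};
   }

   @Override
   public double getValue(ReadableDatabase db, Constant... args) {
      // The ReadableDatabase exposes more database functionality than the old
      // ReadonlyDatabase, but this simple example does not need it.
      return args[0].equals(args[1]) ? 1.0 : 0.0;
   }
}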