Python API reference

Major commands

lm_zoo.get_predictions(model: lm_zoo.models.Model, sentences: List[str], backend=None)

Compute token-level predictive distributions from a language model for the given natural language sentences. Returns an h5py File object with the following structure:

/sentence/<i>/predictions: N_tokens_i * N_vocabulary numpy ndarray of
    log-probabilities (rows are log-probability distributions)
/sentence/<i>/tokens: sequence of integer token IDs corresponding to
    indices in /vocabulary
/vocabulary: byte-encoded ndarray of vocabulary items (decode with
    numpy.char.decode(vocabulary, "utf-8"))

Parameters
  • model – lm-zoo model reference

  • sentences – list of natural language sentence strings (not pre-tokenized)
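The structure above can be navigated with plain NumPy indexing. The sketch below uses small synthetic arrays standing in for one sentence's /predictions, /tokens, and /vocabulary datasets (the values are made up, not from a real model); it looks up the log-probability the model assigned to each actual token and decodes the vocabulary as the reference suggests:

```python
import numpy as np

# Synthetic stand-in for /sentence/<i>/predictions: 3 tokens over a
# 4-item vocabulary. Each row is a log-probability distribution.
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.7],
])
predictions = np.log(probs)

# Stand-in for /sentence/<i>/tokens: integer IDs indexing /vocabulary.
tokens = np.array([0, 1, 3])

# Stand-in for /vocabulary: byte-encoded items, decoded as documented.
vocabulary = np.array([b"the", b"cat", b"dog", b"<eos>"])
words = np.char.decode(vocabulary, "utf-8")

# Log-probability assigned to each actual token: row i of the
# predictions matrix, column tokens[i].
token_logprobs = predictions[np.arange(len(tokens)), tokens]

for word, lp in zip(words[tokens], token_logprobs):
    print(word, float(lp))
```

With a real output file, `predictions`, `tokens`, and `vocabulary` would instead be read from the returned h5py File object at the paths listed above.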

lm_zoo.get_surprisals(model: lm_zoo.models.Model, sentences: List[str], backend=None)

Compute word-level surprisals from a language model for the given natural language sentences. Returns a data frame with a MultiIndex (sentence_id, token_id) (both one-indexed) and columns token and surprisal.

The surprisal of a token \(w_i\) is the negative logarithm of that token’s probability under a language model’s predictive distribution:

\[S(w_i) = -\log_2 p(w_i \mid w_1, w_2, \ldots, w_{i-1})\]

Note that surprisals are computed at the level of tokens, not words. Models that insert extra tokens (e.g., an end-of-sentence token) or that tokenize at the sub-word level (e.g., GPT2) will not have a one-to-one mapping between the rows of surprisal output from this command and words.

There is guaranteed to be a one-to-one mapping, however, between the rows of this data frame and the tokens produced by lm_zoo.tokenize.
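Because surprisals are additive in log space, the per-sentence total is just a group sum over the documented MultiIndex. The sketch below builds a frame by hand in the documented shape (the surprisal values are illustrative, not from a model) and aggregates it:

```python
import pandas as pd

# Hand-built frame mimicking the documented get_surprisals output:
# MultiIndex (sentence_id, token_id), both one-indexed, with columns
# "token" and "surprisal".
df = pd.DataFrame(
    {
        "token": ["the", "cat", "sat", "<eos>", "dogs", "bark", "<eos>"],
        "surprisal": [2.1, 7.3, 5.9, 1.2, 9.4, 4.8, 1.1],
    },
    index=pd.MultiIndex.from_tuples(
        [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3)],
        names=["sentence_id", "token_id"],
    ),
)

# Total surprisal per sentence: the negative log2-probability of the
# whole token sequence, since token surprisals sum in log space.
totals = df.groupby(level="sentence_id")["surprisal"].sum()
print(totals)
```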

lm_zoo.run_model_command(model: lm_zoo.models.Model, command_str, backend=None, pull=False, mounts=None, stdin=None, stdout=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, stderr=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>, progress_stream=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>, raise_errors=True)

Run the given shell command inside a container instantiating the given model.

Parameters
  • backend – Backend platform on which to execute the model. May be any of the string keys of lm_zoo.backends.BACKEND_DICT, or a Backend class.

  • mounts – List of bind mounts described as tuples (guest_path, host_path, mode), where mode is one of ro, rw

  • raise_errors – If True, monitor command status/output and raise errors when necessary.

Returns

Docker API response as a Python dictionary. The key StatusCode may be of interest.
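A minimal sketch of checking that return value. The real call would be `response = lm_zoo.run_model_command(model, command_str)` and requires a container backend, so a stand-in dict of the same shape (following the Docker API's wait response, where StatusCode 0 means success) is used here:

```python
# Hypothetical stand-in for the dictionary run_model_command returns.
response = {"StatusCode": 0}

# A nonzero StatusCode indicates the in-container command failed.
exit_code = response.get("StatusCode")
if exit_code != 0:
    raise RuntimeError(f"model command failed with exit code {exit_code}")
print("command succeeded")
```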

lm_zoo.spec(model: lm_zoo.models.Model, backend=None)

Get a language model specification as a dict.

lm_zoo.tokenize(model: lm_zoo.models.Model, sentences: List[str], backend=None)

Tokenize natural-language text according to a model’s preprocessing standards.

sentences should be a list of natural-language sentences.

This command returns a list of tokenized sentences, with each sentence a list of token strings. For each sentence, there is a one-to-one mapping between the tokens output by this command and the tokens scored by get_surprisals.
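That one-to-one guarantee means the two outputs zip together cleanly. The sketch below uses hypothetical return values (the sub-word tokens and surprisal numbers are made up) for what lm_zoo.tokenize and the surprisal column of lm_zoo.get_surprisals might produce for one sentence:

```python
# Hypothetical tokenize output for one sentence (sub-word tokens),
# alongside the matching surprisal values, in order.
tokenized = [["the", "ca@@", "t", "sat", "<eos>"]]
surprisals = [[2.1, 6.0, 1.3, 5.9, 1.2]]

for sent_tokens, sent_surps in zip(tokenized, surprisals):
    # The documented guarantee: exactly one surprisal per token.
    assert len(sent_tokens) == len(sent_surps)
    aligned = list(zip(sent_tokens, sent_surps))
    print(aligned)
```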

lm_zoo.unkify(model: lm_zoo.models.Model, sentences: List[str], backend=None)

Detect which words in the given natural language text are unknown to a language model.

sentences should be a list of natural-language sentences.

Returns

A list of sentence masks, each a list of 0 and 1 values. These values correspond one-to-one with the model’s tokenization of the sentence (as returned by lm_zoo.tokenize). The value 0 indicates that the corresponding token is in the model’s vocabulary; the value 1 indicates that the corresponding token is an unknown word for the model.
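One common use of the mask is substituting an unknown-word placeholder before downstream analysis. The sketch below pairs hypothetical outputs of lm_zoo.tokenize and lm_zoo.unkify for one sentence (both hand-written here, not produced by a model):

```python
# Hypothetical outputs for one sentence: tokens from lm_zoo.tokenize
# and the matching mask from lm_zoo.unkify (0 = in vocabulary,
# 1 = unknown word).
tokens = ["the", "quokka", "sat", "<eos>"]
mask = [0, 1, 0, 0]

# Replace unknown tokens with a placeholder, roughly as an
# unk-handling model would see the sentence.
unked = [tok if m == 0 else "<unk>" for tok, m in zip(tokens, mask)]
print(unked)
```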

Models

class lm_zoo.models.DockerModel(reference)

Represents a model reference stored on Docker Hub.

class lm_zoo.models.OfficialModel(model_dict)

Represents a model stored in the official registry.

classmethod from_dict(model_dict)

Initialize a Model instance from a registry dict entry.

class lm_zoo.models.SingularityModel(repository, reference)

Represents a model reference stored in a Singularity repository.

Backends

class lm_zoo.backends.Backend

Abstract class defining an interface between LM Zoo models and a containerization platform.

lm_zoo.backends.get_backend(backend_ref: Union[str, Type[lm_zoo.backends.Backend]])

Load a Backend instance for the given reference (string or class).

lm_zoo.backends.get_compatible_backend(model: lm_zoo.models.Model, preferred_backends: Union[Type[lm_zoo.backends.Backend], List[Type[lm_zoo.backends.Backend]], None] = None)

Get a compatible backend for the given model.

class lm_zoo.backends.docker.DockerBackend
class lm_zoo.backends.singularity.SingularityBackend