Python API reference

Major commands
lm_zoo.get_predictions(model: lm_zoo.models.Model, sentences: List[str], backend=None)
    Compute token-level predictive distributions from a language model for the
    given natural language sentences. Returns an h5py File object with the
    following structure:

        /sentence/<i>/predictions: N_tokens_i * N_vocabulary numpy ndarray of
            log-probabilities (rows are log-probability distributions)
        /sentence/<i>/tokens: sequence of integer token IDs corresponding to
            indices in /vocabulary
        /vocabulary: byte-encoded ndarray of vocabulary items (decode with
            numpy.char.decode(vocabulary, "utf-8"))

    Parameters:
        model – lm-zoo model reference
        sentences – list of natural language sentence strings (not
            pre-tokenized)
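As a sketch of how the returned structure might be consumed, the arrays below are illustrative stand-ins for the datasets an actual call would produce (no model is run here; only the numpy.char.decode step comes from the description above):

```python
import numpy as np

# Illustrative stand-ins for the datasets stored in the returned h5py File:
# a byte-encoded vocabulary and one sentence's log-probability rows.
vocabulary = np.array([b"<eos>", b"the", b"cat", b"sat"])
predictions = np.log(np.array([
    [0.1, 0.6, 0.2, 0.1],   # distribution predicting token 1
    [0.2, 0.1, 0.5, 0.2],   # distribution predicting token 2
]))
tokens = np.array([1, 2])   # integer token IDs indexing into /vocabulary

# Decode the byte-encoded vocabulary as the docstring suggests.
vocab = np.char.decode(vocabulary, "utf-8")

# Look up each token's log-probability in the corresponding row.
token_logprobs = predictions[np.arange(len(tokens)), tokens]
token_strings = vocab[tokens]
print(list(token_strings))   # ['the', 'cat']
```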
lm_zoo.get_surprisals(model: lm_zoo.models.Model, sentences: List[str], backend=None)
    Compute word-level surprisals from a language model for the given natural
    language sentences. Returns a data frame with a MultiIndex
    (sentence_id, token_id) (both one-indexed) and columns token and
    surprisal.

    The surprisal of a token \(w_i\) is the negative logarithm of that
    token's probability under a language model's predictive distribution:

    \[S(w_i) = -\log_2 p(w_i \mid w_1, w_2, \ldots, w_{i-1})\]

    Note that surprisals are computed at the level of tokens, not words.
    Models that insert extra tokens (e.g., an end-of-sentence token) or that
    tokenize at the sub-word level (e.g., GPT-2) will not have a one-to-one
    mapping between rows of surprisal output from this command and words.
    There is, however, guaranteed to be a one-to-one mapping between the rows
    of this data frame and the tokens produced by lm-zoo tokenize.
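The definition above can be checked with a few lines of plain Python (the probabilities here are made up for illustration; no model is involved):

```python
import math

def surprisal(p):
    """Surprisal in bits: the negative base-2 log of a token's probability."""
    return -math.log2(p)

# A token the model finds likely carries little surprisal...
print(surprisal(0.5))    # 1.0 bit
# ...while an unlikely token carries much more.
print(surprisal(0.001))  # ~9.97 bits
```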
lm_zoo.run_model_command(model: lm_zoo.models.Model, command_str, backend=None, pull=False, mounts=None, stdin=None, stdout=sys.stdout, stderr=sys.stderr, progress_stream=sys.stderr, raise_errors=True)
    Run the given shell command inside a container instantiating the given
    model.

    Parameters:
        backend – Backend platform on which to execute the model. May be any
            of the string keys of lm_zoo.backends.BACKEND_DICT, or a Backend
            class.
        mounts – List of bind mounts described as tuples
            (guest_path, host_path, mode), where mode is one of "ro", "rw".
        raise_errors – If True, monitor command status/output and raise
            errors when necessary.

    Returns:
        Docker API response as a Python dictionary. The key "StatusCode" may
        be of interest.
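The shape of the mounts argument can be sketched as below. The paths are hypothetical, and the check_mounts helper is illustrative only, not part of lm_zoo:

```python
# Hypothetical bind mounts for run_model_command: each is a
# (guest_path, host_path, mode) tuple, with mode either "ro" or "rw".
mounts = [
    ("/data", "/home/user/corpora", "ro"),   # read-only corpus
    ("/out", "/home/user/results", "rw"),    # writable output directory
]

# A small sanity check one might run before passing mounts along
# (illustrative helper, not part of the lm_zoo API).
def check_mounts(mounts):
    for guest_path, host_path, mode in mounts:
        if mode not in ("ro", "rw"):
            raise ValueError(f"bad mount mode: {mode!r}")
    return True

check_mounts(mounts)
```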
lm_zoo.spec(model: lm_zoo.models.Model, backend=None)
    Get a language model specification as a dict.
lm_zoo.tokenize(model: lm_zoo.models.Model, sentences: List[str], backend=None)
    Tokenize natural-language text according to a model's preprocessing
    standards. sentences should be a list of natural-language sentences.

    This command returns a list of tokenized sentences, with each sentence a
    list of token strings. For each sentence, there is a one-to-one mapping
    between the tokens output by this command and the tokens used by the
    get-surprisals command.
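Because of that one-to-one guarantee, token lists and surprisal rows can be paired directly. A sketch with made-up values shaped like this command's output (no container is run here):

```python
# Made-up output shaped like lm_zoo.tokenize: one token list per sentence.
tokenized = [["This", "is", "a", "sentence", "<eos>"]]
# Made-up per-token surprisals, aligned row-for-row with the tokens.
surprisals = [[5.1, 2.3, 1.0, 7.8, 0.2]]

for tokens, values in zip(tokenized, surprisals):
    # The one-to-one guarantee makes this pairing safe.
    assert len(tokens) == len(values)
    paired = list(zip(tokens, values))

print(paired[0])   # ('This', 5.1)
```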
lm_zoo.unkify(model: lm_zoo.models.Model, sentences: List[str], backend=None)
    Detect unknown words for a language model for the given natural language
    text. sentences should be a list of natural-language sentences.

    Returns:
        A list of sentence masks, each a list of 0 and 1 values. These values
        correspond one-to-one with the model's tokenization of the sentence
        (as returned by lm_zoo.tokenize). The value 0 indicates that the
        corresponding token is in the model's vocabulary; the value 1
        indicates that the corresponding token is an unknown word for the
        model.
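A sketch of how such a mask lines up with a tokenization (the tokens and mask values below are made up for illustration):

```python
# Made-up outputs shaped like lm_zoo.tokenize and lm_zoo.unkify
# for a single sentence; 1 marks an out-of-vocabulary token.
tokens = ["the", "quokka", "sat"]
mask = [0, 1, 0]

# Recover the tokens the model would treat as unknown words.
unknown = [tok for tok, m in zip(tokens, mask) if m == 1]
print(unknown)   # ['quokka']
```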
Models
class lm_zoo.models.DockerModel(reference)
    Represents a model reference stored on Docker Hub.

class lm_zoo.models.OfficialModel(model_dict)
    Represents a model stored in the official registry.

    classmethod from_dict(model_dict)
        Initialize a Model instance from a registry dict entry.

class lm_zoo.models.SingularityModel(repository, reference)
    Represents a model reference stored in a Singularity repository.
Backends
class lm_zoo.backends.Backend
    Abstract class defining an interface between LM Zoo models and a
    containerization platform.
lm_zoo.backends.get_backend(backend_ref: Union[str, Type[lm_zoo.backends.Backend]])
    Load a Backend instance for the given reference (string or class).
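The string-or-class dispatch described above can be sketched as follows. Every name here (the classes, the registry dict, and the resolver) is a stand-in for illustration, not lm_zoo's own implementation:

```python
# Stand-in classes and registry, mimicking the documented behavior:
# a string key is looked up in a registry dict; a class is used as-is.
class Backend: ...
class DockerBackend(Backend): ...
class SingularityBackend(Backend): ...

BACKEND_DICT = {"docker": DockerBackend, "singularity": SingularityBackend}

def get_backend(backend_ref):
    if isinstance(backend_ref, str):
        backend_ref = BACKEND_DICT[backend_ref]
    return backend_ref()

print(type(get_backend("docker")).__name__)             # DockerBackend
print(type(get_backend(SingularityBackend)).__name__)   # SingularityBackend
```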
lm_zoo.backends.get_compatible_backend(model: lm_zoo.models.Model, preferred_backends: Union[Type[lm_zoo.backends.Backend], List[Type[lm_zoo.backends.Backend]], None] = None)
    Get a compatible backend for the given model.
class lm_zoo.backends.docker.DockerBackend

class lm_zoo.backends.singularity.SingularityBackend