Python API reference¶
Major commands¶
lm_zoo.get_predictions(model: lm_zoo.models.Model, sentences: List[str], backend=None)¶
Compute token-level predictive distributions from a language model for the given natural language sentences. Returns an h5py File object with the following structure:

- /sentence/<i>/predictions: N_tokens_i * N_vocabulary numpy ndarray of log-probabilities (rows are log-probability distributions)
- /sentence/<i>/tokens: sequence of integer token IDs corresponding to indices in /vocabulary
- /vocabulary: byte-encoded ndarray of vocabulary items (decode with numpy.char.decode(vocabulary, "utf-8"))

Parameters
- model – lm-zoo model reference
- sentences – list of natural language sentence strings (not pre-tokenized)
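A sketch of how the returned file might be consumed. The arrays below are invented stand-ins for /vocabulary and a single row of /sentence/<i>/predictions, not real model output:

```python
import numpy as np

# Hypothetical stand-ins for the stored arrays: a byte-encoded vocabulary
# and one row of the log-probability matrix (one predictive distribution).
vocabulary = np.array([b"<eos>", b"the", b"cat", b"sat"])
log_probs = np.log2([0.1, 0.5, 0.15, 0.25])

# Decode the byte-encoded vocabulary as the description above suggests.
decoded = np.char.decode(vocabulary, "utf-8")

# The most probable next token under this distribution:
best = decoded[np.argmax(log_probs)]
print(best)  # -> the
```

With a real result, the same indexing applies to each row of `f["/sentence/<i>/predictions"]`, using `f["/vocabulary"]` as the shared vocabulary axis.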
 
 
lm_zoo.get_surprisals(model: lm_zoo.models.Model, sentences: List[str], backend=None)¶
Compute word-level surprisals from a language model for the given natural language sentences. Returns a data frame with a MultiIndex (sentence_id, token_id) (both one-indexed) and columns token and surprisal.

The surprisal of a token \(w_i\) is the negative logarithm of that token’s probability under a language model’s predictive distribution:

\[S(w_i) = -\log_2 p(w_i \mid w_1, w_2, \ldots, w_{i-1})\]

Note that surprisals are computed at the level of tokens, not words. Models that insert extra tokens (e.g., an end-of-sentence token) or that tokenize at the sub-word level (e.g., GPT2) will not have a one-to-one mapping between rows of surprisal output from this command and words. There is, however, guaranteed to be a one-to-one mapping between the rows of this output and the tokens produced by lm-zoo tokenize.
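As a minimal worked example of the formula (plain Python, not an lm_zoo call): a token assigned probability 0.25 carries a surprisal of two bits.

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits of a token with predicted probability p."""
    return -math.log2(p)

print(surprisal(0.25))  # -> 2.0  (an improbable token is more surprising)
print(surprisal(0.5))   # -> 1.0
```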
lm_zoo.run_model_command(model: lm_zoo.models.Model, command_str, backend=None, pull=False, mounts=None, stdin=None, stdout=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, stderr=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>, progress_stream=<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>, raise_errors=True)¶
Run the given shell command inside a container instantiating the given model.

Parameters
- backend – Backend platform on which to execute the model. May be any of the string keys of lm_zoo.backends.BACKEND_DICT, or a Backend class.
- mounts – List of bind mounts, each described as a tuple (guest_path, host_path, mode), where mode is one of ro, rw.
- raise_errors – If True, monitor command status/output and raise errors when necessary.

Returns
Docker API response as a Python dictionary. The key StatusCode may be of interest.
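A hedged sketch of a call. The paths and the `model` reference are hypothetical, and the `run_model_command` call itself is left commented out because it requires a container runtime:

```python
# Each bind mount is a (guest_path, host_path, mode) tuple; these paths are
# invented examples, not defaults of the library.
mounts = [("/data", "/home/me/corpora", "ro")]

# With a model reference in hand, the call would look like:
# response = lm_zoo.run_model_command(model, "ls /data", mounts=mounts)
# if response["StatusCode"] != 0:
#     raise RuntimeError("model command failed")

# mode must be one of "ro" (read-only) or "rw" (read-write):
assert all(mode in ("ro", "rw") for _, _, mode in mounts)
```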
 
lm_zoo.spec(model: lm_zoo.models.Model, backend=None)¶
Get a language model specification as a dict.
lm_zoo.tokenize(model: lm_zoo.models.Model, sentences: List[str], backend=None)¶
Tokenize natural-language text according to a model’s preprocessing standards. sentences should be a list of natural-language sentences.

This command returns a list of tokenized sentences, with each sentence a list of token strings. For each sentence, there is a one-to-one mapping between the tokens output by this command and the tokens scored by the get_surprisals command.
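Because tokenize output aligns row-for-row with get_surprisals output, the two can be zipped per sentence. The token strings and surprisal values below are invented for illustration, not real model output:

```python
# Hypothetical outputs for one sentence:
tokenized = [["the", "cat", "sat", "<eos>"]]   # shape of lm_zoo.tokenize output
surprisals = [[4.1, 7.3, 6.2, 1.0]]            # matching per-token surprisals

for tokens, scores in zip(tokenized, surprisals):
    # The one-to-one guarantee means these always have equal length.
    assert len(tokens) == len(scores)
    for token, s in zip(tokens, scores):
        print(f"{token}\t{s}")
```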
lm_zoo.unkify(model: lm_zoo.models.Model, sentences: List[str], backend=None)¶
Detect unknown words for a language model in the given natural language text. sentences should be a list of natural-language sentences.

Returns
A list of sentence masks, each a list of 0 and 1 values. These values correspond one-to-one with the model’s tokenization of the sentence (as returned by lm_zoo.tokenize). The value 0 indicates that the corresponding token is in the model’s vocabulary; the value 1 indicates that the corresponding token is an unknown word for the model.
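A local illustration of what one sentence mask encodes. The vocabulary and tokens are invented, and no model is called:

```python
# 0 = token is in the vocabulary, 1 = unknown word (per the contract above).
vocabulary = {"the", "cat", "sat", "<eos>"}
tokens = ["the", "quokka", "sat", "<eos>"]

mask = [0 if token in vocabulary else 1 for token in tokens]
print(mask)  # -> [0, 1, 0, 0]
```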
 
Models¶
class lm_zoo.models.DockerModel(reference)¶
Represents a model reference stored on Docker Hub.
class lm_zoo.models.OfficialModel(model_dict)¶
Represents a model stored in the official registry.

classmethod from_dict(model_dict)¶
Initialize a Model instance from a registry dict entry.
class lm_zoo.models.SingularityModel(repository, reference)¶
Represents a model reference stored in a Singularity repository.
Backends¶
class lm_zoo.backends.Backend¶
Abstract class defining an interface between LM Zoo models and a containerization platform.
lm_zoo.backends.get_backend(backend_ref: Union[str, Type[lm_zoo.backends.Backend]])¶
Load a Backend instance for the given reference (string or class).

lm_zoo.backends.get_compatible_backend(model: lm_zoo.models.Model, preferred_backends: Union[Type[lm_zoo.backends.Backend], List[Type[lm_zoo.backends.Backend]], None] = None)¶
Get a compatible backend for the given model.
class lm_zoo.backends.docker.DockerBackend¶

class lm_zoo.backends.singularity.SingularityBackend¶
