Package API Documentation for mlconjug

API Reference for the classes in mlconjug.mlconjug.py

MLConjug Main module.

This module declares the main classes the user interacts with.
The module defines the classes needed to interface with Machine Learning models.
mlconjug.mlconjug.extract_verb_features(verb, lang, ngram_range)[source]
Custom Vectorizer optimized for extracting verb features.
The Vectorizer subclasses sklearn.feature_extraction.text.CountVectorizer.
Because verbs in Indo-European languages are inflected by adding a morphological suffix, the vectorizer extracts verb endings and produces a vector representation of the verb with binary features.
To improve the results of the feature extraction, several other features are included.
The full feature set is the verb’s ending n-grams, starting n-grams, length of the verb, number of vowels, number of consonants and the ratio of vowels to consonants.
Parameters:
  • verb – string. Verb to vectorize.
  • lang – string. Language to analyze.
  • ngram_range – tuple. The range of the ngram sliding window.
Returns:

list. List of the most salient features of the verb for the task of finding its conjugation class.
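The feature set described above can be illustrated with a small standard-library sketch. This is not the library's implementation: the function name sketch_verb_features, the feature label strings, the vowel inventory and the default ngram_range are all assumptions made for illustration.

```python
# Assumption: a rough vowel inventory covering common accented vowels.
VOWELS = set("aeiouyàâäéèêëîïôöùûü")


def sketch_verb_features(verb, ngram_range=(2, 7)):
    """Return a list of string features for `verb` (illustrative only)."""
    verb = verb.lower()
    min_n, max_n = ngram_range
    features = []
    # Ending and starting n-grams for every n in the inclusive range
    # that does not exceed the length of the verb.
    for n in range(min_n, min(max_n, len(verb)) + 1):
        features.append("END={}".format(verb[-n:]))
        features.append("START={}".format(verb[:n]))
    # Simple counts: length, vowels, consonants.
    vowels = sum(1 for ch in verb if ch in VOWELS)
    consonants = sum(1 for ch in verb if ch.isalpha() and ch not in VOWELS)
    features.append("LEN={}".format(len(verb)))
    features.append("VOW={}".format(vowels))
    features.append("CONS={}".format(consonants))
    # The ratio is skipped when there are no consonants to avoid dividing by zero.
    if consonants:
        features.append("V/C={:.2f}".format(vowels / consonants))
    return features


print(sketch_verb_features("parler", ngram_range=(2, 3)))
```

A callable of this shape is the kind of object that can be passed as the analyzer of a scikit-learn CountVectorizer, which then turns the feature strings into a binary or count vector.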

class mlconjug.mlconjug.Conjugator(language='fr', model=None)[source]
This is the main class of the project.
The class manages the Verbiste data set and provides an interface with the scikit-learn pipeline.
If no parameters are provided, the default language is set to French and the pre-trained French conjugation pipeline is used.
The class defines the method conjugate(verb, subject), which is the main method of the module.
Parameters:
  • language – string. Language of the conjugator. The default language is ‘fr’ for French.
  • model – mlconjug.Model or scikit-learn Pipeline or Classifier implementing the fit() and predict() methods. A user-provided pipeline, for users who have trained their own.
conjugate(verb, subject='abbrev')[source]
This is the main method of this class.
It first checks to see if the verb is in Verbiste.
If it is not, and a pre-trained scikit-learn pipeline has been supplied, the method then calls the pipeline to predict the conjugation class of the provided verb.
Returns a Verb object or None.
Parameters:
  • verb – string. Verb to conjugate.
  • subject – string. Toggles abbreviated or full pronouns. The default value is ‘abbrev’. Select ‘pronoun’ for full pronouns.
Returns:

Verb object or None.
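The lookup-then-predict flow described for conjugate() can be sketched as follows. The classes SketchConjugator and LastTwoLettersModel are hypothetical stand-ins written for this example, not part of mlconjug; the sketch returns a template name and a "predicted" flag instead of a full Verb object.

```python
class SketchConjugator:
    """Hypothetical stand-in mirroring the lookup-then-predict flow."""

    def __init__(self, known_verbs, model=None):
        self.known_verbs = known_verbs  # mapping: verb -> template name
        self.model = model              # any object exposing predict(list) -> list

    def set_model(self, model):
        """Attach a pre-trained model so unknown verbs can be handled."""
        self.model = model

    def find_template(self, verb):
        """Return (template, predicted) or None when nothing can be done."""
        # 1. Prefer the data set when the verb is known.
        if verb in self.known_verbs:
            return self.known_verbs[verb], False
        # 2. Otherwise fall back to the model, if one was supplied.
        if self.model is not None:
            return self.model.predict([verb])[0], True
        # 3. Unknown verb and no model: nothing to return.
        return None


class LastTwoLettersModel:
    """Toy model (assumption): 'predicts' a label from the last two letters."""

    def predict(self, verbs):
        return [":" + verb[-2:] for verb in verbs]


conjugator = SketchConjugator({"parler": "aim:er"})
print(conjugator.find_template("parler"))
print(conjugator.find_template("danser"))
conjugator.set_model(LastTwoLettersModel())
print(conjugator.find_template("danser"))
```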

set_model(model)[source]

Assigns the provided pre-trained scikit-learn pipeline to be able to conjugate unknown verbs.

Parameters:model – scikit-learn Classifier or Pipeline.
class mlconjug.mlconjug.DataSet(verbs_dict)[source]
This class holds and manages the data set.
Defines helper methods for managing Machine Learning tasks like constructing a training and testing set.
Parameters:verbs_dict – A dictionary of verbs and their corresponding conjugation class.
construct_dict_conjug()[source]
Populates the dictionary containing the conjugation templates.
Populates the lists containing the verbs and their templates.
split_data(threshold=8, proportion=0.5)[source]

Splits the data into a training and a testing set.

Parameters:
  • threshold – int. Minimum size of conjugation class to be split.
  • proportion – float. Proportion of samples in the training set. Must be between 0 and 1.
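A per-class split with a size threshold might look like the sketch below. It assumes that conjugation classes smaller than threshold are kept entirely in the training set, which the description above does not state; the function name sketch_split and the seeded shuffle are also additions made for this example.

```python
import random
from collections import defaultdict


def sketch_split(verbs_dict, threshold=8, proportion=0.5, seed=0):
    """Split {verb: template} into (train, test) lists of (verb, template)."""
    # Group the verbs by conjugation class.
    by_class = defaultdict(list)
    for verb, template in verbs_dict.items():
        by_class[template].append(verb)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for template, verbs in by_class.items():
        verbs = sorted(verbs)
        rng.shuffle(verbs)
        if len(verbs) < threshold:
            # Assumption: classes below the threshold are not split.
            train.extend((verb, template) for verb in verbs)
            continue
        cut = round(len(verbs) * proportion)
        train.extend((verb, template) for verb in verbs[:cut])
        test.extend((verb, template) for verb in verbs[cut:])
    return train, test


data = {"verb{}er".format(i): "aim:er" for i in range(10)}
data.update({"verb{}ir".format(i): "fin:ir" for i in range(3)})
train, test = sketch_split(data, threshold=8, proportion=0.5)
print(len(train), len(test))
```

Splitting per class rather than over the whole data set keeps every sufficiently large conjugation class represented on both sides of the split.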
class mlconjug.mlconjug.Model(vectorizer=None, feature_selector=None, classifier=None, language=None)[source]

Bases: object

This class manages the scikit-learn pipeline.
The Pipeline includes a feature vectorizer, a feature selector and a classifier.
If the vectorizer, feature selector or classifier is not supplied when the instance is created, the __init__ method falls back to default values that reach more than 92% prediction accuracy.
Parameters:
  • vectorizer – scikit-learn Vectorizer.
  • feature_selector – scikit-learn Classifier with a fit_transform() method
  • classifier – scikit-learn Classifier with a predict() method
  • language – language of the corpus of verbs to be analyzed.
train(samples, labels)[source]

Trains the pipeline on the supplied samples and labels.

Parameters:
  • samples – list. List of verbs.
  • labels – list. List of verb templates.
predict(verbs)[source]

Predicts the conjugation class of the provided list of verbs.

Parameters:verbs – list. List of verbs.
Returns:list. List of predicted conjugation groups.
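The train(samples, labels) and predict(verbs) contract can be illustrated with a toy model that needs no external dependency. SuffixVoteModel is a hypothetical class written for this example; it uses a longest-suffix majority vote, which is not the scikit-learn pipeline the library builds, and the template names are only illustrative labels.

```python
from collections import Counter, defaultdict


class SuffixVoteModel:
    """Toy classifier (assumption): majority vote over the longest known suffix."""

    def __init__(self, max_suffix=3):
        self.max_suffix = max_suffix
        self.votes = defaultdict(Counter)  # suffix -> Counter of labels
        self.default = None                # most frequent label overall

    def train(self, samples, labels):
        """Count, for each suffix length, which label each suffix maps to."""
        for verb, label in zip(samples, labels):
            for n in range(1, self.max_suffix + 1):
                self.votes[verb[-n:]][label] += 1
        if labels:
            self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, verbs):
        """Return one predicted label per verb, trying long suffixes first."""
        predictions = []
        for verb in verbs:
            label = self.default
            for n in range(self.max_suffix, 0, -1):
                counts = self.votes.get(verb[-n:])
                if counts:
                    label = counts.most_common(1)[0][0]
                    break
            predictions.append(label)
        return predictions


model = SuffixVoteModel().train(
    ["parler", "chanter", "finir", "choisir"],
    ["aim:er", "aim:er", "fin:ir", "fin:ir"],
)
print(model.predict(["danser", "rougir"]))
```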

API Reference for the classes in mlconjug.PyVerbiste.py

PyVerbiste.

A Python library for conjugating verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon).
It contains conjugation data generated by machine learning models using the Python library mlconjug.
More information about mlconjug at https://pypi.org/project/mlconjug/
The conjugation data conforms to the XML schema defined by Verbiste.
class mlconjug.PyVerbiste.ConjugManager(language='default')[source]

This is the class handling the mlconjug json files.

Parameters:language – string. The language of the conjugator. The default value is fr for French. The allowed values are: fr, en, es, it, pt, ro.
_load_verbs(verbs_file)[source]

Loads and parses the verbs from the json file.

Parameters:verbs_file – string or path object. Path to the verbs json file.
_load_conjugations(conjugations_file)[source]

Loads and parses the conjugations from the xml file.

Parameters:conjugations_file – string or path object. Path to the conjugation xml file.
_detect_allowed_endings()[source]
Detects the allowed endings for verbs in the supported languages.
All the supported languages except for English restrict the form a verb can take.
As English is much more productive and varied in the morphology of its verbs, any word is allowed as a verb.
Returns:set. A set containing the allowed endings of verbs in the target language.
is_valid_verb(verb)[source]
Checks if the verb is a valid verb in the given language.
English words are always treated as possible verbs.
Verbs in other languages are filtered by their endings.
Parameters:verb – string. The verb to conjugate.
Returns:bool. True if the verb is a valid verb in the language. False otherwise.
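The ending-based filter can be sketched as below. The two-letter ending length, the use of None to mean "no restriction" for English, and the function names are assumptions made for this example rather than documented behavior.

```python
def sketch_allowed_endings(known_verbs, language):
    """Collect the endings seen in the known verbs of a language."""
    if language == "en":
        # English is not restricted: any word may be a verb.
        return None
    # Assumption: an "ending" is the last two letters of the infinitive.
    return {verb[-2:] for verb in known_verbs if len(verb) >= 2}


def sketch_is_valid_verb(verb, allowed_endings):
    """Return True when the verb's ending is allowed (or unrestricted)."""
    if allowed_endings is None:
        return True
    return verb[-2:] in allowed_endings


french_endings = sketch_allowed_endings(["parler", "finir", "vendre"], "fr")
print(sketch_is_valid_verb("danser", french_endings))
print(sketch_is_valid_verb("table", french_endings))
print(sketch_is_valid_verb("table", sketch_allowed_endings([], "en")))
```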
get_verb_info(verb)[source]

Gets verb information and returns a VerbInfo instance.

Parameters:verb – string. Verb to conjugate.
Returns:VerbInfo object or None.
get_conjug_info(template)[source]

Gets conjugation information corresponding to the given template.

Parameters:template – string. Name of the verb ending pattern.
Returns:OrderedDict or None. OrderedDict containing the conjugated suffixes of the template.
class mlconjug.PyVerbiste.Verbiste(language='default')[source]

Bases: mlconjug.PyVerbiste.ConjugManager

This is the class handling the Verbiste xml files.

Parameters:language – string. The language of the conjugator. The default value is fr for French. The allowed values are: fr, en, es, it, pt, ro.
_load_verbs(verbs_file)[source]

Loads and parses the verbs from the xml file.

Parameters:verbs_file – string or path object. Path to the verbs xml file.
_parse_verbs(file)[source]

Parses the XML file.

Parameters:file – FileObject. XML file containing the verbs.
Returns:OrderedDict. An OrderedDict containing the verb and its template for all verbs in the file.
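Parsing a verbs file into an ordered mapping of verb to template could look like the sketch below. The XML layout used here (a v element holding an i infinitive and a t template) is an assumption about the Verbiste format, and the sample data and the name sketch_parse_verbs are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import OrderedDict

# Assumed layout: each <v> entry holds an infinitive <i> and a template <t>.
SAMPLE_XML = """<verbs-fr>
  <v><i>parler</i><t>aim:er</t></v>
  <v><i>finir</i><t>fin:ir</t></v>
</verbs-fr>"""


def sketch_parse_verbs(file):
    """Return an OrderedDict mapping each infinitive to its template name."""
    verbs = OrderedDict()
    root = ET.parse(file).getroot()
    for entry in root.iter("v"):
        infinitive = entry.findtext("i")
        template = entry.findtext("t")
        # Skip incomplete entries instead of storing None values.
        if infinitive and template:
            verbs[infinitive] = template
    return verbs


parsed = sketch_parse_verbs(io.StringIO(SAMPLE_XML))
print(list(parsed.items()))
```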
_load_conjugations(conjugations_file)[source]

Loads and parses the conjugations from the xml file.

Parameters:conjugations_file – string or path object. Path to the conjugation xml file.
_parse_conjugations(file)[source]

Parses the XML file.

Parameters:file – FileObject. XML file containing the conjugation templates.
Returns:OrderedDict. An OrderedDict containing all the conjugation templates in the file.
_load_tense(tense)[source]

Loads and parses the inflected forms of the tense from the xml file.

Parameters:tense – list of xml tags containing inflected forms. The list of inflected forms for the current tense being processed.
Returns:list. List of inflected forms.
_detect_allowed_endings()
Detects the allowed endings for verbs in the supported languages.
All the supported languages except for English restrict the form a verb can take.
As English is much more productive and varied in the morphology of its verbs, any word is allowed as a verb.
Returns:set. A set containing the allowed endings of verbs in the target language.
get_conjug_info(template)

Gets conjugation information corresponding to the given template.

Parameters:template – string. Name of the verb ending pattern.
Returns:OrderedDict or None. OrderedDict containing the conjugated suffixes of the template.
get_verb_info(verb)

Gets verb information and returns a VerbInfo instance.

Parameters:verb – string. Verb to conjugate.
Returns:VerbInfo object or None.
is_valid_verb(verb)
Checks if the verb is a valid verb in the given language.
English words are always treated as possible verbs.
Verbs in other languages are filtered by their endings.
Parameters:verb – string. The verb to conjugate.
Returns:bool. True if the verb is a valid verb in the language. False otherwise.
class mlconjug.PyVerbiste.VerbInfo(infinitive, root, template)[source]

This class defines the Verbiste verb information structure.

Parameters:
  • infinitive – string. Infinitive form of the verb.
  • root – string. Lexical root of the verb.
  • template – string. Name of the verb ending pattern.
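The relationship between the three fields can be sketched with a named tuple. The sketch assumes that a template name has the form "stem:ending" and that the root is the infinitive with that ending removed; both the assumption and the names SketchVerbInfo and make_verb_info are illustrative and not taken from the library.

```python
from collections import namedtuple

# Hypothetical stand-in for the verb information structure.
SketchVerbInfo = namedtuple("SketchVerbInfo", ["infinitive", "root", "template"])


def make_verb_info(infinitive, template):
    """Derive the root by stripping the template's ending from the infinitive."""
    # Assumption: the ending is whatever follows the colon in the template.
    _, _, ending = template.partition(":")
    root = infinitive[:-len(ending)] if ending else infinitive
    return SketchVerbInfo(infinitive, root, template)


info = make_verb_info("parler", "aim:er")
print(info.infinitive, info.root, info.template)
```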
class mlconjug.PyVerbiste.Verb(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

This class defines the Verb Object. TODO: Make the conjugated forms iterable by implementing the iterator protocol.

Parameters:
  • verb_info – VerbInfo Object.
  • conjug_info – OrderedDict.
  • subject – string. Toggles abbreviated or full pronouns. The default value is ‘abbrev’. Select ‘pronoun’ for full pronouns.
  • predicted – bool. Indicates if the conjugation information was predicted by the model or retrieved from the dataset.
iterate()[source]

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.
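Flattening nested conjugation data into a list of tuples can be sketched as follows. The nesting assumed here (mood, then tense, then either a mapping of persons to forms or a single form) and the sample data are assumptions for illustration; sketch_iterate is a hypothetical name.

```python
from collections import OrderedDict


def sketch_iterate(conjug_info):
    """Flatten mood -> tense -> (person -> form | form) into tuples."""
    rows = []
    for mood, tenses in conjug_info.items():
        for tense, persons in tenses.items():
            if isinstance(persons, dict):
                # Tenses with persons yield 4-tuples.
                for person, form in persons.items():
                    rows.append((mood, tense, person, form))
            else:
                # Tenses holding a single form (e.g. an infinitive) yield 3-tuples.
                rows.append((mood, tense, persons))
    return rows


sample = OrderedDict([
    ("Infinitif", OrderedDict([("Infinitif Présent", "parler")])),
    ("Indicatif", OrderedDict([
        ("Présent", OrderedDict([("1s", "parle"), ("2s", "parles")])),
    ])),
])
for row in sketch_iterate(sample):
    print(row)
```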

_load_conjug()[source]
Populates the inflected forms of the verb.
This is the generic version of this method.
It does not add personal pronouns to the conjugated forms.
This method can handle any new language if the conjugation structure conforms to the Verbiste XML Schema.
class mlconjug.PyVerbiste.VerbFr(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug.PyVerbiste.Verb

This class defines the French Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

class mlconjug.PyVerbiste.VerbEn(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug.PyVerbiste.Verb

This class defines the English Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

class mlconjug.PyVerbiste.VerbEs(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug.PyVerbiste.Verb

This class defines the Spanish Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

class mlconjug.PyVerbiste.VerbIt(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug.PyVerbiste.Verb

This class defines the Italian Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

class mlconjug.PyVerbiste.VerbPt(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug.PyVerbiste.Verb

This class defines the Portuguese Verb Object.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.
iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

class mlconjug.PyVerbiste.VerbRo(verb_info, conjug_info, subject='abbrev', predicted=False)[source]

Bases: mlconjug.PyVerbiste.Verb

This class defines the Romanian Verb Object.

iterate()

Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.

_load_conjug()[source]
Populates the inflected forms of the verb.
Adds personal pronouns to the inflected verbs.