Package API Documentation for mlconjug
API Reference for the classes in mlconjug.mlconjug.py
MLConjug Main module.
-
mlconjug.mlconjug.extract_verb_features(verb, lang, ngram_range)
Custom Vectorizer optimized for extracting verb features. The Vectorizer subclasses sklearn.feature_extraction.text.CountVectorizer. Because verbs in Indo-European languages are inflected by adding a morphological suffix, the vectorizer extracts verb endings and produces a vector representation of the verb with binary features. To improve the results of the feature extraction, several other features are included. The full feature set is the verb’s ending n-grams, starting n-grams, length of the verb, number of vowels, number of consonants and the ratio of vowels over consonants.
Parameters:
- verb – string. Verb to vectorize.
- lang – string. Language to analyze.
- ngram_range – tuple. The range of the ngram sliding window.
Returns: list. List of the most salient features of the verb for the task of finding its conjugation class.
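The feature set described above can be illustrated with a small standalone function. This is only a sketch under stated assumptions, not the mlconjug implementation: the name sketch_verb_features, the string labels (END=, START=, LEN=, and so on) and the default vowel set are all hypothetical, and the lang parameter is replaced here by an explicit vowels argument.

```python
def sketch_verb_features(verb, ngram_range=(2, 7), vowels="aeiouy"):
    """Hypothetical re-creation of the features listed in the description."""
    verb = verb.lower()
    min_n, max_n = ngram_range
    # Never ask for an n-gram longer than the verb itself.
    top = min(max_n, len(verb))

    features = []
    # Ending n-grams, e.g. "er" and "ler" for "parler".
    features += ["END={}".format(verb[-n:]) for n in range(min_n, top + 1)]
    # Starting n-grams, e.g. "pa" and "par" for "parler".
    features += ["START={}".format(verb[:n]) for n in range(min_n, top + 1)]

    n_vowels = sum(ch in vowels for ch in verb)
    n_consonants = sum(ch.isalpha() and ch not in vowels for ch in verb)
    features.append("LEN={}".format(len(verb)))
    features.append("VOW={}".format(n_vowels))
    features.append("CONS={}".format(n_consonants))
    # Guard against division by zero for verbs without consonants.
    ratio = n_vowels / n_consonants if n_consonants else 0.0
    features.append("RATIO={:.3f}".format(ratio))
    return features


features = sketch_verb_features("parler", ngram_range=(2, 3))
```

Returning the features as labelled strings keeps ending and starting n-grams distinct, so a downstream bag-of-features vectorizer would not confuse a prefix with an identical suffix.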
-
class mlconjug.mlconjug.Conjugator(language='fr', model=None)
This is the main class of the project. The class manages the Verbiste data set and provides an interface with the scikit-learn pipeline. If no parameters are provided, the default language is set to French and the pre-trained French conjugation pipeline is used. The class defines the method conjugate(verb, subject), which is the main method of the module.
Parameters:
- language – string. Language of the conjugator. The default language is ‘fr’ for French.
- model – mlconjug.Model or scikit-learn Pipeline or Classifier implementing the fit() and predict() methods. A user-provided pipeline, for users who have trained their own.
-
conjugate(verb, subject='abbrev')
This is the main method of this class. It first checks whether the verb is in Verbiste. If it is not, and a pre-trained scikit-learn pipeline has been supplied, the method calls the pipeline to predict the conjugation class of the provided verb.
Parameters:
- verb – string. Verb to conjugate.
- subject – string. Toggles abbreviated or full pronouns. The default value is ‘abbrev’. Select ‘pronoun’ for full pronouns.
Returns: Verb object or None.
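The lookup-then-predict behaviour described for conjugate can be sketched with stand-in objects. Everything below is hypothetical and written only to illustrate the control flow: SketchConjugator, ConstantModel, the dictionary returned in place of a Verb object and the template name "aim:er" are assumptions, not part of mlconjug.

```python
class SketchConjugator:
    """Hypothetical stand-in showing the control flow, not mlconjug's code."""

    def __init__(self, known_verbs, model=None):
        # Maps an infinitive to the name of its conjugation template.
        self.known_verbs = known_verbs
        # Any object exposing predict(list_of_verbs), or None.
        self.model = model

    def conjugate(self, verb):
        predicted = False
        # Step 1: look the verb up in the known data set.
        template = self.known_verbs.get(verb)
        if template is None:
            # Step 2: fall back to the model, if one was supplied.
            if self.model is None:
                return None
            template = self.model.predict([verb])[0]
            predicted = True
        return {"infinitive": verb, "template": template, "predicted": predicted}


class ConstantModel:
    """Toy model that assigns every verb to the same (assumed) template."""

    def predict(self, verbs):
        return ["aim:er" for _ in verbs]


known = {"aimer": "aim:er"}
with_model = SketchConjugator(known, model=ConstantModel())
without_model = SketchConjugator(known)
```

In this sketch a known verb is never sent to the model, and the predicted flag records which path produced the result.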
-
class mlconjug.mlconjug.DataSet(verbs_dict)
This class holds and manages the data set. It defines helper methods for Machine Learning tasks such as constructing a training set and a testing set.
Parameters: verbs_dict – A dictionary of verbs and their corresponding conjugation class.
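As an illustration of the kind of helper mentioned above, the following sketch splits a verb dictionary into training and testing pairs. It is a hypothetical example: the function name sketch_split, the test_ratio and seed parameters and the sample template names are assumptions and do not describe the DataSet methods.

```python
import random


def sketch_split(verbs_dict, test_ratio=0.2, seed=0):
    """Hypothetical helper: split {verb: conjugation_class} into train/test pairs."""
    # Sort first so the result depends only on the seed, not on dict order.
    items = sorted(verbs_dict.items())
    random.Random(seed).shuffle(items)
    # Keep at least one item in the test set.
    n_test = max(1, int(len(items) * test_ratio))
    return items[n_test:], items[:n_test]


sample = {
    "aimer": "aim:er",
    "parler": "aim:er",
    "finir": "fin:ir",
    "choisir": "fin:ir",
    "vendre": "ten:dre",
}
train, test = sketch_split(sample)
```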
-
class mlconjug.mlconjug.Model(vectorizer=None, feature_selector=None, classifier=None, language=None)
Bases: object
This class manages the scikit-learn pipeline. The Pipeline includes a feature vectorizer, a feature selector and a classifier. If any of the vectorizer, feature selector or classifier is not supplied at instance declaration, the __init__ method will provide good default values that get more than 92% prediction accuracy.
Parameters:
- vectorizer – scikit-learn Vectorizer.
- feature_selector – scikit-learn Classifier with a fit_transform() method.
- classifier – scikit-learn Classifier with a predict() method.
- language – language of the corpus of verbs to be analyzed.
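A three-stage pipeline of the shape described above (vectorizer, feature selector, classifier) can be sketched directly with scikit-learn. The components chosen here (character n-gram CountVectorizer, SelectKBest with chi2, LinearSVC), the step names, the toy verbs and the class labels "er" and "ir" are assumptions made for illustration; they are not claimed to be the defaults that Model provides.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_sketch_pipeline(k=10):
    """Hypothetical pipeline: vectorizer -> feature selector -> classifier."""
    return Pipeline([
        # Binary character n-gram features over each verb.
        ("vectorizer", CountVectorizer(analyzer="char", ngram_range=(2, 3), binary=True)),
        # Keep the k features most associated with the class labels.
        ("feature_selector", SelectKBest(chi2, k=k)),
        # Linear classifier exposing fit() and predict().
        ("classifier", LinearSVC(random_state=0)),
    ])


# Toy training data with made-up class labels.
verbs = ["parler", "chanter", "aimer", "donner", "finir", "choisir", "grandir", "rougir"]
classes = ["er", "er", "er", "er", "ir", "ir", "ir", "ir"]

pipeline = build_sketch_pipeline().fit(verbs, classes)
prediction = pipeline.predict(["danser", "bondir"])
```

Any object with fit() and predict() could take the place of the final step, which matches the requirement stated for the classifier parameter.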
API Reference for the classes in mlconjug.PyVerbiste.py
PyVerbiste.
-
class mlconjug.PyVerbiste.ConjugManager(language='default')
This is the class handling the mlconjug json files.
Parameters: language – string. The language of the conjugator. The default value is fr for French. The allowed values are: fr, en, es, it, pt, ro.
-
_load_verbs(verbs_file)
Loads and parses the verbs from the json file.
Parameters: verbs_file – string or path object. Path to the verbs json file.
-
_load_conjugations(conjugations_file)
Loads and parses the conjugations from the xml file.
Parameters: conjugations_file – string or path object. Path to the conjugation xml file.
-
_detect_allowed_endings()
Detects the allowed endings for verbs in the supported languages. All the supported languages except English restrict the form a verb can take. Because English is much more productive and varied in the morphology of its verbs, any word is allowed as a verb.
Returns: set. A set containing the allowed endings of verbs in the target language.
-
is_valid_verb(verb)
Checks if the verb is a valid verb in the given language. English words are always treated as possible verbs. Verbs in other languages are filtered by their endings.
Parameters: verb – string. The verb to conjugate.
Returns: bool. True if the verb is a valid verb in the language. False otherwise.
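The ending-based filter described for _detect_allowed_endings and is_valid_verb can be sketched as two small functions. This is a hypothetical illustration: the function names, the two-letter ending length and the sample verb list are assumptions and are not taken from the mlconjug source.

```python
def sketch_allowed_endings(known_verbs, language, ending_length=2):
    """Hypothetical: collect the endings seen in a list of known infinitives."""
    if language == "en":
        # English is unrestricted, so no ending set is needed.
        return set()
    return {verb[-ending_length:] for verb in known_verbs if len(verb) >= ending_length}


def sketch_is_valid_verb(verb, language, allowed_endings):
    """Hypothetical: accept any English word, otherwise filter by ending."""
    if language == "en":
        return True
    return any(verb.endswith(ending) for ending in allowed_endings)


french_endings = sketch_allowed_endings(["parler", "finir", "vendre"], "fr")
```

With these sample verbs, a word such as "danser" passes the filter while "table" does not, and any English word is accepted regardless of its ending.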
-
-
class mlconjug.PyVerbiste.Verbiste(language='default')
Bases: mlconjug.PyVerbiste.ConjugManager
This is the class handling the Verbiste xml files.
Parameters: language – string. The language of the conjugator. The default value is fr for French. The allowed values are: fr, en, es, it, pt, ro.
-
_load_verbs(verbs_file)
Loads and parses the verbs from the xml file.
Parameters: verbs_file – string or path object. Path to the verbs xml file.
-
_parse_verbs(file)
Parses the XML file.
Parameters: file – FileObject. XML file containing the verbs.
Returns: OrderedDict. An OrderedDict containing the verb and its template for all verbs in the file.
-
_load_conjugations(conjugations_file)
Loads and parses the conjugations from the xml file.
Parameters: conjugations_file – string or path object. Path to the conjugation xml file.
-
_parse_conjugations(file)
Parses the XML file.
Parameters: file – FileObject. XML file containing the conjugation templates.
Returns: OrderedDict. An OrderedDict containing all the conjugation templates in the file.
-
_load_tense(tense)
Loads and parses the inflected forms of the tense from the xml file.
Parameters: tense – list of xml tags containing inflected forms. The list of inflected forms for the current tense being processed.
Returns: list. List of inflected forms.
-
_detect_allowed_endings()
Detects the allowed endings for verbs in the supported languages. All the supported languages except English restrict the form a verb can take. Because English is much more productive and varied in the morphology of its verbs, any word is allowed as a verb.
Returns: set. A set containing the allowed endings of verbs in the target language.
-
get_conjug_info(template)
Gets conjugation information corresponding to the given template.
Parameters: template – string. Name of the verb ending pattern.
Returns: OrderedDict or None. OrderedDict containing the conjugated suffixes of the template.
-
get_verb_info(verb)
Gets verb information and returns a VerbInfo instance.
Parameters: verb – string. Verb to conjugate.
Returns: VerbInfo object or None.
-
is_valid_verb(verb)
Checks if the verb is a valid verb in the given language. English words are always treated as possible verbs. Verbs in other languages are filtered by their endings.
Parameters: verb – string. The verb to conjugate.
Returns: bool. True if the verb is a valid verb in the language. False otherwise.
-
-
class mlconjug.PyVerbiste.VerbInfo(infinitive, root, template)
This class defines the Verbiste verb information structure.
Parameters:
- infinitive – string. Infinitive form of the verb.
- root – string. Lexical root of the verb.
- template – string. Name of the verb ending pattern.
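To show how the three fields relate, the sketch below derives a root from an infinitive and a template name, then attaches suffixes to that root. It is hypothetical: SketchVerbInfo, both helper functions, the "stem:ending" template naming (for example "aim:er") and the suffix list are assumptions used for illustration only.

```python
from collections import namedtuple

# Hypothetical stand-in for the verb information structure.
SketchVerbInfo = namedtuple("SketchVerbInfo", ["infinitive", "root", "template"])


def sketch_make_verb_info(infinitive, template):
    """Derive the root by stripping the template's ending from the infinitive."""
    # Assumed template naming: "stem:ending", e.g. "aim:er".
    ending = template.split(":")[1]
    root = infinitive[: len(infinitive) - len(ending)]
    return SketchVerbInfo(infinitive, root, template)


def sketch_apply_suffixes(verb_info, suffixes):
    """Attach each conjugated suffix to the lexical root."""
    return [verb_info.root + suffix for suffix in suffixes]


info = sketch_make_verb_info("parler", "aim:er")
# Present-tense suffixes of a regular French -er verb.
present = sketch_apply_suffixes(info, ["e", "es", "e", "ons", "ez", "ent"])
```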
-
class mlconjug.PyVerbiste.Verb(verb_info, conjug_info, subject='abbrev', predicted=False)
This class defines the Verb Object. TODO: Make the conjugated forms iterable by implementing the iterator protocol.
Parameters:
- verb_info – VerbInfo Object.
- conjug_info – OrderedDict.
- subject – string. Toggles abbreviated or full pronouns. The default value is ‘abbrev’. Select ‘pronoun’ for full pronouns.
- predicted – bool. Indicates if the conjugation information was predicted by the model or retrieved from the dataset.
-
class mlconjug.PyVerbiste.VerbFr(verb_info, conjug_info, subject='abbrev', predicted=False)
Bases: mlconjug.PyVerbiste.Verb
This class defines the French Verb Object.
-
_load_conjug()
Populates the inflected forms of the verb. Adds personal pronouns to the inflected verbs.
-
iterate()
Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.
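Flattening conjugated forms into a list of tuples can be sketched as follows. The nested layout assumed here (mood, then tense, then either a pronoun-to-form mapping or a single form) and the function name sketch_iterate are hypothetical; the actual structure held by a Verb object is not specified in this reference.

```python
def sketch_iterate(conjug_info):
    """Hypothetical: flatten mood -> tense -> pronoun -> form into tuples."""
    forms = []
    for mood, tenses in conjug_info.items():
        for tense, persons in tenses.items():
            if isinstance(persons, dict):
                # One tuple per pronoun and inflected form.
                for pronoun, form in persons.items():
                    forms.append((mood, tense, pronoun, form))
            else:
                # Tenses with a single form, such as an infinitive.
                forms.append((mood, tense, persons))
    return forms


sample_forms = {
    "Infinitif": {"Infinitif Présent": "parler"},
    "Indicatif": {"Présent": {"je": "parle", "tu": "parles"}},
}
flat = sketch_iterate(sample_forms)
```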
-
-
class mlconjug.PyVerbiste.VerbEn(verb_info, conjug_info, subject='abbrev', predicted=False)
Bases: mlconjug.PyVerbiste.Verb
This class defines the English Verb Object.
-
_load_conjug()
Populates the inflected forms of the verb. Adds personal pronouns to the inflected verbs.
-
iterate()
Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.
-
-
class mlconjug.PyVerbiste.VerbEs(verb_info, conjug_info, subject='abbrev', predicted=False)
Bases: mlconjug.PyVerbiste.Verb
This class defines the Spanish Verb Object.
-
_load_conjug()
Populates the inflected forms of the verb. Adds personal pronouns to the inflected verbs.
-
iterate()
Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.
-
-
class mlconjug.PyVerbiste.VerbIt(verb_info, conjug_info, subject='abbrev', predicted=False)
Bases: mlconjug.PyVerbiste.Verb
This class defines the Italian Verb Object.
-
_load_conjug()
Populates the inflected forms of the verb. Adds personal pronouns to the inflected verbs.
-
iterate()
Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.
-
-
class mlconjug.PyVerbiste.VerbPt(verb_info, conjug_info, subject='abbrev', predicted=False)
Bases: mlconjug.PyVerbiste.Verb
This class defines the Portuguese Verb Object.
-
_load_conjug()
Populates the inflected forms of the verb. Adds personal pronouns to the inflected verbs.
-
iterate()
Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.
-
-
class mlconjug.PyVerbiste.VerbRo(verb_info, conjug_info, subject='abbrev', predicted=False)
Bases: mlconjug.PyVerbiste.Verb
This class defines the Romanian Verb Object.
-
iterate()
Iterates over all conjugated forms and returns a list of tuples of those conjugated forms.
-