alex.components.slu package¶

Submodules¶

alex.components.slu.autopath module¶

self cloning, automatic path configuration

Copy this into any subdirectory of pypy from which scripts need to be run, typically all of the test subdirs. The idea is that any such script simply issues

import autopath

and this will make sure that the parent directory containing “pypy” is in sys.path.

If you modify the master “autopath.py” version (in pypy/tool/autopath.py), you can run it directly and it will copy itself over all autopath.py files it finds under the pypy root directory.

This module always provides these attributes:

pypydir: pypy root directory path
this_dir: directory where this autopath.py resides
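The path lookup described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the helper names `find_parent_of` and `ensure_on_sys_path` are hypothetical, and the self-cloning step is left out.

```python
import os
import sys


def find_parent_of(start_dir, package_name="pypy"):
    """Walk upwards from start_dir looking for a directory called package_name.

    Returns (parent_dir, package_dir), or None if no such directory exists.
    Hypothetical helper; the real autopath module may work differently.
    """
    current = os.path.abspath(start_dir)
    while True:
        parent, tail = os.path.split(current)
        if tail == package_name:
            return parent, current
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def ensure_on_sys_path(start_dir, package_name="pypy"):
    """Put the directory containing package_name on sys.path; return the package dir."""
    found = find_parent_of(start_dir, package_name)
    if found is None:
        raise ImportError("no %r directory above %s" % (package_name, start_dir))
    parent, package_dir = found
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return package_dir
```

In this sketch the returned package directory plays the role of `pypydir`, and the starting directory plays the role of `this_dir`.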

alex.components.slu.base module¶

class alex.components.slu.base.CategoryLabelDatabase(file_name=None)[source]¶

Bases: object

Provides a convenient interface to a database of slot value pairs aka category labels.

Attributes:

synonym_value_category: a list of (form, value, category label) tuples

In an utterance:

there can be multiple surface forms in an utterance
surface forms can overlap
a surface form can map to multiple category labels

Then when detecting surface forms / category labels in an utterance:

find all existing surface forms / category labels and, for every surface form and category label found, generate a new (abstracted) utterance in which the original surface form is replaced by its category label
- instead of testing all surface forms from the CLDB against the utterance from the longest to the shortest, we test all substrings of the utterance from the longest to the shortest
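The substring-driven detection described above can be sketched as follows. This is an illustration under assumptions, not the class's implementation: the database is modelled as a plain dictionary from word tuples to (value, category label) pairs, and the function names are hypothetical.

```python
def find_surface_forms(words, form2labels):
    """Return (start, end, form, value, label) for every known form in words.

    Substrings of the utterance are tested from the longest to the shortest,
    so overlapping forms and forms with several labels are all reported.
    """
    n = len(words)
    matches = []
    for length in range(n, 0, -1):  # longest substrings first
        for start in range(n - length + 1):
            form = tuple(words[start:start + length])
            for value, label in form2labels.get(form, ()):
                matches.append((start, start + length, form, value, label))
    return matches


def abstract(words, match):
    """Replace the matched surface form by its (upper-cased) category label."""
    start, end, _form, _value, label = match
    return words[:start] + [label.upper()] + words[end:]


# Toy database: note that "centre" alone and "city centre" overlap.
forms = {
    ("chinese",): [("chinese", "food")],
    ("city", "centre"): [("centre", "area")],
    ("centre",): [("centre", "area")],
}
words = "i want chinese food in the city centre".split()
matches = find_surface_forms(words, forms)
abstracted = [abstract(words, m) for m in matches]
```

Each match yields its own abstracted utterance, which mirrors the "one new utterance per surface form and category label" behaviour described above.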

form_upnames_vals¶: list of tuples (form, upnames_vals) from the database, where upnames_vals is a dictionary {name.upper(): all values for this (form, name)}.

form_val_upname¶: list of tuples (form, value, name.upper()) from the database
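To make the shape of these two attributes concrete, here is a small sketch that builds both views from a list of (form, value, name) triples. The function name `build_views` and the input format are assumptions made for illustration; the class computes these attributes from its own database.

```python
from collections import defaultdict


def build_views(triples):
    """Build (form_val_upname, form_upnames_vals) from (form, value, name) triples."""
    form_val_upname = [(form, value, name.upper()) for form, value, name in triples]

    # Group the values by form first, then by upper-cased slot name.
    grouped = defaultdict(lambda: defaultdict(list))
    for form, value, upname in form_val_upname:
        grouped[form][upname].append(value)
    form_upnames_vals = [(form, dict(upnames)) for form, upnames in grouped.items()]
    return form_val_upname, form_upnames_vals


triples = [
    (("cheap",), "cheap", "pricerange"),
    (("centre",), "centre", "area"),
    (("centre",), "centre", "near"),
]
form_val_upname, form_upnames_vals = build_views(triples)
```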

gen_form_value_cl_list()[source]¶

Generates a list of (form, value, category label) tuples from the database. The list is ordered so that the tuples with the longest surface forms come first.

Returns:	none

gen_mapping_form2value2cl()[source]¶

Generates a list of (form, value, category label) tuples from the database. The list is ordered so that the tuples with the longest surface forms come first.

Returns:	none

gen_synonym_value_category()[source]¶

load(file_name=None, db_mod=None)[source]¶

normalise_database()[source]¶: Normalise database. E.g., split utterances into sequences of words.

class alex.components.slu.base.SLUInterface(preprocessing, cfg, *args, **kwargs)[source]¶

Bases: object

Defines a prototypical interface each SLU parser should provide.

It should be able to parse:

an utterance hypothesis (an instance of UtteranceHyp)
- output: an instance of SLUHypothesis
an n-best list of utterances (an instance of UtteranceNBList)
- output: an instance of SLUHypothesis
a confusion network (an instance of UtteranceConfusionNetwork)
- output: an instance of SLUHypothesis

extract_features(*args, **kwargs)[source]¶

parse(obs, *args, **kwargs)[source]¶: Check what the input is and parse accordingly.

parse_1_best(obs, *args, **kwargs)[source]¶

parse_confnet(obs, n=40, *args, **kwargs)[source]¶

Parses an observation featuring a word confusion network using the parse_nblist method.

Arguments:

obs – a dictionary of observations (observation type -> observed value), where observation type is one of the values for 'obs_type' used in 'ft_props', and observed value is the corresponding observed value for the input
n – depth of the n-best list generated from the confusion network
args – further positional arguments that should be passed to the 'parse_1_best' method call
kwargs – further keyword arguments that should be passed to the 'parse_1_best' method call

parse_nblist(obs, *args, **kwargs)[source]¶

Parses an observation featuring an utterance n-best list using the parse_1_best method.

Arguments:

obs – a dictionary of observations (observation type -> observed value), where observation type is one of the values for 'obs_type' used in 'ft_props', and observed value is the corresponding observed value for the input
args – further positional arguments that should be passed to the 'parse_1_best' method call
kwargs – further keyword arguments that should be passed to the 'parse_1_best' method call
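The idea of parsing an n-best list through the 1-best parser can be sketched as below. This is only an illustration: the n-best list is modelled as (probability, utterance) pairs, the 1-best parser as a callable returning a {DAI: probability} dictionary, and the probability-weighted sum used to combine results is an assumption rather than the documented behaviour.

```python
def parse_nblist(nblist, parse_1_best):
    """Parse every hypothesis with parse_1_best and combine the results.

    nblist: list of (probability, utterance) pairs.
    parse_1_best: callable mapping an utterance to {dai: probability}.
    """
    total = sum(prob for prob, _ in nblist)
    if total == 0:
        return {}
    merged = {}
    for prob, utterance in nblist:
        weight = prob / total  # renormalise in case the list does not sum to one
        for dai, dai_prob in parse_1_best(utterance).items():
            merged[dai] = merged.get(dai, 0.0) + weight * dai_prob
    return merged


def toy_parser(utterance):
    """Hypothetical keyword-spotting stand-in for a trained parser."""
    if "chinese" in utterance.split():
        return {'inform(food="chinese")': 1.0}
    return {}


result = parse_nblist(
    [(0.6, "i want chinese food"), (0.4, "i want cheese food")], toy_parser
)
```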

print_classifiers(*args, **kwargs)[source]¶

prune_classifiers(*args, **kwargs)[source]¶

prune_features(*args, **kwargs)[source]¶

save_model(*args, **kwargs)[source]¶

train(*args, **kwargs)[source]¶

class alex.components.slu.base.SLUPreprocessing(cldb, text_normalization=None)[source]¶

Bases: object

Implements preprocessing of utterances or utterances and dialogue acts. The main purpose is to replace all values in the database by their category labels (slot names) to reduce the complexity of the input utterances.

In addition, it implements text normalisation for SLU input, e.g. removing filler words such as UHM, UM etc., converting “I’m” into “I am” etc. Some normalisation is hard-coded. However, it can be updated by providing normalisation patterns.

normalise(utt_hyp)[source]¶

normalise_confnet(confnet)[source]¶

Normalises the confnet (the output of an ASR).

E.g., it removes filler words such as UHM, UM, etc., converts “I’m” into “I am”, etc.

normalise_nblist(nblist)[source]¶

Normalises the N-best list (the output of an ASR).

Parameters:	nblist –
Returns:

normalise_utterance(utterance)[source]¶

Normalises the utterance (the output of an ASR).

E.g., it removes filler words such as UHM, UM, etc., converts “I’m” into “I am”, etc.

text_normalization_mapping = [(['erm'], []), (['uhm'], []), (['um'], []), (["i'm"], ['i', 'am']), (['(sil)'], []), (['(%hesitation)'], []), (['(hesitation)'], [])]¶
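A mapping of this shape can be applied to a tokenised utterance roughly as follows. The mapping is copied from the attribute above; the `normalise_words` function is a hypothetical sketch that assumes lower-cased input tokens and is not the class's own normalisation code.

```python
TEXT_NORMALIZATION_MAPPING = [
    (['erm'], []),
    (['uhm'], []),
    (['um'], []),
    (["i'm"], ['i', 'am']),
    (['(sil)'], []),
    (['(%hesitation)'], []),
    (['(hesitation)'], []),
]


def normalise_words(words, mapping=TEXT_NORMALIZATION_MAPPING):
    """Replace every occurrence of a pattern (a word list) by its replacement."""
    out = []
    i = 0
    while i < len(words):
        for pattern, replacement in mapping:
            if words[i:i + len(pattern)] == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            # No pattern starts at this position: keep the word unchanged.
            out.append(words[i])
            i += 1
    return out


cleaned = normalise_words("uhm i'm looking for a bar".split())
```

Additional patterns can be passed through the `mapping` argument, in the same spirit as the optional text_normalization argument of the constructor.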

alex.components.slu.common module¶

alex.components.slu.common.get_slu_type(cfg)[source]¶: Reads the SLU type from the configuration.

alex.components.slu.common.slu_factory(cfg, slu_type=None)[source]¶

Creates an SLU parser.

Parameters:	cfg –
slu_type –
require_model –
training –
verbose –

alex.components.slu.cued_da module¶

class alex.components.slu.cued_da.CUEDDialogueAct(da_str=None)[source]¶

Bases: alex.components.slu.da.DialogueAct

CUED-style dialogue act

parse(da_str)[source]¶

class alex.components.slu.cued_da.CUEDSlot(slot_str)[source]¶

Bases: object

parse(slot_str)[source]¶

alex.components.slu.cued_da.load_das(das_fname, limit=None, encoding=u'UTF-8')[source]¶

alex.components.slu.da module¶

class alex.components.slu.da.DialogueAct(da_str=None)[source]¶

Bases: object

Represents a dialogue act (DA), i.e., a set of dialogue act items (DAIs).

The DAIs are stored in the 'dais' attribute, sorted w.r.t. their string representation. This class is not responsible for discarding a DAI which is repeated several times, so you can obtain a DA that looks like this:

inform(food="chinese")&inform(food="chinese")

Attributes:

dais: a list of DAIs that constitute this dialogue act
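The textual form of a dialogue act can be illustrated with a small stand-alone parser. This is a simplified sketch, not the class's parser: DAIs are represented as (type, name, value) tuples, and splitting on '&' assumes that slot values never contain that character.

```python
import re

# dat(), dat(name) or dat(name="value")
DAI_RE = re.compile(r'^(?P<dat>\w+)\((?:(?P<name>\w+)(?:="(?P<value>[^"]*)")?)?\)$')


def parse_dai(text):
    """Parse one dialogue act item into a (dat, name, value) tuple."""
    match = DAI_RE.match(text.strip())
    if match is None:
        raise ValueError("cannot parse dialogue act item: %r" % text)
    return match.group("dat"), match.group("name"), match.group("value")


def format_dai(item):
    """Inverse of parse_dai."""
    dat, name, value = item
    if name is None:
        return "%s()" % dat
    if value is None:
        return "%s(%s)" % (dat, name)
    return '%s(%s="%s")' % (dat, name, value)


def parse_da(text):
    """Parse a DA string; items are sorted by string form, duplicates are kept."""
    items = [parse_dai(part) for part in text.split("&")]
    return sorted(items, key=format_dai)


da = parse_da('request(phone)&inform(food="chinese")&inform(food="chinese")')
```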

append(dai)[source]¶: Append a dialogue act item to the current dialogue act.

dais¶

extend(dais)[source]¶

get_slots_and_values()[source]¶: Returns all slot names and values in the dialogue act.

has_dat(dat)[source]¶: Checks whether any of the dialogue act items has a specific dialogue act type.

has_only_dat(dat)[source]¶: Checks whether all the dialogue act items have a specific dialogue act type.

merge(da)[source]¶

Merges another DialogueAct into self. This is done by concatenating lists of the DAIs, and sorting and merging own DAIs afterwards.

If sorting is not desired, use 'extend' instead.

merge_same_dais()[source]¶: Merges identical DAIs, i.e., if DAIs are equal on extension but differ in original values, merges the original values together and keeps a single DAI. This method causes the list of DAIs to be sorted.

parse(da_str)[source]¶

Parses the dialogue act from text.

If any DAIs have been already defined for this DA, they will be overwritten.

sort()[source]¶: Sorts own DAIs and merges the same ones.

class alex.components.slu.da.DialogueActConfusionNetwork[source]¶

Bases: alex.components.slu.da.SLUHypothesis, alex.ml.hypothesis.ConfusionNetwork

Dialogue act item confusion network. This is a very simple implementation in which all dialogue act items are assumed to be independent. Therefore, the network stores only posteriors for dialogue act items.

This can be efficiently stored as a list of DAIs each associated with its probability. The alternative for each DAI is that there is no such DAI in the DA. This can be represented as the null() dialogue act and its probability is 1 - p(DAI).

If there is more than one null() DA in the output DA, they are collapsed into one null() DA since they mean the same thing.

Please note that in the confusion network, the null() dialogue acts are not explicitly modelled.

get_best_da()[source]¶: Return the best dialogue act (one with the highest probability).

get_best_da_hyp(use_log=False, threshold=None, thresholds=None)[source]¶

Return the best dialogue act hypothesis.

Arguments:

use_log: whether to express probabilities on the log-scale (otherwise, they vanish easily in a moderately long confnet)
threshold: threshold on probabilities – items with probability exceeding the threshold will be present in the output (default: 0.5)
thresholds: threshold on probabilities – items with probability exceeding the threshold will be present in the output. This is a mapping {dai -> threshold}, and if supplied, overrides the setting of 'threshold'. If not supplied, it is ignored.
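Under the independence assumption stated for this class, threshold-based selection can be sketched as follows. This is an illustration only: the confusion network is modelled as a list of (probability, DAI string) pairs, the per-DAI thresholds fall back to the global threshold for unlisted DAIs, and scoring the hypothesis as a product over kept and dropped items is an assumption.

```python
import math


def best_da_hyp(confnet, threshold=0.5, thresholds=None, use_log=False):
    """Return (score, dais) for the best hypothesis of a DAI confusion network."""
    thresholds = thresholds or {}
    chosen = []
    score = 0.0 if use_log else 1.0
    for prob, dai in confnet:
        keep = prob > thresholds.get(dai, threshold)
        if keep:
            chosen.append(dai)
        # A dropped DAI contributes the probability of its absence.
        p = prob if keep else 1.0 - prob
        if use_log:
            score += math.log(p) if p > 0 else float("-inf")
        else:
            score *= p
    # An empty selection is reported as the null() dialogue act.
    return score, sorted(chosen) or ["null()"]


confnet = [(0.8, "inform(food=chinese)"), (0.3, "request(phone)")]
score, dais = best_da_hyp(confnet)
log_score, _ = best_da_hyp(confnet, use_log=True)
```

Working on the log scale avoids the underflow mentioned for use_log: the sum of logarithms stays representable even when the plain product would round to zero.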

get_best_nonnull_da()[source]¶

Return the best dialogue act (with the highest probability) ignoring the best null() dialogue act item.

Instead of returning the null() act, it returns the most probable DAI with a defined slot name.

get_da_nblist(n=10, prune_prob=0.005)[source]¶

Parses the input dialogue act item confusion network and generates N-best hypotheses.

The result is a list of dialogue act hypotheses, each with an assigned probability. The list also includes a dialogue act for not having the correct dialogue act in the list - other().

Generation of hypotheses will stop when the probability of the hypotheses is smaller than prune_prob.
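For a small network, the N-best generation can be illustrated by brute-force enumeration. This sketch is not the method's algorithm: it enumerates every subset of DAIs (exponential in the network size), drops hypotheses below prune_prob instead of stopping early, and assigns the remaining probability mass to other().

```python
from itertools import product


def da_nblist(confnet, n=10, prune_prob=0.005):
    """Enumerate DA hypotheses of a list of (probability, dai) pairs."""
    hyps = []
    for mask in product((True, False), repeat=len(confnet)):
        prob = 1.0
        dais = []
        for keep, (p, dai) in zip(mask, confnet):
            prob *= p if keep else 1.0 - p
            if keep:
                dais.append(dai)
        if prob >= prune_prob:
            hyps.append((prob, "&".join(sorted(dais)) or "null()"))
    hyps.sort(key=lambda hyp: (-hyp[0], hyp[1]))
    hyps = hyps[:n]
    # Whatever mass is not covered by the list goes to other().
    rest = 1.0 - sum(prob for prob, _ in hyps)
    if rest > 1e-12:
        hyps.append((rest, "other()"))
    return hyps


nblist = da_nblist([(0.8, "a()"), (0.3, "b()")], n=2)
```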

items()[source]¶

classmethod make_from_da(da)[source]¶

class alex.components.slu.da.DialogueActHyp(prob=None, da=None)[source]¶

Bases: alex.components.slu.da.SLUHypothesis

Provides functionality of 1-best hypotheses for dialogue acts.

get_best_da()[source]¶

get_da_nblist()[source]¶

class alex.components.slu.da.DialogueActItem(dialogue_act_type=None, name=None, value=None, dai=None, attrs=None, alignment=None)[source]¶

Bases: alex.ml.features.Abstracted

Represents a dialogue act item, which is a component of a dialogue act.

Each dialogue act item is composed of

dialogue act type - e.g. inform, confirm, request, select, hello
slot name and value pair - e.g. area, pricerange, food for name and centre, cheap, or Italian for value

Attributes:

dat: dialogue act type (a string)
name: slot name (a string or None)
value: slot value (a string or None)

add_unnorm_value(newval)[source]¶: Registers 'newval' as another alternative unnormalised value for the value of this DAI's slot.

alignment¶

category_label2value(catlabs=None)[source]¶

Use this method to substitute back the original value for the category label as the value of this DAI.

Arguments:

catlabs: an optional mapping of category labels to tuples (slot value, surface form), as obtained from alex.components.slu:SLUPreprocessing

If this object does not remember its original value, it takes it from the provided mapping.

dat¶

extension()[source]¶: Returns an extension of self, i.e., a new DialogueActItem without hidden fields, such as the original value/category label.

get_unnorm_values()[source]¶: Retrieves the original unnormalised values of this DAI.

has_category_label()[source]¶: whether the current DAI value is the category label

is_null()[source]¶: whether this object represents the ‘null()’ DAI

iter_typeval()[source]¶

merge_unnorm_values(other)[source]¶: Merges unnormalised values of 'other' to unnormalised values of 'self'.

name¶

normalised2value()[source]¶

Use this method to substitute back an unnormalised value for the normalised one as the value of this DAI.

Returns True iff substitution took place. Returns False if no more unnormalised values are remembered as a source for the normalised value.

orig_values¶

parse(dai_str)[source]¶: Parses the dialogue act item in text format into a structured form.

replace_typeval(orig, replacement)[source]¶

splitter = u':'¶

unnorm_values¶

value¶

value2category_label(label=None)[source]¶: Use this method to substitute a category label for the value of this DAI.

value2normalised(normalised)[source]¶: Use this method to substitute a normalised value for the value of this DAI.

class alex.components.slu.da.DialogueActNBList[source]¶

Bases: alex.components.slu.da.SLUHypothesis, alex.ml.hypothesis.NBList

Provides functionality of N-best lists for dialogue acts.

When updating the N-best list, one should do the following.

add DAs or parse a confusion network
merge and normalise, in either order

Attributes:

n_best: the list containing pairs [prob, DA] sorted from the most probable to the least probable ones

add_other()[source]¶

get_best_da()[source]¶

Returns the most probable dialogue act.

DEPRECATED. Use get_best instead.

get_best_nonnull_da()[source]¶: Return the best dialogue act (with the highest probability).

get_confnet()[source]¶

has_dat(dat)[source]¶

merge()[source]¶: Adds up probabilities for the same hypotheses. Takes care to keep track of original, unnormalised DAI values. Returns self.

normalise()[source]¶

The N-best list is extended to include the “other()” dialogue act to represent those semantic hypotheses which are not included in the N-best list.

DEPRECATED. Use add_other instead.

scale()[source]¶: Scales the n-best list to sum to one.
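The merge and scale steps can be sketched on a plain list of [prob, DA] pairs, with DAs represented as strings. The function names are hypothetical, and the tracking of original, unnormalised DAI values that merge() performs is left out of this illustration.

```python
def merge_nblist(n_best):
    """Add up probabilities of identical hypotheses; most probable first."""
    totals = {}
    for prob, da in n_best:
        totals[da] = totals.get(da, 0.0) + prob
    return sorted(((p, da) for da, p in totals.items()), key=lambda h: (-h[0], h[1]))


def scale_nblist(n_best):
    """Rescale the probabilities so that they sum to one."""
    total = sum(prob for prob, _ in n_best)
    if total <= 0:
        raise ValueError("cannot scale an n-best list with zero total probability")
    return [(prob / total, da) for prob, da in n_best]


merged = merge_nblist([(0.2, "hello()"), (0.5, "bye()"), (0.1, "hello()")])
scaled = scale_nblist(merged)
```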

sort()[source]¶: DEPRECATED, going to be removed.

class alex.components.slu.da.SLUHypothesis[source]¶

Bases: alex.ml.hypothesis.Hypothesis

This is the base class for all forms of probabilistic SLU hypotheses representations.

alex.components.slu.da.load_das(das_fname, limit=None, encoding=u'UTF-8')[source]¶

Loads a dictionary of DAs from a given file.

The file is assumed to contain lines of the following form:

[[:space:]..]<key>[[:space:]..]=>[[:space:]..]<DA>[[:space:]..]

or just (without keys):

[[:space:]..]<DA>[[:space:]..]

Arguments:

das_fname – path to the file to read the DAs from
limit – limit on the number of DAs to read
encoding – the file encoding

Returns a dictionary with DAs (instances of DialogueAct) as values.
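The two line formats can be illustrated with a small parser over an iterable of lines. This is a sketch, not the function itself: DAs are kept as plain strings instead of DialogueAct instances, and using a running index as the key for keyless lines is an assumption.

```python
def parse_da_lines(lines, limit=None):
    """Parse '<key> => <DA>' or bare '<DA>' lines into a dictionary."""
    das = {}
    for line in lines:
        if limit is not None and len(das) >= limit:
            break
        line = line.strip()
        if not line:
            continue
        key, sep, da = line.partition("=>")
        if sep:
            key, da = key.strip(), da.strip()
        else:
            # Keyless line: fall back to a running index (an assumption).
            key, da = str(len(das)), line
        das[key] = da
    return das


keyed = parse_da_lines(['  utt1 => hello()  ', '', 'utt2=>inform(food="chinese")'])
keyless = parse_da_lines(["hello()", "bye()"])
```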

alex.components.slu.da.merge_slu_confnets(confnet_hyps)[source]¶: Merge multiple dialogue act confusion networks.

alex.components.slu.da.merge_slu_nblists(multiple_nblists)[source]¶: Merge multiple dialogue act N-best lists.

alex.components.slu.da.save_das(file_name, das, encoding=u'UTF-8')[source]¶

alex.components.slu.dailrclassifier module¶

This is a rewrite of the DAILogRegClassifier from dailrclassifier_old.py. The underlying approach is the same; however, the way the features are computed has changed significantly.

class alex.components.slu.dailrclassifier.DAILogRegClassifier(cldb, preprocessing, features_size=4, *args, **kwargs)[source]¶

Bases: alex.components.slu.base.SLUInterface

Implements learning of dialogue act item classifiers based on logistic regression.

The parser is based on a set of classifiers, one for each dialogue act item. When parsing the input utterance, the parser classifies whether a given dialogue act item is present. The output dialogue act is then composed of all detected dialogue act items.

A dialogue act is defined as a composition of dialogue act items, e.g.

confirm(drinks="wine")&inform(name="kings shilling") <=> 'does kings serve wine'

where confirm(drinks="wine") and inform(name="kings shilling") are two dialogue act items.

This parser uses logistic regression as the classifier of the dialogue act items.

abstract_utterance(utterance)[source]¶

Return a list of possible abstractions of the utterance.

Parameters:	utterance – an Utterance instance
Returns:	a list of abstracted utterance, form, value, category label tuples

extract_classifiers(das, utterances, verbose=False)[source]¶

gen_classifiers_data(min_pos_feature_count=5, min_neg_feature_count=5, verbose=False, verbose2=False)[source]¶

get_abstract_da(da, fvcs)[source]¶

get_abstract_utterance(utterance, fvc)[source]¶

Return an utterance with the form in fvc abstracted to its category label

Parameters:	utterance – an Utterance instance
fvc – a form, value, category label tuple
Returns:	the abstracted utterance

get_abstract_utterance2(utterance)[source]¶

Return an utterance with the form in fvc abstracted to its category label

Parameters:	utterance – an Utterance instance
Returns:	the abstracted utterance

get_features(obs, fvc, fvcs)[source]¶

Generate utterance features for a specific utterance given by utt_idx.

Parameters:	obs – the utterance being processed in multiple formats
fvc – a form, value, category label tuple describing how the utterance should be abstracted
Returns:	a set of features from the utterance

get_features_in_confnet(confnet, fvc, fvcs)[source]¶

get_features_in_nblist(nblist, fvc, fvcs)[source]¶

get_features_in_utterance(utterance, fvc, fvcs)[source]¶

Returns features extracted from the utterance observation. At this moment, the function extracts N-grams of size self.feature_size. These N-grams are extracted from:

the original utterance,
the abstracted utterance for the given FVC,
the abstracted utterance in which all other FVCs are abstracted as well

Parameters:	utterance – fvc –
Returns:	the UtteranceFeatures instance
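N-gram extraction of this kind can be sketched as follows. The function is a stand-alone illustration, not the classifier's feature code: it counts all contiguous n-grams up to the given size, which is an assumption about how the size parameter is interpreted, and the CL_FOOD label in the example is hypothetical.

```python
from collections import Counter


def ngrams(words, size):
    """Count all contiguous n-grams of length 1..size in a list of words."""
    feats = Counter()
    for n in range(1, size + 1):
        for start in range(len(words) - n + 1):
            feats[tuple(words[start:start + n])] += 1
    return feats


# Features from the original and from an abstracted utterance.
original = ngrams("i want chinese food".split(), 2)
abstracted = ngrams("i want CL_FOOD food".split(), 2)
```

Extracting the same n-grams from the abstracted utterance lets a single classifier generalise over all values of a slot, which is the motivation for abstraction given in the preprocessing section.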

get_fvc(*args, **kwds)[source]¶

This function returns the form, value, category label tuples for any of the following classes:

Utterance
UtteranceNBList
UtteranceConfusionNetwork

Parameters:	obs – the utterance being processed in multiple formats
Returns:	a list of form, value, and category label tuples found in the input sentence

get_fvc_in_confnet(confnet)[source]¶

Return a list of all form, value, category label tuples in the confusion network.

Parameters:	confnet – an UtteranceConfusionNetwork instance
Returns:	a list of form, value, and category label tuples found in the input sentence

get_fvc_in_nblist(nblist)[source]¶

Return a list of all form, value, category label tuples in the nblist.

Parameters:	nblist – an UtteranceNBList instance
Returns:	a list of form, value, and category label tuples found in the input sentence

get_fvc_in_utterance(utterance)[source]¶

Return a list of all form, value, category label tuples in the utterance. This is useful to find/guess what category label level classifiers will be necessary to instantiate.

Parameters:	utterance – an Utterance instance
Returns:	a list of form, value, and category label tuples found in the input sentence

load_model(file_name)[source]¶

parse_1_best(obs={}, ret_cl_map=False, verbose=False, *args, **kwargs)[source]¶

Parse utterance and generate the best interpretation in the form of a dialogue act (an instance of DialogueAct).

The result is the dialogue act confusion network.

parse_X(utterance, verbose=False)[source]¶

parse_confnet(obs, verbose=False, *args, **kwargs)[source]¶: Parses the word confusion network by generating an n-best list and parsing this n-best list.

parse_nblist(obs, verbose=False, *args, **kwargs)[source]¶: Parses n-best list by parsing each item on the list and then merging the results.

print_classifiers()[source]¶

prune_classifiers(min_classifier_count=5)[source]¶

prune_features(clser, min_pos_feature_count, min_neg_feature_count, verbose=False)[source]¶

save_model(file_name, gzip=None)[source]¶

train(inverse_regularisation=1.0, verbose=True)[source]¶

class alex.components.slu.dailrclassifier.Features[source]¶

Bases: object

This is a simple feature object. It is a lightweight version of the unnecessarily complicated alex.ml.features.Features class.

get_feature_vector(features_mapping)[source]¶

get_feature_vector_lil(features_mapping)[source]¶

merge(features, weight=1.0, prefix=None)[source]¶

Merges the passed feature dictionary with its own features. A weight factor can be applied to the features, or they can be added as binary features. If a prefix is provided, the features are added with the prefixed feature name.

Parameters:	features – a dictionary-like object with features as keys and values
weight – a weight of the added features with respect to the already existing features. If None, the features are added as binary features
prefix – a prefix for the names of the added features. This is useful when one wants to distinguish between similarly generated features
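The merging behaviour described above might look like the following on plain dictionaries. This is a sketch under assumptions: the function name is hypothetical, prefixed names are represented as (prefix, name) tuples, and a binary feature is stored as 1.0.

```python
def merge_features(own, features, weight=1.0, prefix=None):
    """Merge `features` into the `own` dictionary in place and return it."""
    for name, value in features.items():
        if prefix is not None:
            name = (prefix, name)
        if weight is None:
            own[name] = 1.0  # binary feature: only presence is recorded
        else:
            own[name] = own.get(name, 0.0) + weight * value
    return own


own = merge_features({"a": 1.0}, {"a": 2.0, "b": 1.0}, weight=0.5)
binary = merge_features({}, {"a": 3.0}, weight=None, prefix="abstracted")
```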

prune(remove_features)[source]¶

Prune all features in the remove_features set.

Parameters:	remove_features – a set of features to be pruned.

scale(scale=1.0)[source]¶

Scale all features by the scale factor.

Parameters:	scale – the scale factor.

class alex.components.slu.dailrclassifier.UtteranceFeatures(type=u'ngram', size=3, utterance=None)[source]¶

Bases: alex.components.slu.dailrclassifier.Features

This is a simple feature object. It is a lightweight version of the alex.components.asr.utterance.UtteranceFeatures class.

parse(utt)[source]¶

alex.components.slu.dainnclassifier module¶

alex.components.slu.exceptions module¶

exception alex.components.slu.exceptions.CuedDialogueActError[source]¶: Bases: alex.components.slu.exceptions.SLUException

exception alex.components.slu.exceptions.DAIKernelException[source]¶: Bases: alex.components.slu.exceptions.SLUException

exception alex.components.slu.exceptions.DAILRException[source]¶: Bases: alex.components.slu.exceptions.SLUException

exception alex.components.slu.exceptions.DialogueActConfusionNetworkException[source]¶: Bases: alex.components.slu.exceptions.SLUException, alex.ml.hypothesis.ConfusionNetworkException

exception alex.components.slu.exceptions.DialogueActException[source]¶: Bases: alex.components.slu.exceptions.SLUException

exception alex.components.slu.exceptions.DialogueActItemException[source]¶: Bases: alex.components.slu.exceptions.SLUException

exception alex.components.slu.exceptions.DialogueActNBListException[source]¶: Bases: alex.components.slu.exceptions.SLUException

exception alex.components.slu.exceptions.SLUConfigurationException[source]¶: Bases: alex.components.slu.exceptions.SLUException

exception alex.components.slu.exceptions.SLUException[source]¶: Bases: alex.AlexException

alex.components.slu.templateclassifier module¶

class alex.components.slu.templateclassifier.TemplateClassifier(config)[source]¶

Bases: object

This parser is based on matching examples of utterances with known semantics against the input utterance. The semantics of the example utterance which is closest to the input utterance is provided as the output semantics.

"Hi" => hello()
"I can you give me a phone number" => request(phone)
"I would like to have a phone number please" => request(phone)

The first match is reported as the resulting dialogue act.
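A first-match template lookup can be sketched as follows. This is an illustration only: rules are read from lines in the `"example" => DA` form shown above, matching is done by exact comparison after lower-casing and whitespace normalisation (an assumption, since the class speaks of the closest example), and null() is returned when nothing matches.

```python
def normalise_text(text):
    """Strip quotes, lower-case and collapse whitespace."""
    return " ".join(text.strip().strip('"“”').lower().split())


def read_rules(lines):
    """Read (example, DA) pairs from '"example" => DA' lines."""
    rules = []
    for line in lines:
        if "=>" not in line:
            continue
        example, da = line.split("=>", 1)
        rules.append((normalise_text(example), da.strip()))
    return rules


def parse(utterance, rules, default="null()"):
    """Return the DA of the first rule whose example matches the utterance."""
    wanted = normalise_text(utterance)
    for example, da in rules:
        if example == wanted:
            return da
    return default


rules = read_rules([
    '"Hi" => hello()',
    '"I would like to have a phone number please" => request(phone)',
])
```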

parse(asr_hyp)[source]¶

readRules(file_name)[source]¶

alex.components.slu.test_da module¶

class alex.components.slu.test_da.TestDA(methodName='runTest')[source]¶

Bases: unittest.case.TestCase

test_merge_slu_confnets()[source]¶

test_merge_slu_nblists_full_nbest_lists()[source]¶

test_swapping_merge_normalise()[source]¶

class alex.components.slu.test_da.TestDialogueActConfusionNetwork(methodName='runTest')[source]¶

Bases: unittest.case.TestCase

test_add_merge()[source]¶

test_get_best_da()[source]¶

test_get_best_da_hyp()[source]¶

test_get_best_nonnull_da()[source]¶

test_get_da_nblist()[source]¶

test_get_prob()[source]¶

test_make_from_da()[source]¶

test_merge()[source]¶

test_normalise()[source]¶

test_prune()[source]¶

test_sort()[source]¶

alex.components.slu.test_dailrclassifier module¶

class alex.components.slu.test_dailrclassifier.TestDAILogRegClassifier(methodName='runTest')[source]¶

Bases: unittest.case.TestCase

test_parse_X()[source]¶

alex.components.slu.test_dainnclassifier module¶

class alex.components.slu.test_dainnclassifier.TestDAINNClassifier(methodName='runTest')[source]¶

Bases: unittest.case.TestCase

setUp()[source]¶

tearDown()[source]¶

test_parse_X()[source]¶

Module contents¶