alex.components.slu package¶
Submodules¶
alex.components.slu.autopath module¶
self cloning, automatic path configuration
Copy this into any subdirectory of pypy from which scripts need to be run, typically all of the test subdirs. The idea is that any such script simply issues
import autopath
and this will make sure that the parent directory containing “pypy” is in sys.path.
If you modify the master “autopath.py” version (in pypy/tool/autopath.py), you can run it directly, and it will copy itself over all autopath.py files it finds under the pypy root directory.
This module always provides these attributes:
- pypydir: pypy root directory path
- this_dir: directory where this autopath.py resides
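The path setup described above can be sketched as follows. This is only an illustration of the idea, not the actual autopath.py; the function name `add_root_parent_to_path` and its parameters are hypothetical.

```python
import os
import sys


def add_root_parent_to_path(start_dir, root_name):
    """Walk up from start_dir to the directory called root_name and make
    sure its parent is in sys.path. Returns (root_dir, this_dir)."""
    this_dir = os.path.abspath(start_dir)
    current = this_dir
    while True:
        parent, tail = os.path.split(current)
        if tail == root_name:
            if parent not in sys.path:
                sys.path.insert(0, parent)
            return current, this_dir
        if parent == current:  # reached the filesystem root without a match
            raise OSError("no %r directory above %r" % (root_name, start_dir))
        current = parent
```

A script would call it with its own directory, e.g. `add_root_parent_to_path(os.path.dirname(__file__), "pypy")`.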
alex.components.slu.base module¶
-
class
alex.components.slu.base.
CategoryLabelDatabase
(file_name=None)[source]¶ Bases:
object
Provides a convenient interface to a database of slot value pairs aka category labels.
- Attributes:
- synonym_value_category: a list of (form, value, category label) tuples
In an utterance:
- there can be multiple surface forms
- surface forms can overlap
- a surface form can map to multiple category labels
Then when detecting surface forms / category labels in an utterance:
- find all existing surface forms / category labels and, for every found surface form and category label, generate a new utterance (called abstracted) in which the original surface form is replaced by its category label
- instead of testing all surface forms from the CLDB from the longest to the shortest in the utterance, test all the substrings in the utterance from the longest to the shortest
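The detection strategy above can be sketched as follows. This is a simplified illustration under assumptions, not the class's implementation: `form_index`, `find_surface_forms` and `abstract` are hypothetical names, and the database is reduced to a dictionary from word tuples to (value, category label) pairs.

```python
def find_surface_forms(words, form_index):
    """Return (start, end, form, value, category_label) for every substring
    of words that is a known surface form, longest substrings first."""
    found = []
    n = len(words)
    for length in range(n, 0, -1):  # longest substrings first
        for start in range(0, n - length + 1):
            form = tuple(words[start:start + length])
            for value, label in form_index.get(form, []):
                found.append((start, start + length, form, value, label))
    return found


def abstract(words, match):
    """Replace the matched surface form by its category label."""
    start, end, _form, _value, label = match
    return words[:start] + [label] + words[end:]


# Hypothetical database content: overlapping forms for the same value.
form_index = {
    ("kings", "shilling"): [("kings shilling", "NAME")],
    ("kings",): [("kings shilling", "NAME")],
    ("wine",): [("wine", "DRINKS")],
}
words = "does kings shilling serve wine".split()
matches = find_surface_forms(words, form_index)
abstracted = [abstract(words, m) for m in matches]
```

Each match yields its own abstracted utterance, so overlapping surface forms such as "kings" and "kings shilling" are both kept.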
-
form_upnames_vals
¶ list of tuples (form, upnames_vals) from the database where upnames_vals is a dictionary
{name.upper(): all values for this (form, name)}.
-
form_val_upname
¶ list of tuples (form, value, name.upper()) from the database
-
gen_form_value_cl_list
()[source]¶ Generates a list of (form, value, category label) tuples from the database. The list is ordered so that the tuples with the longest surface forms come first.
Returns: none
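A minimal sketch of how the two list attributes and the longest-first ordering relate. The `database` layout (slot name -> value -> surface forms) is an assumption made for this example, not something stated in this reference.

```python
from collections import defaultdict

# Assumed layout: slot name -> value -> list of surface forms.
database = {
    "food": {"chinese": ["chinese", "chinese food"]},
    "area": {"centre": ["centre", "city centre"]},
}

# (form, value, NAME) tuples; a form is stored as a tuple of words.
form_val_upname = [
    (tuple(form.split()), value, name.upper())
    for name, values in database.items()
    for value, forms in values.items()
    for form in forms
]
# Tuples with the longest surface forms go first.
form_val_upname.sort(key=lambda fvc: len(fvc[0]), reverse=True)

# Group by form: form -> {NAME: [all values for this (form, name)]}.
grouped = defaultdict(lambda: defaultdict(list))
for form, value, upname in form_val_upname:
    grouped[form][upname].append(value)
form_upnames_vals = [(form, dict(upnames)) for form, upnames in grouped.items()]
```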
-
class
alex.components.slu.base.
SLUInterface
(preprocessing, cfg, *args, **kwargs)[source]¶ Bases:
object
Defines a prototypical interface each SLU parser should provide.
- It should be able to parse:
- an utterance hypothesis (an instance of UtteranceHyp)
- output: an instance of SLUHypothesis
- an n-best list of utterances (an instance of UtteranceNBList)
- output: an instance of SLUHypothesis
- a confusion network (an instance of UtteranceConfusionNetwork)
- output: an instance of SLUHypothesis
-
parse_confnet
(obs, n=40, *args, **kwargs)[source]¶ Parses an observation featuring a word confusion network using the parse_nblist method.
- Arguments:
- obs – a dictionary of observations :: observation type -> observed value, where observation type is one of the values for `obs_type’ used in `ft_props’, and observed value is the corresponding observed value for the input
- n – depth of the n-best list generated from the confusion network
- args – further positional arguments that should be passed to the `parse_1_best’ method call
- kwargs – further keyword arguments that should be passed to the `parse_1_best’ method call
-
parse_nblist
(obs, *args, **kwargs)[source]¶ Parses an observation featuring an utterance n-best list using the parse_1_best method.
- Arguments:
- obs – a dictionary of observations :: observation type -> observed value, where observation type is one of the values for `obs_type’ used in `ft_props’, and observed value is the corresponding observed value for the input
- args – further positional arguments that should be passed to the `parse_1_best’ method call
- kwargs – further keyword arguments that should be passed to the `parse_1_best’ method call
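One plausible way to build an n-best parser on top of a 1-best parser is sketched below. The weighting scheme is an assumption, since this reference does not say how the per-utterance results are combined; `parse_nblist_sketch` and `toy_parser` are hypothetical names.

```python
def parse_nblist_sketch(nblist, parse_1_best):
    """nblist: list of (probability, utterance) pairs.
    parse_1_best: function mapping an utterance to {dai: probability}.
    Returns {dai: probability} weighted by the utterance probabilities."""
    merged = {}
    for utt_prob, utterance in nblist:
        for dai, dai_prob in parse_1_best(utterance).items():
            merged[dai] = merged.get(dai, 0.0) + utt_prob * dai_prob
    return merged


def toy_parser(utterance):
    """Stand-in for a real 1-best parser, for illustration only."""
    if "chinese" in utterance:
        return {'inform(food="chinese")': 1.0}
    return {"hello()": 1.0}


result = parse_nblist_sketch([(0.7, "chinese food"), (0.3, "hi")], toy_parser)
```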
-
class
alex.components.slu.base.
SLUPreprocessing
(cldb, text_normalization=None)[source]¶ Bases:
object
Implements preprocessing of utterances or utterances and dialogue acts. The main purpose is to replace all values in the database by their category labels (slot names) to reduce the complexity of the input utterances.
In addition, it implements text normalisation for SLU input, e.g. removing filler words such as UHM, UM etc., converting “I’m” into “I am” etc. Some normalisation is hard-coded. However, it can be updated by providing normalisation patterns.
-
normalise_confnet
(confnet)[source]¶ Normalises the confnet (the output of an ASR).
E.g., it removes filler words such as UHM, UM, etc., converts “I’m” into “I am”, etc.
-
normalise_nblist
(nblist)[source]¶ Normalises the N-best list (the output of an ASR).
Parameters: nblist – Returns:
-
normalise_utterance
(utterance)[source]¶ Normalises the utterance (the output of an ASR).
E.g., it removes filler words such as UHM, UM, etc., converts “I’m” into “I am”, etc.
-
text_normalization_mapping
= [(['erm'], []), (['uhm'], []), (['um'], []), (["i'm"], ['i', 'am']), (['(sil)'], []), (['(%hesitation)'], []), (['(hesitation)'], [])]¶
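A minimal sketch of how such a mapping could be applied to a tokenised utterance. The mapping is copied from above; the `normalise` function and the lower-casing step are assumptions of this example rather than the library's code.

```python
text_normalization_mapping = [
    (["erm"], []), (["uhm"], []), (["um"], []),
    (["i'm"], ["i", "am"]),
    (["(sil)"], []), (["(%hesitation)"], []), (["(hesitation)"], []),
]


def normalise(words, mapping=text_normalization_mapping):
    """Replace every occurrence of each pattern (a list of words) by its
    replacement (a possibly empty list of words)."""
    words = [w.lower() for w in words]
    for pattern, replacement in mapping:
        out, i, k = [], 0, len(pattern)
        while i < len(words):
            if words[i:i + k] == pattern:
                out.extend(replacement)
                i += k
            else:
                out.append(words[i])
                i += 1
        words = out
    return words


normalised = normalise("UHM I'm looking for um a bar (sil)".split())
```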
-
alex.components.slu.common module¶
alex.components.slu.cued_da module¶
-
class
alex.components.slu.cued_da.
CUEDDialogueAct
(da_str=None)[source]¶ Bases:
alex.components.slu.da.DialogueAct
CUED-style dialogue act
alex.components.slu.da module¶
-
class
alex.components.slu.da.
DialogueAct
(da_str=None)[source]¶ Bases:
object
Represents a dialogue act (DA), i.e., a set of dialogue act items (DAIs).
The DAIs are stored in the `dais’ attribute, sorted w.r.t. their string representation. This class is not responsible for discarding a DAI which is repeated several times, so that you can obtain a DA that looks like this:
inform(food="chinese")&inform(food="chinese")
- Attributes:
- dais: a list of DAIs that constitute this dialogue act
-
dais
¶
-
has_dat
(dat)[source]¶ Checks whether any of the dialogue act items has a specific dialogue act type.
-
has_only_dat
(dat)[source]¶ Checks whether all the dialogue act items have a specific dialogue act type.
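The difference between the two checks can be illustrated with plain tuples. This is a sketch only: DAIs are modelled here as (type, name, value) tuples, and the two functions are stand-ins for the methods, not their source.

```python
# Hypothetical dialogue act: a list of (dialogue act type, name, value).
dais = [
    ("inform", "food", "chinese"),
    ("inform", "area", "centre"),
    ("request", "phone", None),
]


def has_dat(dais, dat):
    """True if at least one item has the given dialogue act type."""
    return any(item[0] == dat for item in dais)


def has_only_dat(dais, dat):
    """True if every item has the given dialogue act type."""
    return all(item[0] == dat for item in dais)
```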
-
merge
(da)[source]¶ Merges another DialogueAct into self. This is done by concatenating lists of the DAIs, and sorting and merging own DAIs afterwards.
If sorting is not desired, use `extend’ instead.
-
merge_same_dais
()[source]¶ Merges same DAIs. I.e., if they are equal on extension but differ in original values, merges the original values together, and keeps the single DAI. This method causes the list of DAIs to be sorted.
-
class
alex.components.slu.da.
DialogueActConfusionNetwork
[source]¶ Bases:
alex.components.slu.da.SLUHypothesis
,alex.ml.hypothesis.ConfusionNetwork
Dialogue act item confusion network. This is a very simple implementation in which all dialogue act items are assumed to be independent. Therefore, the network stores only posteriors for dialogue act items.
This can be efficiently stored as a list of DAIs each associated with its probability. The alternative for each DAI is that there is no such DAI in the DA. This can be represented as the null() dialogue act and its probability is 1 - p(DAI).
If there is more than one null() DA in the output DA, they are collapsed into one null() DA, since the meaning is the same.
Please note that in the confusion network, the null() dialogue acts are not explicitly modelled.
-
get_best_da_hyp
(use_log=False, threshold=None, thresholds=None)[source]¶ Return the best dialogue act hypothesis.
- Arguments:
- use_log: whether to express probabilities on the log-scale
- (otherwise, they vanish easily in a moderately long confnet)
- threshold: threshold on probabilities – items with probability
- exceeding the threshold will be present in the output (default: 0.5)
- thresholds: threshold on probabilities – items with probability
- exceeding the threshold will be present in the output. This is a mapping {dai -> threshold}, and if supplied, overwrites settings of `threshold’. If not supplied, it is ignored.
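A minimal sketch of the thresholding described above, assuming the confusion network is a plain list of (probability, DAI) pairs. `best_da_hyp` is a hypothetical stand-in; falling back to `threshold` for DAIs missing from `thresholds` is an assumption of this example.

```python
import math


def best_da_hyp(confnet, use_log=False, threshold=0.5, thresholds=None):
    """confnet: list of (probability, dai) pairs.
    Returns (score, list of DAIs kept in the best hypothesis)."""
    score = 0.0 if use_log else 1.0
    kept = []
    for prob, dai in confnet:
        limit = threshold
        if thresholds is not None:
            limit = thresholds.get(dai, threshold)
        if prob > limit:
            kept.append(dai)
            p = prob
        else:
            p = 1.0 - prob  # the DAI is absent, i.e. the implicit null()
        if use_log:
            score += math.log(p) if p > 0.0 else float("-inf")
        else:
            score *= p
    return score, kept or ["null()"]
```

On the log scale the scores are summed instead of multiplied, which avoids underflow for long confusion networks.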
-
get_best_nonnull_da
()[source]¶ Return the best dialogue act (with the highest probability) ignoring the best null() dialogue act item.
Instead of returning the null() act, it returns the most probable DAI with a defined slot name.
-
get_da_nblist
(n=10, prune_prob=0.005)[source]¶ Parses the input dialogue act item confusion network and generates N-best hypotheses.
The result is a list of dialogue act hypotheses, each with an assigned probability. The list also includes the other() dialogue act, which stands for the correct dialogue act not being in the list.
Generation of hypotheses stops when the probability of the hypotheses is smaller than prune_prob.
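A brute-force sketch of N-best generation from independent DAI posteriors. It enumerates every subset of items, so it is only suitable for very small networks and is not the library's algorithm; `da_nblist` and the handling of the other() mass are assumptions of this example.

```python
from itertools import product


def da_nblist(confnet, n=10, prune_prob=0.005):
    """confnet: list of (probability, dai) pairs, assumed independent.
    Returns a list of (probability, dialogue act string), ending with
    other(), which carries the probability mass not listed."""
    hyps = []
    for mask in product([True, False], repeat=len(confnet)):
        prob = 1.0
        dais = []
        for include, (p, dai) in zip(mask, confnet):
            prob *= p if include else 1.0 - p
            if include:
                dais.append(dai)
        hyps.append((prob, "&".join(sorted(dais)) or "null()"))
    hyps.sort(key=lambda h: h[0], reverse=True)
    kept = [h for h in hyps[:n] if h[0] >= prune_prob]
    rest = 1.0 - sum(p for p, _ in kept)
    kept.append((max(rest, 0.0), "other()"))
    return kept
```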
-
-
class
alex.components.slu.da.
DialogueActHyp
(prob=None, da=None)[source]¶ Bases:
alex.components.slu.da.SLUHypothesis
Provides functionality of 1-best hypotheses for dialogue acts.
-
class
alex.components.slu.da.
DialogueActItem
(dialogue_act_type=None, name=None, value=None, dai=None, attrs=None, alignment=None)[source]¶ Bases:
alex.ml.features.Abstracted
Represents a dialogue act item, which is a component of a dialogue act.
Each dialogue act item is composed of
- dialogue act type - e.g. inform, confirm, request, select, hello
- slot name and value pair - e.g. area, pricerange, food for name and centre, cheap, or Italian for value
- Attributes:
- dat: dialogue act type (a string)
- name: slot name (a string or None)
- value: slot value (a string or None)
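A minimal sketch of splitting a textual dialogue act item into the three attributes above. The regular expression and the `parse_dai` helper are assumptions of this example and cover only the simple forms shown in this reference.

```python
import re

# type(name="value"), type(name) or type(); quotes around the value are optional.
DAI_RE = re.compile(
    r'^(?P<dat>\w+)\((?:(?P<name>\w+)(?:="?(?P<value>[^"]*)"?)?)?\)$'
)


def parse_dai(text):
    """Return (dat, name, value); name and value may be None."""
    match = DAI_RE.match(text.strip())
    if match is None:
        raise ValueError("not a dialogue act item: %r" % text)
    return match.group("dat"), match.group("name"), match.group("value")
```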
-
add_unnorm_value
(newval)[source]¶ Registers `newval’ as another alternative unnormalised value for the value of this DAI’s slot.
-
alignment
¶
-
category_label2value
(catlabs=None)[source]¶ Use this method to substitute back the original value for the category label as the value of this DAI.
- Arguments:
- catlabs: an optional mapping of category labels to tuples (slot
value, surface form), as obtained from alex.components.slu:SLUPreprocessing
If this object does not remember its original value, it takes it from the provided mapping.
-
dat
¶
-
extension
()[source]¶ Returns an extension of self, i.e., a new DialogueActItem without hidden fields, such as the original value/category label.
-
merge_unnorm_values
(other)[source]¶ Merges unnormalised values of `other’ to unnormalised values of `self’.
-
name
¶
-
normalised2value
()[source]¶ Use this method to substitute back an unnormalised value for the normalised one as the value of this DAI.
Returns True iff substitution took place. Returns False if no more unnormalised values are remembered as a source for the normalised value.
-
orig_values
¶
-
splitter
= u':'¶
-
unnorm_values
¶
-
value
¶
-
class
alex.components.slu.da.
DialogueActNBList
[source]¶ Bases:
alex.components.slu.da.SLUHypothesis
,alex.ml.hypothesis.NBList
Provides functionality of N-best lists for dialogue acts.
When updating the N-best list, one should do the following.
- add DAs or parse a confusion network
- merge and normalise, in either order
- Attributes:
- n_best: the list containing pairs [prob, DA] sorted from the most
- probable to the least probable ones
-
merge
()[source]¶ Adds up probabilities for the same hypotheses. Takes care to keep track of original, unnormalised DAI values. Returns self.
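The merge and normalise steps can be sketched on a plain list of [prob, DA] pairs. The function names are hypothetical and dialogue acts are represented by their strings; tracking of unnormalised DAI values is left out.

```python
def merge_nblist(n_best):
    """Add up probabilities of identical hypotheses, most probable first."""
    totals = {}
    for prob, da in n_best:
        totals[da] = totals.get(da, 0.0) + prob
    merged = [[prob, da] for da, prob in totals.items()]
    merged.sort(key=lambda item: item[0], reverse=True)
    return merged


def normalise_nblist(n_best):
    """Scale the probabilities so that they sum to one."""
    total = sum(prob for prob, _ in n_best)
    if total == 0.0:
        return [list(item) for item in n_best]
    return [[prob / total, da] for prob, da in n_best]
```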
-
class
alex.components.slu.da.
SLUHypothesis
[source]¶ Bases:
alex.ml.hypothesis.Hypothesis
This is the base class for all forms of probabilistic SLU hypotheses representations.
-
alex.components.slu.da.
load_das
(das_fname, limit=None, encoding=u'UTF-8')[source]¶ Loads a dictionary of DAs from a given file.
The file is assumed to contain lines of the following form:
[[:space:]..]<key>[[:space:]..]=>[[:space:]..]<DA>[[:space:]..]
or just (without keys):
[[:space:]..]<DA>[[:space:]..]
- Arguments:
- das_fname – path towards the file to read the DAs from
- limit – limit on the number of DAs to read
- encoding – the file encoding
Returns a dictionary with DAs (instances of DialogueAct) as values.
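A minimal sketch of reading the two line formats above. It works on an iterable of lines and keeps the DAs as strings instead of building DialogueAct instances; using the DA string itself as the key for lines without a key is an assumption of this example.

```python
def parse_da_lines(lines, limit=None):
    """Return {key: DA string} for lines of the form 'key => DA' or 'DA'."""
    das = {}
    for line in lines:
        if limit is not None and len(das) >= limit:
            break
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            key, da = line.split("=>", 1)
            key, da = key.strip(), da.strip()
        else:
            key = da = line  # assumption: no key given, use the DA string
        das[key] = da
    return das


lines = ['  utt1 => inform(food="chinese")  ', "", "hello()"]
das = parse_da_lines(lines)
```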
-
alex.components.slu.da.
merge_slu_confnets
(confnet_hyps)[source]¶ Merge multiple dialogue act confusion networks.
alex.components.slu.dailrclassifier module¶
This is a rewrite of the DAILogRegClassifier from dailrclassifier_old.py. The underlying approach is the same; however, the way the features are computed has changed significantly.
-
class
alex.components.slu.dailrclassifier.
DAILogRegClassifier
(cldb, preprocessing, features_size=4, *args, **kwargs)[source]¶ Bases:
alex.components.slu.base.SLUInterface
Implements learning of dialogue act item classifiers based on logistic regression.
The parser is based on a set of classifiers, one for each dialogue act item. When parsing the input utterance, the parser classifies whether a given dialogue act item is present. Then, the output dialogue act is composed of all detected dialogue act items.
A dialogue act is defined as a composition of dialogue act items. E.g.
confirm(drinks="wine")&inform(name="kings shilling") <=> 'does kings serve wine'
where confirm(drinks="wine") and inform(name="kings shilling") are two dialogue act items.
This parser uses logistic regression as the classifier of the dialogue act items.
-
abstract_utterance
(utterance)[source]¶ Return a list of possible abstractions of the utterance.
Parameters: utterance – an Utterance instance Returns: a list of abstracted utterance, form, value, category label tuples
-
gen_classifiers_data
(min_pos_feature_count=5, min_neg_feature_count=5, verbose=False, verbose2=False)[source]¶
-
get_abstract_utterance
(utterance, fvc)[source]¶ Return an utterance with the form in fvc abstracted to its category label
Parameters: - utterance – an Utterance instance
- fvc – a form, value, category label tuple
Returns: the abstracted utterance
-
get_abstract_utterance2
(utterance)[source]¶ Return an utterance with the form in fvc abstracted to its category label
Parameters: utterance – an Utterance instance Returns: the abstracted utterance
-
get_features
(obs, fvc, fvcs)[source]¶ Generate utterance features for a specific utterance given by utt_idx.
Parameters: - obs – the utterance being processed in multiple formats
- fvc – a form, value, category label tuple describing how the utterance should be abstracted
Returns: a set of features from the utterance
-
get_features_in_utterance
(utterance, fvc, fvcs)[source]¶ Returns features extracted from the utterance observation. At this moment, the function extracts N-grams of size self.feature_size. These N-grams are extracted from:
- the original utterance,
- the abstracted utterance for the given FVC
- the abstracted utterance where all other FVCs are abstracted as well
Parameters: - utterance –
- fvc –
Returns: the UtteranceFeatures instance
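A minimal sketch of N-gram feature extraction over the original and abstracted utterances. Counting all N-grams up to the given size and the `prefix` convention are assumptions of this example; `ngram_features` is a hypothetical helper.

```python
def ngram_features(words, size=3, prefix=None):
    """Count all n-grams of words with 1 <= n <= size.
    Keys are word tuples, optionally tagged with a prefix."""
    features = {}
    for n in range(1, size + 1):
        for i in range(len(words) - n + 1):
            ngram = tuple(words[i:i + n])
            key = ngram if prefix is None else (prefix,) + ngram
            features[key] = features.get(key, 0.0) + 1.0
    return features


original = "i want chinese food".split()
abstracted = "i want FOOD food".split()  # hypothetical abstraction of "chinese"

features = ngram_features(original, size=2)
features.update(ngram_features(abstracted, size=2, prefix="abstracted"))
```

The prefix keeps N-grams from the abstracted utterance apart from those of the original one.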
-
get_fvc
(*args, **kwds)[source]¶ This function returns the form, value, category label tuple for any of the following classes
- Utterance
- UtteranceNBList
- UtteranceConfusionNetwork
Parameters: obs – the utterance being processed in multiple formats Returns: a list of form, value, and category label tuples found in the input sentence
-
get_fvc_in_confnet
(confnet)[source]¶ Return a list of all form, value, category label tuples in the confusion network.
Parameters: confnet – an UtteranceConfusionNetwork instance Returns: a list of form, value, and category label tuples found in the input sentence
-
get_fvc_in_nblist
(nblist)[source]¶ Return a list of all form, value, category label tuples in the nblist.
Parameters: nblist – an UtteranceNBList instance Returns: a list of form, value, and category label tuples found in the input sentence
-
get_fvc_in_utterance
(utterance)[source]¶ Return a list of all form, value, category label tuples in the utterance. This is useful to find/guess what category label level classifiers will be necessary to instantiate.
Parameters: utterance – an Utterance instance Returns: a list of form, value, and category label tuples found in the input sentence
-
parse_1_best
(obs={}, ret_cl_map=False, verbose=False, *args, **kwargs)[source]¶ Parse
utterance
and generate the best interpretation in the form of a dialogue act (an instance of DialogueAct). The result is the dialogue act confusion network.
-
parse_confnet
(obs, verbose=False, *args, **kwargs)[source]¶ Parses the word confusion network by generating an n-best list and parsing this n-best list.
-
-
class
alex.components.slu.dailrclassifier.
Features
[source]¶ Bases:
object
This is a simple feature object. It is a light version of the unnecessarily complicated alex.ml.features.Features class.
-
merge
(features, weight=1.0, prefix=None)[source]¶ Merges the passed feature dictionary with its own features. A weight factor can be applied to the features, or the features can be added as binary features. If a prefix is provided, then the features are added with the prefixed feature name.
Parameters: - features – a dictionary-like object with features as keys and values
- weight – a weight of the added features with respect to already existing features. If None, then they are added as binary features
- prefix – prefix for the name of the added features. This is useful when one wants to distinguish between similarly generated features
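A minimal sketch of a merge with these three parameters. The class `SimpleFeatures` is hypothetical; adding weighted values to existing ones and storing binary features as 1.0 are assumptions about the behaviour, not a copy of the class.

```python
class SimpleFeatures(object):
    """A dictionary of feature values with a weighted merge."""

    def __init__(self):
        self.features = {}

    def merge(self, features, weight=1.0, prefix=None):
        for name, value in features.items():
            key = name if prefix is None else (prefix, name)
            if weight is None:
                # binary feature: only record that it is present
                self.features[key] = 1.0
            else:
                self.features[key] = self.features.get(key, 0.0) + weight * value


feats = SimpleFeatures()
feats.merge({"chinese": 2.0})
feats.merge({"chinese": 1.0}, weight=0.5)
feats.merge({"food": 3.0}, weight=None, prefix="abstracted")
```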
-
-
class
alex.components.slu.dailrclassifier.
UtteranceFeatures
(type=u'ngram', size=3, utterance=None)[source]¶ Bases:
alex.components.slu.dailrclassifier.Features
This is a simple feature object. It is a light version of the alex.components.asr.utterance.UtteranceFeatures class.
alex.components.slu.dainnclassifier module¶
alex.components.slu.exceptions module¶
-
exception
alex.components.slu.exceptions.
DialogueActConfusionNetworkException
[source]¶ Bases:
alex.components.slu.exceptions.SLUException
,alex.ml.hypothesis.ConfusionNetworkException
-
exception
alex.components.slu.exceptions.
SLUException
[source]¶ Bases:
alex.AlexException
alex.components.slu.templateclassifier module¶
-
class
alex.components.slu.templateclassifier.
TemplateClassifier
(config)[source]¶ Bases:
object
This parser is based on matching examples of utterances with known semantics against the input utterance. The semantics of the example utterance which is closest to the input utterance is provided as the output semantics.
“Hi” => hello()
“I can you give me a phone number” => request(phone)
“I would like to have a phone number please” => request(phone)
The first match is reported as the resulting dialogue act.
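A simplified sketch of template matching in which the first match wins. It uses exact matching after lower-casing and whitespace normalisation instead of a closeness measure; the `classify` function and the null() fallback are assumptions of this example.

```python
# Example utterances with known semantics, in priority order.
templates = [
    ("hi", "hello()"),
    ("i can you give me a phone number", "request(phone)"),
    ("i would like to have a phone number please", "request(phone)"),
]


def classify(utterance, templates, default="null()"):
    """Return the dialogue act of the first template equal to the utterance."""
    text = " ".join(utterance.lower().split())
    for example, dialogue_act in templates:
        if example == text:
            return dialogue_act
    return default
```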
alex.components.slu.test_da module¶
-
class
alex.components.slu.test_da.
TestDA
(methodName='runTest')[source]¶ Bases:
unittest.case.TestCase