Policies
Policies decide which action your assistant takes at each step in a conversation. Machine-learning and rule-based policies can be used together.
You can customize the policies your assistant uses by specifying the policies key in your project's config.yml. There are different policies to choose from, and you can include multiple policies in a single configuration. Here is an example of a list of policies:
recipe: default.v1
policies:
- name: MemoizationPolicy
- name: TEDPolicy
max_history: 5
epochs: 200
- name: RulePolicy
If you don't know which policies to choose, leave out the policies key from your config.yml entirely. The Suggested Config feature will provide default policies.
Action Selection
At every turn, each policy defined in your configuration predicts the next action with a certain confidence. The policy that predicts with the highest confidence decides the next action.
By default, a maximum of 10 next actions can be predicted after each user message. To update this value, set the environment variable MAX_NUMBER_OF_PREDICTIONS to the desired maximum number of predictions.
Policy Priority
In case two policies predict with equal confidence (for example, the Memoization and Rule Policies might both predict with confidence 1), the priority of the policies is considered. Rasa policies have default priorities that are set to ensure the expected outcome in the case of a tie.
The higher the number, the higher the priority:
- 6 - RulePolicy
- 3 - MemoizationPolicy or AugmentedMemoizationPolicy
- 2 - UnexpecTEDIntentPolicy
- 1 - TEDPolicy
In general, it is not recommended to have more than one policy per priority level in your configuration. If you have two policies with the same priority and they predict with the same confidence, the resulting action will be chosen randomly.
If you create your own policy, use these priorities as a guide for figuring out the priority of your policy. If your policy is a machine-learning policy, it should most likely have priority 1, the same as TEDPolicy.
All policy priorities are configurable via the priority parameter in the policy's configuration, but we do not recommend changing them outside of specific cases such as custom policies. Doing so can lead to unexpected behavior.
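For example, a custom machine-learning policy could be pinned to the same priority tier as TEDPolicy. This is only a sketch: the module path below is hypothetical, and only the priority parameter itself comes from the description above.
policies:
- name: TEDPolicy
- name: addons.custom_policy.MyMLPolicy   # hypothetical custom policy module path
  priority: 1                             # same tier as TEDPolicy, per the guideline above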
Machine Learning Policies
TED Policy
The TED Policy is a multi-task architecture for next action prediction and entity recognition. The architecture consists of several transformer encoders that are shared for both tasks. A sequence of entity labels is predicted through a Conditional Random Field (CRF) tagging layer on top of the user sequence transformer encoder output that corresponds to the input sequence of tokens. For next action prediction, the output of the dialogue transformer encoder and the system action labels are embedded into a single semantic vector space. A dot-product loss is used to maximize the similarity with the target label and minimize similarities with negative samples.
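As a rough sketch of that idea (not necessarily the exact objective implemented in Rasa), let $d_t$ be the dialogue embedding at time step $t$, $a_t^{+}$ the embedded target action, and $a^{-}$ the sampled negative actions. A softmax cross-entropy over dot-product similarities then has the form:
$$
\mathcal{L}_t = -\log \frac{\exp\big(d_t \cdot a_t^{+}\big)}{\exp\big(d_t \cdot a_t^{+}\big) + \sum_{a^{-}} \exp\big(d_t \cdot a^{-}\big)}
$$
Minimizing this loss pushes the similarity with the target action up and the similarities with the negative samples down.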
The TED Policy architecture consists of the following steps:
- Concatenate features for
  - user input (user intent and entities) or user text processed through the user sequence transformer encoder,
  - previous system actions or bot utterances processed through the bot sequence transformer encoder,
  - slots and active forms,
  for each time step into an input vector for the embedding layer that precedes the dialogue transformer.
- Feed the embedding of the input vector into the dialogue transformer encoder.
- Apply a dense layer to the output of the dialogue transformer to get dialogue embeddings for each time step.
- Apply a dense layer to create embeddings for the system actions for each time step.
- Calculate the similarity between the dialogue embeddings and the embedded system actions. This step is based on the StarSpace idea.
- Concatenate the token-level output of the user sequence transformer encoder with the output of the dialogue transformer encoder for each time step.
- Apply the CRF algorithm to predict contextual entities for each user text input.
Configuration
You can pass configuration parameters to TEDPolicy using the config.yml file. If you want to fine-tune your model, start by modifying the following parameters:
- epochs: This parameter sets the number of times the algorithm will see the training data (default: 300). One epoch is equal to one forward pass and one backward pass over all training examples. Sometimes the model needs more epochs to learn properly, and sometimes more epochs don't influence performance. The lower the number of epochs, the faster the model is trained.
policies:
- name: TEDPolicy
  epochs: 200
- max_history: This parameter controls how much dialogue history the model looks at to decide which action to take next. The default max_history for this policy is None, which means that the complete dialogue history since the session restart is taken into account. If you want to limit the model to only see a certain number of previous dialogue turns, you can set max_history to a finite value. Be careful when choosing max_history, so that the model has enough previous dialogue turns to create correct predictions.
policies:
- name: TEDPolicy
max_history: 8
- number_of_transformer_layers: This parameter sets the number of sequence transformer encoder layers to use for the sequence transformer encoders for user, action and action label texts, and for the dialogue transformer encoder (defaults: text: 1, action_text: 1, label_action_text: 1, dialogue: 1). The number of sequence transformer encoder layers corresponds to the transformer blocks used for the model.
- transformer_size: This parameter sets the number of units in the sequence transformer encoder layers to use for the sequence transformer encoders for user, action and action label texts, and for the dialogue transformer encoder (defaults: text: 128, action_text: 128, label_action_text: 128, dialogue: 128). The vectors coming out of the transformer encoders will have the given transformer size.
- connection_density: This parameter defines the fraction of kernel weights that are set to non-zero values for all feed-forward layers in the model (default: 0.2). The value should be between 0 and 1. If connection_density is set to 1, no kernel weights are set to 0 and the layer acts as a standard feed-forward layer. You should not set connection_density to 0, as this would result in all kernel weights being 0 and the model being unable to learn. A configuration sketch covering these parameters is shown below.
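For illustration only, and assuming the per-attribute dictionary form shown in the defaults above, these parameters could be overridden as follows (the values are arbitrary examples, not recommendations):
policies:
- name: TEDPolicy
  number_of_transformer_layers:
    text: 1
    action_text: 1
    label_action_text: 1
    dialogue: 2
  transformer_size:
    text: 128
    action_text: 128
    label_action_text: 128
    dialogue: 256
  connection_density: 0.3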
- split_entities_by_comma: This parameter defines whether adjacent entities separated by a comma should be treated as one entity or split. For example, entities with the type ingredients, like "apple, banana", can be split into "apple" and "banana". An entity with the type address, like "Schönhauser Allee 175, 10119 Berlin", should be treated as one entity. This can be set globally to either True or False:
policies:
- name: TEDPolicy
split_entities_by_comma: True
or set per entity type, for example:
policies:
- name: TEDPolicy
split_entities_by_comma:
address: False
ingredients: True
- constrain_similarities: When this parameter is set to True, a sigmoid cross-entropy loss is applied over all similarity terms. This helps keep the similarities between the input and negative labels at smaller values, which should help the model generalize better to real-world test sets.
- model_confidence: This parameter allows the user to configure how confidences are computed during inference. It can only take one value, softmax. With softmax, confidences are in the range [0, 1], and the computed similarities are normalized with the softmax activation function.
- use_gpu: This parameter defines whether a GPU (if available) will be used for training. By default, TEDPolicy trains on GPU if one is available (i.e. use_gpu is True). To enforce that TEDPolicy uses only the CPU for training, set use_gpu to False. (These three parameters are combined in the sketch below.)
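A minimal sketch combining the three parameters above (the values are purely illustrative):
policies:
- name: TEDPolicy
  constrain_similarities: true
  model_confidence: softmax
  use_gpu: false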
More configurable parameters:
| Parameter | Default Value | Description |
|---|---|---|
| hidden_layers_sizes | text: [], action_text: [], label_action_text: [] | Hidden layer sizes for layers before the embedding layers for user messages and bot messages in previous actions and labels. The number of hidden layers is equal to the length of the corresponding list. |
| dense_dimension | text: 128, action_text: 128, label_action_text: 128, intent: 20, action_name: 20, label_action_name: 20, entities: 20, slots: 20, active_loop: 20 | Dense dimension for sparse features to use after they are converted into dense features. |
| concat_dimension | text: 128, action_text: 128, label_action_text: 128 | Common dimension to which sequence and sentence features of different dimensions get converted before concatenation. |
| encoding_dimension | 50 | Dimension size of embedding vectors before the dialogue transformer encoder. |
| transformer_size | text: 128, action_text: 128, label_action_text: 128, dialogue: 128 | Number of units in the user text, bot text, bot label text and dialogue sequence transformer encoders respectively. |
| number_of_transformer_layers | text: 1, action_text: 1, label_action_text: 1, dialogue: 1 | Number of layers in the user text, bot text, bot label text and dialogue sequence transformer encoders respectively. |
| number_of_attention_heads | 4 | Number of self-attention heads in transformers. |
| unidirectional_encoder | True | Use a unidirectional or bidirectional encoder for `text`, `action_text`, and `label_action_text`. |
| use_key_relative_attention | False | If 'True' use key relative embeddings in attention. |
| use_value_relative_attention | False | If 'True' use value relative embeddings in attention. |
| max_relative_position | None | Maximum position for relative embeddings. |
| batch_size | [64, 256] | Initial and final value for batch sizes. Batch size will be linearly increased for each epoch. If constant `batch_size` is required, pass an int, e.g. `8`. |
| batch_strategy | "balanced" | Strategy used when creating batches. Can be either 'sequence' or 'balanced'. |
| epochs | 1 | Number of epochs to train. |
| random_seed | None | Set random seed to any 'int' to get reproducible results. |
| learning_rate | 0.001 | Initial learning rate for the optimizer. |
| embedding_dimension | 20 | Dimension size of dialogue & system action embedding vectors. |
| number_of_negative_examples | 20 | The number of incorrect labels. The algorithm will minimize their similarity to the user input during training. |
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' or 'inner'. |
| loss_type | "cross_entropy" | The type of the loss function, either 'cross_entropy' or 'margin'. |
| ranking_length | 0 | Number of top actions to include in prediction. Confidences of all other actions will be set to 0. Set to 0 to let the prediction include confidences for all actions. |
| renormalize_confidences | False | Normalize the top predictions. Applicable only with loss type 'cross_entropy' and 'softmax' confidences. |
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make embedding vectors for correct labels. Should be 0.0 < ... < 1.0 for 'cosine' similarity type. |
| maximum_negative_similarity | -0.2 | Maximum negative similarity for incorrect labels. Should be -1.0 < ... < 1.0 for 'cosine' similarity type. |
| use_maximum_negative_similarity | True | If 'True' the algorithm only minimizes maximum similarity over incorrect intent labels, used only if 'loss_type' is set to 'margin'. |
| scale_loss | True | Scale loss inverse proportionally to confidence of correct prediction. |
| regularization_constant | 0.001 | The scale of regularization. |
| negative_margin_scale | 0.8 | The scale of how important it is to minimize the maximum similarity between embeddings of different labels. |
| drop_rate_dialogue | 0.1 | Dropout rate for embedding layers of dialogue features. Value should be between 0 and 1. The higher the value the higher the regularization effect. |
| drop_rate_label | 0.0 | Dropout rate for embedding layers of label features. Value should be between 0 and 1. The higher the value the higher the regularization effect. |
| drop_rate_attention | 0.0 | Dropout rate for attention. Value should be between 0 and 1. The higher the value the higher the regularization effect. |
| connection_density | 0.2 | Connection density of the weights in dense layers. Value should be between 0 and 1. |
| use_sparse_input_dropout | True | If 'True' apply dropout to sparse input tensors. |
| use_dense_input_dropout | True | If 'True' apply dropout to sparse features after they are converted into dense features. |
| evaluate_every_number_of_epochs | 20 | How often to calculate validation accuracy. Set to '-1' to evaluate just once at the end of training. |
| evaluate_on_number_of_examples | 0 | How many examples to use for hold out validation set. Large values may hurt performance, e.g. model accuracy. Keep at 0 if your data set contains a lot of unique examples of dialogue turns. Set to 0 for no validation. |
| tensorboard_log_directory | None | If you want to use tensorboard to visualize training metrics, set this option to a valid output directory. You can view the training metrics after training in tensorboard via 'tensorboard --logdir <path-to-given-directory>'. |
| tensorboard_log_level | "epoch" | Define when training metrics for tensorboard should be logged. Either after every epoch ('epoch') or for every training step ('batch'). |
| checkpoint_model | False | Save the best performing model during training. Models are stored to the location specified by `--out`. Only the one best model will be saved. Requires `evaluate_on_number_of_examples > 0` and `evaluate_every_number_of_epochs > 0`. |
| e2e_confidence_threshold | 0.5 | The threshold that ensures that end-to-end is picked only if the policy is confident enough. |
| featurizers | [] | List of featurizer names (alias names). Only features coming from the listed names are used. If list is empty all available features are used. |
| entity_recognition | True | If 'True' entity recognition is trained and entities are extracted. |
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds it to the loss function to ensure that similarity values are approximately bounded. Used only when `loss_type=cross_entropy`. |
| model_confidence | "softmax" | Affects how the model's confidence for each action is computed. Currently, only one value is supported: `softmax` - similarities between input and action embeddings are post-processed with a softmax function, as a result of which confidences for all labels sum up to 1. |
| BILOU_flag | True | If 'True', additional BILOU tags are added to entity labels. |
| split_entities_by_comma | True | Splits a list of extracted entities by comma to treat each one of them as a single entity. Can either be `True`/`False` globally, or set per entity type, e.g. `split_entities_by_comma: {address: True}`. |
UnexpecTED Intent Policy
This feature is experimental. (Details omitted here.)
Memoization Policy
The MemoizationPolicy remembers the stories from your training data. It checks whether the current conversation matches a story in your stories.yml file. If so, it predicts the next action from the matching story of your training data with a confidence of 1.0. If no matching conversation is found, the policy predicts None with a confidence of 0.0.
When looking for a match in your training data, the policy takes the last max_history number of turns of the conversation into account. One turn includes the message sent by the user and any actions the assistant performed before waiting for the next message.
You can configure the number of turns the MemoizationPolicy should use in your configuration:
policies:
- name: "MemoizationPolicy"
max_history: 3
Augmented Memoization Policy
The AugmentedMemoizationPolicy remembers examples from training stories for up to max_history turns, just like the MemoizationPolicy. Additionally, it has a forgetting mechanism that forgets a certain number of steps in the conversation history and tries to find a match in your stories with the reduced history. It predicts the next action with confidence 1.0 if a match is found, otherwise it predicts None with confidence 0.0.
If you have conversations in which some slots that are set during prediction time might not be set in your training stories (e.g. in training stories starting with a reminder, not all previous slots are set), make sure to also add the relevant stories without slots to your training data. A minimal config sketch follows below.
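For example, the policy can be added to config.yml like this (the max_history value is illustrative):
policies:
- name: AugmentedMemoizationPolicy
  max_history: 4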
Rule-based Policies
Rule Policy
The RulePolicy is a policy that handles conversation parts that follow a fixed behavior (e.g. business logic). It makes predictions based on any rules you have in your training data:
policies:
- name: "RulePolicy"
core_fallback_threshold: 0.3
core_fallback_action_name: action_default_fallback
enable_fallback_prediction: true
restrict_rules: true
check_for_contradictions: true
- core_fallback_threshold (default: 0.3): See the fallback documentation for more information.
- core_fallback_action_name (default: action_default_fallback): See the fallback documentation for more information.
- enable_fallback_prediction (default: true): See the fallback documentation for more information.
- check_for_contradictions (default: true): Before training, the RulePolicy performs a check to make sure that slots and active loops set by actions are defined consistently for all rules.
The following snippet is an example of an incomplete rule:
rules:
- rule: complete rule
steps:
- intent: search_venues
- action: action_search_venues
- slot_was_set:
- venues: [{"name": "Big Arena", "reviews": 4.5}]
- rule: incomplete rule
steps:
- intent: search_venues
- action: action_search_venues
In the second, incomplete rule, action_search_venues should set the venues slot because it is set in the complete rule, but this event is missing. There are several possible ways to fix this rule.
In the case when action_search_venues can't find a venue and the venues slot should not be set, you should set the value of the slot to null explicitly. In the following rule, the RulePolicy predicts utter_venues_not_found only if the venues slot is not set:
rules:
- rule: fixes incomplete rule
steps:
- intent: search_venues
- action: action_search_venues
- slot_was_set:
- venues: null
- action: utter_venues_not_found
If you want the slot setting to be handled by a different rule or story, you should add wait_for_user_input: false to the end of the rule snippet:
rules:
- rule: incomplete rule
steps:
- intent: search_venues
- action: action_search_venues
wait_for_user_input: false
After training, the RulePolicy checks that your rules and stories do not contradict each other. The following snippet is an example of two contradicting rules:
rules:
- rule: Chitchat
steps:
- intent: chitchat
- action: utter_chitchat
- rule: Greet instead of chitchat
steps:
- intent: chitchat
- action: utter_greet # `utter_greet` contradicts `utter_chitchat` from the rule above
- restrict_rules (default: true): Rules are restricted to one user turn, but there can be multiple bot events, including e.g. a form being filled and its subsequent submission. Changing this parameter to false can result in unexpected behavior.
Overusing rules for purposes outside of the recommended use cases will make it very hard to maintain your assistant as complexity grows.
Configuring Policies
Max History
One important hyperparameter for Rasa policies is max_history. It controls how much dialogue history the model looks at to decide which action to take next.
You can set max_history by passing it to your policy in the policy configuration in your config.yml. The default value is None, which means that the complete dialogue history since the session restart is taken into account.
policies:
- name: TEDPolicy
max_history: 5
epochs: 200
batch_size: 50
max_training_samples: 300
The RulePolicy doesn't have a max_history parameter; it always considers the full length of the provided rules.
As an example, let's say you have an out_of_scope intent which describes off-topic user messages. If your bot sees this intent multiple times in a row, you might want to tell the user what you can help them with. So your story might look like this:
stories:
- story: utter help after 2 fallbacks
steps:
- intent: out_of_scope
- action: utter_default
- intent: out_of_scope
- action: utter_default
- intent: out_of_scope
- action: utter_help_message
For the model to learn this pattern, max_history has to be at least 4 (see the config sketch after this example).
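For instance, a matching configuration might look like this (illustrative, mirroring the earlier TEDPolicy example):
policies:
- name: TEDPolicy
  max_history: 4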
If you increase max_history, your model becomes bigger and training takes longer. If you have some information that should affect the dialogue far into the future, you should store it as a slot instead, as sketched below.
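As a sketch of that alternative (the slot and entity names are hypothetical, and the mappings section assumes Rasa 3.x domain syntax):
slots:
  membership_level:              # hypothetical slot that should influence the dialogue long-term
    type: text
    influence_conversation: true
    mappings:
    - type: from_entity
      entity: membership_level   # hypothetical entity extracted by NLU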
Data Augmentation
When you train a model, Rasa creates longer stories by randomly combining the ones in your stories files. Take the following stories as an example:
stories:
- story: thank
steps:
- intent: thankyou
- action: utter_youarewelcome
- story: say goodbye
steps:
- intent: goodbye
- action: utter_goodbye
You actually want to teach your policy to ignore the dialogue history when it isn't relevant and to respond with the same action no matter what happened before. To achieve this, individual stories are concatenated into longer stories. From the example above, data augmentation might produce a story by combining "thank" with "say goodbye" and "thank" again, equivalent to:
stories:
- story: thank -> say goodbye -> thank
steps:
- intent: thankyou
- action: utter_youarewelcome
- intent: goodbye
- action: utter_goodbye
- intent: thankyou
- action: utter_youarewelcome
You can change this behavior with the --augmentation flag, which lets you set the augmentation_factor. The augmentation_factor determines how many augmented stories are subsampled during training. The augmented stories are subsampled before training because their number can quickly become very large, and you want to limit it. The number of sampled stories is augmentation_factor x 10. By default, augmentation_factor is set to 50, resulting in a maximum of 500 augmented stories.