# SMP2019

**Repository Path**: a2798063/SMP2019

## Basic Information

- **Project Name**: SMP2019
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-09-05
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

### SMP2019 Code

- The competition has 3 tasks: domain classification, intent classification, and slot filling.
- The released data file train.json contains 2579 samples in total.
- The dataset is split into a training set (train), a validation (development) set (dev), and a test set (test).
- The training set is divided into 5 folds. Labels are provided in two formats: txt and npy.
- In the NPY format, domain and intent labels are stored as integers, and slot labels use the IOB format.
- Pretrained BERT word embeddings are used: 768 dimensions, with a maximum sentence length (in tokens) of 30.
- BERT service start command: `bert-serving-start -model_dir D:/BERT/chinese_L-12_H-768_A-12 -num_worker=4 -max_seq_len=30 -pooling_strategy=NONE`

#### Domain

1. 29 classes in total. Conversion dictionary:

        domain2id = {
            'app': 0, 'bus': 1, 'joke': 2, 'story': 3, 'cinemas': 4,
            'contacts': 5, 'cookbook': 6, 'email': 7, 'epg': 8, 'flight': 9,
            'health': 10, 'lottery': 11, 'map': 12, 'match': 13, 'message': 14,
            'music': 15, 'news': 16, 'novel': 17, 'poetry': 18, 'radio': 19,
            'riddle': 20, 'stock': 21, 'telephone': 22, 'train': 23, 'translation': 24,
            'tvchannel': 25, 'video': 26, 'weather': 27, 'website': 28
        }

#### Intent

1. 24 classes in total. Conversion dictionary:

        intent2id = {
            'LAUNCH': 0, 'QUERY': 1, 'ROUTE': 2, 'SENDCONTACTS': 3,
            'SEND': 4, 'REPLY': 5, 'REPLAY_ALL': 6, 'LOOK_BACK': 7,
            'NUMBER_QUERY': 8, 'POSITION': 9, 'PLAY': 10, 'DEFAULT': 11,
            'DIAL': 12, 'TRANSLATION': 13, 'OPEN': 14, 'CREATE': 15,
            'FORWARD': 16, 'VIEW': 17, 'SEARCH': 18, 'RISERATE_QUERY': 19,
            'DOWNLOAD': 20, 'UNKNOWN': 21, 'DATE_QUERY': 22, 'CLOSEPRICE_QUERY': 23
        }

#### Slot Filling

1. 125 classes in total. Conversion dictionary:

        {'': 0, 'O': 1,
         'B-name': 2, 'I-name': 3, 'B-Src': 4, 'I-Src': 5, 'B-Dest': 6, 'I-Dest': 7,
         'B-startLoc_poi': 8, 'I-startLoc_poi': 9, 'B-endLoc_poi': 10, 'I-endLoc_poi': 11,
         'B-endLoc_city': 12, 'I-endLoc_city': 13, 'B-startLoc_city': 14, 'I-startLoc_city': 15,
         'B-endLoc_province': 16, 'I-endLoc_province': 17, 'B-endLoc_area': 18, 'I-endLoc_area': 19,
         'B-location_poi': 20, 'I-location_poi': 21, 'B-location_area': 22, 'I-location_area': 23,
         'B-location_city': 24, 'I-location_city': 25, 'B-startDate_date': 26, 'I-startDate_date': 27,
         'B-startLoc_area': 28, 'I-startLoc_area': 29, 'B-category': 30, 'I-category': 31,
         'B-theatre': 32, 'I-theatre': 33, 'B-film': 34, 'I-film': 35,
         'B-datetime_date': 36, 'I-datetime_date': 37, 'B-receiver': 38, 'I-receiver': 39,
         'B-headNum': 40, 'I-headNum': 41, 'B-content': 42, 'I-content': 43,
         'B-teleOperator': 44, 'I-teleOperator': 45, 'B-ingredient': 46, 'I-ingredient': 47,
         'B-dishName': 48, 'I-dishName': 49, 'B-tvchannel': 50, 'I-tvchannel': 51,
         'B-datetime_time': 52, 'I-datetime_time': 53, 'B-startDate_time': 54, 'I-startDate_time': 55,
         'B-startDate_dateOrig': 56, 'I-startDate_dateOrig': 57, 'B-keyword': 58, 'I-keyword': 59,
         'B-type': 60, 'I-type': 61, 'B-song': 62, 'I-song': 63,
         'B-artist': 64, 'I-artist': 65, 'B-media': 66, 'I-media': 67,
         'B-author': 68, 'I-author': 69, 'B-dynasty': 70,
         'B-queryField': 71, 'I-queryField': 72, 'B-location_province': 73, 'I-location_province': 74,
         'B-code': 75, 'I-code': 76, 'B-yesterday': 77, 'I-yesterday': 78,
         'B-target': 79, 'I-target': 80, 'B-resolution': 81, 'I-resolution': 82,
         'B-area': 83, 'I-area': 84, 'B-timeDescr': 85, 'I-timeDescr': 86,
         'B-popularity': 87, 'I-popularity': 88, 'B-tag': 89, 'I-tag': 90,
         'B-scoreDescr': 91, 'I-scoreDescr': 92, 'B-date': 93, 'I-date': 94,
         'B-subfocus': 95, 'B-questionWord': 96, 'I-questionWord': 97,
         'B-startLoc_province': 98, 'I-startLoc_province': 99, 'B-relIssue': 100, 'I-relIssue': 101,
         'B-location_country': 102, 'I-location_country': 103,
         'B-episode': 104, 'I-episode': 105, 'B-artistRole': 106, 'I-artistRole': 107,
         'B-utensil': 108, 'I-utensil': 109, 'I-subfocus': 110,
         'B-dishNamet': 111, 'I-dishNamet': 112, 'B-homeName': 113, 'I-homeName': 114,
         'B-awayName': 115, 'I-awayName': 116, 'B-season': 117, 'I-season': 118,
         'B-decade': 119, 'I-decade': 120, 'B-payment': 121, 'I-payment': 122,
         'B-absIssue': 123, 'I-absIssue': 124}
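
The following is a minimal sketch of how slot annotations might be turned into IOB tags and then into the integer ids listed above, padded to the maximum sentence length of 30. It is not the repository's actual preprocessing code. The names `slot2id`, `iob_tags`, and `encode`, the example utterance, per-character tokenization, and the reading of `''` (id 0) as a padding label are all assumptions made for illustration. Only a small excerpt of the dictionary is reproduced, and any special tokens added by BERT are ignored.

```python
from typing import Dict, List, Tuple

MAX_LEN = 30  # mirrors -max_seq_len=30 from the BERT service command

# Excerpt of the slot dictionary above (ids copied from the README).
# Assumption: '' (id 0) is the padding label; the README does not say so.
slot2id: Dict[str, int] = {
    '': 0, 'O': 1,
    'B-endLoc_city': 12, 'I-endLoc_city': 13,
    'B-startLoc_city': 14, 'I-startLoc_city': 15,
}


def iob_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Label tokens in IOB format; each span is (start, end_exclusive, slot_name)."""
    tags = ['O'] * len(tokens)
    for start, end, name in spans:
        tags[start] = f'B-{name}'          # first token of the slot
        for i in range(start + 1, end):
            tags[i] = f'I-{name}'          # remaining tokens of the slot
    return tags


def encode(tags: List[str], max_len: int = MAX_LEN) -> List[int]:
    """Map tags to ids, truncating to max_len and padding with '' (id 0)."""
    ids = [slot2id[t] for t in tags[:max_len]]
    return ids + [slot2id['']] * (max_len - len(ids))


# Hypothetical utterance "从北京到上海" ("from Beijing to Shanghai"),
# split per character (an assumption about tokenization).
tokens = list('从北京到上海')
spans = [(1, 3, 'startLoc_city'), (4, 6, 'endLoc_city')]

tags = iob_tags(tokens, spans)
ids = encode(tags)
print(tags)
print(ids[:6], len(ids))
```

Padding every sequence to a fixed length keeps the label array aligned with fixed-size embedding output, which is presumably why the label set reserves a separate id for positions beyond the end of the sentence.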