
Lesson 18 Named Entity Recognition & Relation Extraction
Source: Chen Shihong (陳仕鴻) /
Guangdong University of Foreign Studies
2018-06-04

1. Named Entity Recognition (NER)

NE Type                       Examples
ORGANIZATION                  Georgia-Pacific Corp., WHO
PERSON                        Eddy Bonte, President Obama
LOCATION                      Murray River, Mount Everest
DATE                          June, 2008-06-29
TIME                          two fifty a m, 1:30 p.m.
MONEY                         175 million Canadian Dollars, GBP 10.40
PERCENT                       twenty pct, 18.75 %
FACILITY                      Washington Monument, Stonehenge
GPE (geo-political entity)    South East Asia, Midlothian

s="""The fourth Wells account moving to another agency is the packaged paper-products division of Georgia-Pacific Corp., which arrived at Wells only last fall. Like Hertz and the History Channel, it is also leaving for an Omnicom-owned agency, the BBDO South unit of BBDO Worldwide. BBDO South in Atlanta, which handles corporate advertising for Georgia-Pacific, will assume additional duties for brands like Angel Soft toilet tissue and Sparkle paper towels, said Ken Haldin, a spokesman for Georgia-Pacific in Atlanta."""

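import nltk
# Requires NLTK data packages (names vary by NLTK version), e.g.
# punkt, averaged_perceptron_tagger, maxent_ne_chunker, words

s_w = nltk.word_tokenize(s)    # tokenize
s_tag = nltk.pos_tag(s_w)      # POS tagging
print(nltk.ne_chunk(s_tag))    # ne_chunk is the named entity recognition function
# print(nltk.ne_chunk(s_tag, binary=True))
# With binary=True every entity is labeled NE; otherwise the specific type is shown.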
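The tree returned by `nltk.ne_chunk` can also be walked in code rather than just printed. The following is a minimal sketch, assuming the default (non-binary) labels. The helper name `extract_entities` is hypothetical, and the small tree is built by hand to imitate the shape of the chunker's output, so no model download is needed to try it.

```python
from nltk import Tree

def extract_entities(tree):
    """Return (label, text) pairs for every entity subtree directly under the root."""
    entities = []
    for child in tree:
        # Plain tokens are (word, tag) tuples; entities are nested Tree objects.
        if isinstance(child, Tree):
            text = " ".join(word for word, tag in child.leaves())
            entities.append((child.label(), text))
    return entities

# Hand-built stand-in for the result of nltk.ne_chunk(s_tag) (illustrative only).
sample = Tree("S", [
    ("a", "DT"), ("spokesman", "NN"), ("for", "IN"),
    Tree("ORGANIZATION", [("Georgia-Pacific", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("Atlanta", "NNP")]),
])

print(extract_entities(sample))
```

In practice the same function could be applied to `nltk.ne_chunk(s_tag)` directly; the labels it returns depend on the chunker model.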


Exercise: Following the example above, perform NER on the text below.

Guangdong University of Foreign Studies (GDUFS) is a major internationalized university in South China for its global-minded faculty/students and its research on international languages, literature, culture, trade and strategic studies. 

Dating back to 1965 when the Guangzhou Institute of Foreign Languages was established and 1980 when the Guangzhou Institute of Foreign Trade was founded, the University had its present form by merging the two in 1995, with the Guangdong College of Finance and Economics incorporated into the University in 2008. The University has three campuses with a total area of 153 hectares: the North Campus at the foot of the Baiyun Mountain, the South Campus in Guangzhou Higher Education Mega Center, and Dalang Campus.


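2. Relation Extraction

Once the named entities have been identified, relation extraction can be used to pull information out of the text. One approach is to look for all triples of the form (X, a, Y), where X and Y are named entities and a is the string that expresses the relation between them. For example: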


import nltk, re
# Requires the ieer corpus: nltk.download('ieer')

IN = re.compile(r'.*\bin\b')  # precompiled regular expression that matches the word "in"

for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
    for rel in nltk.sem.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
        print(nltk.sem.rtuple(rel))
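The triple search itself can be illustrated without any NLTK corpus. The following is a minimal sketch in plain Python, not a description of how `nltk.sem.extract_rels` is implemented. The input format (entities as `(label, text)` tuples, all other tokens as plain strings) and the helper `find_triples` are assumptions made for this example.

```python
import re

IN = re.compile(r".*\bin\b")  # same pattern as above: the filler must contain the word "in"

def find_triples(items, subj_label, obj_label, pattern):
    """Return (X, a, Y) triples where X and Y are neighboring entities
    and a is the text between them, filtered by label and pattern."""
    # Hypothetical input format: entities are (label, text) tuples, other items are words.
    ent_positions = [i for i, it in enumerate(items) if isinstance(it, tuple)]
    triples = []
    for left, right in zip(ent_positions, ent_positions[1:]):
        (x_label, x_text), (y_label, y_text) = items[left], items[right]
        filler = " ".join(items[left + 1:right])
        if x_label == subj_label and y_label == obj_label and pattern.match(filler):
            triples.append((x_text, filler, y_text))
    return triples

sentence = [
    ("ORG", "BBDO South"), "in", ("LOC", "Atlanta"), ",", "which", "handles",
    "corporate", "advertising", "for", ("ORG", "Georgia-Pacific"),
]
print(find_triples(sentence, "ORG", "LOC", IN))
```

Only neighboring entity pairs are considered here, which keeps the sketch short; the labels and the pattern play the same filtering role as the `'ORG'`, `'LOC'` and `pattern` arguments in the NLTK call.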


3. BosonNLP
https://bosonnlp.com/

An open platform for Chinese semantic analysis

