How do I search for text matches in a separate column in pandas?

I have a DataFrame that looks like this (original):


      Player Name       Headline
1     LeBron James      LeBron James suggests 5-10 games before playoff
2     LeBron James      LeBron James (groin) probable for Thursday
3     LeBron James      LeBron James overcomes Pelicans with 34/13/12
4     LeBron James      Kyrie Irving (groin) plans to play on Tuesday
5     LeBron James      LeBron James (rest) questionable Tuesday
6     LeBron James      LeBron James (leg) will start on Saturday
7     LeBron James      Kevin Love (hip) is questionable
8     Ryan Anderson     Anderson (flu) returns against Cavs on Sunday
9     Ryan Anderson     Ryan Anderson out with respiratory infection
10    Ryan Anderson     Anderson (rest) not playing

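I want to remove every row that does not have (text) in the Headline column. In addition, I want to add two new columns, Location and Injury/Rest, labeled as shown below. This is what I have done to get there: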


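import numpy as np

# Raw string so the regex escapes are not read as string escapes
df['Location'] = df.Headline.str.extract(r'\((.*)\)')[0]
# .copy() avoids a SettingWithCopyWarning on the assignment below
df = df[df['Location'].notnull()].copy()
df['Injury/Rest'] = np.where(df['Location'].eq('rest'), 'Rest', 'Injury')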

New output:


        Player Name    Headline                                       Location    Injury/Rest
    --  -------------  ---------------------------------------------  ----------  -------------
     2  LeBron James   LeBron James (groin) probable for Thursday     groin       Injury
     4  LeBron James   Kyrie Irving (groin) plans to play on Tuesday  groin       Injury
     5  LeBron James   LeBron James (rest) questionable Tuesday       rest        Rest
     6  LeBron James   LeBron James (leg) will start on Saturday      leg         Injury
     7  LeBron James   Kevin Love (hip) is questionable               hip         Injury
     8  Ryan Anderson  Anderson (flu) returns against Cavs on Sunday  flu         Injury
    10  Ryan Anderson  Anderson (rest) not playing                    rest        Rest
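
For anyone who wants to run the steps in the question end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows above. The non-greedy `.*?`, `expand=False`, and `.copy()` are small variations of mine rather than part of the original code: `.*?` stops at the first closing parenthesis, which only matters if a headline ever contains more than one parenthesized group.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the index starts at 1 to match it.
rows = [
    ("LeBron James", "LeBron James suggests 5-10 games before playoff"),
    ("LeBron James", "LeBron James (groin) probable for Thursday"),
    ("LeBron James", "LeBron James overcomes Pelicans with 34/13/12"),
    ("LeBron James", "Kyrie Irving (groin) plans to play on Tuesday"),
    ("LeBron James", "LeBron James (rest) questionable Tuesday"),
    ("LeBron James", "LeBron James (leg) will start on Saturday"),
    ("LeBron James", "Kevin Love (hip) is questionable"),
    ("Ryan Anderson", "Anderson (flu) returns against Cavs on Sunday"),
    ("Ryan Anderson", "Ryan Anderson out with respiratory infection"),
    ("Ryan Anderson", "Anderson (rest) not playing"),
]
df = pd.DataFrame(rows, columns=["Player Name", "Headline"], index=range(1, 11))

# Text inside the first pair of parentheses; NaN when there is none.
df["Location"] = df["Headline"].str.extract(r"\((.*?)\)", expand=False)

# Keep only headlines that had a parenthesized part.
df = df[df["Location"].notna()].copy()

# "rest" means the player is resting; anything else is treated as an injury.
df["Injury/Rest"] = np.where(df["Location"].eq("rest"), "Rest", "Injury")

print(df)
```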



慕姐4208626
2 answers

拉丁的传说

You can use str.extract with a pattern that has named groups to pull out all the pieces in one pass:

df = df.assign(**df['Headline'].str.extract(r'(?P<Headline_Player>.*)\s\((?P<Location>.*)\)\s(?P<Status>.*)'))
df = df.dropna()
df['Injury/Rest'] = np.where(df['Location'].eq('rest'), 'Rest', 'Injury')

    Player Name     Headline                                        Headline_Player  Location  Status                    Injury/Rest
1   LeBron James    LeBron James (groin) probable for Thursday      LeBron James     groin     probable for Thursday     Injury
3   LeBron James    Kyrie Irving (groin) plans to play on Tuesday   Kyrie Irving     groin     plans to play on Tuesday  Injury
4   LeBron James    LeBron James (rest) questionable Tuesday        LeBron James     rest      questionable Tuesday      Rest
5   LeBron James    LeBron James (leg) will start on Saturday       LeBron James     leg       will start on Saturday    Injury

Edit: to handle edge cases such as "Unfortunately to hear that LeBron James (groin) probably for Thursday", you can make the first group match exactly two words separated by whitespace. This only works if the name is always made up of two words.

df.assign(**df['Headline'].str.extract(r'(?P<Headline_Player>\w+\s\w+)\s\((?P<Location>.*)\)\s(?P<Status>.*)'))
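
Since the title asks about matching text against a separate column, one possible follow-up is to compare the extracted Headline_Player with Player Name and keep only the rows where they agree. The sketch below is an illustration under assumptions of mine, not part of the answer: it treats a row as a match when the headline name is contained in Player Name (so "Anderson" counts as "Ryan Anderson"), and the names `parts`, `out`, `same_player`, and `matched` are made up for the example.

```python
import pandas as pd

# A few rows taken from the question's sample data.
df = pd.DataFrame({
    "Player Name": ["LeBron James", "LeBron James", "LeBron James", "Ryan Anderson"],
    "Headline": [
        "LeBron James (groin) probable for Thursday",
        "Kyrie Irving (groin) plans to play on Tuesday",
        "LeBron James overcomes Pelicans with 34/13/12",
        "Anderson (rest) not playing",
    ],
})

# Named groups become column names in the extracted frame.
pattern = r"(?P<Headline_Player>.*)\s\((?P<Location>.*)\)\s(?P<Status>.*)"
parts = df["Headline"].str.extract(pattern)

# Attach the new columns and drop headlines without a "(...)" part.
out = df.join(parts).dropna(subset=["Location"])

# Assumption: a row matches when the headline name appears inside Player Name.
same_player = [
    headline_name in player_name
    for headline_name, player_name in zip(out["Headline_Player"], out["Player Name"])
]
matched = out[same_player]

print(matched)
```

Containment is a loose test. Exact equality (`out["Headline_Player"].eq(out["Player Name"])`) would be stricter but would drop the "Anderson" headline, so which rule is right depends on the data.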

UYOU

How about this?

df_new = df[df.Headline.str.contains(r'\(')].copy()
# Text before the first '(' is the player named in the headline
df_new['Headline_Player'] = df_new.Headline.apply(lambda x: x.split('(')[0].strip())
df_new['Location'] = df_new.Headline.str.extract(r'\((.*)\)')[0]
df_new['Injury/Rest'] = np.where(df_new['Location'].eq('rest'), 'Rest', 'Injury')
# Text after the first ')' is the status
df_new['Status'] = df_new.Headline.apply(lambda x: x.split(')')[1].strip())
df_new

Output:

Player Name     Headline                                        Headline_Player  Location  Injury/Rest  Status
LeBron James    LeBron James (groin) probable for Thursday      LeBron James     groin     Injury       probable for Thursday
LeBron James    Kyrie Irving (groin) plans to play on Tuesday   Kyrie Irving     groin     Injury       plans to play on Tuesday
LeBron James    LeBron James (rest) questionable Tuesday        LeBron James     rest      Rest         questionable Tuesday
LeBron James    LeBron James (leg) will start on Saturday       LeBron James     leg       Injury       will start on Saturday
LeBron James    Kevin Love (hip) is questionable                Kevin Love       hip       Injury       is questionable
Ryan Anderson   Anderson (flu) returns against Cavs on Sunday   Anderson         flu       Injury       returns against Cavs on Sunday
Ryan Anderson   Anderson (rest) not playing                     Anderson         rest      Rest         not playing
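
The same split-based idea can also be written with pandas' vectorized string methods instead of apply with a lambda. This is only a sketch of an alternative, not the answerer's code; the variable names (`headlines`, `has_paren`, `kept`, `before`, `after`) are made up for the example, and it assumes each kept headline has exactly one parenthesized part.

```python
import pandas as pd

# Three headlines from the question, one of them without parentheses.
headlines = pd.Series([
    "LeBron James (groin) probable for Thursday",
    "Anderson (rest) not playing",
    "Ryan Anderson out with respiratory infection",
])

# regex=False treats "(" as a literal character, so no escaping is needed.
has_paren = headlines.str.contains("(", regex=False)
kept = headlines[has_paren]

# Split once on "(" and keep the left part: the player named in the headline.
before = kept.str.split("(", n=1).str[0].str.strip()

# Split once on ")" and keep the right part: the status text.
after = kept.str.split(")", n=1).str[1].str.strip()

print(pd.DataFrame({"Headline_Player": before, "Status": after}))
```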