猿问

dplyr left_join小于,大于条件

这个问题在某种程度上与以下问题有关:在非平凡条件下有效地合并两个数据框,并检查r中的日期是否在两个日期之间。我在这里发布的一个请求功能是否存在: GitHub问题


我希望使用连接两个数据框dplyr::left_join()。我用来加入的条件是小于,大于,即<=和>。是否dplyr::left_join()支持此功能?或仅在键=之间使用操作符。从SQL运行起来很简单(假设我在数据库中有数据框)


这是一个MWE:我有两个数据集,一个企业年(fdata),而第二个是每五年发生一次的调查数据。因此,对于fdata两个调查年度之间的所有年份,我都会加入相应的调查年度数据。


id <- c(1,1,1,1,

        2,2,2,2,2,2,

        3,3,3,3,3,3,

        5,5,5,5,

        8,8,8,8,

        13,13,13)


fyear <- c(1998,1999,2000,2001,1998,1999,2000,2001,2002,2003,

       1998,1999,2000,2001,2002,2003,1998,1999,2000,2001,

       1998,1999,2000,2001,1998,1999,2000)


byear <- c(1990,1995,2000,2005)

eyear <- c(1995,2000,2005,2010)

val <- c(3,1,5,6)


sdata <- tbl_df(data.frame(byear, eyear, val))


fdata <- tbl_df(data.frame(id, fyear))


test1 <- left_join(fdata, sdata, by = c("fyear" >= "byear","fyear" < "eyear"))

我懂了


Error: cannot join on columns 'TRUE' x 'TRUE': index out of bounds 

除非是否left_join可以处理该条件,但是我的语法缺少什么?


慕桂英3389331
浏览 1568回答 3
3回答

精慕HU

使用filter。(但是请注意,此答案不能产生正确的答案LEFT JOIN;但是MWE会给出正确的结果,INNER JOIN而带有a 。)dplyr如果要求合并两个表而没有要合并的内容,则该程序包不满意,因此在下面,我为此在两个表中都创建了一个哑变量,然后进行过滤,然后删除dummy:fdata %>%&nbsp;&nbsp; &nbsp; mutate(dummy=TRUE) %>%&nbsp; &nbsp; left_join(sdata %>% mutate(dummy=TRUE)) %>%&nbsp; &nbsp; filter(fyear >= byear, fyear < eyear) %>%&nbsp; &nbsp; select(-dummy)并注意,如果您在PostgreSQL中进行此操作(例如),查询优化器将通过dummy以下两个查询解释来查看该变量:> fdata %>%&nbsp;+&nbsp; &nbsp; &nbsp;mutate(dummy=TRUE) %>%+&nbsp; &nbsp; &nbsp;left_join(sdata %>% mutate(dummy=TRUE)) %>%+&nbsp; &nbsp; &nbsp;filter(fyear >= byear, fyear < eyear) %>%+&nbsp; &nbsp; &nbsp;select(-dummy) %>%+&nbsp; &nbsp; &nbsp;explain()Joining by: "dummy"<SQL>SELECT "id" AS "id", "fyear" AS "fyear", "byear" AS "byear", "eyear" AS "eyear", "val" AS "val"FROM (SELECT * FROM (SELECT "id", "fyear", TRUE AS "dummy"FROM "fdata") AS "zzz136"LEFT JOIN&nbsp;(SELECT "byear", "eyear", "val", TRUE AS "dummy"FROM "sdata") AS "zzz137"USING ("dummy")) AS "zzz138"WHERE "fyear" >= "byear" AND "fyear" < "eyear"<PLAN>Nested Loop&nbsp; (cost=0.00..50886.88 rows=322722 width=40)&nbsp; Join Filter: ((fdata.fyear >= sdata.byear) AND (fdata.fyear < sdata.eyear))&nbsp; ->&nbsp; Seq Scan on fdata&nbsp; (cost=0.00..28.50 rows=1850 width=16)&nbsp; ->&nbsp; Materialize&nbsp; (cost=0.00..33.55 rows=1570 width=24)&nbsp; &nbsp; &nbsp; &nbsp; ->&nbsp; Seq Scan on sdata&nbsp; (cost=0.00..25.70 rows=1570 width=24)并使用SQL更干净地进行操作会得到完全相同的结果:> tbl(pg, sql("+&nbsp; &nbsp; &nbsp;SELECT *+&nbsp; &nbsp; &nbsp;FROM fdata&nbsp;+&nbsp; &nbsp; &nbsp;LEFT JOIN sdata&nbsp;+&nbsp; &nbsp; &nbsp;ON fyear >= byear AND fyear < eyear")) %>%+&nbsp; &nbsp; &nbsp;explain()<SQL>SELECT "id", "fyear", "byear", "eyear", "val"FROM (&nbsp; &nbsp; SELECT *&nbsp; &nbsp; FROM fdata&nbsp;&nbsp; &nbsp; LEFT JOIN sdata&nbsp;&nbsp; &nbsp; ON fyear >= byear AND fyear < eyear) AS "zzz140"<PLAN>Nested Loop Left Join&nbsp; (cost=0.00..50886.88 rows=322722 width=40)&nbsp; Join Filter: ((fdata.fyear >= sdata.byear) AND (fdata.fyear < sdata.eyear))&nbsp; ->&nbsp; Seq Scan on fdata&nbsp; (cost=0.00..28.50 rows=1850 width=16)&nbsp; ->&nbsp; Materialize&nbsp; (cost=0.00..33.55 rows=1570 width=24)&nbsp; &nbsp; &nbsp; &nbsp; ->&nbsp; Seq Scan on sdata&nbsp; (cost=0.00..25.70 rows=1570 width=24)

慕桂英4014372

看起来这是打包Fuzzyjoin地址的任务。软件包的各种功能与dplyr连接功能相似。在这种情况下,其中一项fuzzy_*_join功能将为您服务。dplyr::left_join和之间的主要区别在于fuzzyjoin::fuzzy_left_join,您提供了在match.fun参数匹配过程中使用的函数列表。请注意,该by参数的写法仍然与相同left_join。下面是一个例子。我使用的功能来匹配顷>=并<为fyear到byear和fyear到eyear的比较,分别。的library(fuzzyjoin)fuzzy_left_join(fdata, sdata,&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;by = c("fyear" = "byear", "fyear" = "eyear"),&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;match_fun = list(`>=`, `<`))Source: local data frame [27 x 5]&nbsp; &nbsp; &nbsp; id fyear byear eyear&nbsp; &nbsp;val&nbsp; &nbsp;(dbl) (dbl) (dbl) (dbl) (dbl)1&nbsp; &nbsp; &nbsp; 1&nbsp; 1998&nbsp; 1995&nbsp; 2000&nbsp; &nbsp; &nbsp;12&nbsp; &nbsp; &nbsp; 1&nbsp; 1999&nbsp; 1995&nbsp; 2000&nbsp; &nbsp; &nbsp;13&nbsp; &nbsp; &nbsp; 1&nbsp; 2000&nbsp; 2000&nbsp; 2005&nbsp; &nbsp; &nbsp;54&nbsp; &nbsp; &nbsp; 1&nbsp; 2001&nbsp; 2000&nbsp; 2005&nbsp; &nbsp; &nbsp;55&nbsp; &nbsp; &nbsp; 2&nbsp; 1998&nbsp; 1995&nbsp; 2000&nbsp; &nbsp; &nbsp;16&nbsp; &nbsp; &nbsp; 2&nbsp; 1999&nbsp; 1995&nbsp; 2000&nbsp; &nbsp; &nbsp;17&nbsp; &nbsp; &nbsp; 2&nbsp; 2000&nbsp; 2000&nbsp; 2005&nbsp; &nbsp; &nbsp;58&nbsp; &nbsp; &nbsp; 2&nbsp; 2001&nbsp; 2000&nbsp; 2005&nbsp; &nbsp; &nbsp;59&nbsp; &nbsp; &nbsp; 2&nbsp; 2002&nbsp; 2000&nbsp; 2005&nbsp; &nbsp; &nbsp;510&nbsp; &nbsp; &nbsp;2&nbsp; 2003&nbsp; 2000&nbsp; 2005&nbsp; &nbsp; &nbsp;5..&nbsp; &nbsp;...&nbsp; &nbsp;...&nbsp; &nbsp;...&nbsp; &nbsp;...&nbsp; &nbsp;...
随时随地看视频慕课网APP
我要回答