function - User-defined match terms for string distance calculation in R
There are many string distance methods available in R in the {stringdist} package (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf). I am curious whether it is possible to include user-defined match terms, using regex or some other means, in the Jaro or Jaro-Winkler distance calculations. If not, are there other packages that provide this kind of functionality?
For example, take the strings "usa starwar corporation" (a), "us starwar corporation" (b), and "united states starwar corporation" (c). The Jaro distances between ((a),(b)), ((b),(c)), and ((a),(c)) are 0.01449275, 0.2020202, and 0.216513 respectively. Is there a way to define that "usa" matches "us" matches "united states" in the calculation, so that the distances become 0, 0, 0?
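To make the desired behavior concrete, here is a minimal sketch of one possible workaround: normalize the strings with a regex before computing the distance. It does not use any user-defined-match feature of {stringdist}. The `synonyms` table and the `normalize()` helper are hypothetical names introduced only for illustration, and the sketch assumes that `method = "jw"` with `p = 0` gives the plain Jaro distance.

```r
library(stringdist)

# Hypothetical synonym table: regex pattern -> canonical replacement.
# Longer alternatives are listed alongside "us"; \\b keeps the match
# on whole words so that e.g. "status" is left alone.
synonyms <- c("\\b(usa|us|united states)\\b" = "us")

# Hypothetical helper: lower-case the input and rewrite every synonym
# to its canonical form before any distance is computed.
normalize <- function(x, map = synonyms) {
  x <- tolower(x)
  for (pattern in names(map)) {
    x <- gsub(pattern, map[[pattern]], x, perl = TRUE)
  }
  x
}

s_a <- "usa starwar corporation"
s_b <- "us starwar corporation"
s_c <- "united states starwar corporation"

norm <- normalize(c(s_a, s_b, s_c))

# Pairwise distances on the normalized strings
# (method "jw" with p = 0 is assumed to be the plain Jaro distance).
d <- stringdistmatrix(norm, norm, method = "jw", p = 0)
print(d)
```

Because all three strings normalize to the same text, every pairwise distance comes out as 0. The drawback is that the synonym handling happens outside the distance calculation, so every equivalence has to be listed by hand.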
thanks!