Neo4j data model of documents, keywords, and word stems for searching
My goal is to do 2 different kinds of searches of documents using Neo4j. I'll use recipes (documents) as an example. I have a list of ingredients (keywords) on hand (milk, butter, flour, salt, sugar, eggs...), and I have recipes in the database with ingredients attached to each recipe. I'd like to input my list and get 2 different results. The first would be the recipes that most closely include all the ingredients I entered. The second would be combinations of recipes that together include all of my ingredients.
Given: milk, butter, flour, salt, sugar, eggs
A search result for the first case might be:
1.) sugar cookies
2.) butter cookies
A result for the second might be:
1.) flat bread and gogel-mogel
I'm reading in recipes to insert into Neo4j, and pulling out the ingredients both from the ingredients list at the top of each recipe and from the recipe instructions. I want to weigh these differently, maybe 60/40 in favor of the ingredients list.
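To make the 60/40 idea concrete, here is a minimal Python sketch that blends the two weights of one ingredient/recipe pair. The helper name combined_weight and the fixed 0.6/0.4 shares are assumptions for illustration only; the sample data further down stores the two weights on separate relationships rather than pre-blending them.

```python
# Hypothetical shares: 60% for the ingredients list, 40% for the instructions.
LIST_SHARE = 0.6
INSTR_SHARE = 0.4


def combined_weight(list_weight: float, instr_weight: float) -> float:
    """Blend the two weights of one ingredient/recipe pair.

    Pass 0.0 for a relationship that does not exist (e.g. an ingredient
    that is in the list but never mentioned in the instructions).
    """
    return LIST_SHARE * list_weight + INSTR_SHARE * instr_weight


if __name__ == "__main__":
    # salted butter in sugar cookies: ingredients_list .7, ingredients_instr .2
    print(round(combined_weight(0.7, 0.2), 2))  # 0.5
```

Blending at query time like this keeps the raw weights intact, so the split can be tuned later without reloading the data.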
I also stem each ingredient in case people enter similar words.
I'm struggling to come up with a data model in Neo4j. My plan is for the user to enter ingredients in English; I'll stem them in the background and use the stems for searching.
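The background stemming step could look roughly like this Python sketch. naive_stem is a hypothetical and deliberately crude stand-in for a real stemmer such as Porter or Snowball; it only lowercases and strips two plural endings, which is enough to turn "eggs" into the stem "egg" used in the sample data. Splitting on commas mirrors the split() in the Cypher queries.

```python
# Hypothetical, deliberately crude stand-in for a real stemmer (e.g. Porter
# or Snowball). It only normalizes case and strips two plural endings.
def naive_stem(word: str) -> str:
    w = word.strip().lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"       # berries -> berry
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]             # eggs -> egg, raisins -> raisin
    return w


def stem_terms(user_input: str) -> list[str]:
    """Split comma-separated input (like split() in the queries) and stem each term."""
    return [naive_stem(t) for t in user_input.split(",") if t.strip()]


if __name__ == "__main__":
    print(stem_terms("milk, butter, flour, salt, sugar, eggs"))
    # ['milk', 'butter', 'flour', 'salt', 'sugar', 'egg']
```

Stemming the user input with the same function used at import time is what makes the stem lookup match; an unstemmed "eggs" would not find the stem node "egg".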
My first thought was this: it's intuitive to me, but it takes a lot of hops to find the recipes.
Next, maybe something like this:
which gets to the recipes directly from the stems, but I'd need to pass recipe IDs in the relationships (right?) to get the actual ingredients.
Third, maybe combine them like this? But there's lots of duplication.
Here are the Cypher statements to create my first idea:
// create 4 recipes
CREATE (r1:recipe {rid:'1', title:'sugar cookies'}), (r2:recipe {rid:'2', title:'butter cookies'}),
       (r3:recipe {rid:'3', title:'flat bread'}), (r4:recipe {rid:'4', title:'gogel-mogel'})
// add ingredients
MERGE (i1:ingredient {ingredient:"salted butter"})
MERGE (i2:ingredient {ingredient:"white sugar"})
MERGE (i3:ingredient {ingredient:"brown sugar"})
MERGE (i4:ingredient {ingredient:"all purpose flour"})
MERGE (i5:ingredient {ingredient:"iodized salt"})
MERGE (i6:ingredient {ingredient:"eggs"})
MERGE (i7:ingredient {ingredient:"milk"})
MERGE (i8:ingredient {ingredient:"powdered sugar"})
MERGE (i9:ingredient {ingredient:"wheat flour"})
MERGE (i10:ingredient {ingredient:"bananas"})
MERGE (i11:ingredient {ingredient:"chocolate chips"})
MERGE (i12:ingredient {ingredient:"raisins"})
MERGE (i13:ingredient {ingredient:"unsalted butter"})
MERGE (i14:ingredient {ingredient:"wheat flour"})
MERGE (i15:ingredient {ingredient:"himalayan salt"})
MERGE (i16:ingredient {ingredient:"chocolate bars"})
MERGE (i17:ingredient {ingredient:"vanilla flavoring"})
MERGE (i18:ingredient {ingredient:"vanilla"})
// add a stem to each ingredient
MERGE (i1)<-[:stem_of]-(s1:stem {stem:"butter"})
MERGE (i2)<-[:stem_of]-(s2:stem {stem:"sugar"})
MERGE (i3)<-[:stem_of]-(s2)
MERGE (i4)<-[:stem_of]-(s4:stem {stem:"flour"})
MERGE (i5)<-[:stem_of]-(s5:stem {stem:"salt"})
MERGE (i6)<-[:stem_of]-(s6:stem {stem:"egg"})
MERGE (i7)<-[:stem_of]-(s7:stem {stem:"milk"})
MERGE (i8)<-[:stem_of]-(s2)
MERGE (i9)<-[:stem_of]-(s4)
MERGE (i10)<-[:stem_of]-(s10:stem {stem:"banana"})
MERGE (i11)<-[:stem_of]-(s11:stem {stem:"chocolate"})
MERGE (i12)<-[:stem_of]-(s12:stem {stem:"raisin"})
MERGE (i13)<-[:stem_of]-(s1)
MERGE (i14)<-[:stem_of]-(s4)
MERGE (i15)<-[:stem_of]-(s5)
MERGE (i16)<-[:stem_of]-(s11)
MERGE (i17)<-[:stem_of]-(s13:stem {stem:"vanilla"})
MERGE (i18)<-[:stem_of]-(s13)
// weighted relationships from the ingredients list
CREATE (r1)<-[:ingredients_list {weight:.7}]-(i1)
CREATE (r1)<-[:ingredients_list {weight:.6}]-(i2)
CREATE (r1)<-[:ingredients_list {weight:.5}]-(i4)
CREATE (r1)<-[:ingredients_list {weight:.4}]-(i5)
CREATE (r1)<-[:ingredients_list {weight:.4}]-(i6)
CREATE (r1)<-[:ingredients_list {weight:.2}]-(i7)
CREATE (r1)<-[:ingredients_list {weight:.1}]-(i18)
CREATE (r2)<-[:ingredients_list {weight:.7}]-(i1)
CREATE (r2)<-[:ingredients_list {weight:.6}]-(i3)
CREATE (r2)<-[:ingredients_list {weight:.5}]-(i4)
CREATE (r2)<-[:ingredients_list {weight:.4}]-(i5)
CREATE (r2)<-[:ingredients_list {weight:.3}]-(i6)
CREATE (r2)<-[:ingredients_list {weight:.2}]-(i7)
CREATE (r2)<-[:ingredients_list {weight:.1}]-(i18)
CREATE (r3)<-[:ingredients_list {weight:.7}]-(i1)
CREATE (r3)<-[:ingredients_list {weight:.6}]-(i5)
CREATE (r3)<-[:ingredients_list {weight:.5}]-(i7)
CREATE (r3)<-[:ingredients_list {weight:.4}]-(i9)
CREATE (r4)<-[:ingredients_list {weight:.6}]-(i2)
CREATE (r4)<-[:ingredients_list {weight:.5}]-(i6)
// weighted relationships from the instructions
CREATE (r1)<-[:ingredients_instr {weight:.2}]-(i1)
CREATE (r1)<-[:ingredients_instr {weight:.2}]-(i2)
CREATE (r1)<-[:ingredients_instr {weight:.2}]-(i4)
CREATE (r1)<-[:ingredients_instr {weight:.2}]-(i5)
CREATE (r1)<-[:ingredients_instr {weight:.1}]-(i6)
CREATE (r1)<-[:ingredients_instr {weight:.1}]-(i7)
CREATE (r2)<-[:ingredients_instr {weight:.3}]-(i1)
CREATE (r2)<-[:ingredients_instr {weight:.2}]-(i3)
CREATE (r2)<-[:ingredients_instr {weight:.2}]-(i4)
CREATE (r2)<-[:ingredients_instr {weight:.2}]-(i5)
CREATE (r2)<-[:ingredients_instr {weight:.2}]-(i6)
CREATE (r2)<-[:ingredients_instr {weight:.1}]-(i7)
CREATE (r3)<-[:ingredients_instr {weight:.3}]-(i1)
CREATE (r3)<-[:ingredients_instr {weight:.3}]-(i5)
CREATE (r3)<-[:ingredients_instr {weight:.1}]-(i7)
CREATE (r3)<-[:ingredients_instr {weight:.1}]-(i9)
CREATE (r4)<-[:ingredients_instr {weight:.3}]-(i2)
CREATE (r4)<-[:ingredients_instr {weight:.3}]-(i6)
And here is a link to the Neo4j console with the statements above: http://console.neo4j.org/?id=3o8y44
How does Neo4j handle multiple relationships? Also, I can do this for a single ingredient, but how would I put together a query that finds recipes given more than 1 ingredient?
Edit: Thank you, Michael! That got me further. I was able to expand your answer to this:
WITH split("egg, sugar, chocolate, milk, flour, salt", ", ") AS terms
UNWIND terms AS term
MATCH (stem:stem {stem:term})-[:stem_of]->(ingredient:ingredient)-[lst:ingredients_list]->(r:recipe)
WITH r, size(terms) - count(DISTINCT stem) AS notcovered,
     sum(lst.weight) AS weight, collect(DISTINCT stem.stem) AS matched
RETURN r, notcovered, matched, weight
ORDER BY notcovered ASC, weight DESC
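To see what this query computes, here is a Python sketch that replays the same grouping over an in-memory copy of the sample :ingredients_list data. The dictionaries and the rank helper are a hypothetical stand-in for the graph, not Neo4j API. For each recipe, notcovered is the number of search terms with no matching stem, and the weight is summed over every matched stem -> ingredient -> recipe row, as sum(lst.weight) does.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the sample graph: stem -> ingredients.
STEM_OF = {
    "butter": ["salted butter", "unsalted butter"],
    "sugar": ["white sugar", "brown sugar", "powdered sugar"],
    "flour": ["all purpose flour", "wheat flour"],
    "salt": ["iodized salt", "himalayan salt"],
    "egg": ["eggs"],
    "milk": ["milk"],
    "chocolate": ["chocolate chips", "chocolate bars"],
    "vanilla": ["vanilla flavoring", "vanilla"],
}

# (ingredient, recipe) -> weight of the :ingredients_list relationship
INGREDIENTS_LIST = {
    ("salted butter", "sugar cookies"): 0.7, ("white sugar", "sugar cookies"): 0.6,
    ("all purpose flour", "sugar cookies"): 0.5, ("iodized salt", "sugar cookies"): 0.4,
    ("eggs", "sugar cookies"): 0.4, ("milk", "sugar cookies"): 0.2,
    ("vanilla", "sugar cookies"): 0.1,
    ("salted butter", "butter cookies"): 0.7, ("brown sugar", "butter cookies"): 0.6,
    ("all purpose flour", "butter cookies"): 0.5, ("iodized salt", "butter cookies"): 0.4,
    ("eggs", "butter cookies"): 0.3, ("milk", "butter cookies"): 0.2,
    ("vanilla", "butter cookies"): 0.1,
    ("salted butter", "flat bread"): 0.7, ("iodized salt", "flat bread"): 0.6,
    ("milk", "flat bread"): 0.5, ("wheat flour", "flat bread"): 0.4,
    ("white sugar", "gogel-mogel"): 0.6, ("eggs", "gogel-mogel"): 0.5,
}


def rank(terms):
    """One row per matched stem->ingredient->recipe path, grouped by recipe,
    ordered by notcovered ascending, then weight descending."""
    matched = defaultdict(set)
    weight = defaultdict(float)
    for term in terms:
        for ingredient in STEM_OF.get(term, []):
            for (ing, recipe), w in INGREDIENTS_LIST.items():
                if ing == ingredient:
                    matched[recipe].add(term)
                    weight[recipe] += w
    rows = [(recipe, len(terms) - len(stems), sorted(stems), round(weight[recipe], 2))
            for recipe, stems in matched.items()]
    rows.sort(key=lambda row: (row[1], -row[3]))
    return rows


if __name__ == "__main__":
    for row in rank(["egg", "sugar", "chocolate", "milk", "flour", "salt"]):
        print(row)
```

With the search terms from the query, sugar cookies and butter cookies come first (only "chocolate" is uncovered for each, and sugar cookies has the higher weight), followed by flat bread and gogel-mogel.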
And I got the list of matched ingredients and their weight. How would I change the query to also show the weights of the :ingredients_instr relationship, so that I can use both weights at the same time in the calculations? Using [lst:ingredients_list|ingredients_instr] isn't what I'd like.
Edit:
This seems to work. Is it correct?
WITH split("egg, sugar, chocolate, milk, flour, salt", ", ") AS terms
UNWIND terms AS term
MATCH (stem:stem {stem:term})-[:stem_of]->(ingredient:ingredient)-[lstl:ingredients_list]->(r:recipe)<-[lsti:ingredients_instr]-(ingredient:ingredient)
WITH r, size(terms) - count(DISTINCT stem) AS notcovered,
     sum(lsti.weight) AS wi, sum(lstl.weight) AS wl, collect(DISTINCT stem.stem) AS matched
RETURN r, notcovered, matched, wl + wi AS weight
ORDER BY notcovered ASC, weight DESC
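On the sample data this gives sensible results, because every matched ingredient appears in both the list and the instructions of its recipe. The pattern does require both relationships on the same ingredient, though, so an ingredient mentioned in only one place (for example vanilla, which is in the sugar-cookie list but has no :ingredients_instr relationship) drops out of matched and of both sums. The Python sketch below uses a hypothetical three-ingredient excerpt to contrast that behaviour with counting a missing relationship as weight 0, which is roughly the effect of an OPTIONAL MATCH on the instruction relationship in Cypher. The score helper is an illustration only.

```python
from collections import defaultdict

# Hypothetical excerpt of the sample data for one recipe.
STEM_OF = {"butter": ["salted butter"], "sugar": ["white sugar"], "vanilla": ["vanilla"]}
INGREDIENTS_LIST = {
    ("salted butter", "sugar cookies"): 0.7,
    ("white sugar", "sugar cookies"): 0.6,
    ("vanilla", "sugar cookies"): 0.1,  # listed, never mentioned in the instructions
}
INGREDIENTS_INSTR = {
    ("salted butter", "sugar cookies"): 0.2,
    ("white sugar", "sugar cookies"): 0.2,
}


def score(terms, require_both):
    """Return {recipe: (matched stems, list weight + instruction weight)}.

    require_both=True behaves like the query above, which needs both
    relationships on the same ingredient; require_both=False counts a
    missing relationship as weight 0.
    """
    matched = defaultdict(set)
    total = defaultdict(float)
    all_keys = INGREDIENTS_LIST.keys() | INGREDIENTS_INSTR.keys()
    for term in terms:
        for ingredient in STEM_OF.get(term, []):
            for key in (k for k in all_keys if k[0] == ingredient):
                if require_both and not (key in INGREDIENTS_LIST and key in INGREDIENTS_INSTR):
                    continue
                recipe = key[1]
                matched[recipe].add(term)
                total[recipe] += INGREDIENTS_LIST.get(key, 0.0) + INGREDIENTS_INSTR.get(key, 0.0)
    return {r: (sorted(matched[r]), round(total[r], 2)) for r in matched}


if __name__ == "__main__":
    print(score(["butter", "sugar", "vanilla"], require_both=True))
    print(score(["butter", "sugar", "vanilla"], require_both=False))
```

Whether dropping such ingredients is acceptable depends on the data; if ingredients can appear only in the instructions, the reverse case needs the same treatment.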
Also, what about the second query? Given a list of ingredients, it would return combinations of recipes that together include the given ingredients. Thanks again!
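The thread does not answer this second question. One possible approach, sketched here in Python and not taken from the original answer, is to treat it as a set-cover search: enumerate small combinations of recipes and keep those whose stems together cover every search term, discarding any combination that still covers after removing one recipe. RECIPE_STEMS is a hypothetical hand-made summary of the sample data's :ingredients_list stems. Brute force only works for tiny inputs, since set cover is NP-hard in general; a large dataset would need a heuristic such as greedy selection.

```python
from itertools import combinations

# Hypothetical summary: stems reached from each recipe's ingredients list in the sample data.
RECIPE_STEMS = {
    "sugar cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "butter cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "flat bread": {"butter", "salt", "milk", "flour"},
    "gogel-mogel": {"sugar", "egg"},
}


def covers(combo, wanted):
    """True if the recipes in combo together contain every wanted stem."""
    return wanted <= set().union(*(RECIPE_STEMS[r] for r in combo))


def minimal_covers(terms, max_size=2):
    """Combinations of up to max_size recipes that cover all terms, keeping only
    minimal ones (removing any single recipe would leave a term uncovered)."""
    wanted = set(terms)
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(RECIPE_STEMS), size):
            if covers(combo, wanted) and not any(
                covers(smaller, wanted) for smaller in combinations(combo, size - 1)
            ):
                results.append(combo)
    return results


if __name__ == "__main__":
    print(minimal_covers(["milk", "butter", "flour", "salt", "sugar", "egg"]))
```

For the stemmed terms from the question, this returns the two cookie recipes (each covers everything on its own) and the flat bread plus gogel-mogel pair.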
I would go with version 1).
Don't worry about the additional hops. Put the information about amount/weight on the relationship between the recipe and the actual ingredient.
You can have multiple relationships.
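As a rough illustration of what multiple relationships means here, this Python sketch models each relationship as its own record with a type and a weight, so one ingredient/recipe pair can carry both an ingredients_list and an ingredients_instr relationship with different weights. The Rel tuple and the weights_between helper are hypothetical and only mirror the sample data; they are not part of any Neo4j API.

```python
from typing import NamedTuple


class Rel(NamedTuple):
    """One directed relationship with its own type and weight (hypothetical model)."""
    start: str
    rel_type: str
    end: str
    weight: float


# The same ingredient/recipe pair carries two relationships of different types.
RELS = [
    Rel("salted butter", "ingredients_list", "sugar cookies", 0.7),
    Rel("salted butter", "ingredients_instr", "sugar cookies", 0.2),
    Rel("white sugar", "ingredients_list", "sugar cookies", 0.6),
    Rel("white sugar", "ingredients_instr", "sugar cookies", 0.2),
]


def weights_between(start, end):
    """Map relationship type -> weight for every relationship start -> end.
    Assumes at most one relationship of each type per pair."""
    return {r.rel_type: r.weight for r in RELS if r.start == start and r.end == end}


if __name__ == "__main__":
    print(weights_between("salted butter", "sugar cookies"))
```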
Here is an example query. It doesn't fully work on your dataset, because you have no recipe that has all of the ingredients:
WITH split("milk, butter, flour, salt, sugar, eggs", ", ") AS terms
UNWIND terms AS term
MATCH (stem:stem {stem:term})-[:stem_of]->(ingredient:ingredient)-->(r:recipe)
WITH r, size(terms) - count(DISTINCT stem) AS notcovered
RETURN r ORDER BY notcovered ASC LIMIT 2

+-----------------------------------------+
| r                                       |
+-----------------------------------------+
| node[0]{rid:"1",title:"sugar cookies"}  |
| node[1]{rid:"2",title:"butter cookies"} |
+-----------------------------------------+
2 rows
The following is an optimization for large datasets:
And for querying, you would first find all the ingredients, then the recipes attached to the most selective one (the one with the lowest degree).
Then check the remaining ingredients against each recipe.
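The same lookup strategy can be sketched in Python over an in-memory excerpt of the sample data; the EDGES list and the helper names are hypothetical. It picks the ingredient attached to the fewest recipes, takes only that ingredient's recipes as candidates, and keeps the candidates that also contain every remaining ingredient.

```python
from collections import defaultdict

# Hypothetical excerpt of (ingredient, recipe) pairs from the sample data.
EDGES = [
    ("salted butter", "sugar cookies"), ("white sugar", "sugar cookies"),
    ("eggs", "sugar cookies"), ("milk", "sugar cookies"),
    ("salted butter", "butter cookies"), ("brown sugar", "butter cookies"),
    ("eggs", "butter cookies"), ("milk", "butter cookies"),
    ("salted butter", "flat bread"), ("milk", "flat bread"),
    ("white sugar", "gogel-mogel"), ("eggs", "gogel-mogel"),
]

recipes_of = defaultdict(set)      # ingredient -> recipes; len() is its degree
ingredients_of = defaultdict(set)  # recipe -> ingredients
for ingredient, recipe in EDGES:
    recipes_of[ingredient].add(recipe)
    ingredients_of[recipe].add(ingredient)


def recipes_with_all(ingredients):
    """Return the recipes that contain every given ingredient."""
    if not ingredients:
        return set()
    # Most selective ingredient first: the one with the lowest degree.
    anchor = min(ingredients, key=lambda i: len(recipes_of.get(i, ())))
    remaining = set(ingredients) - {anchor}
    # Only the anchor's recipes are candidates; check the rest against each one.
    return {r for r in recipes_of.get(anchor, ()) if remaining <= ingredients_of[r]}


if __name__ == "__main__":
    print(recipes_with_all(["salted butter", "milk", "white sugar"]))
```

Starting from the lowest-degree ingredient keeps the candidate set as small as possible before any per-recipe checks run, which is where the savings come from on a large graph.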