Neo4j data model of documents, keywords, and word stems for searching


My goal is to run two different kinds of searches over documents using Neo4j. I'll use recipes (documents) as my example. Say I have a list of ingredients (keywords) on hand (milk, butter, flour, salt, sugar, eggs...), and I have recipes in my database with ingredients attached to each recipe. I'd like to input my list and get two different results. The first is the recipes that most closely include the ingredients I entered. The second is combinations of recipes that together include all of my ingredients.

Given: milk, butter, flour, salt, sugar, eggs

A search result for the first case might be:

1.) Sugar cookies

2.) Butter cookies

A result for the second might be:

1.) Flat bread, gogel-mogel

I'm reading in recipes to insert into Neo4j, and pulling out the ingredients both from the ingredients list at the top of each recipe and from the recipe instructions. I want to weigh these differently, maybe 60/40 in favor of the ingredients list.
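One way to read the 60/40 idea is as a weighted blend of the two per-ingredient weights. The sketch below only illustrates that arithmetic in Python. It is not part of the Neo4j model, and the name `blended_weight` and the exact 0.6/0.4 shares are assumptions made for the example.

```python
# Assumed shares: 60% for the ingredients list, 40% for the instructions.
LIST_SHARE = 0.6
INSTR_SHARE = 0.4


def blended_weight(list_weight: float, instr_weight: float) -> float:
    """Combine the two weights of one ingredient for one recipe (hypothetical helper)."""
    return LIST_SHARE * list_weight + INSTR_SHARE * instr_weight


# Example: an ingredient weighted 0.7 in the list and 0.2 in the instructions.
score = blended_weight(0.7, 0.2)
print(round(score, 2))
```

Keeping the two raw weights on separate relationships and blending them at query time leaves the 60/40 split easy to tune later.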

I'd like to stem each ingredient in case people enter similar words.

I'm struggling to come up with a data model in Neo4j. The plan is for the user to enter ingredients in English, stem them in the background, and use the stems for searching.
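The "stem in the background" step could look roughly like the sketch below. It is a deliberately naive illustration that only lowercases the input and strips a plural "s". `naive_stem` and `stem_terms` are hypothetical names, and a real application would more likely use an established stemmer.

```python
from typing import List


def naive_stem(word: str) -> str:
    """Very rough stemmer: lowercase, trim, and strip a trailing plural 's'."""
    w = word.strip().lower()
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]  # e.g. "eggs" -> "egg"
    return w


def stem_terms(user_input: str) -> List[str]:
    """Split a comma-separated ingredient list and stem each entry."""
    return [naive_stem(part) for part in user_input.split(",") if part.strip()]


terms = stem_terms("Milk, butter, flour, salt, sugar, eggs")
print(terms)
```

The stemmed list is what would be passed to the search query, so that "eggs" entered by the user can match a stem node holding "egg".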

My first thought was: [image: Neo4j data model 1]. This one is intuitive to me, but it takes a lot of hops to find the recipes.

Next, maybe this: [image: Neo4j data model 2]

which gets to the recipes directly from the stems, but I would need to pass recipe IDs in the relationships (right?) to get to the actual ingredients.

Third, maybe combine them like this? [image: Neo4j data model 3] But there's lots of duplication.

Here are the Cypher statements to create the first idea:

    // create 4 recipes
    CREATE (r1:recipe {rid:'1', title:'sugar cookies'}),
           (r2:recipe {rid:'2', title:'butter cookies'}),
           (r3:recipe {rid:'3', title:'flat bread'}),
           (r4:recipe {rid:'4', title:'gogel-mogel'})

    // adding ingredients
    MERGE (i1:ingredient {ingredient:"salted butter"})
    MERGE (i2:ingredient {ingredient:"white sugar"})
    MERGE (i3:ingredient {ingredient:"brown sugar"})
    MERGE (i4:ingredient {ingredient:"all purpose flour"})
    MERGE (i5:ingredient {ingredient:"iodized salt"})
    MERGE (i6:ingredient {ingredient:"eggs"})
    MERGE (i7:ingredient {ingredient:"milk"})
    MERGE (i8:ingredient {ingredient:"powdered sugar"})
    MERGE (i9:ingredient {ingredient:"wheat flour"})
    MERGE (i10:ingredient {ingredient:"bananas"})
    MERGE (i11:ingredient {ingredient:"chocolate chips"})
    MERGE (i12:ingredient {ingredient:"raisins"})
    MERGE (i13:ingredient {ingredient:"unsalted butter"})
    MERGE (i14:ingredient {ingredient:"wheat flour"})
    MERGE (i15:ingredient {ingredient:"himalayan salt"})
    MERGE (i16:ingredient {ingredient:"chocolate bars"})
    MERGE (i17:ingredient {ingredient:"vanilla flavoring"})
    MERGE (i18:ingredient {ingredient:"vanilla"})

    // stems added to each ingredient
    MERGE (i1)<-[:stem_of]-(s1:stem {stem:"butter"})
    MERGE (i2)<-[:stem_of]-(s2:stem {stem:"sugar"})
    MERGE (i3)<-[:stem_of]-(s2)
    MERGE (i4)<-[:stem_of]-(s4:stem {stem:"flour"})
    MERGE (i5)<-[:stem_of]-(s5:stem {stem:"salt"})
    MERGE (i6)<-[:stem_of]-(s6:stem {stem:"egg"})
    MERGE (i7)<-[:stem_of]-(s7:stem {stem:"milk"})
    MERGE (i8)<-[:stem_of]-(s2)
    MERGE (i9)<-[:stem_of]-(s4)
    MERGE (i10)<-[:stem_of]-(s10:stem {stem:"banana"})
    MERGE (i11)<-[:stem_of]-(s11:stem {stem:"chocolate"})
    MERGE (i12)<-[:stem_of]-(s12:stem {stem:"raisin"})
    MERGE (i13)<-[:stem_of]-(s1)
    MERGE (i14)<-[:stem_of]-(s4)
    MERGE (i15)<-[:stem_of]-(s5)
    MERGE (i16)<-[:stem_of]-(s11)
    MERGE (i17)<-[:stem_of]-(s13:stem {stem:"vanilla"})
    MERGE (i18)<-[:stem_of]-(s13)

    CREATE (r1)<-[:ingredients_list{weight:.7}]-(i1)
    CREATE (r1)<-[:ingredients_list{weight:.6}]-(i2)
    CREATE (r1)<-[:ingredients_list{weight:.5}]-(i4)
    CREATE (r1)<-[:ingredients_list{weight:.4}]-(i5)
    CREATE (r1)<-[:ingredients_list{weight:.4}]-(i6)
    CREATE (r1)<-[:ingredients_list{weight:.2}]-(i7)
    CREATE (r1)<-[:ingredients_list{weight:.1}]-(i18)

    CREATE (r2)<-[:ingredients_list{weight:.7}]-(i1)
    CREATE (r2)<-[:ingredients_list{weight:.6}]-(i3)
    CREATE (r2)<-[:ingredients_list{weight:.5}]-(i4)
    CREATE (r2)<-[:ingredients_list{weight:.4}]-(i5)
    CREATE (r2)<-[:ingredients_list{weight:.3}]-(i6)
    CREATE (r2)<-[:ingredients_list{weight:.2}]-(i7)
    CREATE (r2)<-[:ingredients_list{weight:.1}]-(i18)

    CREATE (r3)<-[:ingredients_list{weight:.7}]-(i1)
    CREATE (r3)<-[:ingredients_list{weight:.6}]-(i5)
    CREATE (r3)<-[:ingredients_list{weight:.5}]-(i7)
    CREATE (r3)<-[:ingredients_list{weight:.4}]-(i9)

    CREATE (r4)<-[:ingredients_list{weight:.6}]-(i2)
    CREATE (r4)<-[:ingredients_list{weight:.5}]-(i6)

    CREATE (r1)<-[:ingredients_instr{weight:.2}]-(i1)
    CREATE (r1)<-[:ingredients_instr{weight:.2}]-(i2)
    CREATE (r1)<-[:ingredients_instr{weight:.2}]-(i4)
    CREATE (r1)<-[:ingredients_instr{weight:.2}]-(i5)
    CREATE (r1)<-[:ingredients_instr{weight:.1}]-(i6)
    CREATE (r1)<-[:ingredients_instr{weight:.1}]-(i7)

    CREATE (r2)<-[:ingredients_instr{weight:.3}]-(i1)
    CREATE (r2)<-[:ingredients_instr{weight:.2}]-(i3)
    CREATE (r2)<-[:ingredients_instr{weight:.2}]-(i4)
    CREATE (r2)<-[:ingredients_instr{weight:.2}]-(i5)
    CREATE (r2)<-[:ingredients_instr{weight:.2}]-(i6)
    CREATE (r2)<-[:ingredients_instr{weight:.1}]-(i7)

    CREATE (r3)<-[:ingredients_instr{weight:.3}]-(i1)
    CREATE (r3)<-[:ingredients_instr{weight:.3}]-(i5)
    CREATE (r3)<-[:ingredients_instr{weight:.1}]-(i7)
    CREATE (r3)<-[:ingredients_instr{weight:.1}]-(i9)

    CREATE (r4)<-[:ingredients_instr{weight:.3}]-(i2)
    CREATE (r4)<-[:ingredients_instr{weight:.3}]-(i6)

And here is a link to the Neo4j console with the above statements: http://console.neo4j.org/?id=3o8y44

How does Neo4j handle multiple relationships? Also, I can query for a single ingredient, but how would I put together a query to get recipes given more than one ingredient?

Edit: Thank you, Michael! That got me further. I was able to expand your answer to this:

    WITH split("egg, sugar, chocolate, milk, flour, salt",", ") AS terms
    UNWIND terms AS term
    MATCH (stem:stem {stem:term})-[:stem_of]->(ingredient:ingredient)-[lst:ingredients_list]->(r:recipe)
    // terms is kept as a grouping key so it can be used next to the aggregates
    WITH r, terms,
         size(terms) - count(DISTINCT stem) AS notcovered,
         sum(lst.weight) AS weight,
         collect(DISTINCT stem.stem) AS matched
    RETURN r, notcovered, matched, weight
    ORDER BY notcovered ASC, weight DESC

And I got the list of ingredients matched and the weight. How would I change the query to also show the weights of the :ingredients_instr relationship, so I can use both weights at the same time in calculations? [lst:ingredients_list|ingredients_instr] isn't what I'd like.

Edit:

This seems to work. Is it correct?

    WITH split("egg, sugar, chocolate, milk, flour, salt",", ") AS terms
    UNWIND terms AS term
    MATCH (stem:stem {stem:term})-[:stem_of]->(ingredient:ingredient)-[lstl:ingredients_list]->(r:recipe)<-[lsti:ingredients_instr]-(ingredient:ingredient)
    // terms is kept as a grouping key so it can be used next to the aggregates
    WITH r, terms,
         size(terms) - count(DISTINCT stem) AS notcovered,
         sum(lsti.weight) AS wi,
         sum(lstl.weight) AS wl,
         collect(DISTINCT stem.stem) AS matched
    RETURN r, notcovered, matched, wl+wi
    ORDER BY notcovered ASC, wl+wi DESC

Also, can you help with the second query? Given a list of ingredients, it should return combinations of recipes that together include all the given ingredients. Thanks again!

I would go with your version 1).

Don't worry about the additional hops. I would put the information about amount / weight on the relationship between the recipe and the actual ingredient.

You can have multiple relationships.

Here is an example query. It doesn't find a complete match on your dataset, as you have no recipe that has all the ingredients:

    WITH split("milk, butter, flour, salt, sugar, eggs",", ") AS terms
    UNWIND terms AS term
    MATCH (stem:stem {stem:term})-[:stem_of]->(ingredient:ingredient)-->(r:recipe)
    // terms is kept as a grouping key so it can be used next to the aggregate
    WITH r, terms, size(terms) - count(DISTINCT stem) AS notcovered
    RETURN r
    ORDER BY notcovered ASC
    LIMIT 2

    +-----------------------------------------+
    | r                                       |
    +-----------------------------------------+
    | node[0]{rid:"1",title:"sugar cookies"}  |
    | node[1]{rid:"2",title:"butter cookies"} |
    +-----------------------------------------+
    2 rows
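To see why those two rows come first, the ranking can be mirrored in plain Python. This is only an illustrative sketch: the dictionary below is hand-copied from the sample data (the stems reachable from each recipe through its ingredients), and `rank_by_not_covered` is a hypothetical name, not a Neo4j API.

```python
# Stems reachable from each recipe in the sample data (via ingredient nodes).
recipe_stems = {
    "sugar cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "butter cookies": {"butter", "sugar", "flour", "salt", "egg", "milk", "vanilla"},
    "flat bread": {"butter", "salt", "milk", "flour"},
    "gogel-mogel": {"sugar", "egg"},
}


def rank_by_not_covered(terms, recipes):
    """Return (title, not_covered) pairs, fewest uncovered terms first."""
    wanted = set(terms)
    scored = [
        (title, len(terms) - len(wanted & stems))
        for title, stems in recipes.items()
        if wanted & stems  # like MATCH, skip recipes with no matching stem
    ]
    return sorted(scored, key=lambda pair: pair[1])


# "eggs" is left unstemmed here, exactly as in the query string above.
terms = ["milk", "butter", "flour", "salt", "sugar", "eggs"]
for title, not_covered in rank_by_not_covered(terms, recipe_stems)[:2]:
    print(title, not_covered)
```

Because the term "eggs" does not equal the stored stem "egg", every recipe leaves at least one term uncovered, which is consistent with no recipe matching all the ingredients.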

For large datasets, the following optimization applies:

When querying, you would first find all the ingredients, and then the recipes attached to the most selective one (the one with the lowest degree).

Then check the remaining ingredients against each recipe.
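The idea can be sketched outside the database as follows. This is a simplified, hypothetical illustration in Python: it collapses the stem and ingredient hops into a single in-memory index from stem to recipe titles, and `recipes_with_all` is an assumed name rather than anything Neo4j provides.

```python
# Hypothetical in-memory index: stem -> titles of recipes that use it.
stem_to_recipes = {
    "butter": {"sugar cookies", "butter cookies", "flat bread"},
    "sugar": {"sugar cookies", "butter cookies", "gogel-mogel"},
    "flour": {"sugar cookies", "butter cookies", "flat bread"},
    "salt": {"sugar cookies", "butter cookies", "flat bread"},
    "egg": {"sugar cookies", "butter cookies", "gogel-mogel"},
    "milk": {"sugar cookies", "butter cookies", "flat bread"},
}


def recipes_with_all(terms, index):
    """Recipes that use every term, starting from the most selective term."""
    if not terms or any(term not in index for term in terms):
        return set()  # an unknown term can never be covered
    # Order terms by degree, i.e. by how many recipes each one touches.
    ordered = sorted(terms, key=lambda term: len(index[term]))
    candidates = set(index[ordered[0]])  # smallest candidate set first
    for term in ordered[1:]:
        candidates &= index[term]  # check the remaining terms
        if not candidates:
            break  # nothing left to check
    return candidates


print(sorted(recipes_with_all(["sugar", "egg", "milk"], stem_to_recipes)))
```

Starting from the lowest-degree term keeps the candidate set small, so the later membership checks touch fewer recipes.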

