hadoop - How far from the truth are my assumptions about, and understanding of, the "Big Data stack"? Especially the cloud part


I am not talking about OLTP, NoSQL, Cassandra, or HBase in this post.

Pros

• A generalized distributed, parallel cluster framework.

• Cost: it escapes licensing and vendor-specific storage.

Why not, or back to the original points

• Streaming: nothing new. Points 1 and 2 in the pros section apply.

• Unstructured data: it still needs a data model to be there. Points 1 and 2 in the pros section apply.

• MPP: a long way to go. Parquet, adaptive SerDe, and off-heap processing still have a long way to go to match a traditional RDBMS MPP or Exadata. It lacks multiple reads and multiple writes of the same data because of the inherent storage design, and it needs compaction after updates.

• Low latency: it needs large in-memory processing, and it still lacks collocation and has relatively less sophisticated algorithms.

  • Impala does not support ORC. Hive with ORC supports ACID and updates, though with bad performance. That means Hive/Impala is not an option, while Hive/Spark can be. Then again, Impala outperforms Spark on low-latency, structured queries.

• Random access: chunks of 128 MB or more with no physical record boundary are not suitable for a general production environment where data searching and correction are a must. They suit archiving, aggregation, or maybe an ODS.

• Rollback: without BEGIN, COMMIT, and ROLLBACK support, it is tricky.
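To make the "needs compaction after updates" point in the MPP bullet concrete, here is a minimal sketch in Python. It assumes a store whose files are immutable, so an update is written as a new delta file and a reader has to merge the deltas with the base. The class and method names (`AppendOnlyTable`, `update`, `read`, `compact`) are hypothetical, and this is only a toy model of the idea, not how Hive ACID or any real engine is implemented.

```python
class AppendOnlyTable:
    """Toy model of a table on immutable files: one base file plus delta files."""

    def __init__(self, rows):
        # Base "file": written once, never modified in place.
        self.base = dict(rows)
        # Each update lands in a new delta "file".
        self.deltas = []

    def update(self, key, value):
        # No in-place write: record the change as a new delta.
        self.deltas.append({key: value})

    def read(self, key):
        # A reader checks deltas newest-first before the base, so the
        # read cost grows with the number of uncompacted deltas.
        files_touched = 0
        for delta in reversed(self.deltas):
            files_touched += 1
            if key in delta:
                return delta[key], files_touched
        files_touched += 1
        return self.base.get(key), files_touched

    def compact(self):
        # Rewrite the base with all deltas applied, oldest first.
        merged = dict(self.base)
        for delta in self.deltas:
            merged.update(delta)
        self.base = merged
        self.deltas = []


table = AppendOnlyTable({"a": 1, "b": 2})
for i in range(5):
    table.update("a", 10 + i)

value_before, cost_before = table.read("b")
table.compact()
value_after, cost_after = table.read("b")
print(value_before, cost_before)  # 2 6
print(value_after, cost_after)    # 2 1
```

In this toy model, reading an untouched key costs six file lookups before compaction and one after, which is the kind of overhead the bullet is pointing at.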

Cloud: this is an infrastructure/management perspective. It has nothing to do with actual DB design or performance. Nodes within a data processing boundary should be collocated. Sharded nodes can have varying WAN timelines.

Pardon me, and please correct me if I am wrong or redundant.

