
Introduction to NoSQL databases

Posted by Vinay Kher | May 16, 2010

NoSQL is the new buzzword in the software field. It is catching on not just because many software geeks are talking and writing about it; the key reason for its popularity is that it has moved past the theoretical phase and is now being used successfully in practice. A variety of NoSQL products are not only readily available off the shelf, they are also mature and proven: they run successfully in large-scale production systems across various domains and have delivered good results. Let's understand what this hot term means and what it has to offer.

Welcome to the NoSQL universe. So what exactly does NoSQL mean? To start simply, it refers to databases (and database-like products) that do not use SQL to store and retrieve data from the underlying data store. Some people expand it as "Not Only SQL". The term "non-relational database" is also used for them, but that broader category covers other variants as well, such as the older network and hierarchical databases. In fact, many so-called NoSQL databases are not really databases at all. To be more precise, NoSQL is an umbrella term for a wide range of non-relational data stores. At their core these systems store and retrieve data by a unique key (somewhat like a map data structure, offering hash-table-style fast lookup with get/put operations), optionally combined with a few extra capabilities such as processing, replicating, analyzing and validating the stored data.
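The key-based get/put model described above can be sketched in a few lines of Python. This is a hypothetical, in-memory toy rather than the API of any real product: the `KeyValueStore` class and its `put`, `get` and `delete` methods are names made up for illustration. It simply shows values stored as opaque bytes and looked up by a unique key through a hash table.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key/value store: opaque byte values addressed by a unique key."""

    def __init__(self) -> None:
        # A Python dict is a hash table, so put/get/delete are O(1) on average.
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Overwrites any previous value; the store never inspects the bytes.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None for a missing key rather than raising KeyError.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42", b'{"name": "Asha", "city": "Indore"}')
print(store.get("user:42"))
print(store.get("user:99"))
```

Note that the store cannot answer "find all users in Indore" on its own: it only knows keys, which is why many products add extras such as indexing or views on top of this core.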

Before listing the NoSQL products, let's answer a few key questions about NoSQL databases.

Why is there a need for NoSQL databases?

The following interrelated points help explain the reasons.

  • Growing storage and processing needs: there is an ongoing demand for systems that can efficiently store and process much larger data volumes, and keep working as those volumes continue to grow
  • Even with a relational DB, various factors impose an upper limit on the volume of data that can be handled effectively. Managing the data becomes difficult once its volume crosses a certain limit, whether in a single table or across the entire database (as volumes grow from GB to TB, PB and even EB)
  • Minimizing response time
  • Need for parallel processing to reduce response time and spread the workload: no matter how sophisticated and fast the hardware you add, and however much hardware technology improves, there is always an upper limit, beyond which the only way to process large amounts of data in less time is to process it in parallel. The solution may even involve grid computing, but then the data must be available locally on each processing node; otherwise the data transfer itself becomes the source of latency and the bottleneck
  • Reducing dependency on sophisticated, high-end (and high-cost) hardware by offering a software-based way to build an equally fast, reliable and fault-tolerant system on comparatively cheap commodity hardware
  • Scaling simply by adding more hardware
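The points above about parallel processing, data locality and scaling by adding hardware can be sketched in Python. This is a hypothetical illustration, not how any particular product works: records are spread across N "nodes" by hashing their keys, each node counts words only in the records it already holds, and just the small partial counts are combined. Threads stand in for separate machines, and the function names (`node_for`, `partition`, `local_word_count`, `cluster_word_count`) are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List
import zlib


def node_for(key: str, num_nodes: int) -> int:
    # A stable hash (CRC32) so the same key always maps to the same node.
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(records: Dict[str, str], num_nodes: int) -> List[Dict[str, str]]:
    # Spread records across nodes; each node then holds its own data locally.
    shards: List[Dict[str, str]] = [{} for _ in range(num_nodes)]
    for key, text in records.items():
        shards[node_for(key, num_nodes)][key] = text
    return shards


def local_word_count(shard: Dict[str, str]) -> Counter:
    # Runs "on the node": it reads only the data already stored in its own shard.
    counts: Counter = Counter()
    for text in shard.values():
        counts.update(text.split())
    return counts


def cluster_word_count(records: Dict[str, str], num_nodes: int) -> Counter:
    shards = partition(records, num_nodes)
    # Each shard is processed in parallel; threads stand in for machines here.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_word_count, shards))
    # Only the small partial results travel back to be combined.
    total: Counter = Counter()
    for part in partials:
        total.update(part)
    return total


docs = {"doc1": "big data big", "doc2": "data store", "doc3": "big store"}
print(cluster_word_count(docs, 3))
```

Because the node is chosen as hash modulo the node count, changing the number of nodes remaps most keys; systems such as Dynamo and Cassandra use consistent hashing to limit that data movement when hardware is added.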

Standard attributes of NoSQL databases

  • Data versioning: there is typically no locking, which improves performance. Concurrent read/write access is offered, and write conflicts are handled through data versioning
  • Map-Reduce or similar capabilities
  • Scalability (horizontal or vertical, depending heavily on the chosen data store)
  • No support for join operations (ideally, avoid joins by organizing the data accordingly; if a join is needed at all, implement the join logic in client-side code)
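Two of the attributes above, versioned writes without locks and joins done in client code, are sketched below under simplifying assumptions. The `VersionedStore` class, its `put(key, value, expected_version)` signature, `VersionConflict` and `client_side_join` are hypothetical names: each write must state the version it last read, and a write based on a stale version is rejected instead of being blocked by a lock. Real products differ; Dynamo and Voldemort, for example, track versions with vector clocks rather than a single counter.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the value."""


@dataclass
class Versioned:
    value: dict
    version: int


class VersionedStore:
    """Hypothetical store using optimistic, version-checked writes instead of locks."""

    def __init__(self) -> None:
        self._data: Dict[str, Versioned] = {}

    def get(self, key: str) -> Optional[Versioned]:
        return self._data.get(key)

    def put(self, key: str, value: dict, expected_version: int) -> int:
        # expected_version is the version the writer last read (0 for a new key).
        current = self._data.get(key)
        current_version = current.version if current else 0
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = Versioned(value, current_version + 1)
        return current_version + 1


def client_side_join(store: VersionedStore, order_keys: List[str]) -> List[Tuple[dict, dict]]:
    # No server-side JOIN: fetch each order, then fetch its customer by key.
    rows = []
    for order_key in order_keys:
        order = store.get(order_key)
        if order is None:
            continue  # skip keys that do not exist
        customer = store.get(order.value["customer_key"])
        if customer is not None:
            rows.append((order.value, customer.value))
    return rows


store = VersionedStore()
store.put("customer:1", {"name": "Asha"}, expected_version=0)
store.put("order:100", {"customer_key": "customer:1", "total": 250}, expected_version=0)
rows = client_side_join(store, ["order:100", "order:999"])
print(rows)
```

The join costs one extra lookup per order, which is why the data is ideally organized so that such lookups are rarely needed.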

Non-standard attributes of NoSQL databases (a particular product may or may not offer each of the following features)

  • Persistent storage of data
  • Support for both textual and non-textual (i.e. binary) data
  • Distributed operation
  • Replication
  • Compaction
  • Indexing
  • Pagination
  • Data stored as uninterpreted content (in most products)
  • View generation support
  • Transaction capability and ACID guarantees
  • Fixed table schema enforcement (most products instead allow schema-free or variable schemas)

Downsides of using NoSQL databases

  • With relational databases one can quickly switch to another relational database, at least for common (non-DB-specific) operations, just by changing the driver. This is not possible with NoSQL databases, whether switching from a relational database to NoSQL or from one NoSQL product to another
  • Adopting a NoSQL database in an existing project requires code changes; in particular, the entire data access layer has to be rewritten because there is no common API or interface
  • Some products have only one API implementation, so you have no alternative if you run into a bug or performance issue in that API
  • Products that support multiple APIs, including bindings for different languages, still have no common standards or interfaces across those APIs (not even across different APIs for the same product in the same language). The reason is obvious: the technology is still in its infancy
  • Data must be modeled in a completely different, denormalized way, which we are not yet used to after long experience with relational databases
  • Migrating an application from a relational database to a NoSQL database while keeping the DB-layer API unchanged is more challenging than building NoSQL integration from scratch without backward compatibility
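To make the denormalization point concrete, here is a hypothetical order that a relational schema would split across three tables, rebuilt as a single self-contained record of the kind a document or key/value store would keep under one key. The table contents, field names and the `to_order_document` helper are illustrative assumptions, not the layout of any particular product.

```python
from typing import Dict

# Normalized, relational-style rows: three "tables" linked by ids.
customers = [{"id": 1, "name": "Asha"}]
orders = [{"id": 100, "customer_id": 1}]
order_lines = [
    {"order_id": 100, "sku": "BOOK-1", "qty": 2},
    {"order_id": 100, "sku": "PEN-7", "qty": 5},
]


def to_order_document(order_id: int) -> Dict:
    # Denormalize: copy the customer name and embed the lines in one record,
    # so a single key lookup returns everything needed (no joins at read time).
    order = next(o for o in orders if o["id"] == order_id)
    customer = next(c for c in customers if c["id"] == order["customer_id"])
    lines = [
        {"sku": line["sku"], "qty": line["qty"]}
        for line in order_lines
        if line["order_id"] == order_id
    ]
    return {
        "_id": f"order:{order_id}",
        "customer": {"id": customer["id"], "name": customer["name"]},
        "lines": lines,
    }


doc = to_order_document(100)
print(doc)
```

The trade-off is duplication: the customer's name is copied into every order document, so if it changes, the application itself must update each copy.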

Classification of NoSQL products

  • Document Databases : Store documents
  • XML Databases : Store XML
  • Object Databases: Store Objects
  • Key/Value databases: Store key/value pairs
  • Column databases: Store columns of data (each row may have a different number of columns)
  • Graph databases: Store data organized as a graph of nodes connected by edges (relationships)
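The categories above differ mainly in the shape of what they store. As a rough sketch, here is the same hypothetical customer expressed as plain Python structures for four of the models; the keys, field names and layouts are assumptions for illustration, not the storage format of any product (XML and object databases are left out). The `neighbours` helper is likewise a made-up name.

```python
from typing import Dict, List, Tuple

# Key/value: the store sees only an opaque key and an opaque value.
key_value: Dict[str, bytes] = {"customer:1": b'{"name": "Asha", "city": "Indore"}'}

# Document: the value is a structured document the database can look inside.
document = {
    "_id": "customer:1",
    "name": "Asha",
    "city": "Indore",
    "purchases": [{"sku": "BOOK-1", "qty": 1}],
}

# Column family: rows keyed by id; each row may carry a different set of columns.
column_family: Dict[str, Dict[str, str]] = {
    "customer:1": {"name": "Asha", "city": "Indore"},
    "customer:2": {"name": "Ravi", "email": "ravi@example.com", "phone": "555-0100"},
}

# Graph: nodes plus typed edges (relationships) between them.
graph = {
    "nodes": {"customer:1": {"name": "Asha"}, "book:1": {"sku": "BOOK-1"}},
    "edges": [("customer:1", "BOUGHT", "book:1")],
}


def neighbours(g: dict, node: str, relation: str) -> List[str]:
    # Graph queries follow edges from a node instead of joining tables.
    edges: List[Tuple[str, str, str]] = g["edges"]
    return [dst for src, rel, dst in edges if src == node and rel == relation]


print(neighbours(graph, "customer:1", "BOUGHT"))
```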

NoSQL Products

I was already aware of a few notable NoSQL products, but when I searched for more before summarizing them in the table below, I was surprised to find that many more similar products (50+) are available from different vendors, each offering some distinct features (beyond the standard ones) to meet a specific need.

Type | Products
Document Databases | CouchDB (Apache), MongoDB, Riak, Persevere
XML Databases | Xindice (Apache), Sedna, Tamino, eXist
Object Databases | db4o
Key-Value & Column Databases | HBase (Apache), Cassandra, BigTable (Google), Voldemort, Dynamo (Amazon), SimpleDB (Amazon), HyperBase, Hypertable, Redis, Scalaris, MonetDB
Graph Databases | HyperGraphDB, InfoGrid, Neo4j, sonesGraphDB, VertexDB, AllegroGraph
