Thursday, October 23, 2014

Some user stories: pros and cons of MongoDB


Why You Should Never Use MongoDB

http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/


Does everyone hate MongoDB?


https://blog.serverdensity.com/does-everyone-hate-mongodb/


Users of MongoDB

http://www.mongodb.org/about/production-deployments/


Aadhaar

India’s Unique Identification project, also known as Aadhaar, is the world’s biggest biometric database. Aadhaar is in the process of capturing demographic and biometric data of over 1.2 billion residents, and uses MongoDB as one of its databases to store this huge amount of data. MongoDB was among several database products, alongside MySQL, Hadoop and HBase, originally procured for running the database search. MySQL is used for storing demographic data and MongoDB is used to store images. According to techcrunch.com, MongoDB has nothing to do with the “sensitive” data.

Shutterfly

Shutterfly considered a wide variety of alternative database systems, including Cassandra, CouchDB and BerkeleyDB, before settling on MongoDB. Shutterfly uses MongoDB for metadata associated with uploaded photos. For those parts of the application that require a richer transactional model, such as billing and account management, a traditional RDBMS is still in place.

MetLife:
MetLife is a leading global provider of insurance, annuities and employee benefit programs. They serve about 90 million customers and hold leading market positions in the United States, Japan, Latin America, Asia, Europe and the Middle East. MetLife uses MongoDB for “The Wall”, an innovative customer service application that provides a consolidated view of MetLife customers, including policy details and transactions. The Wall is designed to look and function like Facebook and has improved customer satisfaction and call centre productivity. The Wall brings together data from more than 70 legacy systems and merges it into a single record. It runs across six servers in two data centres and presently stores about 24 terabytes of data. MongoDB-based applications are part of a series of Big Data projects that MetLife is working on to transform the company and bring technology, business and customers together.

eBay:
eBay is an American multinational internet consumer-to-consumer corporation, headquartered in San Jose. eBay has a number of projects running on MongoDB for search suggestions, metadata storage, cloud management and merchandizing categorization.

Wednesday, October 22, 2014

SQL to MongoDB Mapping

 Ref : http://docs.mongodb.org/manual/reference/sql-comparison/

SQL to MongoDB Mapping Chart

In addition to the charts that follow, you might want to consider the Frequently Asked Questions section for a selection of common questions about MongoDB.

Terminology and Concepts

The following table presents the various SQL terminology and concepts and the corresponding MongoDB terminology and concepts.
SQL Terms/Concepts          | MongoDB Terms/Concepts
database                    | database
table                       | collection
row                         | document or BSON document
column                      | field
index                       | index
table joins                 | embedded documents and linking
primary key (any unique column or column combination) | primary key (automatically set to the _id field)
aggregation (e.g. group by) | aggregation pipeline

Executables

The following table presents some database executables and the corresponding MongoDB executables. This table is not meant to be exhaustive.
                | MongoDB | MySQL  | Oracle  | Informix  | DB2
Database Server | mongod  | mysqld | oracle  | IDS       | DB2 Server
Database Client | mongo   | mysql  | sqlplus | DB-Access | DB2 Client

Examples

The following table presents the various SQL statements and the corresponding MongoDB statements. The examples in the table assume the following conditions:
  • The SQL examples assume a table named users.
  • The MongoDB examples assume a collection named users that contains documents of the following prototype:
    {
      _id: ObjectId("509a8fb2f3f4948bd2f983a0"),
      user_id: "abc123",
      age: 55,
      status: 'A'
    }
    

Create and Alter

The following table presents the various SQL statements related to table-level actions and the corresponding MongoDB statements.
Each SQL schema statement below is followed by the corresponding MongoDB schema statement.
CREATE TABLE users (
    id MEDIUMINT NOT NULL
        AUTO_INCREMENT,
    user_id Varchar(30),
    age Number,
    status char(1),
    PRIMARY KEY (id)
)
Implicitly created on first insert() operation. The primary key _id is automatically added if _id field is not specified.
db.users.insert( {
    user_id: "abc123",
    age: 55,
    status: "A"
 } )
However, you can also explicitly create a collection:
db.createCollection("users")
ALTER TABLE users
ADD join_date DATETIME
Collections do not describe or enforce the structure of their documents; i.e. there is no structural alteration at the collection level.
However, at the document level, update() operations can add fields to existing documents using the $set operator.
db.users.update(
    { },
    { $set: { join_date: new Date() } },
    { multi: true }
)
ALTER TABLE users
DROP COLUMN join_date
Collections do not describe or enforce the structure of their documents; i.e. there is no structural alteration at the collection level.
However, at the document level, update() operations can remove fields from documents using the $unset operator.
db.users.update(
    { },
    { $unset: { join_date: "" } },
    { multi: true }
)
CREATE INDEX idx_user_id_asc
ON users(user_id)
db.users.ensureIndex( { user_id: 1 } )
CREATE INDEX
       idx_user_id_asc_age_desc
ON users(user_id, age DESC)
db.users.ensureIndex( { user_id: 1, age: -1 } )
DROP TABLE users
db.users.drop()
For more information, see db.collection.insert(), db.createCollection(), db.collection.update(), $set, $unset, db.collection.ensureIndex(), indexes, db.collection.drop(), and Data Modeling Concepts.

Insert

The following table presents the various SQL statements related to inserting records into tables and the corresponding MongoDB statements.
Each SQL INSERT statement below is followed by the corresponding MongoDB insert() statement.
INSERT INTO users(user_id,
                  age,
                  status)
VALUES ("bcd001",
        45,
        "A")
db.users.insert(
   { user_id: "bcd001", age: 45, status: "A" }
)
For more information, see db.collection.insert().

Select

The following table presents the various SQL statements related to reading records from tables and the corresponding MongoDB statements.
Each SQL SELECT statement below is followed by the corresponding MongoDB find() statement.
SELECT *
FROM users
db.users.find()
SELECT id,
       user_id,
       status
FROM users
db.users.find(
    { },
    { user_id: 1, status: 1 }
)
SELECT user_id, status
FROM users
db.users.find(
    { },
    { user_id: 1, status: 1, _id: 0 }
)
SELECT *
FROM users
WHERE status = "A"
db.users.find(
    { status: "A" }
)
SELECT user_id, status
FROM users
WHERE status = "A"
db.users.find(
    { status: "A" },
    { user_id: 1, status: 1, _id: 0 }
)
SELECT *
FROM users
WHERE status != "A"
db.users.find(
    { status: { $ne: "A" } }
)
SELECT *
FROM users
WHERE status = "A"
AND age = 50
db.users.find(
    { status: "A",
      age: 50 }
)
SELECT *
FROM users
WHERE status = "A"
OR age = 50
db.users.find(
    { $or: [ { status: "A" } ,
             { age: 50 } ] }
)
SELECT *
FROM users
WHERE age > 25
db.users.find(
    { age: { $gt: 25 } }
)
SELECT *
FROM users
WHERE age < 25
db.users.find(
   { age: { $lt: 25 } }
)
SELECT *
FROM users
WHERE age > 25
AND   age <= 50
db.users.find(
   { age: { $gt: 25, $lte: 50 } }
)
SELECT *
FROM users
WHERE user_id like "%bc%"
db.users.find( { user_id: /bc/ } )
SELECT *
FROM users
WHERE user_id like "bc%"
db.users.find( { user_id: /^bc/ } )
SELECT *
FROM users
WHERE status = "A"
ORDER BY user_id ASC
db.users.find( { status: "A" } ).sort( { user_id: 1 } )
SELECT *
FROM users
WHERE status = "A"
ORDER BY user_id DESC
db.users.find( { status: "A" } ).sort( { user_id: -1 } )
SELECT COUNT(*)
FROM users
db.users.count()
or
db.users.find().count()
SELECT COUNT(user_id)
FROM users
db.users.count( { user_id: { $exists: true } } )
or
db.users.find( { user_id: { $exists: true } } ).count()
SELECT COUNT(*)
FROM users
WHERE age > 30
db.users.count( { age: { $gt: 30 } } )
or
db.users.find( { age: { $gt: 30 } } ).count()
SELECT DISTINCT(status)
FROM users
db.users.distinct( "status" )
SELECT *
FROM users
LIMIT 1
db.users.findOne()
or
db.users.find().limit(1)
SELECT *
FROM users
LIMIT 5
OFFSET 10
db.users.find().limit(5).skip(10)
EXPLAIN SELECT *
FROM users
WHERE status = "A"
db.users.find( { status: "A" } ).explain()
For more information, see db.collection.find(), db.collection.distinct(), db.collection.findOne(), $ne, $and, $or, $gt, $lt, $exists, $lte, $regex, limit(), skip(), explain(), sort(), and count().
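
To make the operator semantics in the SELECT examples concrete, here is a minimal in-memory sketch in plain JavaScript that runs without a MongoDB server. The matches function and the sample users array are hypothetical names used only for illustration. It covers just equality, regular expressions, $ne, $gt, $lt, $lte, $exists and $or, and it is not how the server actually evaluates queries.

```javascript
// Hypothetical helper: evaluates a small subset of MongoDB query syntax
// against a plain object. Illustration only, not the server's algorithm.
function matches(doc, query) {
  return Object.keys(query).every(function (key) {
    var cond = query[key];
    if (key === "$or") {
      // $or takes an array of sub-queries; matching any one is enough
      return cond.some(function (sub) { return matches(doc, sub); });
    }
    var value = doc[key];
    if (cond instanceof RegExp) {
      // e.g. { user_id: /^bc/ } for LIKE "bc%"
      return typeof value === "string" && cond.test(value);
    }
    if (cond !== null && typeof cond === "object") {
      // Operator document such as { $gt: 25, $lte: 50 }: all must hold
      return Object.keys(cond).every(function (op) {
        switch (op) {
          case "$ne": return value !== cond[op];
          case "$gt": return value > cond[op];
          case "$lt": return value < cond[op];
          case "$lte": return value <= cond[op];
          case "$exists": return (key in doc) === cond[op];
          default: throw new Error("unsupported operator: " + op);
        }
      });
    }
    return value === cond; // plain equality, e.g. { status: "A" }
  });
}

var users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" },
  { user_id: "xyz789", age: 20, status: "D" }
];

// WHERE age > 25 AND age <= 50
var inRange = users.filter(function (u) {
  return matches(u, { age: { $gt: 25, $lte: 50 } });
});

// WHERE status = "D" OR user_id LIKE "bc%"
var either = users.filter(function (u) {
  return matches(u, { $or: [ { status: "D" }, { user_id: /^bc/ } ] });
});
```

Several conditions in one query document are combined with AND, which is why the range query needs no explicit $and.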

Update Records

The following table presents the various SQL statements related to updating existing records in tables and the corresponding MongoDB statements.
Each SQL UPDATE statement below is followed by the corresponding MongoDB update() statement.
UPDATE users
SET status = "C"
WHERE age > 25
db.users.update(
   { age: { $gt: 25 } },
   { $set: { status: "C" } },
   { multi: true }
)
UPDATE users
SET age = age + 3
WHERE status = "A"
db.users.update(
   { status: "A" } ,
   { $inc: { age: 3 } },
   { multi: true }
)
For more information, see db.collection.update(), $set, $inc, and $gt.
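
The effect of $set, $inc and the multi option can be sketched in plain JavaScript. This is an in-memory illustration under simplifying assumptions: applyUpdate is a hypothetical stand-in for update(), the query document is replaced by a predicate function, and only $set and $inc are handled. It mirrors one behaviour of update() in the MongoDB versions discussed here: without multi: true, only the first matching document is modified.

```javascript
// Hypothetical in-memory stand-in for db.collection.update().
// predicate replaces the query document; update may contain $set / $inc.
function applyUpdate(docs, predicate, update, options) {
  var multi = Boolean(options && options.multi);
  var set = update.$set || {};
  var inc = update.$inc || {};
  var modified = 0;
  for (var i = 0; i < docs.length; i++) {
    var doc = docs[i];
    if (!predicate(doc)) continue;
    Object.keys(set).forEach(function (field) {
      doc[field] = set[field];
    });
    Object.keys(inc).forEach(function (field) {
      // $inc creates a missing field with the given amount
      doc[field] = (doc[field] || 0) + inc[field];
    });
    modified++;
    if (!multi) break; // without multi, stop after the first match
  }
  return modified;
}

var people = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "bcd001", age: 45, status: "A" }
];
var isActive = function (u) { return u.status === "A"; };

// UPDATE users SET age = age + 3 WHERE status = "A"
var changedAll = applyUpdate(people, isActive, { $inc: { age: 3 } }, { multi: true });

// The same update without multi touches only the first matching document
var changedOne = applyUpdate(people, isActive, { $inc: { age: 3 } });
```

Forgetting { multi: true } is an easy mistake when translating an SQL UPDATE, since SQL always updates every matching row.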

Delete Records

The following table presents the various SQL statements related to deleting records from tables and the corresponding MongoDB statements.
Each SQL DELETE statement below is followed by the corresponding MongoDB remove() statement.
DELETE FROM users
WHERE status = "D"
db.users.remove( { status: "D" } )
DELETE FROM users
db.users.remove({})

MongoDB on Amazon EC2

Amazon EC2

MongoDB runs well on Amazon EC2. To deploy MongoDB on EC2 you can either set up a new instance manually (refer to Deploy MongoDB on EC2) or deploy a pre-configured AMI from the AWS Marketplace (refer to Deploy from the AWS Marketplace for more information).

Storage Considerations

EC2 instances can be configured either with ephemeral storage or persistent storage using the Elastic Block Store (EBS). Ephemeral storage is lost when instances are terminated so it is generally not recommended for use unless you’re comfortable with the data-loss implications.
For almost all deployments EBS will be the better choice. For production systems we recommend using:
  • EBS-optimized EC2 instances
  • Provisioned IOPS (PIOPS) EBS volumes
Storage configurations can vary from one deployment to the next but for the best performance we recommend one volume for each of the following: data directory, journal, and log. Each of those has different write behaviours and we use one volume for each to reduce IO contention. Different RAID levels such as RAID0, RAID1, or RAID10 can also be used to provide volume level redundancy or capacity. Different storage configurations will have different cost implications especially when combined with PIOPS EBS volumes.

Deploy from the AWS Marketplace

There are three officially maintained MongoDB AMIs on the AWS Marketplace. Each AMI comes pre-configured with individual PIOPS EBS volumes for data, journal, and the log.
For specific information about how each instance was configured, refer to Deploy MongoDB on EC2.

Deploy MongoDB on EC2

The following steps can be used to deploy MongoDB on EC2. The instances will be configured with the following characteristics:
  • Amazon Linux
  • MongoDB 2.4.x installed via Yum
  • Individual PIOPS EBS volumes for data (1000 IOPS), journal (250 IOPS), and log (100 IOPS)
  • Updated read-ahead values for each block device
  • Update ulimit settings
Before continuing, be sure to have the EC2 command-line tools installed, an EC2 key pair for connecting to the instance over SSH, and a security group that allows SSH connections.
Create the instance using the key pair and security group previously created and also include the --ebs-optimized flag and specify individual PIOPS EBS volumes (/dev/xvdf for data, /dev/xvdg for journal, /dev/xvdh for log). Refer to the documentation for ec2-run-instances for more information on devices and parameters:
$ ec2-run-instances ami-05355a6c -t m1.large -g [SECURITY-GROUP] -k [KEY-PAIR] -b "/dev/xvdf=:200:false:io1:1000" -b "/dev/xvdg=:25:false:io1:250" -b "/dev/xvdh=:10:false:io1:100" --ebs-optimized true
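
Each -b argument in the command above appears to follow the pattern device=snapshot:size:delete-on-termination:volume-type:iops, with the snapshot field left empty for a blank volume. As a rough sketch, the three mappings can be generated from a small table in JavaScript. The names blockDeviceArg and volumes are hypothetical, and the sizes and IOPS figures are simply the ones used in the command.

```javascript
// Hypothetical helper that formats one block device mapping string.
// Fields: empty snapshot id, size, delete-on-termination, io1 type, IOPS.
function blockDeviceArg(v) {
  return v.device + "=:" + v.size + ":" + v.deleteOnTermination +
         ":io1:" + v.iops;
}

var volumes = [
  { device: "/dev/xvdf", size: 200, deleteOnTermination: false, iops: 1000 }, // data
  { device: "/dev/xvdg", size: 25,  deleteOnTermination: false, iops: 250 },  // journal
  { device: "/dev/xvdh", size: 10,  deleteOnTermination: false, iops: 100 }   // log
];

// Join the mappings into the -b portion of the command line
var blockArgs = volumes.map(function (v) {
  return '-b "' + blockDeviceArg(v) + '"';
}).join(" ");
```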
You can use the instance-id returned to ascertain the IP Address or DNS information for the instance:
$ ec2-describe-instances [INSTANCE-ID]
Now SSH into the instance:
$ ssh -i /path/to/keypair.pem ec2-user@ec2-1-2-3-4.amazonaws.com
After login, update installed packages, add the MongoDB yum repo, and install MongoDB:
$ sudo yum -y update

$ echo "[MongoDB]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64
gpgcheck=0
enabled=1" | sudo tee -a /etc/yum.repos.d/mongodb.repo

$ sudo yum install -y mongodb-org-server mongodb-org-shell mongodb-org-tools
Next, create/configure the mount points, mount each volume, set ownership (MongoDB runs under the mongod user/group), and set the /journal link:
$ sudo mkdir /data /log /journal

$ sudo mkfs.ext4 /dev/xvdf
$ sudo mkfs.ext4 /dev/xvdg
$ sudo mkfs.ext4 /dev/xvdh

$ echo '/dev/xvdf /data ext4 defaults,auto,noatime,noexec 0 0
/dev/xvdg /journal ext4 defaults,auto,noatime,noexec 0 0
/dev/xvdh /log ext4 defaults,auto,noatime,noexec 0 0' | sudo tee -a /etc/fstab

$ sudo mount /data
$ sudo mount /journal
$ sudo mount /log

$ sudo chown mongod:mongod /data /journal /log

$ sudo ln -s /journal /data/journal
Now configure the following MongoDB parameters by editing the configuration file /etc/mongod.conf:
dbpath = /data
logpath = /log/mongod.log
Optionally, if you don’t want MongoDB to start at boot you can issue the following command:
$ sudo chkconfig mongod off
By default Amazon Linux uses ulimit settings that are not appropriate for MongoDB. To set up ulimit to match the documented ulimit settings, use the following steps:
$ sudo nano /etc/security/limits.conf
* soft nofile 64000
* hard nofile 64000
* soft nproc 32000
* hard nproc 32000

$ sudo nano /etc/security/limits.d/90-nproc.conf
* soft nproc 32000
* hard nproc 32000
Additionally, default read ahead settings on EC2 are not optimized for MongoDB. As noted in the read-ahead settings from Production Notes, the settings should be adjusted to read approximately 32 blocks (or 16 KB) of data. The following command will set the readahead appropriately (repeat for necessary volumes):
$ sudo blockdev --setra 32 /dev/xvdf
To make this change persistent across system boot you can issue the following command:
$ echo 'ACTION=="add", KERNEL=="xvdf", ATTR{bdi/read_ahead_kb}="16"' | sudo tee -a /etc/udev/rules.d/85-ebs.rules
Once again, repeat the above command for all required volumes (note: the device we created was named /dev/xvdf but the name used by the system is xvdf).
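
The two read-ahead commands express the same setting in different units: blockdev --setra counts 512-byte sectors, while the udev attribute read_ahead_kb is in kilobytes. The sketch below shows the conversion and generates one udev rule per volume. The names sectorsToKb and udevRule are illustrative, and xvdg and xvdh are assumed to be the journal and log devices created earlier.

```javascript
var SECTOR_BYTES = 512; // unit used by blockdev --setra

// Convert a read-ahead value in sectors to kilobytes
function sectorsToKb(sectors) {
  return (sectors * SECTOR_BYTES) / 1024;
}

// Build the udev rule line for one device, using the kernel name (no /dev/)
function udevRule(kernelName, readAheadKb) {
  return 'ACTION=="add", KERNEL=="' + kernelName +
         '", ATTR{bdi/read_ahead_kb}="' + readAheadKb + '"';
}

var readAheadKb = sectorsToKb(32); // 32 sectors * 512 bytes = 16 KB

var rules = ["xvdf", "xvdg", "xvdh"].map(function (name) {
  return udevRule(name, readAheadKb);
});
```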
To start MongoDB, issue the following command:
$ sudo service mongod start
And now connect to the MongoDB instance using the mongo shell:
$ mongo
MongoDB shell version: 2.4.8
connecting to: test
>
To have MongoDB start automatically at boot, issue the following command:
$ sudo chkconfig mongod on
For production deployments consider using Replica Sets or Sharding.

Backup, Restore, Verify

Depending upon the configuration of your EC2 instances, there are a number of ways to conduct regular backups of your data. For specific instructions on backing up, restoring and verifying refer to EC2 Backup and Restore.

Deployment Notes

Instance Types

MongoDB works on most EC2 types including Linux and Windows. We recommend you use a 64-bit instance as this is required for all MongoDB databases of significant size. Additionally, we find that the larger instances tend to be on the freshest EC2 hardware.

Running MongoDB

Before running the database, decide where to put the data files. Run df -h to see volumes. On some images, /mnt will be the locally attached storage volume. Alternatively you may want to use Elastic Block Store, which will have a different mount point.
If you mount the file-system, ensure that you mount with the noatime and nodiratime attributes, for example:
/dev/mapper/my_vol /var/lib/mongodb xfs noatime,noexec,nodiratime 0 0
Create the mongodb datafile directory in the desired location and then run the database:
mkdir /mnt/db
./mongod --fork --logpath ~/mongod.log --dbpath /mnt/db/

Operating System

Occasionally, due to the shared and virtualized nature of EC2, an instance can experience intermittent I/O problems and low responsiveness compared to other similar instances. Terminating the instance and bringing up a new one can in some cases result in better performance.
Some people have reported problems with Ubuntu 10.04 on EC2.
Please read Ubuntu issue 614853 and Linux Kernel issue 16991 for further information.

Networking

Port Management

By default the database will now be listening on port 27017. The web administrative UI will be on port 28017.

Keepalive

Change the default TCP keepalive time to 300 seconds. See our troubleshooting page for details.

Secure Instances

Restrict access to your instances by using the Security Groups feature within AWS. A Security Group is a set of firewall rules for incoming packets that can apply to TCP, UDP or ICMP.
A common approach is to create a MongoDB security group that contains the nodes of your cluster (replica set members or sharded cluster members), followed by the creation of a separate security group for your app servers or clients.
Create a rule in your MongoDB security group with the “source” field set to the Security Group name containing your app servers and the port set to 27017 (or whatever port you use for your MongoDB). This will ensure that only your app servers have permission to connect to your MongoDB instances.
Remember that Security Groups only control ingress traffic.

Communication Across Regions

Every EC2 instance will have a private IP address that can be used to communicate within the EC2 network. It is also possible to assign a public “elastic” IP to communicate with the servers from another network. If using different EC2 regions, servers can only communicate via public IPs.
To set up a cluster of servers that spans multiple regions, it is recommended to set each server's hostname to the “public DNS name” provided by EC2. This will ensure that servers from a different network use the public IP, while the local servers use the private IP, thereby saving costs. This is required since EC2 security groups are local to a region.
To control communications between instances in different regions (for example, if you have two members of a replica set in one region and a third member in another), it is possible to use a built-in firewall (such as IPtables on Linux) to restrict allowed traffic to certain (elastic) IP addresses or ports.
For example, one solution is the following, on each server:
  • set the hostname of the server
sudo hostname server1
  • install “bind”, it will serve as local resolver
  • add a zone for your domain, say “myorg.com”, and add the CNAMEs for all your servers
server1          IN     CNAME   ec2-50-19-237-42.compute-1.amazonaws.com.
server2          IN     CNAME   ec2-50-18-157-87.us-west-1.compute.amazonaws.com.
  • restart bind and modify /etc/resolv.conf to use the local bind
search myorg.com
nameserver 127.0.0.1
Then:
  • verify that you can properly resolve server1, server2, ... using a tool like dig.
  • when running mongod, db.serverStatus() should show the correct hostname, e.g. “server1:27017”.
  • you can then set up replica sets or shards using the simple hostname. For example connect to server1 and run rs.initiate(), then rs.add('server2:27017').
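
As an alternative to adding members one at a time, rs.initiate() also accepts a configuration document that lists all members. The sketch below only builds that document in plain JavaScript. The helper replicaSetConfig and the set name rs0 are assumptions for illustration, and the hostnames are the ones from the example above.

```javascript
// Hypothetical helper: builds a replica set configuration document
// from simple hostnames that resolve through the local bind setup.
function replicaSetConfig(setName, hosts, port) {
  return {
    _id: setName,
    members: hosts.map(function (host, index) {
      // Each member needs a unique integer _id and a host:port string
      return { _id: index, host: host + ":" + port };
    })
  };
}

var config = replicaSetConfig("rs0", ["server1", "server2"], 27017);
// In the mongo shell on server1 this could be passed as rs.initiate(config)
```

The set name has to match the --replSet value (or replSet configuration setting) that each mongod was started with.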

Ref :  http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/

Install MongoDB


Packages

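MongoDB provides packages of the officially supported MongoDB builds in its own repository. This repository provides the MongoDB distribution in the following packages: mongodb-org (a metapackage that installs the four component packages), mongodb-org-server, mongodb-org-mongos, mongodb-org-shell, and mongodb-org-tools.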

Configure the package management system (YUM).

Create a /etc/yum.repos.d/mongodb.repo file to hold the following configuration information for the MongoDB repository:

If you are running a 64-bit system, use the following configuration:
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1 
 
If you are running a 32-bit system, which is not recommended for production deployments, use the following configuration:
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/i686/
gpgcheck=0
enabled=1

Install the MongoDB packages and associated tools.

When you install the packages, you choose whether to install the current release or a previous one. This step provides the commands for both.

To install the latest stable version of MongoDB, issue the following command:
 
sudo yum install -y mongodb-org

To install a specific release of MongoDB, specify each component package individually and append the version number to the package name, as in the following example that installs the 2.6.1 release of MongoDB:

sudo yum install -y mongodb-org-2.6.1 mongodb-org-server-2.6.1 mongodb-org-shell-2.6.1 mongodb-org-mongos-2.6.1 mongodb-org-tools-2.6.1

You can specify any available version of MongoDB. However, yum will upgrade the packages when a newer version becomes available. To prevent unintended upgrades, pin the packages by adding the following exclude directive to your /etc/yum.conf file:

exclude=mongodb-org,mongodb-org-server,mongodb-org-shell,mongodb-org-mongos,mongodb-org-tools
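
Both the versioned install command and the exclude directive are derived from the same five package names. Here is a minimal sketch that generates them for a chosen version. PACKAGES, installCommand and excludeDirective are hypothetical names, and the package list is the one shown above.

```javascript
// Package names as they appear in the install and exclude lines above
var PACKAGES = [
  "mongodb-org",
  "mongodb-org-server",
  "mongodb-org-shell",
  "mongodb-org-mongos",
  "mongodb-org-tools"
];

// Build the yum command that installs one specific release
function installCommand(version) {
  var versioned = PACKAGES.map(function (name) {
    return name + "-" + version;
  });
  return "sudo yum install -y " + versioned.join(" ");
}

// Build the line to add to /etc/yum.conf to pin the packages
function excludeDirective() {
  return "exclude=" + PACKAGES.join(",");
}

var command = installCommand("2.6.1");
var exclude = excludeDirective();
```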


Start MongoDB.

You can start the mongod process by issuing the following command:
sudo service mongod start

Verify that MongoDB has started successfully.

You can verify that the mongod process has started successfully by checking the contents of the log file at /var/log/mongodb/mongod.log for a line reading
[initandlisten] waiting for connections on port <port>
where <port> is the port configured in /etc/mongod.conf, 27017 by default.
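
The log check can be automated with a small pattern match. The sketch below works on log text held in a string. The function name listeningPort and the timestamp in the sample line are assumptions, and only the "[initandlisten] waiting for connections on port" part comes from the text above.

```javascript
// Hypothetical helper: returns the port mongod reports it is listening on,
// or null if the startup line is not present in the given log text.
function listeningPort(logText) {
  var pattern = /\[initandlisten\] waiting for connections on port (\d+)/;
  var match = pattern.exec(logText);
  return match ? Number(match[1]) : null;
}

// Sample line with an assumed timestamp prefix
var sampleLog =
  "2014-10-22T10:00:00.000+0000 [initandlisten] waiting for connections on port 27017";

var port = listeningPort(sampleLog);              // 27017
var missing = listeningPort("no startup line");   // null
```

In practice the text would come from /var/log/mongodb/mongod.log, for example read with Node's fs.readFileSync.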
You can optionally ensure that MongoDB will start following a system reboot by issuing the following command:
sudo chkconfig mongod on

Stop MongoDB.

As needed, you can stop the mongod process by issuing the following command:
sudo service mongod stop

Restart MongoDB.

You can restart the mongod process by issuing the following command:
sudo service mongod restart
You can follow the state of the process for errors or important messages by watching the output in the /var/log/mongodb/mongod.log file.

The MongoDB instance stores its data files in /var/lib/mongo and its log files in /var/log/mongodb by default, and runs using the mongod user account. You can specify alternate log and data file directories in /etc/mongod.conf. See systemLog.path and storage.dbPath for additional information.


Check MongoDB Version and Test Setup
Use the following command to check the installed MongoDB version:
# mongo --version

MongoDB shell version: 2.6.0
Connect to MongoDB from the command line and run a few test commands to check that it works:
# mongo
> db.test.save( { a: 1 } )
> db.test.find()

  { "_id" : ObjectId("52b0dc8285f8a8071cbb5daf"), "a" : 1 }


Ref : http://docs.mongodb.org/manual/tutorial/install-mongodb-on-red-hat-centos-or-fedora-linux/