This recording demonstrates how to run a Hortonworks multinode cluster on Docker using CoreOS flannel. I love CoreOS flannel and I don’t understand why nobody speaks about it, so I wanted to find a good example to show the value it provides. Docker is very simple to use when deploying an application running in a single container. But when you want to deploy a distributed application, there are common challenges:
- distributed applications provide their own resiliency mechanisms, so you need to run the containers on different machines
- most distributed applications communicate through several ports, so you need to indicate all the ports when starting each container
- many different applications use the same ports, so you can’t run two different distributed applications on the same machines if they use conflicting ports
CoreOS flannel allows all the Docker containers running on different machines to communicate with each other as if they were all running on the same machine.
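To give an idea of what flannel needs in order to provide this flat container network: flannel reads a small cluster-wide configuration from etcd (conventionally under the key /coreos.com/network/config). The sketch below, with illustrative values, generates that JSON document; each host then leases a per-host subnet out of the overlay range, so every container gets a cluster-unique IP and no port mapping is needed.

```python
import json

# Illustrative flannel network settings. Each CoreOS host leases a /24
# out of the 10.1.0.0/16 overlay range, so containers on different hosts
# can reach each other directly on their own IPs.
flannel_config = {
    "Network": "10.1.0.0/16",    # overlay network shared by all containers
    "SubnetLen": 24,             # size of the per-host subnet lease
    "Backend": {"Type": "udp"},  # simple UDP encapsulation backend
}

print(json.dumps(flannel_config))
```

One would typically store this document in etcd before starting flannel, for example with something like `etcdctl set /coreos.com/network/config '<the JSON above>'`.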
This recording demonstrates how to run Apache Spark on Mesos with EMC ECS HDFS storage. With ECS, data can be stored using the Amazon S3 API, the Atmos REST API or the OpenStack Swift API, and then accessed through HDFS. For this demo, I will use a Mesos cluster with 3 slaves. Using S3Browser, I can see the input file I’ve uploaded using the Amazon S3 API. This file is an Open Directory RDF dump which contains many URLs.
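From Spark’s point of view the RDF dump is just lines of text. As a hypothetical sketch (not the exact code from the demo), here is the kind of URL-extraction function one might pass to a Spark `flatMap` over the file; it is written in plain Python so it also runs standalone.

```python
import re

# Hypothetical helper: pull the URLs out of one line of the RDF dump.
# In a Spark job this would typically be used as:
#   sc.textFile("hdfs://...").flatMap(extract_urls)
URL_PATTERN = re.compile(r'https?://[^\s"<>]+')

def extract_urls(line):
    """Return every URL found in a single line of the dump."""
    return URL_PATTERN.findall(line)

line = '<ExternalPage about="http://www.example.com/page"> ...'
print(extract_urls(line))  # ['http://www.example.com/page']
```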
OVERVIEW
Web Automation Center is a web application I’ve written using the Ruby on Rails framework and running on JRuby. The source code is available at: https://github.com/djannot/web-automation-center
You just need to run the following commands to start the application:
export RAILS_ENV=production
jruby -S bundle install
jruby -S rails s trinidad
The following environment variables must be set to use Amazon S3 (or another Amazon S3 compatible storage platform) to backup/restore data and share data among users:
- S3_URL (ex: http://s3.amazonaws.com)
- S3_PORT (ex: 80)
- S3_BUCKET
- S3_ACCESS_KEY_ID
- S3_SECRET_ACCESS_KEY
The goal of this application is to allow people to:
- learn how to use different REST APIs (Amazon S3, Atmos REST, ViPR Management API, …)
- run demos to demonstrate the API capabilities
- automate the creation of resources, for example during a POC
- show how object storage can solve many problems (large file uploads with Amazon S3 multipart upload, for example)
- share demos with other people
- troubleshoot API issues (for example, by comparing the responses from a true Amazon account and an S3 compatible platform)
DOCKER CONTAINER
The Dockerfile can be used to create a Docker container for this web application.
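Since all five S3 variables must be present for the backup/restore and sharing features to work, a small startup check is useful. This is an illustrative sketch (the helper is not part of the actual project), using the variable names listed above.

```python
import os

# The S3 settings the application expects, per the README above.
REQUIRED_S3_VARS = [
    "S3_URL", "S3_PORT", "S3_BUCKET",
    "S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY",
]

def missing_s3_vars(env=os.environ):
    """Return the required S3 settings that are not set (or empty)."""
    return [name for name in REQUIRED_S3_VARS if not env.get(name)]

# With only S3_URL set, the four remaining variables are reported missing.
print(missing_s3_vars({"S3_URL": "http://s3.amazonaws.com"}))
```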
Introduction I recently created a similar post using Mesos & Marathon instead of Kubernetes and said that Kubernetes was at too early a stage. This week I attended a Kubernetes Meetup in London and I’ve been really amazed by the Kubernetes demo from Kelsey Hightower (from the CoreOS team). So I decided that I needed to learn more about Kubernetes NOW. Mesos & Marathon provide HA (with Mesos backup masters and Marathon instances proxying the requests).
Introduction In a previous post I’ve explained how to create a web scale infrastructure based on Docker, CoreOS, Vulcand and Mesos, and why Object Storage becomes the de facto data repository. After this post, I’ve been asked to provide more details about the tool I’ve developed to update the Vulcand proxy rules. The source code is available at: https://github.com/djannot/mesos-vulcanproxy The goal of this project is to automatically create/update Vulcand proxy rules for all the Docker containers created through Mesos Marathon.
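The core of such a tool is turning Marathon’s view of running tasks into proxy endpoints. As a minimal sketch (the JSON shape follows Marathon’s /v2/tasks response, simplified; host names and ports are made up, and the actual Vulcand rule writing is omitted), this collects the host:port pairs a Vulcand backend would route to for one app:

```python
import json

# A response in the shape Marathon's /v2/tasks endpoint returns
# (simplified; real responses carry more fields per task).
marathon_tasks = json.loads("""
{"tasks": [
  {"appId": "/webapp", "host": "core-1", "ports": [31012]},
  {"appId": "/webapp", "host": "core-2", "ports": [31447]},
  {"appId": "/other",  "host": "core-3", "ports": [31090]}
]}
""")

def backend_servers(tasks_doc, app_id):
    """Collect the endpoints for one Marathon app, i.e. the servers
    a Vulcand backend rule would route traffic to."""
    return ["http://%s:%d" % (t["host"], t["ports"][0])
            for t in tasks_doc["tasks"] if t["appId"] == app_id]

print(backend_servers(marathon_tasks, "/webapp"))
```

In the real tool, each of these endpoints would then be written into etcd under the key layout Vulcand watches.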
Introduction I’ve created Dockerfiles to create a ScaleIO cluster on 3 Docker hosts. This cluster provides the MDM, TB, SDS and Gateway services. Other ScaleIO clients (where the SDC service will be installed) can then access the ScaleIO volumes. It has been tested with ScaleIO 1.31 using three CoreOS nodes. This gives anyone the ability to create a clean ScaleIO cluster for test purposes. Disclaimer: don’t use this for production. The Dockerfiles are available at: https://github.com/djannot/scaleio-docker
Build
To build the Docker images, you need to copy the ScaleIO RPM packages for Red Hat 6 into the 3 directories:
- scaleio-primary-mdm
- scaleio-secondary-mdm
- scaleio-tb
Then, you can simply run the following command in each directory:
docker build .
Because it provides so many advantages. While, in the past, most unstructured content was stored in local file systems or NAS appliances, these approaches weren’t designed for 3rd platform applications. First, when the team in charge of developing a new application needs to ask the team in charge of the storage infrastructure to provide storage, they need to specify how much storage they need. But they generally have no idea about the capacity they will need, as it will depend on many criteria (number of users, data stored by each user, success of the application, …).
Introduction In a previous post I’ve explained how to create a web scale infrastructure based on Docker, CoreOS, Vulcand and Mesos, and why Object Storage becomes the de facto data repository. After this post, I’ve been asked to provide more details about the way I’ve implemented Mesos. I originally implemented Mesos using the mesos-on-coreos Docker image available at https://registry.hub.docker.com/u/tnolet/mesos-on-coreos/ But that Docker image was based on an old version of Mesos and didn’t provide High Availability.
Introduction Let’s first discuss why I decided to use these technologies to show how to create a web scale infrastructure. Why Docker? The first question should even be: why Linux containers? Linux containers provide a very low compute and storage overhead compared to virtual machines. Docker has really simplified the way anyone can leverage Linux containers, and also provides useful features like Dockerfiles, the Docker Hub and a layered file system (see https://docs.docker.com/terms/layer/).
In a previous recording, I’ve explained how to implement the ViPR controller in two different sites, to federate both instances and finally to configure the ViPR Object Service. In this recording I will explain how to use ViPR as a backend for distributed private Docker registries. First, I need to create an Amazon S3 Bucket. Then, I create a config file for the Docker registry container I’ll start in Paris. I indicate the hostname of the ViPR Data Services node located in Paris.
In a previous recording, I’ve explained how to implement the ViPR controller in two different sites, to federate both instances and finally to configure the ViPR Object Service. In this recording I will explain how to configure the OpenStack Swift API on ViPR using the same environment. First, I need to create a Swift password on ViPR. Using the ViPR REST Management API, I execute a PUT request on the /object/user-password/the_name_of_the_user path and include an XML document containing the password and the Swift group in the body.
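The PUT request described above can be sketched with Python’s standard library. This builds (but does not send) the request; the host name, user, group and token values are placeholders, and the XML element names are assumptions based on the description here, so they should be checked against the ViPR REST API reference.

```python
import urllib.request

def build_swift_password_request(vipr_host, user, password, group, token):
    """Build (not send) the PUT that sets a Swift password on ViPR.
    Element names in the XML body are assumptions, not verified
    against the ViPR documentation."""
    body = (
        "<user_password_create>"
        "<password>%s</password>"
        "<swift_groups_added>%s</swift_groups_added>"
        "</user_password_create>" % (password, group)
    )
    return urllib.request.Request(
        "https://%s:4443/object/user-password/%s" % (vipr_host, user),
        data=body.encode(),
        method="PUT",
        headers={"Content-Type": "application/xml",
                 "X-SDS-AUTH-TOKEN": token},  # ViPR management auth token
    )

req = build_swift_password_request("vipr.example.com", "alice", "s3cret", "admin", "TOKEN")
print(req.get_method(), req.full_url)
```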
In a previous recording, I’ve explained how to implement the ViPR controller in two different sites, to federate both instances and finally to configure the ViPR Object Service. In this recording I will explain how to configure the CAS API on ViPR using the same environment. First, I create a CAS Cluster using the default options. Then, I create a CAS Pool and I select an Object Virtual Pool which is protecting the data across two locations.
In a previous recording, I’ve explained how to implement the ViPR controller in two different sites, to federate both instances and finally to configure the ViPR Object Service. In this recording I will explain how to configure the Atmos REST API on ViPR using the same environment. First, I need to create an Atmos subtenant on ViPR. Using the Atmos REST API, I execute a PUT request on the /rest/subtenant path and include the header x-emc-uid with a value corresponding to the ViPR username.
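As with the Swift example, the subtenant-creation request can be sketched with Python’s standard library. This builds (but does not send) the PUT; the host and UID values are placeholders, and the HMAC-SHA1 signature the Atmos REST API requires over a canonical string of the request is elided from this sketch.

```python
import urllib.request

def build_subtenant_request(host, uid, signature, date):
    """Build (not send) the PUT that creates an Atmos subtenant on ViPR.
    `signature` stands in for the HMAC-SHA1 the Atmos REST API expects
    (its computation is elided in this sketch)."""
    return urllib.request.Request(
        "https://%s/rest/subtenant" % host,
        method="PUT",
        headers={
            "x-emc-uid": uid,             # the ViPR username, per the post
            "x-emc-signature": signature,
            "Date": date,
        },
    )

req = build_subtenant_request("vipr-ds.example.com", "alice",
                              "SIGNATURE", "Thu, 01 Jan 2015 00:00:00 GMT")
print(req.get_method(), req.full_url)
```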
This recording explains how to implement the ViPR controller in two different sites, to federate both instances and finally to configure the ViPR Object Service. ViPR 2.0 supports the Amazon S3 API, the Atmos REST API, the CAS API and the OpenStack Swift API. When using ViPR 2.0 on traditional arrays (VNX, Isilon or NetApp), the local data protection is managed by the array itself, but the replication is done by ViPR to enable active/active data access with strong consistency.
This recording explains how Geoparity works on EMC Atmos and the different ways to use this feature. Atmos provides two different Geoparity options:
- EC 10/16, where 10 fragments are created for data and 6 fragments are created for parity. An object can be read if at least 10 fragments are accessible.
- EC 9/12, where 9 fragments are created for data and 3 fragments are created for parity. An object can be read if at least 9 fragments are accessible.
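The two schemes trade protection for storage overhead: since an object is readable as long as the number of data fragments survives, any of the parity-many fragments may be lost. The arithmetic above can be sketched as:

```python
def ec_profile(data_fragments, total_fragments):
    """Summarize an erasure-coding scheme: how many fragment losses it
    tolerates and its raw storage overhead relative to the data size."""
    parity = total_fragments - data_fragments
    return {
        "parity_fragments": parity,
        "tolerated_losses": parity,  # any `parity` fragments may be lost
        "overhead": total_fragments / data_fragments,
    }

print(ec_profile(10, 16))  # EC 10/16: tolerates 6 losses, 1.6x raw storage
print(ec_profile(9, 12))   # EC 9/12: tolerates 3 losses, ~1.33x raw storage
```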
This recording explains how to integrate EMC Atmos and EMC NetWorker. Atmos is a globally accessible and highly scalable cloud storage platform. NetWorker is enterprise-class backup software. NetWorker can back up data or clone backups to Atmos using the Atmos REST API. The lab environment is composed of a NetWorker server version 8 running on Windows 2008 R2 64-bit and an Atmos Gen3 appliance with 4 nodes. Two Atmos policies have been created.
This recording explains how to create a hybrid cloud with EMC Atmos and how to define policy transitions to automatically move objects from the private cloud to the public cloud. Atmos is a globally accessible and highly scalable cloud storage platform. Atmos Online is a platform which can be used to try Atmos easily. The Atmos Online portal is based on the Atmos Cloud Delivery Platform software, also called ACDP. ACDP can be used by a service provider to create a Storage as a Service offering, but can also be used by other companies to provide self service and to manage internal billing.
This recording explains how to integrate EMC Atmos and the EMC Cloud Tiering Appliance, also known as CTA. Atmos is a globally accessible and highly scalable cloud storage platform. CTA can archive files from an EMC VNX or a NetApp NAS. Here, you can see that a subtenant and a UID have been created on Atmos for CTA. Atmos can now be added to CTA. You click on the Configuration tab, then on Files Server.
This recording explains how to integrate EMC Atmos and EMC Syncplicity. Atmos is a globally accessible and highly scalable cloud storage platform. Syncplicity is a File Sync & Share software. The on-premises version of Syncplicity allows a company to store the user data in its own datacenters. When a user adds a file in a folder managed by Syncplicity, the Syncplicity client running on the user’s computer contacts the Syncplicity cloud.
This recording shows how to create a fully private “Dropbox like” service using Ctera and EMC Atmos. The Cloud Drive features of Ctera give end users the same flexibility and ease of use they enjoy with their favorite file sharing tool at home, and give them access to their data from any device (Windows, Mac, iPad, Android, …) and from anywhere. EMC Atmos provides, through its REST API, storage scalability, high availability and active/active access from different sites.
This recording shows how to do a failover/failback operation. This feature not only allows you to recover data from the disaster recovery site, but also to run the backups on this site without any modification of the backup policies. Then, a failback can be done to return to the production site, and all the backups done on the disaster recovery site remain available.
This recording shows how Panzura and EMC Atmos can provide a NAS and backup appliance for remote sites. Panzura provides CIFS/NFS share capabilities, a global file system with all the data stored in the cloud (Atmos) and global deduplication. In the first part of this recording, a file is copied to the Panzura CIFS share; then the same file is copied again to the CIFS share with another name. The monitoring tools of Panzura and the load balancer demonstrate that no data is transferred to the cloud during the second copy.
This recording shows how the new EMC Avamar Extended Retention feature allows you to export Avamar backups to tape without any third-party backup product. Backups are rehydrated and sent to tape by the Avamar Extended Retention node without any temporary space needed. The goal is to provide an optimized way to keep backups for many years for compliance purposes. When an exported backup must be recovered, the backup is imported back to the Avamar Extended Retention node, which also acts as an Avamar Single Node 7.8 TB.
This recording shows how EMC Avamar can back up an EMC or NetApp NAS device (an EMC VNX in this case) using the NDMP protocol. During the first backup, the NAS device creates a Level 0 snapshot and sends the data to the Avamar NDMP Accelerator node, which chunks the data and sends only unique blocks to the Avamar server. All the subsequent backups are done by requesting a Level 1 snapshot on the NAS device.
This recording shows the EMC Data Domain integration in EMC NetWorker. Using the DDBoost protocol, NetWorker sends to Data Domain only the blocks which are not already stored. When a clone operation is started, if both the source and the destination media are Data Domain devices, NetWorker starts a replication between the two Data Domain systems. This gives the user the advantage of a clone operation (both copies are known by NetWorker) and the advantage of a Data Domain replication (only new blocks are sent to the target Data Domain).
This recording shows how EMC Avamar can do hot backups and recoveries of PostgreSQL databases using a named pipe created by mkfifo. The pg_dump command sends data to the named pipe, then the Avamar avtar command starts backing up this named pipe. This way, you can back up PostgreSQL databases without any temporary space, so you don’t have to compress the dump, which allows you to obtain a good deduplication ratio.
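The pattern above can be demonstrated in miniature with Python’s standard library: one thread streams a fake “dump” into a FIFO (standing in for pg_dump) while the main thread consumes it (standing in for avtar), so the dump never lands on disk as a temporary file. All names here are illustrative.

```python
import os
import tempfile
import threading

# Miniature version of the pg_dump -> mkfifo -> avtar pattern.
fifo = os.path.join(tempfile.mkdtemp(), "backup.fifo")
os.mkfifo(fifo)  # create the named pipe, like `mkfifo backup.fifo`

def producer():
    # Stands in for `pg_dump mydb > backup.fifo`; opening the FIFO for
    # writing blocks until a reader opens the other end.
    with open(fifo, "w") as pipe:
        pipe.write("-- fake database dump --\n")

t = threading.Thread(target=producer)
t.start()

# Stands in for `avtar --backup ... backup.fifo`: the data is consumed
# as it is produced, with no intermediate file materialized.
with open(fifo) as pipe:
    consumed = pipe.read()
t.join()
os.remove(fifo)

print(consumed.strip())  # -- fake database dump --
```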
This recording shows how EMC Avamar can do hot backups and recoveries of MySQL databases using a named pipe created by mkfifo. The procedure is the same as the one explained in my previous post about PostgreSQL. The mysqldump command sends data to the named pipe, then the Avamar avtar command starts backing up this named pipe. This way, you can back up MySQL databases without any temporary space, so you don’t have to compress the dump.
This recording shows how EMC Avamar can back up a VM in a VMware vCloud Director environment. The VM is provisioned using the new Fast Provisioning feature (also known as Linked Clone) of vCloud Director, so the VM uses only a few GB of disk storage. Using Avamar, you can leverage all the features of the VMware vStorage API even though the VM is a linked clone. And you can even recover the VM using the unique Change Block Recovery feature provided by Avamar.
This recording shows how EMC Avamar can back up a VMware environment. Avamar leverages virtual proxies to scale easily from smaller to bigger VMware configurations. A proxy is a small VM (1 vCPU, 1 GB RAM, 4 GB disk) which chunks the snapshot of the VM and sends only unique blocks to Avamar. All the subsequent backups are done using the VMware Changed Block Tracking feature. No other full backups have to be done after the first one.
This recording shows how EMC Avamar can back up desktops and laptops with an easy-to-use web interface. The end user can start a backup, browse his backups to recover a file or a directory, and search for any files matching a keyword. The web interface is available in several languages for Windows and Mac computers, and can be integrated with LDAP or Active Directory. If the user gets a new computer, he can automatically recover data from his old computer’s backups, as Avamar records all the user profiles of a computer during the backup.
This recording shows how to protect an Iomega NAS px series with EMC Avamar. An Avamar client is natively embedded into this NAS. The ability of Avamar to chunk the new and modified files on the client side and to send only unique blocks to the server allows the NAS to be backed up very quickly and with low bandwidth usage.
This recording shows how simple it is to recover an item from a SharePoint VSS backup with EMC Avamar. The backup policy is defined for only one frontend of the SharePoint farm. When the backup starts, the Avamar client leverages the VSS features to discover all the other frontends and backends of the farm and asks the Avamar clients of these servers to create a VSS snapshot and to back up the data.
This recording shows how simple it is to recover an email from an Exchange VSS backup with EMC Avamar. The Avamar client chunks the data and sends only unique blocks to the Avamar server (or to a Data Domain appliance). Avamar also has the ability to virtually mount the backup of the Exchange database to provide Exchange granular recovery without recovering the complete Exchange database.