{"id":5180,"date":"2020-03-25T14:58:28","date_gmt":"2020-03-25T13:58:28","guid":{"rendered":"https:\/\/lcloud.pl\/?p=5180"},"modified":"2024-12-11T13:09:36","modified_gmt":"2024-12-11T12:09:36","slug":"quick-big-data-processing","status":"publish","type":"post","link":"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/","title":{"rendered":"Quick Big Data processing | Amazon EMR"},"content":{"rendered":"<h6 style=\"text-align: justify;\"><span style=\"color: #979797;\"><strong><span style=\"font-size: 22px;\">Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. It uses the <span style=\"color: #199ad8;\"><a style=\"color: #199ad8;\" href=\"https:\/\/hadoop.apache.org\/\">Hadoop<\/a><\/span> (open-source data processing software) framework, based on Amazon EC2 and Amazon S3. <\/span><\/strong><\/span><\/h6>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">It provides the ability to efficiently process large amounts of data in processes such as:<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">indexing<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">data mining<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">machine learning<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">financial analysis<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">Amazon EMR saves us time-consuming configuration, commissioning and management of Hadoop clusters and the computing power that we need. 
Thanks to this, we can freely build workflows and monitor the progress of big data analysis.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">The basic unit of the service is a cluster, which consists of nodes that can play different roles, i.e. they can be of different types. Amazon EMR installs different software components on each node type, thereby giving each node a specific role in the framework (Apache Hadoop). There are 3 types of nodes:<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\"><strong><span style=\"color: #199ad8;\">Master node<\/span><\/strong> &#8211; coordinates the distribution of data and tasks among the other nodes. It also monitors the progress of the analysis and the health of the entire cluster.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\"><strong><span style=\"color: #199ad8;\">Core node<\/span> <\/strong>&#8211; contains software components that run tasks and store data in the Hadoop Distributed File System (HDFS) on the cluster.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\"><strong><span style=\"color: #199ad8;\">Task node<\/span><\/strong> &#8211; contains software components that only run tasks and do not store data. This type of node is optional.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">After building the cluster, we can submit work to it. 
The next step is data analysis.<\/span><\/p>\n<p><iframe loading=\"lazy\" title=\"YouTube video player\" data-cookieconsent=\"statistics, marketing\" data-src=\"https:\/\/www.youtube.com\/embed\/kNsS9aDf6uE?si=vMO0FFzJLaj2McNR\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><div class=\"cookieconsent-optout-statistics cookieconsent-optout-marketing\"><\/div><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">Now that we know the benefits of using Amazon EMR, let&#8217;s turn to security. The need for strong data protection is undeniable, especially for sensitive data. The service uses safeguards such as:<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">encryption,<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">Amazon VPC,<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">Security Groups,<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">AWS CloudTrail,<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">Amazon EC2 Key Pairs,<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">IAM.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">In addition, the service is integrated with Amazon CloudWatch, which monitors traffic and activity in the cluster. To manage the cluster, we can also use tools such as the AWS CLI, SDKs, the API or the AWS console itself. 
An additional advantage is the ability to reuse an existing configuration when building new clusters.<\/span><\/p>\n<h5 style=\"text-align: justify;\"><strong><span style=\"font-size: 22px; color: #199ad8;\">How much does the solution cost?<\/span><\/strong><\/h5>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">Cost estimation is straightforward.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">The service applies:<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">per-second billing, with a minimum charge of 60 seconds. For example, a 10-node cluster running for 10 hours costs the same as a 100-node cluster running for 1 hour.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">an hourly rate, which depends on factors such as the type of instance or CPU. 
Usage is calculated to the nearest second, and the time is shown in decimal form.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-size: 22px;\"><span style=\"font-weight: 400;\">The exact cost calculation methods can be found in the <\/span><span style=\"color: #199ad8;\"><a style=\"color: #199ad8;\" href=\"https:\/\/aws.amazon.com\/emr\/pricing\/?nc=sn&amp;loc=4\"><span style=\"font-weight: 400;\">Pricing tab<\/span><\/a><\/span><span style=\"font-weight: 400;\">, and the exact billing for use can be found in the <\/span><a href=\"https:\/\/console.aws.amazon.com\/billing\/home\"><span style=\"font-weight: 400;\"><span style=\"color: #199ad8;\">Billing &amp; Cost Management Console<\/span>.<\/span><\/a><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-size: 22px;\"><span style=\"font-weight: 400;\">You can check the availability of the service in individual AWS Regions <\/span><span style=\"color: #199ad8;\"><a style=\"color: #199ad8;\" href=\"https:\/\/aws.amazon.com\/about-aws\/global-infrastructure\/regional-product-services\/\"><span style=\"font-weight: 400;\">here<\/span><\/a><\/span><span style=\"font-weight: 400;\">.<\/span><\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">In addition to the obvious advantage of using Amazon EMR, which is optimization and cost reduction during data analysis, there are several other reasons to adopt it.<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">Integration with other AWS services allows them to be combined quickly and easily, which in turn translates into faster deployment.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">It is highly available and scalable, which is critical.<\/span><\/li>\n<li style=\"font-weight: 400;\"><span style=\"font-weight: 400; font-size: 22px;\">It 
is secure, thanks to the previously mentioned integration with AWS services, including those responsible for security, which ensures a high level of protection for your data.<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400; font-size: 22px;\">We also recommend this video from re:Invent:<\/span><\/p>\n<p><iframe loading=\"lazy\" title=\"AWS re:Invent 2015 | (BDT208) A Technical Introduction to Amazon Elastic MapReduce\" width=\"500\" height=\"281\" data-cookieconsent=\"statistics, marketing\" data-src=\"https:\/\/www.youtube.com\/embed\/WnFYoiRqEHw?start=2&#038;feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><div class=\"cookieconsent-optout-statistics cookieconsent-optout-marketing\"><\/div><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. It uses the Hadoop (open-source data processing software) framework, based on Amazon EC2 and Amazon S3. 
It provides the ability to efficiently process large amounts of data in processes such as: indexing data mining machine learning financial analysis Amazon [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":9984,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3],"tags":[30,16,147,35],"class_list":["post-5180","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-aws-en","tag-big-data-en","tag-chmura-obliczeniowa-en","tag-cloud-computing"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Quick Big Data processing | Amazon EMR<\/title>\n<meta name=\"description\" content=\"Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. It uses the Hadoop framework, based on Amazon EC2 and....\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Quick Big Data processing | Amazon EMR\" \/>\n<meta property=\"og:description\" content=\"Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. 
It uses the Hadoop (open-source data processing software) framework, based on Amazon EC2 and Amazon S3.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/\" \/>\n<meta property=\"og:site_name\" content=\"LCloud\" \/>\n<meta property=\"article:published_time\" content=\"2020-03-25T13:58:28+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-12-11T12:09:36+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lcloud.pl\/wp-content\/uploads\/top-amazon-emr-2.3.png\" \/>\n\t<meta property=\"og:image:width\" content=\"598\" \/>\n\t<meta property=\"og:image:height\" content=\"214\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"LCloud\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Quick Big Data processing | Amazon EMR\" \/>\n<meta name=\"twitter:description\" content=\"Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. It uses the Hadoop (open-source data processing software) framework, based on Amazon EC2 and Amazon S3.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/lcloud.pl\/wp-content\/uploads\/top-amazon-emr-2.3.png\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"LCloud\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/\",\"url\":\"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/\",\"name\":\"Quick Big Data processing | Amazon EMR\",\"isPartOf\":{\"@id\":\"https:\/\/lcloud.pl\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lcloud.pl\/wp-content\/uploads\/Szybkie-przetwarzanie-Big-Data-Amazon-EMR.jpg\",\"datePublished\":\"2020-03-25T13:58:28+00:00\",\"dateModified\":\"2024-12-11T12:09:36+00:00\",\"author\":{\"@id\":\"https:\/\/lcloud.pl\/#\/schema\/person\/4e56c347d5a37e0bd0ae7d8353ac1b0a\"},\"description\":\"Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. 
It uses the Hadoop framework, based on Amazon EC2 and....\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/#primaryimage\",\"url\":\"https:\/\/lcloud.pl\/wp-content\/uploads\/Szybkie-przetwarzanie-Big-Data-Amazon-EMR.jpg\",\"contentUrl\":\"https:\/\/lcloud.pl\/wp-content\/uploads\/Szybkie-przetwarzanie-Big-Data-Amazon-EMR.jpg\",\"width\":1440,\"height\":274,\"caption\":\"Szybkie przetwarzanie Big Data Amazon EMR\"},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lcloud.pl\/#website\",\"url\":\"https:\/\/lcloud.pl\/\",\"name\":\"LCloud\",\"description\":\"AWS Advanced Consulting Partner | APN Well-Architected Partner\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lcloud.pl\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/lcloud.pl\/#\/schema\/person\/4e56c347d5a37e0bd0ae7d8353ac1b0a\",\"name\":\"LCloud\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lcloud.pl\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/0d1d7540a45e57ac9534226adcc4ce4700cdb19ae67e134ae46e7f5d9fce93e8?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/0d1d7540a45e57ac9534226adcc4ce4700cdb19ae67e134ae46e7f5d9fce93e8?s=96&d=mm&r=g\",\"caption\":\"LCloud\"},\"url\":\"https:\/\/lcloud.pl\/en\/author\/wpdev\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Quick Big Data processing | Amazon EMR","description":"Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. 
It uses the Hadoop framework, based on Amazon EC2 and....","og_locale":"en_US","og_type":"article","og_title":"Quick Big Data processing | Amazon EMR","og_description":"Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. It uses the Hadoop (open-source data processing software) framework, based on Amazon EC2 and Amazon S3.","og_url":"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/","og_site_name":"LCloud","article_published_time":"2020-03-25T13:58:28+00:00","article_modified_time":"2024-12-11T12:09:36+00:00","og_image":[{"width":598,"height":214,"url":"https:\/\/lcloud.pl\/wp-content\/uploads\/top-amazon-emr-2.3.png","type":"image\/png"}],"author":"LCloud","twitter_card":"summary_large_image","twitter_title":"Quick Big Data processing | Amazon EMR","twitter_description":"Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. It uses the Hadoop (open-source data processing software) framework, based on Amazon EC2 and Amazon S3.","twitter_image":"https:\/\/lcloud.pl\/wp-content\/uploads\/top-amazon-emr-2.3.png","twitter_misc":{"Written by":"LCloud","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/","url":"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/","name":"Quick Big Data processing | Amazon EMR","isPartOf":{"@id":"https:\/\/lcloud.pl\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/#primaryimage"},"image":{"@id":"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/#primaryimage"},"thumbnailUrl":"https:\/\/lcloud.pl\/wp-content\/uploads\/Szybkie-przetwarzanie-Big-Data-Amazon-EMR.jpg","datePublished":"2020-03-25T13:58:28+00:00","dateModified":"2024-12-11T12:09:36+00:00","author":{"@id":"https:\/\/lcloud.pl\/#\/schema\/person\/4e56c347d5a37e0bd0ae7d8353ac1b0a"},"description":"Amazon EMR is a service that allows cost-effective and fast processing of large amounts of data. It uses the Hadoop framework, based on Amazon EC2 and....","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lcloud.pl\/en\/quick-big-data-processing\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lcloud.pl\/en\/quick-big-data-processing\/#primaryimage","url":"https:\/\/lcloud.pl\/wp-content\/uploads\/Szybkie-przetwarzanie-Big-Data-Amazon-EMR.jpg","contentUrl":"https:\/\/lcloud.pl\/wp-content\/uploads\/Szybkie-przetwarzanie-Big-Data-Amazon-EMR.jpg","width":1440,"height":274,"caption":"Szybkie przetwarzanie Big Data Amazon EMR"},{"@type":"WebSite","@id":"https:\/\/lcloud.pl\/#website","url":"https:\/\/lcloud.pl\/","name":"LCloud","description":"AWS Advanced Consulting Partner | APN Well-Architected 
Partner","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lcloud.pl\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/lcloud.pl\/#\/schema\/person\/4e56c347d5a37e0bd0ae7d8353ac1b0a","name":"LCloud","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lcloud.pl\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/0d1d7540a45e57ac9534226adcc4ce4700cdb19ae67e134ae46e7f5d9fce93e8?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/0d1d7540a45e57ac9534226adcc4ce4700cdb19ae67e134ae46e7f5d9fce93e8?s=96&d=mm&r=g","caption":"LCloud"},"url":"https:\/\/lcloud.pl\/en\/author\/wpdev\/"}]}},"_links":{"self":[{"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/posts\/5180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/comments?post=5180"}],"version-history":[{"count":7,"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/posts\/5180\/revisions"}],"predecessor-version":[{"id":9988,"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/posts\/5180\/revisions\/9988"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/media\/9984"}],"wp:attachment":[{"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/media?parent=5180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/categories?post=5180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lcloud.pl\/en\/wp-json\/wp\/v2\/tags?post=5180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","te
mplated":true}]}}