Fine Tune Your Polling and Batching in Mule ESB

They say it's best to learn from others. With that in mind, let's dive into a use case I recently ran into. We were dealing with a number of legacy systems when our company decided to shift to a cloud-based solution. Of course, we had to prepare for the move and all the complications that came with it.

Use Case

We have a legacy system built on an Oracle database, with applications created in Oracle Forms and lots and lots of stored procedures in the database. It has been in use for over 17 years now with no major upgrades or changes. Of course, there have been a lot of development changes over those 17 years, and they have taken the system close to the breaking point and made it almost impossible to implement anything new. So, the company decided to move to a CRM (Salesforce), and we needed to transfer data from our legacy database to Salesforce. However, we couldn't create any triggers on our database to send real-time data to Salesforce during the transition period.

Solution

We decided to use a Mule poll to query our database, fetch records in bulk, and send them to Salesforce using the Salesforce Mule connector.

I am assuming that we are all clear about polling in general. If not, please refer to the references at the end. Also, if you are not familiar with Mule's polling implementation, there are a few references at the bottom, too. Sounds simple enough, doesn't it? But wait, there are a few things to consider:

- What is the optimum poll frequency?
- How many threads should each poll have? How many active or inactive threads do you want to keep?
- How many polls can we write before we overload the object store and queue store Mule uses to maintain polling state?
- What is the impact on the server file system if you persist watermark values in the object store?
- How many records can we fetch from the database in one query?
- How many records can we actually send to Salesforce in one bulk call?

These are a few, if not all, of the considerations to work through before implementation.
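To make the frequency and watermark questions concrete, here is a hedged sketch of a Mule 3.x poll configuration. The flow name, query, table, and expressions are illustrative only, not taken from our actual project:

```xml
<!-- Sketch: a Mule 3.x poll with a fixed-frequency scheduler and a watermark.
     Names, the query, and the expressions are illustrative assumptions. -->
<flow name="pollLegacyDbFlow">
    <poll doc:name="Poll">
        <!-- How often the poll fires; tune this against the flow's execution time. -->
        <fixed-frequency-scheduler frequency="10" timeUnit="SECONDS"/>
        <!-- Persist the highest processed ID between polls. -->
        <watermark variable="lastProcessedId"
                   default-expression="#[0]"
                   selector="MAX"
                   selector-expression="#[payload.id]"/>
        <db:select config-ref="Oracle_Configuration" doc:name="Select new records">
            <db:parameterized-query><![CDATA[
                SELECT * FROM customers WHERE id > #[flowVars.lastProcessedId]
            ]]></db:parameterized-query>
        </db:select>
    </poll>
    <!-- ... transform the records and send them to Salesforce here ... -->
</flow>
```

The watermark's selector expression picks the maximum ID from the fetched batch, so the next poll only asks the database for newer rows.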
The major part of polling is the watermark and how Mule implements it on the server.

Polling for Updates Using Watermarks

Rather than polling a resource for all of its data on every call, you may want to acquire only the data that has been newly created or updated since the last call. To acquire only new or updated data, you need to keep a persistent record of either the item that was last processed or the time at which your flow last polled the resource. In the context of Mule flows, this persistent record is called a watermark.

To make the watermark persistent, Mule ESB stores it in an object store under the project's runtime directory on the ESB server. Depending on the type of object store you have implemented, this may be a SimpleMemoryObjectStore or a TextFileObjectStore.

For any kind of object store, Mule ESB creates files on the server, and if the frequency of your polls is not carefully configured, you may run into file storage issues. For example, if you run your poll every 10 seconds with multiple threads, and your flow takes more than 10 seconds to send data to Salesforce, then a new object store entry is made to persist the watermark value for each flow trigger, and you will end up with too many files in the server's object store.

To set these values, we have to consider how many records we are fetching from the database, as Salesforce has a limit of 200 records per bulk call. So, if you are fetching 2,000 records, one batch will call Salesforce 10 times to transfer those 2,000 records.
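For reference, the two object store types mentioned above can be wired in via Spring beans roughly like this in Mule 3.x. Treat this as a sketch: the bean ids and the directory path are illustrative assumptions:

```xml
<spring:beans>
    <!-- Sketch: in-memory object store (watermarks are lost on restart). -->
    <spring:bean id="inMemoryStore"
                 class="org.mule.util.store.SimpleMemoryObjectStore"/>

    <!-- Sketch: file-backed object store (watermarks survive restarts,
         but entries become files on the server's file system). -->
    <spring:bean id="fileStore" class="org.mule.util.store.TextFileObjectStore">
        <spring:property name="directory" value="/opt/mule/objectstore"/>
    </spring:bean>
</spring:beans>
```

The file-backed variant is what makes the file-system growth issue described above possible when poll frequency and execution time are mismatched.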
If your flow takes five seconds to process 200 records, including the network round trip to Salesforce, then a complete poll will take around 50 seconds to transfer 2,000 records. If our polling frequency is 10 seconds, it means we are piling up entries in the object store.

Another issue is the queue store. Because there is a big gap between the polling frequency and the execution time, the queue store will also keep growing. Again, you have to deal with too many files.

To resolve this, it's always a good idea to fine-tune the flow's execution time and the polling frequency to keep the gap small. To manage the threads, you can use Mule's batch threading settings to control how many threads run and how many stay active.

I hope a few of these details help you set up your polling in a better way.

There are a few more things to consider. What happens when an error occurs while sending data? What happens when Salesforce returns an error and can't process your data? What types of errors will Salesforce send you? How do you rerun a failed batch with its watermark value? What about logging and recovery? I will try to cover these issues in a second blog post.

References:

https://docs.mulesoft.com/mule-user-guide/v/3.6/poll-reference#polling-for-updates-using-watermarks
https://docs.mulesoft.com/mule-user-guide/v/3.7/poll-reference
https://docs.mulesoft.com/mule-user-guide/v/3.7/poll-schedulers#fixed-frequency-scheduler
https://en.wikipedia.org/wiki/Polling_(computer_science)

Yahoo revenue rises as Internet business improves

Yahoo Inc (YHOO.O) reported a 5.2 percent rise in total quarterly revenue, a sign of improvement in the troubled core Internet business it is auctioning off.

Yahoo is in the process of auctioning off its search and advertising business, and reports suggested that a final bidder would be picked on July 18. Verizon Communications Inc (VZ.N) and AT&T Inc (T.N) are said to be in the running, as well as private equity firm TPG Capital and a consortium led by Quicken Loans founder Dan Gilbert and backed by billionaire Warren Buffett.

Yahoo's fortunes have waned under Chief Executive Marissa Mayer, who has made little progress in her attempts to gain ground against newer, bigger Internet players such as Facebook Inc (FB.O) and Alphabet Inc's (GOOGL.O) Google. The tepid progress in turning around the business attracted pressure from activist investors, who pushed Yahoo to launch an auction of its core business in February. Yahoo has also said it could spin off the business.

Revenue in the company's emerging businesses, which Mayer calls Mavens (mobile, video, native and social advertising), rose 25.7 percent to $504 million in the second quarter ended June 30. Total revenue rose to $1.31 billion from $1.24 billion a year earlier. After deducting fees paid to partner websites, revenue fell to $841.2 million from $1.04 billion. Net loss attributable to Yahoo was $439.9 million, or 46 cents per share, in the latest reported quarter, compared with a loss of $21.6 million, or 2 cents per share, a year earlier. Yahoo recorded a non-cash goodwill impairment charge of $395 million related to its Tumblr unit. On an adjusted basis, the company earned 9 cents per share, while analysts were expecting earnings of 10 cents on average, according to Thomson Reuters I/B/E/S.

Yahoo's shares were little changed at $37.92 in trading after the bell.

(Reporting by Supantha Mukherjee in Bengaluru; Editing by Saumyadeb Chakrabarty)

Solar plane leaves Seville on penultimate leg of round-the-world flight

SEVILLE, Spain - An airplane powered solely by energy from the sun took off from southern Spain early on Monday on the penultimate leg of the first-ever fuel-free round-the-world flight.

The single-seat Solar Impulse 2 lifted off from Seville at 0420 GMT (12:20 a.m. EDT) en route to Cairo, a trip expected to take 50 hours and 30 minutes. The plane has more than 17,000 solar cells built into its wings and travels at a cruising speed of around 70 km per hour (43 mph).

On its journey, which began in Abu Dhabi and is due to end there, it has been piloted in turns by Swiss aviators Andre Borschberg and Bertrand Piccard. Borschberg is taking this run, the 16th leg, over the Mediterranean Sea, crossing through the airspace of Tunisia, Algeria, Malta, Italy and Greece before ending in Egypt.

(Reporting by Marcelo Pozo; Writing by Paul Day; editing by John Stonestreet)

The Life of a Serverless Microservice on AWS

In this post, I will demonstrate how you can develop, test, deploy, and operate a production-ready serverless microservice using the AWS ecosystem. The combination of AWS Lambda and Amazon API Gateway allows us to operate a REST endpoint without the need for any virtual machines. We will use Amazon DynamoDB as our database, Amazon CloudWatch for metrics and logs, and AWS CodeCommit and AWS CodePipeline as our delivery pipeline. In the end, you will know how to wire together a bunch of AWS services to run a system in production.

The Life

My idea of "The Life of a Serverless Microservice on AWS" is best described by this figure: a developer pushes code changes to a repository. This git push triggers the CI & CD pipeline to deploy a new version of the service, which our users consume. The load generated on the system produces logs and metrics that are used by the developer to operate the system, and that operational feedback is used to improve the quality of the system.

What is Serverless?

Serverless, or Function as a Service (FaaS), describes the idea that the deployment unit is a single function. A function takes input and returns output. The responsibility of the FaaS user is to develop the function, while the FaaS provider's responsibility is to execute the function whenever some event happens. The following figure demonstrates this idea. Some possible events:

- File uploaded.
- E-mail received.
- Database changed.
- Manual invocation.
- HTTP API called.
- Cron.

The cool things about a serverless architecture are:

- You only pay when the function is executed.
- No under/over provisioning.
- No boot time.
- No patching.
- No SSH.
- No load balancing.

Read more about Serverless Architectures if you are interested in the details.

What is a Microservice?

Imagine a small system where users have a publicly visible profile page with location information for that user. The idea of a microservice architecture is that you slice your system into smaller units around bounded contexts.
I identified three of them:

- Authentication Service: Handles authentication.
- Location Service: Manages location information via a private HTTP API. Uses the Authentication Service internally to authenticate requests.
- Profile Service: Stores and retrieves profiles via a public HTTP API. Makes an internal call to the Location Service to retrieve the location information.

Each service gets its own database, and services communicate with each other only over well-defined APIs, never through the database!

Let's get started! The source code and installation instructions can be found at the bottom of this page. Please use the us-east-1 region! We will use services that are not available in other AWS regions at the moment.

Code

AWS CodeCommit is a hosted Git repository that uses IAM for access control. You need to upload your public SSH key to your IAM user as shown in the following figure. Creating a repository is simple: just click the Create new Repository button in the AWS Management Console. We need a repository for each service. You can then clone a repository locally with the following command, replacing $SSHKeyID with the SSH key ID of your IAM user and $RepositoryName with the name of your repository:

git clone ssh://$SSHKeyID@git-codecommit.us-east-1.amazonaws.com/v1/repos/$RepositoryName

We now have a home for our code.

Continuous Integration & Continuous Delivery

AWS CodePipeline is a service to manage a build and deployment pipeline. CodePipeline itself is only responsible for triggering integrations to do things like:

- Build.
- Test.
- Deploy.

We need a pipeline for each service that:

- Downloads the sources from CodeCommit if something changes there.
- Runs our tests and bundles the code in a zip file for Lambda.
- Deploys the zip file.

Luckily, CodePipeline has native support for downloading sources from CodeCommit. To run our tests, we will use a third-party integration that triggers Solano CI to run the tests and bundle the source files.
The deployment step is implemented in a Lambda function that triggers a CloudFormation stack update. A CloudFormation stack is a bunch of AWS resources managed by CloudFormation based on a template that you provide (Infrastructure as Code). Read more about CloudFormation on our blog. The following figure shows the pipeline.

The cool thing about CloudFormation is that you can define the pipeline itself in a template, so we get Pipeline as Code. The CloudFormation template used for service deployment describes a Lambda function, a DynamoDB database, and an API Gateway. After deployment, you will see one CloudFormation stack for each service. We now have a CI & CD pipeline.

Service

We use a bunch of AWS services to run our microservices.

Amazon API Gateway

API Gateway is a service that offers a configurable REST API as a service. You describe what should happen if a certain HTTP method (GET, POST, PUT, DELETE, ...) is called on a certain HTTP resource (e.g. /user). In our case, we want to execute a Lambda function when an HTTP request comes in. API Gateway also takes care of mapping input and output data between formats. The following figure shows what this looks like in the AWS Management Console for the Profile Service.

API Gateway is a fully managed service. You only pay for requests: no under/over provisioning, no boot time, no patching, no SSH, no load balancing.
AWS takes care of all those aspects. Read more about API Gateway on our blog.

AWS Lambda

To run code in AWS Lambda, you need to:

- Use one of the supported runtimes (Node.js (JavaScript), Python, or the JVM (Java, Scala, ...)).
- Implement a predefined interface.

The interface, in abstract terms, requires a function that takes an input parameter and returns void or something, or throws an error. We will use the Node.js runtime, where a function implementation looks like this:

exports.handler = function(event, context, cb) {
  console.log(JSON.stringify(event));
  // TODO do something
  cb(null, {name: 'Michael'});
};

In Node.js, the function is not expected to return something. Instead, you call the callback function cb that is passed into the function as a parameter. The following figure shows what this looks like in the AWS Management Console for the Profile Service.

AWS Lambda is a fully managed service. You only pay for function executions: no under/over provisioning, no boot time, no patching, no SSH, no load balancing. AWS takes care of all those aspects. Read more about Lambda on our blog.

Amazon DynamoDB

DynamoDB is a key-value store or document store. You can look up values by their key. DynamoDB replicates across multiple Availability Zones (data centers) and is eventually consistent. The following figure shows what this looks like in the AWS Management Console for the Authentication Service.

Amazon DynamoDB is a 99% managed service. The 1% that is up to you: you need to provision read and write capacity. When your service makes more requests than provisioned, you will see errors, so it is your job to monitor the consumed capacity and increase the provisioned capacity before you run out. Read more about DynamoDB on our blog.

Request Flow

The three services work together in the following way: the user's HTTP request hits API Gateway. API Gateway checks whether the request is valid and, if so, invokes the Lambda function.
The function makes one or more requests to the database and executes some business logic. The result of the function is then transformed into an HTTP response by API Gateway. We now have an environment to run our microservices.

Logs, Metrics, and Alerting

A black box is very hard to operate. That's why we need as much information from the inside of the system as possible. AWS CloudWatch is the right place to store and analyze this kind of information:

- Metrics (numbers).
- Logs (text).

CloudWatch also lets you define alarms on metrics. The following figure demonstrates how the pieces work together. Operational insights that you get out of the box:

- Lambda writes STDOUT and STDERR to CloudWatch Logs.
- Lambda publishes metrics to CloudWatch about the number of invocations, runtime duration, the number of failures, etc.
- API Gateway publishes metrics about the number of requests, 4XX and 5XX response codes, etc.
- DynamoDB publishes metrics about consumed capacity, the number of requests, etc.

The following figure shows a CloudWatch alarm that is triggered if the number of throttled read requests on the Location Service DynamoDB table is greater than or equal to one. This situation indicates that the provisioned capacity is not sufficient to serve the traffic. With all those metrics and alarms in place, we can be confident that we will receive an alert if our system is not working properly.

Summary

You can run a high-quality system on AWS using only managed services. This approach frees you from many operational tasks that are not directly related to your service: think of operating a monitoring system, a log index, a database, virtual machines, etc. Instead, you can focus on operating and improving your service's code. The following figure shows the overall architecture of our system.

Serverless or FaaS does not force you to use a specific framework.
As long as you are fine with the interface (a function with input and output), you can do whatever you want inside your function to produce an output from the given input.

Google beats children's web privacy appeal, Viacom to face one claim

Google and Viacom on Monday defeated an appeal in a nationwide class action lawsuit by parents who claimed the companies illegally tracked the online activity of children under the age of 13 who watched videos and played video games on Nickelodeon's website.

By a 3-0 vote, the 3rd U.S. Circuit Court of Appeals in Philadelphia said Google, a unit of Alphabet Inc, and Viacom Inc were not liable under several federal and state laws for planting "cookies" on boys' and girls' computers to gather data that advertisers could use to send targeted ads. The court also revived one state law privacy claim against Viacom, alleging that it promised on the Nick.com website not to collect children's personal information but did so anyway.

Monday's decision largely upheld a January 2015 ruling by U.S. District Judge Stanley Chesler in Newark, New Jersey. It returned the surviving claim to him. Jay Barnes, a lawyer for the parents, declined to comment. Viacom spokesman Jeremy Zweig said the company is pleased with the dismissals and confident it will prevail on the remaining claim. "Nickelodeon is proud of its record on children's privacy issues and strongly committed to the best practices in the industry," he added. Google did not immediately respond to a request for comment.

Monday's decision is a fresh setback for computer users, after the same appeals court said last November 10 that Google was not liable under federal privacy laws for bypassing cookie blockers on Apple Inc's Safari browser and Microsoft Corp's Internet Explorer browser. Circuit Judge Julio Fuentes, who wrote both decisions, said that ruling doomed many of the parents' claims against Mountain View, California-based Google and New York-based Viacom.
He also rejected the parents' claims under the Video Privacy Protection Act, a 1988 law adopted a year after a newspaper wrote about movies rented by failed Supreme Court nominee Robert Bork, based on a list provided by a video store. Fuentes said the law was meant to thwart the collection of data used to monitor people's video-watching behavior.

He said Congress, despite amending the law in 2013, never updated it to cover the collection of data such as users' IP addresses, browser settings and operating settings, or to reflect a "contemporary understanding" of Internet privacy. "Some disclosures predicated on new technology, such as the dissemination of precise GPS coordinates or customer ID numbers, may suffice," Fuentes wrote. "But others--including the kinds of disclosures described by the plaintiffs here--are simply too far afield from the circumstances that motivated the act's passage to trigger liability."

The revived privacy claim accused Viacom of reneging on a promise on Nick.com that said: "HEY GROWN-UPS: We don't collect ANY personal information about your kids. Which means we couldn't share it even if we wanted to!" Fuentes said a reasonable jury might find Viacom liable for "intrusion upon seclusion" if it found the alleged privacy intrusion "highly offensive to the ordinary reasonable man."

The case is In re: Nickelodeon Consumer Privacy Litigation, 3rd U.S. Circuit Court of Appeals, No. 15-1441.

(Reporting by Jonathan Stempel in New York; Editing by David Gregorio)
