As data continues to grow in both volume and variety across multiple deployments, performing analytics has become more complicated. By 2019, 75% of analytics solutions will incorporate 10 or more external data sources. Organizations able to glean insights from this diverse data will have competitive advantages: a deeper understanding of customers, better responsiveness to trends, and more efficient operations, to name just a few.
This reality of data diversity has given rise to the “data lake” – a data management architecture that allows organizations to store and analyze a wide variety of structured and unstructured data.
A data lake is a method of data storage in which all of the data is kept in its native format. This means that data in the lake might include everything from highly structured files to completely unstructured data such as videos, emails and images.
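To make “native format plus metadata” concrete, here is a minimal sketch of landing a file in a lake as-is while recording a metadata sidecar. The `datalake/raw` landing zone, the `ingest` function and the sidecar naming are illustrative assumptions, not part of any specific product discussed at the seminar.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical landing zone for raw, native-format data.
LAKE_ROOT = Path("datalake/raw")

def ingest(source_path: str, tags: list[str]) -> dict:
    """Copy a file into the lake unchanged and write a metadata sidecar."""
    src = Path(source_path)
    dest = LAKE_ROOT / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # data stays in its native format
    record = {
        "path": str(dest),
        "format": src.suffix.lstrip(".") or "unknown",
        "tags": tags,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    # Sidecar metadata makes the raw file discoverable later.
    Path(str(dest) + ".meta.json").write_text(json.dumps(record))
    return record
```

The point of the sketch is that nothing about the file is transformed on the way in; only descriptive metadata is added, which is what distinguishes a lake from schema-on-write warehousing.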
In addition, it is no longer only IT that integrates data; business users are also getting involved through new self-service data preparation tools. The question is: is this the only way to manage data? Is there another level we can reach that allows us to more easily manage and govern data across an increasingly complex data landscape?
This seminar looks at the challenges faced by companies trying to deal with an exploding number of data sources; at data collected in multiple data stores (cloud and on-premises) and multiple analytical systems; and at the requirements for defining, governing, managing and sharing trusted, high-quality information in a distributed and hybrid computing environment.
It also explores a new approach in which IT data architects, business users and IT developers collaborate in building and managing a Logical Data Lake to get control of your data. This includes data ingestion, automated data discovery, data profiling and tagging, and publishing data in an information catalog.
It also involves refining raw data to produce Enterprise Data Services that can be published in a catalog available for consumption across your company. We also introduce multiple Data Lake configurations, including a centralised Data Lake and a ‘logical’ distributed Data Lake, as well as the execution of jobs and governance across multiple data stores.
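The profile-tag-publish workflow described above can be sketched in a few lines. The `profile_csv` and `publish` functions and the in-memory `catalog` dictionary are hypothetical stand-ins for the automated discovery, profiling and information-catalog capabilities the seminar covers; a real catalog would of course be a shared, persistent service.

```python
import csv
from collections import defaultdict

# Hypothetical in-memory stand-in for an information catalog.
catalog = {}

def profile_csv(path: str) -> dict:
    """Basic automated profiling: column names, row count, nulls per column."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = 0
        nulls = defaultdict(int)
        for row in reader:
            rows += 1
            for col, val in row.items():
                if val in ("", None):
                    nulls[col] += 1
        return {"columns": reader.fieldnames, "rows": rows,
                "null_counts": dict(nulls)}

def publish(name: str, path: str, tags: list[str]) -> dict:
    """Profile a dataset and publish its entry to the catalog."""
    entry = {"path": path, "tags": tags, "profile": profile_csv(path)}
    catalog[name] = entry
    return entry
```

Even this toy profile (row counts and null counts) illustrates why profiling comes before publication: consumers browsing the catalog can judge a dataset's completeness before using it.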
In recent times, increasing deployment of data lakes in the banking and financial services sector has been boosting the growth of the market globally. A data lake describes any large data pool in which the data requirements are not defined until the data is queried. Data lakes provide benefits such as scalability and can accommodate high-speed data. They also enable advanced analytics by making large quantities of coherent data available. The global Data Lakes Market was valued at USD 3.24 billion in 2017 and is expected to reach USD 14.01 billion by 2023, at a CAGR of 27.4% over the forecast period (2018–2023).

Data lakes have become an economical alternative to data warehousing for many companies. The growing use of IoT in offices and informal spaces has emphasised the need for data lakes for quicker and more efficient manipulation of data. The data lakes market is segmented into software and services. The services segment is expected to grow at the highest CAGR during the forecast period, within which data lake services are projected to witness the highest demand due to the growing need for data lake software solutions across organizations. The major business functions for which data lakes are deployed are marketing, sales, operations, finance and human resources. Organizations are fast deploying data lake solutions either on-premises or on demand. Demand for on-demand, cloud-based data lake solutions is increasing due to their cost-effective and time-efficient features; growth is especially high among SMEs, where low-cost solutions are much needed.
A NEW commodity spawns a lucrative, fast-growing industry, prompting antitrust regulators to step in to restrain those who control its flow. A century ago, the resource in question was oil. Now similar concerns are being raised by the giants that deal in data, the oil of the digital era. These titans—Alphabet (Google’s parent company), Amazon, Apple, Facebook and Microsoft—look unstoppable. They are the five most valuable listed firms in the world. Their profits are surging: they collectively racked up over $25bn in net profit in the first quarter of 2017. Amazon captures half of all dollars spent online in America. Google and Facebook accounted for almost all the revenue growth in digital advertising in America last year.
Such dominance has prompted calls for the tech giants to be broken up, as Standard Oil was in the early 20th century. This newspaper has argued against such drastic action in the past. Size alone is not a crime. The giants’ success has benefited consumers. Few want to live without Google’s search engine, Amazon’s one-day delivery or Facebook’s newsfeed. Nor do these firms raise the alarm when standard antitrust tests are applied. Far from gouging consumers, many of their services are free (users pay, in effect, by handing over yet more data). Take account of offline rivals, and their market shares look less worrying. And the emergence of upstarts like Snapchat suggests that new entrants can still make waves.
Mobile advertising is increasingly used across many applications and allows developers to obtain revenue through in-app advertising. With its growth, fraud has become an issue, as unscrupulous operators use various means to drive advertising revenue toward themselves. Mobile advertising is a data-rich world, and I will talk about how various signals are combined to detect and mitigate invalid traffic. Different methods of detecting invalid traffic will be discussed, from simple rule-based blocks to sophisticated Generative Adversarial Networks for detecting novel types of fraud.
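As a flavour of the “simple rule-based blocks” end of that spectrum, here is a minimal sketch of flagging implausible click behaviour. The function name, event shape and thresholds are illustrative assumptions (events are assumed pre-bucketed into a one-minute window), not the speaker's actual detection rules.

```python
from collections import Counter

def flag_invalid_clicks(events, max_per_minute=10, min_dwell_ms=500):
    """Rule-based invalid-traffic filter over one minute of click events.

    Each event is assumed to be a dict with a 'device_id' and a
    'dwell_ms' (time spent on the landing page). Thresholds are
    illustrative, not production values.
    """
    per_device = Counter(e["device_id"] for e in events)
    flagged = set()
    for e in events:
        if per_device[e["device_id"]] > max_per_minute:
            flagged.add(e["device_id"])   # click flooding
        elif e.get("dwell_ms", 0) < min_dwell_ms:
            flagged.add(e["device_id"])   # sub-human dwell time
    return flagged
```

Rules like these are cheap and interpretable, which is why they remain the first line of defence even where learned models such as GANs handle the novel fraud patterns.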
Today’s organizations increasingly rely on their exponentially growing data stores to gain actionable business insights. The shift to a data-driven approach has led many organizations to store data in large data lakes, yet their analytical platforms struggle with the growing volumes of data. Analytics can take hours or even days and require time-consuming preparation. Some complex analytics simply cannot be done.
This session will demonstrate:
For machine intelligence to work, one needs historical data to build the intelligence and current data (or data as of the present moment) to apply it. Data is the foundation on which an organization builds a competitive edge in AI and machine learning. However, the data in most organizations is extremely disaggregated.
Putting all the data together and resolving data quality challenges is the first and most difficult step in building machine intelligence. In addition, the ability to tap into new data sources or obtain external data remains the key differentiator in building strong machine intelligence that is difficult for a competitor to replicate.
While data consolidation is the foundation of any data-driven strategy, it does not in itself provide immediate business value to an organization. To demonstrate the ROI of the data lake, it was critical to allow business users to build it through drag-and-drop functionality, so that the lake grows in an agile way in support of key business use cases. Hence, instead of a conventional approach, a use-case-driven approach was adopted. A risk of the use-case-driven approach, however, is that the data lake may become too narrow in scope. Extensive flexibility and a component-driven solution were created to ensure that the data lake is not limited to a few use cases but is built as a generic solution.
The approach for building the data lake and providing business users the power to build it incrementally will be discussed in the presentation.
How to avoid data lake failures in data integration, quality and governance (DIQG), looking into the following aspects:
a) Limitations of open source tooling and hand coding for DIQG
b) Examples of unsuccessful data lake implementations
c) Examples of data lake best practices for DIQG
No. 174, No. 28, Sankey Rd, P.B, Vasanth Nagar, Bengaluru, Karnataka 560052
080 2226 2233
Confirm your CANCELLATION in writing up to 15 working days before the event and receive a refund less a 10% service charge. Regrettably, no refunds can be made for cancellations received less than 15 working days prior to the event.
However, SUBSTITUTIONS are welcome at any time and are made at no extra cost. The organisers reserve the right to amend the programme if necessary.
Important Disclaimer: The organizers reserve the right to make substitutions or alterations and/or cancel a speaker if deemed necessary due to circumstances beyond their control.
INDEMNITY: Should, for any reason outside the control of UNICOM Training & Seminars (P) Ltd (hereafter called UNICOM), the venue or the speakers change, or the event be cancelled due to industrial action, adverse weather conditions or an act of terrorism, UNICOM will endeavour to reschedule, but the client hereby indemnifies and holds UNICOM harmless from and against any and all costs, damages and expenses, including attorneys’ fees, which are incurred by the client. The construction, validity and performance of this Agreement shall be governed in all aspects by the laws of India, to the exclusive jurisdiction of whose courts the Parties hereby agree to submit.