Machine Learning (ML) Meets Big Data
In the past few years, more data has been produced than in the whole of human history before. This data is a gold mine of commercial value and an important reference material for policy makers. But much of that value will remain unexploited as long as the tools for processing such huge amounts of information remain unavailable.
What is Machine Learning?
The core of machine learning consists of self-learning algorithms that evolve by continuously improving at their assigned task. These algorithms fine-tune themselves with the data they train on. They build their own logic and, as a result, create solutions relevant to aspects of our world as diverse as fraud detection, web search, tumor classification, object recognition and price prediction.
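A minimal sketch of this idea, using only NumPy and synthetic data (the "price prediction" setting here is purely illustrative): a linear model repeatedly adjusts its own weights from the training data, with no hand-written rules.

```python
import numpy as np

# Minimal sketch: an algorithm "fine-tunes itself" on the data it trains on.
# A linear model's weights are adjusted by gradient descent so that its
# predictions fit synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                     # 1000 examples, 3 features
true_w = np.array([2.0, -1.0, 0.5])                # hidden relationship to recover
y = X @ true_w + rng.normal(scale=0.1, size=1000)  # noisy targets

w = np.zeros(3)                                    # initial guess
lr = 0.1
for _ in range(200):                               # each pass nudges w toward a better fit
    grad = 2 * X.T @ (X @ w - y) / len(y)          # gradient of the mean squared error
    w -= lr * grad

print(w)  # close to true_w: the model improved itself from the data alone
```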
What is Big Data?
Data consists of numbers, words, measurements and observations formatted in ways computers can process. The digital era presents a challenge for traditional data-processing software: information becomes available in such volume, velocity and variety that it outpaces human-centered computation. We can describe big data using these three "V"s: volume, velocity and variety.
- Volume refers to the scale of available data;
- Velocity is the speed with which data is accumulated;
- Variety refers to the different sources it comes from.
Two other Vs are often added to the previous three:
- Veracity refers to the consistency and certainty (or lack thereof) in the sourced data;
- Value measures the usefulness of the insights extracted from the data received.
Big Data Meets Machine Learning
Machine-learning algorithms become more effective as the size of the training dataset grows. So when we combine big data with machine learning, we benefit twice:
- the algorithms help us keep up with the continuous influx of data;
- the volume and variety of that same data feed the algorithms and help them achieve better performance, as sketched below.
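A rough illustration of the second point, assuming scikit-learn is installed and using a synthetic classification task: the same classifier is trained on progressively larger subsets, and its test accuracy typically improves with the amount of data.

```python
# Sketch (synthetic data): train the same model on growing subsets
# and watch test accuracy improve as the training set gets larger.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

for n in (100, 1000, 10000):                      # growing training-set sizes
    clf = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(n, round(clf.score(X_test, y_test), 3))  # accuracy tends to rise with n
```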
Value of Data
TBC
For Big Data Analysis we need Machine Learning
More than one hour of video is uploaded to YouTube every second, amounting to roughly 10 years of content every day; the genomes of thousands of people, each more than a billion base pairs long, have been sequenced by various labs; and so on. This deluge of data calls for automated methods of data analysis, which is exactly what machine learning provides.
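One way such automated analysis keeps pace with a deluge of data is incremental (online) learning. The sketch below, which assumes scikit-learn and uses a made-up data stream, updates a classifier batch by batch as records arrive instead of retraining on the full dataset each time.

```python
# Sketch (hypothetical stream): an incremental learner processes data in
# mini-batches as it arrives, never holding the whole dataset in memory.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier()                        # online linear classifier

for batch in range(100):                     # each iteration = a new chunk of the stream
    X = rng.normal(size=(500, 10))           # stand-in for freshly arrived records
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in labels
    clf.partial_fit(X, y, classes=[0, 1])    # model updates without retraining from scratch
```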
ML for data analysis in physics and other sciences
Some examples from Phys.org
For ML we need Big Data
TBC