Machine-generated data

Machine-generated data is information automatically generated by a computer process, application, or other mechanism without the active intervention of a human. Although the term dates back over fifty years, there is still disagreement about its scope. Curt Monash of Monash Research defines it as "data that was produced entirely by machines OR data that is more about observing humans than recording their choices." Daniel Abadi, a computer science professor at Yale, proposes a narrower definition: "Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action." Despite their differences, both definitions exclude data manually entered by a person. Machine-generated data is found across all industry sectors. Often, and increasingly, people are unaware that their actions are generating the data.

1. Relevance

Machine-generated data has no single form; its type, format, metadata, and frequency depend on the particular business purpose it serves. Machines often create it on a defined schedule or in response to a state change, action, transaction, or other event. Because the data records an event that has already happened, it is rarely updated or modified afterward. Partly for this reason, U.S. courts consider machine-generated data highly reliable.

Machine-generated data is the lifeblood of the Internet of Things (IoT).
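
The pattern described in this section, records created in response to a state change and never modified afterward, can be illustrated with a minimal Python sketch. The `Event` and `EventLog` names, the `pump-1` device, and the integer timestamps are all hypothetical and chosen only for illustration; real systems differ widely in format and transport.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)  # frozen: a record cannot be modified once created
class Event:
    timestamp: int  # hypothetical clock value supplied by the caller
    source: str     # hypothetical device name
    state: str      # state observed at that moment


class EventLog:
    """Append-only log: records can be added and read, never updated."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._last_state: Dict[str, str] = {}  # last seen state per source

    def observe(self, timestamp: int, source: str, state: str) -> bool:
        """Record an event only when the source's state changes."""
        if self._last_state.get(source) == state:
            return False  # no state change, so nothing is generated
        self._events.append(Event(timestamp, source, state))
        self._last_state[source] = state
        return True

    def events(self) -> Tuple[Event, ...]:
        # Return an immutable snapshot so callers cannot alter history.
        return tuple(self._events)


log = EventLog()
readings = [(0, "idle"), (1, "idle"), (2, "running"), (3, "running"), (4, "idle")]
for t, state in readings:
    log.observe(t, "pump-1", state)

recorded = log.events()
for event in recorded:
    print(event.timestamp, event.source, event.state)
```

Five readings produce only three records here, because repeated observations of the same state generate nothing new.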

2. Growth

In 2009, Gartner projected that data would grow by 650% over the following five years. Most of that growth is the byproduct of machine-generated data. IDC estimated that in 2020 there would be 26 times more connected things than people. Wikibon forecast that $514 billion would be spent on the Industrial Internet in 2020.
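
To put the 650% figure in perspective, the short Python sketch below converts it into an equivalent yearly rate. It assumes that "grow by 650%" means the final volume is 7.5 times the starting volume and that growth compounds evenly each year; neither assumption comes from the Gartner projection itself, and the function names are illustrative.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert 'grows by N%' into a multiplier (assumed reading: 650% -> 7.5x)."""
    return 1.0 + percent_increase / 100.0


def annual_rate(total_factor: float, years: int) -> float:
    """Constant yearly growth rate that compounds to total_factor."""
    return total_factor ** (1.0 / years) - 1.0


factor = growth_factor(650.0)
rate = annual_rate(factor, 5)
print(f"{factor:.1f}x overall, about {rate:.1%} per year")
```

Under these assumptions the projection corresponds to data volume growing by roughly half every year.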

2.1. Processing

Because machine-generated data is fairly static yet voluminous, data owners rely on highly scalable tools to process and analyze the resulting datasets. Almost all machine-generated data starts out unstructured and is then parsed into a common structure. These derived structures typically contain many data points, or columns, so the main challenge lies in analyzing them. Given high performance requirements and large data sizes, traditional database indexing and partitioning limit the size and history of the dataset that can be processed. Columnar databases offer an alternative approach, because a given analysis accesses only particular columns of the dataset.
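
The two steps described above, deriving a common structure from raw lines and then reading only the columns an analysis needs, can be sketched in a few lines of Python. The log layout, field names, and helper functions are invented for this example, and the in-memory dictionary of lists only imitates the access pattern of a columnar database; it is not how any particular product stores data.

```python
import re
from typing import Dict, List

# Hypothetical raw lines; the layout is invented for this example.
RAW_LINES = [
    "2020-01-01T00:00:00 host=web1 status=200 bytes=512 latency_ms=12",
    "2020-01-01T00:00:01 host=web2 status=500 bytes=0 latency_ms=250",
    "2020-01-01T00:00:02 host=web1 status=200 bytes=2048 latency_ms=18",
]

PAIR = re.compile(r"(\w+)=(\S+)")  # matches key=value tokens


def parse_line(line: str) -> Dict[str, str]:
    """Derive a structured record from one raw line."""
    timestamp, _, rest = line.partition(" ")
    record = {"timestamp": timestamp}
    record.update(PAIR.findall(rest))
    return record


def to_columns(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Pivot row records into one list per column.

    Assumes every record carries the same fields, as in RAW_LINES.
    """
    columns: Dict[str, List[str]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


records = [parse_line(line) for line in RAW_LINES]
columns = to_columns(records)

# Each analysis reads only the column it needs and ignores the rest.
total_bytes = sum(int(value) for value in columns["bytes"])
error_count = sum(1 for status in columns["status"] if status.startswith("5"))
print(total_bytes, error_count)
```

With a row-oriented layout, the same two aggregates would require reading every field of every record; the benefit of the columnar layout grows with the number of columns the analysis does not touch.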

3. Examples

  • Security information and event management (SIEM) logs
  • Telemetry collected by the government
  • Financial instrument trades
  • Web server logs
  • Call detail records
  • Network event logs
                                     