Data analysis. Data management (presentation)

Slide 1: Data analysis. Data management

Completed by: Zhienbekova S
Group: 1701-32
Received by: Zhamalova K
Year: 2023

Slide 2

1) Data analysis bases
2) Characteristics of data sample
3) Classification, Prediction
4) Classification by Decision Tree Induction
5) What is data mining?
6) What is “big data”?

Slide 3

1) Data analysis bases

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains.

Slide 4: Data Analytics

Raw data accumulated from various sources (e.g. discussion boards, emails, exam logs, and chat logs in e-learning systems) can be used to identify fruitful patterns and relationships (Bose, 2009).
- Exploratory visualization: uses exploratory data analytics to capture relationships that are perhaps unknown or at least less formally formulated
- Confirmatory visualization: theory-driven

Slide 5

2) Characteristics of data sample

In any report or article, the structure of the sample must be accurately described. It is especially important to determine the structure of the sample exactly (and specifically the size of the subgroups) when subgroup analyses will be performed during the main analysis phase.

The characteristics of the data sample can be assessed by looking at:
- Basic statistics of important variables
- Scatter plots
- Correlations and associations
- Cross-tabulations
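The checks listed above can be sketched with the Python standard library. This is only an illustration: the sample rows, the variable names (group, age, score) and the cut-off of 80 are invented here and do not come from the slides.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical sample of (group, age, score) rows, invented for illustration.
sample = [
    ("A", 21, 62.0),
    ("A", 25, 70.0),
    ("B", 30, 81.0),
    ("B", 34, 85.0),
    ("B", 40, 93.0),
]

def basic_stats(values):
    """Basic statistics of one numeric variable."""
    return {"n": len(values), "mean": mean(values),
            "median": median(values), "stdev": stdev(values)}

def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def crosstab(rows, row_key, col_key):
    """Cross-tabulation: counts per (row category, column category) pair."""
    return Counter((row_key(r), col_key(r)) for r in rows)

ages = [r[1] for r in sample]
scores = [r[2] for r in sample]

age_stats = basic_stats(ages)
age_score_corr = pearson(ages, scores)
# Size of each subgroup, needed before any subgroup analysis.
subgroup_sizes = Counter(r[0] for r in sample)
# Group versus "score is at least 80" (an arbitrary illustrative cut-off).
table = crosstab(sample, lambda r: r[0], lambda r: r[2] >= 80)
```

Scatter plots are left out because they need a plotting library; the correlation coefficient gives a numeric summary of the same relationship.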

Slide 6: 3) Classification, Prediction

Classification
- predicts categorical class labels (discrete or nominal)
- constructs a model from the training set and the values (class labels) of a classifying attribute, then uses it to classify new data

Prediction
- models continuous-valued functions, for example to predict unknown or missing values

Typical applications:
- Credit approval
- Target marketing
- Medical diagnosis
- Fraud detection

Slide 7: Classification: A Two-Step Process

Model construction: describing a set of predetermined classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The model is represented as classification rules, decision trees, or mathematical formulae

Model usage: classifying future or unknown objects
- Estimate the accuracy of the model
  - The known label of each test sample is compared with the classified result from the model
  - The accuracy rate is the percentage of test set samples that are correctly classified by the model
  - The test set must be independent of the training set; otherwise the accuracy estimate is over-optimistic and over-fitting goes undetected
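The two steps can be sketched as follows. The data, the single numeric attribute, and the threshold-rule learner are assumptions made for this illustration; the slides do not prescribe a particular learning algorithm.

```python
# Hypothetical labelled samples: (years_of_service, class_label).
training_set = [(1, "no"), (2, "no"), (3, "no"), (7, "yes"), (8, "yes"), (9, "yes")]
# Independent test set: none of these tuples were used for construction.
test_set = [(2, "no"), (5, "no"), (6, "yes"), (10, "yes")]

def construct_model(samples):
    """Step 1: learn a threshold t so that 'value > t' predicts 'yes'."""
    candidates = sorted({value for value, _ in samples})

    def correct_on_training(t):
        return sum((value > t) == (label == "yes") for value, label in samples)

    return max(candidates, key=correct_on_training)

def classify(threshold, value):
    """Step 2: use the model to label a new object."""
    return "yes" if value > threshold else "no"

def accuracy_rate(threshold, samples):
    """Percentage of samples whose known label matches the model's output."""
    correct = sum(classify(threshold, value) == label for value, label in samples)
    return 100.0 * correct / len(samples)

threshold = construct_model(training_set)
training_accuracy = accuracy_rate(threshold, training_set)
test_accuracy = accuracy_rate(threshold, test_set)
```

With this toy data the rule is perfect on the training set but misses one test tuple, which is why accuracy is reported on data the model has not seen.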

Slide 8: Classification Process (1): Model Construction

Training Data -> Classification Algorithms -> Classifier (Model)

Example of a learned model: IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’

Slide 9: Classification Process (2): Use the Model in Prediction

The classifier is applied first to the testing data and then to unseen data.

Unseen data: (George, Professor, 5). Tenured?
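As a small sketch, the rule from the model-construction slide (rank = 'professor' OR years > 6) can be written as a function and applied to the unseen tuple. The testing rows below are hypothetical and serve only to show how the accuracy check would run; the function name `is_tenured` is likewise invented.

```python
def is_tenured(rank, years):
    """Rule from the slides: IF rank = 'professor' OR years > 6 THEN 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

# Hypothetical testing data: (name, rank, years, known label).
testing_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Compare the known label of each test tuple with the rule's output.
correct = sum(
    is_tenured(rank, years) == label for _, rank, years, label in testing_data
)
accuracy = correct / len(testing_data)

# The unseen tuple from the slide: (George, Professor, 5).
george = is_tenured("Professor", 5)
```

George is classified as tenured because the rank condition holds even though his 5 years do not exceed 6.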

Slide 10: Issues (1): Data Preparation

- Data cleaning: preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection): remove the irrelevant or redundant attributes
- Data transformation: generalize and/or normalize data
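A minimal sketch of the three preparation steps, using only the standard library. The records, the attribute names, and the specific choices (mean imputation, dropping an exact duplicate column, min-max scaling) are assumptions for illustration; the slides do not name particular techniques.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 20, "income": 1000.0, "income_copy": 1000.0},
    {"age": None, "income": 3000.0, "income_copy": 3000.0},
    {"age": 40, "income": 5000.0, "income_copy": 5000.0},
]

def fill_missing(rows, attr):
    """Data cleaning: replace missing values with the mean of observed ones."""
    observed = [r[attr] for r in rows if r[attr] is not None]
    fill = mean(observed)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]

def drop_redundant(rows, attr, duplicate_of):
    """Relevance analysis: drop an attribute that exactly duplicates another."""
    if all(r[attr] == r[duplicate_of] for r in rows):
        return [{k: v for k, v in r.items() if k != attr} for r in rows]
    return rows

def min_max_normalize(rows, attr):
    """Data transformation: rescale an attribute to the range [0, 1]."""
    lo = min(r[attr] for r in rows)
    hi = max(r[attr] for r in rows)
    span = (hi - lo) or 1  # guard against division by zero
    return [{**r, attr: (r[attr] - lo) / span} for r in rows]

prepared = fill_missing(records, "age")
prepared = drop_redundant(prepared, "income_copy", "income")
prepared = min_max_normalize(prepared, "income")
```

Each function returns new rows instead of modifying the input, so the raw records stay available for comparison.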

Slide 11: Issues (2): Evaluating Classification Methods

- Predictive accuracy
- Speed: time to construct the model, time to use the model
- Robustness: handling noise and missing values
- Scalability: efficiency in disk-resident databases
- Interpretability: understanding and insight provided by the model
- Goodness of rules: decision tree size, compactness of classification rules

Slide 12: 4) Classification by Decision Tree Induction

Decision tree
- A flow-chart-like tree structure
- Internal node denotes a test on an attribute
- Branch represents an outcome of the test
- Leaf nodes represent class labels or class distribution

Decision tree generation consists of two phases
- Tree construction
  - At start, all the training examples are at the root
  - Partition examples recursively based on selected attributes
- Tree pruning
  - Identify and remove branches that reflect noise or outliers

Use of decision tree: classifying an unknown sample
- Test the attribute values of the sample against the decision tree
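The construction phase and the use of the tree can be sketched as below. Two things are assumptions rather than content of the slides: attributes are selected by information gain (the slides only say "selected attributes"), and the training rows are invented. Pruning is not implemented in this sketch.

```python
from collections import Counter
from math import log2

def entropy(rows, target):
    """Entropy of the class label distribution in a set of rows."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def info_gain(rows, attr, target):
    """Reduction in entropy obtained by partitioning rows on attr."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def build_tree(rows, attrs, target):
    """Recursively partition the examples; leaves are class labels."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: the partition is pure, or no attributes are left to test.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    node = {"attr": best, "default": majority, "branches": {}}
    remaining = [a for a in attrs if a != best]
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        node["branches"][value] = build_tree(subset, remaining, target)
    return node

def classify(tree, sample):
    """Test the sample's attribute values against the tree, root to leaf."""
    while isinstance(tree, dict):
        # Unseen attribute values fall back to the node's majority class.
        tree = tree["branches"].get(sample[tree["attr"]], tree["default"])
    return tree

# Hypothetical training examples ("senior" stands for more than 6 years).
training = [
    {"rank": "assistant", "senior": "no", "tenured": "no"},
    {"rank": "assistant", "senior": "yes", "tenured": "yes"},
    {"rank": "professor", "senior": "no", "tenured": "yes"},
    {"rank": "associate", "senior": "yes", "tenured": "yes"},
    {"rank": "assistant", "senior": "no", "tenured": "no"},
    {"rank": "associate", "senior": "no", "tenured": "no"},
]

tree = build_tree(training, ["rank", "senior"], "tenured")
prediction = classify(tree, {"rank": "professor", "senior": "no"})
```

On this toy data the root tests `senior`, and only the non-senior branch goes on to test `rank`.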

Slide 13: Data Mining: A KDD Process

Data mining is the core of the knowledge discovery process.

Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge
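The stages of the process can be chained as plain functions. This is a toy sketch: the two sources, the record layout, and the use of frequency counting as the "mining" step are all assumptions chosen to keep the example short.

```python
from collections import Counter

# Hypothetical raw sources; None and the repeated id stand in for dirty data.
source_a = [{"id": 1, "item": "milk"}, {"id": 2, "item": None}, {"id": 3, "item": "bread"}]
source_b = [{"id": 3, "item": "bread"}, {"id": 4, "item": "milk"}, {"id": 5, "item": "tea"}]

def clean(rows):
    """Data cleaning: drop records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: merge sources into one store, de-duplicating by id."""
    warehouse = {}
    for rows in sources:
        for r in rows:
            warehouse[r["id"]] = r
    return list(warehouse.values())

def select(rows, attr):
    """Selection: keep only the task-relevant attribute."""
    return [r[attr] for r in rows]

def mine(values):
    """Data mining: frequency counting as a stand-in for a real algorithm."""
    return Counter(values)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that reach a minimum count."""
    return {p: c for p, c in patterns.items() if c >= min_count}

warehouse = integrate(clean(source_a), clean(source_b))
knowledge = evaluate(mine(select(warehouse, "item")), min_count=2)
```

Only the mining step would change if a real algorithm were swapped in; the surrounding stages keep the same shape.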

Slide 14: Architecture: Typical Data Mining System

Components of a typical data mining system, from the user down to the data:
- Graphical user interface
- Pattern evaluation
- Data mining engine
- Database or data warehouse server
- Data cleaning & data integration, filtering
- Databases, Data Warehouse
- Knowledge-base

Slide 15: Data Mining: Confluence of Multiple Disciplines

Data mining sits at the confluence of:
- Database Systems
- Statistics
- Machine Learning
- Visualization
- Algorithm
- Other Disciplines

Slide 16: Multi-Dimensional View of Data Mining

Data to be mined
- Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, WWW

Knowledge to be mined
- Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
- Multiple/integrated functions and mining at multiple levels

Slide 17

6) What is “big data”?

“Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”

Complicated (intelligent) analysis of data may make a small data set “appear” to be “big”.

Bottom line: any data that exceeds our current capability of processing can be regarded as “big”.

Slide 18: Computational View of Big Data

Stages applied to the data:
- Formatting, Cleaning
- Storage
- Data Understanding
- Data Access
- Data Integration
- Data Analysis
- Data Visualization
