Data clustering, data deduction, and data visualization. Uses advanced techniques to encode free-format articles with pre-trained LLMs so that the data can be clustered.
2023 Supervised Learning for Orange3 from scratch (FEG)
This document provides an overview of supervised learning and decision tree models. It discusses supervised learning techniques for classification and regression. Decision trees are explained as a method that uses conditional statements to classify examples based on their features. The document reviews node splitting criteria like information gain that help determine the most important features. It also discusses evaluating models for overfitting/underfitting and techniques like bagging and boosting in random forests to improve performance. Homework involves building a classification model on a healthcare dataset and reporting the results.
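As a rough illustration of the information-gain criterion mentioned in this summary, the sketch below computes the entropy of a target column before and after splitting on one feature. The function names and the toy records are hypothetical and invented for this example; they are not taken from the deck or its healthcare dataset.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Target entropy minus the weighted target entropy after splitting on `feature`."""
    parent = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted


# Made-up toy records, for illustration only.
rows = [
    {"smoker": "yes", "active": "no", "risk": "high"},
    {"smoker": "yes", "active": "yes", "risk": "high"},
    {"smoker": "no", "active": "no", "risk": "low"},
    {"smoker": "no", "active": "yes", "risk": "low"},
]

gain_smoker = information_gain(rows, "smoker", "risk")  # split separates the classes fully
gain_active = information_gain(rows, "active", "risk")  # split leaves the classes mixed
```

In this toy data a tree would prefer to split on `smoker`, because that split has the larger gain.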
This document provides an overview of unsupervised learning techniques including k-means clustering and association rule mining. It begins with introductions to the speaker and tutorial topics. It then contrasts supervised vs unsupervised learning, describing how k-means is used for clustering without labels and how association rules can discover relationships between items. The document provides examples of applying these techniques in domains like retail, sports, email marketing and healthcare. It also includes visualizations and discusses important concepts for k-means like data transformation and for association rules like support, confidence and lift. Homework questions are asked about preparing data for these algorithms in Orange.
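To make the support, confidence and lift concepts named in this summary concrete, here is a minimal sketch that computes them for a single rule over a handful of baskets. The helper `rule_metrics` and the transactions are hypothetical, invented for illustration; they do not come from the deck or from Orange.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes at least one transaction contains the antecedent and at least one
    contains the consequent; otherwise the divisions below would fail.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_consequent = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)

    support = n_both / n                     # share of baskets containing both sides
    confidence = n_both / n_antecedent       # share of antecedent baskets that also hold the consequent
    lift = confidence / (n_consequent / n)   # confidence relative to the consequent's base rate
    return support, confidence, lift


# Made-up shopping baskets, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
```

A lift below 1, as in this toy example, suggests that the antecedent makes the consequent slightly less likely than its base rate.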
202312 Exploration Data Analysis Visualization (English version) (FEG)
This document provides an overview of exploratory data analysis (EDA) and visualization techniques that can be performed before building a machine learning model. It introduces the Iris dataset as an example and outlines the key steps of EDA, including loading the data, examining correlations, creating scatter plots, and generating distribution and box plots to understand feature statistics. As homework, students are asked to explore another dataset with a numeric target feature called "housing.tab" and explain the visualizations.
202312 Exploration of Data Analysis Visualization (FEG)
This document provides a tutorial on data visualization and analysis using Orange 3. It discusses different types of charts like pie charts, line charts, histograms, bar charts, scatter plots, box plots, and pivot tables. It demonstrates how to visualize survival rates from the Titanic dataset based on features like sex, passenger class, age, and fare paid. Key findings are that women and higher class passengers had higher survival rates, and survival rates also depended on combinations of these features.
Transfer learning (TL) is a research problem in machine learning (ML) that focuses on applying knowledge gained while solving one task to a different but related task.
2. About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI columnist
6. Visual data analysis
Column name: Description
Sales (sales volume): Unit sales (in thousands) at each location
CompPrice (competitor price): Price charged by competitor at each location
Income (income level): Community income level (in thousands of dollars)
Advertising (advertising budget): Local advertising budget for company at each location (in thousands of dollars)
Population (population): Population size in region (in thousands)
Price (price): Price company charges for car seats at each site
ShelveLoc: A factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site
Age (age): Average age of the local population
Education (education level): Education level at each location
Urban: A factor with levels No and Yes to indicate whether the store is in an urban or rural location
US: A factor with levels No and Yes to indicate whether the store is in the US or not
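Before charting a table like this, it can help to compute the per-group numbers that a box plot or bar chart would show. The sketch below averages Sales by ShelveLoc using only the Python standard library. The sample rows are invented for illustration and use only a subset of the columns; they are not values from the actual dataset, and `mean_sales_by` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented sample rows using a subset of the columns described above; NOT real data.
SAMPLE = """Sales,Price,ShelveLoc,Urban,US
9.0,120,Good,Yes,Yes
11.0,100,Good,No,Yes
5.0,130,Bad,Yes,No
7.0,110,Medium,Yes,Yes
"""


def mean_sales_by(rows, column):
    """Average the Sales column for each level of a categorical column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(float(row["Sales"]))
    return {level: mean(values) for level, values in groups.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_shelf = mean_sales_by(rows, "ShelveLoc")
by_urban = mean_sales_by(rows, "Urban")
```

The same helper works for any of the factor columns (ShelveLoc, Urban, US), which makes it easy to check a chart against the underlying group means.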