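ML.NET uses the LearningPipeline class to define the steps needed to perform a machine learning task, which makes the machine learning workflow intuitive.
The sample code from the iris flower prediction quickstart is used below to explain how the pipeline works.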
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Runtime.Api;
    using Microsoft.ML.Trainers;
    using Microsoft.ML.Transforms;
    using System;

    namespace myApp
    {
        class Program
        {
            // STEP 1: Define your data structures

            // IrisData is used to provide training data, and as
            // input for prediction operations
            // - First 4 properties are inputs/features used to predict the label
            // - Label is what you are predicting, and is only set when training
            public class IrisData
            {
                [Column("0")]
                public float SepalLength;

                [Column("1")]
                public float SepalWidth;

                [Column("2")]
                public float PetalLength;

                [Column("3")]
                public float PetalWidth;

                [Column("4")]
                [ColumnName("Label")]
                public string Label;
            }

            // IrisPrediction is the result returned from prediction operations
            public class IrisPrediction
            {
                [ColumnName("PredictedLabel")]
                public string PredictedLabels;
            }

            static void Main(string[] args)
            {
                // STEP 2: Create a pipeline and load your data
                var pipeline = new LearningPipeline();

                // If working in Visual Studio, make sure the 'Copy to Output Directory'
                // property of iris-data.txt is set to 'Copy always'
                string dataPath = "iris-data.txt";
                pipeline.Add(new TextLoader(dataPath).CreateFrom<IrisData>(separator: ','));

                // STEP 3: Transform your data
                // Assign numeric values to text in the "Label" column, because only
                // numbers can be processed during model training
                pipeline.Add(new Dictionarizer("Label"));

                // Puts all features into a vector
                pipeline.Add(new ColumnConcatenator("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));

                // STEP 4: Add learner
                // Add a learning algorithm to the pipeline.
                // This is a classification scenario (What type of iris is this?)
                pipeline.Add(new StochasticDualCoordinateAscentClassifier());

                // Convert the Label back into original text (after converting to number in step 3)
                pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });

                // STEP 5: Train your model based on the data set
                var model = pipeline.Train<IrisData, IrisPrediction>();

                // STEP 6: Use your model to make a prediction
                // You can change these numbers to test different predictions
                var prediction = model.Predict(new IrisData()
                {
                    SepalLength = 3.3f,
                    SepalWidth = 1.6f,
                    PetalLength = 0.2f,
                    PetalWidth = 5.1f,
                });

                Console.WriteLine($"Predicted flower type is: {prediction.PredictedLabels}");
            }
        }
    }
Creating a pipeline instance
First, create a LearningPipeline instance:
    var pipeline = new LearningPipeline();
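Adding steps
Then call the Add method of the LearningPipeline instance to add steps to the pipeline. Every step implements the ILearningPipelineItem interface.
A basic pipeline consists of the following steps, of which data preprocessing and label conversion are optional.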
Loading the dataset
Implements the ILearningPipelineLoader interface.
A pipeline must contain at least one dataset-loading step.
    // Load the data with TextLoader
    string dataPath = "iris-data.txt";
    pipeline.Add(new TextLoader(dataPath).CreateFrom<IrisData>(separator: ','));
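To make the loading step more concrete, here is a minimal sketch in plain C# of what reading one comma-separated line into a row with five columns might look like. It does not use ML.NET and is not how TextLoader is implemented. IrisRow and IrisParser are hypothetical names, and the sample line format (four numbers followed by a species name) is an assumption about iris-data.txt.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the IrisData class in the listing; not an ML.NET type.
public class IrisRow
{
    public float SepalLength;
    public float SepalWidth;
    public float PetalLength;
    public float PetalWidth;
    public string Label;
}

public static class IrisParser
{
    // Splits one line on the separator and maps column indices 0-4 to fields,
    // mirroring the [Column("0")] ... [Column("4")] attributes in the listing.
    public static IrisRow ParseLine(string line, char separator = ',')
    {
        string[] parts = line.Split(separator);
        if (parts.Length < 5)
            throw new FormatException("Expected 5 columns, got " + parts.Length);

        return new IrisRow
        {
            // InvariantCulture keeps '.' as the decimal point on every machine.
            SepalLength = float.Parse(parts[0], CultureInfo.InvariantCulture),
            SepalWidth = float.Parse(parts[1], CultureInfo.InvariantCulture),
            PetalLength = float.Parse(parts[2], CultureInfo.InvariantCulture),
            PetalWidth = float.Parse(parts[3], CultureInfo.InvariantCulture),
            Label = parts[4].Trim()
        };
    }
}
```

For example, IrisParser.ParseLine("5.1,3.5,1.4,0.2,Iris-setosa") would fill the four measurements and set Label to the species name.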
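Data preprocessing
Implements the CommonInputs.ITransformInput interface.
A pipeline can contain zero or more data preprocessing steps, which bring the loaded dataset into a standard form. The sample code contains two of them.
    // Label holds text, which the algorithm cannot process, so map it to dictionary keys
    pipeline.Add(new Dictionarizer("Label"));
    // The algorithm reads its input only from the Features column, so concatenate the feature columns into it
    pipeline.Add(new ColumnConcatenator("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));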
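The two transforms can be pictured with a small sketch in plain C#. This only illustrates the idea (text labels become numeric keys, separate columns become one vector) and is not ML.NET's implementation. PreprocessingSketch and its methods are hypothetical, and numbering the keys from 0 in order of first appearance is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

public static class PreprocessingSketch
{
    // Assigns each distinct label a numeric key in order of first appearance.
    // Starting at 0 is an assumption made for this illustration.
    public static Dictionary<string, int> BuildLabelKeys(IEnumerable<string> labels)
    {
        var keys = new Dictionary<string, int>();
        foreach (string label in labels)
        {
            if (!keys.ContainsKey(label))
                keys.Add(label, keys.Count);
        }
        return keys;
    }

    // Packs the four measurements into a single feature vector, in the order given.
    public static float[] ConcatFeatures(float sepalLength, float sepalWidth,
                                         float petalLength, float petalWidth)
    {
        return new[] { sepalLength, sepalWidth, petalLength, petalWidth };
    }
}
```

With three distinct species names in the input, BuildLabelKeys yields three keys, and ConcatFeatures always yields a vector of length four.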
Choosing a learning algorithm
Implements the CommonInputs.ITrainerInput interface.
A pipeline must contain exactly one learning algorithm.
    // Use a linear classifier
    pipeline.Add(new StochasticDualCoordinateAscentClassifier());
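Label conversion
Implements the CommonInputs.ITransformInput interface.
A pipeline can contain zero or more label conversion steps, which convert the predicted label into data that is easy to read.
    // Convert Label from its dictionary key back to text
    pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });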
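Conceptually, label conversion is the reverse lookup of the dictionary step: a numeric key goes in, the original text comes out. The sketch below shows that idea in plain C#. LabelDecodingSketch is a hypothetical helper, not an ML.NET type, and the zero-based key order is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class LabelDecodingSketch
{
    // Returns the original text for a predicted numeric key.
    // labelsByKey lists the label texts in key order (key 0 first).
    public static string Decode(IReadOnlyList<string> labelsByKey, int predictedKey)
    {
        if (predictedKey < 0 || predictedKey >= labelsByKey.Count)
            throw new ArgumentOutOfRangeException(nameof(predictedKey));
        return labelsByKey[predictedKey];
    }
}
```

Without a step like this, the prediction would carry only the numeric key, which is harder to read than the species name.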
Running the pipeline
Finally, call the Train method of the LearningPipeline instance to run the pipeline and obtain the prediction model.
    var model = pipeline.Train<IrisData, IrisPrediction>();
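The composition rules stated in the sections above (at least one loader, any number of transforms, exactly one learning algorithm) can be summarized as a small check. The sketch below is plain C# for illustration only. StepKind and PipelineRules are hypothetical names, and nothing here claims that ML.NET performs this validation in this way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical classification of pipeline steps, following the section headings:
// preprocessing and label conversion are both treated as Transform.
public enum StepKind { Loader, Transform, Trainer }

public static class PipelineRules
{
    // Returns a list of rule violations; an empty list means the composition is valid.
    public static List<string> Check(IEnumerable<StepKind> steps)
    {
        List<StepKind> all = steps.ToList();
        int loaders = all.Count(s => s == StepKind.Loader);
        int trainers = all.Count(s => s == StepKind.Trainer);

        var problems = new List<string>();
        if (loaders < 1)
            problems.Add("A pipeline needs at least one loader.");
        if (trainers != 1)
            problems.Add("A pipeline needs exactly one trainer, found " + trainers + ".");
        // Transforms are optional, so any count (including zero) is accepted.
        return problems;
    }
}
```

The sample pipeline corresponds to the sequence Loader, Transform, Transform, Trainer, Transform, which passes both rules.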