Semi-Supervised Learning

Self-training: This technique is frequently used in semi-supervised learning. A classifier is first trained on a small amount of labeled data and is then used to classify the unlabeled data. Typically, the unlabeled points predicted with the highest confidence are added to the training set together with their predicted labels. This process is repeated and the classifier is retrained [1].

Self-training algorithm [2]:

Given a labeled data set L and an unlabeled data set U;

REPEAT:

  • Train a classifier h on L,
  • Classify the data in U with h,
  • Find U′, the subset of U with the most confident predictions,
  • L ∪ U′ → L
  • U \ U′ → U
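
This loop can be sketched in a few lines of Python. The synthetic two-blob data, the 0.9 confidence threshold, and the logistic-regression base classifier are all illustrative choices, not part of the algorithm itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two well-separated Gaussian blobs: a few labeled points, many unlabeled.
X_lab = np.vstack([rng.normal(-2, 0.5, (5, 2)), rng.normal(2, 0.5, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])

L_X, L_y, U = X_lab.copy(), y_lab.copy(), X_unl.copy()
clf = LogisticRegression()
while len(U) > 0:
    clf.fit(L_X, L_y)                 # train h on L
    proba = clf.predict_proba(U)      # classify the data in U with h
    safe = proba.max(axis=1) >= 0.9   # U': the most confident predictions
    if not safe.any():                # stop if nothing is confident enough
        break
    L_X = np.vstack([L_X, U[safe]])                       # L ∪ U' → L
    L_y = np.concatenate([L_y, proba[safe].argmax(axis=1)])
    U = U[~safe]                                          # U \ U' → U
```

Note the extra stopping condition: if no unlabeled point clears the confidence threshold, the loop ends rather than looping forever.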

Generative mixture models: This is one of the oldest semi-supervised learning methods. A generative model has the form P(x, y) = p(y) p(x | y), where p(x | y) is an identifiable mixture distribution, for example a Gaussian mixture model. With a large amount of unlabeled data, the components of the mixture can be identified; then, ideally, only one labeled sample per component is needed to fully determine the mixture distribution. Mixture components can be thought of as "soft clusters". In this method, attention should be paid to the following [1]:

  • Identifiability,
  • Model accuracy,
  • Expectation-maximization,
  • Cluster-and-label.
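
The "one labeled sample per component" idea can be sketched as a cluster-then-label procedure: fit a mixture on all the (mostly unlabeled) data, then use one labeled example per component to name the components. The synthetic data and the use of scikit-learn's GaussianMixture below are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two well-separated clusters of unlabeled points.
X_unl = np.vstack([rng.normal(-3, 0.4, (100, 2)), rng.normal(3, 0.4, (100, 2))])
# One labeled sample per class, near each cluster center.
X_lab = np.array([[-3.0, -3.0], [3.0, 3.0]])
y_lab = np.array([0, 1])

# Identify the mixture components from all the data.
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(np.vstack([X_unl, X_lab]))

# Use the single labeled sample per component to name each component.
comp_of_lab = gmm.predict(X_lab)
comp_to_class = {c: y for c, y in zip(comp_of_lab, y_lab)}

# Label the unlabeled points via their component.
pred = np.array([comp_to_class[c] for c in gmm.predict(X_unl)])
```

The mixture's posterior component memberships are exactly the "soft clusters" mentioned above; the two labeled points only serve to attach class names to components the unlabeled data already revealed.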

Semi-supervised support vector machine (S3VM): The semi-supervised support vector machine was proposed by Bennett and Demiriz as a semi-supervised learning method based on the cluster assumption [4]. The main purpose of S3VM is to build a classifier using both labeled and unlabeled data. Following the SVM idea, S3VM seeks a maximum-margin boundary over the labeled and unlabeled data, requiring the new optimal decision boundary to achieve the smallest generalization error on the originally unlabeled data [5].

Semi-supervised support vector machines can be applied to many classification problems. Potential applications include image classification and text classification, and semi-supervised support vector machines show good results in both areas [5].
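
For a linear decision function f(x) = w·x + b, the S3VM objective can be sketched as the usual hinge loss on labeled points plus, for each unlabeled point, the loss under its most favorable label (the so-called "hat" loss). The weights C_l and C_u below are illustrative hyperparameters; actually minimizing this objective is non-convex and requires specialized solvers:

```python
import numpy as np

def s3vm_objective(w, b, X_lab, y_lab, X_unl, C_l=1.0, C_u=0.5):
    """Value of a (linear) S3VM-style objective; labels y_lab are in {-1, +1}."""
    f_lab = X_lab @ w + b
    f_unl = X_unl @ w + b
    # Standard hinge loss on the labeled points.
    hinge_lab = np.maximum(0.0, 1.0 - y_lab * f_lab)
    # "Hat" loss on unlabeled points: hinge under the best self-assigned label,
    # which penalizes a boundary that passes close to unlabeled points.
    hinge_unl = np.maximum(0.0, 1.0 - np.abs(f_unl))
    return 0.5 * (w @ w) + C_l * hinge_lab.sum() + C_u * hinge_unl.sum()
```

The unlabeled term pushes the boundary into low-density regions, which is the cluster assumption made concrete: a boundary cutting through a cluster of unlabeled points incurs a higher objective value than one passing between clusters.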

Graph-based algorithms: Graph-based semi-supervised methods define a graph whose nodes represent the labeled and unlabeled instances in the data set and whose edges (possibly weighted) reflect the similarity of the instances. These methods usually assume label smoothness over the graph. Graph methods are nonparametric, discriminative, and transductive in nature [1].

In graph-based methods, the label information of each sample is propagated to its neighboring samples until a globally stable state is reached over the entire data set. A graph of nodes and edges is constructed in which the nodes represent the labeled and unlabeled instances and the edges encode the similarities between them. The label of each data sample is then passed on to its neighboring points [6].

Graph structure: The graph is defined as G = (V, E).
Here:
V: the set of vertices representing the labeled and unlabeled data samples,
E: the set of edges encoding the similarities between labeled and unlabeled samples in the data set [7].

Some graph-based semi-supervised learning algorithms are the following [3]:

  • Harmonic,
  • Local and global consistency,
  • Manifold regularization,
  • Mincut.
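
The propagation described above can be sketched as simple iterative label spreading over a similarity graph, in the spirit of the harmonic approach: each unlabeled node repeatedly takes the weighted average of its neighbors' label distributions while the labeled nodes stay clamped. The synthetic data, Gaussian edge weights, and bandwidth are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two clusters of 20 points; only one point per cluster carries a label.
X = np.vstack([rng.normal(-1, 0.2, (20, 2)), rng.normal(1, 0.2, (20, 2))])

# Edge weights: Gaussian similarity between samples (illustrative bandwidth).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 0.1)
np.fill_diagonal(W, 0.0)  # no self-edges

# Label distributions per node; nodes 0 and 20 are the labeled seeds.
F = np.zeros((40, 2))
F[0, 0] = 1.0   # node 0 labeled as class 0
F[20, 1] = 1.0  # node 20 labeled as class 1

for _ in range(200):
    F = W @ F / W.sum(1, keepdims=True)   # spread labels to neighbors
    F[0], F[20] = [1.0, 0.0], [0.0, 1.0]  # clamp the labeled nodes

pred = F.argmax(1)  # transductive labels for every node
```

Because cross-cluster edge weights are tiny, each seed's label flows only through its own cluster, and the whole data set ends up labeled from just two labeled points.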

Multiview algorithms: The use of multiple redundant views of the same input data is the main difference between multiview and single-view learning algorithms. Thanks to these multiple views, the learning task can draw on abundant information. However, if the learning method cannot handle the multiple views properly, they may degrade the performance of multiview learning [8].
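
The classic multiview semi-supervised method is co-training (due to Blum and Mitchell): split the features into two views, train one classifier per view, and let each classifier hand its most confident unlabeled predictions to the shared labeled pool. The two-view synthetic data, the 0.9 threshold, and the logistic-regression base learners below are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

def sample(n, mean):
    # Each class is separable in both views (features 0-1 and features 2-3).
    return rng.normal(mean, 0.5, (n, 4))

X_lab = np.vstack([sample(4, -2), sample(4, 2)])
y_lab = np.array([0] * 4 + [1] * 4)
X_unl = np.vstack([sample(40, -2), sample(40, 2)])

views = [slice(0, 2), slice(2, 4)]     # the two redundant feature views
clfs = [LogisticRegression(), LogisticRegression()]
L_X, L_y, U = X_lab, y_lab, X_unl

for _ in range(5):
    if len(U) == 0:
        break
    for i, v in enumerate(views):
        clfs[i].fit(L_X[:, v], L_y)    # one classifier per view
    # Each view's classifier nominates its most confident unlabeled points.
    keep = np.zeros(len(U), dtype=bool)
    pseudo = np.zeros(len(U), dtype=int)
    for i, v in enumerate(views):
        p = clfs[i].predict_proba(U[:, v])
        confident = p.max(1) >= 0.9
        keep |= confident
        pseudo[confident] = p[confident].argmax(1)  # later view wins on conflicts
    L_X = np.vstack([L_X, U[keep]])
    L_y = np.concatenate([L_y, pseudo[keep]])
    U = U[~keep]
```

This sketch simplifies the original algorithm (which nominates a fixed number of positives and negatives per round and resolves view conflicts more carefully), but it shows the core mechanism: each view teaches the other through confidently pseudo-labeled examples.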



[1] Zhu, X. (2005). Semi-Supervised Learning Literature Survey. University of Wisconsin Madison Computer Sciences Department

[2] Xia, F. (2006). Semi-supervised learning and self-training. Web: http://faculty.washington.edu/fxia/courses/LING572/self-training.ppt

[3] Zhu, X. (2007). Semi-Supervised Learning Tutorial. ICML 2007 tutorial, 1–135. Madison, USA: University of Wisconsin Department of Computer Sciences.

[4] Bennett, K. P., and Demiriz, A. (1999). Semi-supervised Support Vector Machines. Advances in Neural Information Processing Systems (NIPS), 368–374. MIT Press.

[5] Ding, S., Zhu, Z., and Zhang, X. (2015). An overview on semi-supervised support vector machine. Neural Computing & Applications. doi: 10.1007/s00521-015-2113-7

[6] Sheikhpour, R., Sarram, M. A., Gharaghani, S., and ZareChahooki, M. A. (2017). A Survey on semi-supervised feature selection methods. Pattern Recognition, 141–158. doi: 10.1016/j.patcog.2016.11.003

[7] Sawant, S. S., and Prabukumar, M. (2018). A review on graph-based semi-supervised learning methods for hyperspectral image classification. The Egyptian Journal of Remote Sensing and Space Science. doi: 10.1016/j.ejrs.2018.11.001

[8] Xu, C., Tao, D., and Xu, C. (2013). A Survey on Multi-view Learning. 1–59. arXiv:1304.5634

[9] Savaş, S. (2019). Classification of Carotid Artery Intima-Media Thickness with Deep Learning (Karotis Arter Intima Media Kalınlığının Derin Öğrenme ile Sınıflandırılması). PhD Thesis, Gazi University, Graduate School of Natural and Applied Sciences, Department of Computer Engineering, Ankara.
