Virtual screening is an important step in the early phase of the drug discovery process. Because thousands of compounds must be screened, this step should be both fast and effective at distinguishing drug-like from nondrug-like molecules. Statistical machine learning methods are widely used for classification in drug discovery studies. Here, we developed a new tool that classifies molecules as drug-like or nondrug-like using various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, we first compared the performance of twenty-three machine learning algorithms on ten different measures, and then selected the ten best-performing algorithms based on the results of principal component and hierarchical cluster analyses. Besides classification, the application can also create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. More detailed information about this tool can be found in the main paper.
To use this application:
(i) load your data set using the Data upload tab. Here, users have three options: "Data upload", "Paste your data" and "Single molecule".
(ii) choose one or more statistical machine learning algorithms in the Analyze tab.
(iii) in the Plots tab, create a dendrogram using the Rcpi package and a heat map using the ChemmineR and gplots packages, based on PubChem fingerprints. To create a dendrogram and heat map, the data must contain PubChem CID numbers. Alternatively, to create a dendrogram, users can upload an SDF file containing molecular information about the compounds. Please note that creating the dendrogram and heat map may take a while because of the large number of compounds.
(iv) create molecule plot(s) in the PubChem tab. The data must contain PubChem CID numbers, and up to 16 molecules can be selected at a time. To download an SDF file without plotting, users can select any number of molecules.
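To give a feel for what fingerprint-based clustering in step (iii) works with, here is a minimal Python sketch of a pairwise distance matrix between binary fingerprints. It is not the tool's own code, which relies on the R packages named above. The use of Tanimoto distance is an assumption (it is a common choice for binary fingerprints), and the molecule names and bit positions are made up for illustration.

```python
def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def distance_matrix(fingerprints):
    """Build a symmetric distance matrix from a dict of name -> bit set."""
    names = list(fingerprints)
    matrix = [
        [tanimoto_distance(fingerprints[p], fingerprints[q]) for q in names]
        for p in names
    ]
    return names, matrix


# Hypothetical fingerprints: each set lists the positions of bits set to 1.
example = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {3, 4, 5, 6},
    "mol_c": {1, 2, 3, 4},
}
names, matrix = distance_matrix(example)
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

A matrix like this is the usual input to a hierarchical clustering routine, which then produces the tree drawn as a dendrogram.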
Users can download the statistical machine learning predictions as a txt file in the Analyze tab, the heat map and dendrogram plots as pdf files in the Plots tab, and the molecule plot and molecule SDF file in the PubChem tab.
Please note that the data set must contain the following descriptors in this exact order: logP, polar surface area (PSA), donor count (DC), aliphatic ring count (AlRC), aromatic ring count (ArRC) and Balaban index (BI).
If the data include PubChem CID numbers, they must be placed in the first column of the data matrix.
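The column layout described above can be prepared before upload with a small script. The following Python sketch is only an illustration: the column header names (taken from the abbreviations above), the `CID` key, the tab delimiter and the example values are all assumptions, since this page does not state which header names or delimiters the tool expects.

```python
import csv
import io

# Descriptor order required by the tool, as listed in the text above.
DESCRIPTORS = ["logP", "PSA", "DC", "AlRC", "ArRC", "BI"]


def arrange_columns(rows, cid_key="CID"):
    """Return (header, table) with descriptors in the required order.

    `rows` is a list of dicts. `cid_key` is a hypothetical column name for
    the PubChem CID; if every row has it, it becomes the first column.
    """
    has_cid = bool(rows) and all(cid_key in row for row in rows)
    header = ([cid_key] if has_cid else []) + DESCRIPTORS
    table = []
    for i, row in enumerate(rows):
        missing = [d for d in DESCRIPTORS if d not in row]
        if missing:
            raise ValueError(f"row {i} is missing descriptors: {missing}")
        table.append([row[name] for name in header])
    return header, table


def to_delimited_text(header, table, delimiter="\t"):
    """Serialize the table as delimited text (the delimiter is an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(table)
    return buffer.getvalue()


# Illustrative values only, not real measurements.
rows = [
    {"BI": 2.5, "CID": 2244, "logP": 1.2, "PSA": 63.6,
     "DC": 1, "AlRC": 0, "ArRC": 1},
]
header, table = arrange_columns(rows)
print(to_delimited_text(header, table))
```

Raising an error on a missing descriptor, rather than filling in a default, keeps an incomplete row from silently shifting the remaining columns out of order.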
Hacettepe University Faculty of Medicine Department of Biostatistics
selcuk.korkmaz@hacettepe.edu.tr
Hacettepe University Faculty of Medicine Department of Biostatistics
gokmen.zararsiz@hacettepe.edu.tr
Hacettepe University Faculty of Medicine Department of Biostatistics
dincer.goksuluk@hacettepe.edu.tr
(1) The MLViS paper has been published in PLoS ONE. The complete reference is given in the Citation tab.
(i) 4 new statistical machine-learning methods have been added
(ii) Plots and PubChem tabs have been added
MLViS web-tool has been released.
easyROC: a web-tool for ROC curve analysis
MVN: a web-tool for assessing multivariate normality
DDNAA: Decision support system for differential diagnosis of nontraumatic acute abdomen
Korkmaz S, Zararsiz G, Goksuluk D (2015) MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development. PLoS ONE 10(4): e0124600. doi: 10.1371/journal.pone.0124600