Data
To screen potential marine lncRNAs in depth, we used several public sources. The main one is the assembled transcriptomic data provided by the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP), which assembled a total of 650 transcriptomes from 406 eukaryotic microorganism species representing all major branches of the eukaryotic tree of life.
Workflow
The majority voting procedure is an ensemble machine learning method that combines the predictions of 10 coding potential tools, among them CPC2, CPAT, LncFinder, and LncADeep. The aim is to increase the confidence and robustness of lncRNA prediction and, most importantly, to promote the diversity of the ensemble model. Briefly, each tool predicts the nature of every transcript and labels it as “coding” or “non-coding”, and the class with the highest number of votes is the outcome. Two further filters, transcript length and putative ORF length, are then applied to discriminate between lncRNA-like, sncRNA-like, and coding-like transcripts.
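The voting and filtering steps described above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the function names, the length cutoffs (200 nt for transcript length, 300 nt for ORF length), the handling of tied votes, and the rule that a non-coding vote with a long ORF falls back to coding-like are all assumptions made for illustration, since the text does not specify them.

```python
from collections import Counter

# Hypothetical cutoffs: the source does not state the values actually used.
MIN_LNCRNA_LENGTH = 200  # nt; shorter non-coding transcripts are treated as sncRNA-like
MAX_LNCRNA_ORF = 300     # nt; a longer putative ORF is treated as a sign of coding potential

VALID_LABELS = {"coding", "non-coding"}


def majority_vote(labels):
    """Return the label with the most votes, or "tie" when the vote is split evenly."""
    unknown = set(labels) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    if counts["coding"] == counts["non-coding"]:
        return "tie"  # assumption: the source does not say how ties are resolved
    return "coding" if counts["coding"] > counts["non-coding"] else "non-coding"


def classify_transcript(labels, transcript_length, orf_length):
    """Combine the majority vote with the two filters (transcript and ORF length)."""
    vote = majority_vote(labels)
    if vote == "tie":
        return "ambiguous"
    if vote == "coding" or orf_length > MAX_LNCRNA_ORF:
        # assumption: a long ORF overrides a non-coding vote
        return "coding-like"
    if transcript_length >= MIN_LNCRNA_LENGTH:
        return "lncRNA-like"
    return "sncRNA-like"


# Example: 7 of 10 hypothetical tool predictions say "non-coding".
votes = ["non-coding"] * 7 + ["coding"] * 3
print(classify_transcript(votes, transcript_length=1500, orf_length=120))
```

With an even number of tools a 5 to 5 split is possible, so the sketch reports such transcripts as "ambiguous" rather than forcing a class; a real pipeline might instead weight the tools or break ties some other way.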