  • 1.
    Ahmed, Laeeq
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Scalable Analysis of Large Datasets in Life Sciences. 2019. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    We are experiencing a deluge of data in all fields of scientific and business research, particularly in the life sciences, driven by better instrumentation and rapid recent advances in information technology. Handling such large amounts of data poses major challenges, ranging from the practicalities of managing large volumes of data to understanding their meaning and practical implications.

    In this thesis, I present parallel methods to rapidly and efficiently manage, process, analyse and visualize large datasets from several life-science fields, while building and applying various machine learning techniques in novel ways. Most of the work centres on applying the latest Big Data analytics frameworks to create efficient virtual screening strategies for large datasets. Virtual screening is a cheminformatics method used in drug discovery to search large libraries of molecular structures. I also present a method for analysing large electroencephalography datasets in real time. Electroencephalography is one of the main techniques used to measure the electrical activity of the brain.

    First, I evaluate the suitability of Spark, a parallel framework for large datasets, for parallel ligand-based virtual screening. As a case study, I classify a molecular library using prebuilt classification models to filter it for active molecules. I also demonstrate a strategy for creating cloud-ready pipelines for structure-based virtual screening, whose major advantages are increased productivity and high throughput. This work shows that Spark can be applied to virtual screening and is, in general, an appropriate solution for large-scale parallel pipelines. Moreover, I illustrate how Big Data analytics is valuable when working with life-science datasets.

    Secondly, I present a method that further reduces the overall run time of the structure-based virtual screening strategy by means of machine learning and an iterative modelling strategy based on conformal prediction. The idea is to dock only those molecules that have a better-than-average chance of being inhibitors when searching for potential drug candidates. Using machine learning models from this work, I built a web service that predicts the target profiles of multiple compounds using ready-made models for a list of targets with available 3D structures. These target predictions can help in understanding off-target effects, for example in the early stages of drug discovery projects.

    Thirdly, I present a method for detecting seizures in long-term electroencephalography recordings. The method works in real time, taking in the ongoing readings as live data streams, and tackles the challenges of real-time decision-making, storing large datasets in memory, and rapidly updating the prediction model with newly produced data. The resulting algorithm not only classifies seizures in real time but also learns the threshold in real time. I also present a new feature, the "top-k amplitude measure", for classifying which parts of the data correspond to seizures. This feature also reduces the amount of data that needs to be processed in subsequent steps.

  • 2.
    Ahmed, Laeeq
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Spjuth, O.
    Using iterative MapReduce for parallel virtual screening. 2013. In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom), IEEE Computer Society, 2013, pp. 27-32. Conference paper (Peer-reviewed)
    Abstract [en]

    Virtual screening is a technique in cheminformatics used in drug discovery to search large libraries of molecular structures. Virtual screening often uses SVM, a supervised machine learning technique for regression and classification analysis. Virtual screening with SVM not only involves huge datasets but is also computationally expensive, with a complexity that can reach O(n^2) or more. SVM-based applications most commonly use MPI, which becomes complex and impractical with large datasets. As an alternative to MPI, MapReduce and its various implementations have been used successfully on commodity clusters to analyse very large datasets. Because the molecule libraries in virtual screening are large, virtual screening is a good candidate for MapReduce. In this paper we present a MapReduce implementation of SVM-based virtual screening using Spark, an iterative MapReduce programming model. We show that our implementation scales well and opens up the possibility of using huge public cloud infrastructures efficiently for virtual screening.

  • 3.
    Ahmed, Laeeq
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Whitmarsh, S.
    Parallel real time seizure detection in large EEG data. 2016. In: IoTBD 2016 - Proceedings of the International Conference on Internet of Things and Big Data, SciTePress, 2016, pp. 214-222. Conference paper (Peer-reviewed)
    Abstract [en]

    Electroencephalography (EEG) is one of the main techniques for detecting and diagnosing epileptic seizures. Due to the large size of EEG data in long-term clinical monitoring and the complex nature of epileptic seizures, seizure detection is both data-intensive and compute-intensive. Analysing EEG data to detect seizures in real time has many applications, e.g., in automatic seizure detection or in presenting a timely alarm signal to the patient. In real-time seizure detection, seizures have to be detected with negligible delay, which requires lightweight algorithms. MapReduce and its variations have been used effectively for data analysis in problems with large datasets on general-purpose machines. In this study, we propose a parallel lightweight algorithm for epileptic seizure detection using Spark Streaming. Our algorithm not only classifies seizures in real time but also learns an epileptic threshold in real time. We furthermore present the "top-k amplitude measure", a feature for classifying seizures in EEG data that also helps reduce data size. In a benchmark experiment we show that our algorithm can detect seizures in real time with low latency while maintaining a good seizure detection rate. In short, our algorithm opens new possibilities for using private cloud infrastructures for real-time epileptic seizure detection in EEG data.

  • 4.
    Ahmed, Laeeq
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Georgiev, Valentin
    Capuccini, Marco
    Toor, Salman
    Schaal, Wesley
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Spjuth, Ola
    Efficient iterative virtual screening with Apache Spark and conformal prediction. 2018. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, article ID 8. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or large workstations in a brute-force manner, by docking and scoring all available ligands. Contribution: In this study we propose a strategy based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remaining ligands in order to exclude those predicted to be 'low-scoring'. Another set of ligands is then docked, the model is retrained, and the process is repeated until a certain level of model efficiency is reached. Thereafter, the remaining ligands are either docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. Results: We show on 4 different targets that conformal-prediction-based virtual screening (CPVS) reduces the number of docked molecules by 62.61% while retaining, on average, 94% accuracy for the top 30 hits and achieving a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.

  • 5.
    Capuccini, Marco
    Ahmed, Laeeq
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Schaal, Wesley
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, PDC Center for High Performance Computing. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Spjuth, Ola
    Large-scale virtual screening on public cloud resources with Apache Spark. 2017. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 9, article ID 15. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Structure-based virtual screening is an in-silico method for screening a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it is a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI), relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources and providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. Results: We developed a method to run existing docking-based screening software on distributed cloud resources using the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against approximately 2.2 M compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment. Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability we achieve allows the method to be tried out on relatively small libraries first and then scaled to larger ones.
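The 87% parallel efficiency reported in entry 5 is a standard scaling metric. The abstract does not say which baseline it was measured against, so the sketch below assumes the common relative definition E = (T_base * n_base) / (T_p * n_p), where T is wall-clock time and n is the number of cores; 1.0 means perfect linear scaling. The function names are illustrative, and the timings in the example are hypothetical, not results from the paper.

```python
def speedup(t_base: float, t_parallel: float) -> float:
    """How many times faster the parallel run finishes than the baseline run."""
    return t_base / t_parallel


def parallel_efficiency(t_base: float, n_base: int,
                        t_parallel: float, n_parallel: int) -> float:
    """Relative parallel efficiency; 1.0 means perfect linear scaling.

    The baseline may itself use several cores (n_base), because a single-core
    run of a multi-million-compound screen is often impractical to measure.
    """
    if min(t_base, t_parallel) <= 0 or min(n_base, n_parallel) <= 0:
        raise ValueError("times and core counts must be positive")
    return (t_base * n_base) / (t_parallel * n_parallel)


if __name__ == "__main__":
    # Hypothetical timings in minutes, not taken from the paper.
    print(speedup(120.0, 20.0))                     # 6.0
    print(parallel_efficiency(120.0, 8, 20.0, 64))  # 0.75
```

In this hypothetical case, eight times as many cores deliver six times the throughput, so each added core is used at 75% of its ideal value.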

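The iterative strategy described in entry 4 (dock a batch, train a model, predict the rest, exclude confident low scorers, repeat until the model is efficient enough) can be sketched in plain Python. This is not the authors' Spark implementation; that is in the linked spark-cpvs repository. Several assumptions are made for illustration only: lower docking scores are better; a ligand is labelled high-scoring when its score is at or below a hypothetical `good_cutoff`; a nearest-centroid nonconformity measure stands in for the SVM; "model efficiency" is taken to be the fraction of single-label prediction sets; and `dock`, `batch_size`, `eps` and `target_efficiency` are hypothetical parameters. The conformal part is a class-conditional (Mondrian) inductive conformal predictor.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of feature vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


class MondrianICP:
    """Class-conditional inductive conformal classifier for labels 0 and 1.

    Nonconformity is the distance to the candidate label's centroid, a
    deliberately simple stand-in for the SVM used in the paper.
    """

    def fit(self, X: List[Vector], y: List[int],
            calib_fraction: float = 0.3, seed: int = 0) -> "MondrianICP":
        rng = random.Random(seed)
        self.centroids: Dict[int, Vector] = {}
        self.cal_scores: Dict[int, List[float]] = {}
        for label in (0, 1):
            idx = [i for i, t in enumerate(y) if t == label]
            if len(idx) < 2:
                raise ValueError(f"need at least two examples of class {label}")
            rng.shuffle(idx)
            n_cal = max(1, int(len(idx) * calib_fraction))
            cal, proper = idx[:n_cal], idx[n_cal:]  # calibration vs proper training
            self.centroids[label] = centroid([X[i] for i in proper])
            self.cal_scores[label] = [math.dist(X[i], self.centroids[label]) for i in cal]
        return self

    def p_value(self, x: Vector, label: int) -> float:
        alpha = math.dist(x, self.centroids[label])
        scores = self.cal_scores[label]
        return (sum(s >= alpha for s in scores) + 1) / (len(scores) + 1)

    def predict_set(self, x: Vector, eps: float) -> Set[int]:
        """Labels that cannot be rejected at significance level eps."""
        return {label for label in (0, 1) if self.p_value(x, label) > eps}


def iterative_screening(ligands: Dict[str, Vector], dock: Callable[[str], float],
                        good_cutoff: float, batch_size: int, eps: float = 0.2,
                        target_efficiency: float = 0.8,
                        seed: int = 0) -> Tuple[Dict[str, float], Set[str]]:
    """Return (docking score per docked ligand, ligands excluded without docking)."""
    remaining = list(ligands)
    random.Random(seed).shuffle(remaining)
    scores: Dict[str, float] = {}
    model = None
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for name in batch:
            scores[name] = dock(name)
        names = list(scores)
        X = [ligands[n] for n in names]
        y = [1 if scores[n] <= good_cutoff else 0 for n in names]  # 1 = high-scoring
        try:
            model = MondrianICP().fit(X, y, seed=seed)
        except ValueError:
            continue  # too few examples of one class; dock another batch
        sets = [model.predict_set(ligands[n], eps) for n in remaining]
        efficiency = sum(len(s) == 1 for s in sets) / len(sets) if sets else 1.0
        if efficiency >= target_efficiency:
            break
    excluded: Set[str] = set()
    if model is not None:
        excluded = {n for n in remaining if model.predict_set(ligands[n], eps) == {0}}
    for name in remaining:
        if name not in excluded:
            scores[name] = dock(name)  # ambiguous or likely high-scoring: dock it
    return scores, excluded
```

Only ligands whose prediction set is exactly {0} are skipped; ambiguous ({0, 1}) and empty sets are still docked. Under the usual exchangeability assumption, a truly high-scoring ligand is wrongly excluded with probability at most `eps`, because that requires its label-1 p-value to fall at or below `eps`.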
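Entries 1 and 3 describe a "top-k amplitude measure" that both classifies seizure segments and shrinks the data, together with a threshold learned in real time, but the abstracts define neither. The sketch below is one plausible reading, not the published algorithm. It assumes the feature is the mean of the k largest absolute amplitudes in a window, so each window of samples collapses to a single number. It also assumes the threshold is a running mean plus `c` standard deviations over windows judged normal, updated online with Welford's algorithm. A plain Python generator stands in for a Spark Streaming micro-batch, and `detect`, `c` and `warmup` are hypothetical names.

```python
import heapq
import math
from typing import Iterable, Iterator, Sequence, Tuple


def top_k_amplitude(window: Sequence[float], k: int) -> float:
    """Assumed definition: mean of the k largest absolute amplitudes in a window."""
    if not 0 < k <= len(window):
        raise ValueError("k must be between 1 and len(window)")
    return sum(heapq.nlargest(k, (abs(v) for v in window))) / k


class RunningThreshold:
    """Online threshold mean + c * std, kept with Welford's algorithm (hypothetical rule)."""

    def __init__(self, c: float = 3.0) -> None:
        self.c = c
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def value(self) -> float:
        if self.n < 2:
            return math.inf  # not enough data to estimate a spread yet
        return self.mean + self.c * math.sqrt(self.m2 / (self.n - 1))


def detect(windows: Iterable[Sequence[float]], k: int, c: float = 3.0,
           warmup: int = 10) -> Iterator[Tuple[float, bool]]:
    """Yield (feature, is_seizure) for each incoming window of EEG samples."""
    threshold = RunningThreshold(c)
    for window in windows:
        feature = top_k_amplitude(window, k)  # n samples -> 1 number
        is_seizure = threshold.n >= warmup and feature > threshold.value()
        if not is_seizure:
            threshold.update(feature)  # learn only from windows judged normal
        yield feature, is_seizure
```

Updating the threshold only with windows judged normal keeps a long seizure from dragging the threshold upward. The trade-off is that a threshold that starts too low can lock in false alarms, which is one reason for the warm-up period.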