About the Role
Johns Hopkins, founded in 1876, is America's first research
university and home to nine world-class academic divisions working
together as one university.
The Johns Hopkins Data Science and AI Institute (DSAI) is a new
pan-institutional initiative at Johns Hopkins to advance artificial
intelligence and its applications, in part through investments in
the software engineering, data science, and machine learning space.
DSAI is focused on revolutionizing discovery by advancing
artificial intelligence that evolves collaboratively with human
intelligence, combining the strengths of each for the betterment of
society and the world in which we live. DSAI will bring together
the mathematical, computational, and ethical foundations of AI with
the domains of Health & Medicine, Scientific Discovery,
Engineered Systems, Security & Safety, and People, Policy &
Governance.
DSAI seeks a Research Software Engineer - Clinical NLP
Specialty with a strong academic background and relevant experience
in industry or academia focused on designing and building
state-of-the-art clinical NLP systems. This position supports
research initiatives in the development and novel application of
NLP and large language models to extract insights from unstructured
clinical text using techniques such as named entity recognition
(NER), negation detection, structured data extraction, diagnosis
prediction, risk stratification, temporal reasoning and
phenotyping. The successful candidate will play a critical role in
designing, implementing, rigorously evaluating, deploying and
maintaining robust and scalable NLP pipelines and models to extract
meaningful information from unstructured clinical text in secure
environments, with the goal of enabling high-impact solutions
across a range of biomedical domains. Experience with large
language models - such as fine-tuning, prompt engineering, model
evaluation, and adapting foundation models for domain-specific
clinical tasks - is desirable, particularly in contexts that demand
privacy, robustness, and interpretability. The clinical NLP RSE
will work closely with clinicians, informatics researchers, data
scientists and other RSEs to ensure NLP systems meet application
goals with methodological rigor and scientific reproducibility.
DSAI engineers are at the forefront of modern data-intensive
science, where professionally developed software is rapidly
becoming a key ingredient for success. The DSAI initiative includes
the build-out of a substantive and professional-scale software
engineering capability, and a dramatic increase in infrastructure,
both in hardware and in personnel. JHU has long been a world leader
in the broader domains of medicine and public health as well as a
wide range of science and engineering fields. This, combined with
our ethos of building out capabilities to have demonstrable global
impact (e.g., JHU's Coronavirus Resource Center, the award-winning
global resource for real-time data and analysis for COVID-19) and
other unique large scientific data sets, like the archives for the
Sloan Digital Sky Survey and several simulations, will be key
leverage points that will make DSAI successful.
Specific Duties & Responsibilities
• The successful candidate will participate in ground-breaking
research projects that need advanced software solutions requiring
expertise in software engineering not commonly found in scientific
collaborations.
• The projects will require development of state-of-the-art
clinical NLP solutions using the latest deep learning libraries
trained on state-of-the-art hardware in secure healthcare computing
environments.
• Projects will involve analysis of massive data sets either in
the cloud or on premises.
• Projects will require development of novel NLP software
pipelines for processing of unstructured clinical notes.
• Some projects may require deep engagement, possibly leading to
co-authorship on scientific publications, while others may involve
a more casual consulting engagement.
• Projects may require developing software solutions from scratch or
refactoring existing solutions to make them conform to industry
standards (quality, efficiency, reusability, robustness,
portability, documentation, etc.).
• It is a high-level goal of DSAI to translate the work done on
individual projects into frameworks and template patterns for
sustainable scientific infrastructure benefiting future
projects.
Special Knowledge, Skills, and Abilities
• Strong NLP, LLM, machine learning and deep learning
skills.
• Practical experience building NLP models and pipelines in a
secure, HIPAA-compliant healthcare environment.
• Expert-level knowledge of multiple modern NLP and LLM libraries
and models.
• Hands-on experience adapting and fine-tuning large language
models for domain-specific clinical applications, with attention to
data efficiency, interpretability, and reproducibility.
• Demonstrated expertise in prompt engineering, evaluation, and
benchmarking of large language models, includ