NLP Capstone Workshop 2022

 

About the Workshop

Natural Language Processing combines the academic disciplines of computer science, linguistics, and artificial intelligence to develop computer programs with the ability to understand and generate natural language text. This rapidly growing field provides the key capabilities for many areas of artificial intelligence.


At UCSC, we offer a unique master’s program in Natural Language Processing. Our specialized program design combines theoretical learning with hands-on practice to ensure our students have the right knowledge and skill set to prepare them for a professional career in this fast-growing field.


This year, our NLP students have acquired skills and specialized knowledge in core algorithms, concepts, and industry practices related to NLP, machine learning, and data science and analytics. In this final phase of their graduate program, students complete a three-quarter Capstone project, working as part of a team to address a current, real-world NLP challenge. The project enables students to gain hands-on experience and industry insight by applying their learning in a practical context. As part of their Capstone project, students develop teamwork skills with peers and collaborate with experts from industry. Current projects include partnerships with industry giants IBM, Interactions, LinkedIn, and Google. The Workshop is advertised widely across NLP networks and is attended by a cross-section of industry experts, faculty, and community members.


Students will showcase their Capstone projects at our annual NLP Capstone Workshop on August 26th, 2022. Each team will be given time to present their work and answer questions from workshop attendees. The workshop presentations will start at 1:00pm Pacific Time and will conclude at 5:00pm Pacific Time.

Registration

To attend the NLP Capstone Workshop online, please complete this registration form by 5:00pm PDT on August 25th.

Online Workshop Schedule (all times in Pacific Time)

1:00pm - 1:30pm    Welcome Remarks & Keynote Address
1:30pm - 2:00pm    IBM Research: Probing of Language Models for Code
2:00pm - 2:30pm    Interactions: Neural Models of Supertagging for Semantic Role Labeling & Beyond
2:30pm - 2:40pm    Break
2:40pm - 3:10pm    LinkedIn: Comparing Dictionaries & Word Embeddings
3:10pm - 3:40pm    Google & UCSC: Multimodal Knowledge Extraction & Question Answering in Farming
3:40pm - 4:40pm    NLP Industry Panel Discussion
4:40pm - 5:00pm    Closing Remarks

Contact Us

If you have any questions about the NLP Capstone Workshop, please email the NLP Program Team at nlp@ucsc.edu.


To learn more about NLP at UCSC, please visit our website.

Preview the 2022 NLP Capstone Projects

Details of Capstone projects developed with faculty and industry mentors are listed below. You can find more information on how the NLP Capstone experience works, including previous projects and the 2021 Capstone Workshop, on our website.


IBM Research - Probing of Language Models for Code: Exploring Code Style Transfer with Neural Networks


Presenters: Anish Kishor Savla, Chih-Kai Ting, Karl Shen Munson, and Serenity Wade 


Project Mentors: Kiran Kate and Dr. Kavitha Srinivas, IBM Research

 

Abstract:

 

Style is a significant component of natural language text: it can change the tone of a text while keeping the underlying information the same. Even though programming languages have strict grammatical rules, they also have style; code with the same functionality can be written using different language features. However, style in programming languages is difficult to quantify, so as part of this work we define style attributes, specifically for Python. To build this definition, we use hierarchical clustering to capture style without needing to specify transformations. In addition to defining style, we explore the capability of a pre-trained code language model to capture information about code style.
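For readers unfamiliar with the approach, the sketch below shows what hierarchical clustering over surface-level code style signals might look like; the feature set, snippets, and clustering settings are illustrative assumptions, not the team's actual implementation.

```python
# Illustrative sketch only: clusters toy Python snippets by hypothetical
# surface-level style features; not the project's actual feature set.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def style_features(code: str) -> list:
    """Toy style signals extracted from a snippet of Python code."""
    lines = code.splitlines() or [""]
    return [
        sum(len(l) for l in lines) / len(lines),    # average line length
        code.count("lambda"),                       # functional-style lambdas
        code.count("for ") + code.count("while "),  # loop keywords (incl. comprehensions)
        code.count(" if ") + code.count("else"),    # branching / ternaries
    ]

snippets = [
    "total = 0\nfor x in xs:\n    total += x",
    "total = sum(xs)",
    "squares = [x * x for x in xs]",
    "squares = list(map(lambda x: x * x, xs))",
]

X = np.array([style_features(s) for s in snippets])
Z = linkage(X, method="ward")                    # agglomerative hierarchical clustering
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the dendrogram into 2 clusters
print(labels)                                    # style cluster assignment per snippet
```

The grouping that emerges plays the role of a style definition: snippets that cluster together share style attributes without anyone having to enumerate the transformations between them.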




Interactions - Neural Models of Supertagging for Semantic Role Labeling & Beyond


Presenters: Anusha Gouravaram, Diji Yang, and Julian Jakob Cremer


Project Mentor: Dr. John Chen, Interactions

 

Abstract:

Recent Transformer-type deep neural network models have shown great success in a variety of NLP tasks such as sequence labeling. One sequence tagging task for which Transformer-based work is absent is supertagging, which we investigate here. Supertagging, as an "almost parsing" task, has previously been shown to be helpful for the task of semantic role labeling (SRL). Consequently, we have also investigated how Transformer-type supertagging models can positively affect SRL models. Finally, supertagging has typically been framed in terms of modeling syntactic structure. In contrast, we propose a new Dual Semantic and Syntactic (DSS) supertag grammar which also incorporates semantic information. We incorporate it, along with Transformer-based modeling, into a three-step pipeline that labels semantic roles when given raw input sentences. Our empirical experiments demonstrate that this approach can be helpful for semantic parsing tasks; specifically, it achieves competitive results on the CoNLL 2009 shared task.
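As a rough illustration of framing supertagging as token-level sequence labeling, the sketch below attaches a per-token classifier to a small Transformer encoder; the vocabulary, tag inventory, and model sizes are hypothetical and do not reflect the team's DSS grammar or three-step pipeline.

```python
# Illustrative sketch only: supertagging as token classification with a small
# Transformer encoder (hypothetical sizes; not the team's actual model).
import torch
import torch.nn as nn

class SupertaggerSketch(nn.Module):
    def __init__(self, vocab_size=10000, num_supertags=500, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.classifier = nn.Linear(d_model, num_supertags)   # one supertag per token

    def forward(self, token_ids):                  # (batch, seq_len)
        hidden = self.encoder(self.embed(token_ids))
        return self.classifier(hidden)             # (batch, seq_len, num_supertags)

model = SupertaggerSketch()
tokens = torch.randint(0, 10000, (2, 12))          # two toy sentences of 12 tokens
logits = model(tokens)
print(logits.shape)                                # torch.Size([2, 12, 500])
```

Because each supertag encodes rich structural information about its token, a labeler like this can feed downstream stages (such as an SRL model) that operate over the predicted tag sequence.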


LinkedIn - Comparing Dictionaries & Word Embeddings


Presenters: Anuroop John Abraham, Archit Bose, Kartik, and Utkarsh Garg


Project Mentors: Ashvini Jindal and Dr. Ananth Sankar, LinkedIn

 

Abstract:

 

An extensive amount of NLP research focuses on learning word representations that accurately reflect their semantic properties. We investigate whether these word embeddings capture the same semantics of a word as its dictionary definition (gloss). To accomplish this, we leverage a reverse dictionary word retrieval task (i.e., given a definition, we retrieve the corresponding word by learning a mapping into the word embedding space). Extending the idea to multilingual representation learning, we show the possibility of retrieving a word in any target language given a definition in any source language when trained on a single high-resource language. Through comprehensive experiments in both monolingual and multilingual settings, we show:

 

1. How different model architectures and word embeddings (static vs. contextual) affect gloss-to-word performance.

 

2. How adapter networks compare to fine-tuning for learning word representations, especially in a multilingual context.

 

3. Which layers are best at mapping phrasal semantics (gloss) to lexical semantics (word embeddings).

 

4. How increased model parameters and dataset sizes monotonically improve the model's representation learning ability.

 

To evaluate our system in this many-to-many setting, we release the first gold-standard multilingual parallel test set of 1200 sense-aware word-gloss pairs in 6 languages.
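To make the reverse-dictionary setup concrete, the sketch below retrieves the word whose embedding is closest to a gloss embedding by cosine similarity; the toy vocabulary and random stand-in vectors are assumptions for illustration only, not the project's trained encoders.

```python
# Illustrative sketch only: reverse-dictionary retrieval with toy vectors in
# place of real (static or contextual) word and gloss embeddings.
import numpy as np

vocab = ["cat", "violin", "glacier"]
word_vecs = np.random.rand(len(vocab), 300)        # stand-in word embeddings

def embed_gloss(gloss: str) -> np.ndarray:
    """Stand-in gloss encoder; the real system learns this mapping."""
    rng = np.random.default_rng(abs(hash(gloss)) % (2**32))
    return rng.random(300)

def reverse_lookup(gloss: str) -> str:
    q = embed_gloss(gloss)
    sims = word_vecs @ q / (np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(q))
    return vocab[int(np.argmax(sims))]              # nearest word by cosine similarity

# With random vectors the result is arbitrary; with trained embeddings the
# retrieved word should match the definition.
print(reverse_lookup("a small domesticated carnivorous mammal"))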




Google & UCSC - Multimodal Knowledge Extraction & Question Answering in Farming


Presenters: Brian Mak, Ignacy Tymoteusz Debicki, Juan Sebastian Reyes, and Sriram Mahesh


Project Mentors: Professor Yi Zhang (UCSC), Google X

 

Abstract:

 

We propose a novel visual long-form question answering system for the farming domain, an area of research with no previous baselines or significant prior work. We scrape and curate our datasets from openly available farming information published on reliable farming forums and in agricultural encyclopedias. We propose two methods for designing our visual question answering agent: a knowledge-graph-based solution and an extractive question answering system built on a DensePhrase retrieval-based reading comprehension agent.
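As a minimal illustration of the extractive reading step, the sketch below runs an off-the-shelf question-answering pipeline over a short farming passage; the example passage and the default reader model are placeholders and do not represent the project's DensePhrase-based or knowledge-graph systems.

```python
# Illustrative sketch only: extractive QA over a short farming passage using
# an off-the-shelf reader (not the project's actual retrieval pipeline).
from transformers import pipeline

passage = (
    "Tomato plants grow best in well-drained soil with a pH between 6.0 and 6.8, "
    "and typically need one to two inches of water per week."
)

qa = pipeline("question-answering")                # default extractive reader
answer = qa(question="How much water do tomato plants need per week?",
            context=passage)
print(answer["answer"])                            # answer span copied from the passage
```

An extractive system like this returns a span from the retrieved text, which is why the project pairs the reader with retrieval over curated farming sources.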

Natural Language Processing M.S. Program

Silicon Valley Campus

Email us: nlp@ucsc.edu
