Interactive Question Clarification in Dialogue via Reinforcement Learning
Published in COLING, 2020
Question clarification based on MSTC.
Published in ACL, 2021
We propose an unsupervised structured encoder that composes low-level constituents into high-level constituents without gold trees. The learned trees are highly consistent with human-annotated ones. The backbone of the encoder is a neural inside algorithm with heuristic pruning, so both its time and space complexity are linear.
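To illustrate the idea behind the encoder's backbone, here is a minimal sketch of a bottom-up inside-style composition over spans. The `compose` function is a hypothetical stand-in for a neural composition network, and no pruning is shown, so this naive chart runs in cubic time rather than the linear complexity achieved in the paper.

```python
def compose(left, right):
    # Hypothetical composition: merge two span representations and
    # assign a toy score (a neural network would produce both).
    rep = (left[0], right[0])
    score = left[1] + right[1] + 1.0
    return rep, score

def inside_parse(tokens):
    """Fill an inside chart bottom-up; chart[(i, j)] holds the best
    (representation, score) for the span tokens[i..j] inclusive."""
    n = len(tokens)
    chart = {(i, i): (tok, 0.0) for i, tok in enumerate(tokens)}
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            best = None
            for k in range(i, j):  # try every split point of the span
                cand = compose(chart[(i, k)], chart[(k + 1, j)])
                if best is None or cand[1] > best[1]:
                    best = cand
            chart[(i, j)] = best
    return chart[(0, n - 1)]
```

Heuristic pruning, as used in the paper, restricts which split points `k` are considered, which is what collapses the cubic chart to linear complexity.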
Published in EMNLP, 2022
We replace the heuristic pruning module used in R2D2 with model-based pruning.
Published in ICLR, 2023
We explore the interpretability of the structured encoder and find that the induced alignment between labels and spans is highly consistent with human rationales.
Published in ICLR, 2024
We reduce the space complexity of the deep inside-outside algorithm from cubic to linear and further reduce the parallel time complexity to approximately log N, thanks to the new pruning algorithm proposed in this paper. Furthermore, we find that jointly pre-training Transformers and composition models enhances a variety of downstream NLP tasks. We push unsupervised constituency parsing performance to 65% and demonstrate that our model outperforms vanilla Transformers by around 5% on span-level tasks.
Published in ACL, 2024
We propose GPST, a syntactic language model that can be pre-trained efficiently on raw text without any human-annotated trees. When GPST and GPT-2 are both pre-trained from scratch on OpenWebText, GPST outperforms GPT-2 on various downstream tasks. Moreover, it significantly surpasses previous methods on generative grammar induction tasks, exhibiting a high degree of consistency with human syntax.
Published in arXiv, 2024
A tokenizer based on an unsupervised tree-based morphological parser.
Published in arXiv, 2024
An efficient approach to training a dense retriever under supervision from an auto-regressive loss.