
Introduction

  • This paper studies the challenge of transferring the style of text without parallel input-output examples.
  • It introduces two new ways to measure performance:
    • Transfer Strength: How well the style is changed.
    • Content Preservation: How much of the original content is kept.
  • The methods are tested on two tasks:
    • Turning academic paper titles into news-style titles.
    • Changing reviews from positive to negative.
  • Results show the models achieve a reasonable balance between changing style and preserving content.
  • The paper talks about earlier work on style transfer in images and text.
  • It explains the difficulties of text style transfer, such as the lack of parallel input-output examples and of reliable evaluation methods.
  • Different approaches, like adversarial networks and multi-task learning, are reviewed.
  • The authors point out the need for better ways to keep content and separate style.

Models

  • Two models are proposed:
    • The multi-decoder model, which has separate decoders for each style.
    • The style-embedding model, which adds style information during generation.
  • To test these models, two metrics are used:
    • Transfer strength: Measures how well the new style is applied.
    • Content preservation: Measures how much content stays the same.
  • Tests on two datasets, paper-news titles and positive-negative reviews, show these models balance style change and content retention.
  • The multi-decoder model focuses more on style change, while the style-embedding model keeps more content.
  • Human evaluations show the content preservation metric aligns well with human opinions.
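
The content preservation metric can be made concrete as embedding similarity. A minimal sketch, assuming pretrained word vectors and a min/mean/max pooling scheme (the toy vectors and the pooling choice here are illustrative assumptions, not necessarily the authors' exact setup):

```python
import math

def sentence_embedding(words, word_vectors):
    """Pool word vectors into a single sentence vector by
    concatenating element-wise min, max, and mean pooling."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    pooled = []
    for stat in (min, max):
        pooled += [stat(v[i] for v in vecs) for i in range(dim)]
    pooled += [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return pooled

def content_preservation(src_words, tgt_words, word_vectors):
    """Cosine similarity between source and transferred sentence embeddings."""
    a = sentence_embedding(src_words, word_vectors)
    b = sentence_embedding(tgt_words, word_vectors)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy 2-dimensional vectors, for illustration only.
vecs = {"good": [1.0, 0.2], "movie": [0.1, 1.0], "bad": [-1.0, 0.3]}
score = content_preservation(["good", "movie"], ["bad", "movie"], vecs)
```

A score near 1 means the transferred sentence stays close to the source in embedding space; a sentence compared with itself scores 1.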

Evaluation

  • The models manage to change style while keeping content to varying degrees.
  • The new metrics (transfer strength and content preservation) match human evaluations well, proving their usefulness.
  • The paper notes a trade-off between style change and content retention, suggesting different models for different tasks.
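
Transfer strength can likewise be made concrete: it is the fraction of transferred sentences that a style classifier assigns to the target style. A minimal sketch in which the keyword rule is a hypothetical stand-in for a real trained classifier:

```python
def transfer_strength(transferred, target_style, classify):
    """Fraction of transferred sentences the classifier labels
    with the target style (higher = stronger transfer)."""
    hits = sum(1 for s in transferred if classify(s) == target_style)
    return hits / len(transferred)

# Hypothetical stand-in classifier so the sketch is runnable;
# any trained style classifier would be plugged in here instead.
def toy_classify(sentence):
    return "negative" if "bad" in sentence else "positive"

ts = transfer_strength(["bad movie", "bad plot", "great film"], "negative", toy_classify)
# ts == 2/3: two of the three outputs were classified as the target style.
```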

Experiment

  • Two datasets are used:
    • A paper-news title dataset with academic paper titles and their news-style versions.
    • A positive-negative review dataset with Amazon reviews labeled as positive or negative.
  • The datasets were split into training, validation, and test sets.
  • Preprocessing steps included:
    • Filtering sentence lengths.
    • Converting text to lowercase.
    • Replacing numbers with placeholders.
  • Different settings were tested to ensure fair evaluation:
    • Word embedding size: values such as 64 and 128 were tried.
    • Encoder hidden size: Tested sizes like 16, 32, 64, 128.
    • Style embedding size: Checked 32, 64, 128 dimensions.
    • Batch size: Set to 128.
    • Optimizer: Used Adadelta with a learning rate of 0.0001.
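
The sweep over these settings amounts to a small grid search. A sketch of enumerating the candidate configurations, where the grid values are taken from the list above but the enumeration itself is an illustrative assumption, not the authors' actual tooling:

```python
from itertools import product

# Swept values from the settings listed above.
grid = {
    "word_embedding_size": [64, 128],
    "encoder_hidden_size": [16, 32, 64, 128],
    "style_embedding_size": [32, 64, 128],
}
fixed = {"batch_size": 128, "optimizer": "Adadelta", "learning_rate": 0.0001}

def all_configs(grid, fixed):
    """Yield every combination of the swept settings, merged with the fixed ones."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield {**fixed, **dict(zip(keys, values))}

configs = list(all_configs(grid, fixed))
# 2 * 4 * 3 = 24 candidate configurations in this sketch.
```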

Results and Analysis

  • The content preservation metric matched human opinions with a Spearman’s correlation of 0.5656 (p < 0.0001).
  • In the paper-news title task:
    • The style-embedding model got content preservation scores of 0.89-0.95 and transfer strength scores of 0.2-0.6, better than the baseline.
  • In the positive-negative review task:
    • The multi-decoder model got transfer strength of 0.8 and content preservation of 0.85, doing better than the style-embedding model, which scored 0.6 and 0.75.
  • The multi-decoder model improved transfer strength by 50% compared to the auto-encoder in this task, reaching 0.6.
  • Overall, the models improved style transfer and content retention by 20-50% compared to the baseline.
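
The Spearman correlation reported above is the Pearson correlation of rank vectors. A self-contained sketch with made-up sample scores for illustration (in practice a library such as SciPy provides this directly):

```python
def ranks(xs):
    """1-based average ranks, with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Made-up metric and human scores that agree perfectly in ordering.
rho = spearman([0.9, 0.7, 0.8, 0.4], [5, 3, 4, 1])
```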