Text Compression

Decode this mystery text using text compression, similar to how data is compressed and transported across the Internet.

Grades 9-12 | 55 min | Resource by:

Ever wonder why your downloads run slowly? At some point, we reach a physical limit on how fast we can send bits. If we want to send a large amount of information faster, we have to find a way to represent the same information with fewer bits; in other words, we must compress the data. In this lesson, students will develop a deeper understanding of why text compression is necessary and how it works. They will use the Text Compression Widget to compress segments of English text by looking for repeated patterns and substituting short symbols for longer patterns of text.
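The substitution idea can be sketched in a few lines of Python. This is a minimal illustration, not the widget's actual implementation. The dictionary, the `#`/`@` symbols, and the function names are hypothetical. The sketch assumes two things: every symbol is a single character that never appears in the original text, and no pattern contains a symbol. Under those assumptions, decompression recovers the original exactly, which is what makes the scheme lossless.

```python
def compress(text: str, dictionary: dict[str, str]) -> str:
    """Replace each (non-empty) pattern with its one-character symbol."""
    symbols = set(dictionary)
    # Assumption: symbols appear neither in the text nor inside any pattern,
    # otherwise decompression could not tell them apart from real text.
    if any(s in text for s in symbols) or any(
        s in p for s in symbols for p in dictionary.values()
    ):
        raise ValueError("symbols must not appear in the text or in any pattern")
    # Longer patterns first, so a short pattern doesn't break up a longer one.
    for symbol, pattern in sorted(dictionary.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(pattern, symbol)
    return text


def decompress(text: str, dictionary: dict[str, str]) -> str:
    """Expand every symbol back into the pattern it stands for."""
    for symbol, pattern in dictionary.items():
        text = text.replace(symbol, pattern)
    return text


# Hypothetical dictionary: single-character symbols -> longer patterns.
dictionary = {"#": "the ", "@": " and "}
original = "the cat and the hat and the bat"

packed = compress(original, dictionary)
print(packed, len(original), len(packed))  # #cat@#hat@#bat 31 14
assert decompress(packed, dictionary) == original  # lossless round trip
```

The 31-to-14 figure counts only the compressed text. A fair comparison would also count the characters needed to store the dictionary itself, since the receiver needs it to decode the message.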

Learning Objectives

  • Collaborate with a peer to find a solution to a text compression problem using the Text Compression Widget (lossless compression scheme).
  • Explain why the optimal amount of compression is impossible or “hard” to identify.
  • Explain some factors that make compression challenging.
  • Develop a strategy (heuristic algorithm) for compressing text.
  • Describe the purpose and rationale for lossless compression.

What You’ll Need

Minds On

  • Collaboration
  • Computational thinking
  • Persistence
  • Problem-solving
  • Heuristic thinking

In this lesson, students will use the Text Compression Widget to compress segments of English text by looking for patterns and substituting symbols for larger patterns of text. After some experimentation, students are asked to come up with a process (or algorithm) for arriving at a “good” amount of compression, even though there is no practical way to know whether a given compression is the best (optimal) one. In developing this kind of “heuristic approach” to the problem, students will grapple with the trade-offs involved in compressing data and begin to develop a sense of which computing problems are “hard” to solve.
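One possible heuristic is a greedy strategy: repeatedly pick the pattern that saves the most characters right now, and stop when nothing saves anything. Here is a Python sketch of it. This is only one example of a heuristic, not the widget's method or the "right" answer. The symbol pool, the 12-character pattern limit, and the cost model are all hypothetical assumptions. The cost model charges one character per occurrence of a symbol, plus the symbol and pattern for each dictionary entry.

```python
# Hypothetical pool of one-character symbols, assumed absent from the input.
SYMBOLS = "#@$%&*+=~^"


def net_savings(text: str, pattern: str) -> int:
    """Characters saved by swapping `pattern` for a one-character symbol.

    Assumed cost model: each occurrence shrinks by len(pattern) - 1 characters,
    and storing the dictionary entry costs len(pattern) + 1 (symbol + pattern).
    """
    occurrences = text.count(pattern)  # non-overlapping, same as str.replace
    return occurrences * (len(pattern) - 1) - (len(pattern) + 1)


def greedy_compress(text: str, max_len: int = 12) -> tuple[str, dict[str, str]]:
    """Repeatedly substitute the pattern with the largest net savings."""
    if any(s in text for s in SYMBOLS):
        raise ValueError("input already contains a reserved symbol")
    dictionary: dict[str, str] = {}
    for symbol in SYMBOLS:
        best, best_gain = None, 0
        # Consider every substring of length 2..max_len with no symbols in it.
        for i in range(len(text)):
            for j in range(i + 2, min(i + max_len, len(text)) + 1):
                candidate = text[i:j]
                if any(s in candidate for s in dictionary):
                    continue
                gain = net_savings(text, candidate)
                if gain > best_gain:  # strict '>': the first candidate found wins ties
                    best, best_gain = candidate, gain
        if best is None:
            break  # no remaining pattern pays for its dictionary entry
        text = text.replace(best, symbol)
        dictionary[symbol] = best
    return text, dictionary


def decompress(text: str, dictionary: dict[str, str]) -> str:
    """Patterns never contain symbols here, so expansion order doesn't matter."""
    for symbol, pattern in dictionary.items():
        text = text.replace(symbol, pattern)
    return text


def total_size(text: str, dictionary: dict[str, str]) -> int:
    """Compressed text plus the dictionary, under the same cost model."""
    return len(text) + sum(len(p) + 1 for p in dictionary.values())


poem = "the cat and the hat and the bat and the rat"
packed, table = greedy_compress(poem)
print(packed, table)
print(len(poem), "->", total_size(packed, table))
```

Greedy choices are locally best but not guaranteed to be globally optimal. An early substitution can destroy matches that a different choice would have exploited. The tie-breaking rule also matters. Changing `>` to `>=`, or scanning the text in a different order, can produce a different dictionary for the same poem. That connects directly to the lesson's discussion question about whether two people following the same process will reach the same compression.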

  • When you abbreviate or use coded language to shorten the original text, you are “compressing text.” Computers do this, too, in order to save time and space.
  • The art and science of compression is about figuring out how to represent the same data with fewer bits.
  • Why is this important? One reason is that storage space is limited, and you’d always prefer to use fewer bits if you could. A much more compelling reason is that there is an upper limit to how fast bits can be transmitted over the Internet.
  • What if we need to send a large amount of text faster over the Internet, but we’ve reached the physical limit of how fast we can send bits? Our only choice is to somehow capture the same information with fewer bits; we call this compression.
  • Activity Guide – Text Compression 
  • Video: Text Compression Widget With Aloe Blacc
  • What did all groups’ processes for compression have in common?
  • Will following this process always lead to the same compression (i.e., will two people who follow the process on the same poem arrive at the same compression)?
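The speed argument in the bullets above comes down to simple arithmetic. Once the link speed is fixed, transfer time is the number of bits divided by bits per second, so sending fewer bits is the only way to finish sooner. The file sizes and the 1-million-bits-per-second link below are hypothetical numbers chosen for easy math. The model also ignores real-world overhead such as packet headers and latency.

```python
def transfer_seconds(num_bytes: int, bits_per_second: float) -> float:
    """Idealized transfer time: total bits divided by link speed.

    Ignores real-world overhead such as packet headers and latency.
    """
    return num_bytes * 8 / bits_per_second


LINK_SPEED = 1_000_000  # hypothetical link: 1 million bits per second

original_bytes = 1_000_000  # hypothetical 1 MB text file
compressed_bytes = 400_000  # hypothetical size after compression

print(transfer_seconds(original_bytes, LINK_SPEED))  # 8.0
print(transfer_seconds(compressed_bytes, LINK_SPEED))  # 3.2
```

Shrinking the file to 40% of its size cuts the idealized transfer time to 40% as well, without making the link itself any faster.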

For Students

  • How does text compression affect your daily life?
  • How many of the devices you interact with during the day use text compression to save memory?
  • How much data can your devices store?

For Teachers

  • Have students experiment with zip compression using text files with different contents. Are the results for small files as good as those for large files? (On a Mac, select a file in the Finder and choose “Get Info” to see its actual size in bytes, since the Finder display shows 4KB for any file smaller than that.)
    • Important Note: Results may vary. Zip works very well on text, but it may barely shrink other files (for example, JPEG images) because they are already compressed or lack the kinds of repeated patterns that text documents contain.
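The zip experiment can also be previewed in Python. The standard `zlib` module implements DEFLATE, the compression method most zip archives use, so it is a reasonable stand-in, though it does not produce a `.zip` file. The sample strings are hypothetical. Random bytes stand in for already-compressed files, because well-compressed data looks statistically random and leaves few patterns to exploit. Exact byte counts vary with the zlib version, so none are promised here.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size / original size; lower means better compression."""
    return len(zlib.compress(data, 9)) / len(data)


short_text = b"The quick brown fox jumps over the lazy dog."
long_text = short_text * 200  # hypothetical large, highly repetitive file
random_bytes = os.urandom(len(long_text))  # stand-in for already-compressed data

for name, data in [("short text", short_text),
                   ("long text", long_text),
                   ("random bytes", random_bytes)]:
    print(f"{name:>12}: {len(data):5d} bytes -> "
          f"{len(zlib.compress(data, 9)):5d} bytes (ratio {ratio(data):.2f})")
```

Expect the short text to shrink little or even grow, because fixed overhead dominates and there is little repetition. The repetitive long text should shrink dramatically, and the random bytes should stay about the same size. Real documents repeat far less than `short_text * 200`, so students' own files will land somewhere in between.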