Module: wordcloud
Note: This is a brand-new module! It has a lot of parameters whose values were picked simply because they seem to make a good picture.
The hashtag tends to dominate the graph. I like that, because it serves as a kind of title or anchoring word. But some folks want to see the cloud without the hashtag dominating, so there's a config option, hashtag_fix, that takes one of 3 values: as-is, remove, or reduce. If it is omitted, the default is as-is. In this section, I show the same data set from Kung-Fu Saturday, 7 December 2024, visualized 3 different ways.
Alt Text Generation
As of version 1.2.0, the wordcloud module automatically generates descriptive alt text for each wordcloud image. This alt text includes:
- The hashtag and date of the analysis
- The total number of unique words in the wordcloud
- The top 10 most frequent words with their counts
- The hashtag treatment method (as-is, remove, or reduce)
- A list of any custom stop words that were used
The alt text is saved to a text file with the same name as the wordcloud image but with a .txt extension. For example, if the wordcloud is saved as wordcloud/wordcloud-monsterdon-20250409-as-is.png, the alt text is saved as wordcloud/wordcloud-monsterdon-20250409-as-is.txt.
This feature makes the wordclouds more accessible and provides a quick summary of the key words from the visualization.
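The alt text contents listed above can be sketched roughly as follows. This is a minimal illustration, not mastoscore's actual code: the function names build_alt_text and alt_text_path are hypothetical, and the exact wording and layout of the real .txt file are assumptions. It counts tokens with collections.Counter and derives the .txt path from the image path with pathlib.

```python
from collections import Counter
from pathlib import Path


def build_alt_text(hashtag, date, words, hashtag_fix="as-is", stop_words=()):
    """Hypothetical helper: describe a wordcloud in plain text.

    `words` is the list of tokens that fed the wordcloud, after filtering.
    The line format below is an assumption, not mastoscore's real output.
    """
    counts = Counter(words)
    top = counts.most_common(10)  # top 10 words with their counts
    lines = [
        f"Wordcloud for #{hashtag} on {date}.",
        f"Unique words: {len(counts)}.",
        "Top words: " + ", ".join(f"{w} ({n})" for w, n in top) + ".",
        f"Hashtag treatment: {hashtag_fix}.",
    ]
    if stop_words:  # only mention custom stop words when some were configured
        lines.append("Custom stop words: " + ", ".join(stop_words) + ".")
    return "\n".join(lines)


def alt_text_path(image_path):
    """Hypothetical helper: same name as the image, but with a .txt extension."""
    return Path(image_path).with_suffix(".txt")


print(alt_text_path("wordcloud/wordcloud-monsterdon-20250409-as-is.png"))
```

Deriving the .txt name from the image path, rather than building it separately, keeps the two files paired even if the naming scheme changes.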
Custom Stop Words
You can exclude specific words from your wordcloud by adding a stop_words parameter to the [wordcloud] section of your INI file. This is particularly useful for filtering out common words that aren't meaningful to your analysis.
To use this feature:
- Add a stop_words parameter to the [wordcloud] section of your INI file.
- Provide a comma-separated list of words to exclude.
For example:
[wordcloud]
graph_title = Wordcloud
font = /path/to/font.otf
size_x = 1280
size_y = 960
hashtag_fix = remove
stop_words = movie, film, watching, watch, tonight, scene, scenes, actor, actors
These words will be excluded from the wordcloud in addition to the default stop words and any other configured exclusions. This is especially useful for event-specific hashtags where certain common words might dominate the visualization without adding meaningful information.
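Here is a minimal sketch of how a comma-separated stop_words value might be parsed with the standard configparser module and merged with a default list. DEFAULT_STOP_WORDS, load_stop_words, and filter_words are hypothetical stand-ins, and the case-insensitive matching is an assumption; mastoscore's own defaults and matching rules may differ.

```python
import configparser

# Hypothetical stand-in for the default stop words; the real module has its own list.
DEFAULT_STOP_WORDS = {"the", "a", "and", "of", "to"}


def load_stop_words(config):
    """Merge the optional [wordcloud] stop_words list with the defaults."""
    raw = config.get("wordcloud", "stop_words", fallback="")
    # Split on commas, trim spaces, skip empties, and lowercase (an assumption).
    custom = {w.strip().lower() for w in raw.split(",") if w.strip()}
    return DEFAULT_STOP_WORDS | custom


def filter_words(words, stop_words):
    """Drop any token whose lowercase form is a stop word."""
    return [w for w in words if w.lower() not in stop_words]


config = configparser.ConfigParser()
config.read_string("""
[wordcloud]
hashtag_fix = remove
stop_words = movie, film, watching, watch, tonight
""")

stops = load_stop_words(config)
print(filter_words(["The", "movie", "was", "great", "tonight"], stops))  # ['was', 'great']
```

Using fallback="" means an INI file without stop_words simply falls back to the defaults instead of raising an error.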
as-is
Leave the hashtag alone. This is the default.
remove
Remove all instances of the hashtag.
reduce
Remove most occurrences of the hashtag (currently hard-coded at 90%). It will still be frequent enough to appear quite large, but it won't dominate. In this example, "KungFuSat" is near the top right, in a dark purple.
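The three hashtag_fix values can be illustrated with a small token-filtering sketch. This is not mastoscore's implementation: apply_hashtag_fix is a hypothetical name, the case-insensitive comparison that ignores a leading "#" is an assumption, and for reduce it deterministically keeps the first 10% of occurrences, whereas the real module may pick which occurrences to drop differently.

```python
def apply_hashtag_fix(words, hashtag, hashtag_fix="as-is", keep_fraction=0.10):
    """Hypothetical helper: return a new token list with the hashtag handled.

    as-is:  keep every occurrence
    remove: drop every occurrence
    reduce: drop roughly 90% of occurrences (keep_fraction of them survive)
    """
    # Compare case-insensitively and ignore a leading '#' (an assumption).
    tag = hashtag.lower().lstrip("#")
    is_tag = [w.lower().lstrip("#") == tag for w in words]

    if hashtag_fix == "as-is":
        return list(words)
    if hashtag_fix == "remove":
        return [w for w, t in zip(words, is_tag) if not t]
    if hashtag_fix == "reduce":
        keep_n = round(sum(is_tag) * keep_fraction)
        out, kept = [], 0
        for w, t in zip(words, is_tag):
            if t:
                if kept >= keep_n:
                    continue  # drop this occurrence
                kept += 1  # keep the first keep_n occurrences
            out.append(w)
        return out
    raise ValueError(f"unknown hashtag_fix: {hashtag_fix!r}")


words = ["KungFuSat"] * 100 + ["kick"] * 30
print(apply_hashtag_fix(words, "#KungFuSat", "reduce").count("KungFuSat"))  # 10
```

With 100 hashtag occurrences, reduce leaves 10, which is usually still enough to render the hashtag in a large font without it swamping everything else.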
Synopsis
mastoscore --debug=info ini/monsterdon-20241201.ini wordcloud
Creates a file named {journaldir}/wordcloud-{journalfile}.png.
A Word about Emoji
While it is possible to make a word cloud that includes emoji, it's a bit complicated. It really comes down to the font and matplotlib's support for fonts. I think a lot of fancy word-processing systems use multiple fonts (one for text, another for rendering symbols like emoji). But matplotlib needs a single font that has everything you want in it. The only one I have found like that is Symbola, which is OK, but the words themselves look pretty terrible in it. I think the right answer is probably to build emoji awareness into word_cloud itself, so that it can use a different font for emoji. For now, I'm just dropping all emoji and punctuation.
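One way to drop emoji and punctuation is to look at each character's Unicode category with the standard unicodedata module. This is a sketch under assumptions, not the module's actual cleanup code: strip_emoji_and_punctuation is a hypothetical name, and treating every symbol category ("S*"), punctuation category ("P*"), format character ("Cf"), and variation selector as removable is a choice made here for illustration.

```python
import unicodedata


def strip_emoji_and_punctuation(text):
    """Hypothetical helper: replace emoji, symbols and punctuation with spaces."""
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if (
            cat[0] in ("S", "P")           # symbols (most emoji are 'So') and punctuation
            or cat == "Cf"                 # format chars such as the zero-width joiner
            or "\ufe00" <= ch <= "\ufe0f"  # variation selectors, e.g. emoji presentation
        ):
            out.append(" ")
        else:
            out.append(ch)
    # Collapse the runs of spaces left behind by removed characters.
    return " ".join("".join(out).split())


print(strip_emoji_and_punctuation("I love #KungFuSat \U0001F94B!!"))  # I love KungFuSat
```

Because this removes every symbol and punctuation character, it also drops things like currency signs and splits contractions ("don't" becomes "don t"), which may or may not be what you want in a word cloud.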
Examples
Code Reference
Module that takes the data from analysis and produces graph files.
write_wordcloud(config)
This is the only function, for now. It invokes get_toots_df() to get the DataFrame, then discards basically everything other than the content column.
I post-process the text to remove some weird things (there are lots of emoji-like characters). I also handle the hashtag itself according to hashtag_fix, because it's obviously going to have the highest frequency.
Parameters
- config: A ConfigParser object from the config module
Config Parameters Used
| Option | Description |
|---|---|
| graph:journalfile | Filename that forms the base of the graph's filename. |
| graph:journaldir | Directory where the graph file is written. |
| fetch:hashtag | Hashtag to search for. |
| wordcloud:font_path | Path to a font file, such as Symbola. |
| wordcloud:hashtag_fix | How to treat the main hashtag: 'reduce', 'remove', or 'as-is'. Default 'as-is'. |
| wordcloud:size_x | Image width in pixels. Default 1280. |
| wordcloud:size_y | Image height in pixels. Default 960. |
| wordcloud:stop_words | Comma-separated list of words to exclude. |
| mastoscore:event_year | Year of the event (YYYY). |
| mastoscore:event_month | Month of the event (MM). |
| mastoscore:event_day | Day of the event (DD). |
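As a rough sketch of reading the options in the table above with the standard configparser module and applying the documented defaults: read_wordcloud_settings is a hypothetical helper, and mastoscore may read, validate, or name these values differently. The defaults (1280, 960, 'as-is') come from this page.

```python
import configparser


def read_wordcloud_settings(config):
    """Hypothetical helper: gather wordcloud options, applying documented defaults."""
    return {
        "hashtag": config.get("fetch", "hashtag"),
        "hashtag_fix": config.get("wordcloud", "hashtag_fix", fallback="as-is"),
        "size_x": config.getint("wordcloud", "size_x", fallback=1280),
        "size_y": config.getint("wordcloud", "size_y", fallback=960),
        # Date parts stay as strings ("2025", "04", "09") for building filenames.
        "year": config.get("mastoscore", "event_year"),
        "month": config.get("mastoscore", "event_month"),
        "day": config.get("mastoscore", "event_day"),
    }


config = configparser.ConfigParser()
config.read_string("""
[fetch]
hashtag = monsterdon
[wordcloud]
size_x = 800
[mastoscore]
event_year = 2025
event_month = 04
event_day = 09
""")
print(read_wordcloud_settings(config))
```

Options with documented defaults use fallback=, while the hashtag and event date have no fallback, so a missing value raises configparser's NoOptionError or NoSectionError instead of silently producing a wrong filename.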
Returns
None
Writes the graph to a file named wordcloud/wordcloud-hashtag-YYYYMMDD-hashtag_fix.png.
Writes an alt text description to wordcloud/wordcloud-hashtag-YYYYMMDD-hashtag_fix.txt.
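The naming pattern above can be sketched as a small path builder. output_paths is a hypothetical helper; here the output directory is a parameter that defaults to wordcloud/ to match the pattern, and the zero-padding of month and day is an assumption that matches the example filename wordcloud-monsterdon-20250409-as-is.png.

```python
from pathlib import Path


def output_paths(hashtag, year, month, day, hashtag_fix, outdir="wordcloud"):
    """Hypothetical helper: return (png_path, txt_path) for one wordcloud."""
    # Config values arrive as strings such as "2025", "04", "09"; normalise padding.
    date = f"{int(year):04d}{int(month):02d}{int(day):02d}"
    stem = f"wordcloud-{hashtag}-{date}-{hashtag_fix}"
    base = Path(outdir)
    return base / f"{stem}.png", base / f"{stem}.txt"


png, txt = output_paths("monsterdon", "2025", "04", "09", "as-is")
print(png)
print(txt)
```

Including hashtag_fix in the name means the as-is, remove, and reduce renderings of the same day can sit side by side without overwriting each other.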
Source code in mastoscore/wordcloud.py