ts_shape.features.stats.string_stats
Classes:
- StringStatistics – Provides class methods to calculate statistics on string columns in a pandas DataFrame.
StringStatistics
StringStatistics(dataframe: DataFrame, column_name: str = 'systime')
Bases: Base
Provides class methods to calculate statistics on string columns in a pandas DataFrame.
Parameters:
- dataframe (DataFrame) – The DataFrame to be processed.
- column_name (str, default: 'systime') – The column to sort by. If the column is not found or is not a time column, the class attempts to detect other time columns.
Methods:
- average_string_length – Returns the average length of strings in the column, excluding null values.
- contains_digit_count – Counts how many strings contain digits.
- contains_substring_count – Counts how many strings contain the specified substring.
- count_most_frequent – Returns the count of the most frequent string in the column.
- count_null – Returns the number of null (NaN) values in the column.
- count_unique – Returns the number of unique strings in the column.
- ends_with_count – Counts how many strings end with the specified suffix.
- get_dataframe – Returns the processed DataFrame.
- longest_string – Returns the longest string in the column.
- lowercase_percentage – Returns the percentage of strings that are fully lowercase.
- most_common_n_strings – Returns the top N most frequent strings in the column.
- most_frequent – Returns the most frequent string in the column.
- shortest_string – Returns the shortest string in the column.
- starts_with_count – Counts how many strings start with the specified prefix.
- string_length_summary – Returns a summary of string lengths, including min, max, and average lengths.
- summary_as_dataframe – Returns a DataFrame with comprehensive string statistics for the specified column.
- summary_as_dict – Returns a dictionary with comprehensive string statistics for the specified column.
- uppercase_percentage – Returns the percentage of strings that are fully uppercase.
Source code in src/ts_shape/utils/base.py
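The class exposes each statistic as a class method that takes the DataFrame and a column name. The following is a minimal, pandas-only sketch of that pattern. `MiniStringStats` is a hypothetical stand-in, not the ts_shape implementation, and the `'value_string'` default is assumed from the signatures shown elsewhere on this page.

```python
import pandas as pd


class MiniStringStats:
    """Hypothetical stand-in illustrating one class method per statistic."""

    @classmethod
    def count_unique(cls, dataframe: pd.DataFrame, column_name: str = "value_string") -> int:
        # nunique() ignores null values by default.
        return int(dataframe[column_name].nunique())

    @classmethod
    def count_null(cls, dataframe: pd.DataFrame, column_name: str = "value_string") -> int:
        # isna() flags both None and NaN entries.
        return int(dataframe[column_name].isna().sum())


df = pd.DataFrame({"value_string": ["ok", "error", "ok", None]})
unique_count = MiniStringStats.count_unique(df)
null_count = MiniStringStats.count_null(df)
```

Because the methods are class methods, no instance is needed to compute a statistic in this sketch.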
average_string_length
classmethod
Returns the average length of strings in the column, excluding null values.
Source code in src/ts_shape/features/stats/string_stats.py
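As an illustration of "average length, excluding null values", the statistic can be sketched in plain pandas. The helper name `average_length` is hypothetical and this is not the library's source.

```python
import pandas as pd


def average_length(df: pd.DataFrame, column: str = "value_string") -> float:
    # Drop nulls first so they do not contribute to the mean.
    lengths = df[column].dropna().str.len()
    return float(lengths.mean())


df = pd.DataFrame({"value_string": ["ab", "abcd", None]})
avg = average_length(df)  # lengths 2 and 4 are averaged; None is skipped
```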
contains_digit_count
classmethod
Counts how many strings contain digits.
Source code in src/ts_shape/features/stats/string_stats.py
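One way to count strings that contain at least one digit is a regular-expression match per row. This is a sketch under the assumption that null values are ignored; `digit_count` is a hypothetical helper, not the ts_shape method.

```python
import pandas as pd


def digit_count(df: pd.DataFrame, column: str = "value_string") -> int:
    strings = df[column].dropna()
    # r"\d" matches any single digit; one match anywhere in the string is enough.
    return int(strings.str.contains(r"\d", regex=True).sum())


df = pd.DataFrame({"value_string": ["abc1", "xyz", "42", None]})
with_digits = digit_count(df)
```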
contains_substring_count
classmethod
contains_substring_count(dataframe: DataFrame, substring: str, column_name: str = 'value_string') -> int
Counts how many strings contain the specified substring.
Source code in src/ts_shape/features/stats/string_stats.py
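The sketch below mirrors the parameter order of the signature above using plain pandas. It assumes a literal (non-regex) match and treats null values as non-matching; whether the library does the same is not stated here. `substring_count` is a hypothetical name.

```python
import pandas as pd


def substring_count(df: pd.DataFrame, substring: str, column: str = "value_string") -> int:
    # regex=False: treat the substring literally; na=False: nulls count as "no match".
    matches = df[column].str.contains(substring, regex=False, na=False)
    return int(matches.sum())


df = pd.DataFrame({"value_string": ["motor_on", "motor_off", "idle", None]})
motor_rows = substring_count(df, "motor")
```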
count_most_frequent
classmethod
Returns the count of the most frequent string in the column.
Source code in src/ts_shape/features/stats/string_stats.py
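A frequency table gives both the most frequent string and its count. The following pandas sketch shows the idea; `top_value_and_count` is a hypothetical helper and the handling of an empty column is an assumption.

```python
import pandas as pd


def top_value_and_count(df: pd.DataFrame, column: str = "value_string"):
    # value_counts() sorts by descending frequency and skips nulls by default.
    counts = df[column].value_counts()
    if counts.empty:
        return None, 0
    return counts.index[0], int(counts.iloc[0])


df = pd.DataFrame({"value_string": ["a", "b", "a", None]})
top_value, top_count = top_value_and_count(df)
```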
count_null
classmethod
Returns the number of null (NaN) values in the column.
Source code in src/ts_shape/features/stats/string_stats.py
count_unique
classmethod
Returns the number of unique strings in the column.
Source code in src/ts_shape/features/stats/string_stats.py
ends_with_count
classmethod
Counts how many strings end with the specified suffix.
Source code in src/ts_shape/features/stats/string_stats.py
get_dataframe
get_dataframe() -> DataFrame
Returns the processed DataFrame.
Source code in src/ts_shape/utils/base.py
longest_string
classmethod
Returns the longest string in the column.
Source code in src/ts_shape/features/stats/string_stats.py
lowercase_percentage
classmethod
Returns the percentage of strings that are fully lowercase.
Source code in src/ts_shape/features/stats/string_stats.py
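A possible reading of "percentage of strings that are fully lowercase" is the share of non-null strings for which `str.islower()` is true, scaled to 0-100. The denominator and the empty-column result are assumptions, and `lowercase_share` is a hypothetical helper. The uppercase variant would follow the same shape with `str.isupper()`.

```python
import pandas as pd


def lowercase_share(df: pd.DataFrame, column: str = "value_string") -> float:
    strings = df[column].dropna()
    if strings.empty:
        return 0.0  # assumption: an empty column yields 0 percent
    # The mean of a boolean mask is the fraction of True values.
    return float(strings.str.islower().mean() * 100)


df = pd.DataFrame({"value_string": ["abc", "ABC", "Abc", "xyz"]})
pct_lower = lowercase_share(df)
```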
most_common_n_strings
classmethod
Returns the top N most frequent strings in the column.
Source code in src/ts_shape/features/stats/string_stats.py
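Selecting the top N strings can be sketched as a frequency table truncated to its first N rows. The helper `top_n` and its parameter `n` are hypothetical names; the library's parameter name and return type are not shown on this page.

```python
import pandas as pd


def top_n(df: pd.DataFrame, n: int, column: str = "value_string") -> pd.Series:
    # value_counts() is already sorted by descending count, so head(n) keeps the top N.
    return df[column].value_counts().head(n)


df = pd.DataFrame({"value_string": ["a", "b", "a", "c", "b", "a"]})
top_two = top_n(df, 2)
```

The result in this sketch is a Series indexed by string, with the counts as values.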
most_frequent
classmethod
Returns the most frequent string in the column.
Source code in src/ts_shape/features/stats/string_stats.py
shortest_string
classmethod
Returns the shortest string in the column.
Source code in src/ts_shape/features/stats/string_stats.py
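The shortest and longest strings can both be found from the per-row lengths. In this sketch, ties resolve to the first occurrence, which is an assumption about the intended behavior; `extreme_strings` is a hypothetical helper.

```python
import pandas as pd


def extreme_strings(df: pd.DataFrame, column: str = "value_string"):
    strings = df[column].dropna()
    lengths = strings.str.len()
    # idxmin()/idxmax() return the index label of the first minimum/maximum.
    return strings.loc[lengths.idxmin()], strings.loc[lengths.idxmax()]


df = pd.DataFrame({"value_string": ["start", "stop", "emergency_stop", None]})
shortest, longest = extreme_strings(df)
```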
starts_with_count
classmethod
Counts how many strings start with the specified prefix.
Source code in src/ts_shape/features/stats/string_stats.py
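Prefix and suffix counts have the same structure, so one sketch covers this method and `ends_with_count`. The helper names are hypothetical, and ignoring null values is an assumption.

```python
import pandas as pd


def prefix_count(df: pd.DataFrame, prefix: str, column: str = "value_string") -> int:
    return int(df[column].dropna().str.startswith(prefix).sum())


def suffix_count(df: pd.DataFrame, suffix: str, column: str = "value_string") -> int:
    return int(df[column].dropna().str.endswith(suffix).sum())


df = pd.DataFrame({"value_string": ["pump_1", "pump_2", "valve_1", None]})
pumps = prefix_count(df, "pump")
first_units = suffix_count(df, "_1")
```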
string_length_summary
classmethod
string_length_summary(dataframe: DataFrame, column_name: str = 'value_string') -> DataFrame
Returns a summary of string lengths, including min, max, and average lengths.
Source code in src/ts_shape/features/stats/string_stats.py
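A one-row DataFrame is one plausible shape for the length summary. The column names `min_length`, `max_length`, and `average_length` below are illustrative assumptions, not the names the library emits, and `length_summary` is a hypothetical helper.

```python
import pandas as pd


def length_summary(df: pd.DataFrame, column: str = "value_string") -> pd.DataFrame:
    lengths = df[column].dropna().str.len()
    # One row holding the three aggregates named in the description above.
    return pd.DataFrame(
        {
            "min_length": [int(lengths.min())],
            "max_length": [int(lengths.max())],
            "average_length": [float(lengths.mean())],
        }
    )


df = pd.DataFrame({"value_string": ["a", "abc", "abcde"]})
summary = length_summary(df)
```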
summary_as_dataframe
classmethod
summary_as_dataframe(dataframe: DataFrame, column_name: str) -> DataFrame
Returns a DataFrame with comprehensive string statistics for the specified column.
Source code in src/ts_shape/features/stats/string_stats.py
summary_as_dict
classmethod
Returns a dictionary with comprehensive string statistics for the specified column.
Source code in src/ts_shape/features/stats/string_stats.py
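A combined summary can be sketched as a dictionary of individual statistics, which also converts directly into a one-row DataFrame in the spirit of `summary_as_dataframe`. The dictionary keys and the selection of statistics below are assumptions for illustration; `summarize` is a hypothetical helper, not the ts_shape method.

```python
import pandas as pd


def summarize(df: pd.DataFrame, column: str) -> dict:
    strings = df[column].dropna()
    counts = strings.value_counts()
    return {
        "unique_values": int(strings.nunique()),
        "null_count": int(df[column].isna().sum()),
        "most_frequent": counts.index[0] if not counts.empty else None,
        "average_length": float(strings.str.len().mean()) if not strings.empty else None,
    }


df = pd.DataFrame({"value_string": ["on", "off", "on", None]})
summary = summarize(df, "value_string")
# Wrapping the dict in a list yields a single-row DataFrame, one column per key.
summary_df = pd.DataFrame([summary])
```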
uppercase_percentage
classmethod
Returns the percentage of strings that are fully uppercase.
Source code in src/ts_shape/features/stats/string_stats.py